Re: Latest Release of Receiver based Kafka Consumer for Spark Streaming.

2016-08-25 Thread mdkhajaasmath
Hi Dibyendu,

It looks like it is available for Spark 2.0, but we are using an older version, Spark 1.5.
Could you please let me know how to use this with older versions?

Thanks,
Asmath

Sent from my iPhone

> On Aug 25, 2016, at 6:33 AM, Dibyendu Bhattacharya wrote:
> 
> Hi,
> 
> I have released the latest version of the Receiver-based Kafka Consumer for Spark Streaming.
> 
> The receiver is compatible with Kafka versions 0.8.x, 0.9.x, and 0.10.x, and with all 
> Spark versions.
> 
> Available on Spark Packages: 
> https://spark-packages.org/package/dibbhatt/kafka-spark-consumer
> 
> Also on GitHub: https://github.com/dibbhatt/kafka-spark-consumer
> 
> Salient features:
> 
> End-to-end no data loss, without a Write Ahead Log
> ZK-based offset management for both consumed and processed offsets
> No dependency on WAL and checkpointing
> Built-in PID controller for rate limiting and backpressure management
> Custom message interceptor
> 
> Please refer to 
> https://github.com/dibbhatt/kafka-spark-consumer/blob/master/README.md for 
> more details.
> 
> Regards, 
> Dibyendu
> 


suggestion needed on FileInput Path- Spark Streaming

2016-08-10 Thread mdkhajaasmath

What is the best practice for processing files from an S3 bucket with Spark file 
streaming? I keep receiving files in an S3 path and have to process them in batches, 
but while one batch is being processed, other files may arrive. In this streaming 
job, do I have to move files to another location after each streaming batch ends, 
or is there another way to handle this?

Let's say the batch interval is 15 minutes and the current batch takes more than 15 
minutes: does the next batch get started regardless of the previous batch still being 
processed? Is there a way to hold the next batch while another batch is still being 
processed?

Thanks,
Asmath
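
A minimal sketch of the file-stream approach, assuming an S3 path readable through the 
s3n/s3a connector (the bucket and path below are hypothetical): textFileStream only picks 
up files that appear in the monitored directory after the stream starts, so already-processed 
files do not need to be moved, and because Spark Streaming runs batch jobs one at a time by 
default (spark.streaming.concurrentJobs = 1), a batch that overruns the 15-minute interval 
simply delays the next one rather than running alongside it.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

// Minimal sketch; the bucket/path is hypothetical and the S3 filesystem
// connector (s3n/s3a) is assumed to be on the classpath and configured.
object S3FileStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("s3-file-stream-sketch")
    val ssc = new StreamingContext(conf, Minutes(15)) // 15-minute batch interval

    // textFileStream only picks up files that newly appear under the monitored
    // path after the stream starts, so files from earlier batches do not have
    // to be moved out of the way.
    val lines = ssc.textFileStream("s3n://my-bucket/incoming/")

    lines.foreachRDD { (rdd, time) =>
      // Batches are processed one at a time by default
      // (spark.streaming.concurrentJobs = 1): a batch that overruns the
      // interval delays the next one; the two never run concurrently.
      println(s"Batch at $time contained ${rdd.count()} lines")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}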



Re: [ASK]: DataFrame number of column limit in Spark 1.5.2

2016-04-12 Thread mdkhajaasmath
I am also looking for the same information. In my case, I need to create 190 
columns.

Sent from my iPhone

> On Apr 12, 2016, at 9:49 PM, Divya Gehlot wrote:
> 
> Hi,
> I would like to know: does the Spark DataFrame API have a limit on the 
> number of columns that can be created?
> 
> Thanks,
> Divya 
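
The 22-column ceiling usually comes from Scala 2.10 case classes and tuples rather than 
from the DataFrame API itself; a DataFrame built from a programmatic StructType schema can 
carry 190 columns. A minimal sketch against the Spark 1.5-era API (the column names and 
values below are illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Minimal sketch: build a 190-column DataFrame from a programmatic schema,
// sidestepping the 22-field case-class/tuple limit of Scala 2.10.
object WideDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wide-df-sketch"))
    val sqlContext = new SQLContext(sc)

    val numCols = 190
    val schema = StructType(
      (0 until numCols).map(i => StructField(s"col_$i", StringType, nullable = true)))

    // One dummy row with 190 string values, just to show the shape.
    val rows = sc.parallelize(Seq(Row.fromSeq((0 until numCols).map(i => s"value_$i"))))

    val df = sqlContext.createDataFrame(rows, schema)
    println(df.columns.length) // 190
  }
}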





Data frames or Spark sql for partitioned tables

2016-04-06 Thread mdkhajaasmath
Hi,

I am new to Spark and am trying to implement a solution without using Hive. We 
are migrating to a new environment where Hive is not present; instead, I need to 
use Spark to output the files.

I looked at case classes, and the maximum number of columns I can use is 22, but I 
have 180 columns. In this scenario, what is the best approach: Spark SQL or 
DataFrames, without Hive?

Thanks,
Azmath 

Sent from my iPhone
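
Since the 22-field cap applies only to Scala 2.10 case classes and tuples, one option is to 
map each record to a Row with an explicit StructType and let the DataFrame writer produce 
partitioned output directly, without Hive. A minimal sketch under those assumptions (the 
input path, delimiter, and partition column below are hypothetical, not taken from the thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Minimal sketch for Spark 1.5: map each input line to a Row (no case class,
// so the 22-field limit does not apply) and write partitioned Parquet files
// without Hive. Paths, delimiter, and partition column are illustrative.
object PartitionedOutputSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partitioned-output-sketch"))
    val sqlContext = new SQLContext(sc)

    val numCols = 180
    val schema = StructType(
      (0 until numCols).map(i => StructField(s"c$i", StringType, nullable = true)))

    val rows = sc.textFile("hdfs:///input/data.csv")                 // assumed input location
      .map(_.split(",", -1))                                         // assumed comma delimiter
      .map(fields => Row.fromSeq(fields.padTo(numCols, "").take(numCols).toSeq))

    val df = sqlContext.createDataFrame(rows, schema)

    // Partition the output by one of the columns (c0 here, purely as an
    // example) instead of relying on a Hive partitioned table.
    df.write.partitionBy("c0").parquet("hdfs:///output/partitioned")
  }
}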



Fwd: Question on how to access tuple values in spark

2016-02-06 Thread mdkhajaasmath

> Hi,
> 
> My requirement is to find the maximum revenue per customer, so I am using the 
> query below. I got this solution from a tutorial I found through Google, but I am 
> not able to understand how it returns the maximum in this scenario. Can anyone help?
> 
> revenuePerDayPerCustomerMap.reduceByKey((x, y) => (if(x._2 >= y._2) x else y))
> 
>  ((2013-12-27 00:00:00.0),(62962,199.98))
>  ((2013-12-27 00:00:00.0),(62962,299.98))
> 
> 
> Why doesn't the statement below work to get the max?
> 
> x._1 >= y._1? By the way, what are the values of x._1, x._2, y._1, and y._2 in this scenario?
> 
> Thanks; I look forward to your responses.
> 
> Regards,
> Asmath
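
To unpack what reduceByKey sees here, a minimal sketch built only from the sample rows above: 
each element is a (date, (id, revenue)) pair, and inside reduceByKey, x and y are two values 
that share the same key, so x._1 and y._1 are the ids while x._2 and y._2 are the revenues. 
Comparing ._1 would compare the ids (both 62962 in the sample) and says nothing about revenue, 
which is why the max has to be taken on ._2.

import org.apache.spark.{SparkConf, SparkContext}

// Minimal sketch reconstructing the tuple shape from the sample output:
// key = date string, value = (id, revenue).
object MaxRevenueSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("max-revenue-sketch"))

    val revenuePerDayPerCustomerMap = sc.parallelize(Seq(
      ("2013-12-27 00:00:00.0", (62962, 199.98)),
      ("2013-12-27 00:00:00.0", (62962, 299.98))))

    // x and y are two VALUES for the same key, i.e. two (id, revenue) pairs.
    // x._2 / y._2 are the revenues, so this keeps the pair with the higher
    // revenue for each date; x._1 / y._1 are the ids and would not do that.
    val maxPerDay = revenuePerDayPerCustomerMap
      .reduceByKey((x, y) => if (x._2 >= y._2) x else y)

    maxPerDay.collect().foreach(println)
    // (2013-12-27 00:00:00.0,(62962,299.98))
  }
}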




Re: Question on how to access tuple values in spark

2016-02-06 Thread mdkhajaasmath


Sent from my iPhone

> On Feb 6, 2016, at 4:41 PM, KhajaAsmath Mohammed wrote:
> 
> Hi,
> 
> My requirement is to find the maximum revenue per customer, so I am using the 
> query below. I got this solution from a tutorial I found through Google, but I am 
> not able to understand how it returns the maximum in this scenario. Can anyone help?
> 
> revenuePerDayPerCustomerMap.reduceByKey((x, y) => (if(x._2 >= y._2) x else y))
> 
>  ((2013-12-27 00:00:00.0),(62962,199.98))
>  ((2013-12-27 00:00:00.0),(62962,299.98))
> 
> 
> Why doesn't the statement below work to get the max?
> 
> x._1 >= y._1? By the way, what are the values of x._1, x._2, y._1, and y._2 in this scenario?
> 
> Thanks; I look forward to your responses.
> 
> Regards,
> Asmath
