Re: Latest Release of Receiver-based Kafka Consumer for Spark Streaming
Hi Dibyendu,

Looks like it is available in 2.0, but we are using an older version, Spark 1.5. Could you please let me know how to use this with older versions?

Thanks,
Asmath

Sent from my iPhone

> On Aug 25, 2016, at 6:33 AM, Dibyendu Bhattacharya wrote:
>
> Hi,
>
> Released the latest version of the Receiver-based Kafka Consumer for Spark
> Streaming.
>
> The receiver is compatible with Kafka versions 0.8.x, 0.9.x and 0.10.x, and
> all Spark versions.
>
> Available at Spark Packages:
> https://spark-packages.org/package/dibbhatt/kafka-spark-consumer
>
> Also at GitHub: https://github.com/dibbhatt/kafka-spark-consumer
>
> Salient features:
>
> - End-to-end no data loss, without a Write Ahead Log
> - ZK-based offset management for both consumed and processed offsets
> - No dependency on WAL and checkpointing
> - In-built PID controller for rate limiting and backpressure management
> - Custom message interceptor
>
> Please refer to
> https://github.com/dibbhatt/kafka-spark-consumer/blob/master/README.md
> for more details.
>
> Regards,
> Dibyendu
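[Editor's note: for context, below is a minimal sketch of how a receiver-based consumer of this kind is typically launched from Spark Streaming. The package, class, and property names are assumptions recalled from the project's README, not verified API; treat the linked README as authoritative.]

import java.util.Properties

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumed import per the project's README; verify against the current README.
import consumer.kafka.ReceiverLauncher

object KafkaReceiverSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("kafka-receiver"), Seconds(30))

    // Offsets are tracked in ZooKeeper, so no WAL or checkpoint directory is
    // needed for offset recovery. Hosts, topic, and consumer id are placeholders.
    val props = new Properties()
    props.put("zookeeper.hosts", "zk-host")
    props.put("zookeeper.port", "2181")
    props.put("kafka.topic", "my-topic")
    props.put("kafka.consumer.id", "my-consumer-id")

    val numberOfReceivers = 3
    val stream = ReceiverLauncher.launch(ssc, props, numberOfReceivers, StorageLevel.MEMORY_ONLY)
    stream.foreachRDD(rdd => println(s"Received ${rdd.count()} messages"))

    ssc.start()
    ssc.awaitTermination()
  }
}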
Suggestion needed on FileInput path - Spark Streaming
What is the best practice for processing files from an S3 bucket in Spark file streaming? I keep getting files in an S3 path and have to process them in batches, but while a batch is processing, other files might come in. In this streaming job, do I have to move files to another location at the end of each streaming batch, or is there another way to do it?

Also, say the batch interval is 15 minutes and the current batch takes more than 15 minutes: does the next batch get started irrespective of the other batch still being processed? Is there a way I can hold the next batch while another batch is under processing?

Thanks,
Asmath
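[Editor's note: for reference, a minimal sketch of the kind of job described above, with a placeholder bucket path and the 15-minute interval from the question. Note that textFileStream only picks up files that appear in the directory after the job starts, and that Spark Streaming runs one batch at a time by default (spark.streaming.concurrentJobs defaults to 1), so an overrunning batch causes the next one to queue rather than run concurrently.]

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}

object S3FileStreamSketch {
  def main(args: Array[String]): Unit = {
    // 15-minute batch interval, as in the question.
    val ssc = new StreamingContext(new SparkConf().setAppName("s3-file-streaming"), Minutes(15))

    // textFileStream monitors the directory and processes only files created
    // after the job starts, so already-processed files are not re-read even
    // if they are left in place. The bucket path is a placeholder.
    val lines = ssc.textFileStream("s3a://my-bucket/incoming/")

    lines.foreachRDD { (rdd, time) =>
      println(s"Batch at $time: ${rdd.count()} lines")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}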
Re: [ASK]: DataFrame number of columns limit in Spark 1.5.2
I am also looking for the same information. In my case I need to create 190 columns.

Sent from my iPhone

> On Apr 12, 2016, at 9:49 PM, Divya Gehlot wrote:
>
> Hi,
> I would like to know: does the Spark DataFrame API have a limit on the
> number of columns that can be created?
>
> Thanks,
> Divya
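[Editor's note: for illustration, a minimal sketch (Spark 1.x SQLContext API, placeholder data) that builds a 190-column DataFrame by constructing the schema programmatically with StructType. The DataFrame API itself has no small fixed column limit; the well-known 22-field restriction applies to case classes in Scala 2.10, not to DataFrames.]

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

object WideDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("wide-df"))
    val sqlContext = new SQLContext(sc) // Spark 1.x entry point

    // Build a 190-column schema programmatically instead of via a case class.
    val numCols = 190
    val schema = StructType((1 to numCols).map(i => StructField(s"col_$i", DoubleType)))

    // Placeholder rows of zeros, just to show the shape.
    val rows = sc.parallelize(Seq.fill(10)(Row.fromSeq(Seq.fill(numCols)(0.0))))
    val df = sqlContext.createDataFrame(rows, schema)

    println(df.columns.length) // 190
  }
}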
DataFrames or Spark SQL for partitioned tables
Hi,

I am new to Spark and trying to implement a solution without using Hive. We are migrating to a new environment where Hive is not present; instead I need to use Spark to output the files. I looked at case classes, and the maximum number of fields I can use is 22, but I have 180 columns. In this scenario, what is the best approach for using Spark SQL or DataFrames without Hive?

Thanks,
Azmath

Sent from my iPhone
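[Editor's note: a hedged sketch of one common workaround, assuming delimited text input; paths, the delimiter, and column names are placeholders. The 180-column schema is defined with StructType instead of a case class, sidestepping the 22-field limit of Scala 2.10 case classes, and the partitioned output is written directly with the DataFrame writer, no Hive metastore required.]

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object PartitionedOutputSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partitioned-output"))
    val sqlContext = new SQLContext(sc)

    // Define all 180 columns via StructType instead of a case class.
    val schema = StructType((1 to 180).map(i => StructField(s"c$i", StringType)))

    // Placeholder input: pipe-delimited text, assumed to have 180 fields per line.
    val rows = sc.textFile("/path/to/input")
      .map(line => Row.fromSeq(line.split('|').toSeq))

    val df = sqlContext.createDataFrame(rows, schema)

    // Write partitioned Parquet files directly, with no Hive involved.
    df.write.partitionBy("c1").parquet("/path/to/output")
  }
}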
Fwd: Question on how to access tuple values in Spark
> Hi,
>
> My requirement is to find the max revenue per customer, so I am using the
> query below. I got this solution from a tutorial I found on Google, but I
> am not able to understand how it returns the max in this scenario. Can
> anyone help?
>
> revenuePerDayPerCustomerMap.reduceByKey((x, y) => if (x._2 >= y._2) x else y)
>
> ((2013-12-27 00:00:00.0),(62962,199.98))
> ((2013-12-27 00:00:00.0),(62962,299.98))
>
> Why doesn't the statement below work to get the max?
>
> x._1 >= y._1
>
> By the way, what are the values of x._1, x._2, y._1, and y._2 in this
> scenario?
>
> Thanks, and waiting for your responses.
>
> Regards,
> Asmath
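[Editor's note: for illustration, a self-contained sketch of what the reduce is doing, with sample data invented to match the tuples shown in the question. In reduceByKey, x and y are two VALUES sharing the same key (here, the day), so each is an (id, revenue) pair: x._1 is the id and x._2 is the revenue. Comparing x._2 keeps the pair with the higher revenue; comparing x._1 would compare ids instead, which is why it does not give the max revenue.]

import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyMaxSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("reduce-max").setMaster("local[*]"))

    // Invented sample matching the shape in the question:
    // key = day, value = (id, revenue).
    val revenuePerDayPerCustomerMap = sc.parallelize(Seq(
      ("2013-12-27 00:00:00.0", (62962, 199.98)),
      ("2013-12-27 00:00:00.0", (62962, 299.98))
    ))

    // x and y are each an (id, revenue) tuple; keep the one with the
    // higher revenue (x._2), not the higher id (x._1).
    val maxPerDay = revenuePerDayPerCustomerMap
      .reduceByKey((x, y) => if (x._2 >= y._2) x else y)

    maxPerDay.collect().foreach(println)
    // (2013-12-27 00:00:00.0,(62962,299.98))
  }
}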