Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-07 Thread Sameer Sayyed
hello, Code: ZkState zkState = new ZkState(kafkaConfig); DynamicBrokersReader kafkaBrokerReader = new DynamicBrokersReader(kafkaConfig, zkState); int partionCount = kafkaBrokerReader.getNumPartitions(); SparkConf _sparkConf = new SparkConf().setAppName("KafkaReceiver"); final JavaStreamingContex

Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-06 Thread Tathagata Das
You can use SparkSQL for that very easily. You can convert the RDDs you get from the Kafka input stream to RDDs of case classes and save them as Parquet files. More information here: https://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files On Wed, Aug 6, 2014 at 5:23 A

Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-06 Thread Mahebub Sayyed
Hello, I have referred to the link "https://github.com/dibbhatt/kafka-spark-consumer" and I have successfully consumed tuples from Kafka. The tuples are JSON objects and I want to store those objects in HDFS in Parquet format. Please suggest a sample example for that. Thanks in advance. On Tue, Aug

RE: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread Shao, Saisai
: DStream.foreach { rdd => rdd.saveAsHadoopFile(…) } to specify the OutputFormat you want. Thanks Jerry From: rafeeq s [mailto:rafeeq.ec...@gmail.com] Sent: Tuesday, August 05, 2014 5:37 PM To: Dibyendu Bhattacharya Cc: u...@spark.incubator.apache.org Subject: Re: Spark stream data from kafka topics

Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread rafeeq s
Thanks Dibyendu. 1. Spark itself has an API jar for Kafka; do we still require manual offset management (using the simple consumer concept) and a manual consumer? 2. The Kafka Spark Consumer is implemented against Kafka 0.8.0; can we use it with Kafka 0.8.1? 3. How to use Kafka Spark Consumer to produce output

Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread Dibyendu Bhattacharya
You can try this Kafka Spark Consumer, which I recently wrote. It uses the Low Level Kafka Consumer: https://github.com/dibbhatt/kafka-spark-consumer Dibyendu On Tue, Aug 5, 2014 at 12:52 PM, rafeeq s wrote: > Hi, > > I am new to Apache Spark and trying to develop a Spark Streaming program to