Hi Rafeeq,
I think the current Spark Streaming API can already fetch data from Kafka and
store it in another external store. If you do not need to manage consumer
offsets manually, there's no need to use a low-level API such as
SimpleConsumer.
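A minimal sketch of the receiver-based approach described above, assuming Spark 1.x with the spark-streaming-kafka artifact on the classpath. The object name KafkaReceiverSketch, the helper topicThreads, the ZooKeeper address, the group id "parquet-writer" and the topic "events" are all hypothetical placeholders. With KafkaUtils.createStream, Kafka's high-level consumer keeps the group's offsets in ZooKeeper, so no SimpleConsumer code is involved.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaReceiverSketch {
  // Hypothetical helper: turns "t1,t2" into the topic -> receiver-thread map
  // that KafkaUtils.createStream expects.
  def topicThreads(spec: String, threadsPerTopic: Int): Map[String, Int] =
    spec.split(",").map(_.trim).filter(_.nonEmpty).map(t => t -> threadsPerTopic).toMap

  def main(args: Array[String]): Unit = {
    // Hypothetical connection settings; replace with your own.
    val zkQuorum = "localhost:2181"
    val groupId  = "parquet-writer"
    val topics   = topicThreads("events", 1)

    val conf = new SparkConf().setAppName("KafkaReceiverSketch")
    val ssc  = new StreamingContext(conf, Seconds(10))

    // High-level (ZooKeeper-based) consumer: offsets for groupId are tracked
    // in ZooKeeper by Kafka itself, not by application code.
    val messages = KafkaUtils.createStream(
      ssc, zkQuorum, groupId, topics, StorageLevel.MEMORY_AND_DISK_SER_2)

    // Each element is a (key, message) pair of Strings; count payloads per batch.
    messages.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The same `messages` stream of (key, message) pairs is what would be handed to an output step such as writing Parquet files.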
For Kafka 0.8.1 compatibility, you can try changing the Kafka version in the
pom file and rebuilding Spark; I think it will most likely work.
For Parquet files: if Parquet provides its own OutputFormat that extends
Hadoop's OutputFormat, Spark can write data to Parquet files the same way it
writes sequence files or text files. You can do this with
dstream.foreachRDD { rdd => rdd.saveAsHadoopFile(…) }, specifying the
OutputFormat you want (use saveAsNewAPIHadoopFile(…) instead if the
OutputFormat is written against the newer org.apache.hadoop.mapreduce API).
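A minimal sketch of the OutputFormat route, assuming Spark 1.x, Avro 1.7 and parquet-avro 1.4/1.5 (package `parquet.avro`; later parquet-mr releases moved it to `org.apache.parquet.avro`). AvroParquetOutputFormat is used here as one concrete Parquet OutputFormat; it is a new-API (mapreduce) format keyed by Void, so the sketch calls saveAsNewAPIHadoopFile. The names ParquetSinkSketch, SchemaJson, batchDir and saveAsParquet, the Avro schema, and the one-directory-per-batch layout are hypothetical.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x
import org.apache.spark.streaming.dstream.DStream
import parquet.avro.AvroParquetOutputFormat // org.apache.parquet.avro in parquet-mr 1.7+

object ParquetSinkSketch {
  // Hypothetical Avro schema for one Kafka message; the key may be null.
  val SchemaJson: String =
    """{"type":"record","name":"KafkaMessage","fields":[
      |{"name":"key","type":["null","string"]},
      |{"name":"value","type":"string"}]}""".stripMargin

  // Hypothetical layout: one output directory per batch.
  def batchDir(base: String, batchTimeMs: Long): String = s"$base/batch-$batchTimeMs"

  def saveAsParquet(messages: DStream[(String, String)], baseDir: String): Unit = {
    val schemaJson = SchemaJson // local copy so task closures capture only a String
    messages.foreachRDD { (rdd, time) =>
      if (rdd.take(1).nonEmpty) { // skip empty batches
        // AvroParquetOutputFormat reads the Avro schema from the job configuration.
        val job = new Job(rdd.sparkContext.hadoopConfiguration)
        AvroParquetOutputFormat.setSchema(job, new Schema.Parser().parse(schemaJson))

        // Avro schemas/records are not Serializable in Avro 1.7, so they are
        // built per partition on the executors.
        val records = rdd.mapPartitions { iter =>
          val schema = new Schema.Parser().parse(schemaJson)
          iter.map { case (k, v) =>
            val rec = new GenericData.Record(schema)
            rec.put("key", k)
            rec.put("value", v)
            (null.asInstanceOf[Void], rec: GenericRecord)
          }
        }

        // Parquet's OutputFormat expects Void keys and record values.
        records.saveAsNewAPIHadoopFile(
          batchDir(baseDir, time.milliseconds),
          classOf[Void],
          classOf[GenericRecord],
          classOf[AvroParquetOutputFormat],
          job.getConfiguration)
      }
    }
  }
}
```

Writing one directory per batch keeps each saveAsNewAPIHadoopFile call from colliding with the previous batch's output; compacting many small batch directories afterwards would be a separate job.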
Thanks
Jerry
From: rafeeq s [mailto:[email protected]]
Sent: Tuesday, August 05, 2014 5:37 PM
To: Dibyendu Bhattacharya
Cc: [email protected]
Subject: Re: Spark stream data from kafka topics and output as parquet file on HDFS
Thanks, Dibyendu.
1. Spark itself has an API jar for Kafka. Do we still need manual offset
management (using the SimpleConsumer concept) and a manual consumer?
2. The Kafka Spark Consumer is implemented against Kafka 0.8.0. Can we use it
with Kafka 0.8.1?
3. How can we use the Kafka Spark Consumer to produce output as Parquet files
on HDFS?
Please give your suggestions.
Regards,
Rafeeq S
(“What you do is what matters, not what you think or say or plan.”)
On Tue, Aug 5, 2014 at 11:55 AM, Dibyendu Bhattacharya
<[email protected]> wrote:
You can try this Kafka Spark Consumer, which I recently wrote. It uses the
low-level Kafka consumer API:
https://github.com/dibbhatt/kafka-spark-consumer
Dibyendu
On Tue, Aug 5, 2014 at 12:52 PM, rafeeq s
<[email protected]> wrote:
Hi,
I am new to Apache Spark and am trying to develop a Spark Streaming program
that streams data from Kafka topics and writes the output as Parquet files on
HDFS.
Please share a sample reference program that streams data from Kafka topics and
writes the output as Parquet files on HDFS.
Thanks in advance.
Regards,
Rafeeq S