Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-06 Thread Mahebub Sayyed
Hello,

I followed https://github.com/dibbhatt/kafka-spark-consumer and
have successfully consumed tuples from Kafka.
The tuples are JSON objects, and I want to store those objects in HDFS in
Parquet format.

Could you please suggest a sample example for that?
Thanks in advance.





On Tue, Aug 5, 2014 at 11:55 AM, Dibyendu Bhattacharya 
dibyendu.bhattach...@gmail.com wrote:

 You can try this Kafka Spark Consumer, which I recently wrote. It uses
 the low-level Kafka consumer.

 https://github.com/dibbhatt/kafka-spark-consumer

 Dibyendu




 On Tue, Aug 5, 2014 at 12:52 PM, rafeeq s rafeeq.ec...@gmail.com wrote:

 Hi,

 I am new to Apache Spark and am trying to develop a Spark Streaming program
 to *stream data from Kafka topics and output it as Parquet files on HDFS*.

 Please share a *sample reference* program that streams data from Kafka
 topics and outputs Parquet files on HDFS.

 Thanks in Advance.

 Regards,

 Rafeeq S
 *(“What you do is what matters, not what you think or say or plan.” )*





-- 
*Regards,*
*Mahebub Sayyed*


Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-06 Thread Tathagata Das
You can do that very easily with Spark SQL. Take the RDDs you get from the
Kafka input stream, convert them to RDDs of case classes, and save them as
Parquet files.
More information here:
https://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
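
The approach above could be sketched roughly as follows. This is only an illustration, assuming the Spark 1.0/1.1-era API (SQLContext, the createSchemaRDD implicit, saveAsParquetFile) and the stock KafkaUtils.createStream receiver rather than the consumer from the GitHub link. The Event case class, its fields, the ZooKeeper address, the group id, the topic name and the output path are all made-up placeholders. Later Spark releases replaced SchemaRDD with DataFrame and saveAsParquetFile with write.parquet, so adjust for your version.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import scala.util.parsing.json.JSON

// Hypothetical record type; replace the fields with those of your JSON objects.
case class Event(id: String, value: Double)

object KafkaToParquet {
  // Turn one JSON string into an Event; records that do not match are dropped.
  // scala.util.parsing.json only keeps the sketch dependency-free; a real job
  // would probably use a dedicated JSON library instead.
  def parse(json: String): Option[Event] =
    JSON.parseFull(json) match {
      case Some(m: Map[_, _]) =>
        val fields = m.asInstanceOf[Map[String, Any]]
        (fields.get("id"), fields.get("value")) match {
          case (Some(id: String), Some(v: Double)) => Some(Event(id, v))
          case _ => None
        }
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToParquet")
    val ssc = new StreamingContext(conf, Seconds(30))
    val sqlContext = new SQLContext(ssc.sparkContext)
    import sqlContext.createSchemaRDD // implicit RDD[case class] => SchemaRDD

    // Placeholder ZooKeeper quorum, consumer group and topic-to-threads map.
    val messages = KafkaUtils.createStream(
      ssc, "zkhost:2181", "parquet-writer", Map("events" -> 1))

    // Each Kafka record is a (key, message) pair; only the message is used.
    messages.map(_._2).foreachRDD { (rdd, time) =>
      val events = rdd.flatMap(json => parse(json))
      // Write each batch to its own directory, named after the batch time.
      events.saveAsParquetFile(s"hdfs:///tmp/events/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Writing one directory per batch keeps every save independent, at the cost of many small files; a periodic compaction job may be needed if the batches are small.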





Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread rafeeq s
Hi,

I am new to Apache Spark and am trying to develop a Spark Streaming program
to *stream data from Kafka topics and output it as Parquet files on HDFS*.

Please share a *sample reference* program that streams data from Kafka
topics and outputs Parquet files on HDFS.

Thanks in Advance.

Regards,

Rafeeq S
*(“What you do is what matters, not what you think or say or plan.” )*


Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread Dibyendu Bhattacharya
You can try this Kafka Spark Consumer, which I recently wrote. It uses the
low-level Kafka consumer.

https://github.com/dibbhatt/kafka-spark-consumer

Dibyendu








Re: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread rafeeq s
Thanks Dibyendu.

1. Spark itself has an API jar for Kafka. Do we still need manual offset
management (using the SimpleConsumer concept) and a manual consumer?
2. Kafka Spark Consumer is implemented against Kafka 0.8.0. Can we use it
with Kafka 0.8.1?
3. How can we use Kafka Spark Consumer to produce output *as Parquet files on
HDFS*?

*Please give your suggestions.*

Regards,

Rafeeq S
*(“What you do is what matters, not what you think or say or plan.” )*








RE: Spark stream data from kafka topics and output as parquet file on HDFS

2014-08-05 Thread Shao, Saisai
Hi Rafeeq,

I think the current Spark Streaming API can already fetch data from Kafka and
store it in an external store. If you do not need to manage consumer offsets
manually, there’s no need to use a low-level API such as SimpleConsumer.

For Kafka 0.8.1 compatibility, you can try modifying the pom file and
rebuilding Spark; I think it will mostly work.

For Parquet files: if Parquet offers its own OutputFormat that extends
Hadoop’s OutputFormat, Spark can write data into Parquet files just as it does
for sequence files or text files. You can do this as:

DStream.foreach { rdd => rdd.saveAsHadoopFile(…) }

specifying the OutputFormat you want.
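
To make the pattern above concrete, here is a minimal sketch under Spark 1.x assumptions. It deliberately swaps in Hadoop's TextOutputFormat, because a Parquet output format also needs schema and write-support configuration that this thread does not cover (and, as far as I know, Parquet's main output format targets the newer mapreduce API, which would go through saveAsNewAPIHadoopFile instead). The object name, the saveEachBatch helper and the path layout are made up for illustration. It uses foreachRDD, the non-deprecated name for DStream.foreach.

```scala
import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.spark.SparkContext._ // pair-RDD implicits, needed before Spark 1.3
import org.apache.spark.streaming.dstream.DStream

object SaveBatches {
  // Write every batch of `stream` under `basePath` using a Hadoop OutputFormat.
  // TextOutputFormat stands in for whichever OutputFormat you actually want.
  def saveEachBatch(stream: DStream[String], basePath: String): Unit =
    stream.foreachRDD { (rdd, time) =>
      // saveAsHadoopFile works on key/value pairs, so wrap each line in a pair.
      rdd.map(line => (NullWritable.get(), new Text(line)))
        .saveAsHadoopFile(
          s"$basePath/batch-${time.milliseconds}", // one directory per batch
          classOf[NullWritable],
          classOf[Text],
          classOf[TextOutputFormat[NullWritable, Text]])
    }
}
```

Calling SaveBatches.saveEachBatch(lines, "hdfs:///tmp/out") before ssc.start() would register the output operation, where lines is whatever DStream[String] you built from Kafka.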

Thanks
Jerry
