Re: [Beginner] How to save Kafka Dstream data to parquet ?

Patrick Alwell Wed, 28 Feb 2018 12:18:06 -0800

I don’t think SQLContext is “deprecated” in this sense. It’s still available in
Spark 2.x for backward compatibility with earlier versions of Spark;
SparkSession is simply the newer entry point.

But yes, at first glance it looks like you are correct. I don’t see a
recordWriter method for parquet outside of the SQL package.
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.streaming.DataStreamWriter

Here is an example that uses SQLContext. I believe the SQL context is
necessary for strongly typed, self-describing, binary, columnar file formats
like Parquet.
https://community.hortonworks.com/articles/72941/writing-parquet-on-hdfs-using-spark-streaming.html
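
A minimal sketch of that approach, assuming a DStream[String] such as the
"lines" stream in the quoted message. The object name ParquetSink, the method
name saveAsParquet, and the outputPath parameter are hypothetical names used
for illustration only. It goes through SparkSession rather than constructing a
SQLContext directly:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: ParquetSink, saveAsParquet and outputPath are
// illustrative names, not part of any Spark API.
object ParquetSink {
  def saveAsParquet(lines: DStream[String], outputPath: String): Unit = {
    lines.foreachRDD { (rdd: RDD[String]) =>
      // Skip empty micro-batches so no empty Parquet files are written.
      if (!rdd.isEmpty()) {
        // Reuse (or lazily create) a SparkSession from the RDD's configuration.
        val spark = SparkSession.builder
          .config(rdd.sparkContext.getConf)
          .getOrCreate()
        import spark.implicits._

        // Convert the batch to a single-column DataFrame and append it.
        rdd.toDF("value")
          .write
          .mode(SaveMode.Append)
          .parquet(outputPath)
      }
    }
  }
}
```

Each micro-batch is appended as new Parquet files under outputPath, so short
batch intervals will likely produce many small files.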

Otherwise you’ll probably be looking at a custom writer.
https://parquet.apache.org/documentation/latest/

AFAIK, if you were to implement a custom writer, you still wouldn’t escape the
Parquet formatting problem that the DataFrame API solves. Spark needs a way to
map data types for Parquet conversion.
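
To illustrate what that type mapping looks like, here is a rough sketch that
declares the schema explicitly for (key, value) string pairs like the
"messages" stream in the quoted code. The object name KeyValueParquet, the
method name write, and the outputPath parameter are hypothetical, and the
SparkSession is assumed to be created elsewhere:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical helper: KeyValueParquet, write and outputPath are
// illustrative names, not part of any Spark API.
object KeyValueParquet {
  // Explicit schema: this is the type mapping Spark uses for the Parquet files.
  val schema: StructType = StructType(Seq(
    StructField("key", StringType, nullable = true),
    StructField("value", StringType, nullable = true)
  ))

  def write(spark: SparkSession,
            pairs: RDD[(String, String)],
            outputPath: String): Unit = {
    // Wrap each pair in a Row whose fields line up with the schema above.
    val rows: RDD[Row] = pairs.map { case (k, v) => Row(k, v) }
    spark.createDataFrame(rows, schema)
      .write
      .mode(SaveMode.Append)
      .parquet(outputPath)
  }
}
```

This would be called once per batch, for example from inside foreachRDD on the
pair stream.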

Hope this helps,

-Pat

On 2/28/18, 11:09 AM, "karthikus" <aswin8...@gmail.com> wrote:

Hi all,

I have Kafka stream data and I need to save it in Parquet format without
using Structured Streaming (due to its lack of Kafka message header
support).

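import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// streamingContext, topics and kafkaParams are defined elsewhere
val kafkaStream =
  KafkaUtils.createDirectStream(
    streamingContext,
    LocationStrategies.PreferConsistent,
    ConsumerStrategies.Subscribe[String, String](
      topics,
      kafkaParams
    )
  )
// process the messages: keep (key, value) pairs, then only the values
val messages = kafkaStream.map(record => (record.key, record.value))
val lines = messages.map(_._2)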

Now, how do I save it as Parquet? All the examples that I have come across
use SQLContext, which is deprecated. Any help appreciated!

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
