There is no fully reliable way to save to Parquet from a DStream without risking downstream consistency issues. You can use foreachRDD to take each batch's RDD, convert it to a DataFrame/Dataset, and write it out as Parquet files, but sooner or later you will hit problems with partial files left behind by failed batches, duplicates on retries, and so on. Something along the lines of the sketch below should get you started.
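A minimal sketch, assuming `streamingContext`, `topics`, and `kafkaParams` are defined as in your snippet; the output path "/tmp/kafka-parquet" is just a placeholder. It uses SparkSession rather than the deprecated SQLContext, and the same caveat about partial files on failure still applies:

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaStream = KafkaUtils.createDirectStream[String, String](
  streamingContext,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

// keep only the message values
val lines = kafkaStream.map(record => record.value)

lines.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    // get a SparkSession from the RDD's SparkContext instead of SQLContext
    val spark = SparkSession.builder
      .config(rdd.sparkContext.getConf)
      .getOrCreate()
    import spark.implicits._

    // convert the batch to a single-column DataFrame and append it as Parquet;
    // each micro-batch adds new part files under the same path
    rdd.toDF("value")
      .write
      .mode("append")
      .parquet("/tmp/kafka-parquet")
  }
}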
On Wed, Feb 28, 2018 at 11:09 AM, karthikus <aswin8...@gmail.com> wrote:
> Hi all,
>
> I have Kafka stream data and I need to save the data in parquet format
> without using Structured Streaming (due to the lack of Kafka message
> header support).
>
> val kafkaStream = KafkaUtils.createDirectStream(
>   streamingContext,
>   LocationStrategies.PreferConsistent,
>   ConsumerStrategies.Subscribe[String, String](topics, kafkaParams)
> )
> // process the messages
> val messages = kafkaStream.map(record => (record.key, record.value))
> val lines = messages.map(_._2)
>
> Now, how do I save it as parquet? All the examples that I have come across
> use SQLContext, which is deprecated. Any help appreciated!
>
> --
> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org