Re: Structured Stream in Spark

2017-10-27 Thread KhajaAsmath Mohammed
Yes, I checked both the output location and the console; neither has any data. The link below also has the code and the question I raised with Azure HDInsight. https://github.com/Azure/spark-eventhubs/issues/195

Re: Structured Stream in Spark

2017-10-27 Thread Shixiong(Ryan) Zhu
The code in the link writes the data into files. Did you check the output location? By the way, if you want to see the data on the console, you can use the console sink by changing the line *format("parquet").option("path", outputPath + "/ETL").partitionBy("creationTime").start()* to *format("console").start()*.
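For reference, a minimal sketch of the two sinks being contrasted here, assuming a streaming DataFrame named streamingDF and the same outputPath variable as the linked code (both names are placeholders, not taken from the example itself):

// Original plan: write Parquet files partitioned by creationTime.
val fileQuery = streamingDF.writeStream
  .format("parquet")
  .option("path", outputPath + "/ETL")
  .option("checkpointLocation", outputPath + "/checkpoints/etl") // file sinks require a checkpoint location
  .partitionBy("creationTime")
  .start()

// Debugging alternative: print each micro-batch to the console instead.
val consoleQuery = streamingDF.writeStream
  .format("console")
  .option("truncate", "false") // show full column values
  .start()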

Re: Structured Stream in Spark

2017-10-27 Thread KhajaAsmath Mohammed
Hi Tathagata Das, I was trying to use Event Hubs with Spark streaming. It looks like I was able to make the connection successfully, but I cannot see any data on the console. I am not sure whether Event Hubs is supported or not. https://github.com/Azure/spark-eventhubs/blob/master/examples/src/main/scala/com/microsoft/sp

Re: Structured Stream in Spark

2017-10-26 Thread KhajaAsmath Mohammed
Thanks, TD.

Re: Structured Stream in Spark

2017-10-25 Thread Tathagata Das
Please do not confuse old Spark Streaming (DStreams) with Structured Streaming. Structured Streaming's offset and checkpoint management is far more robust than that of DStreams. Take a look at my talk: https://spark-summit.org/2017/speakers/tathagata-das/

Re: Structured Stream in Spark

2017-10-25 Thread KhajaAsmath Mohammed
Thanks, Subhash. Have you ever used the zero-data-loss approach with streaming? I am a bit worried about using streaming when it comes to data loss. https://blog.cloudera.com/blog/2017/06/offset-management-for-apache-kafka-with-apache-spark-streaming/ Does Structured Streaming handle this internally?

Re: Structured Stream in Spark

2017-10-25 Thread Subhash Sriram
No problem! Take a look at this: http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#recovering-from-failures-with-checkpointing Thanks, Subhash
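To make the checkpoint part concrete for the question below, a minimal sketch (the df name and both paths are placeholders): in Structured Streaming the checkpoint directory is set per query, as a writeStream option.

val query = df.writeStream
  .format("parquet")
  .option("path", "/data/output/events")                    // where the data files go
  .option("checkpointLocation", "/data/checkpoints/events") // offsets and state are tracked here
  .start()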

Re: Structured Stream in Spark

2017-10-25 Thread KhajaAsmath Mohammed
Hi Sriram, Thanks. This is what I was looking for. One question: where do we need to specify the checkpoint directory in the case of Structured Streaming? Thanks, Asmath

Re: Structured Stream in Spark

2017-10-25 Thread Subhash Sriram
Hi Asmath, Here is an example of using structured streaming to read from Kafka: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredKafkaWordCount.scala In terms of parsing the JSON, there is a from_json function that you can use.
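A rough sketch of that combination (the broker address, topic name, and JSON schema below are made-up placeholders, not taken from the linked example):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("KafkaJsonParse").getOrCreate()
import spark.implicits._

// Schema of the JSON payload carried in the Kafka value field.
val schema = new StructType()
  .add("id", StringType)
  .add("creationTime", TimestampType)
  .add("value", DoubleType)

// Kafka records arrive as binary key/value; cast the value to a string,
// then parse it with from_json and flatten the struct into columns.
val parsed = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json($"json", schema).as("data"))
  .select("data.*")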

Structured Stream in Spark

2017-10-25 Thread KhajaAsmath Mohammed
Hi, Could anyone provide suggestions on how to parse JSON data from Kafka and load it back into Hive? I have read about Structured Streaming but didn't find any examples. Is there any best practice on how to read and parse the data with Structured Streaming for this use case? Thanks, Asmath
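One hedged sketch of the Hive side of this question, reusing the parsed stream from the Kafka example above: at the time (Spark 2.x), a streaming query could not write directly into a Hive managed table, so a common pattern was to write Parquet files to a location covered by an external Hive table. All paths and names below are placeholders.

import org.apache.spark.sql.streaming.Trigger

// Write the parsed stream as Parquet under a path that an external Hive table points at.
val query = parsed.writeStream
  .format("parquet")
  .option("path", "/warehouse/external/events")
  .option("checkpointLocation", "/checkpoints/events_to_hive")
  .partitionBy("creationTime")
  .trigger(Trigger.ProcessingTime("1 minute")) // how often new files are produced
  .start()

// The matching external table is created once, e.g.:
//   CREATE EXTERNAL TABLE events (...) PARTITIONED BY (creationTime timestamp)
//   STORED AS PARQUET LOCATION '/warehouse/external/events'
// New partition directories then need to be registered (e.g. MSCK REPAIR TABLE)
// before Hive queries can see them.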