Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-21 Thread Aakash Basu
> and compact data format if CSV isn't required. > > -- > *From:* Aakash Basu <aakash.spark@gmail.com> > *Sent:* Friday, March 16, 2018 9:12:39 AM > *To:* sagar grover > *Cc:* Bowden, Chris; Tathagata Das; Dylan Guedes; Georg Heiler; user; > jagrati.go...@myntra.com

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-16 Thread Aakash Basu
>>>> >>>> Cool! Shall try it and get back to you tomorrow. >>>> >>>> Thanks a ton! >>>> >>>> On 15-Mar-2018 11:50 PM, "Bowden, Chris" <chris.bow...@microfocus.com> >>>> wrote: >>>> >>>>>

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-16 Thread Aakash Basu
>> You got it right. I'm reading a *csv* file from local as mentioned >>>> above, with a console producer on the Kafka side. >>>> >>>> So, as it is CSV data with headers, shall I then use from_csv on the >>>> Spark side and provid

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-16 Thread Aakash Basu
>> offers from_csv out of the box as an expression (although CSV is well >>> supported as a data source). You could implement an expression by reusing a >>> lot of the supporting CSV classes which may result in a better user >>> experience vs. explicitly using split

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-16 Thread sagar grover
may result in a better user >> experience vs. explicitly using split and array indices, etc. In this >> simple example, casting the binary to a string just works because there is >> a common understanding of strings encoded as bytes between Spark and Kafka >> by default. >>
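To illustrate the point about casting the binary value to a string and then picking fields by split and array index, here is a minimal plain-Python sketch (it is not Spark code). It assumes a Kafka record value arrives as UTF-8 bytes holding one CSV row; the column names `id`, `name`, `age` and both helper names are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical column names; the thread does not list the real CSV header.
COLUMNS = ["id", "name", "age"]


def parse_by_split(value: bytes) -> dict:
    """Decode the raw bytes as UTF-8, split on commas, and pick fields by index."""
    fields = value.decode("utf-8").rstrip("\r\n").split(",")
    return {name: fields[i] for i, name in enumerate(COLUMNS)}


def parse_with_csv(value: bytes) -> dict:
    """Same idea, but let the csv module handle quoted fields with embedded commas."""
    text = value.decode("utf-8")
    row = next(csv.reader(io.StringIO(text)))
    return dict(zip(COLUMNS, row))


if __name__ == "__main__":
    print(parse_by_split(b"1,alice,30"))
    print(parse_with_csv(b'2,"Basu, Aakash",28'))
```

The naive split mis-handles a quoted field that itself contains a comma, which is one reason a CSV-aware parser may give a better experience than explicit split and index arithmetic.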

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
--- > *From:* Aakash Basu <aakash.spark@gmail.com> > *Sent:* Thursday, March 15, 2018 10:48:45 AM > *To:* Bowden, Chris > *Cc:* Tathagata Das; Dylan Guedes; Georg Heiler; user > *Subject:* Re: Multiple Kafka Spark Streaming Dataframe Join query > > Hey Chris, > >

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
com> Sent: Thursday, March 15, 2018 7:52:28 AM To: Tathagata Das Cc: Dylan Guedes; Georg Heiler; user Subject: Re: Multiple Kafka Spark Streaming Dataframe Join query Hi, And if I run this below piece of code - from pyspark.sql import SparkSession import time class test: spark

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Tathagata Das
> From: Aakash Basu <aakash.spark@gmail.com> > Sent: Thursday, March 15, 2018 7:52:28 AM > To: Tathagata Das > Cc: Dylan Guedes; Georg Heiler; user > Subject: Re: Multiple Kafka Spark Streaming Dataframe Join query > > Hi, > > And if I run this below piece of code - > >

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
Hi,
And if I run this below piece of code -
from pyspark.sql import SparkSession
import time
class test:
    spark = SparkSession.builder \
        .appName("DirectKafka_Spark_Stream_Stream_Join") \
        .getOrCreate()
    # ssc = StreamingContext(spark, 20)
    table1_stream =

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
Any help on the above? On Thu, Mar 15, 2018 at 3:53 PM, Aakash Basu wrote: > Hi, > > I progressed a bit on the above-mentioned topic - > > 1) I am feeding a CSV file into the Kafka topic. > 2) Reading the Kafka topic with readStream, as TD's article suggests. > 3) Then,

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-15 Thread Aakash Basu
Hi, I progressed a bit on the above-mentioned topic - 1) I am feeding a CSV file into the Kafka topic. 2) Reading the Kafka topic with readStream, as TD's article suggests. 3) Then, simply trying to do a show on the streaming dataframe, using queryName('XYZ') in the writeStream and writing a SQL

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Aakash Basu
Thanks to TD, the savior! Shall look into it. On Thu, Mar 15, 2018 at 1:04 AM, Tathagata Das wrote: > Relevant: https://databricks.com/blog/2018/03/13/introducing-stream-stream-joins-in-apache-spark-2-3.html > > This is true stream-stream join which will

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Tathagata Das
Relevant: https://databricks.com/blog/2018/03/13/introducing-stream-stream-joins-in-apache-spark-2-3.html This is a true stream-stream join, which automatically buffers delayed data and joins it appropriately with SQL join semantics. Please check it out :) TD On Wed, Mar 14, 2018 at 12:07
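As a rough mental model of "buffer delayed data and join with SQL join semantics", the following plain-Python toy keeps rows from both sides in per-key buffers and emits an inner-join match whenever a row arrives whose key is already buffered on the other side. It is not Spark's implementation: it has no watermarks and never cleans up state, and the class, method, and row names are invented for illustration.

```python
from collections import defaultdict


class ToyStreamJoin:
    """Toy inner join of two streams on a key; buffers every row it sees."""

    def __init__(self):
        # One buffer per side: key -> list of rows seen so far on that side.
        self.buffers = {"left": defaultdict(list), "right": defaultdict(list)}

    def on_row(self, side, key, row):
        """Buffer the row, then return joined (left_row, right_row) pairs."""
        other = "right" if side == "left" else "left"
        self.buffers[side][key].append(row)
        matches = self.buffers[other].get(key, [])
        if side == "left":
            return [(row, m) for m in matches]
        return [(m, row) for m in matches]


if __name__ == "__main__":
    join = ToyStreamJoin()
    print(join.on_row("left", 1, "order-1"))     # nothing buffered on the right yet
    print(join.on_row("right", 1, "payment-1"))  # a late row still finds its match
```

Because nothing is ever evicted, the buffers in this toy grow without bound; bounding that state is the role a watermark would play in a real streaming engine.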

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Dylan Guedes
I misread it, and thought that your question was whether PySpark supports Kafka lol. Sorry! On Wed, Mar 14, 2018 at 3:58 PM, Aakash Basu wrote: > Hey Dylan, > > Great! > > Can you reply to my initial mail and also the latest one? > > Thanks, > Aakash. > > On 15-Mar-2018

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Aakash Basu
Hey Dylan, Great! Can you reply to my initial mail and also the latest one? Thanks, Aakash. On 15-Mar-2018 12:27 AM, "Dylan Guedes" wrote: > Hi, > > I've been using Kafka with PySpark since 2.1. > > On Wed, Mar 14, 2018 at 3:49 PM, Aakash Basu

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Dylan Guedes
Hi, I've been using Kafka with PySpark since 2.1. On Wed, Mar 14, 2018 at 3:49 PM, Aakash Basu wrote: > Hi, > > I'm yet to. > > Just want to know, when will Spark 2.3 with the 0.10 Kafka Spark package > allow Python? I read somewhere that, as of now, Scala and Java are

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Aakash Basu
Hi, I'm yet to. Just want to know, when will Spark 2.3 with the 0.10 Kafka Spark package allow Python? I read somewhere that, as of now, Scala and Java are the languages to be used. Please correct me if I am wrong. Thanks, Aakash. On 14-Mar-2018 8:24 PM, "Georg Heiler" wrote:

Re: Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Georg Heiler
Did you try Spark 2.3 with Structured Streaming? There, watermarking and plain SQL might be really interesting for you. Aakash Basu wrote on Wed, 14 Mar 2018 at 14:57: > Hi, > > > > *Info (Using): Spark Streaming Kafka 0.8 package* > > *Spark 2.2.1* > *Kafka 1.0.1* >

Multiple Kafka Spark Streaming Dataframe Join query

2018-03-14 Thread Aakash Basu
Hi, *Info (Using): Spark Streaming Kafka 0.8 package* *Spark 2.2.1* *Kafka 1.0.1* As of now, I am feeding paragraphs into the Kafka console producer, and my Spark, which is acting as a receiver, is printing the flattened words, which is a complete RDD operation. *My motive is to read two tables