Re: Sorting on a streaming dataframe

2018-05-01 Thread Hemant Bhanawat
ur own sink(s). >>>>> That is, just grabbing the parquet sink, etc. isn’t going to work out of >>>>> the box. Alternatively map/flatMapGroupsWithState is probably sufficient >>>>> and requires less working knowledge to make effective reuse of interna

Re: Sorting on a streaming dataframe

2018-04-30 Thread Michael Armbrust
ve reuse of internals. >>>> Just group by foo and then sort accordingly and assign ids. The id counter >>>> can be stateful per group. Sometimes this problem may not need to be solved >>>> at all. For example, if you are using Kafka, a proper partitioning scheme >>

Re: Sorting on a streaming dataframe

2018-04-27 Thread Hemant Bhanawat
rnals. Just >>> group by foo and then sort accordingly and assign ids. The id counter can >>> be stateful per group. Sometimes this problem may not need to be solved at >>> all. For example, if you are using Kafka, a proper partitioning scheme and >>> message offsets may be “good enough”. >>> -
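
The quoted suggestion (group by foo, sort within the group, assign ids from a counter that is stateful per group) can be illustrated with a minimal sketch in plain Python. This is not Spark code and does not use map/flatMapGroupsWithState; it only models the idea, with each call standing in for one micro-batch. The names `PerGroupIdAssigner` and `process_batch`, and the `(key, value)` record shape, are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (group key "foo", sortable value).
Record = Tuple[str, int]


class PerGroupIdAssigner:
    """Keeps one id counter per group across batches (the 'state')."""

    def __init__(self) -> None:
        # Next id to hand out for each group key; starts at 0.
        self._next_id: Dict[str, int] = defaultdict(int)

    def process_batch(self, batch: Iterable[Record]) -> List[Tuple[str, int, int]]:
        # Group the batch by key.
        grouped: Dict[str, List[int]] = defaultdict(list)
        for key, value in batch:
            grouped[key].append(value)

        out: List[Tuple[str, int, int]] = []
        for key in sorted(grouped):
            # Sort only within the group, then assign consecutive ids
            # continuing from where the previous batch left off.
            for value in sorted(grouped[key]):
                out.append((key, value, self._next_id[key]))
                self._next_id[key] += 1
        return out


assigner = PerGroupIdAssigner()
first = assigner.process_batch([("a", 3), ("b", 1), ("a", 1)])
second = assigner.process_batch([("a", 2)])
```

Ordering here is only guaranteed within a single batch and group; ids stay unique and increasing per group across batches, which is the property the quoted message relies on.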

Re: Sorting on a streaming dataframe

2018-04-26 Thread Michael Armbrust
> -------------- >> *From:* Hemant Bhanawat >> *Sent:* Thursday, April 12, 2018 11:42:59 PM >> *To:* Reynold Xin >> *Cc:* dev >> *Subject:* Re: Sorting on a streaming dataframe >> >> Well, we want to assign snapshot ids (incrementing counters

Re: Sorting on a streaming dataframe

2018-04-24 Thread Chayapan Khannabha
titioning scheme and message > offsets may be “good enough”. > From: Hemant Bhanawat mailto:hemant9...@gmail.com>> > Sent: Thursday, April 12, 2018 11:42:59 PM > To: Reynold Xin > Cc: dev > Subject: Re: Sorting on a streaming dataframe > > Well, we want to assign snap

Re: Sorting on a streaming dataframe

2018-04-24 Thread Arun Mahadevan
). Thanks, Arun From: Hemant Bhanawat Date: Tuesday, April 24, 2018 at 12:18 AM To: "Bowden, Chris" Cc: Reynold Xin , dev Subject: Re: Sorting on a streaming dataframe Thanks Chris. There are many ways in which I can solve this problem but they are cumbersome. The easiest way would

Re: Sorting on a streaming dataframe

2018-04-24 Thread Hemant Bhanawat
> all. For example, if you are using Kafka, a proper partitioning scheme and > message offsets may be “good enough”. > -- > *From:* Hemant Bhanawat > *Sent:* Thursday, April 12, 2018 11:42:59 PM > *To:* Reynold Xin > *Cc:* dev > *Subject:* Re: Sorti

Re: Sorting on a streaming dataframe

2018-04-12 Thread Hemant Bhanawat
Well, we want to assign snapshot ids (incrementing counters) to the incoming records. For that, we are zipping the streaming RDDs with that counter using a modified version of ZippedWithIndexRDD. We are OK if the records in the streaming dataframe get counters in random order but the counter shoul
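
As a rough illustration of zipping each batch with a running counter, here is a minimal sketch in plain Python. It is not the modified ZippedWithIndexRDD mentioned above and makes no claim about how that class works; `SnapshotIdCounter` and `zip_batch` are hypothetical names, and the sketch assumes a single process holding the counter.

```python
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


class SnapshotIdCounter:
    """Hands out incrementing ids that continue across batches."""

    def __init__(self, start: int = 0) -> None:
        self.next_id = start

    def zip_batch(self, batch: Iterable[T]) -> List[Tuple[T, int]]:
        records = list(batch)
        # Pair each record with an id offset by the running counter.
        zipped = [(record, self.next_id + i) for i, record in enumerate(records)]
        # Advance the counter so the next batch starts after this one.
        self.next_id += len(records)
        return zipped


counter = SnapshotIdCounter()
batch_one = counter.zip_batch(["x", "y"])
batch_two = counter.zip_batch(["z"])
```

Within a batch the ids follow whatever order the records arrive in, which matches the stated tolerance for counters being assigned in random order.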

Re: Sorting on a streaming dataframe

2018-04-12 Thread Reynold Xin
Can you describe your use case more? On Thu, Apr 12, 2018 at 11:12 PM Hemant Bhanawat wrote: > Hi Guys, > > Why is sorting on streaming dataframes not supported (unless it is complete > mode)? My downstream needs me to sort the streaming dataframe. > > Hemant >

Sorting on a streaming dataframe

2018-04-12 Thread Hemant Bhanawat
Hi Guys, Why is sorting on streaming dataframes not supported (unless it is complete mode)? My downstream needs me to sort the streaming dataframe. Hemant