Hi Andrew,

The use case I have in mind is batch data serialization to HDFS, where sizing output files to a given HDFS block size is desirable. In my particular use case, I want to process 10GB batches of data at a time. I wasn't sure this was a sensible use case for Spark Streaming, so I tried to test it. However, I had trouble getting it working, and in the end I decided it was more trouble than it was worth. So I split my task in two: a streaming job that writes small, time-defined batches of data, and a traditional Spark job that aggregates the smaller files into a larger whole. In retrospect, I think this is the right way to go, even if a count-based window specification were possible. Therefore, I can't offer my use case as motivation for a count-based window size.
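For anyone curious, here's a minimal sketch of the two-job approach in Scala, using standard Spark and Spark Streaming APIs. The socket source, HDFS paths, batch interval, and partition count below are illustrative placeholders, not my actual configuration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Job 1: streaming job that writes small, time-defined batches.
object SmallBatchWriter {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SmallBatchWriter")
    val ssc = new StreamingContext(conf, Seconds(60)) // one output dir per minute
    val lines = ssc.socketTextStream("localhost", 9999) // placeholder source
    lines.saveAsTextFiles("hdfs:///data/incoming/batch") // many small files
    ssc.start()
    ssc.awaitTermination()
  }
}

// Job 2: periodic batch job that compacts the small files into
// larger ones sized roughly to the HDFS block size.
object FileCompactor {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FileCompactor"))
    val small = sc.textFile("hdfs:///data/incoming/batch-*") // glob over the small batches
    // The partition count controls output file size: 80 partitions
    // over ~10GB yields ~128MB files, a common HDFS block size.
    small.coalesce(80).saveAsTextFile("hdfs:///data/compacted/run1")
    sc.stop()
  }
}

The point is that file sizing moves out of the streaming job entirely: the compaction job controls it directly through the number of output partitions.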
Cheers,
Michael

On Oct 5, 2014, at 4:03 PM, Andrew Ash <and...@andrewash.com> wrote:

> Hi Michael,
>
> I couldn't find anything in Jira for it --
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20%22window%22%20AND%20component%20%3D%20Streaming
>
> Could you or Adrian please file a Jira ticket explaining the functionality
> and maybe a proposed API? This will help people interested in count-based
> windowing to understand the state of the feature in Spark Streaming.
>
> Thanks!
> Andrew
>
> On Fri, Oct 3, 2014 at 4:09 PM, Michael Allman <mich...@videoamp.com> wrote:
>
> Hi,
>
> I also have a use for count-based windowing. I'd like to process data
> batches by size as opposed to time. Is this feature on the development
> roadmap? Is there a JIRA ticket for it?
>
> Thank you,
>
> Michael