Hi Andrew,

The use case I have in mind is batch data serialization to HDFS, where sizing output files to a given HDFS block size is desirable. In my particular use case, I want to process 10GB batches of data at a time. I wasn't sure this was a sensible use case for Spark Streaming, so I tried to test it. However, I had trouble getting it working, and in the end I decided it was more trouble than it was worth. So I split my task in two: a streaming job that writes small, time-defined batches of data, and a traditional Spark job that aggregates the smaller files into a larger whole. In retrospect, I think this is the right way to go, even if a count-based window specification were possible. Therefore, I can't offer my use case as motivation for a count-based window size.
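For anyone curious, here's a minimal sketch of the two-job approach in Scala, using standard Spark and Spark Streaming APIs. The socket source, HDFS paths, batch interval, and partition count below are illustrative placeholders, not my actual configuration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Job 1: streaming job that writes small, time-defined batches.
object SmallBatchWriter {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SmallBatchWriter")
    val ssc = new StreamingContext(conf, Seconds(60)) // one output dir per minute
    val lines = ssc.socketTextStream("localhost", 9999) // placeholder source
    lines.saveAsTextFiles("hdfs:///data/incoming/batch") // many small files
    ssc.start()
    ssc.awaitTermination()
  }
}

// Job 2: periodic batch job that compacts the small files into
// larger ones sized roughly to the HDFS block size.
object FileCompactor {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("FileCompactor"))
    val small = sc.textFile("hdfs:///data/incoming/batch-*") // glob over the small batches
    // The partition count controls output file size: 80 partitions
    // over ~10GB yields ~128MB files, a common HDFS block size.
    small.coalesce(80).saveAsTextFile("hdfs:///data/compacted/run1")
    sc.stop()
  }
}

The point is that file sizing moves out of the streaming job entirely: the compaction job controls it directly through the number of output partitions.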
Cheers,
Michael

On Oct 5, 2014, at 4:03 PM, Andrew Ash <and...@andrewash.com> wrote:

> Hi Michael,
>
> I couldn't find anything in Jira for it --
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20text%20~%20%22window%22%20AND%20component%20%3D%20Streaming
>
> Could you or Adrian please file a Jira ticket explaining the functionality
> and maybe a proposed API? This will help people interested in count-based
> windowing to understand the state of the feature in Spark Streaming.
>
> Thanks!
> Andrew
>
> On Fri, Oct 3, 2014 at 4:09 PM, Michael Allman <mich...@videoamp.com> wrote:
>
> Hi,
>
> I also have a use for count-based windowing. I'd like to process data
> batches by size as opposed to time. Is this feature on the development
> roadmap? Is there a JIRA ticket for it?
>
> Thank you,
>
> Michael