Re: Add Sorting Class?

2016-05-27 Thread Robert Bradshaw
Totally agree that orderings of values within a key[-window[-pane]]-grouping are quite useful, and they make total sense in the model (primarily because elements themselves are never partitioned). On Fri, May 27, 2016 at 11:31 AM, Bobby Evans wrote: > If you have

Re: Add Sorting Class?

2016-05-26 Thread Jesse Anderson
Another perspective is to look at other projects in the Hadoop ecosystem. Impala had to have a LIMIT any time you did an ORDER BY. They've since removed this limitation. Hive has two sorting options. ORDER BY does a global order. SORT BY orders everything in that partition. On Thu, May 26, 2016

Re: Add Sorting Class?

2016-05-26 Thread Jesse Anderson
I had a similar thought, but wasn't sure if that violated a tenet of Beam. I'm thinking an ordered sink could wrap around another sink. I could see something like: collection.apply(OrderedSink.Timestamp.write(TextIO.Write.to(...))); On Thu, May 26, 2016 at 12:26 PM Robert Bradshaw

Re: Add Sorting Class?

2016-05-26 Thread Robert Bradshaw
As Frances alluded to, it's also really hard to reconcile the notion of a globally ordered PCollection in the context of a streaming pipeline. Sorting also imposes conditions on partitioning, which we intentionally leave unspecified for maximum flexibility in the runtime. One also gets into the

Re: Add Sorting Class?

2016-05-26 Thread Jesse Anderson
@frances great analysis. I'm hoping this serves as the starting point for the discussion. It really comes down to: is this a nice-to-have or a show-stopping requirement? As you mention, it comes down to the use case. I've taught at large financial companies where (global) sorting was a real and

Add Sorting Class?

2016-05-26 Thread Jesse Anderson
This is somewhat the continuation of my thread "Writing Out List." Right now, the only way to do sorting is with the Top class. This works well, but has the constraint of fitting in memory. A common batch use case is to take a large file and sort it. For example, this would be sorting a large