Re: Dynamic file-based sinks

2017-07-31 Thread Josh
That's great news, thanks Reuven! I will try this out soon.
On Sat, Jul 29, 2017 at 2:33 AM, Reuven Lax wrote:
> The AvroIO PR is now merged, so you can write to different destinations
> based on the value. It's available in head, and will be in Beam 2.2.0.
>
> On Wed,

Re: Dynamic file-based sinks

2017-07-28 Thread Reuven Lax
The AvroIO PR is now merged, so you can write to different destinations based on the value. It's available in head, and will be in Beam 2.2.0.
On Wed, Jul 26, 2017 at 10:00 AM, Reuven Lax wrote:
> Yes, there was! TextIO support is already merged into Beam (it missed the
> 2.1

Re: Dynamic file-based sinks

2017-07-26 Thread Reuven Lax
Yes, there was! TextIO support is already merged into Beam (it missed the 2.1 cutoff, so it will be in Beam 2.2.0). AvroIO support is in https://github.com/apache/beam/pull/3541. This is almost ready to merge - still waiting for final review from kennknowles on the Beam translation changes.

Re: Dynamic file-based sinks

2017-07-26 Thread Josh
Hi all,
Was there any progress on this recently? I am particularly interested in using value-dependent destinations in BigtableIO (writing to a specific table depending on the value) and AvroIO (writing to specific GCS buckets depending on the value).
Thanks,
Josh
On Fri, Jun 9, 2017 at 5:35

Re: Dynamic file-based sinks

2017-06-09 Thread Reuven Lax
I'm putting together a proof-of-concept PR for option 1 to see how it looks.
On Thu, Jun 8, 2017 at 4:07 PM, Reuven Lax wrote:
> After looking at everyone's comments, I think option 1 is the better
> approach - map destinations to a FilenamePolicy. It is a good parallel to

Re: Dynamic file-based sinks

2017-06-08 Thread Reuven Lax
After looking at everyone's comments, I think option 1 is the better approach - map destinations to a FilenamePolicy. It is a good parallel to what we do in BigQueryIO (the main difference is that we're mapping to a sharded filename, instead of a single destination like in BigQueryIO). The main
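As a rough illustration of option 1 as described above (plain Python, not Beam code, and not taken from the PR): each element is mapped to a destination, elements are grouped by that destination, and each destination is mapped to a filename policy that produces sharded filenames. The names `make_filename_policy` and `plan_writes`, the filename pattern, and the round-robin shard assignment are all hypothetical choices made for this sketch.

```python
from collections import defaultdict


def make_filename_policy(prefix):
    """Return a hypothetical policy: (shard, num_shards) -> sharded filename."""
    def policy(shard, num_shards):
        return f"{prefix}-{shard:05d}-of-{num_shards:05d}.txt"
    return policy


def plan_writes(elements, destination_fn, policy_fn, num_shards):
    """Group elements by destination, then assign each to a sharded file.

    destination_fn maps an element to its destination key.
    policy_fn maps a destination key to a filename policy.
    Shards are assigned round-robin (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for element in elements:
        grouped[destination_fn(element)].append(element)

    plan = defaultdict(list)
    for destination, values in grouped.items():
        policy = policy_fn(destination)
        for index, value in enumerate(values):
            plan[policy(index % num_shards, num_shards)].append(value)
    return dict(plan)


# Hypothetical input: the text before ":" plays the role of the destination.
events = ["us:login", "eu:login", "us:logout", "us:click"]
plan = plan_writes(
    events,
    destination_fn=lambda e: e.split(":")[0],
    policy_fn=lambda dest: make_filename_policy(f"out/{dest}/events"),
    num_shards=2,
)
for filename in sorted(plan):
    print(filename, plan[filename])
```

The point of the sketch is only the shape of the mapping: the destination selects a policy, and the policy (not the destination itself) decides the final sharded filename, which is the difference from a single-destination mapping noted above.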

Re: Dynamic file-based sinks

2017-05-24 Thread Eugene Kirpichov
Hmm, on the one hand this looks syntactically very appealing; on the other hand, it's icky to have a function return a PTransform at runtime, only to have some information be immediately extracted from that transform. Moreover, not all TextIO.Write transforms will be legal to return - e.g. most likely

Re: Dynamic file-based sinks

2017-05-24 Thread Reuven Lax
Did you see that I modified the second proposal so that users can map DestinationT to the actual PTransform (i.e. DestinationT->TextIO or DestinationT->AvroIO)? This means that users do not have to deal with FileBasedSink or even know it exists. I prefer the second approach for two reasons: 1. It

Re: Dynamic file-based sinks

2017-05-24 Thread Kenneth Knowles
I commented a little in the doc, but I want to reply on list because this is a really great feature. The two alternatives, as I understand them, both include mapping your elements to an intermediate DestinationT that you can group by before writing. Then the big picture decision is whether to map each