Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Eugene Kirpichov
It is quite complicated. See https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java (in particular the expand() method). At a high level, it assigns a shard index to every element and then groups by destination and shard index (implicitly also b
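
The "assign a shard index, then group by destination and shard index" step described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in WriteFiles.java: the class `ShardGrouping`, the method `group`, the round-robin shard assignment, and the `destination#shard` key format are all hypothetical, and windows and panes are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardGrouping {
    // Hypothetical helper, not a Beam API.
    // Each element is a {destination, line} pair. Every element gets a shard
    // index (round-robin here, which is an assumption) and is then grouped
    // under the composite key "destination#shard".
    static Map<String, List<String>> group(List<String[]> elements, int numShards) {
        Map<String, List<String>> groups = new TreeMap<>();
        int next = 0;
        for (String[] element : elements) {
            int shard = next++ % numShards;
            String key = element[0] + "#" + shard;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(element[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> elements = Arrays.asList(
            new String[] {"a", "x"},
            new String[] {"a", "y"},
            new String[] {"b", "z"});
        // Each group would be written out as one file (one shard of one destination).
        group(elements, 2).forEach((key, lines) -> System.out.println(key + " -> " + lines));
    }
}
```

Each resulting group corresponds to one output file, which is why elements for the same destination may end up in several files.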

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Carlos Alonso
Cool thanks! How does it work internally? Are all the elements routed to the same path grouped and processed within the same bundle? Thanks!

On Tue, Feb 13, 2018 at 9:03 PM Eugene Kirpichov wrote:
> It will do its best to throw an exception if duplicate names are produced
> within one pane. Ot

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Eugene Kirpichov
It will do its best to throw an exception if duplicate names are produced within one pane. Otherwise, it will overwrite.

On Tue, Feb 13, 2018 at 11:58 AM Carlos Alonso wrote:
> Cool, thanks.
>
> What if the destination is not properly coded and the File naming policy
> then produces a duplicated

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Carlos Alonso
Cool, thanks. What if the destination is not properly coded and the File naming policy then produces a duplicated path? Will it throw an exception? Overwrite? Thanks!

On Tue, Feb 13, 2018 at 6:23 PM Eugene Kirpichov wrote:
> Dynamic file writes generate 1 set of files (shards) for every pane f

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Eugene Kirpichov
Dynamic file writes generate 1 set of files (shards) for every pane firing of every window of every destination. File naming policy is required to produce different names for every combination of (destination, shard index, window, pane) so you never have to append or overwrite. A new element arrivi
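
To make the uniqueness requirement concrete, here is a minimal sketch in plain Java. It does not use Beam's file naming policy classes: `ShardNames`, `fileName`, the name layout, and the string window label are all hypothetical, chosen only to show a name that embeds every component of (destination, shard index, window, pane).

```java
import java.util.HashSet;
import java.util.Set;

public class ShardNames {
    // Hypothetical helper, not a Beam API. The name embeds destination,
    // window, pane index, and shard index, so distinct combinations yield
    // distinct names (assuming the labels themselves are unambiguous).
    static String fileName(String destination, String window, long paneIndex,
                           int shard, int numShards) {
        return String.format("%s/%s-pane-%d-%05d-of-%05d.txt",
            destination, window, paneIndex, shard, numShards);
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        for (String destination : new String[] {"a", "b"}) {
            for (long pane = 0; pane < 2; pane++) {
                for (int shard = 0; shard < 3; shard++) {
                    String name = fileName(destination, "w0", pane, shard, 3);
                    // A collision here would mean one file overwriting another.
                    if (!seen.add(name)) {
                        throw new IllegalStateException("Duplicate file name: " + name);
                    }
                }
            }
        }
        System.out.println(seen.size() + " distinct names");
    }
}
```

If any component were dropped from the name (for example the pane index), a later pane firing for the same window would produce a name that already exists, which is the duplicate-name situation discussed elsewhere in this thread.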

How does TextIO decides when to finalise a file?

2018-02-13 Thread Carlos Alonso
Hi everyone!! I'm wondering how a TextIO with dynamic routing knows/decides when to finalise a file, and what happens if, after it is finalised, another element routed to the same file appears. Thanks!