Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Eugene Kirpichov
It is quite complicated. See https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java, in particular the expand() method. At a high level, it assigns a shard index to every element and then groups by destination and shard index (implicitly also
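
The grouping step described above can be illustrated with a minimal plain-Java sketch. This is not the code in WriteFiles.expand(): the class name, the round-robin shard assignment, and the "destination#shard" key format are all assumptions made for illustration, and windows and panes are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not Beam's WriteFiles implementation.
class ShardGrouping {

    // Each element is a {destination, payload} pair. Every element gets a
    // shard index (round-robin here, which is an assumption), and elements
    // are then grouped under the composite key "destination#shard".
    static Map<String, List<String>> groupByDestinationAndShard(
            List<String[]> elements, int numShards) {
        Map<String, List<String>> groups = new TreeMap<>();
        int next = 0;
        for (String[] element : elements) {
            String destination = element[0];
            String payload = element[1];
            int shard = next++ % numShards;
            String key = destination + "#" + shard;
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(payload);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String[]> elements = Arrays.asList(
                new String[] {"events", "e1"},
                new String[] {"events", "e2"},
                new String[] {"errors", "x1"});
        // Each map entry corresponds to the contents of one output file.
        groupByDestinationAndShard(elements, 2)
                .forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

Each resulting group stands for the contents of one shard file of one destination.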

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Carlos Alonso
Cool, thanks! How does it work internally? Are all the elements routed to the same path grouped and processed within the same bundle? Thanks!

On Tue, Feb 13, 2018 at 9:03 PM Eugene Kirpichov wrote:
> It will do its best to throw an exception if duplicate names are

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Carlos Alonso
Cool, thanks. What if the destination is not properly coded and the file naming policy then produces a duplicate path? Will it throw an exception, or overwrite the existing file? Thanks!

On Tue, Feb 13, 2018 at 6:23 PM Eugene Kirpichov wrote:
> Dynamic file writes generate 1 set of files

Re: How does TextIO decides when to finalise a file?

2018-02-13 Thread Eugene Kirpichov
Dynamic file writes generate one set of files (shards) for every pane firing of every window of every destination. The file naming policy is required to produce a different name for every combination of (destination, shard index, window, pane), so you never have to append or overwrite. A new element
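
To make the uniqueness requirement concrete, here is a small plain-Java sketch of a naming function that encodes all four components into the file name. It does not use Beam's FilenamePolicy API; the class name, the shardName helper, and the name layout are hypothetical and chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not Beam's FilenamePolicy API.
class ShardNaming {

    // Builds a name from (destination, window, pane, shard index). Because
    // every component appears in the name, two distinct combinations cannot
    // collide as long as the components themselves are encoded unambiguously.
    static String shardName(
            String destination, String window, long paneIndex, int shard, int numShards) {
        return String.format(
                Locale.ROOT,
                "%s/%s-pane-%d-%05d-of-%05d.txt",
                destination, window, paneIndex, shard, numShards);
    }

    public static void main(String[] args) {
        Set<String> seen = new HashSet<>();
        for (String destination : new String[] {"events", "errors"}) {
            for (long pane = 0; pane < 2; pane++) {
                for (int shard = 0; shard < 3; shard++) {
                    String name = shardName(destination, "2018-02-13T21-00", pane, shard, 3);
                    // Fail loudly on a duplicate instead of silently overwriting.
                    if (!seen.add(name)) {
                        throw new IllegalStateException("Duplicate file name: " + name);
                    }
                    System.out.println(name);
                }
            }
        }
    }
}
```

If a policy dropped one component (for example the pane index), two pane firings of the same window would map to the same name, which is exactly the duplicate-path situation asked about earlier in the thread.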