Re: How does TextIO decides when to finalise a file?
It is quite complicated. See
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
(in particular the expand() method).

At a high level, it assigns a shard index to every element, then groups by destination and shard index (and implicitly also by window), and writes each group to its own temporary file, so one set of temporary files is generated for each trigger firing. It then renames the temporary files to their final names.

On Tue, Feb 13, 2018 at 12:30 PM Carlos Alonso wrote:
> Cool thanks!
>
> How does it work internally? Are all the elements routed to the same path
> grouped and processed within the same bundle?
>
> Thanks!
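To make the high-level description above concrete, here is a minimal plain-Python sketch of the same three steps (assign shards, group and write temporary files, rename). It is not Beam code and does not mirror WriteFiles.expand(); the function name `write_pane`, the round-robin shard assignment, and the file name layout are all assumptions made for illustration.

```python
import os
import tempfile
from collections import defaultdict


def write_pane(elements, pane_index, num_shards, out_dir):
    """Write the elements of ONE trigger firing (hypothetical helper).

    elements: iterable of (destination, window, line) tuples.
    Returns the list of final file paths.
    """
    # 1. Assign a shard index to every element and group by
    #    (destination, window, shard).
    groups = defaultdict(list)
    for i, (destination, window, line) in enumerate(elements):
        shard = i % num_shards  # assumption: simple round-robin sharding
        groups[(destination, window, shard)].append(line)

    # 2. Write each group to its own temporary file.
    temp_files = {}
    for key, lines in groups.items():
        fd, temp_path = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
        with os.fdopen(fd, "w") as f:
            f.writelines(line + "\n" for line in lines)
        temp_files[key] = temp_path

    # 3. Once every temporary file is written, rename each one to its
    #    final name (the name layout here is illustrative only).
    final_paths = []
    for (destination, window, shard), temp_path in sorted(temp_files.items()):
        name = (
            f"{destination}-{window}-pane{pane_index}"
            f"-{shard:05d}-of-{num_shards:05d}.txt"
        )
        final_path = os.path.join(out_dir, name)
        os.replace(temp_path, final_path)
        final_paths.append(final_path)
    return final_paths
```

Calling `write_pane` again with the next `pane_index` produces a fresh set of files instead of touching the ones already renamed, which is the behaviour described for later trigger firings.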
Re: How does TextIO decides when to finalise a file?
Cool, thanks!

How does it work internally? Are all the elements routed to the same path grouped and processed within the same bundle?

Thanks!

On Tue, Feb 13, 2018 at 9:03 PM Eugene Kirpichov wrote:
> It will do its best to throw an exception if duplicate names are produced
> within one pane. Otherwise, it will overwrite.
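One way to picture the "duplicate names within one pane" check mentioned in the quoted reply is the sketch below. It is plain Python, not Beam's actual check; the function name `check_unique_names` and the error message are hypothetical.

```python
from collections import Counter


def check_unique_names(final_names):
    """Raise if the names produced for ONE pane contain duplicates.

    final_names: list of final file names the naming policy returned
    for all shards of a single pane (hypothetical input).
    """
    counts = Counter(final_names)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(
            f"naming policy produced duplicate file names: {duplicates}"
        )
```

A check like this only sees the names of one pane at a time, which fits the reply: a name that collides with a file from an earlier pane is not caught and the rename simply overwrites.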
Re: How does TextIO decides when to finalise a file?
Cool, thanks.

What if the destination is not properly coded and the file naming policy then produces a duplicated path? Will it throw an exception? Overwrite?

Thanks!

On Tue, Feb 13, 2018 at 6:23 PM Eugene Kirpichov wrote:
> Dynamic file writes generate 1 set of files (shards) for every pane firing
> of every window of every destination. File naming policy is required to
> produce different names for every combination of (destination, shard index,
> window, pane) so you never have to append or overwrite. A new element
> arriving for a destination after something for that destination has already
> been written will simply be in the next pane, or in a different window.
Re: How does TextIO decides when to finalise a file?
Dynamic file writes generate one set of files (shards) for every pane firing of every window of every destination. The file naming policy is required to produce different names for every combination of (destination, shard index, window, pane), so you never have to append or overwrite. A new element arriving for a destination after something for that destination has already been written will simply be in the next pane, or in a different window.

On Tue, Feb 13, 2018, 6:33 AM Carlos Alonso wrote:
> Hi everyone!!
>
> I'm wondering how a TextIO with dynamic routing knows/decides when to
> finalise a file and what happens if after it is finalised, another element
> routed for the same file appears.
>
> Thanks!
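As an illustration of the naming requirement in the reply above, here is a small plain-Python sketch of a naming function that uses every part of the (destination, shard index, window, pane) combination. The `FileKey` type, `filename_for`, and the name layout are hypothetical and are not Beam's FilenamePolicy API.

```python
from typing import NamedTuple


class FileKey(NamedTuple):
    """Hypothetical stand-in for what a naming policy sees for one file."""
    destination: str
    shard: int
    num_shards: int
    window: str      # e.g. a formatted window start time
    pane_index: int  # grows with each trigger firing of the window


def filename_for(key: FileKey) -> str:
    # Every field of the key appears in the name, so keys that differ in
    # any field give different paths (assuming the string fields do not
    # themselves contain the separators used here).
    return (
        f"{key.destination}/{key.window}"
        f"-pane-{key.pane_index}"
        f"-{key.shard:05d}-of-{key.num_shards:05d}.txt"
    )
```

With a policy of this shape, an element that arrives late for the same destination and window ends up in a file whose name differs only in the pane index, so nothing already written needs to be appended to or overwritten.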