Re: How does TextIO decide when to finalise a file?

2018-02-13 Thread Eugene Kirpichov
It is quite complicated. See
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java,
in particular the expand() method. At a high level, it assigns a shard index
to every element, groups by destination and shard index (implicitly also by
window), and writes each group to its own temporary file (so there is one
set of temporary files generated for each trigger firing). It then renames
the temporary files to their final names.
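
The steps above can be sketched in plain Java, without any Beam
dependency. This is only an illustration of the idea, not the code in
WriteFiles: the class name ShardedWriteSketch, the round-robin shard
assignment and the file name format are all assumptions made for the
example, and windows and panes are left out to keep it short.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not Beam's WriteFiles implementation.
final class ShardedWriteSketch {

    static List<Path> write(Path dir, Map<String, List<String>> linesByDestination, int numShards)
            throws IOException {
        // Step 1: assign a shard index to every element and group by (destination, shard).
        // Round-robin assignment is an assumption made for this sketch.
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : linesByDestination.entrySet()) {
            int next = 0;
            for (String line : entry.getValue()) {
                int shard = next++ % numShards;
                String key = String.format("%s-%05d", entry.getKey(), shard);
                groups.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
            }
        }

        // Step 2: write each group to its own temporary file.
        Map<Path, Path> tempToFinal = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : groups.entrySet()) {
            Path temp = Files.createTempFile(dir, ".temp-", ".tmp");
            Files.write(temp, group.getValue());
            String finalName = String.format("%s-of-%05d.txt", group.getKey(), numShards);
            tempToFinal.put(temp, dir.resolve(finalName));
        }

        // Step 3: rename every temporary file to its final name.
        List<Path> finalFiles = new ArrayList<>();
        for (Map.Entry<Path, Path> rename : tempToFinal.entrySet()) {
            Files.move(rename.getKey(), rename.getValue(), StandardCopyOption.REPLACE_EXISTING);
            finalFiles.add(rename.getValue());
        }
        return finalFiles;
    }
}
```

Writing to temporary files first means a reader of the output directory
only ever sees complete files under their final names. The sketch passes
REPLACE_EXISTING so that a name collision overwrites the earlier file.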

On Tue, Feb 13, 2018 at 12:30 PM Carlos Alonso wrote:

> Cool thanks!
>
> How does it work internally? Are all the elements routed to the same path
> grouped and processed within the same bundle?
>
> Thanks!
>
> On Tue, Feb 13, 2018 at 9:03 PM Eugene Kirpichov 
> wrote:
>
>> It will do its best to throw an exception if duplicate names are produced
>> within one pane. Otherwise, it will overwrite.
>>
>> On Tue, Feb 13, 2018 at 11:58 AM Carlos Alonso 
>> wrote:
>>
>>> Cool, thanks.
>>>
>>> What if the destination is not properly coded and the File naming policy
>>> then produces a duplicated path? Will it throw an exception? Overwrite?
>>>
>>> Thanks!
>>>
>>> On Tue, Feb 13, 2018 at 6:23 PM Eugene Kirpichov 
>>> wrote:
>>>
>>>> Dynamic file writes generate 1 set of files (shards) for every pane
>>>> firing of every window of every destination. File naming policy is required
>>>> to produce different names for every combination of (destination, shard
>>>> index, window, pane) so you never have to append or overwrite. A new
>>>> element arriving for a destination after something for that destination has
>>>> already been written will simply be in the next pane, or in a different
>>>> window.
>>>>
>>>> On Tue, Feb 13, 2018, 6:33 AM Carlos Alonso
>>>> wrote:
>>>>
>>>>> Hi everyone!!
>>>>>
>>>>> I'm wondering how a TextIO with dynamic routing knows/decides when to
>>>>> finalise a file and what happens if after it is finalised, another element
>>>>> routed for the same file appears.
>>>>>
>>>>> Thanks!
>>>>>



Re: How does TextIO decide when to finalise a file?

2018-02-13 Thread Carlos Alonso
Cool thanks!

How does it work internally? Are all the elements routed to the same path
grouped and processed within the same bundle?

Thanks!

On Tue, Feb 13, 2018 at 9:03 PM Eugene Kirpichov 
wrote:

> It will do its best to throw an exception if duplicate names are produced
> within one pane. Otherwise, it will overwrite.
>
> On Tue, Feb 13, 2018 at 11:58 AM Carlos Alonso 
> wrote:
>
>> Cool, thanks.
>>
>> What if the destination is not properly coded and the File naming policy
>> then produces a duplicated path? Will it throw an exception? Overwrite?
>>
>> Thanks!
>>
>> On Tue, Feb 13, 2018 at 6:23 PM Eugene Kirpichov 
>> wrote:
>>
>>> Dynamic file writes generate 1 set of files (shards) for every pane
>>> firing of every window of every destination. File naming policy is required
>>> to produce different names for every combination of (destination, shard
>>> index, window, pane) so you never have to append or overwrite. A new
>>> element arriving for a destination after something for that destination has
>>> already been written will simply be in the next pane, or in a different
>>> window.
>>>
>>> On Tue, Feb 13, 2018, 6:33 AM Carlos Alonso 
>>> wrote:
>>>
>>>> Hi everyone!!
>>>>
>>>> I'm wondering how a TextIO with dynamic routing knows/decides when to
>>>> finalise a file and what happens if after it is finalised, another element
>>>> routed for the same file appears.
>>>>
>>>> Thanks!
>>>>
>>>


Re: How does TextIO decide when to finalise a file?

2018-02-13 Thread Carlos Alonso
Cool, thanks.

What if the destination is not properly coded and the File naming policy
then produces a duplicated path? Will it throw an exception? Overwrite?

Thanks!

On Tue, Feb 13, 2018 at 6:23 PM Eugene Kirpichov 
wrote:

> Dynamic file writes generate 1 set of files (shards) for every pane firing
> of every window of every destination. File naming policy is required to
> produce different names for every combination of (destination, shard index,
> window, pane) so you never have to append or overwrite. A new element
> arriving for a destination after something for that destination has already
> been written will simply be in the next pane, or in a different window.
>
> On Tue, Feb 13, 2018, 6:33 AM Carlos Alonso wrote:
>
>> Hi everyone!!
>>
>> I'm wondering how a TextIO with dynamic routing knows/decides when to
>> finalise a file and what happens if after it is finalised, another element
>> routed for the same file appears.
>>
>> Thanks!
>>
>


Re: How does TextIO decide when to finalise a file?

2018-02-13 Thread Eugene Kirpichov
Dynamic file writes generate one set of files (shards) for every pane firing
of every window of every destination. The file naming policy is required to
produce different names for every combination of (destination, shard index,
window, pane), so you never have to append or overwrite. A new element
arriving for a destination after something for that destination has already
been written will simply be in the next pane, or in a different window.
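
To make the naming requirement concrete, here is a small plain-Java
sketch of a naming function that yields a distinct name for every
(destination, shard index, window, pane) combination. It does not use
Beam's API (in Beam this role is played by the sink's filename policy);
the class PaneFileNames, its parameters and the name layout are
hypothetical and chosen only for illustration.

```java
// Hypothetical naming helper; not part of Beam.
final class PaneFileNames {

    // Every input that distinguishes one output file from another appears in the name.
    static String fileName(String destination, String window, long paneIndex,
                           int shard, int numShards) {
        return String.format("%s/%s-pane-%d-%05d-of-%05d.txt",
                destination, window, paneIndex, shard, numShards);
    }

    public static void main(String[] args) {
        // First firing of window "w1" for destination "users".
        System.out.println(fileName("users", "w1", 0, 0, 2));
        // A later element for the same destination and window lands in the next pane,
        // so it gets a new file name instead of appending to the earlier file.
        System.out.println(fileName("users", "w1", 1, 0, 2));
    }
}
```

If the pane index (or the window) were dropped from the name, the second
firing would produce the same path as the first, which is exactly the
duplicate-name situation the policy is required to avoid.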

On Tue, Feb 13, 2018, 6:33 AM Carlos Alonso wrote:

> Hi everyone!!
>
> I'm wondering how a TextIO with dynamic routing knows/decides when to
> finalise a file and what happens if after it is finalised, another element
> routed for the same file appears.
>
> Thanks!
>