[jira] [Commented] (BEAM-638) Add a Window function to create a bounded PCollection from an unbounded one
[ https://issues.apache.org/jira/browse/BEAM-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575679#comment-15575679 ]

Eugene Kirpichov commented on BEAM-638:
---

More discussion is currently happening on http://markmail.org/message/se23dgiymob2pgok

> Add a Window function to create a bounded PCollection from an unbounded one
> ---
>
> Key: BEAM-638
> URL: https://issues.apache.org/jira/browse/BEAM-638
> Project: Beam
> Issue Type: New Feature
> Components: sdk-java-core
> Reporter: Jean-Baptiste Onofré
> Assignee: Davor Bonaci
>
> Today, if the pipeline source is unbounded and the sink expects a bounded
> collection, there's no way to use a single pipeline. Even a window only creates
> a chunk of the unbounded PCollection, and that "sub" PCollection is still
> unbounded.
> It would be helpful for users to have a Window function that creates a bounded
> PCollection (for the window) from an unbounded PCollection coming from the
> source.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[ https://issues.apache.org/jira/browse/BEAM-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542121#comment-15542121 ]

Jean-Baptiste Onofré commented on BEAM-638:
---

Let me explain a bit what I have in mind.

Actually, all IO writes implemented with a {{DoFn}} will work with an unbounded collection. The problem is mainly with {{TextIO.Write}}.

More generally, my concern is that applying a Window to an unbounded PCollection still yields an unbounded PCollection: some users may expect to get a bounded PCollection for the Window.

So, we have two points to address:
1. The Sink (and, more generally, the IO write) should be able to deal with an unbounded PCollection.
2. A Window on an unbounded PCollection could result in a bounded collection (and not a chunk of the unbounded collection).

Thoughts?
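A minimal sketch of point 1, written in plain Java rather than against the Beam API. It assumes fixed windows and a watermark that tells the sink no data older than a given time will still arrive. {{WindowedBufferSink}}, {{add}} and {{advanceWatermark}} are hypothetical names. Each window's buffer becomes a finite batch (for example, the content of one output file) once the watermark passes the window's end, even though the input as a whole never ends.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical sketch (not a Beam API): buffers timestamped elements per
 * fixed window and hands a window's contents out as a finite batch once the
 * watermark has passed the end of that window.
 */
final class WindowedBufferSink {
  private final long windowSizeMillis;
  // Window start -> elements buffered for that window, ordered by window start.
  private final TreeMap<Long, List<String>> open = new TreeMap<>();

  WindowedBufferSink(long windowSizeMillis) {
    if (windowSizeMillis <= 0) {
      throw new IllegalArgumentException("window size must be positive");
    }
    this.windowSizeMillis = windowSizeMillis;
  }

  /** Buffers an element in the fixed window [start, start + size) containing its timestamp. */
  void add(String element, long timestampMillis) {
    long start = Math.floorDiv(timestampMillis, windowSizeMillis) * windowSizeMillis;
    open.computeIfAbsent(start, k -> new ArrayList<>()).add(element);
  }

  /**
   * Advances the watermark and returns every window whose end is at or before
   * it, oldest first, each as a bounded batch that a sink could write out.
   */
  List<List<String>> advanceWatermark(long watermarkMillis) {
    List<List<String>> flushed = new ArrayList<>();
    while (!open.isEmpty() && open.firstKey() + windowSizeMillis <= watermarkMillis) {
      flushed.add(open.pollFirstEntry().getValue());
    }
    return flushed;
  }
}
```

With a window size of 10, elements stamped 1 and 9 are flushed together once the watermark reaches 10, while an element stamped 12 waits until the watermark reaches 20. Note that each batch is still only a per-window chunk of the stream, which is the distinction raised in point 2.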
[ https://issues.apache.org/jira/browse/BEAM-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15542018#comment-15542018 ]

Jean-Baptiste Onofré commented on BEAM-638:
---

We could have a trigger that creates a bounded PCollection from an unbounded one. Currently, the IOs (especially the sinks) expect a bounded PCollection, so if the source provides an unbounded PCollection, the sink can't be used.
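A minimal sketch of the kind of count-based stop condition such a trigger could rely on, in plain Java and assuming the unbounded source can be modeled as an endless {{Iterator}}. {{BoundedPrefix}}, {{take}} and {{counter}} are hypothetical names, not Beam API. The result is finite only because reading stops after {{maxRecords}} elements, i.e. the upstream source is no longer consumed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch (not a Beam API): turns an endless source into a finite list. */
final class BoundedPrefix {
  private BoundedPrefix() {}

  /** Reads at most maxRecords elements, then stops pulling from the source. */
  static <T> List<T> take(Iterator<T> source, long maxRecords) {
    if (maxRecords < 0) {
      throw new IllegalArgumentException("maxRecords must be >= 0");
    }
    List<T> out = new ArrayList<>();
    // The count limit is the only thing that ends the loop for an endless source.
    while (out.size() < maxRecords && source.hasNext()) {
      out.add(source.next());
    }
    return out;
  }

  /** An endless source of increasing longs, standing in for an unbounded input. */
  static Iterator<Long> counter() {
    return new Iterator<Long>() {
      private long next = 0;

      @Override
      public boolean hasNext() {
        return true;
      }

      @Override
      public Long next() {
        return next++;
      }
    };
  }
}
```

For example, {{BoundedPrefix.take(BoundedPrefix.counter(), 3)}} returns {{[0, 1, 2]}}. A time limit could be expressed the same way by checking a clock in the loop condition instead of the element count.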
[ https://issues.apache.org/jira/browse/BEAM-638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507706#comment-15507706 ]

Aljoscha Krettek commented on BEAM-638:
---

I don't think it's possible to provide such a function. What would the semantics of such a function be? That is, when would it consider the window "done"? What happens to the computation upstream of that function/transformation; would it be canceled? Also, which window would be the one window that we take, for example from {{FixedWindows}}?
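To make the {{FixedWindows}} question concrete, here is a plain-Java sketch of fixed-window assignment, assuming windows of the form [start, start + size) shifted by an optional offset. {{FixedWindowAssign}}, {{windowStart}} and {{windowsFor}} are hypothetical names, and this is not Beam's implementation. Every new timestamp can land in a new window, so an unbounded input keeps producing windows and there is no single window to pick.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical sketch (not Beam's code) of assigning timestamps to fixed windows. */
final class FixedWindowAssign {
  private FixedWindowAssign() {}

  /** Start of the fixed window [start, start + size) that contains the timestamp. */
  static long windowStart(long timestampMillis, long sizeMillis, long offsetMillis) {
    if (sizeMillis <= 0) {
      throw new IllegalArgumentException("window size must be positive");
    }
    // floorMod keeps the result correct for timestamps before the offset (or negative ones).
    return timestampMillis - Math.floorMod(timestampMillis - offsetMillis, sizeMillis);
  }

  /** Distinct window starts touched by the given timestamps, in first-seen order. */
  static Set<Long> windowsFor(long[] timestampsMillis, long sizeMillis, long offsetMillis) {
    Set<Long> starts = new LinkedHashSet<>();
    for (long ts : timestampsMillis) {
      starts.add(windowStart(ts, sizeMillis, offsetMillis));
    }
    return starts;
  }
}
```

With a size of 5 and no offset, timestamps 1, 4, 6 and 12 fall into the windows starting at 0, 5 and 10; feeding in ever later timestamps only adds more windows.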