[jira] [Commented] (BEAM-1316) DoFn#startBundle and #finishBundle should not be able to output

2017-01-26 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840324#comment-15840324
 ] 

Kenneth Knowles commented on BEAM-1316:
---

That is a good example that breaks my false dichotomy.

I think that between BEAM-1283 and BEAM-1287 there can be a coherent solution. But 
even with all the planned changes, if you want properly windowed output you'll 
end up tracking the windows yourself, unless we introduce a per-window 
finishBundle/flush (which was frowned upon some time ago, but may make sense here).
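As a rough illustration of "tracking it yourself", here is a minimal plain-Java sketch with no Beam dependency. It buffers elements per window while processing and flushes one batch per window at the end of the bundle, so every output carries the window its elements came from. `WindowedEmitter` and `PerWindowBatcher` are hypothetical names, and windows are modeled as plain strings rather than real `BoundedWindow` objects.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical output receiver that requires the caller to name a window explicitly.
interface WindowedEmitter<T> {
  void output(T value, String window);
}

// Hypothetical DoFn-like helper: the DoFn itself remembers which window each
// buffered element belongs to, instead of relying on an implicit bundle window.
class PerWindowBatcher<T> {
  // Insertion-ordered so flushes happen in the order windows were first seen.
  private final Map<String, List<T>> buffers = new LinkedHashMap<>();

  // Per element: the element's window is known here, so record it with the element.
  void processElement(T element, String window) {
    buffers.computeIfAbsent(window, w -> new ArrayList<>()).add(element);
  }

  // End of bundle: emit one batch per window, each explicitly tagged with its window.
  void finishBundle(WindowedEmitter<List<T>> emitter) {
    for (Map.Entry<String, List<T>> entry : buffers.entrySet()) {
      emitter.output(new ArrayList<>(entry.getValue()), entry.getKey());
    }
    buffers.clear();
  }
}
```

The bookkeeping lives entirely in the DoFn, which is the cost being described: without a per-window flush hook, each DoFn that wants windowed output from finishBundle has to reimplement something like this.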

> DoFn#startBundle and #finishBundle should not be able to output
> ---
>
> Key: BEAM-1316
> URL: https://issues.apache.org/jira/browse/BEAM-1316
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Reporter: Thomas Groh
>
> Within startBundle and finishBundle, the window into which elements are 
> output is not generally defined. Elements must always be output from within a 
> windowed context, or the {{WindowFn}} used by the {{PCollection}} may not 
> operate correctly.
> startBundle and finishBundle are suitable for operational duties, similar 
> to {{setup}} and {{teardown}}, but scoped to a collection of 
> input elements. This includes actions such as clearing field state within a 
> DoFn and ensuring all live RPCs complete successfully before committing 
> inputs.
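A minimal sketch of the "operational duties" described in the issue, under stated assumptions: plain Java with no Beam dependency, and a hypothetical `AsyncWriterFn` whose RPCs are simulated with `CompletableFuture` tasks writing into an in-memory queue. startBundle only clears per-bundle field state and finishBundle only waits for outstanding writes; neither emits elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical DoFn-like class. startBundle and finishBundle only do bookkeeping;
// they never emit elements.
class AsyncWriterFn {
  private final ExecutorService executor;
  private final Queue<String> externalStore; // stands in for an external system
  private final List<CompletableFuture<Void>> pendingWrites = new ArrayList<>();
  private int elementsInBundle;

  AsyncWriterFn(ExecutorService executor, Queue<String> externalStore) {
    this.executor = executor;
    this.externalStore = externalStore;
  }

  // Operational duty: reset per-bundle field state.
  void startBundle() {
    pendingWrites.clear();
    elementsInBundle = 0;
  }

  // Start a simulated asynchronous RPC for each element.
  void processElement(String element) {
    elementsInBundle++;
    pendingWrites.add(CompletableFuture.runAsync(() -> externalStore.add(element), executor));
  }

  // Operational duty: block until every live RPC from this bundle has completed
  // (join() throws if any of them failed), so the inputs can be committed safely.
  void finishBundle() {
    CompletableFuture.allOf(pendingWrites.toArray(new CompletableFuture<?>[0])).join();
    pendingWrites.clear();
  }

  int elementsInBundle() {
    return elementsInBundle;
  }
}
```

Usage: call `startBundle()`, then `processElement` for each input, then `finishBundle()`; once `finishBundle()` returns, every write issued in that bundle has completed.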



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (BEAM-1316) DoFn#startBundle and #finishBundle should not be able to output

2017-01-26 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840014#comment-15840014
 ] 

Daniel Halperin commented on BEAM-1316:
---

What if my output includes a list of filenames paired with file sizes or 
element counts, i.e., information that may only be known after I flush to 
external systems?
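To make the question concrete, here is a minimal sketch in plain Java of a writer whose byte size and element count are final only once a bundle's data has been flushed. `BundleFileWriter` and `WriteResult` are hypothetical types, and an in-memory buffer stands in for a real file. The sketch simply returns the metadata from finishBundle; whether such a value may be output to a PCollection, and in which window, is the open question in this issue.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical metadata that only exists once a bundle's data has been flushed.
final class WriteResult {
  final String filename;
  final long sizeBytes;
  final int elementCount;

  WriteResult(String filename, long sizeBytes, int elementCount) {
    this.filename = filename;
    this.sizeBytes = sizeBytes;
    this.elementCount = elementCount;
  }

  @Override
  public String toString() {
    return filename + " (" + sizeBytes + " bytes, " + elementCount + " elements)";
  }
}

// Hypothetical writer: buffers one bundle's lines and "writes a file" when the bundle
// finishes. An in-memory StringBuilder stands in for the real external file.
class BundleFileWriter {
  private final String prefix;
  private final StringBuilder buffer = new StringBuilder();
  private int elementCount = 0;
  private int bundleIndex = 0;

  BundleFileWriter(String prefix) {
    this.prefix = prefix;
  }

  void processElement(String line) {
    buffer.append(line).append('\n');
    elementCount++;
  }

  // Simulated flush: size and count are final only at this point. This sketch returns
  // them to the caller rather than emitting them downstream.
  WriteResult finishBundle() {
    byte[] contents = buffer.toString().getBytes(StandardCharsets.UTF_8);
    WriteResult result = new WriteResult(prefix + "-" + bundleIndex, contents.length, elementCount);
    bundleIndex++;
    buffer.setLength(0);
    elementCount = 0;
    return result;
  }
}
```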



[jira] [Commented] (BEAM-1316) DoFn#startBundle and #finishBundle should not be able to output

2017-01-26 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840002#comment-15840002
 ] 

Kenneth Knowles commented on BEAM-1316:
---

I suggest very explicitly distinguishing flushing to external systems from 
flushing output to a PCollection. finishBundle works for the former, but the 
latter requires significant contortions to be correct, and even then it won't do 
what you want when there are many small bundles.
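A minimal sketch of the first case, flushing to an external system, assuming a hypothetical batching client in which each RPC is simulated by recording its batch size. finishBundle flushes whatever remains but emits nothing downstream. `BatchingSinkFn` is an illustrative name, not a Beam class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical DoFn-like sink that batches writes to an external system.
// finishBundle is used only to flush to that system; nothing goes to a PCollection.
class BatchingSinkFn {
  private final int maxBatchSize;
  private final List<String> buffer = new ArrayList<>();
  private final List<Integer> rpcSizes = new ArrayList<>(); // size of each simulated RPC

  BatchingSinkFn(int maxBatchSize) {
    this.maxBatchSize = maxBatchSize;
  }

  void processElement(String element) {
    buffer.add(element);
    if (buffer.size() >= maxBatchSize) {
      flush();
    }
  }

  // Flush the partial batch so no buffered write outlives the bundle.
  void finishBundle() {
    flush();
  }

  private void flush() {
    if (buffer.isEmpty()) {
      return;
    }
    rpcSizes.add(buffer.size()); // stand-in for sending one batched RPC
    buffer.clear();
  }

  List<Integer> rpcSizes() {
    return rpcSizes;
  }
}
```

With a maximum batch size of 100, five bundles of four elements each produce five RPCs of four elements, while a single bundle of 250 elements produces batches of 100, 100, and 50. Batch size is bounded by bundle size, which the DoFn does not control; this is the small-bundle concern in concrete form.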



[jira] [Commented] (BEAM-1316) DoFn#startBundle and #finishBundle should not be able to output

2017-01-25 Thread Daniel Halperin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838870#comment-15838870
 ] 

Daniel Halperin commented on BEAM-1316:
---

I think one may need to output in finishBundle using the current "buffer, and 
flush half-full if this is the end of the bundle" pattern.
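A minimal sketch of that pattern, assuming a single global window and modeling timestamps as `long` milliseconds; `BatchElementsFn` and `TimestampedEmitter` are hypothetical names, not Beam APIs. Full batches are emitted during processElement and the leftover partial batch in finishBundle. Because finishBundle has no current element, the sketch has to choose a timestamp itself (here, the largest one it buffered).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical downstream receiver; a real runner would also need a window.
interface TimestampedEmitter<T> {
  void output(List<T> batch, long timestampMillis);
}

// "Buffer, and flush half-full at the end of the bundle."
class BatchElementsFn<T> {
  private final int batchSize;
  private final List<T> buffer = new ArrayList<>();
  private long maxTimestamp = Long.MIN_VALUE;

  BatchElementsFn(int batchSize) {
    this.batchSize = batchSize;
  }

  // Emit a batch as soon as it is full.
  void processElement(T element, long timestampMillis, TimestampedEmitter<T> out) {
    buffer.add(element);
    maxTimestamp = Math.max(maxTimestamp, timestampMillis);
    if (buffer.size() >= batchSize) {
      emit(out);
    }
  }

  // No current element here: the sketch reuses the largest buffered timestamp,
  // a choice (along with the window) that the DoFn must make on its own.
  void finishBundle(TimestampedEmitter<T> out) {
    if (!buffer.isEmpty()) {
      emit(out);
    }
  }

  private void emit(TimestampedEmitter<T> out) {
    out.output(new ArrayList<>(buffer), maxTimestamp);
    buffer.clear();
    maxTimestamp = Long.MIN_VALUE;
  }
}
```

With a batch size of 3 and seven elements, this emits [1, 2, 3] and [4, 5, 6] while processing, and the half-full [7] from finishBundle.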



[jira] [Commented] (BEAM-1316) DoFn#startBundle and #finishBundle should not be able to output

2017-01-25 Thread Thomas Groh (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15838761#comment-15838761
 ] 

Thomas Groh commented on BEAM-1316:
---

Forbidding output from startBundle and finishBundle brings the contexts they 
receive in line with those of setup and teardown.
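One way to picture the proposal, as a hypothetical plain-Java sketch rather than the actual Beam API: only the per-element context type has an `output` method, so a call such as `c.output(...)` inside startBundle or finishBundle would fail to compile, just as setup and teardown have no way to output. `BundleContext`, `ElementContext`, `NoBundleOutputFn`, and `LifecycleRunner` are all illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical context for bundle-level callbacks: deliberately has no output method.
interface BundleContext {
  String bundleId();
}

// Hypothetical per-element context: the only place where output is possible.
interface ElementContext<O> extends BundleContext {
  void output(O value);
}

// Hypothetical DoFn shape under the proposal. Calling c.output(...) in startBundle or
// finishBundle is a compile-time error because BundleContext has no such method.
abstract class NoBundleOutputFn<I, O> {
  void setup() {}

  void startBundle(BundleContext c) {}

  abstract void processElement(I element, ElementContext<O> c);

  void finishBundle(BundleContext c) {}

  void teardown() {}
}

// Minimal driver that runs a single bundle through the lifecycle and collects outputs.
final class LifecycleRunner {
  private LifecycleRunner() {}

  static <I, O> List<O> runBundle(NoBundleOutputFn<I, O> fn, String bundleId, List<I> inputs) {
    List<O> outputs = new ArrayList<>();
    BundleContext bundleContext = () -> bundleId;
    ElementContext<O> elementContext = new ElementContext<O>() {
      @Override
      public String bundleId() {
        return bundleId;
      }

      @Override
      public void output(O value) {
        outputs.add(value);
      }
    };
    fn.setup();
    fn.startBundle(bundleContext);
    for (I input : inputs) {
      fn.processElement(input, elementContext);
    }
    fn.finishBundle(bundleContext);
    fn.teardown();
    return outputs;
  }
}
```

The design choice is to enforce the rule through types rather than a runtime check: bundle callbacks can still do bookkeeping with the bundle context, but they have nothing to call that would emit an element.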
