[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
[ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118985#comment-17118985 ]

Pablo Estrada commented on BEAM-1438:
-------------------------------------

Ah it looks like it's just a matter of removing the check. [https://github.com/apache/beam/pull/11850] is out to fix this.

> The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
> -------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-1438
>                 URL: https://issues.apache.org/jira/browse/BEAM-1438
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>            Reporter: Reuven Lax
>            Assignee: Reuven Lax
>            Priority: P2
>             Fix For: 2.5.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> If a Write specifies 0 output shards, that implies the runner should pick an
> appropriate sharding. The default behavior is to write one shard per input
> bundle. This works well with the Dataflow batch runner, but not with the
> streaming runner which produces large numbers of small bundles.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
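The issue description above says that writing one shard per input bundle suits a runner with few large bundles but not one with many small bundles. A rough way to see why is to count output shards under each strategy. The sketch below is plain Java with no Beam dependency; the class name, the method names, and the bundle counts and sizes are all made up for illustration and are not taken from the Beam code base or from measurements.

```java
import java.util.Arrays;

public class ShardingSketch {

    /** One output shard per input bundle, the default described in the issue. */
    public static int shardsPerBundle(int[] bundleSizes) {
        return bundleSizes.length;
    }

    /**
     * Fixed sharding: elements are spread over at most numShards shards,
     * so the shard count cannot exceed numShards or the element count.
     */
    public static int shardsWithFixedCount(int[] bundleSizes, int numShards) {
        long totalElements = 0;
        for (int size : bundleSizes) {
            totalElements += size;
        }
        return (int) Math.min(totalElements, (long) numShards);
    }

    /** Helper: 'count' bundles that each hold 'size' elements. */
    public static int[] uniformBundles(int count, int size) {
        int[] bundles = new int[count];
        Arrays.fill(bundles, size);
        return bundles;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: both inputs hold 1,000,000 elements in total.
        int[] batchLike = uniformBundles(10, 100_000);     // few large bundles
        int[] streamingLike = uniformBundles(100_000, 10); // many small bundles

        System.out.println("batch-like, one shard per bundle:     "
                + shardsPerBundle(batchLike));
        System.out.println("streaming-like, one shard per bundle: "
                + shardsPerBundle(streamingLike));
        System.out.println("streaming-like, fixed 10 shards:      "
                + shardsWithFixedCount(streamingLike, 10));
    }
}
```

With these made-up inputs the same amount of data ends up in 10 shards for the batch-like case and in 100,000 shards for the streaming-like case, while an explicit shard count keeps it at 10.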
[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
[ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118980#comment-17118980 ]

Pablo Estrada commented on BEAM-1438:
-------------------------------------

[~reuvenlax] are you able to take a look at this?
[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
[ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118979#comment-17118979 ]

Pablo Estrada commented on BEAM-1438:
-------------------------------------

Reopening this issue: as others have correctly pointed out, this will not work on Dataflow.
[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
[ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968788#comment-16968788 ]

Amit Kumar commented on BEAM-1438:
----------------------------------

I have also recently seen a failure with withNumShards(0) for an unbounded source.
[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
[ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951897#comment-16951897 ]

Alexey Romanenko commented on BEAM-1438:
----------------------------------------

Does it actually work with Dataflow for file-based IOs? As far as I can see, every IO that uses {{WriteFiles.withNumShards(0)}} will fail for an unbounded source because of the check that Robert mentioned. Am I mistaken?
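To make the failure mode in this comment concrete, here is a minimal sketch of the kind of guard being discussed. It is not the Beam source: the class, the `validateSharding` method, its parameters, and the error message are hypothetical stand-ins, assuming only what the thread states, namely that a shard count of 0 means "let the runner choose" and that this is rejected for unbounded input.

```java
public class ShardCheckSketch {

    /**
     * Hypothetical stand-in for the check discussed in this thread.
     * numShards == 0 means "runner picks the sharding"; the guard rejects
     * that choice when the input is unbounded.
     */
    public static void validateSharding(boolean unboundedInput, int numShards) {
        if (numShards < 0) {
            throw new IllegalArgumentException("numShards must be >= 0");
        }
        if (unboundedInput && numShards == 0) {
            throw new IllegalArgumentException(
                    "unbounded input requires an explicit number of output shards");
        }
    }

    public static void main(String[] args) {
        validateSharding(false, 0); // bounded input, runner-chosen sharding: accepted
        validateSharding(true, 5);  // unbounded input, explicit sharding: accepted
        try {
            validateSharding(true, 0); // unbounded input, runner-chosen sharding
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this reading, removing the second condition is what would let a runner choose the sharding for unbounded input as well.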
[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
[ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951867#comment-16951867 ]

Alexey Romanenko commented on BEAM-1438:
----------------------------------------

+1 to Robert's question. Though I guess it was only fixed for the Dataflow and Flink runners.
[jira] [Commented] (BEAM-1438) The default behavior for the Write transform doesn't work well with the Dataflow streaming runner
[ https://issues.apache.org/jira/browse/BEAM-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16951374#comment-16951374 ]

Robert Bradshaw commented on BEAM-1438:
---------------------------------------

Does this mean that the error at https://github.com/apache/beam/blob/release-2.16.0/sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java#L315 can be removed?