[jira] [Updated] (BEAM-4062) Performance regression in FileBasedSink
[ https://issues.apache.org/jira/browse/BEAM-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri updated BEAM-4062:
    Fix Version/s: 2.5.0

> Performance regression in FileBasedSink
> ---
>
> Key: BEAM-4062
> URL: https://issues.apache.org/jira/browse/BEAM-4062
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Udi Meiri
> Assignee: Udi Meiri
> Priority: Blocker
> Fix For: 2.5.0
>
>
> [https://github.com/apache/beam/pull/4648] has added:
> * 3 or more stat() calls per output file (in pre_finalize and
> finalize_writes)
> * serial unbatched delete()s (in pre_finalize)
> Solution will be to list files in a batch operation (match()), and to
> delete() in batch mode, or use multiple threads if that's not possible.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
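The proposed fix can be illustrated with a minimal sketch. This is not the Beam implementation: `CountingFileSystem` and the three helper functions are hypothetical names invented for illustration, and the in-memory class only stands in for a remote filesystem so that round trips can be counted. It contrasts one existence check per file with a single pattern match, and shows threaded deletes as the fallback when no batch delete is available.

```python
import fnmatch
import threading
from concurrent.futures import ThreadPoolExecutor


class CountingFileSystem:
    """Hypothetical in-memory filesystem that counts simulated round trips."""

    def __init__(self, paths):
        self._paths = set(paths)
        self._lock = threading.Lock()
        self.calls = 0

    def exists(self, path):
        # One round trip per path, analogous to a stat() call.
        with self._lock:
            self.calls += 1
            return path in self._paths

    def match(self, pattern):
        # One round trip returns every path matching the glob pattern.
        with self._lock:
            self.calls += 1
            return sorted(
                p for p in self._paths if fnmatch.fnmatchcase(p, pattern))

    def delete(self, path):
        # One round trip per deleted path.
        with self._lock:
            self.calls += 1
            self._paths.discard(path)


def existing_files_serial(fs, candidates):
    # The slow pattern: one existence check per output file.
    return [p for p in candidates if fs.exists(p)]


def existing_files_batched(fs, pattern, candidates):
    # The proposed pattern: list once, then filter locally.
    found = set(fs.match(pattern))
    return [p for p in candidates if p in found]


def delete_with_threads(fs, paths, max_workers=8):
    # Fallback when no batch delete exists: issue the deletes concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(fs.delete, paths))
```

With N output files, the serial check costs N simulated round trips while the batched version costs one. The threaded delete still issues N calls, but against a real remote store they would overlap instead of running back to back.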
[jira] [Updated] (BEAM-4062) Performance regression in FileBasedSink
[ https://issues.apache.org/jira/browse/BEAM-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri updated BEAM-4062:
    Priority: Blocker (was: Major)
[jira] [Updated] (BEAM-4062) Performance regression in FileBasedSink
[ https://issues.apache.org/jira/browse/BEAM-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri updated BEAM-4062:
    Affects Version/s: (was: 2.5.0)
[jira] [Updated] (BEAM-4062) Performance regression in FileBasedSink
[ https://issues.apache.org/jira/browse/BEAM-4062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Udi Meiri updated BEAM-4062:
    Affects Version/s: 2.5.0