[jira] [Work logged] (BEAM-5537) Beam Dependency Update Request: google-cloud-bigquery
[ https://issues.apache.org/jira/browse/BEAM-5537?focusedWorklogId=398183=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398183 ] ASF GitHub Bot logged work on BEAM-5537: Author: ASF GitHub Bot Created on: 05/Mar/20 07:48 Start Date: 05/Mar/20 07:48 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #11042: [BEAM-5537] Upgrading google-cloud-bigquery to 1.108.0 URL: https://github.com/apache/beam/pull/11042#issuecomment-595076691 Run Java HadoopFormatIO Performance Test This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398183) Time Spent: 1.5h (was: 1h 20m) > Beam Dependency Update Request: google-cloud-bigquery > - > > Key: BEAM-5537 > URL: https://issues.apache.org/jira/browse/BEAM-5537 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Fix For: 2.9.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > - 2018-10-01 19:15:02.343276 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.5.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:08:29.646271 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-15 12:09:25.995486 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-22 12:09:52.889923 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:07:44.834195 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.11.2 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-05-27 12:04:51.904457 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.12.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-03 12:02:26.300213 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.13.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-10 12:02:18.370413 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.14.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-17 12:30:09.056719 > -
[jira] [Work logged] (BEAM-5537) Beam Dependency Update Request: google-cloud-bigquery
[ https://issues.apache.org/jira/browse/BEAM-5537?focusedWorklogId=398184=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398184 ] ASF GitHub Bot logged work on BEAM-5537: Author: ASF GitHub Bot Created on: 05/Mar/20 07:48 Start Date: 05/Mar/20 07:48 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #11042: [BEAM-5537] Upgrading google-cloud-bigquery to 1.108.0 URL: https://github.com/apache/beam/pull/11042#issuecomment-595076769 Run BigQueryIO Streaming Performance Test Java This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398184) Time Spent: 1h 40m (was: 1.5h) > Beam Dependency Update Request: google-cloud-bigquery > - > > Key: BEAM-5537 > URL: https://issues.apache.org/jira/browse/BEAM-5537 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Fix For: 2.9.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > - 2018-10-01 19:15:02.343276 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.5.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:08:29.646271 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-15 12:09:25.995486 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-22 12:09:52.889923 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:07:44.834195 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.11.2 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-05-27 12:04:51.904457 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.12.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-03 12:02:26.300213 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.13.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-10 12:02:18.370413 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.14.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-17 12:30:09.056719 >
[jira] [Work logged] (BEAM-5537) Beam Dependency Update Request: google-cloud-bigquery
[ https://issues.apache.org/jira/browse/BEAM-5537?focusedWorklogId=398186=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398186 ] ASF GitHub Bot logged work on BEAM-5537: Author: ASF GitHub Bot Created on: 05/Mar/20 07:48 Start Date: 05/Mar/20 07:48 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #11042: [BEAM-5537] Upgrading google-cloud-bigquery to 1.108.0 URL: https://github.com/apache/beam/pull/11042#issuecomment-595076867 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398186) Time Spent: 2h (was: 1h 50m) > Beam Dependency Update Request: google-cloud-bigquery > - > > Key: BEAM-5537 > URL: https://issues.apache.org/jira/browse/BEAM-5537 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Fix For: 2.9.0 > > Time Spent: 2h > Remaining Estimate: 0h > > - 2018-10-01 19:15:02.343276 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.5.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:08:29.646271 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-15 12:09:25.995486 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-22 12:09:52.889923 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:07:44.834195 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.11.2 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-05-27 12:04:51.904457 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.12.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-03 12:02:26.300213 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.13.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-10 12:02:18.370413 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.14.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-17 12:30:09.056719 > - > Please
[jira] [Work logged] (BEAM-5537) Beam Dependency Update Request: google-cloud-bigquery
[ https://issues.apache.org/jira/browse/BEAM-5537?focusedWorklogId=398185=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398185 ] ASF GitHub Bot logged work on BEAM-5537: Author: ASF GitHub Bot Created on: 05/Mar/20 07:48 Start Date: 05/Mar/20 07:48 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #11042: [BEAM-5537] Upgrading google-cloud-bigquery to 1.108.0 URL: https://github.com/apache/beam/pull/11042#issuecomment-595076817 Run Dataflow ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398185) Time Spent: 1h 50m (was: 1h 40m) > Beam Dependency Update Request: google-cloud-bigquery > - > > Key: BEAM-5537 > URL: https://issues.apache.org/jira/browse/BEAM-5537 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Fix For: 2.9.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > - 2018-10-01 19:15:02.343276 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.5.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:08:29.646271 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-15 12:09:25.995486 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-22 12:09:52.889923 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:07:44.834195 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.11.2 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-05-27 12:04:51.904457 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.12.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-03 12:02:26.300213 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.13.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-10 12:02:18.370413 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.14.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-17 12:30:09.056719 > - >
[jira] [Work logged] (BEAM-5537) Beam Dependency Update Request: google-cloud-bigquery
[ https://issues.apache.org/jira/browse/BEAM-5537?focusedWorklogId=398187=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398187 ] ASF GitHub Bot logged work on BEAM-5537: Author: ASF GitHub Bot Created on: 05/Mar/20 07:48 Start Date: 05/Mar/20 07:48 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #11042: [BEAM-5537] Upgrading google-cloud-bigquery to 1.108.0 URL: https://github.com/apache/beam/pull/11042#issuecomment-595076897 Run SQL Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398187) Time Spent: 2h 10m (was: 2h) > Beam Dependency Update Request: google-cloud-bigquery > - > > Key: BEAM-5537 > URL: https://issues.apache.org/jira/browse/BEAM-5537 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Fix For: 2.9.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > - 2018-10-01 19:15:02.343276 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.5.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:08:29.646271 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-15 12:09:25.995486 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-22 12:09:52.889923 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:07:44.834195 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.11.2 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-05-27 12:04:51.904457 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.12.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-03 12:02:26.300213 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.13.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-10 12:02:18.370413 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.14.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-17 12:30:09.056719 > - > Please
[jira] [Work logged] (BEAM-5537) Beam Dependency Update Request: google-cloud-bigquery
[ https://issues.apache.org/jira/browse/BEAM-5537?focusedWorklogId=398182=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398182 ] ASF GitHub Bot logged work on BEAM-5537: Author: ASF GitHub Bot Created on: 05/Mar/20 07:47 Start Date: 05/Mar/20 07:47 Worklog Time Spent: 10m Work Description: mwalenia commented on issue #11042: [BEAM-5537] Upgrading google-cloud-bigquery to 1.108.0 URL: https://github.com/apache/beam/pull/11042#issuecomment-595076554 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398182) Time Spent: 1h 20m (was: 1h 10m) > Beam Dependency Update Request: google-cloud-bigquery > - > > Key: BEAM-5537 > URL: https://issues.apache.org/jira/browse/BEAM-5537 > Project: Beam > Issue Type: Bug > Components: dependencies >Reporter: Beam JIRA Bot >Assignee: Tomo Suzuki >Priority: Major > Fix For: 2.9.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > - 2018-10-01 19:15:02.343276 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.5.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-08 12:08:29.646271 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-15 12:09:25.995486 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2018-10-22 12:09:52.889923 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 0.25.0. The latest version is 1.6.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-04-22 12:07:44.834195 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.11.2 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-05-27 12:04:51.904457 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.12.1 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-03 12:02:26.300213 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.13.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-10 12:02:18.370413 > - > Please consider upgrading the dependency google-cloud-bigquery. > The current version is 1.6.1. The latest version is 1.14.0 > cc: [~markflyhigh], > Please refer to [Beam Dependency Guide > |https://beam.apache.org/contribute/dependencies/]for more information. > Do Not Modify The Description Above. > - 2019-06-17 12:30:09.056719 > - > Please
[jira] [Commented] (BEAM-9422) Missing comma in worker_pool_main causing syntax error whenever worker is started
[ https://issues.apache.org/jira/browse/BEAM-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051853#comment-17051853 ] Ismaël Mejía commented on BEAM-9422: [~robertwb] [~altay] can you please check or assign to someone who can. > Missing comma in worker_pool_main causing syntax error whenever worker is > started > - > > Key: BEAM-9422 > URL: https://issues.apache.org/jira/browse/BEAM-9422 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Frederik B >Priority: Critical > Labels: newbie > Original Estimate: 1h > Remaining Estimate: 1h > > Missing comma in worker_pool_main causing syntax error when python worker is > started, resulting in: > "Starting worker with command ['python', '-c', 'from > apache_beam.runners.worker.sdk_worker import SdkHarness; > SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=-1).run()'] > File "", line 1 > from apache_beam.runners.worker.sdk_worker import SdkHarness; > SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=-1).run() > ^ > SyntaxError: invalid syntax" > > Should be trivial to fix; add comma to line 116 in worker_pool_main.py. > > Note. I am not a Beam contributer, I am merely a user. I would appreciate if > anyone could fix this. Thank you! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9422) Missing comma in worker_pool_main causing syntax error whenever worker is started
[ https://issues.apache.org/jira/browse/BEAM-9422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9422: --- Priority: Critical (was: Blocker) > Missing comma in worker_pool_main causing syntax error whenever worker is > started > - > > Key: BEAM-9422 > URL: https://issues.apache.org/jira/browse/BEAM-9422 > Project: Beam > Issue Type: Bug > Components: sdk-py-harness >Reporter: Frederik B >Priority: Critical > Labels: newbie > Original Estimate: 1h > Remaining Estimate: 1h > > Missing comma in worker_pool_main causing syntax error when python worker is > started, resulting in: > "Starting worker with command ['python', '-c', 'from > apache_beam.runners.worker.sdk_worker import SdkHarness; > SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=-1).run()'] > File "", line 1 > from apache_beam.runners.worker.sdk_worker import SdkHarness; > SdkHarness("localhost:33083",worker_id="1-1",state_cache_size=0data_buffer_time_limit_ms=-1).run() > ^ > SyntaxError: invalid syntax" > > Should be trivial to fix; add comma to line 116 in worker_pool_main.py. > > Note. I am not a Beam contributer, I am merely a user. I would appreciate if > anyone could fix this. Thank you! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9443) support direct_num_workers=0
[ https://issues.apache.org/jira/browse/BEAM-9443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9443: --- Status: Open (was: Triage Needed) > support direct_num_workers=0 > - > > Key: BEAM-9443 > URL: https://issues.apache.org/jira/browse/BEAM-9443 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.21.0 > > > when direct_num_workers=0, set it to number of cores. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9444) Shall we use GCP Libraries BOM to specify Google-related library versions?
[ https://issues.apache.org/jira/browse/BEAM-9444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9444: --- Status: Open (was: Triage Needed) > Shall we use GCP Libraries BOM to specify Google-related library versions? > -- > > Key: BEAM-9444 > URL: https://issues.apache.org/jira/browse/BEAM-9444 > Project: Beam > Issue Type: Task > Components: build-system >Reporter: Tomo Suzuki >Assignee: Tomo Suzuki >Priority: Major > > Shall we use GCP Libraries BOM to specify Google-related library versions? > > I've been working on Beam's dependency upgrades in the past few months. I > think it's time to consider a long-term solution to keep the libraries > up-to-date with small maintenance effort. To achieve that, I propose Beam to > use GCP Libraries BOM to set the Google-related library versions, rather than > trying to make changes in each of ~30 Google libraries. > > h1. Background > A BOM is pom.xml that provides dependencyManagement to importing projects. > > GCP Libraries BOM is a BOM that includes many Google Cloud related libraries > + gRPC + protobuf. We (Google Cloud Java Diamond Dependency team) maintain > the BOM so that the set of the libraries are compatible with each other. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9447) Generalize the InteractiveRunner StreamingCache
[ https://issues.apache.org/jira/browse/BEAM-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-9447: --- Status: Open (was: Triage Needed) > Generalize the InteractiveRunner StreamingCache > --- > > Key: BEAM-9447 > URL: https://issues.apache.org/jira/browse/BEAM-9447 > Project: Beam > Issue Type: Bug > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > > The InteractiveRunner's StreamingCache is only file based for now. This > should be generalized to work across more different source and sink types and > ported to other runners. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398176=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398176 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 07:13 Start Date: 05/Mar/20 07:13 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-595065101 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398176) Time Spent: 94.5h (was: 94h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 94.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9121) Bump vendored calcite to 1.21.0
[ https://issues.apache.org/jira/browse/BEAM-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Jiang resolved BEAM-9121. - Fix Version/s: 2.20.0 Resolution: Won't Fix > Bump vendored calcite to 1.21.0 > --- > > Key: BEAM-9121 > URL: https://issues.apache.org/jira/browse/BEAM-9121 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kai Jiang >Priority: Major > Fix For: 2.20.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9121) Bump vendored calcite to 1.21.0
[ https://issues.apache.org/jira/browse/BEAM-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Jiang updated BEAM-9121: Summary: Bump vendored calcite to 1.21.0 (was: Bump vendored calcite to 1.22.0) > Bump vendored calcite to 1.21.0 > --- > > Key: BEAM-9121 > URL: https://issues.apache.org/jira/browse/BEAM-9121 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kai Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9121) Bump vendored calcite to 1.22.0
[ https://issues.apache.org/jira/browse/BEAM-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Jiang reassigned BEAM-9121: --- Assignee: Kai Jiang > Bump vendored calcite to 1.22.0 > --- > > Key: BEAM-9121 > URL: https://issues.apache.org/jira/browse/BEAM-9121 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Kai Jiang >Assignee: Kai Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9121) Bump vendored calcite to 1.22.0
[ https://issues.apache.org/jira/browse/BEAM-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Jiang updated BEAM-9121: Summary: Bump vendored calcite to 1.22.0 (was: Bump vendored calcite to 1.21.0) > Bump vendored calcite to 1.22.0 > --- > > Key: BEAM-9121 > URL: https://issues.apache.org/jira/browse/BEAM-9121 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Kai Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9121) Bump vendored calcite to 1.21.0
[ https://issues.apache.org/jira/browse/BEAM-9121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Jiang reassigned BEAM-9121: --- Assignee: (was: Kai Jiang) > Bump vendored calcite to 1.21.0 > --- > > Key: BEAM-9121 > URL: https://issues.apache.org/jira/browse/BEAM-9121 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Kai Jiang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook
[ https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=398169=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398169 ] ASF GitHub Bot logged work on BEAM-9274: Author: ASF GitHub Bot Created on: 05/Mar/20 06:58 Start Date: 05/Mar/20 06:58 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook URL: https://github.com/apache/beam/pull/10810#issuecomment-595060073 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398169) Time Spent: 4h 20m (was: 4h 10m) > Support running yapf in a git pre-commit hook > - > > Key: BEAM-9274 > URL: https://issues.apache.org/jira/browse/BEAM-9274 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 4h 20m > Remaining Estimate: 0h > > As a developer I want to be able to automatically run yapf before I make a > commit so that I don't waste time with failures on jenkins. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398153=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398153 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 05/Mar/20 05:49 Start Date: 05/Mar/20 05:49 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#discussion_r388089432 ## File path: sdks/go/pkg/beam/core/graph/fn.go ## @@ -209,21 +209,58 @@ func (f *DoFn) RestrictionT() *reflect.Type { // a KV or not based on the other signatures (unless we're more loose about which // sideinputs are present). Bind should respect that. +// Constants so we can avoid magic numbers in validation. Represent number of +// DoFn main inputs based on what kind of input the DoFn has. +const ( + unknownInNum = -1 // Used when we don't know the number of main inputs. + singleInNum = 1 + kvInNum = 2 +) + // NewDoFn constructs a DoFn from the given value, if possible. func NewDoFn(fn interface{}) (*DoFn, error) { ret, err := NewFn(fn) if err != nil { return nil, errors.WithContext(errors.Wrapf(err, "invalid DoFn"), "constructing DoFn") } - return AsDoFn(ret) + return AsDoFn(ret, unknownInNum) } -// AsDoFn converts a Fn to a DoFn, if possible. -func AsDoFn(fn *Fn) (*DoFn, error) { +// NewDoFnKv constructs a DoFn from the given value, if possible, with +// improved validation from knowing whether the DoFn's main input is a KV or +// single element. +func NewDoFnKv(fn interface{}, mainKv bool) (*DoFn, error) { + ret, err := NewFn(fn) + if err != nil { + return nil, errors.WithContext(errors.Wrapf(err, "invalid DoFn"), "constructing DoFn") + } + + if mainKv { + return AsDoFn(ret, kvInNum) + } else { + return AsDoFn(ret, singleInNum) + } +} + +// AsDoFn converts a Fn to a DoFn, if possible. numMainIn specifies how many +// main inputs are expected in the DoFn's method signatures. Valid values are +// -1 (unknown), 1 (single elements), or 2 (KVs). If the value is unknown then +// validation is done by best effort and may miss some edge cases. +func AsDoFn(fn *Fn, numMainIn int) (*DoFn, error) { addContext := func(err error, fn *Fn) error { return errors.WithContextf(err, "graph.AsDoFn: for Fn named %v", fn.Name()) } + // Validate numMainIn. This check should match this method's comment. + if numMainIn != unknownInNum && + numMainIn != singleInNum && + numMainIn != kvInNum { + err := errors.Errorf("invalid number of main inputs given. "+ + "Got: %v, Want: One of the following: %v", + processElementName, []int{unknownInNum, singleInNum, kvInNum}) + return nil, addContext(err, fn) + } Review comment: I like that much better, done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398153) Time Spent: 2h 20m (was: 2h 10m) > Go SplittableDoFn support > - > > Key: BEAM-3301 > URL: https://issues.apache.org/jira/browse/BEAM-3301 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > SDFs will be the only way to add streaming and liquid sharded IO for Go. > Design doc: https://s.apache.org/splittable-do-fn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3301) Go SplittableDoFn support
[ https://issues.apache.org/jira/browse/BEAM-3301?focusedWorklogId=398152=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398152 ] ASF GitHub Bot logged work on BEAM-3301: Author: ASF GitHub Bot Created on: 05/Mar/20 05:48 Start Date: 05/Mar/20 05:48 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #10991: [BEAM-3301] Refactor DoFn validation & allow specifying main inputs. URL: https://github.com/apache/beam/pull/10991#discussion_r388089313 ## File path: sdks/go/pkg/beam/core/graph/fn.go ## @@ -209,21 +209,58 @@ func (f *DoFn) RestrictionT() *reflect.Type { // a KV or not based on the other signatures (unless we're more loose about which // sideinputs are present). Bind should respect that. +// Constants so we can avoid magic numbers in validation. Represent number of +// DoFn main inputs based on what kind of input the DoFn has. +const ( + unknownInNum = -1 // Used when we don't know the number of main inputs. + singleInNum = 1 + kvInNum = 2 +) + // NewDoFn constructs a DoFn from the given value, if possible. func NewDoFn(fn interface{}) (*DoFn, error) { ret, err := NewFn(fn) if err != nil { return nil, errors.WithContext(errors.Wrapf(err, "invalid DoFn"), "constructing DoFn") } - return AsDoFn(ret) + return AsDoFn(ret, unknownInNum) } -// AsDoFn converts a Fn to a DoFn, if possible. -func AsDoFn(fn *Fn) (*DoFn, error) { +// NewDoFnKv constructs a DoFn from the given value, if possible, with +// improved validation from knowing whether the DoFn's main input is a KV or +// single element. +func NewDoFnKv(fn interface{}, mainKv bool) (*DoFn, error) { Review comment: Done, went with the variadic options made of functions approach. If anyone else is reading this, based it off this article: https://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398152) Time Spent: 2h 10m (was: 2h) > Go SplittableDoFn support > - > > Key: BEAM-3301 > URL: https://issues.apache.org/jira/browse/BEAM-3301 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Reporter: Henning Rohde >Assignee: Daniel Oliveira >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > SDFs will be the only way to add streaming and liquid sharded IO for Go. > Design doc: https://s.apache.org/splittable-do-fn -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support
[ https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=398145=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398145 ] ASF GitHub Bot logged work on BEAM-9321: Author: ASF GitHub Bot Created on: 05/Mar/20 05:26 Start Date: 05/Mar/20 05:26 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10869: [BEAM-9321] Add BigQuery Avro logical type support on write URL: https://github.com/apache/beam/pull/10869#issuecomment-595035574 lgtm. thanks @regadas This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398145) Time Spent: 2h 50m (was: 2h 40m) > BigQuery avro write logical type support > > > Key: BEAM-9321 > URL: https://issues.apache.org/jira/browse/BEAM-9321 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Filipe Regadas >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing > does not respect Avro <-> BigQuery data type conversion > ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]) > we need to set the useAvroLogicalTypes option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support
[ https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=398146=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398146 ] ASF GitHub Bot logged work on BEAM-9321: Author: ASF GitHub Bot Created on: 05/Mar/20 05:26 Start Date: 05/Mar/20 05:26 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10869: [BEAM-9321] Add BigQuery Avro logical type support on write URL: https://github.com/apache/beam/pull/10869 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398146) Time Spent: 3h (was: 2h 50m) > BigQuery avro write logical type support > > > Key: BEAM-9321 > URL: https://issues.apache.org/jira/browse/BEAM-9321 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Filipe Regadas >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing > does not respect Avro <-> BigQuery data type conversion > ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]) > we need to set the useAvroLogicalTypes option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398148=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398148 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 05:26 Start Date: 05/Mar/20 05:26 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-595035694 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398148) Time Spent: 94h 20m (was: 94h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 94h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9288) Conscrypt shaded dependency
[ https://issues.apache.org/jira/browse/BEAM-9288?focusedWorklogId=398142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398142 ] ASF GitHub Bot logged work on BEAM-9288: Author: ASF GitHub Bot Created on: 05/Mar/20 05:19 Start Date: 05/Mar/20 05:19 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11049: [BEAM-9288] Not bundle conscrypt in gRPC vendor in META-INF URL: https://github.com/apache/beam/pull/11049#issuecomment-595034134 @lukecwik all green. Please feel free to merge this PR and start vendored gRPC release. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398142) Time Spent: 5h 50m (was: 5h 40m) > Conscrypt shaded dependency > --- > > Key: BEAM-9288 > URL: https://issues.apache.org/jira/browse/BEAM-9288 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Esun Kim >Assignee: sunjincheng >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Conscrypt is not designed to be shaded properly mainly because of so files. I > happened to see BEAM-9030 (*1) creating a new vendored gRPC shading Conscrypt > (*2) in it. I think this could make a problem when new Conscrypt is brought > by new gcsio depending on gRPC-alts (*4) in a dependency chain. (*5) In this > case, it may have a conflict when finding proper so files for Conscrypt. > *1: https://issues.apache.org/jira/browse/BEAM-9030 > *2: > [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78] > *3: https://issues.apache.org/jira/browse/BEAM-6136 > *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0] > *5: https://issues.apache.org/jira/browse/BEAM-8889 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398140=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398140 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 05:11 Start Date: 05/Mar/20 05:11 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-595032438 Run Python 3.6 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398140) Time Spent: 6h 10m (was: 6h) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 6h 10m > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook
[ https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=398112=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398112 ] ASF GitHub Bot logged work on BEAM-9274: Author: ASF GitHub Bot Created on: 05/Mar/20 03:58 Start Date: 05/Mar/20 03:58 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook URL: https://github.com/apache/beam/pull/10810#issuecomment-595016244 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398112) Time Spent: 4h 10m (was: 4h) > Support running yapf in a git pre-commit hook > - > > Key: BEAM-9274 > URL: https://issues.apache.org/jira/browse/BEAM-9274 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 4h 10m > Remaining Estimate: 0h > > As a developer I want to be able to automatically run yapf before I make a > commit so that I don't waste time with failures on jenkins. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9304) beam-sdks-java-io-google-cloud-platform imports conflicting versions for BigTable and Spanner
[ https://issues.apache.org/jira/browse/BEAM-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chamikara Madhusanka Jayalath resolved BEAM-9304. - Resolution: Fixed > beam-sdks-java-io-google-cloud-platform imports conflicting versions for > BigTable and Spanner > - > > Key: BEAM-9304 > URL: https://issues.apache.org/jira/browse/BEAM-9304 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.18.0 >Reporter: Knut Olav Loite >Assignee: Chamikara Madhusanka Jayalath >Priority: Critical > Fix For: 2.20.0 > > Attachments: SpannerRead.java, pom.xml > > > If I include `beam-sdks-java-io-google-cloud-platform` version 2.18.0 in a > project and try to use `SpannerIO`, the exception > `java.lang.NoClassDefFoundError: io/opencensus/trace/Tracestate`. This seems > to be caused by conflicting versions of `io.opencensus:opencensus-api` being > included by the BigTable client and the Spanner client. BigTable imports > version 0.15.0. Spanner depends on 0.18.0, but as they are at the same level > in the dependency tree and BigTable is defined first, version 0.15.0 is used. > > The workaround for this issue is to exclude the BigTable client in the > project pom in order to be able to use SpannerIO. > > An example pom and simple Java class are included. If the commented exclusion > of the BigTable client is removed, the example will run without problems. The > example will also run without problems on Beam version 2.17 without the > exclusion. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9304) beam-sdks-java-io-google-cloud-platform imports conflicting versions for BigTable and Spanner
[ https://issues.apache.org/jira/browse/BEAM-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051778#comment-17051778 ] Chamikara Madhusanka Jayalath commented on BEAM-9304: - Resolving. [~koloite] please re-open if you think this is not fixed for 2.20.0 somehow. > beam-sdks-java-io-google-cloud-platform imports conflicting versions for > BigTable and Spanner > - > > Key: BEAM-9304 > URL: https://issues.apache.org/jira/browse/BEAM-9304 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.18.0 >Reporter: Knut Olav Loite >Assignee: Chamikara Madhusanka Jayalath >Priority: Critical > Fix For: 2.20.0 > > Attachments: SpannerRead.java, pom.xml > > > If I include `beam-sdks-java-io-google-cloud-platform` version 2.18.0 in a > project and try to use `SpannerIO`, the exception > `java.lang.NoClassDefFoundError: io/opencensus/trace/Tracestate`. This seems > to be caused by conflicting versions of `io.opencensus:opencensus-api` being > included by the BigTable client and the Spanner client. BigTable imports > version 0.15.0. Spanner depends on 0.18.0, but as they are at the same level > in the dependency tree and BigTable is defined first, version 0.15.0 is used. > > The workaround for this issue is to exclude the BigTable client in the > project pom in order to be able to use SpannerIO. > > An example pom and simple Java class are included. If the commented exclusion > of the BigTable client is removed, the example will run without problems. The > example will also run without problems on Beam version 2.17 without the > exclusion. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398076=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398076 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 02:25 Start Date: 05/Mar/20 02:25 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11050: [BEAM-8335] Implemented Capture Size limitation URL: https://github.com/apache/beam/pull/11050#discussion_r388045249 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -63,6 +63,14 @@ def path(self): """Returns the path the sink leads to.""" return self._path + @property + def size_in_bytes(self): +"""Returns the space usage in bytes of the sink.""" +try: + return os.stat(self._path).st_size +except: + return 0 Review comment: Maybe log here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398076) Time Spent: 94h (was: 93h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 94h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398078=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398078 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 02:25 Start Date: 05/Mar/20 02:25 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11050: [BEAM-8335] Implemented Capture Size limitation URL: https://github.com/apache/beam/pull/11050#discussion_r388045784 ## File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py ## @@ -88,7 +88,22 @@ def capture_duration(self, value): """ self.capture_control._capture_duration = value - # TODO(BEAM-8335): add capture_size options when they are supported. + @property + def capture_size(self): Review comment: Shoulds this have "limit" or "max" in its name, or something to indicate that this is an upper limit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398078) Time Spent: 94h 10m (was: 94h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 94h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398080 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 02:25 Start Date: 05/Mar/20 02:25 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11050: [BEAM-8335] Implemented Capture Size limitation URL: https://github.com/apache/beam/pull/11050#discussion_r388046153 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -259,21 +269,31 @@ def is_source_to_cache_changed( # The computation of extract_unbounded_source_signature is expensive, track on # change by default. if is_changed and update_cached_source_signature: -if ie.current_env().options.enable_capture_replay: - # TODO(BEAM-8335): display rather than logging when is_in_notebook. +options = ie.current_env().options +# No info needed when capture replay is disabled. +if options.enable_capture_replay: + + def sizeof_fmt(num, suffix='B'): Review comment: Should this be moved inside the if below? Is it used anywhere else? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398080) Time Spent: 94h 10m (was: 94h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 94h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398079=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398079 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 02:25 Start Date: 05/Mar/20 02:25 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11050: [BEAM-8335] Implemented Capture Size limitation URL: https://github.com/apache/beam/pull/11050#discussion_r388045279 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -63,6 +63,14 @@ def path(self): """Returns the path the sink leads to.""" return self._path + @property + def size_in_bytes(self): +"""Returns the space usage in bytes of the sink.""" +try: + return os.stat(self._path).st_size Review comment: Does this work in all OSes? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398079) Time Spent: 94h 10m (was: 94h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 94h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398077=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398077 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 02:25 Start Date: 05/Mar/20 02:25 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11050: [BEAM-8335] Implemented Capture Size limitation URL: https://github.com/apache/beam/pull/11050#discussion_r388046234 ## File path: sdks/python/apache_beam/runners/interactive/background_caching_job.py ## @@ -202,10 +202,20 @@ def has_source_to_cache(user_pipeline): if has_cache: if not isinstance(ie.current_env().cache_manager(), streaming_cache.StreamingCache): - # TODO(BEAM-8335): convert the cache manager into a streaming cache - # manager. Note this does not invalidate the current cache including the - # source data capture. - pass + # Wrap the cache manager into a streaming cache manager. Note this + # does not invalidate the current cache manager. + def is_cache_complete(): +job = ie.current_env().get_background_caching_job(user_pipeline) +is_done = job and job.is_done() +cache_changed = is_source_to_cache_changed( +user_pipeline, update_cached_source_signature=False) +return is_done and not cache_changed + + ie.current_env().set_cache_manager( + streaming_cache.StreamingCache( + ie.current_env().cache_manager()._cache_dir, + is_cache_complete=is_cache_complete, Review comment: Do you mean to pass `is_cache_complete()` or `is_cache_complete` ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398077) Time Spent: 94h 10m (was: 94h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 94h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9449) Consider passing pipeline options for expansion service.
Robert Bradshaw created BEAM-9449: - Summary: Consider passing pipeline options for expansion service. Key: BEAM-9449 URL: https://issues.apache.org/jira/browse/BEAM-9449 Project: Beam Issue Type: New Feature Components: beam-model Reporter: Robert Bradshaw -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=398070=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398070 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 05/Mar/20 02:04 Start Date: 05/Mar/20 02:04 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11020: [BEAM-7926] Update Data Visualization URL: https://github.com/apache/beam/pull/11020#issuecomment-594988636 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398070) Time Spent: 55h 20m (was: 55h 10m) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 55h 20m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The use can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9288) Conscrypt shaded dependency
[ https://issues.apache.org/jira/browse/BEAM-9288?focusedWorklogId=398068=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398068 ] ASF GitHub Bot logged work on BEAM-9288: Author: ASF GitHub Bot Created on: 05/Mar/20 02:02 Start Date: 05/Mar/20 02:02 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11049: [BEAM-9288] Not bundle conscrypt in gRPC vendor in META-INF URL: https://github.com/apache/beam/pull/11049#issuecomment-594988191 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398068) Time Spent: 5h 40m (was: 5.5h) > Conscrypt shaded dependency > --- > > Key: BEAM-9288 > URL: https://issues.apache.org/jira/browse/BEAM-9288 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Esun Kim >Assignee: sunjincheng >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 5h 40m > Remaining Estimate: 0h > > Conscrypt is not designed to be shaded properly mainly because of so files. I > happened to see BEAM-9030 (*1) creating a new vendored gRPC shading Conscrypt > (*2) in it. I think this could make a problem when new Conscrypt is brought > by new gcsio depending on gRPC-alts (*4) in a dependency chain. (*5) In this > case, it may have a conflict when finding proper so files for Conscrypt. > *1: https://issues.apache.org/jira/browse/BEAM-9030 > *2: > [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78] > *3: https://issues.apache.org/jira/browse/BEAM-6136 > *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0] > *5: https://issues.apache.org/jira/browse/BEAM-8889 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398059=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398059 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 01:30 Start Date: 05/Mar/20 01:30 Worklog Time Spent: 10m Work Description: angoenka commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052#issuecomment-594980292 If I understand correctly, this is needed because we don't go though job api for submission. In job api based submission, runner options are automatically pulled from the runner? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398059) Time Spent: 20m (was: 10m) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 20m > Remaining Estimate: 0h > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398055=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398055 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 01:24 Start Date: 05/Mar/20 01:24 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-594978741 Run Python 3.6 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398055) Time Spent: 6h (was: 5h 50m) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 6h > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?focusedWorklogId=398053=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398053 ] ASF GitHub Bot logged work on BEAM-9446: Author: ASF GitHub Bot Created on: 05/Mar/20 01:18 Start Date: 05/Mar/20 01:18 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args. URL: https://github.com/apache/beam/pull/11052 I'm not thrilled about manually copying these over, later I might look into a long-term solution to this problem. But this works for now. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support
[ https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=398048=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398048 ] ASF GitHub Bot logged work on BEAM-9321: Author: ASF GitHub Bot Created on: 05/Mar/20 01:14 Start Date: 05/Mar/20 01:14 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10869: [BEAM-9321] Add BigQuery Avro logical type support on write URL: https://github.com/apache/beam/pull/10869#issuecomment-594975916 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398048) Time Spent: 2h 40m (was: 2.5h) > BigQuery avro write logical type support > > > Key: BEAM-9321 > URL: https://issues.apache.org/jira/browse/BEAM-9321 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Filipe Regadas >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing > does not respect Avro <-> BigQuery data type conversion > ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]) > we need to set the useAvroLogicalTypes option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook
[ https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=398047=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398047 ] ASF GitHub Bot logged work on BEAM-9274: Author: ASF GitHub Bot Created on: 05/Mar/20 01:12 Start Date: 05/Mar/20 01:12 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook URL: https://github.com/apache/beam/pull/10810#issuecomment-594975611 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398047) Time Spent: 4h (was: 3h 50m) > Support running yapf in a git pre-commit hook > - > > Key: BEAM-9274 > URL: https://issues.apache.org/jira/browse/BEAM-9274 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 4h > Remaining Estimate: 0h > > As a developer I want to be able to automatically run yapf before I make a > commit so that I don't waste time with failures on jenkins. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8841) Add ability to perform BigQuery file loads using avro
[ https://issues.apache.org/jira/browse/BEAM-8841?focusedWorklogId=398036=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398036 ] ASF GitHub Bot logged work on BEAM-8841: Author: ASF GitHub Bot Created on: 05/Mar/20 00:55 Start Date: 05/Mar/20 00:55 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10979: [BEAM-8841] Support writing data to BigQuery via Avro in Python SDK URL: https://github.com/apache/beam/pull/10979#issuecomment-594970995 Run Python 3.6 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398036) Time Spent: 5h 50m (was: 5h 40m) > Add ability to perform BigQuery file loads using avro > - > > Key: BEAM-8841 > URL: https://issues.apache.org/jira/browse/BEAM-8841 > Project: Beam > Issue Type: Improvement > Components: io-py-gcp >Reporter: Chun Yang >Assignee: Chun Yang >Priority: Minor > Time Spent: 5h 50m > Remaining Estimate: 0h > > Currently, JSON format is used for file loads into BigQuery in the Python > SDK. JSON has some disadvantages including size of serialized data and > inability to represent NaN and infinity float values. > BigQuery supports loading files in avro format, which can overcome these > disadvantages. The Java SDK already supports loading files using avro format > (BEAM-2879) so it makes sense to support it in the Python SDK as well. > The change will be somewhere around > [{{BigQueryBatchFileLoads}}|https://github.com/apache/beam/blob/3e7865ee6c6a56e51199515ec5b4b16de1ddd166/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L554]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9321) BigQuery avro write logical type support
[ https://issues.apache.org/jira/browse/BEAM-9321?focusedWorklogId=398035=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398035 ] ASF GitHub Bot logged work on BEAM-9321: Author: ASF GitHub Bot Created on: 05/Mar/20 00:55 Start Date: 05/Mar/20 00:55 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10869: [BEAM-9321] Add BigQuery Avro logical type support on write URL: https://github.com/apache/beam/pull/10869#issuecomment-594970778 Run Java Precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398035) Time Spent: 2.5h (was: 2h 20m) > BigQuery avro write logical type support > > > Key: BEAM-9321 > URL: https://issues.apache.org/jira/browse/BEAM-9321 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.19.0 >Reporter: Filipe Regadas >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > With 2.18.0 we are able to write GenericRecords to BigQuery. However, writing > does not respect Avro <-> BigQuery data type conversion > ([docs|https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-avro#logical_types]) > we need to set the useAvroLogicalTypes option. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9448) Misleading log line: says "downloading" when using cache
[ https://issues.apache.org/jira/browse/BEAM-9448?focusedWorklogId=398033=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398033 ] ASF GitHub Bot logged work on BEAM-9448: Author: ASF GitHub Bot Created on: 05/Mar/20 00:46 Start Date: 05/Mar/20 00:46 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #11051: [BEAM-9448] Fix log message for job server cache. URL: https://github.com/apache/beam/pull/11051 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398032=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398032 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:44 Start Date: 05/Mar/20 00:44 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11005: [BEAM-8335] Modify the StreamingCache to subclass the CacheManager URL: https://github.com/apache/beam/pull/11005#discussion_r388018812 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -19,15 +19,298 @@ from __future__ import absolute_import +import itertools +import os +import shutil +import tempfile +import time + +import apache_beam as beam +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileHeader +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.runners.interactive.cache_manager import CacheManager +from apache_beam.runners.interactive.cache_manager import SafeFastPrimitivesCoder +from apache_beam.testing.test_stream import ReverseTestStream from apache_beam.utils import timestamp -class StreamingCache(object): +class StreamingCacheSink(beam.PTransform): + """A PTransform that writes TestStreamFile(Header|Records)s to file. + + This transform takes in an arbitrary element stream and writes the best-effort + list of TestStream events (as TestStreamFileRecords) to file. + + Note that this PTransform is assumed to be only run on a single machine where + the following assumptions are correct: elements come in ordered, no two + transforms are writing to the same file. This PTransform is assumed to only + run correctly with the DirectRunner. + """ + def __init__( + self, + cache_dir, + filename, + sample_resolution_sec, + coder=SafeFastPrimitivesCoder()): +self._cache_dir = cache_dir +self._filename = filename +self._sample_resolution_sec = sample_resolution_sec +self._coder = coder +self._path = os.path.join(self._cache_dir, self._filename) + + @property + def path(self): +"""Returns the path the sink leads to.""" +return self._path + + def expand(self, pcoll): +class StreamingWriteToText(beam.DoFn): + """DoFn that performs the writing. + + Note that the other file writing methods cannot be used in streaming + contexts. + """ + def __init__(self, full_path, coder=SafeFastPrimitivesCoder()): +self._full_path = full_path +self._coder = coder + +# Try and make the given path. +os.makedirs(os.path.dirname(full_path), exist_ok=True) + + def start_bundle(self): +# Open the file for 'append-mode' and writing 'bytes'. +self._fh = open(self._full_path, 'ab') + + def finish_bundle(self): +self._fh.close() + + def process(self, e): +"""Appends the given element to the file. +""" +self._fh.write(self._coder.encode(e)) +self._fh.write(b'\n') + +return ( +pcoll +| ReverseTestStream( +output_tag=self._filename, +sample_resolution_sec=self._sample_resolution_sec, +output_format=ReverseTestStream.Format. +SERIALIZED_TEST_STREAM_FILE_RECORDS, +coder=self._coder) +| beam.ParDo( +StreamingWriteToText(full_path=self._path, coder=self._coder))) + + +class StreamingCacheSource: + """A class that reads and parses TestStreamFile(Header|Reader)s. + + This source operates in the following way: + +1. Wait for up to `timeout_secs` for the file to be available. +2. Read, parse, and emit the entire contents of the file +3. Wait for more events to come or until `is_cache_complete` returns True +4. If there are more events, then go to 2 +5. Otherwise, stop emitting. + + This class is used to read from file and send its to the TestStream via the + StreamingCacheManager.Reader. + """ + def __init__( + self, + cache_dir, + labels, + is_cache_complete=None, + coder=SafeFastPrimitivesCoder()): +self._cache_dir = cache_dir +self._coder = coder +self._labels = labels +self._is_cache_complete = ( +is_cache_complete if is_cache_complete else lambda: True) + + def _wait_until_file_exists(self, timeout_secs=30): +"""Blocks until the file exists for a maximum of timeout_secs. +""" +f = None +now_secs = time.time() +timeout_timestamp_secs = now_secs + timeout_secs + +# Wait for up to `timeout_secs` for the file to be available. +while f is None and now_secs < timeout_timestamp_secs: + now_secs = time.time() + try: +path =
[jira] [Created] (BEAM-9448) Misleading log line: says "downloading" when using cache
Kyle Weaver created BEAM-9448: - Summary: Misleading log line: says "downloading" when using cache Key: BEAM-9448 URL: https://issues.apache.org/jira/browse/BEAM-9448 Project: Beam Issue Type: Bug Components: runner-flink Reporter: Kyle Weaver Assignee: Kyle Weaver https://github.com/apache/beam/blob/8d253ac99d78ef5345245ed71c7cf34328c55d9f/sdks/python/apache_beam/utils/subprocess_server.py#L197 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398031=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398031 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:43 Start Date: 05/Mar/20 00:43 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation URL: https://github.com/apache/beam/pull/11050#issuecomment-594967107 R: @rohdesamuel R: @aaltay Data captured from sources are stored in cache just like intermediate PCollections that are assigned to variables. The capture_size limit is only applied to disk usage of data captured from sources. The implementation of getting a capture cache file's size `os.stat(self._path).st_size` The implementation of summing up all capture cache file's sizes `sum([sink.size_in_bytes for _, sink in self._capture_sinks.items()])` They both locate in [this](https://github.com/apache/beam/pull/11050/commits/a6d9e2382eeea148b3f667726f8e8e8933a7196c#diff-e15d1558a3154511b759ef711deeaddb) change. Everything else is wiring, logging and testing. The first commit is a patch from Sam's ongoing PR, there is no need to review diff of it. Please directly review [diff](https://github.com/apache/beam/pull/11050/commits/a6d9e2382eeea148b3f667726f8e8e8933a7196c) of the second commit. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398031) Time Spent: 93h 40m (was: 93.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 93h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398030=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398030 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:43 Start Date: 05/Mar/20 00:43 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11005: [BEAM-8335] Modify the StreamingCache to subclass the CacheManager URL: https://github.com/apache/beam/pull/11005#discussion_r388018492 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -19,15 +19,298 @@ from __future__ import absolute_import +import itertools +import os +import shutil +import tempfile +import time + +import apache_beam as beam +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileHeader +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.runners.interactive.cache_manager import CacheManager +from apache_beam.runners.interactive.cache_manager import SafeFastPrimitivesCoder +from apache_beam.testing.test_stream import ReverseTestStream from apache_beam.utils import timestamp -class StreamingCache(object): +class StreamingCacheSink(beam.PTransform): + """A PTransform that writes TestStreamFile(Header|Records)s to file. + + This transform takes in an arbitrary element stream and writes the best-effort + list of TestStream events (as TestStreamFileRecords) to file. + + Note that this PTransform is assumed to be only run on a single machine where + the following assumptions are correct: elements come in ordered, no two + transforms are writing to the same file. This PTransform is assumed to only + run correctly with the DirectRunner. Review comment: Done, made [BEAM-9447](https://issues.apache.org/jira/browse/BEAM-9447) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398030) Time Spent: 93.5h (was: 93h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 93.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9448) Misleading log line: says "downloading" when using cache
[ https://issues.apache.org/jira/browse/BEAM-9448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-9448: -- Status: Open (was: Triage Needed) > Misleading log line: says "downloading" when using cache > > > Key: BEAM-9448 > URL: https://issues.apache.org/jira/browse/BEAM-9448 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Trivial > Labels: portability-flink > > https://github.com/apache/beam/blob/8d253ac99d78ef5345245ed71c7cf34328c55d9f/sdks/python/apache_beam/utils/subprocess_server.py#L197 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398028=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398028 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:42 Start Date: 05/Mar/20 00:42 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation URL: https://github.com/apache/beam/pull/11050#issuecomment-594967107 R: @rohdesamuel R: @aaltay Data captured from sources are stored in cache just like intermediate PCollections that are assigned to variables. The capture_size limit is only applied to disk usage of data captured from sources. The implementation of getting a capture cache file's size `os.stat(self._path).st_size` The implementation of summing up all capture cache file's sizes `sum([sink.size_in_bytes for _, sink in self._capture_sinks.items()])` They both locate in [this](https://github.com/apache/beam/pull/11050/commits/a6d9e2382eeea148b3f667726f8e8e8933a7196c#diff-e15d1558a3154511b759ef711deeaddb) change. Everything else is wiring, logging and testing. The first commit is a patch from Sam's ongoing PR, there is no need to review diff of it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398028) Time Spent: 93h 10m (was: 93h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 93h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398029=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398029 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:42 Start Date: 05/Mar/20 00:42 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11005: [BEAM-8335] Modify the StreamingCache to subclass the CacheManager URL: https://github.com/apache/beam/pull/11005#discussion_r388018492 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -19,15 +19,298 @@ from __future__ import absolute_import +import itertools +import os +import shutil +import tempfile +import time + +import apache_beam as beam +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileHeader +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.runners.interactive.cache_manager import CacheManager +from apache_beam.runners.interactive.cache_manager import SafeFastPrimitivesCoder +from apache_beam.testing.test_stream import ReverseTestStream from apache_beam.utils import timestamp -class StreamingCache(object): +class StreamingCacheSink(beam.PTransform): + """A PTransform that writes TestStreamFile(Header|Records)s to file. + + This transform takes in an arbitrary element stream and writes the best-effort + list of TestStream events (as TestStreamFileRecords) to file. + + Note that this PTransform is assumed to be only run on a single machine where + the following assumptions are correct: elements come in ordered, no two + transforms are writing to the same file. This PTransform is assumed to only + run correctly with the DirectRunner. Review comment: Done, made BEAM-9447 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398029) Time Spent: 93h 20m (was: 93h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 93h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398027=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398027 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:41 Start Date: 05/Mar/20 00:41 Worklog Time Spent: 10m Work Description: KevinGG commented on issue #11050: [BEAM-8335] Implemented Capture Size limitation URL: https://github.com/apache/beam/pull/11050#issuecomment-594967107 R: @rohdesamuel R: @aaltay Data captured from sources are stored in cache just like intermediate PCollections that are assigned to variables. The capture_size limit is only applied to disk usage of data captured from sources. The implementation of getting a capture cache file's size `os.stat(self._path).st_size` The implementation of summing up all capture cache file's sizes `sum([sink.size_in_bytes for _, sink in self._capture_sinks.items()])` Everything else is wiring, logging and testing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398027) Time Spent: 93h (was: 92h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 93h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9447) Generalize the InteractiveRunner StreamingCache
[ https://issues.apache.org/jira/browse/BEAM-9447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Rohde reassigned BEAM-9447: --- Assignee: Sam Rohde > Generalize the InteractiveRunner StreamingCache > --- > > Key: BEAM-9447 > URL: https://issues.apache.org/jira/browse/BEAM-9447 > Project: Beam > Issue Type: Bug > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > > The InteractiveRunner's StreamingCache is only file based for now. This > should be generalized to work across more different source and sink types and > ported to other runners. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9447) Generalize the InteractiveRunner StreamingCache
Sam Rohde created BEAM-9447: --- Summary: Generalize the InteractiveRunner StreamingCache Key: BEAM-9447 URL: https://issues.apache.org/jira/browse/BEAM-9447 Project: Beam Issue Type: Bug Components: runner-py-interactive Reporter: Sam Rohde The InteractiveRunner's StreamingCache is only file based for now. This should be generalized to work across more different source and sink types and ported to other runners. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
[ https://issues.apache.org/jira/browse/BEAM-9446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kyle Weaver updated BEAM-9446: -- Status: Open (was: Triage Needed) > FlinkRunner discards parallelism and execution_mode_for_batch pipeline options > -- > > Key: BEAM-9446 > URL: https://issues.apache.org/jira/browse/BEAM-9446 > Project: Beam > Issue Type: Bug > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > > I need these options for TFX, but they're being discarded (I believe they are > normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9446) FlinkRunner discards parallelism and execution_mode_for_batch pipeline options
Kyle Weaver created BEAM-9446: - Summary: FlinkRunner discards parallelism and execution_mode_for_batch pipeline options Key: BEAM-9446 URL: https://issues.apache.org/jira/browse/BEAM-9446 Project: Beam Issue Type: Bug Components: runner-flink Reporter: Kyle Weaver Assignee: Kyle Weaver I need these options for TFX, but they're being discarded (I believe they are normally supplied by the job server). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook
[ https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=398021=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398021 ] ASF GitHub Bot logged work on BEAM-9274: Author: ASF GitHub Bot Created on: 05/Mar/20 00:32 Start Date: 05/Mar/20 00:32 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook URL: https://github.com/apache/beam/pull/10810#issuecomment-594964330 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398021) Time Spent: 3h 50m (was: 3h 40m) > Support running yapf in a git pre-commit hook > - > > Key: BEAM-9274 > URL: https://issues.apache.org/jira/browse/BEAM-9274 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 3h 50m > Remaining Estimate: 0h > > As a developer I want to be able to automatically run yapf before I make a > commit so that I don't waste time with failures on jenkins. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9360) Schema FieldType should not consider metadata for equivalence
[ https://issues.apache.org/jira/browse/BEAM-9360?focusedWorklogId=398018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398018 ] ASF GitHub Bot logged work on BEAM-9360: Author: ASF GitHub Bot Created on: 05/Mar/20 00:24 Start Date: 05/Mar/20 00:24 Worklog Time Spent: 10m Work Description: alexvanboxel commented on pull request #10943: [BEAM-9360] Fix equivalence check for FieldType URL: https://github.com/apache/beam/pull/10943 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398018) Time Spent: 1h (was: 50m) > Schema FieldType should not consider metadata for equivalence > - > > Key: BEAM-9360 > URL: https://issues.apache.org/jira/browse/BEAM-9360 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Affects Versions: 2.19.0 >Reporter: Jozef Vilcek >Assignee: Jozef Vilcek >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > FieldType `equivalent()` check should not require exact match in fields > metadata. > Discussion in dev mailing list: > [https://lists.apache.org/list.html?d...@beam.apache.org:lte=1M:Schema%20Convert%20transform%20fails%20on%20type%20metadata] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8898) Enable WriteToBigQuery to perform range partitioning
[ https://issues.apache.org/jira/browse/BEAM-8898?focusedWorklogId=398016=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398016 ] ASF GitHub Bot logged work on BEAM-8898: Author: ASF GitHub Bot Created on: 05/Mar/20 00:20 Start Date: 05/Mar/20 00:20 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10319: [BEAM-8898] Upgrade Python google-cloud-bigquery dependency URL: https://github.com/apache/beam/pull/10319#issuecomment-594959312 this is an old PR that had postcommit errors, so I didn't move it forward. Someone else ended up upgrading the dependency in master, so this is no longer needed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398016) Time Spent: 2h 40m (was: 2.5h) > Enable WriteToBigQuery to perform range partitioning > > > Key: BEAM-8898 > URL: https://issues.apache.org/jira/browse/BEAM-8898 > Project: Beam > Issue Type: New Feature > Components: io-py-gcp >Reporter: Saman Vaisipour >Assignee: Pablo Estrada >Priority: Minor > Time Spent: 2h 40m > Remaining Estimate: 0h > > BigQuery team recently [released > 1.22.0|https://github.com/googleapis/google-cloud-python/releases/tag/bigquery-1.22.0] > which includes the range partitioning feature (here is [an > example|https://github.com/googleapis/google-cloud-python/blob/c4a69d44ccea9635b3d9d316b3f545f16538dafe/bigquery/samples/create_table_range_partitioned.py]). > > WriteToBigQuery uses > [`additional_bq_parameters`|https://github.com/apache/beam/blob/c1719476b74ec6f68fabea392087607adafc70ef/sdks/python/apache_beam/io/gcp/bigquery.py#L177] > to create tables with date partitioning and clustering. It would be great if > the same would be possible to create tables with range partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398014=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398014 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:13 Start Date: 05/Mar/20 00:13 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11005: [BEAM-8335] Modify the StreamingCache to subclass the CacheManager URL: https://github.com/apache/beam/pull/11005#discussion_r388009916 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -19,15 +19,298 @@ from __future__ import absolute_import +import itertools +import os +import shutil +import tempfile +import time + +import apache_beam as beam +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileHeader +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.runners.interactive.cache_manager import CacheManager +from apache_beam.runners.interactive.cache_manager import SafeFastPrimitivesCoder +from apache_beam.testing.test_stream import ReverseTestStream from apache_beam.utils import timestamp -class StreamingCache(object): +class StreamingCacheSink(beam.PTransform): + """A PTransform that writes TestStreamFile(Header|Records)s to file. + + This transform takes in an arbitrary element stream and writes the best-effort + list of TestStream events (as TestStreamFileRecords) to file. + + Note that this PTransform is assumed to be only run on a single machine where + the following assumptions are correct: elements come in ordered, no two + transforms are writing to the same file. This PTransform is assumed to only + run correctly with the DirectRunner. + """ + def __init__( + self, + cache_dir, + filename, + sample_resolution_sec, + coder=SafeFastPrimitivesCoder()): +self._cache_dir = cache_dir +self._filename = filename +self._sample_resolution_sec = sample_resolution_sec +self._coder = coder +self._path = os.path.join(self._cache_dir, self._filename) + + @property + def path(self): +"""Returns the path the sink leads to.""" +return self._path + + def expand(self, pcoll): +class StreamingWriteToText(beam.DoFn): + """DoFn that performs the writing. + + Note that the other file writing methods cannot be used in streaming + contexts. + """ + def __init__(self, full_path, coder=SafeFastPrimitivesCoder()): +self._full_path = full_path +self._coder = coder + +# Try and make the given path. +os.makedirs(os.path.dirname(full_path), exist_ok=True) + + def start_bundle(self): +# Open the file for 'append-mode' and writing 'bytes'. +self._fh = open(self._full_path, 'ab') + + def finish_bundle(self): +self._fh.close() + + def process(self, e): +"""Appends the given element to the file. +""" +self._fh.write(self._coder.encode(e)) +self._fh.write(b'\n') + +return ( +pcoll +| ReverseTestStream( +output_tag=self._filename, +sample_resolution_sec=self._sample_resolution_sec, +output_format=ReverseTestStream.Format. +SERIALIZED_TEST_STREAM_FILE_RECORDS, +coder=self._coder) +| beam.ParDo( +StreamingWriteToText(full_path=self._path, coder=self._coder))) + + +class StreamingCacheSource: + """A class that reads and parses TestStreamFile(Header|Reader)s. + + This source operates in the following way: + +1. Wait for up to `timeout_secs` for the file to be available. +2. Read, parse, and emit the entire contents of the file +3. Wait for more events to come or until `is_cache_complete` returns True +4. If there are more events, then go to 2 +5. Otherwise, stop emitting. + + This class is used to read from file and send its to the TestStream via the + StreamingCacheManager.Reader. + """ + def __init__( + self, + cache_dir, + labels, + is_cache_complete=None, + coder=SafeFastPrimitivesCoder()): +self._cache_dir = cache_dir +self._coder = coder +self._labels = labels +self._is_cache_complete = ( +is_cache_complete if is_cache_complete else lambda: True) + + def _wait_until_file_exists(self, timeout_secs=30): +"""Blocks until the file exists for a maximum of timeout_secs. +""" +f = None +now_secs = time.time() +timeout_timestamp_secs = now_secs + timeout_secs + +# Wait for up to `timeout_secs` for the file to be available. +while f is None and now_secs < timeout_timestamp_secs: + now_secs = time.time() + try: +path =
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398015=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398015 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:13 Start Date: 05/Mar/20 00:13 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11005: [BEAM-8335] Modify the StreamingCache to subclass the CacheManager URL: https://github.com/apache/beam/pull/11005#discussion_r388010028 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -19,15 +19,298 @@ from __future__ import absolute_import +import itertools +import os +import shutil +import tempfile +import time + +import apache_beam as beam +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileHeader +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.runners.interactive.cache_manager import CacheManager +from apache_beam.runners.interactive.cache_manager import SafeFastPrimitivesCoder +from apache_beam.testing.test_stream import ReverseTestStream from apache_beam.utils import timestamp -class StreamingCache(object): +class StreamingCacheSink(beam.PTransform): + """A PTransform that writes TestStreamFile(Header|Records)s to file. + + This transform takes in an arbitrary element stream and writes the best-effort + list of TestStream events (as TestStreamFileRecords) to file. + + Note that this PTransform is assumed to be only run on a single machine where + the following assumptions are correct: elements come in ordered, no two + transforms are writing to the same file. This PTransform is assumed to only + run correctly with the DirectRunner. + """ + def __init__( + self, + cache_dir, + filename, + sample_resolution_sec, + coder=SafeFastPrimitivesCoder()): +self._cache_dir = cache_dir +self._filename = filename +self._sample_resolution_sec = sample_resolution_sec +self._coder = coder +self._path = os.path.join(self._cache_dir, self._filename) + + @property + def path(self): +"""Returns the path the sink leads to.""" +return self._path + + def expand(self, pcoll): +class StreamingWriteToText(beam.DoFn): + """DoFn that performs the writing. + + Note that the other file writing methods cannot be used in streaming + contexts. + """ + def __init__(self, full_path, coder=SafeFastPrimitivesCoder()): +self._full_path = full_path +self._coder = coder + +# Try and make the given path. +os.makedirs(os.path.dirname(full_path), exist_ok=True) + + def start_bundle(self): +# Open the file for 'append-mode' and writing 'bytes'. +self._fh = open(self._full_path, 'ab') + + def finish_bundle(self): +self._fh.close() + + def process(self, e): +"""Appends the given element to the file. +""" +self._fh.write(self._coder.encode(e)) +self._fh.write(b'\n') + +return ( +pcoll +| ReverseTestStream( +output_tag=self._filename, +sample_resolution_sec=self._sample_resolution_sec, +output_format=ReverseTestStream.Format. +SERIALIZED_TEST_STREAM_FILE_RECORDS, +coder=self._coder) +| beam.ParDo( +StreamingWriteToText(full_path=self._path, coder=self._coder))) + + +class StreamingCacheSource: + """A class that reads and parses TestStreamFile(Header|Reader)s. + + This source operates in the following way: + +1. Wait for up to `timeout_secs` for the file to be available. +2. Read, parse, and emit the entire contents of the file +3. Wait for more events to come or until `is_cache_complete` returns True +4. If there are more events, then go to 2 +5. Otherwise, stop emitting. + + This class is used to read from file and send its to the TestStream via the + StreamingCacheManager.Reader. + """ + def __init__( + self, + cache_dir, + labels, + is_cache_complete=None, + coder=SafeFastPrimitivesCoder()): +self._cache_dir = cache_dir +self._coder = coder +self._labels = labels +self._is_cache_complete = ( +is_cache_complete if is_cache_complete else lambda: True) + + def _wait_until_file_exists(self, timeout_secs=30): +"""Blocks until the file exists for a maximum of timeout_secs. +""" +f = None +now_secs = time.time() +timeout_timestamp_secs = now_secs + timeout_secs + +# Wait for up to `timeout_secs` for the file to be available. +while f is None and now_secs < timeout_timestamp_secs: + now_secs = time.time() + try: +path =
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398013 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:12 Start Date: 05/Mar/20 00:12 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11005: [BEAM-8335] Modify the StreamingCache to subclass the CacheManager URL: https://github.com/apache/beam/pull/11005#discussion_r388009717 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -19,15 +19,298 @@ from __future__ import absolute_import +import itertools +import os +import shutil +import tempfile +import time + +import apache_beam as beam +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileHeader +from apache_beam.portability.api.beam_interactive_api_pb2 import TestStreamFileRecord from apache_beam.portability.api.beam_runner_api_pb2 import TestStreamPayload +from apache_beam.runners.interactive.cache_manager import CacheManager +from apache_beam.runners.interactive.cache_manager import SafeFastPrimitivesCoder +from apache_beam.testing.test_stream import ReverseTestStream from apache_beam.utils import timestamp -class StreamingCache(object): +class StreamingCacheSink(beam.PTransform): + """A PTransform that writes TestStreamFile(Header|Records)s to file. + + This transform takes in an arbitrary element stream and writes the best-effort Review comment: Clarified in the comment that it is best-effort replay not best-effort writing. The original comment was confusing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398013) Time Spent: 92.5h (was: 92h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 92.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398012=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398012 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:12 Start Date: 05/Mar/20 00:12 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11005: [BEAM-8335] Modify the StreamingCache to subclass the CacheManager URL: https://github.com/apache/beam/pull/11005#discussion_r388009673 ## File path: sdks/python/apache_beam/runners/interactive/cache_manager.py ## @@ -167,20 +177,35 @@ def load_pcoder(self, *labels): self._saved_pcoders[self._path(*labels)]) def read(self, *labels): +# Return an iterator to an empty list if it doesn't exist. if not self.exists(*labels): - return [], -1 + return [].__iter__(), -1 -source = self.source(*labels) +# Otherwise, return a generator to the cached PCollection. +source = self._source(*labels) range_tracker = source.get_range_tracker(None, None) -result = list(source.read(range_tracker)) +reader = source.read(range_tracker) version = self._latest_version(*labels) -return result, version +return reader, version + + def write(self, values, *labels): +sink = self._sink(*labels) +path = self._path(*labels) +with open(path, 'wb') as f: + for v in values: +sink.write_record(f, v) Review comment: Cool, thank you. Changed to the above code. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398012) Time Spent: 92h 20m (was: 92h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 92h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields
[ https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=398010=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398010 ] ASF GitHub Bot logged work on BEAM-9035: Author: ASF GitHub Bot Created on: 05/Mar/20 00:11 Start Date: 05/Mar/20 00:11 Worklog Time Spent: 10m Work Description: alexvanboxel commented on pull request #10413: [BEAM-9035] Typed options for Row Schema and Field URL: https://github.com/apache/beam/pull/10413#discussion_r388009528 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java ## @@ -953,6 +1012,338 @@ public int hashCode() { } } + public static class Options implements Serializable { +private Map options; + +@Override +public String toString() { + TreeMap sorted = new TreeMap(options); + return "{" + sorted + '}'; +} + +Map getAllOptions() { + return options; +} + +public Set getOptionNames() { + return options.keySet(); +} + +public boolean hasOptions() { + return options.size() > 0; +} + +@Override +public boolean equals(Object o) { + if (this == o) { +return true; + } + if (o == null || getClass() != o.getClass()) { +return false; + } + Options options1 = (Options) o; + if (!options.keySet().equals(options1.options.keySet())) { +return false; + } + for (Map.Entry optionEntry : options.entrySet()) { +Option thisOption = optionEntry.getValue(); +Option otherOption = options1.options.get(optionEntry.getKey()); +if (!thisOption.getType().equals(otherOption.getType())) { + return false; +} +switch (thisOption.getType().getTypeName()) { + case BYTE: + case INT16: + case INT32: + case INT64: + case DECIMAL: + case FLOAT: + case DOUBLE: + case STRING: + case DATETIME: + case BOOLEAN: + case ARRAY: + case ITERABLE: + case MAP: + case ROW: + case LOGICAL_TYPE: +if (!thisOption.getValue().equals(otherOption.getValue())) { + return false; +} +break; + case BYTES: +if (!Arrays.equals((byte[]) thisOption.getValue(), otherOption.getValue())) { + return false; +} +} + } + return true; +} + +@Override +public int hashCode() { + return Objects.hash(options); +} + +static class Option implements Serializable { + Option(FieldType type, Object value) { +this.type = type; +this.value = value; + } + + private FieldType type; + private Object value; + + @SuppressWarnings("TypeParameterUnusedInFormals") + T getValue() { +return (T) value; + } + + FieldType getType() { +return type; + } + + @Override + public String toString() { +return "Option{type=" + type + ", value=" + value + '}'; + } + + @Override + public boolean equals(Object o) { +if (this == o) { + return true; +} +if (o == null || getClass() != o.getClass()) { + return false; +} +Option option = (Option) o; +return Objects.equals(type, option.type) && Objects.equals(value, option.value); + } + + @Override + public int hashCode() { +return Objects.hash(type, value); + } +} + +public static class Builder { + private Map options; + + Builder(Map init) { +this.options = new HashMap<>(init); + } + + Builder() { +this(new HashMap<>()); + } + + public Builder setByteOption(String optionName, Byte value) { +setOption(optionName, FieldType.BYTE, value); +return this; + } + + public Builder setBytesOption(String optionName, byte[] value) { +setOption(optionName, FieldType.BYTES, value); +return this; + } + + public Builder setInt16Option(String optionName, Short value) { +setOption(optionName, FieldType.INT16, value); +return this; + } + + public Builder setInt32Option(String optionName, Integer value) { +setOption(optionName, FieldType.INT32, value); +return this; + } + + public Builder setInt64Option(String optionName, Long value) { +setOption(optionName, FieldType.INT64, value); +return this; + } + + public Builder setDecimalOption(String optionName, BigDecimal value) { +setOption(optionName, FieldType.DECIMAL, value); +return this; + } + + public Builder setFloatOption(String optionName, Float value) { +setOption(optionName, FieldType.FLOAT, value); +return this; + } + +
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=398011=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398011 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 05/Mar/20 00:11 Start Date: 05/Mar/20 00:11 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11005: [BEAM-8335] Modify the StreamingCache to subclass the CacheManager URL: https://github.com/apache/beam/pull/11005#discussion_r388009579 ## File path: sdks/python/apache_beam/runners/interactive/cache_manager.py ## @@ -167,20 +177,35 @@ def load_pcoder(self, *labels): self._saved_pcoders[self._path(*labels)]) def read(self, *labels): +# Return an iterator to an empty list if it doesn't exist. if not self.exists(*labels): - return [], -1 + return [].__iter__(), -1 Review comment: Gotcha, didn't know about this. Changed to `iter([])` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398011) Time Spent: 92h 10m (was: 92h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 92h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields
[ https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=398007=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398007 ] ASF GitHub Bot logged work on BEAM-9035: Author: ASF GitHub Bot Created on: 05/Mar/20 00:10 Start Date: 05/Mar/20 00:10 Worklog Time Spent: 10m Work Description: alexvanboxel commented on pull request #10413: [BEAM-9035] Typed options for Row Schema and Field URL: https://github.com/apache/beam/pull/10413#discussion_r388008991 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java ## @@ -953,6 +1012,338 @@ public int hashCode() { } } + public static class Options implements Serializable { +private Map options; + +@Override +public String toString() { + TreeMap sorted = new TreeMap(options); + return "{" + sorted + '}'; +} + +Map getAllOptions() { + return options; +} + +public Set getOptionNames() { + return options.keySet(); +} + +public boolean hasOptions() { + return options.size() > 0; +} + +@Override +public boolean equals(Object o) { + if (this == o) { +return true; + } + if (o == null || getClass() != o.getClass()) { +return false; + } + Options options1 = (Options) o; + if (!options.keySet().equals(options1.options.keySet())) { +return false; + } + for (Map.Entry optionEntry : options.entrySet()) { +Option thisOption = optionEntry.getValue(); +Option otherOption = options1.options.get(optionEntry.getKey()); +if (!thisOption.getType().equals(otherOption.getType())) { + return false; +} +switch (thisOption.getType().getTypeName()) { + case BYTE: + case INT16: + case INT32: + case INT64: + case DECIMAL: + case FLOAT: + case DOUBLE: + case STRING: + case DATETIME: + case BOOLEAN: + case ARRAY: + case ITERABLE: + case MAP: + case ROW: + case LOGICAL_TYPE: +if (!thisOption.getValue().equals(otherOption.getValue())) { + return false; +} +break; + case BYTES: +if (!Arrays.equals((byte[]) thisOption.getValue(), otherOption.getValue())) { + return false; +} +} + } + return true; +} + +@Override +public int hashCode() { + return Objects.hash(options); +} + +static class Option implements Serializable { + Option(FieldType type, Object value) { +this.type = type; +this.value = value; + } + + private FieldType type; + private Object value; + + @SuppressWarnings("TypeParameterUnusedInFormals") + T getValue() { +return (T) value; + } + + FieldType getType() { +return type; + } + + @Override + public String toString() { +return "Option{type=" + type + ", value=" + value + '}'; + } + + @Override + public boolean equals(Object o) { +if (this == o) { + return true; +} +if (o == null || getClass() != o.getClass()) { + return false; +} +Option option = (Option) o; +return Objects.equals(type, option.type) && Objects.equals(value, option.value); + } + + @Override + public int hashCode() { +return Objects.hash(type, value); + } +} + +public static class Builder { + private Map options; + + Builder(Map init) { +this.options = new HashMap<>(init); + } + + Builder() { +this(new HashMap<>()); + } + + public Builder setByteOption(String optionName, Byte value) { Review comment: Thru, I can remove this. Most people will just use the getters anyway. The setters will be used by proto/avro/ etc implementations. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398007) Time Spent: 5h 20m (was: 5h 10m) > BIP-1: Typed options for Row Schema and Fields > -- > > Key: BEAM-9035 > URL: https://issues.apache.org/jira/browse/BEAM-9035 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Major > Fix For: 2.19.0 > > Time Spent: 5h 20m >
[jira] [Work logged] (BEAM-9250) Improve beam release script based on 2.19.0 release experience
[ https://issues.apache.org/jira/browse/BEAM-9250?focusedWorklogId=398005=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-398005 ] ASF GitHub Bot logged work on BEAM-9250: Author: ASF GitHub Bot Created on: 05/Mar/20 00:08 Start Date: 05/Mar/20 00:08 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #10782: [BEAM-9250] Update verify_release_build script to run python tests with dev version. URL: https://github.com/apache/beam/pull/10782 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 398005) Time Spent: 2h 10m (was: 2h) > Improve beam release script based on 2.19.0 release experience > -- > > Key: BEAM-9250 > URL: https://issues.apache.org/jira/browse/BEAM-9250 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Fix For: Not applicable > > Time Spent: 2h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields
[ https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=397999=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397999 ] ASF GitHub Bot logged work on BEAM-9035: Author: ASF GitHub Bot Created on: 05/Mar/20 00:05 Start Date: 05/Mar/20 00:05 Worklog Time Spent: 10m Work Description: alexvanboxel commented on pull request #10413: [BEAM-9035] Typed options for Row Schema and Field URL: https://github.com/apache/beam/pull/10413#discussion_r388007611 ## File path: sdks/java/core/src/main/java/org/apache/beam/sdk/values/SchemaVerification.java ## @@ -0,0 +1,255 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ Review comment: Because validation is used by both setOption as the Row builder. It's the same validation. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397999) Time Spent: 5h 10m (was: 5h) > BIP-1: Typed options for Row Schema and Fields > -- > > Key: BEAM-9035 > URL: https://issues.apache.org/jira/browse/BEAM-9035 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Major > Fix For: 2.19.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > This is the first issue of a multipart commit: this ticket implements the > basic infrastructure of options on row and field. > Full explanation: > Introduce the concept of Options in Beam Schema’s to add extra context to > fields and schema. In contracts to metadata, options would be added to > fields, logical types and rows. In the options schema convertors can add > options/annotations/decorators that were in the original schema, this context > can be used in the rest of the pipeline for specific transformations or > augment the end schema in the target output. > Examples of options are: > * informational: like the source of the data, ... > * drive decisions further in the pipeline: flatten a row into another, > rename a field, ... > * influence something in the output: like cluster index, primary key, ... > * logical type information > And option is a key/typed value combination. The advantages of having the > value types is: > * Having strongly typed options would give a *portable way of Logical Types* > to have structured information that could be shared over different languages. > * This could keep the type intact when mapping from a formats that have > strongly typed options (example: Protobuf). > This is part of a multi ticket implementation. The following tickets are > related: > # Typed options for Row Schema and Fields > # Convert Proto Options to Beam Schema options > # Convert Avro extra information for Beam string options > # Replace meta data with Logical Type options > # Extract meta data in Calcite SQL to Beam options > # Extract meta data in Zeta SQL to Beam options > # Add java example of using option in a transform > This feature is discussed with Reuven Lax, Brian Hulette -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9035) BIP-1: Typed options for Row Schema and Fields
[ https://issues.apache.org/jira/browse/BEAM-9035?focusedWorklogId=397998=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397998 ] ASF GitHub Bot logged work on BEAM-9035: Author: ASF GitHub Bot Created on: 05/Mar/20 00:04 Start Date: 05/Mar/20 00:04 Worklog Time Spent: 10m Work Description: alexvanboxel commented on pull request #10413: [BEAM-9035] Typed options for Row Schema and Field URL: https://github.com/apache/beam/pull/10413#discussion_r388007440 ## File path: model/pipeline/src/main/proto/schema.proto ## @@ -31,6 +31,7 @@ option java_outer_classname = "SchemaApi"; message Schema { repeated Field fields = 1; string id = 2; + map options = 3; Review comment: Looking back to the history I see that it was a repeated option. I don't have a preference. I can set it back to repeated option. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397998) Time Spent: 5h (was: 4h 50m) > BIP-1: Typed options for Row Schema and Fields > -- > > Key: BEAM-9035 > URL: https://issues.apache.org/jira/browse/BEAM-9035 > Project: Beam > Issue Type: Sub-task > Components: sdk-java-core >Reporter: Alex Van Boxel >Assignee: Alex Van Boxel >Priority: Major > Fix For: 2.19.0 > > Time Spent: 5h > Remaining Estimate: 0h > > This is the first issue of a multipart commit: this ticket implements the > basic infrastructure of options on row and field. > Full explanation: > Introduce the concept of Options in Beam Schema’s to add extra context to > fields and schema. In contracts to metadata, options would be added to > fields, logical types and rows. In the options schema convertors can add > options/annotations/decorators that were in the original schema, this context > can be used in the rest of the pipeline for specific transformations or > augment the end schema in the target output. > Examples of options are: > * informational: like the source of the data, ... > * drive decisions further in the pipeline: flatten a row into another, > rename a field, ... > * influence something in the output: like cluster index, primary key, ... > * logical type information > And option is a key/typed value combination. The advantages of having the > value types is: > * Having strongly typed options would give a *portable way of Logical Types* > to have structured information that could be shared over different languages. > * This could keep the type intact when mapping from a formats that have > strongly typed options (example: Protobuf). > This is part of a multi ticket implementation. The following tickets are > related: > # Typed options for Row Schema and Fields > # Convert Proto Options to Beam Schema options > # Convert Avro extra information for Beam string options > # Replace meta data with Logical Type options > # Extract meta data in Calcite SQL to Beam options > # Extract meta data in Zeta SQL to Beam options > # Add java example of using option in a transform > This feature is discussed with Reuven Lax, Brian Hulette -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397995=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397995 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 23:59 Start Date: 04/Mar/20 23:59 Worklog Time Spent: 10m Work Description: pabloem commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-594948530 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397995) Time Spent: 92h (was: 91h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 92h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9418) Support ANY_VALUE aggregation functions
[ https://issues.apache.org/jira/browse/BEAM-9418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated BEAM-9418: --- Status: Open (was: Triage Needed) > Support ANY_VALUE aggregation functions > --- > > Key: BEAM-9418 > URL: https://issues.apache.org/jira/browse/BEAM-9418 > Project: Beam > Issue Type: Task > Components: dsl-sql >Reporter: Rui Wang >Priority: Major > > Support the following functionality in BeamSQL: > {code:java} > "select t.key, ANY_VALUE(t.column) from t group by t.key"; > {code} > Spec link: > https://cloud.google.com/bigquery/docs/reference/standard-sql/aggregate_functions#any_value -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9288) Conscrypt shaded dependency
[ https://issues.apache.org/jira/browse/BEAM-9288?focusedWorklogId=397994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397994 ] ASF GitHub Bot logged work on BEAM-9288: Author: ASF GitHub Bot Created on: 04/Mar/20 23:55 Start Date: 04/Mar/20 23:55 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11049: [BEAM-9288] Not bundle conscrypt in gRPC vendor in META-INF URL: https://github.com/apache/beam/pull/11049#issuecomment-594946667 Run CommunityMetrics PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397994) Time Spent: 5.5h (was: 5h 20m) > Conscrypt shaded dependency > --- > > Key: BEAM-9288 > URL: https://issues.apache.org/jira/browse/BEAM-9288 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Esun Kim >Assignee: sunjincheng >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > Conscrypt is not designed to be shaded properly mainly because of so files. I > happened to see BEAM-9030 (*1) creating a new vendored gRPC shading Conscrypt > (*2) in it. I think this could make a problem when new Conscrypt is brought > by new gcsio depending on gRPC-alts (*4) in a dependency chain. (*5) In this > case, it may have a conflict when finding proper so files for Conscrypt. > *1: https://issues.apache.org/jira/browse/BEAM-9030 > *2: > [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78] > *3: https://issues.apache.org/jira/browse/BEAM-6136 > *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0] > *5: https://issues.apache.org/jira/browse/BEAM-8889 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9288) Conscrypt shaded dependency
[ https://issues.apache.org/jira/browse/BEAM-9288?focusedWorklogId=397980=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397980 ] ASF GitHub Bot logged work on BEAM-9288: Author: ASF GitHub Bot Created on: 04/Mar/20 23:40 Start Date: 04/Mar/20 23:40 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11049: [BEAM-9288] Not bundle conscrypt in gRPC vendor in META-INF URL: https://github.com/apache/beam/pull/11049#issuecomment-594938392 This PR further excludes conscrypt from vendored grpc jar, which is a subsequent work of https://github.com/apache/beam/pull/10940 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397980) Time Spent: 5h 20m (was: 5h 10m) > Conscrypt shaded dependency > --- > > Key: BEAM-9288 > URL: https://issues.apache.org/jira/browse/BEAM-9288 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Esun Kim >Assignee: sunjincheng >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Conscrypt is not designed to be shaded properly mainly because of so files. I > happened to see BEAM-9030 (*1) creating a new vendored gRPC shading Conscrypt > (*2) in it. I think this could make a problem when new Conscrypt is brought > by new gcsio depending on gRPC-alts (*4) in a dependency chain. (*5) In this > case, it may have a conflict when finding proper so files for Conscrypt. > *1: https://issues.apache.org/jira/browse/BEAM-9030 > *2: > [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78] > *3: https://issues.apache.org/jira/browse/BEAM-6136 > *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0] > *5: https://issues.apache.org/jira/browse/BEAM-8889 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9317) PostCommit PVR failures
[ https://issues.apache.org/jira/browse/BEAM-9317?focusedWorklogId=397979=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397979 ] ASF GitHub Bot logged work on BEAM-9317: Author: ASF GitHub Bot Created on: 04/Mar/20 23:38 Start Date: 04/Mar/20 23:38 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #10868: [BEAM-9317] Fix portable Dataflow tests to not perform SplittableDoFn expansion. URL: https://github.com/apache/beam/pull/10868#issuecomment-594937282 `"BigQuery source must be split before being read"` appears in release branch. We might can create a Jira to block 2.20.0 release if there isn't one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397979) Time Spent: 3h 40m (was: 3.5h) > PostCommit PVR failures > --- > > Key: BEAM-9317 > URL: https://issues.apache.org/jira/browse/BEAM-9317 > Project: Beam > Issue Type: Bug > Components: test-failures >Reporter: Brian Hulette >Assignee: Luke Cwik >Priority: Major > Fix For: 2.20.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/4104/ > https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/2088/ > https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/4092/ > Seems to have started with https://github.com/apache/beam/pull/10576 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9383) Staging Dataflow artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9383?focusedWorklogId=397968=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397968 ] ASF GitHub Bot logged work on BEAM-9383: Author: ASF GitHub Bot Created on: 04/Mar/20 23:30 Start Date: 04/Mar/20 23:30 Worklog Time Spent: 10m Work Description: ihji commented on issue #11039: [BEAM-9383] Staging Dataflow artifacts from environment URL: https://github.com/apache/beam/pull/11039#issuecomment-594932376 This PR depends on #10621. needs to be rebased before merging. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397968) Time Spent: 20m (was: 10m) > Staging Dataflow artifacts from environment > --- > > Key: BEAM-9383 > URL: https://issues.apache.org/jira/browse/BEAM-9383 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Staging Dataflow artifacts from environment -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9056) Staging artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=397967=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397967 ] ASF GitHub Bot logged work on BEAM-9056: Author: ASF GitHub Bot Created on: 04/Mar/20 23:29 Start Date: 04/Mar/20 23:29 Worklog Time Spent: 10m Work Description: ihji commented on issue #10621: [BEAM-9056] Staging artifacts from environment URL: https://github.com/apache/beam/pull/10621#issuecomment-594931864 PR for dataflow runner: #11039 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397967) Time Spent: 5h 20m (was: 5h 10m) > Staging artifacts from environment > -- > > Key: BEAM-9056 > URL: https://issues.apache.org/jira/browse/BEAM-9056 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > staging artifacts from artifact information embedded in environment proto. > detail: > https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-3925) Support ValueProviders in KafkaIO
[ https://issues.apache.org/jira/browse/BEAM-3925?focusedWorklogId=397964=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397964 ] ASF GitHub Bot logged work on BEAM-3925: Author: ASF GitHub Bot Created on: 04/Mar/20 23:25 Start Date: 04/Mar/20 23:25 Worklog Time Spent: 10m Work Description: rangadi commented on issue #6636: [BEAM-3925] [DO NOT MERGE] KafkaIO : Value provider support for reader configuration. URL: https://github.com/apache/beam/pull/6636#issuecomment-594928908 It might be a bit more than than 'just' :) We can't shrink the splits since the compute graph is already constructed, I think. We would know only inside the reader during reader initialization what the total number of partitions we need to consumer. Each reader has to decide which partitions to consume based on reader id. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397964) Time Spent: 8h 10m (was: 8h) > Support ValueProviders in KafkaIO > - > > Key: BEAM-3925 > URL: https://issues.apache.org/jira/browse/BEAM-3925 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Sameer Abhyankar >Assignee: Pramod Upamanyu >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > > Add ValueProvider support for the various methods in KafkaIO. This would > allow us to use KafkaIO in reusable pipeline templates. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9304) beam-sdks-java-io-google-cloud-platform imports conflicting versions for BigTable and Spanner
[ https://issues.apache.org/jira/browse/BEAM-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051684#comment-17051684 ] Chamikara Madhusanka Jayalath commented on BEAM-9304: - Yeah, seems like this has already been addressed based on Tomo's comment above. > beam-sdks-java-io-google-cloud-platform imports conflicting versions for > BigTable and Spanner > - > > Key: BEAM-9304 > URL: https://issues.apache.org/jira/browse/BEAM-9304 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.18.0 >Reporter: Knut Olav Loite >Assignee: Chamikara Madhusanka Jayalath >Priority: Critical > Fix For: 2.20.0 > > Attachments: SpannerRead.java, pom.xml > > > If I include `beam-sdks-java-io-google-cloud-platform` version 2.18.0 in a > project and try to use `SpannerIO`, the exception > `java.lang.NoClassDefFoundError: io/opencensus/trace/Tracestate`. This seems > to be caused by conflicting versions of `io.opencensus:opencensus-api` being > included by the BigTable client and the Spanner client. BigTable imports > version 0.15.0. Spanner depends on 0.18.0, but as they are at the same level > in the dependency tree and BigTable is defined first, version 0.15.0 is used. > > The workaround for this issue is to exclude the BigTable client in the > project pom in order to be able to use SpannerIO. > > An example pom and simple Java class are included. If the commented exclusion > of the BigTable client is removed, the example will run without problems. The > example will also run without problems on Beam version 2.17 without the > exclusion. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9445) Preoptimize causes error with environment setup
Kyle Weaver created BEAM-9445: - Summary: Preoptimize causes error with environment setup Key: BEAM-9445 URL: https://issues.apache.org/jira/browse/BEAM-9445 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Kyle Weaver Setting --experiments=pre_optimize=all causes an error with StatisticsGen in the TFX taxi example pipeline [1]: File "[redacted]/apache_beam/runners/portability/fn_api_runner_transforms.py", line 250, in executable_stage_transform environment=components.environments[self.environment], TypeError: None has type NoneType, but expected one of: bytes, unicode [while running 'Run[StatisticsGen]'] cc [~angoenka] [1] https://github.com/tensorflow/tfx/blob/ff314a6803675548c89a016a5110a91e5bf98024/tfx/examples/chicago_taxi_pipeline/taxi_pipeline_portable_beam.py#L155 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9304) beam-sdks-java-io-google-cloud-platform imports conflicting versions for BigTable and Spanner
[ https://issues.apache.org/jira/browse/BEAM-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051682#comment-17051682 ] Rui Wang commented on BEAM-9304: Ok guys. If you agree, I will not make this Jira block 2.20.0 RC0 cut. Please let me know if you don't agree. > beam-sdks-java-io-google-cloud-platform imports conflicting versions for > BigTable and Spanner > - > > Key: BEAM-9304 > URL: https://issues.apache.org/jira/browse/BEAM-9304 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.18.0 >Reporter: Knut Olav Loite >Assignee: Chamikara Madhusanka Jayalath >Priority: Critical > Fix For: 2.20.0 > > Attachments: SpannerRead.java, pom.xml > > > If I include `beam-sdks-java-io-google-cloud-platform` version 2.18.0 in a > project and try to use `SpannerIO`, the exception > `java.lang.NoClassDefFoundError: io/opencensus/trace/Tracestate`. This seems > to be caused by conflicting versions of `io.opencensus:opencensus-api` being > included by the BigTable client and the Spanner client. BigTable imports > version 0.15.0. Spanner depends on 0.18.0, but as they are at the same level > in the dependency tree and BigTable is defined first, version 0.15.0 is used. > > The workaround for this issue is to exclude the BigTable client in the > project pom in order to be able to use SpannerIO. > > An example pom and simple Java class are included. If the commented exclusion > of the BigTable client is removed, the example will run without problems. The > example will also run without problems on Beam version 2.17 without the > exclusion. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397962=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397962 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 23:20 Start Date: 04/Mar/20 23:20 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#discussion_r387993084 ## File path: sdks/python/apache_beam/runners/direct/test_stream_impl.py ## @@ -226,17 +237,53 @@ def expand(self, pcoll): def _infer_output_coder(self, input_type=None, input_coder=None): return self.coder - def _events_from_script(self, index): -yield self._events[index] - - def events(self, index): -return self._events_from_script(index) - - def begin(self): -return 0 - - def end(self, index): -return index >= len(self._events) + @staticmethod + def events_from_script(events): +"""Yields the in-memory events. +""" +return itertools.chain(events) Review comment: I added an assert in the TestStream constructor so that this shouldn't happen at pipeline run-time but construction time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397962) Time Spent: 91h 50m (was: 91h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 91h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8898) Enable WriteToBigQuery to perform range partitioning
[ https://issues.apache.org/jira/browse/BEAM-8898?focusedWorklogId=397963=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397963 ] ASF GitHub Bot logged work on BEAM-8898: Author: ASF GitHub Bot Created on: 04/Mar/20 23:20 Start Date: 04/Mar/20 23:20 Worklog Time Spent: 10m Work Description: aaltay commented on issue #10319: [BEAM-8898] Upgrade Python google-cloud-bigquery dependency URL: https://github.com/apache/beam/pull/10319#issuecomment-594926293 Why we did not merge this? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397963) Time Spent: 2.5h (was: 2h 20m) > Enable WriteToBigQuery to perform range partitioning > > > Key: BEAM-8898 > URL: https://issues.apache.org/jira/browse/BEAM-8898 > Project: Beam > Issue Type: New Feature > Components: io-py-gcp >Reporter: Saman Vaisipour >Assignee: Pablo Estrada >Priority: Minor > Time Spent: 2.5h > Remaining Estimate: 0h > > BigQuery team recently [released > 1.22.0|https://github.com/googleapis/google-cloud-python/releases/tag/bigquery-1.22.0] > which includes the range partitioning feature (here is [an > example|https://github.com/googleapis/google-cloud-python/blob/c4a69d44ccea9635b3d9d316b3f545f16538dafe/bigquery/samples/create_table_range_partitioned.py]). > > WriteToBigQuery uses > [`additional_bq_parameters`|https://github.com/apache/beam/blob/c1719476b74ec6f68fabea392087607adafc70ef/sdks/python/apache_beam/io/gcp/bigquery.py#L177] > to create tables with date partitioning and clustering. It would be great if > the same would be possible to create tables with range partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397958 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 23:19 Start Date: 04/Mar/20 23:19 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#discussion_r387992663 ## File path: sdks/python/apache_beam/runners/direct/transform_evaluator.py ## @@ -527,9 +548,21 @@ def process_element(self, element): for event in self.test_stream._set_up(self.test_stream.output_tags): events.append(event) -events += [e for e in self.test_stream.events(self.current_index)] +# Index into the global state of all the different TestStream event streams. +# Retrieve this TestStream's event stream and read from it. +try: + events = [next(self.test_stream_events[self.event_index])] Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397958) Time Spent: 91h 20m (was: 91h 10m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 91h 20m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397960 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 23:19 Start Date: 04/Mar/20 23:19 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#discussion_r387992755 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -236,14 +235,19 @@ class TestStream(PTransform): """ def __init__( - self, coder=coders.FastPrimitivesCoder(), events=None, output_tags=None): + self, + coder=coders.FastPrimitivesCoder(), + events=None, + output_tags=None, + endpoint=None): Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397960) Time Spent: 91h 40m (was: 91.5h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 91h 40m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook
[ https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=397961=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397961 ] ASF GitHub Bot logged work on BEAM-9274: Author: ASF GitHub Bot Created on: 04/Mar/20 23:19 Start Date: 04/Mar/20 23:19 Worklog Time Spent: 10m Work Description: chadrik commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook URL: https://github.com/apache/beam/pull/10810#issuecomment-594925685 Sorry, I have two local git repos -- one for typing and one for other stuff -- and I guess I've been accidentally force-pushing multiple branches at once, and this one is out of date in one of the repos. Should be fixed on my end now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397961) Time Spent: 3h 40m (was: 3.5h) > Support running yapf in a git pre-commit hook > - > > Key: BEAM-9274 > URL: https://issues.apache.org/jira/browse/BEAM-9274 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > As a developer I want to be able to automatically run yapf before I make a > commit so that I don't waste time with failures on jenkins. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397959 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 23:19 Start Date: 04/Mar/20 23:19 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#discussion_r387992717 ## File path: sdks/python/apache_beam/testing/test_stream_service.py ## @@ -52,5 +56,9 @@ def stop(self): def Events(self, request, context): """Streams back all of the events from the streaming cache.""" -for e in self._events: +# TODO(srohde): Once we get rid of the CacheManager, get rid of this 'full' +# label. +reader = self._reader.read_multiple([['full', key] Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397959) Time Spent: 91.5h (was: 91h 20m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 91.5h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9288) Conscrypt shaded dependency
[ https://issues.apache.org/jira/browse/BEAM-9288?focusedWorklogId=397957=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397957 ] ASF GitHub Bot logged work on BEAM-9288: Author: ASF GitHub Bot Created on: 04/Mar/20 23:16 Start Date: 04/Mar/20 23:16 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #11049: [BEAM-9288] Not bundle conscrypt in gRPC vendor in META-INF/ URL: https://github.com/apache/beam/pull/11049#issuecomment-594923827 Tested by running ``` jar tvf vendor/grpc-1_26_0/build/libs/beam-vendor-grpc-1_26_0-0.3.jar | grep conscrypt ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397957) Time Spent: 5h 10m (was: 5h) > Conscrypt shaded dependency > --- > > Key: BEAM-9288 > URL: https://issues.apache.org/jira/browse/BEAM-9288 > Project: Beam > Issue Type: Bug > Components: build-system >Reporter: Esun Kim >Assignee: sunjincheng >Priority: Critical > Fix For: 2.20.0 > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Conscrypt is not designed to be shaded properly mainly because of so files. I > happened to see BEAM-9030 (*1) creating a new vendored gRPC shading Conscrypt > (*2) in it. I think this could make a problem when new Conscrypt is brought > by new gcsio depending on gRPC-alts (*4) in a dependency chain. (*5) In this > case, it may have a conflict when finding proper so files for Conscrypt. > *1: https://issues.apache.org/jira/browse/BEAM-9030 > *2: > [https://github.com/apache/beam/blob/e24d1e51cbabe27cb3cc381fd95b334db639c45d/buildSrc/src/main/groovy/org/apache/beam/gradle/GrpcVendoring_1_26_0.groovy#L78] > *3: https://issues.apache.org/jira/browse/BEAM-6136 > *4: [https://mvnrepository.com/artifact/io.grpc/grpc-alts/1.27.0] > *5: https://issues.apache.org/jira/browse/BEAM-8889 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9288) Conscrypt shaded dependency
[ https://issues.apache.org/jira/browse/BEAM-9288?focusedWorklogId=397956=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397956 ] ASF GitHub Bot logged work on BEAM-9288: Author: ASF GitHub Bot Created on: 04/Mar/20 23:16 Start Date: 04/Mar/20 23:16 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #11049: [BEAM-9288] Not bundle conscrypt in gRPC vendor in META-INF/ URL: https://github.com/apache/beam/pull/11049 R: @lukecwik @sunjincheng121 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-8898) Enable WriteToBigQuery to perform range partitioning
[ https://issues.apache.org/jira/browse/BEAM-8898?focusedWorklogId=397955=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397955 ] ASF GitHub Bot logged work on BEAM-8898: Author: ASF GitHub Bot Created on: 04/Mar/20 23:09 Start Date: 04/Mar/20 23:09 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10319: [BEAM-8898] Upgrade Python google-cloud-bigquery dependency URL: https://github.com/apache/beam/pull/10319 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397955) Time Spent: 2h 20m (was: 2h 10m) > Enable WriteToBigQuery to perform range partitioning > > > Key: BEAM-8898 > URL: https://issues.apache.org/jira/browse/BEAM-8898 > Project: Beam > Issue Type: New Feature > Components: io-py-gcp >Reporter: Saman Vaisipour >Assignee: Pablo Estrada >Priority: Minor > Time Spent: 2h 20m > Remaining Estimate: 0h > > BigQuery team recently [released > 1.22.0|https://github.com/googleapis/google-cloud-python/releases/tag/bigquery-1.22.0] > which includes the range partitioning feature (here is [an > example|https://github.com/googleapis/google-cloud-python/blob/c4a69d44ccea9635b3d9d316b3f545f16538dafe/bigquery/samples/create_table_range_partitioned.py]). > > WriteToBigQuery uses > [`additional_bq_parameters`|https://github.com/apache/beam/blob/c1719476b74ec6f68fabea392087607adafc70ef/sdks/python/apache_beam/io/gcp/bigquery.py#L177] > to create tables with date partitioning and clustering. It would be great if > the same would be possible to create tables with range partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9274) Support running yapf in a git pre-commit hook
[ https://issues.apache.org/jira/browse/BEAM-9274?focusedWorklogId=397954=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397954 ] ASF GitHub Bot logged work on BEAM-9274: Author: ASF GitHub Bot Created on: 04/Mar/20 23:07 Start Date: 04/Mar/20 23:07 Worklog Time Spent: 10m Work Description: udim commented on issue #10810: [BEAM-9274] Support running yapf in a git pre-commit hook URL: https://github.com/apache/beam/pull/10810#issuecomment-594919289 > Whoops. I think that was my mistake. Fixed. tox.ini reverted again This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397954) Time Spent: 3.5h (was: 3h 20m) > Support running yapf in a git pre-commit hook > - > > Key: BEAM-9274 > URL: https://issues.apache.org/jira/browse/BEAM-9274 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 3.5h > Remaining Estimate: 0h > > As a developer I want to be able to automatically run yapf before I make a > commit so that I don't waste time with failures on jenkins. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8898) Enable WriteToBigQuery to perform range partitioning
[ https://issues.apache.org/jira/browse/BEAM-8898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051675#comment-17051675 ] Pablo Estrada commented on BEAM-8898: - Nevermind. I see that Beam is already updated[1] to the latest BQ library version[2]. This should be working on the latest sink on master. [1] [https://github.com/apache/beam/blob/master/sdks/python/setup.py#L206] [2] [https://pypi.org/project/google-cloud-bigquery/] > Enable WriteToBigQuery to perform range partitioning > > > Key: BEAM-8898 > URL: https://issues.apache.org/jira/browse/BEAM-8898 > Project: Beam > Issue Type: New Feature > Components: io-py-gcp >Reporter: Saman Vaisipour >Assignee: Pablo Estrada >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > BigQuery team recently [released > 1.22.0|https://github.com/googleapis/google-cloud-python/releases/tag/bigquery-1.22.0] > which includes the range partitioning feature (here is [an > example|https://github.com/googleapis/google-cloud-python/blob/c4a69d44ccea9635b3d9d316b3f545f16538dafe/bigquery/samples/create_table_range_partitioned.py]). > > WriteToBigQuery uses > [`additional_bq_parameters`|https://github.com/apache/beam/blob/c1719476b74ec6f68fabea392087607adafc70ef/sdks/python/apache_beam/io/gcp/bigquery.py#L177] > to create tables with date partitioning and clustering. It would be great if > the same would be possible to create tables with range partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9433) Create an expansion service artifact for common IOs
[ https://issues.apache.org/jira/browse/BEAM-9433?focusedWorklogId=397952=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397952 ] ASF GitHub Bot logged work on BEAM-9433: Author: ASF GitHub Bot Created on: 04/Mar/20 23:00 Start Date: 04/Mar/20 23:00 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #11048: [BEAM-9433] Create expansion service artifact for common Java IOs. URL: https://github.com/apache/beam/pull/11048 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
[jira] [Work logged] (BEAM-7926) Show PCollection with Interactive Beam in a data-centric user flow
[ https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=397950=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397950 ] ASF GitHub Bot logged work on BEAM-7926: Author: ASF GitHub Bot Created on: 04/Mar/20 22:59 Start Date: 04/Mar/20 22:59 Worklog Time Spent: 10m Work Description: aaltay commented on issue #11020: [BEAM-7926] Update Data Visualization URL: https://github.com/apache/beam/pull/11020#issuecomment-594915596 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397950) Time Spent: 55h 10m (was: 55h) > Show PCollection with Interactive Beam in a data-centric user flow > -- > > Key: BEAM-7926 > URL: https://issues.apache.org/jira/browse/BEAM-7926 > Project: Beam > Issue Type: New Feature > Components: runner-py-interactive >Reporter: Ning Kang >Assignee: Ning Kang >Priority: Major > Time Spent: 55h 10m > Remaining Estimate: 0h > > Support auto plotting / charting of materialized data of a given PCollection > with Interactive Beam. > Say an Interactive Beam pipeline defined as > > {code:java} > p = beam.Pipeline(InteractiveRunner()) > pcoll = p | 'Transform' >> transform() > pcoll2 = ... > pcoll3 = ...{code} > The use can call a single function and get auto-magical charting of the data. > e.g., > {code:java} > show(pcoll, pcoll2) > {code} > Throughout the process, a pipeline fragment is built to include only > transforms necessary to produce the desired pcolls (pcoll and pcoll2) and > execute that fragment. > This makes the Interactive Beam user flow data-centric. > > Detailed > [design|https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit#heading=h.v6k2o3roarzz]. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9056) Staging artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=397949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397949 ] ASF GitHub Bot logged work on BEAM-9056: Author: ASF GitHub Bot Created on: 04/Mar/20 22:55 Start Date: 04/Mar/20 22:55 Worklog Time Spent: 10m Work Description: chamikaramj commented on issue #10621: [BEAM-9056] Staging artifacts from environment URL: https://github.com/apache/beam/pull/10621#issuecomment-594913505 Robert, do you have any additional comments for this ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397949) Time Spent: 5h 10m (was: 5h) > Staging artifacts from environment > -- > > Key: BEAM-9056 > URL: https://issues.apache.org/jira/browse/BEAM-9056 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > staging artifacts from artifact information embedded in environment proto. > detail: > https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397945=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397945 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 22:54 Start Date: 04/Mar/20 22:54 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#discussion_r387966654 ## File path: sdks/python/apache_beam/testing/test_stream_service.py ## @@ -52,5 +56,9 @@ def stop(self): def Events(self, request, context): """Streams back all of the events from the streaming cache.""" -for e in self._events: +# TODO(srohde): Once we get rid of the CacheManager, get rid of this 'full' +# label. +reader = self._reader.read_multiple([['full', key] Review comment: ```suggestion reader = self._reader.read_multiple([('full', key) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397945) Time Spent: 91h (was: 90h 50m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 91h > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397946=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397946 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 22:54 Start Date: 04/Mar/20 22:54 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#discussion_r387979245 ## File path: sdks/python/apache_beam/testing/test_stream_service.py ## @@ -52,5 +56,9 @@ def stop(self): def Events(self, request, context): """Streams back all of the events from the streaming cache.""" -for e in self._events: +# TODO(srohde): Once we get rid of the CacheManager, get rid of this 'full' +# label. +reader = self._reader.read_multiple([['full', key] Review comment: s/key/tag/ ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397946) Time Spent: 91h 10m (was: 91h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 91h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397947=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397947 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 22:54 Start Date: 04/Mar/20 22:54 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#discussion_r387967821 ## File path: sdks/python/apache_beam/testing/test_stream.py ## @@ -236,14 +235,19 @@ class TestStream(PTransform): """ def __init__( - self, coder=coders.FastPrimitivesCoder(), events=None, output_tags=None): + self, + coder=coders.FastPrimitivesCoder(), + events=None, + output_tags=None, + endpoint=None): Review comment: please document inputs This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397947) Time Spent: 91h 10m (was: 91h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 91h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397948=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397948 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 22:54 Start Date: 04/Mar/20 22:54 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#discussion_r387976936 ## File path: sdks/python/apache_beam/runners/direct/test_stream_impl.py ## @@ -226,17 +237,53 @@ def expand(self, pcoll): def _infer_output_coder(self, input_type=None, input_coder=None): return self.coder - def _events_from_script(self, index): -yield self._events[index] - - def events(self, index): -return self._events_from_script(index) - - def begin(self): -return 0 - - def end(self, index): -return index >= len(self._events) + @staticmethod + def events_from_script(events): +"""Yields the in-memory events. +""" +return itertools.chain(events) Review comment: Let's add an assert to make sure that this is only called when we have in-memory events vs rpc events (also in the method below). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397948) Time Spent: 91h 10m (was: 91h) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 91h 10m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8335) Add streaming support to Interactive Beam
[ https://issues.apache.org/jira/browse/BEAM-8335?focusedWorklogId=397943=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397943 ] ASF GitHub Bot logged work on BEAM-8335: Author: ASF GitHub Bot Created on: 04/Mar/20 22:53 Start Date: 04/Mar/20 22:53 Worklog Time Spent: 10m Work Description: rohdesamuel commented on issue #10994: [BEAM-8335] TeststreamService integration with DirectRunner URL: https://github.com/apache/beam/pull/10994#issuecomment-594912381 ugh: `ERROR: Could not install packages due to an EnvironmentError: [Errno 28] No space left on device` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397943) Time Spent: 90h 50m (was: 90h 40m) > Add streaming support to Interactive Beam > - > > Key: BEAM-8335 > URL: https://issues.apache.org/jira/browse/BEAM-8335 > Project: Beam > Issue Type: Improvement > Components: runner-py-interactive >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: Major > Time Spent: 90h 50m > Remaining Estimate: 0h > > This issue tracks the work items to introduce streaming support to the > Interactive Beam experience. This will allow users to: > * Write and run a streaming job in IPython > * Automatically cache records from unbounded sources > * Add a replay experience that replays all cached records to simulate the > original pipeline execution > * Add controls to play/pause/stop/step individual elements from the cached > records > * Add ability to inspect/visualize unbounded PCollections -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9056) Staging artifacts from environment
[ https://issues.apache.org/jira/browse/BEAM-9056?focusedWorklogId=397942=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-397942 ] ASF GitHub Bot logged work on BEAM-9056: Author: ASF GitHub Bot Created on: 04/Mar/20 22:51 Start Date: 04/Mar/20 22:51 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #10621: [BEAM-9056] Staging artifacts from environment URL: https://github.com/apache/beam/pull/10621#discussion_r387982425 ## File path: runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java ## @@ -261,14 +263,20 @@ public String registerCoder(Coder coder) throws IOException { * return the same unique ID. */ public String registerEnvironment(Environment env) { +String environmentId; String existing = environmentIds.get(env); if (existing != null) { - return existing; + environmentId = existing; +} else { + String name = uniqify(env.getUrn(), environmentIds.values()); + environmentIds.put(env, name); + componentsBuilder.putEnvironments(name, env); + environmentId = name; } -String name = uniqify(env.getUrn(), environmentIds.values()); -environmentIds.put(env, name); -componentsBuilder.putEnvironments(name, env); -return name; +if (defaultEnvironmentId == null) { Review comment: Why should this be a separate ticket ? Can't we just change method to registerEnvironment(Environment env, boolean default) and change above to sdkComponents.registerEnvironment( Environments.createOrGetDefaultEnvironment(portablePipelineOptions), true); This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 397942) Time Spent: 5h (was: 4h 50m) > Staging artifacts from environment > -- > > Key: BEAM-9056 > URL: https://issues.apache.org/jira/browse/BEAM-9056 > Project: Beam > Issue Type: Sub-task > Components: java-fn-execution >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > staging artifacts from artifact information embedded in environment proto. > detail: > https://docs.google.com/document/d/1L7MJcfyy9mg2Ahfw5XPhUeBe-dyvAPMOYOiFA1-kAog -- This message was sent by Atlassian Jira (v8.3.4#803005)