[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing
[ https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438563&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438563 ]

ASF GitHub Bot logged work on BEAM-9869:
----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 04:32
Start Date: 29/May/20 04:32
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635752434

Run Python2_PVR_Flink PreCommit

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

Worklog Id: (was: 438563)
Time Spent: 1h 20m (was: 1h 10m)

> adding self-contained Kafka service jar for testing
> ---------------------------------------------------
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438564 ]

ASF GitHub Bot logged work on BEAM-10125:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 04:32
Start Date: 29/May/20 04:32
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635752517

Retest this please

Issue Time Tracking
-------------------

Worklog Id: (was: 438564)
Time Spent: 2h 20m (was: 2h 10m)

> adding cross-language KafkaIO integration test
> ----------------------------------------------
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
> Issue Type: Improvement
> Components: io-java-kafka
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test
[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing
[ https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438562&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438562 ]

ASF GitHub Bot logged work on BEAM-9869:
----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 04:31
Start Date: 29/May/20 04:31
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635752368

Run Java PreCommit

Issue Time Tracking
-------------------

Worklog Id: (was: 438562)
Time Spent: 1h 10m (was: 1h)

> adding self-contained Kafka service jar for testing
> ---------------------------------------------------
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438560&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438560 ]

ASF GitHub Bot logged work on BEAM-10125:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 04:31
Start Date: 29/May/20 04:31
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635752138

Retest this please

Issue Time Tracking
-------------------

Worklog Id: (was: 438560)
Time Spent: 2h (was: 1h 50m)

> adding cross-language KafkaIO integration test
> ----------------------------------------------
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
> Issue Type: Improvement
> Components: io-java-kafka
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Time Spent: 2h
> Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438561&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438561 ]

ASF GitHub Bot logged work on BEAM-10125:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 04:31
Start Date: 29/May/20 04:31
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635752197

Retest this please

Issue Time Tracking
-------------------

Worklog Id: (was: 438561)
Time Spent: 2h 10m (was: 2h)

> adding cross-language KafkaIO integration test
> ----------------------------------------------
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
> Issue Type: Improvement
> Components: io-java-kafka
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test
[jira] [Work logged] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not windowed by GlobalWindows
[ https://issues.apache.org/jira/browse/BEAM-10005?focusedWorklogId=438554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438554 ]

ASF GitHub Bot logged work on BEAM-10005:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 03:22
Start Date: 29/May/20 03:22
Worklog Time Spent: 10m

Work Description: darshanj commented on pull request #11855:
URL: https://github.com/apache/beam/pull/11855#issuecomment-635734663

R: @kennknowles

Issue Time Tracking
-------------------

Worklog Id: (was: 438554)
Time Spent: 20m (was: 10m)

> Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when
> inputs not windowed by GlobalWindows
> ---------------------------------------------------------------------------
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.20.0
> Reporter: Darshan Jani
> Assignee: Darshan Jani
> Priority: P2
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Unable to use ApproximateQuantiles.globally or ApproximateUnique.globally
> with input not windowed using GlobalWindows.
> To make it run, we need to set either
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call either of the above on
> ApproximateQuantiles.globally()/ApproximateUnique.globally, as it does not
> return the underlying Combine.globally, but a PTransform (or Globally in the
> case of ApproximateUnique).
> Example failing case:
> {code:java}
> PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
>     .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));
> PCollection<List<Long>> input = elements
>     .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
>     .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws the expected error from the internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input
> PCollection is not windowed by GlobalWindows. Instead, use
> Combine.globally().withoutDefaults() to output an empty PCollection if the
> input PCollection is empty, or Combine.globally().asSingletonView() to get
> the default output of the CombineFn if the input PCollection is empty.
> {code}
[jira] [Commented] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not windowed by GlobalWindows
[ https://issues.apache.org/jira/browse/BEAM-10005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119239#comment-17119239 ]

Darshan Jani commented on BEAM-10005:
-------------------------------------

Hi Kenneth, I have created a PR for this. Please review. Thanks

> Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when
> inputs not windowed by GlobalWindows
> ---------------------------------------------------------------------------
>
> Key: BEAM-10005
> URL: https://issues.apache.org/jira/browse/BEAM-10005
> Project: Beam
> Issue Type: Bug
> Components: sdk-java-core
> Affects Versions: 2.20.0
> Reporter: Darshan Jani
> Assignee: Darshan Jani
> Priority: P2
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Unable to use ApproximateQuantiles.globally or ApproximateUnique.globally
> with input not windowed using GlobalWindows.
> To make it run, we need to set either
> {code:java}
> .withoutDefaults()
> {code}
> or
> {code:java}
> .asSingletonView()
> {code}
> Currently we can't call either of the above on
> ApproximateQuantiles.globally()/ApproximateUnique.globally, as it does not
> return the underlying Combine.globally, but a PTransform (or Globally in the
> case of ApproximateUnique).
> Example failing case:
> {code:java}
> PCollection<Long> elements = p.apply(GenerateSequence.from(0).to(100)
>     .withRate(1, Duration.millis(1)).withTimestampFn(Instant::new));
> PCollection<List<Long>> input = elements
>     .apply(Window.into(SlidingWindows.of(Duration.millis(3)).every(Duration.millis(1))))
>     .apply(ApproximateQuantiles.globally(17));
> {code}
> It throws the expected error from the internal Combine.globally() transform:
> {code:java}
> Default values are not supported in Combine.globally() if the input
> PCollection is not windowed by GlobalWindows. Instead, use
> Combine.globally().withoutDefaults() to output an empty PCollection if the
> input PCollection is empty, or Combine.globally().asSingletonView() to get
> the default output of the CombineFn if the input PCollection is empty.
> {code}
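The error message quoted in BEAM-10005 reflects a general rule: a default output for a global combine is only well-defined when the input sits in a single (global) window, because with many windows there is no single "empty input" to attach the default to. A loose, hypothetical sketch in plain Python may make this concrete; the helper names below are illustrative and are not Beam APIs.

```python
def combine_per_window(values_by_window, combine_fn):
    # One combined result per window; an empty window simply produces no
    # output, which mirrors Combine.globally().withoutDefaults() in Beam.
    return {w: combine_fn(vs) for w, vs in values_by_window.items() if vs}

def combine_global(values, combine_fn, default):
    # A single global window has exactly one "empty input" case, so a
    # default output is well-defined, mirroring Beam's default behaviour.
    return combine_fn(values) if values else default

windows = {"w0": [3, 1, 2], "w1": []}    # e.g. two sliding windows
print(combine_per_window(windows, sum))  # {'w0': 6}  (no entry for empty w1)
print(combine_global([], sum, 0))        # 0  (default applies)
```

With multiple windows there is no sensible place for `default`, which is why Beam forces the caller to choose `withoutDefaults()` (empty windows yield nothing) or `asSingletonView()` (the CombineFn's default is materialized per window in a view).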
[jira] [Work logged] (BEAM-10005) Unable to use ApproximateQuantiles.globally/ApproximateUnique.globally when inputs not windowed by GlobalWindows
[ https://issues.apache.org/jira/browse/BEAM-10005?focusedWorklogId=438553&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438553 ]

ASF GitHub Bot logged work on BEAM-10005:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 03:21
Start Date: 29/May/20 03:21
Worklog Time Spent: 10m

Work Description: darshanj opened a new pull request #11855:
URL: https://github.com/apache/beam/pull/11855

Added an API combineFn that can be used for inputs windowed with non-global windows.

**Please** add a meaningful description for your change here

------------------------

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
[jira] [Assigned] (BEAM-1754) Will Dataflow ever support Node.js with an SDK similar to Java or Python?
[ https://issues.apache.org/jira/browse/BEAM-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashwin Ramaswami reassigned BEAM-1754:
--------------------------------------

Assignee: Ashwin Ramaswami

> Will Dataflow ever support Node.js with an SDK similar to Java or Python?
> -------------------------------------------------------------------------
>
> Key: BEAM-1754
> URL: https://issues.apache.org/jira/browse/BEAM-1754
> Project: Beam
> Issue Type: New Feature
> Components: sdk-ideas
> Reporter: Diego Zuluaga
> Assignee: Ashwin Ramaswami
> Priority: P2
> Labels: node.js
>
> I like the philosophy behind Dataflow and found the Java and Python samples
> highly comprehensible. However, I have to admit that for most Node.js
> developers, who have little background in typed languages and are used to
> getting up to speed with frameworks incredibly fast, learning Dataflow might
> take a learning curve that they/we're not used to. So, I wonder if at any
> point in time Dataflow will provide a Node.js SDK. Maybe this is out of the
> question, but I wanted to run it by the team as it would be awesome to have
> something along these lines!
> Thanks,
> Diego
> Question originally posted in SO:
> http://stackoverflow.com/questions/42893436/will-dataflow-ever-support-node-js-with-and-sdk-similar-to-java-or-python
[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns
[ https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438551&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438551 ]

ASF GitHub Bot logged work on BEAM-10097:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 02:38
Start Date: 29/May/20 02:38
Worklog Time Spent: 10m

Work Description: lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635722845

Run Spark ValidatesRunner

Issue Time Tracking
-------------------

Worklog Id: (was: 438551)
Time Spent: 3.5h (was: 3h 20m)

> Migrate PCollection views to use both iterable and multimap
> materializations/access patterns
> ------------------------------------------------------------
>
> Key: BEAM-10097
> URL: https://issues.apache.org/jira/browse/BEAM-10097
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-core, sdk-java-harness
> Reporter: Luke Cwik
> Assignee: Luke Cwik
> Priority: P2
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Currently all the PCollection views have a trivial mapping from
> KV<K, Iterable<V>> to the view that is being requested (singleton,
> iterable, list, map, multimap).
> We should be using the primitive views (iterable, multimap) directly without
> going through the naive mapping.
[jira] [Updated] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds
[ https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kenneth Knowles updated BEAM-10115:
-----------------------------------

Status: Open (was: Triage Needed)

> Staging requirements.txt fails but staging setup.py succeeds
> ------------------------------------------------------------
>
> Key: BEAM-10115
> URL: https://issues.apache.org/jira/browse/BEAM-10115
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow, sdk-py-core
> Reporter: Kenneth Knowles
> Priority: P2
>
> User reports on StackOverflow:
> https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python
> The issue appears to be a problem with staging, and a difference between
> using `requirements.txt` and `setup.py` for some reason.
[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing
[ https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438540&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438540 ]

ASF GitHub Bot logged work on BEAM-9869:
----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 01:21
Start Date: 29/May/20 01:21
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635702305

LGTM. Thanks.

Issue Time Tracking
-------------------

Worklog Id: (was: 438540)
Time Spent: 40m (was: 0.5h)

> adding self-contained Kafka service jar for testing
> ---------------------------------------------------
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Time Spent: 40m
> Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing
[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing
[ https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438541&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438541 ]

ASF GitHub Bot logged work on BEAM-9869:
----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 01:21
Start Date: 29/May/20 01:21
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635702360

Retest this please

Issue Time Tracking
-------------------

Worklog Id: (was: 438541)
Time Spent: 50m (was: 40m)

> adding self-contained Kafka service jar for testing
> ---------------------------------------------------
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Time Spent: 50m
> Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing
[jira] [Work logged] (BEAM-9869) adding self-contained Kafka service jar for testing
[ https://issues.apache.org/jira/browse/BEAM-9869?focusedWorklogId=438543&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438543 ]

ASF GitHub Bot logged work on BEAM-9869:
----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 01:21
Start Date: 29/May/20 01:21
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635702404

Retest this please

Issue Time Tracking
-------------------

Worklog Id: (was: 438543)
Time Spent: 1h (was: 50m)

> adding self-contained Kafka service jar for testing
> ---------------------------------------------------
>
> Key: BEAM-9869
> URL: https://issues.apache.org/jira/browse/BEAM-9869
> Project: Beam
> Issue Type: Improvement
> Components: testing
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Time Spent: 1h
> Remaining Estimate: 0h
>
> adding self-contained Kafka service jar for testing
[jira] [Work logged] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor
[ https://issues.apache.org/jira/browse/BEAM-9964?focusedWorklogId=438538&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438538 ]

ASF GitHub Bot logged work on BEAM-9964:
----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 01:03
Start Date: 29/May/20 01:03
Worklog Time Spent: 10m

Work Description: steveniemitz commented on pull request #11849:
URL: https://github.com/apache/beam/pull/11849#issuecomment-635697264

=/ looks like the dataflow precommit succeeded but the API call to update it here failed.

Issue Time Tracking
-------------------

Worklog Id: (was: 438538)
Time Spent: 5h 10m (was: 5h)

> Setting workerCacheMb to make its way to the WindmillStateCache Constructor
> ---------------------------------------------------------------------------
>
> Key: BEAM-9964
> URL: https://issues.apache.org/jira/browse/BEAM-9964
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Omar Ismail
> Assignee: Omar Ismail
> Priority: P3
> Fix For: 2.22.0
>
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> Setting --workerCacheMB seems to affect batch pipelines only. For Streaming,
> the cache seems to be hardcoded to 100Mb [1]. If possible, I would like to
> make it possible to change the cache value in Streaming when setting
> --workerCacheMB.
> I've never made changes to the Beam SDK, so I am super excited to work on
> this!
>
> [1] https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73
[jira] [Work logged] (BEAM-9785) Add PostCommit suite for Python 3.8
[ https://issues.apache.org/jira/browse/BEAM-9785?focusedWorklogId=438536&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438536 ]

ASF GitHub Bot logged work on BEAM-9785:
----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 01:02
Start Date: 29/May/20 01:02
Worklog Time Spent: 10m

Work Description: tvalentyn commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-635696787

@epicfaace Thanks for your initiative to help with Python 3.8. Please see the
discussion on introducing high-priority/low-priority versions:
https://lists.apache.org/thread.html/r643cae69e5be136e6bca75bf896991fa313f79623ca056271588c87d%40%3Cdev.beam.apache.org%3E

cc: Yoshiki (@lazylynx), who was also working on this and may have some
thoughts on how to best integrate these changes.

Issue Time Tracking
-------------------

Worklog Id: (was: 438536)
Time Spent: 1h 10m (was: 1h)

> Add PostCommit suite for Python 3.8
> -----------------------------------
>
> Key: BEAM-9785
> URL: https://issues.apache.org/jira/browse/BEAM-9785
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-core
> Reporter: yoshiki obata
> Assignee: Ashwin Ramaswami
> Priority: P2
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> Add PostCommit suites for Python 3.8.
[jira] [Work logged] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor
[ https://issues.apache.org/jira/browse/BEAM-9964?focusedWorklogId=438535&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438535 ]

ASF GitHub Bot logged work on BEAM-9964:
----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 01:00
Start Date: 29/May/20 01:00
Worklog Time Spent: 10m

Work Description: steveniemitz commented on pull request #11849:
URL: https://github.com/apache/beam/pull/11849#issuecomment-635696194

> gahh so sorry that I missed this. I guess you did have to end up contributing this : )

heh no problem, teamwork! :highfive:

Issue Time Tracking
-------------------

Worklog Id: (was: 438535)
Time Spent: 5h (was: 4h 50m)

> Setting workerCacheMb to make its way to the WindmillStateCache Constructor
> ---------------------------------------------------------------------------
>
> Key: BEAM-9964
> URL: https://issues.apache.org/jira/browse/BEAM-9964
> Project: Beam
> Issue Type: Improvement
> Components: runner-dataflow
> Reporter: Omar Ismail
> Assignee: Omar Ismail
> Priority: P3
> Fix For: 2.22.0
>
> Time Spent: 5h
> Remaining Estimate: 0h
>
> Setting --workerCacheMB seems to affect batch pipelines only. For Streaming,
> the cache seems to be hardcoded to 100Mb [1]. If possible, I would like to
> make it possible to change the cache value in Streaming when setting
> --workerCacheMB.
> I've never made changes to the Beam SDK, so I am super excited to work on
> this!
>
> [1] https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438532&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438532 ]

ASF GitHub Bot logged work on BEAM-10125:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 00:58
Start Date: 29/May/20 00:58
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635695687

Retest this please

Issue Time Tracking
-------------------

Worklog Id: (was: 438532)
Time Spent: 1h 50m (was: 1h 40m)

> adding cross-language KafkaIO integration test
> ----------------------------------------------
>
> Key: BEAM-10125
> URL: https://issues.apache.org/jira/browse/BEAM-10125
> Project: Beam
> Issue Type: Improvement
> Components: io-java-kafka
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> adding cross-language KafkaIO integration test
[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging
[ https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438534 ]

ASF GitHub Bot logged work on BEAM-10078:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 29/May/20 00:58
Start Date: 29/May/20 00:58
Worklog Time Spent: 10m

Work Description: chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635695779

Run Java PreCommit

Issue Time Tracking
-------------------

Worklog Id: (was: 438534)
Time Spent: 2h 40m (was: 2.5h)

> uniquify Dataflow specific jars when staging
> --------------------------------------------
>
> Key: BEAM-10078
> URL: https://issues.apache.org/jira/browse/BEAM-10078
> Project: Beam
> Issue Type: Bug
> Components: runner-dataflow
> Reporter: Heejong Lee
> Assignee: Heejong Lee
> Priority: P2
> Fix For: 2.22.0
>
> Time Spent: 2h 40m
> Remaining Estimate: 0h
>
> After BEAM-9383, Dataflow-specific jars (dataflow-worker.jar, windmill_main)
> could be overwritten when two or more jobs share the same staging location.
> Since they 1) should have specific predefined names AND 2) should have a
> unique location to avoid collisions, they need special handling when staging
> artifacts.
[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging
[ https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438524 ] ASF GitHub Bot logged work on BEAM-10078: - Author: ASF GitHub Bot Created on: 28/May/20 23:55 Start Date: 28/May/20 23:55 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635677737 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438524) Time Spent: 2.5h (was: 2h 20m) > uniquify Dataflow specific jars when staging > > > Key: BEAM-10078 > URL: https://issues.apache.org/jira/browse/BEAM-10078 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 2.5h > Remaining Estimate: 0h > > After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) > could be overwritten when two or more jobs share the same staging location. > Since they 1) should have specific predefined names AND 2) should have unique > location for avoiding collision, they need special handling when staging > artifacts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
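The BEAM-10078 description says the staged jars must both keep a predefined, recognizable name and be unique per job to avoid collisions in a shared staging location. As a plain-Python illustration of that idea (not Beam's actual implementation; the helper name is hypothetical), a content hash can be folded into the predefined basename:

```python
import hashlib


def unique_staged_name(basename: str, payload: bytes) -> str:
    """Derive a collision-resistant staged name that keeps the predefined
    basename (e.g. 'dataflow-worker.jar' or 'windmill_main') recognizable.

    Two jobs staging different contents under the same basename get
    different names; restaging identical contents is a no-op rename.
    """
    digest = hashlib.sha256(payload).hexdigest()[:16]
    stem, dot, ext = basename.rpartition(".")
    if dot:  # keep the extension at the end so tooling still recognizes it
        return f"{stem}-{digest}.{ext}"
    return f"{basename}-{digest}"
```

Because the suffix is derived from the file contents, the scheme is deterministic: the same jar always maps to the same staged name, while differing jars can no longer overwrite each other.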
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438523 ] ASF GitHub Bot logged work on BEAM-10125: - Author: ASF GitHub Bot Created on: 28/May/20 23:51 Start Date: 28/May/20 23:51 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635676513 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438523) Time Spent: 1h 40m (was: 1.5h) > adding cross-language KafkaIO integration test > -- > > Key: BEAM-10125 > URL: https://issues.apache.org/jira/browse/BEAM-10125 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Time Spent: 1h 40m > Remaining Estimate: 0h > > adding cross-language KafkaIO integration test -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438522 ] ASF GitHub Bot logged work on BEAM-10125: - Author: ASF GitHub Bot Created on: 28/May/20 23:51 Start Date: 28/May/20 23:51 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635676452 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438522) Time Spent: 1.5h (was: 1h 20m) > adding cross-language KafkaIO integration test > -- > > Key: BEAM-10125 > URL: https://issues.apache.org/jira/browse/BEAM-10125 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Time Spent: 1.5h > Remaining Estimate: 0h > > adding cross-language KafkaIO integration test -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9785) Add PostCommit suite for Python 3.8
[ https://issues.apache.org/jira/browse/BEAM-9785?focusedWorklogId=438520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438520 ] ASF GitHub Bot logged work on BEAM-9785: Author: ASF GitHub Bot Created on: 28/May/20 23:48 Start Date: 28/May/20 23:48 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11788: URL: https://github.com/apache/beam/pull/11788#issuecomment-635675870 /cc @tvalentyn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438520) Time Spent: 1h (was: 50m) > Add PostCommit suite for Python 3.8 > --- > > Key: BEAM-9785 > URL: https://issues.apache.org/jira/browse/BEAM-9785 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: yoshiki obata >Assignee: Ashwin Ramaswami >Priority: P2 > Time Spent: 1h > Remaining Estimate: 0h > > Add PostCommit suites for Python 3.8. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9500) Refactor load tests
[ https://issues.apache.org/jira/browse/BEAM-9500?focusedWorklogId=438518=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438518 ] ASF GitHub Bot logged work on BEAM-9500: Author: ASF GitHub Bot Created on: 28/May/20 23:45 Start Date: 28/May/20 23:45 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11181: URL: https://github.com/apache/beam/pull/11181#issuecomment-635674912 @piotr-szuberski - what is the next step for this PR? Is it still active? Should we close it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438518) Time Spent: 50m (was: 40m) > Refactor load tests > --- > > Key: BEAM-9500 > URL: https://issues.apache.org/jira/browse/BEAM-9500 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Michał Walenia >Assignee: Piotr Szuberski >Priority: P3 > Time Spent: 50m > Remaining Estimate: 0h > > Make {{LoadTest}} parameterized instead of a base class. > Each subclass of {{LoadTest}} is really just the main {{loadTest}} function, > and that function is about the same as writing a {{PTransform}}. If > you eliminate subclassing you can have {{LoadTest}} own the pipeline setup, > so it will never be possible to forget or mess up > {{readSourceFromOptions}} and {{ParDo.of(runtimeMonitor)}}. There will be less > repeated boilerplate. -- This message was sent by Atlassian Jira (v8.3.4#803005)
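The BEAM-9500 proposal — parameterize the harness instead of subclassing it — can be illustrated outside Beam. In the sketch below (all names illustrative, plain Python rather than the Beam Java harness), the harness owns the fixed wiring (reading the source, attaching the monitor), so a test author supplies only the transform under test and cannot forget the setup steps:

```python
from typing import Callable, Iterable, List


def run_load_test(
    read_source: Callable[[], Iterable],
    transform: Callable[[List], object],
    monitor: Callable[[object], object],
) -> object:
    """Parameterized load test: the harness, not a subclass, owns the
    fixed pipeline setup, so every test gets the source read and the
    runtime monitor applied whether or not the author remembers them."""
    records = [monitor(r) for r in read_source()]  # monitoring always applied
    return transform(records)  # only this part varies per test
```

A test is then just a call, e.g. `run_load_test(lambda: [1, 2, 3], sum, lambda r: r)`, mirroring how a parameterized `LoadTest` would take the test body as an argument.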
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=438517=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438517 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 28/May/20 23:42 Start Date: 28/May/20 23:42 Worklog Time Spent: 10m Work Description: chadrik commented on pull request #11632: URL: https://github.com/apache/beam/pull/11632#issuecomment-635673990 LGTM! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438517) Time Spent: 85h 10m (was: 85h) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: P2 > Time Spent: 85h 10m > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. > This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
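The PEP 484 annotations that BEAM-7746 adds look like the (illustrative, non-Beam) helper below: the types are visible at a glance, IDEs get completion, and a static checker such as mypy can reject ill-typed callers before the code ever runs:

```python
from typing import Iterable, List


def shard_elements(elements: Iterable[str], num_shards: int) -> List[str]:
    """Round-robin elements into num_shards buckets.

    The annotations document the contract: an iterable of strings in,
    a list of string-lists out; mypy flags e.g. passing ints instead.
    """
    shards: List[List[str]] = [[] for _ in range(num_shards)]
    for i, element in enumerate(elements):
        shards[i % num_shards].append(element)
    return shards
```

Running `mypy` over such a file verifies both the body and every call site, which is the static-analysis benefit the ticket description cites.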
[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging
[ https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438516 ] ASF GitHub Bot logged work on BEAM-10078: - Author: ASF GitHub Bot Created on: 28/May/20 23:42 Start Date: 28/May/20 23:42 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635674009 I don't think so. We changed from using "key=value" strings to StagedFile objects in https://github.com/apache/beam/pull/11039/files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438516) Time Spent: 2h 20m (was: 2h 10m) > uniquify Dataflow specific jars when staging > > > Key: BEAM-10078 > URL: https://issues.apache.org/jira/browse/BEAM-10078 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) > could be overwritten when two or more jobs share the same staging location. > Since they 1) should have specific predefined names AND 2) should have unique > location for avoiding collision, they need special handling when staging > artifacts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9956) Bad formatting on environments page
[ https://issues.apache.org/jira/browse/BEAM-9956?focusedWorklogId=438515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438515 ] ASF GitHub Bot logged work on BEAM-9956: Author: ASF GitHub Bot Created on: 28/May/20 23:39 Start Date: 28/May/20 23:39 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11704: URL: https://github.com/apache/beam/pull/11704#issuecomment-635673169 Could you rebase? Is this still an issue after the website change? R: @rosetn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438515) Time Spent: 0.5h (was: 20m) > Bad formatting on environments page > --- > > Key: BEAM-9956 > URL: https://issues.apache.org/jira/browse/BEAM-9956 > Project: Beam > Issue Type: Bug > Components: website >Reporter: Kyle Weaver >Assignee: Brent Worden >Priority: P3 > Time Spent: 0.5h > Remaining Estimate: 0h > > https://beam.apache.org/documentation/runtime/environments/ > At the bottom of the page, it looks like there are text instructions that are > formatted as code and vice-versa. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8451) Interactive Beam example failing from stack overflow
[ https://issues.apache.org/jira/browse/BEAM-8451?focusedWorklogId=438514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438514 ] ASF GitHub Bot logged work on BEAM-8451: Author: ASF GitHub Bot Created on: 28/May/20 23:38 Start Date: 28/May/20 23:38 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11706: URL: https://github.com/apache/beam/pull/11706#issuecomment-635672892 R: @rosetn This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438514) Time Spent: 2h 10m (was: 2h) > Interactive Beam example failing from stack overflow > > > Key: BEAM-8451 > URL: https://issues.apache.org/jira/browse/BEAM-8451 > Project: Beam > Issue Type: Bug > Components: examples-python, runner-py-interactive >Reporter: Igor Durovic >Assignee: Chun Yang >Priority: P2 > Fix For: 2.18.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > > RecursionError: maximum recursion depth exceeded in __instancecheck__ > at > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py#L405] > > This occurred after the execution of the last cell in > [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/interactive/examples/Interactive%20Beam%20Example.ipynb] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging
[ https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438511=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438511 ] ASF GitHub Bot logged work on BEAM-10078: - Author: ASF GitHub Bot Created on: 28/May/20 23:36 Start Date: 28/May/20 23:36 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635672193 @chamikaramj, @ihji is the `filesToStage` change a problem? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438511) Time Spent: 2h 10m (was: 2h) > uniquify Dataflow specific jars when staging > > > Key: BEAM-10078 > URL: https://issues.apache.org/jira/browse/BEAM-10078 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) > could be overwritten when two or more jobs share the same staging location. > Since they 1) should have specific predefined names AND 2) should have unique > location for avoiding collision, they need special handling when staging > artifacts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9825) Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll
[ https://issues.apache.org/jira/browse/BEAM-9825?focusedWorklogId=438509=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438509 ] ASF GitHub Bot logged work on BEAM-9825: Author: ASF GitHub Bot Created on: 28/May/20 23:35 Start Date: 28/May/20 23:35 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11610: URL: https://github.com/apache/beam/pull/11610#issuecomment-635672074 All tests passed. I do not see a LGTM, maybe I am missing. Is this ready to be merged? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438509) Remaining Estimate: 85h 20m (was: 85.5h) Time Spent: 10h 40m (was: 10.5h) > Transforms for Intersect, IntersectAll, Except, ExceptAll, Union, UnionAll > -- > > Key: BEAM-9825 > URL: https://issues.apache.org/jira/browse/BEAM-9825 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Darshan Jani >Assignee: Darshan Jani >Priority: P2 > Original Estimate: 96h > Time Spent: 10h 40m > Remaining Estimate: 85h 20m > > I'd like to propose following new high-level transforms. > * Intersect > Compute the intersection between elements of two PCollection. > Given _leftCollection_ and _rightCollection_, this transform returns a > collection containing elements that common to both _leftCollection_ and > _rightCollection_ > > * Except > Compute the difference between elements of two PCollection. > Given _leftCollection_ and _rightCollection_, this transform returns a > collection containing elements that are in _leftCollection_ but not in > _rightCollection_ > * Union > Find the elements that are either of two PCollection. > Implement IntersetAll, ExceptAll and UnionAll variants of transforms. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10055) Add --region to 3 of the python examples
[ https://issues.apache.org/jira/browse/BEAM-10055?focusedWorklogId=438508=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438508 ] ASF GitHub Bot logged work on BEAM-10055: - Author: ASF GitHub Bot Created on: 28/May/20 23:34 Start Date: 28/May/20 23:34 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11779: URL: https://github.com/apache/beam/pull/11779#issuecomment-635671668 LGTM. I will merge after tests pass. Thank you @tedromer This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438508) Remaining Estimate: 20m (was: 0.5h) Time Spent: 40m (was: 0.5h) > Add --region to 3 of the python examples > > > Key: BEAM-10055 > URL: https://issues.apache.org/jira/browse/BEAM-10055 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ted Romer >Assignee: Ted Romer >Priority: P3 > Original Estimate: 1h > Time Spent: 40m > Remaining Estimate: 20m > > Proposed fix: > {color:#FF}[https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865]{color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10055) Add --region to 3 of the python examples
[ https://issues.apache.org/jira/browse/BEAM-10055?focusedWorklogId=438507=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438507 ] ASF GitHub Bot logged work on BEAM-10055: - Author: ASF GitHub Bot Created on: 28/May/20 23:34 Start Date: 28/May/20 23:34 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11779: URL: https://github.com/apache/beam/pull/11779#issuecomment-635671566 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438507) Remaining Estimate: 0.5h (was: 40m) Time Spent: 0.5h (was: 20m) > Add --region to 3 of the python examples > > > Key: BEAM-10055 > URL: https://issues.apache.org/jira/browse/BEAM-10055 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ted Romer >Assignee: Ted Romer >Priority: P3 > Original Estimate: 1h > Time Spent: 0.5h > Remaining Estimate: 0.5h > > Proposed fix: > {color:#FF}[https://github.com/tedromer/beam/compare/tedromer:ef811fe...tedromer:1f39865]{color} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10149) Support OnTimeBehavior and closing_behavior for triggers in Python.
Robert Bradshaw created BEAM-10149: -- Summary: Support OnTimeBehavior and closing_behavior for triggers in Python. Key: BEAM-10149 URL: https://issues.apache.org/jira/browse/BEAM-10149 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Robert Bradshaw -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8532) Beam Python trigger driver sets incorrect timestamp for output windows.
[ https://issues.apache.org/jira/browse/BEAM-8532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Bradshaw resolved BEAM-8532. --- Fix Version/s: 2.20.0 Resolution: Fixed > Beam Python trigger driver sets incorrect timestamp for output windows. > --- > > Key: BEAM-8532 > URL: https://issues.apache.org/jira/browse/BEAM-8532 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Robert Bradshaw >Assignee: Robert Bradshaw >Priority: P1 > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > The timestamp should lie in the window, otherwise re-windowing will not be > idempotent. > https://github.com/apache/beam/blob/release-2.16.0/sdks/python/apache_beam/transforms/trigger.py#L1183 > should be using {{window.max_timestamp()}} rather than {{.end}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
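The BEAM-8532 fix matters because windows are half-open intervals [start, end): a timestamp equal to `.end` belongs to the *next* window, while `max_timestamp()` (the last instant inside the window) re-windows back to the same window. A small integer sketch of fixed windows shows the non-idempotence (Beam's real timestamps are microsecond-precision; integers stand in here):

```python
def assign_window(timestamp: int, size: int) -> tuple:
    """Fixed windows [start, end): return the window containing timestamp."""
    start = timestamp - timestamp % size
    return (start, start + size)


size = 10
window = assign_window(7, size)       # the element's window: (0, 10)
bad_output_ts = window[1]             # using .end: 10 is NOT inside (0, 10)
good_output_ts = window[1] - 1        # max_timestamp analogue: stays inside
```

Re-windowing `bad_output_ts` lands in the following window, so windowing applied twice changes the answer; `good_output_ts` re-windows to the original window, which is the idempotence the ticket requires.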
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438505=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438505 ] ASF GitHub Bot logged work on BEAM-10125: - Author: ASF GitHub Bot Created on: 28/May/20 23:30 Start Date: 28/May/20 23:30 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635670461 LGTM. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438505) Time Spent: 1h 20m (was: 1h 10m) > adding cross-language KafkaIO integration test > -- > > Key: BEAM-10125 > URL: https://issues.apache.org/jira/browse/BEAM-10125 > Project: Beam > Issue Type: Improvement > Components: io-java-kafka >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Time Spent: 1h 20m > Remaining Estimate: 0h > > adding cross-language KafkaIO integration test -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8371) Sunset Beam Python 2 support in new releases in 2020.
[ https://issues.apache.org/jira/browse/BEAM-8371?focusedWorklogId=438503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438503 ] ASF GitHub Bot logged work on BEAM-8371: Author: ASF GitHub Bot Created on: 28/May/20 23:27 Start Date: 28/May/20 23:27 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11819: URL: https://github.com/apache/beam/pull/11819#issuecomment-635669412 Should we close this pull request for now? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438503) Time Spent: 1h (was: 50m) > Sunset Beam Python 2 support in new releases in 2020. > - > > Key: BEAM-8371 > URL: https://issues.apache.org/jira/browse/BEAM-8371 > Project: Beam > Issue Type: Task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Ashwin Ramaswami >Priority: P2 > Time Spent: 1h > Remaining Estimate: 0h > > Creating this Jira to track eventual sunset of Python 2 support in Beam. > This was previously discussed in [1], [2], [3]. > We can use this issue to communicate next steps, collect feedback and give > updates on current status. > [1] > https://lists.apache.org/thread.html/eba6caa58ea79a7ecbc8560d1c680a366b44c531d96ce5c699d41535@%3Cdev.beam.apache.org%3E > [2] > https://lists.apache.org/thread.html/456631fe1a696c537ef8ebfee42cd3ea8121bf7c639c52da5f7032e7@%3Cdev.beam.apache.org%3E > [3] > https://lists.apache.org/thread.html/r0d5c309a7e3107854f4892ccfeb1a17c0cec25dfce188678ab8df072%40%3Cdev.beam.apache.org%3E -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9946) Enhance Partition transform to provide partitionfn with SideInputs
[ https://issues.apache.org/jira/browse/BEAM-9946?focusedWorklogId=438502=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438502 ] ASF GitHub Bot logged work on BEAM-9946: Author: ASF GitHub Bot Created on: 28/May/20 23:26 Start Date: 28/May/20 23:26 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11682: URL: https://github.com/apache/beam/pull/11682#issuecomment-635668922 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438502) Remaining Estimate: 92.5h (was: 92h 40m) Time Spent: 3.5h (was: 3h 20m) > Enhance Partition transform to provide partitionfn with SideInputs > -- > > Key: BEAM-9946 > URL: https://issues.apache.org/jira/browse/BEAM-9946 > Project: Beam > Issue Type: New Feature > Components: sdk-java-core >Reporter: Darshan Jani >Assignee: Darshan Jani >Priority: P2 > Original Estimate: 96h > Time Spent: 3.5h > Remaining Estimate: 92.5h > > Currently _Partition_ transform can partition a collection into n collections > based on only _element_ value in _PartitionFn_ to decide on which partition a > particular element belongs to. > {code:java} > public interface PartitionFn extends Serializable { > int partitionFor(T elem, int numPartitions); > } > public static Partition of(int numPartitions, PartitionFn > partitionFn) { > return new Partition<>(new PartitionDoFn(numPartitions, partitionFn)); > } > {code} > It will be useful to introduce new API with additional _sideInputs_ provided > to partition function. User will be able to write logic to use both _element_ > value and _sideInputs_ to decide on which partition a particular element > belongs to. 
> Option-1: Proposed new API: > {code:java} > public interface PartitionWithSideInputsFn extends Serializable { > int partitionFor(T elem, int numPartitions, Context c); > } > public static Partition of(int numPartitions, > PartitionWithSideInputsFn partitionFn, Requirements requirements) { > ... > } > {code} > User can use any of the two APIs as per there partitioning function logic. > Option-2: Redesign old API with Builder Pattern which can provide optionally > a _Requirements_ with _sideInputs._ Deprecate old API. > {code:java} > // using sideviews > Partition.into(numberOfPartitions).via( > fn( > (input,c) -> { > // use c.sideInput(view) > // use input > // return partitionnumber > },requiresSideInputs(view)) > ) > // without using sideviews > Partition.into(numberOfPartitions).via( > fn((input,c) -> { > // use input > // return partitionnumber > }) > ) > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005)
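Reduced to plain Python, the BEAM-9946 proposal is that the partition function sees the element, the partition count, *and* side-input data. In this sketch the hypothetical `thresholds` list stands in for the side input that the proposed `Context` / `Requirements` plumbing would deliver:

```python
from typing import List


def partition_for(element: int, num_partitions: int, thresholds: List[int]) -> int:
    """Sketch of the proposed partitionFor(elem, numPartitions, context):
    the partition choice depends on both the element and side-input data
    (here, externally supplied bucket boundaries)."""
    for i, bound in enumerate(thresholds):
        if element < bound:
            return i
    return num_partitions - 1  # overflow bucket for elements past all bounds
```

With the current `PartitionFn` API only `element` and `num_partitions` are available, so `thresholds` would have to be baked in at construction time; the ticket's two options are alternative ways to thread it in at runtime instead.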
[jira] [Updated] (BEAM-10148) Data loaded using ReadAllFromText is not projected properly as side input
[ https://issues.apache.org/jira/browse/BEAM-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prathap Kumar Parvathareddy updated BEAM-10148: --- Description: *Context:* Data Enrichment Pattern with 2 Sources. Source 1: Google Cloud PubSub delivering JSON messages Source 2: Google Cloud Storage Files (acts as a Side Input) *Steps:* 1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based on file path inside a PCollection and convert each record as a tuple with 2 fields 2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to the main input that is being loaded from PubSub. 3. Expectation is that the side inputs should be available but for some reason AsDict is not containing all the data that was loaded from GCS *Possible Issues:* Below are few possible issues that can be ruled out as I already validated them. # Window Mismatches - Main Input window is within the scope of Side Input window. # Delay in updating Side Input State - Pipeline has just 1 VM and Side Input has only 8 json messages and total size of side input is around 40 KB *Troubleshooting:* # Working fine in a DirectRunner # Validated that ReadAllFromText transform is loading all the data properly from multiple files with subsequent transform building KV as well. However when the output of KV transform is used as a sideinput using ASDict() for some reason certain elements are skipped and not available inside look up. *Code* Will update the link pointing to complete code shortly which helps in providing more visibility. was: *Context:* Data Enrichment Pattern with 2 Sources. Source 1: Google Cloud PubSub delivering JSON messages Source 2: Google Cloud Storage Files (acts as a Side Input) *Steps:* 1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based on file path inside a PCollection and convert each record as a tuple with 2 fields 2. 
Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to the main input that is being loaded from PubSub. 3. Expectation is that the side inputs should be available but for some reason AsDict is not containing all the data that was loaded from GCS *Possible Issues:* # Window Mismatches - Main Input window is within the scope of Side Input window. # Delay in updating Side Input State - Pipeline has just 1 VM and Side Input has only 8 json messages and total size of side input is around 40 KB *Troubleshooting:* # Working fine in a DirectRunner # Validated that ReadAllFromText transform is loading all the data properly from multiple files with subsequent transform building KV as well. However when the output of KV transform is used as a sideinput using ASDict() for some reason certain elements are skipped and not available inside look up. *Code* Will update the link pointing to complete code shortly which helps in providing more visibility. > Data loaded using ReadAllFromText is not projected properly as side input > - > > Key: BEAM-10148 > URL: https://issues.apache.org/jira/browse/BEAM-10148 > Project: Beam > Issue Type: Bug > Components: io-py-files >Affects Versions: 2.19.0 > Environment: Runner: Dataflow Runner > Beam SDK: Python 2.19 >Reporter: Prathap Kumar Parvathareddy >Priority: P2 > > *Context:* > Data Enrichment Pattern with 2 Sources. > Source 1: Google Cloud PubSub delivering JSON messages > Source 2: Google Cloud Storage Files (acts as a Side Input) > *Steps:* > 1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText based > on file path inside a PCollection and convert each record as a tuple with 2 > fields > 2. Project the Tuple loaded in step 1 as a side input using Pvalue.ASDict to > the main input that is being loaded from PubSub. > 3. 
Expectation is that the side inputs should be available but for some > reason AsDict is not containing all the data that was loaded from GCS > > *Possible Issues:* > Below are a few possible issues that can be ruled out as I already > validated them. > # Window Mismatches - Main Input window is within the scope of Side Input > window. > # Delay in updating Side Input State - Pipeline has just 1 VM and Side > Input has only 8 json messages and total size of side input is around 40 KB > > *Troubleshooting:* > # Working fine in a DirectRunner > # Validated that ReadAllFromText transform is loading all the data properly > from multiple files with subsequent transform building KV as well. However > when the output of KV transform is used as a sideinput using ASDict() for > some reason certain elements are skipped and not available inside look up. > *Code* > Will update the link
[jira] [Updated] (BEAM-10148) Data loaded using ReadAllFromText is not projected properly as side input
[ https://issues.apache.org/jira/browse/BEAM-10148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prathap Kumar Parvathareddy updated BEAM-10148: --- Description: *Context:* Data enrichment pattern with two sources. Source 1: Google Cloud Pub/Sub delivering JSON messages. Source 2: Google Cloud Storage files (acting as a side input). *Steps:* 1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText, driven by file paths inside a PCollection, and convert each record into a tuple with 2 fields. 2. Project the tuples from step 1 as a side input, using pvalue.AsDict, onto the main input that is being read from Pub/Sub. 3. The expectation is that the side input should be available, but for some reason the AsDict view does not contain all the data that was loaded from GCS. *Possible Issues:* Below are a few possible issues that can be ruled out, as I already validated them. # Window mismatch - the main input window is within the scope of the side input window. # Delay in updating side input state - the pipeline has just 1 VM, the side input has only 8 JSON messages, and the total side input size is around 40 KB. *Troubleshooting:* # Works fine in the DirectRunner. # Validated that the ReadAllFromText transform loads all the data from multiple files properly, with the subsequent transform building the KVs as well. However, when the output of the KV transform is used as a side input via AsDict(), certain elements are skipped and not available in the lookup. *Code* Will update the link pointing to the complete code shortly, which will provide more visibility.

> Data loaded using ReadAllFromText is not projected properly as side input
> -
>
> Key: BEAM-10148
> URL: https://issues.apache.org/jira/browse/BEAM-10148
> Project: Beam
> Issue Type: Bug
> Components: io-py-files
> Affects Versions: 2.19.0
> Environment: Runner: Dataflow Runner
> Beam SDK: Python 2.19
> Reporter: Prathap Kumar Parvathareddy
> Priority: P2

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns
[ https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438498&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438498 ] ASF GitHub Bot logged work on BEAM-10097: - Author: ASF GitHub Bot Created on: 28/May/20 23:14 Start Date: 28/May/20 23:14 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635664997 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438498) Time Spent: 3h 20m (was: 3h 10m) > Migrate PCollection views to use both iterable and multimap > materializations/access patterns > > > Key: BEAM-10097 > URL: https://issues.apache.org/jira/browse/BEAM-10097 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, sdk-java-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: P2 > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently all the PCollection views have a trivial mapping from KV<K, > Iterable<V>> to the view that is being requested (singleton, iterable, list, > map, multimap). > We should be using the primitive views (iterable, multimap) directly without > going through the naive mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)
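The "trivial mapping" the issue describes - deriving every view from a single KV<K, Iterable<V>> materialization - can be sketched in plain Python. This is a conceptual illustration only, not the actual Java implementation in PCollectionViews; all function names here are made up:

```python
# Conceptual sketch: every side-input view derived from one
# iterable-of-(key, value) materialization.

def as_list(kvs):
    # List view: values only, in arrival order.
    return [v for _, v in kvs]

def as_dict(kvs):
    # Map view: requires unique keys.
    d = {}
    for k, v in kvs:
        if k in d:
            raise ValueError('duplicate key %r in dict view' % (k,))
        d[k] = v
    return d

def as_multimap(kvs):
    # Multimap view: each key maps to all of its values.
    d = {}
    for k, v in kvs:
        d.setdefault(k, []).append(v)
    return d

def as_singleton(kvs):
    # Singleton view: exactly one element expected.
    values = as_list(kvs)
    if len(values) != 1:
        raise ValueError('expected exactly one element, got %d' % len(values))
    return values[0]
```

The issue proposes serving the iterable and multimap access patterns directly rather than funnelling everything through such a naive mapping.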
[jira] [Created] (BEAM-10148) Data loaded using ReadAllFromText is not projected properly as side input
Prathap Kumar Parvathareddy created BEAM-10148: -- Summary: Data loaded using ReadAllFromText is not projected properly as side input Key: BEAM-10148 URL: https://issues.apache.org/jira/browse/BEAM-10148 Project: Beam Issue Type: Bug Components: io-py-files Affects Versions: 2.19.0 Environment: Runner: Dataflow Runner Beam SDK: Python 2.19 Reporter: Prathap Kumar Parvathareddy *Context:* Data enrichment pattern with two sources. Source 1: Google Cloud Pub/Sub delivering JSON messages. Source 2: Google Cloud Storage files (acting as a side input). *Steps:* 1. Load the data from GCS (Google Cloud Storage) using ReadAllFromText, driven by file paths inside a PCollection, and convert each record into a tuple with 2 fields. 2. Project the tuples from step 1 as a side input, using pvalue.AsDict, onto the main input that is being read from Pub/Sub. 3. The expectation is that the side input should be available, but for some reason the AsDict view does not contain all the data that was loaded from GCS. *Possible Issues:* # Window mismatch - the main input window is within the scope of the side input window. # Delay in updating side input state - the pipeline has just 1 VM, the side input has only 8 JSON messages, and the total side input size is around 40 KB. *Troubleshooting:* # Works fine in the DirectRunner. # Validated that the ReadAllFromText transform loads all the data from multiple files properly, with the subsequent transform building the KVs as well. However, when the output of the KV transform is used as a side input via AsDict(), certain elements are skipped and not available in the lookup. *Code* Will update the link pointing to the complete code, which will provide more visibility. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=438497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438497 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 28/May/20 23:11 Start Date: 28/May/20 23:11 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11594: URL: https://github.com/apache/beam/pull/11594#issuecomment-635664357 Looks like the PreCommit failed with "Exception: Dataflow only supports Python versions 2 and 3.5+, got: (3, 8)". Is that a known failure? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438497) Time Spent: 5h 40m (was: 5.5h) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: P2 > Time Spent: 5h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9964) Setting workerCacheMb to make its way to the WindmillStateCache Constructor
[ https://issues.apache.org/jira/browse/BEAM-9964?focusedWorklogId=438496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438496 ] ASF GitHub Bot logged work on BEAM-9964: Author: ASF GitHub Bot Created on: 28/May/20 23:10 Start Date: 28/May/20 23:10 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11849: URL: https://github.com/apache/beam/pull/11849#issuecomment-635663865 gahh so sorry that I missed this. I guess you did have to end up contributing this : ) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438496) Time Spent: 4h 50m (was: 4h 40m) > Setting workerCacheMb to make its way to the WindmillStateCache Constructor > --- > > Key: BEAM-9964 > URL: https://issues.apache.org/jira/browse/BEAM-9964 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Omar Ismail >Assignee: Omar Ismail >Priority: P3 > Fix For: 2.22.0 > > Time Spent: 4h 50m > Remaining Estimate: 0h > > Setting --workerCacheMb seems to affect batch pipelines only. For streaming, > the cache size seems to be hardcoded to 100 MB [1]. If possible, I would like > to make the streaming cache size configurable via --workerCacheMb as well. > I've never made changes to the Beam SDK, so I am super excited to work on > this!
> > [[1] > https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73|https://github.com/apache/beam/blob/5e659bb80bcbf70795f6806e05a255ee72706d9f/runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/WindmillStateCache.java#L73] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10144) Update pipeline options snippets for best practices
[ https://issues.apache.org/jira/browse/BEAM-10144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10144: Status: Open (was: Triage Needed) > Update pipeline options snippets for best practices > --- > > Key: BEAM-10144 > URL: https://issues.apache.org/jira/browse/BEAM-10144 > Project: Beam > Issue Type: Improvement > Components: examples-python >Reporter: David Cavazos >Assignee: David Cavazos >Priority: P3 > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10112) Add python sdk state and timer examples to website
[ https://issues.apache.org/jira/browse/BEAM-10112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10112: Status: Open (was: Triage Needed) > Add python sdk state and timer examples to website > -- > > Key: BEAM-10112 > URL: https://issues.apache.org/jira/browse/BEAM-10112 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Yichi Zhang >Assignee: Yichi Zhang >Priority: P2 > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7746) Add type hints to python code
[ https://issues.apache.org/jira/browse/BEAM-7746?focusedWorklogId=438493=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438493 ] ASF GitHub Bot logged work on BEAM-7746: Author: ASF GitHub Bot Created on: 28/May/20 23:04 Start Date: 28/May/20 23:04 Worklog Time Spent: 10m Work Description: robertwb commented on a change in pull request #11632: URL: https://github.com/apache/beam/pull/11632#discussion_r432170796 ## File path: sdks/python/apache_beam/dataframe/transforms.py ## @@ -16,13 +16,28 @@ from __future__ import absolute_import +import typing +from typing import Any +from typing import Dict +from typing import List +from typing import Mapping +from typing import Tuple +from typing import TypeVar +from typing import Union + import pandas as pd import apache_beam as beam from apache_beam import transforms from apache_beam.dataframe import expressions from apache_beam.dataframe import frames # pylint: disable=unused-import +if typing.TYPE_CHECKING: Review comment: +1 for consistency. Changed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438493) Time Spent: 85h (was: 84h 50m) > Add type hints to python code > - > > Key: BEAM-7746 > URL: https://issues.apache.org/jira/browse/BEAM-7746 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: P2 > Time Spent: 85h > Remaining Estimate: 0h > > As a developer of the beam source code, I would like the code to use pep484 > type hints so that I can clearly see what types are required, get completion > in my IDE, and enforce code correctness via a static analyzer like mypy. 
> This may be considered a precursor to BEAM-7060 > Work has been started here: [https://github.com/apache/beam/pull/9056] > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
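One pattern visible in the diff above is guarding imports behind typing.TYPE_CHECKING, so they are seen by mypy but never executed at runtime (avoiding import cycles and import cost). A small self-contained sketch; the guarded module here is just an illustrative stand-in for an expensive or cyclic dependency:

```python
from __future__ import absolute_import

import typing

if typing.TYPE_CHECKING:
    # Imported only while type checking; never executed at runtime.
    # The module is an illustrative stand-in, not from the Beam diff.
    from decimal import Decimal

def total(values):
    # type: (typing.Iterable[Decimal]) -> Decimal
    # Py2-compatible comment hints, as used in the Beam codebase at the time;
    # 'Decimal' resolves for mypy via the TYPE_CHECKING import above.
    result = None
    for v in values:
        result = v if result is None else result + v
    return result
```

At runtime `typing.TYPE_CHECKING` is always False, so the guarded import never runs; mypy treats it as True and resolves the names normally.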
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=438492=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438492 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 28/May/20 23:03 Start Date: 28/May/20 23:03 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11854: URL: https://github.com/apache/beam/pull/11854#issuecomment-635657611 R: @TheNeuralBit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438492) Time Spent: 23h 40m (was: 23.5h) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 23h 40m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=438491&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438491 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 28/May/20 23:02 Start Date: 28/May/20 23:02 Worklog Time Spent: 10m Work Description: chamikaramj opened a new pull request #11854: URL: https://github.com/apache/beam/pull/11854 Without this, cross-language KafkaIO users may have to do pipeline.run(False) instead of pipeline.run() when executing a pipeline using Dataflow. @TheNeuralBit this should be a pretty safe change to cherry-pick if you are OK with it.
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=438488=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438488 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 28/May/20 22:57 Start Date: 28/May/20 22:57 Worklog Time Spent: 10m Work Description: robertwb merged pull request #11844: URL: https://github.com/apache/beam/pull/11844 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438488) Time Spent: 23h 10m (was: 23h) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 23h 10m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8019) Support cross-language transforms for DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-8019?focusedWorklogId=438489=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438489 ] ASF GitHub Bot logged work on BEAM-8019: Author: ASF GitHub Bot Created on: 28/May/20 22:57 Start Date: 28/May/20 22:57 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11844: URL: https://github.com/apache/beam/pull/11844#issuecomment-635651784 Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438489) Time Spent: 23h 20m (was: 23h 10m) > Support cross-language transforms for DataflowRunner > > > Key: BEAM-8019 > URL: https://issues.apache.org/jira/browse/BEAM-8019 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Chamikara Madhusanka Jayalath >Assignee: Chamikara Madhusanka Jayalath >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 23h 20m > Remaining Estimate: 0h > > This is to capture the Beam changes needed for this task. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10136) Add cross-language wrapper for Java's JdbcIO Write
[ https://issues.apache.org/jira/browse/BEAM-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10136: Status: Open (was: Triage Needed) > Add cross-language wrapper for Java's JdbcIO Write > -- > > Key: BEAM-10136 > URL: https://issues.apache.org/jira/browse/BEAM-10136 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: Not applicable >Reporter: Piotr Szuberski >Priority: P2 > Labels: portability > Fix For: Not applicable > > > Add cross-language wrapper for Java's Jdbc Write -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10140) Add cross-language wrapper for Java's SpannerIO Read
[ https://issues.apache.org/jira/browse/BEAM-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10140: Status: Open (was: Triage Needed) > Add cross-language wrapper for Java's SpannerIO Read > > > Key: BEAM-10140 > URL: https://issues.apache.org/jira/browse/BEAM-10140 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: Not applicable >Reporter: Piotr Szuberski >Priority: P2 > Labels: portability > Fix For: Not applicable > > > Add cross-language wrapper for Java's SpannerIO Read -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10137) Add cross-language wrapper for Java's KinesisIO Read
[ https://issues.apache.org/jira/browse/BEAM-10137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10137: Status: Open (was: Triage Needed) > Add cross-language wrapper for Java's KinesisIO Read > > > Key: BEAM-10137 > URL: https://issues.apache.org/jira/browse/BEAM-10137 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: Not applicable >Reporter: Piotr Szuberski >Priority: P2 > Labels: portability > Fix For: Not applicable > > > Add cross-language wrapper for Java's KinesisIO Read -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10138) Add cross-language wrapper for Java's KinesisIO Write
[ https://issues.apache.org/jira/browse/BEAM-10138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10138: Status: Open (was: Triage Needed) > Add cross-language wrapper for Java's KinesisIO Write > - > > Key: BEAM-10138 > URL: https://issues.apache.org/jira/browse/BEAM-10138 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: Not applicable >Reporter: Piotr Szuberski >Priority: P2 > Labels: portability > Fix For: Not applicable > > > Add cross-language wrapper for Java's KinesisIO Write -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10139) Add cross-language wrapper for Java's SpannerIO Write
[ https://issues.apache.org/jira/browse/BEAM-10139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10139: Status: Open (was: Triage Needed) > Add cross-language wrapper for Java's SpannerIO Write > - > > Key: BEAM-10139 > URL: https://issues.apache.org/jira/browse/BEAM-10139 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: Not applicable >Reporter: Piotr Szuberski >Priority: P2 > Labels: portability > Fix For: Not applicable > > > Add cross-language wrapper for Java's SpannerIO Write -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10144) Update pipeline options snippets for best practices
[ https://issues.apache.org/jira/browse/BEAM-10144?focusedWorklogId=438486=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438486 ] ASF GitHub Bot logged work on BEAM-10144: - Author: ASF GitHub Bot Created on: 28/May/20 22:49 Start Date: 28/May/20 22:49 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11851: URL: https://github.com/apache/beam/pull/11851#issuecomment-635648807 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438486) Time Spent: 1h 20m (was: 1h 10m) > Update pipeline options snippets for best practices > --- > > Key: BEAM-10144 > URL: https://issues.apache.org/jira/browse/BEAM-10144 > Project: Beam > Issue Type: Improvement > Components: examples-python >Reporter: David Cavazos >Assignee: David Cavazos >Priority: P3 > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10135) Add cross-language wrapper for Java's JdbcIO Read
[ https://issues.apache.org/jira/browse/BEAM-10135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10135: Status: Open (was: Triage Needed) > Add cross-language wrapper for Java's JdbcIO Read > - > > Key: BEAM-10135 > URL: https://issues.apache.org/jira/browse/BEAM-10135 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Affects Versions: Not applicable >Reporter: Piotr Szuberski >Priority: P2 > Labels: portability > Fix For: Not applicable > > > Add cross-language wrapper for Java's JdbcIO Read -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10134) Add Cross-language wrappers for Java IOs
[ https://issues.apache.org/jira/browse/BEAM-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ismaël Mejía updated BEAM-10134: Status: Open (was: Triage Needed) > Add Cross-language wrappers for Java IOs > > > Key: BEAM-10134 > URL: https://issues.apache.org/jira/browse/BEAM-10134 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: Not applicable >Reporter: Piotr Szuberski >Priority: P2 > Labels: portability > Fix For: Not applicable > > > Add cross-language wrappers for Java IOs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9679) Core Transforms | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Damon Douglas updated BEAM-9679: Description: A kata devoted to core beam transforms patterns after [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] where the take away is an individual's ability to master the following using an Apache Beam pipeline using the Golang SDK. ||Transform||Pull Request||Status|| |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| |Combine| | | |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed| |Partition| | | |Side Input| | | |Side Output| | | |Branching| | | |Composite Transform| | | |DoFn Additional Parameters| | | was: A kata devoted to core beam transforms patterns after [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] where the take away is an individual's ability to master the following using an Apache Beam pipeline using the Golang SDK. 
||Transform||Pull Request||Status|| |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| |Combine| | | |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Open| |Partition| | | |Side Input| | | |Side Output| | | |Branching| | | |Composite Transform| | | |DoFn Additional Parameters| | | > Core Transforms | Go SDK Code Katas > --- > > Key: BEAM-9679 > URL: https://issues.apache.org/jira/browse/BEAM-9679 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: P2 > Time Spent: 5h > Remaining Estimate: 0h > > A kata devoted to core beam transforms patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] > where the take away is an individual's ability to master the following using > an Apache Beam pipeline using the Golang SDK. > > ||Transform||Pull Request||Status|| > |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| > |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| > |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| > |Combine| | | > |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Closed| > |Partition| | | > |Side Input| | | > |Side Output| | | > |Branching| | | > |Composite Transform| | | > |DoFn Additional Parameters| | | -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9679) Core Transforms | Go SDK Code Katas
[ https://issues.apache.org/jira/browse/BEAM-9679?focusedWorklogId=438481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438481 ] ASF GitHub Bot logged work on BEAM-9679: Author: ASF GitHub Bot Created on: 28/May/20 22:37 Start Date: 28/May/20 22:37 Worklog Time Spent: 10m Work Description: damondouglas commented on pull request #11803: URL: https://github.com/apache/beam/pull/11803#issuecomment-635644834 I updated [the stepik course](https://stepik.org/course/70387) and committed the updated `*-remote.yaml` files to this PR. It is ready to merge into master. Thank you, @henryken and @lostluck for your help. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438481) Time Spent: 5h (was: 4h 50m) > Core Transforms | Go SDK Code Katas > --- > > Key: BEAM-9679 > URL: https://issues.apache.org/jira/browse/BEAM-9679 > Project: Beam > Issue Type: Sub-task > Components: katas, sdk-go >Reporter: Damon Douglas >Assignee: Damon Douglas >Priority: P2 > Time Spent: 5h > Remaining Estimate: 0h > > A kata devoted to core beam transforms patterns after > [https://github.com/apache/beam/tree/master/learning/katas/java/Core%20Transforms] > where the take away is an individual's ability to master the following using > an Apache Beam pipeline using the Golang SDK. 
> > ||Transform||Pull Request||Status|| > |Map|[11564|https://github.com/apache/beam/pull/11564]|Closed| > |GroupByKey|[11734|https://github.com/apache/beam/pull/11734]|Closed| > |CoGroupByKey|[11803|https://github.com/apache/beam/pull/11803]|Open| > |Combine| | | > |Flatten|[11806|https://github.com/apache/beam/pull/11806]|Open| > |Partition| | | > |Side Input| | | > |Side Output| | | > |Branching| | | > |Composite Transform| | | > |DoFn Additional Parameters| | | -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns
[ https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438480&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438480 ] ASF GitHub Bot logged work on BEAM-10097: - Author: ASF GitHub Bot Created on: 28/May/20 22:37 Start Date: 28/May/20 22:37 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635644599 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438480) Time Spent: 3h 10m (was: 3h) > Migrate PCollection views to use both iterable and multimap > materializations/access patterns > > > Key: BEAM-10097 > URL: https://issues.apache.org/jira/browse/BEAM-10097 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, sdk-java-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: P2 > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently all the PCollection views have a trivial mapping from KV<K, > Iterable<V>> to the view that is being requested (singleton, iterable, list, > map, multimap). > We should be using the primitive views (iterable, multimap) directly without > going through the naive mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=438479=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438479 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 28/May/20 22:26 Start Date: 28/May/20 22:26 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11594: URL: https://github.com/apache/beam/pull/11594#issuecomment-635640892 Run Python 2 PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438479) Time Spent: 5.5h (was: 5h 20m) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: P2 > Time Spent: 5.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=438478=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438478 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 28/May/20 22:25 Start Date: 28/May/20 22:25 Worklog Time Spent: 10m Work Description: pabloem commented on pull request #11594: URL: https://github.com/apache/beam/pull/11594#issuecomment-635640799 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438478) Time Spent: 5h 20m (was: 5h 10m) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: P2 > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9692) Clean Python DataflowRunner to use portable pipelines
[ https://issues.apache.org/jira/browse/BEAM-9692?focusedWorklogId=438477=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438477 ] ASF GitHub Bot logged work on BEAM-9692: Author: ASF GitHub Bot Created on: 28/May/20 22:23 Start Date: 28/May/20 22:23 Worklog Time Spent: 10m Work Description: rohdesamuel commented on pull request #11594: URL: https://github.com/apache/beam/pull/11594#issuecomment-635640073 R: @pabloem This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438477) Time Spent: 5h 10m (was: 5h) > Clean Python DataflowRunner to use portable pipelines > - > > Key: BEAM-9692 > URL: https://issues.apache.org/jira/browse/BEAM-9692 > Project: Beam > Issue Type: Improvement > Components: runner-dataflow >Reporter: Sam Rohde >Assignee: Sam Rohde >Priority: P2 > Time Spent: 5h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8280) re-enable IOTypeHints.from_callable
[ https://issues.apache.org/jira/browse/BEAM-8280?focusedWorklogId=438476=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438476 ] ASF GitHub Bot logged work on BEAM-8280: Author: ASF GitHub Bot Created on: 28/May/20 22:14 Start Date: 28/May/20 22:14 Worklog Time Spent: 10m Work Description: udim merged pull request #11070: URL: https://github.com/apache/beam/pull/11070 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438476) Time Spent: 15h 50m (was: 15h 40m) > re-enable IOTypeHints.from_callable > --- > > Key: BEAM-8280 > URL: https://issues.apache.org/jira/browse/BEAM-8280 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Udi Meiri >Assignee: Udi Meiri >Priority: P2 > Fix For: 2.21.0 > > Time Spent: 15h 50m > Remaining Estimate: 0h > > See https://issues.apache.org/jira/browse/BEAM-8279 -- This message was sent by Atlassian Jira (v8.3.4#803005)
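BEAM-8280 is about re-enabling the derivation of IO type hints from a decorated callable's signature. As a rough illustration of the underlying idea only, not Beam's actual IOTypeHints implementation, the standard typing module can recover input and output hints from a function's annotations (the helper name `hints_from_callable` is hypothetical):

```python
import typing


def hints_from_callable(fn):
    """Derive (input_hints, output_hint) from a callable's annotations.

    Hypothetical sketch of the idea behind IOTypeHints.from_callable;
    the real Beam code handles defaults, *args/**kwargs, and more.
    """
    hints = typing.get_type_hints(fn)
    # The 'return' annotation, if present, is the output hint.
    output = hints.pop('return', typing.Any)
    # Remaining annotations, in declaration order, are the input hints.
    return tuple(hints.values()), output


def to_upper(s: str) -> str:
    return s.upper()


inputs, output = hints_from_callable(to_upper)
```

A decorator-based version could attach these hints to the transform at construction time, which is what lets the SDK type-check pipelines built from annotated callables.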
[jira] [Comment Edited] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds
[ https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119112#comment-17119112 ] Ankur Goenka edited comment on BEAM-10115 at 5/28/20, 10:12 PM: An action item from this seems to be cleaning /tmp/dataflow-requirements-cache before uploading the requirements as it might contain old artifacts. https://issues.apache.org/jira/browse/BEAM-10147 was (Author: angoenka): An action item from this seems to be cleaning /tmp/dataflow-requirements-cache before uploading the requirements as it might contain old artifacts. > Staging requirements.txt fails but staging setup.py succeeds > > > Key: BEAM-10115 > URL: https://issues.apache.org/jira/browse/BEAM-10115 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Kenneth Knowles >Priority: P2 > > User reports on StackOverflow: > https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python > The issue appears to be a problem with staging, and a difference between > using `requirements.txt` and `setup.py` for some reason. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-10147) Clean /tmp/dataflow-requirements-cache before downloading the requirements from requirements.txt
Ankur Goenka created BEAM-10147: --- Summary: Clean /tmp/dataflow-requirements-cache before downloading the requirements from requirements.txt Key: BEAM-10147 URL: https://issues.apache.org/jira/browse/BEAM-10147 Project: Beam Issue Type: Bug Components: runner-dataflow, sdk-py-core Reporter: Ankur Goenka When using requirements.txt with Beam pipelines, /tmp/dataflow-requirements-cache is not cleaned: new dependencies are downloaded into it, and then all files in /tmp/dataflow-requirements-cache are staged, which is wasteful and unnecessary. -- This message was sent by Atlassian Jira (v8.3.4#803005)
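The fix proposed here amounts to wiping the cache directory before repopulating it, so that only artifacts for the current requirements.txt get staged. A minimal sketch with a hypothetical helper name (Beam's actual staging code path may differ):

```python
import os
import shutil


def clean_requirements_cache(cache_dir):
    """Remove stale artifacts so only freshly downloaded packages are staged.

    Hypothetical helper illustrating the proposed fix for BEAM-10147;
    not the actual Beam SDK implementation.
    """
    if os.path.exists(cache_dir):
        # Delete the whole directory, old wheels and all.
        shutil.rmtree(cache_dir)
    # Recreate it empty, ready for `pip download --dest cache_dir ...`.
    os.makedirs(cache_dir)
```

Calling this just before the download step guarantees the subsequent "stage everything in the cache" logic only sees the current dependency set.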
[jira] [Commented] (BEAM-10115) Staging requirements.txt fails but staging setup.py succeeds
[ https://issues.apache.org/jira/browse/BEAM-10115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119112#comment-17119112 ] Ankur Goenka commented on BEAM-10115: - An action item from this seems to be cleaning /tmp/dataflow-requirements-cache before uploading the requirements as it might contain old artifacts. > Staging requirements.txt fails but staging setup.py succeeds > > > Key: BEAM-10115 > URL: https://issues.apache.org/jira/browse/BEAM-10115 > Project: Beam > Issue Type: Bug > Components: runner-dataflow, sdk-py-core >Reporter: Kenneth Knowles >Priority: P2 > > User reports on StackOverflow: > https://stackoverflow.com/questions/62032382/dataflow-fails-when-i-add-requirements-txt-python > The issue appears to be a problem with staging, and a difference between > using `requirements.txt` and `setup.py` for some reason. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns
[ https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438470 ] ASF GitHub Bot logged work on BEAM-10097: - Author: ASF GitHub Bot Created on: 28/May/20 21:59 Start Date: 28/May/20 21:59 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635630717 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438470) Time Spent: 3h (was: 2h 50m) > Migrate PCollection views to use both iterable and multimap > materializations/access patterns > > > Key: BEAM-10097 > URL: https://issues.apache.org/jira/browse/BEAM-10097 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, sdk-java-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: P2 > Time Spent: 3h > Remaining Estimate: 0h > > Currently all the PCollection views have a trivial mapping from KV<K, > Iterable<V>> to the view that is being requested (singleton, iterable, list, > map, multimap). > We should be using the primitive views (iterable, multimap) directly without > going through the naive mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)
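The "trivial mapping" BEAM-10097 describes takes the materialized KV<K, Iterable<V>> pairs and derives whichever view was requested. A toy Python sketch of how the iterable and multimap access patterns relate to that raw data (plain functions over lists, purely illustrative; this is not Beam SDK code):

```python
from collections import defaultdict


def as_iterable(kvs):
    """Iterable view: all values flattened, keys ignored."""
    return [v for _, values in kvs for v in values]


def as_multimap(kvs):
    """Multimap view: key -> list of every value seen for that key."""
    out = defaultdict(list)
    for key, values in kvs:
        out[key].extend(values)
    return dict(out)


# Materialized KV<K, Iterable<V>> pairs, as the issue describes.
kvs = [('a', [1, 2]), ('b', [3]), ('a', [4])]
```

The issue's point is that a runner can serve the iterable and multimap views directly as primitives, instead of funneling every view request through one naive KV-to-view conversion like the above.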
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438468=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438468 ] ASF GitHub Bot logged work on BEAM-10125: - Author: ASF GitHub Bot Created on: 28/May/20 21:47 Start Date: 28/May/20 21:47 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11847: URL: https://github.com/apache/beam/pull/11847#discussion_r432142992 ## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py ## @@ -0,0 +1,145 @@ +"""Integration test for Python cross-language pipelines for Java KafkaIO.""" + +from __future__ import absolute_import + +import contextlib +import logging +import os +import socket +import subprocess +import time +import typing +import unittest + +import grpc + +import apache_beam as beam +from apache_beam.io.external.kafka import ReadFromKafka +from apache_beam.io.external.kafka import WriteToKafka +from apache_beam.metrics import Metrics +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.testing.test_pipeline import TestPipeline + + +class CrossLanguageKafkaIO(object): + def __init__(self, bootstrap_servers, topic, expansion_service=None): +self.bootstrap_servers = bootstrap_servers +self.topic = topic +self.expansion_service = expansion_service or ( +'localhost:%s' % os.environ.get('EXPANSION_PORT')) +self.sum_counter = Metrics.counter('source', 'elements_sum') + + def build_write_pipeline(self, pipeline): +_ = ( +pipeline +| 'Impulse' >> beam.Impulse() +| 'Generate' >> beam.FlatMap(lambda x: range(1000)) # pylint: disable=range-builtin-not-iterating +| 'Reshuffle' >> beam.Reshuffle() +| 'MakeKV' >> beam.Map(lambda x: + (b'', str(x).encode())).with_output_types( + typing.Tuple[bytes, bytes]) +| 'WriteToKafka' >> WriteToKafka( +producer_config={'bootstrap.servers': self.bootstrap_servers}, +topic=self.topic, +expansion_service=self.expansion_service)) + + def build_read_pipeline(self, pipeline): +_ = 
( +pipeline +| 'ReadFromKafka' >> ReadFromKafka( +consumer_config={ +'bootstrap.servers': self.bootstrap_servers, +'auto.offset.reset': 'earliest' +}, +topics=[self.topic], +expansion_service=self.expansion_service) +| 'Windowing' >> beam.WindowInto( +beam.window.FixedWindows(300), +trigger=beam.transforms.trigger.AfterProcessingTime(60), +accumulation_mode=beam.transforms.trigger.AccumulationMode. +DISCARDING) +| 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode())) +| 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults() +| 'SetSumCounter' >> beam.Map(self.sum_counter.inc)) + + def run_xlang_kafkaio(self, pipeline): +self.build_write_pipeline(pipeline) +self.build_read_pipeline(pipeline) +pipeline.run(False) + + +@unittest.skipUnless( +os.environ.get('LOCAL_KAFKA_JAR'), +"LOCAL_KAFKA_JAR environment var is not provided.") +@unittest.skipUnless( +os.environ.get('EXPANSION_JAR'), +"EXPANSION_JAR environment var is not provided.") +class CrossLanguageKafkaIOTest(unittest.TestCase): + def get_open_port(self): +s = None +try: + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) +except: # pylint: disable=bare-except + # Above call will fail for nodes that only support IPv6. + s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM) +s.bind(('localhost', 0)) +s.listen(1) +port = s.getsockname()[1] +s.close() +return port + + @contextlib.contextmanager + def local_services(self, expansion_service_jar_file, local_kafka_jar_file): +expansion_service_port = str(self.get_open_port()) +kafka_port = str(self.get_open_port()) +zookeeper_port = str(self.get_open_port()) + +expansion_server = None +kafka_server = None +try: + expansion_server = subprocess.Popen( Review comment: updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438468) Time Spent: 1h 10m (was: 1h) > adding cross-language KafkaIO integration test > -- > > Key: BEAM-10125 > URL: https://issues.apache.org/jira/browse/BEAM-10125 > Project: Beam >
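The test quoted above starts the expansion service and the self-contained Kafka jar as subprocesses and then sleeps for a fixed three seconds before connecting. A more robust pattern is to poll the port until something accepts a connection; a minimal stdlib sketch (the helper name and timeout are illustrative, not part of the Beam test code):

```python
import socket
import time


def wait_for_port(port, host='localhost', timeout=30.0):
    """Poll until host:port accepts TCP connections, or raise on timeout.

    Generic service-readiness check; a sketch, not Beam's implementation.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Succeeds as soon as the server socket is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.25)
    raise TimeoutError('nothing listening on %s:%d' % (host, port))
```

Replacing the fixed sleep with such a check makes the test both faster when the services come up quickly and less flaky on slow CI workers.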
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438466=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438466 ] ASF GitHub Bot logged work on BEAM-10125: - Author: ASF GitHub Bot Created on: 28/May/20 21:47 Start Date: 28/May/20 21:47 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11847: URL: https://github.com/apache/beam/pull/11847#discussion_r432142695 ## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py ## @@ -0,0 +1,145 @@ +"""Integration test for Python cross-language pipelines for Java KafkaIO.""" + +from __future__ import absolute_import + +import contextlib +import logging +import os +import socket +import subprocess +import time +import typing +import unittest + +import grpc + +import apache_beam as beam +from apache_beam.io.external.kafka import ReadFromKafka +from apache_beam.io.external.kafka import WriteToKafka +from apache_beam.metrics import Metrics +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.testing.test_pipeline import TestPipeline + + +class CrossLanguageKafkaIO(object): + def __init__(self, bootstrap_servers, topic, expansion_service=None): +self.bootstrap_servers = bootstrap_servers +self.topic = topic +self.expansion_service = expansion_service or ( +'localhost:%s' % os.environ.get('EXPANSION_PORT')) +self.sum_counter = Metrics.counter('source', 'elements_sum') + + def build_write_pipeline(self, pipeline): +_ = ( +pipeline +| 'Impulse' >> beam.Impulse() +| 'Generate' >> beam.FlatMap(lambda x: range(1000)) # pylint: disable=range-builtin-not-iterating +| 'Reshuffle' >> beam.Reshuffle() +| 'MakeKV' >> beam.Map(lambda x: + (b'', str(x).encode())).with_output_types( + typing.Tuple[bytes, bytes]) +| 'WriteToKafka' >> WriteToKafka( +producer_config={'bootstrap.servers': self.bootstrap_servers}, +topic=self.topic, +expansion_service=self.expansion_service)) + + def build_read_pipeline(self, pipeline): +_ = 
( +pipeline +| 'ReadFromKafka' >> ReadFromKafka( +consumer_config={ +'bootstrap.servers': self.bootstrap_servers, +'auto.offset.reset': 'earliest' +}, +topics=[self.topic], +expansion_service=self.expansion_service) +| 'Windowing' >> beam.WindowInto( +beam.window.FixedWindows(300), +trigger=beam.transforms.trigger.AfterProcessingTime(60), +accumulation_mode=beam.transforms.trigger.AccumulationMode. +DISCARDING) +| 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode())) +| 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults() +| 'SetSumCounter' >> beam.Map(self.sum_counter.inc)) + + def run_xlang_kafkaio(self, pipeline): +self.build_write_pipeline(pipeline) +self.build_read_pipeline(pipeline) +pipeline.run(False) + + +@unittest.skipUnless( +os.environ.get('LOCAL_KAFKA_JAR'), +"LOCAL_KAFKA_JAR environment var is not provided.") +@unittest.skipUnless( +os.environ.get('EXPANSION_JAR'), +"EXPANSION_JAR environment var is not provided.") +class CrossLanguageKafkaIOTest(unittest.TestCase): + def get_open_port(self): +s = None +try: + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) +except: # pylint: disable=bare-except + # Above call will fail for nodes that only support IPv6. + s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM) +s.bind(('localhost', 0)) +s.listen(1) +port = s.getsockname()[1] +s.close() +return port + + @contextlib.contextmanager + def local_services(self, expansion_service_jar_file, local_kafka_jar_file): +expansion_service_port = str(self.get_open_port()) +kafka_port = str(self.get_open_port()) +zookeeper_port = str(self.get_open_port()) + +expansion_server = None +kafka_server = None +try: + expansion_server = subprocess.Popen( + ['java', '-jar', expansion_service_jar_file, expansion_service_port]) + kafka_server = subprocess.Popen( Review comment: Yes, this is for external testing (probably only for small scale correctness tests, we may still need kubernetes cluster for large scale performance tests). 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438466)
[jira] [Work logged] (BEAM-10125) adding cross-language KafkaIO integration test
[ https://issues.apache.org/jira/browse/BEAM-10125?focusedWorklogId=438467=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438467 ] ASF GitHub Bot logged work on BEAM-10125: - Author: ASF GitHub Bot Created on: 28/May/20 21:47 Start Date: 28/May/20 21:47 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11847: URL: https://github.com/apache/beam/pull/11847#discussion_r432142814 ## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py ## @@ -0,0 +1,145 @@ +"""Integration test for Python cross-language pipelines for Java KafkaIO.""" + +from __future__ import absolute_import + +import contextlib +import logging +import os +import socket +import subprocess +import time +import typing +import unittest + +import grpc + +import apache_beam as beam +from apache_beam.io.external.kafka import ReadFromKafka +from apache_beam.io.external.kafka import WriteToKafka +from apache_beam.metrics import Metrics +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.testing.test_pipeline import TestPipeline + + +class CrossLanguageKafkaIO(object): + def __init__(self, bootstrap_servers, topic, expansion_service=None): +self.bootstrap_servers = bootstrap_servers +self.topic = topic +self.expansion_service = expansion_service or ( +'localhost:%s' % os.environ.get('EXPANSION_PORT')) +self.sum_counter = Metrics.counter('source', 'elements_sum') + + def build_write_pipeline(self, pipeline): +_ = ( +pipeline +| 'Impulse' >> beam.Impulse() +| 'Generate' >> beam.FlatMap(lambda x: range(1000)) # pylint: disable=range-builtin-not-iterating +| 'Reshuffle' >> beam.Reshuffle() +| 'MakeKV' >> beam.Map(lambda x: + (b'', str(x).encode())).with_output_types( + typing.Tuple[bytes, bytes]) +| 'WriteToKafka' >> WriteToKafka( +producer_config={'bootstrap.servers': self.bootstrap_servers}, +topic=self.topic, +expansion_service=self.expansion_service)) + + def build_read_pipeline(self, pipeline): +_ = 
( +pipeline +| 'ReadFromKafka' >> ReadFromKafka( +consumer_config={ +'bootstrap.servers': self.bootstrap_servers, +'auto.offset.reset': 'earliest' +}, +topics=[self.topic], +expansion_service=self.expansion_service) +| 'Windowing' >> beam.WindowInto( +beam.window.FixedWindows(300), +trigger=beam.transforms.trigger.AfterProcessingTime(60), +accumulation_mode=beam.transforms.trigger.AccumulationMode. +DISCARDING) +| 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode())) +| 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults() +| 'SetSumCounter' >> beam.Map(self.sum_counter.inc)) + + def run_xlang_kafkaio(self, pipeline): +self.build_write_pipeline(pipeline) +self.build_read_pipeline(pipeline) +pipeline.run(False) + + +@unittest.skipUnless( +os.environ.get('LOCAL_KAFKA_JAR'), +"LOCAL_KAFKA_JAR environment var is not provided.") +@unittest.skipUnless( +os.environ.get('EXPANSION_JAR'), +"EXPANSION_JAR environment var is not provided.") +class CrossLanguageKafkaIOTest(unittest.TestCase): + def get_open_port(self): +s = None +try: + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) +except: # pylint: disable=bare-except + # Above call will fail for nodes that only support IPv6. 
+ s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM) +s.bind(('localhost', 0)) +s.listen(1) +port = s.getsockname()[1] +s.close() +return port + + @contextlib.contextmanager + def local_services(self, expansion_service_jar_file, local_kafka_jar_file): +expansion_service_port = str(self.get_open_port()) +kafka_port = str(self.get_open_port()) +zookeeper_port = str(self.get_open_port()) + +expansion_server = None +kafka_server = None +try: + expansion_server = subprocess.Popen( + ['java', '-jar', expansion_service_jar_file, expansion_service_port]) + kafka_server = subprocess.Popen( + ['java', '-jar', local_kafka_jar_file, kafka_port, zookeeper_port]) + time.sleep(3) + channel_creds = grpc.local_channel_credentials() + with grpc.secure_channel('localhost:%s' % expansion_service_port, + channel_creds) as channel: +grpc.channel_ready_future(channel).result() + + yield expansion_service_port, kafka_port +finally: + if expansion_server: +expansion_server.kill() + if kafka_server: +kafka_server.kill() + + def get_options(self): +options = PipelineOptions([ +
[jira] [Work logged] (BEAM-9198) BeamSQL aggregation analytics functionality
[ https://issues.apache.org/jira/browse/BEAM-9198?focusedWorklogId=438465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438465 ] ASF GitHub Bot logged work on BEAM-9198: Author: ASF GitHub Bot Created on: 28/May/20 21:44 Start Date: 28/May/20 21:44 Worklog Time Spent: 10m Work Description: jhnmora000 closed pull request #11845: URL: https://github.com/apache/beam/pull/11845 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438465) Time Spent: 50m (was: 40m) > BeamSQL aggregation analytics functionality > > > Key: BEAM-9198 > URL: https://issues.apache.org/jira/browse/BEAM-9198 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: John Mora >Priority: P2 > Labels: gsoc, gsoc2020, mentor > Time Spent: 50m > Remaining Estimate: 0h > > Mentor email: ruw...@google.com. Feel free to send emails for your questions. > Project Information > - > BeamSQL has a long list of aggregation/aggregation analytics > functionalities to support. > To begin with, you will need to support this syntax: > {code:sql} > analytic_function_name ( [ argument_list ] ) > OVER ( > [ PARTITION BY partition_expression_list ] > [ ORDER BY expression [{ ASC | DESC }] [, ...] ] > [ window_frame_clause ] > ) > {code} > As there is a long list of analytics functions, a good starting point is to support > rank() first. > This will require touching core components of BeamSQL: > 1. SQL parser to support the syntax above. > 2. SQL core to implement physical relational operator. > 3. Distributed algorithms to implement a list of functions in a distributed > manner. > 4. Enable in ZetaSQL dialect. 
> To understand what SQL analytics functionality is, you could check this great > explanation doc: > https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts. > To know about Beam's programming model, check: > https://beam.apache.org/documentation/programming-guide/#overview -- This message was sent by Atlassian Jira (v8.3.4#803005)
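The semantics that rank() must implement can be pinned down outside any SQL engine: tied values share a rank, and the next distinct value's rank jumps ahead by the size of the tie group. A plain-Python sketch of SQL RANK() over a single partition (an illustration only, not Beam or Calcite code):

```python
from itertools import groupby

def sql_rank(values, descending=True):
    # SQL RANK() within one partition: tied values share a rank, and the
    # next distinct value's rank skips by the size of the tie group.
    ordered = sorted(values, reverse=descending)
    ranks = {}
    position = 1
    for value, ties in groupby(ordered):
        ranks[value] = position
        position += len(list(ties))
    return [ranks[v] for v in values]

print(sql_rank([90, 80, 90]))  # [1, 3, 1]: the tie at 90 consumes ranks 1 and 2
```

Implementing this in a distributed manner is the hard part the issue alludes to: the sort-within-partition step maps naturally onto a GroupByKey on the PARTITION BY key followed by a per-key sort on the ORDER BY key.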
[jira] [Work logged] (BEAM-9198) BeamSQL aggregation analytics functionality
[ https://issues.apache.org/jira/browse/BEAM-9198?focusedWorklogId=438464=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438464 ] ASF GitHub Bot logged work on BEAM-9198: Author: ASF GitHub Bot Created on: 28/May/20 21:44 Start Date: 28/May/20 21:44 Worklog Time Spent: 10m Work Description: jhnmora000 commented on pull request #11845: URL: https://github.com/apache/beam/pull/11845#issuecomment-635624778 Thanks for your help @amaliujia . I will close this PR and continue experimenting with BeamSQL/Calcite. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438464) Time Spent: 40m (was: 0.5h) > BeamSQL aggregation analytics functionality > > > Key: BEAM-9198 > URL: https://issues.apache.org/jira/browse/BEAM-9198 > Project: Beam > Issue Type: New Feature > Components: dsl-sql >Reporter: Rui Wang >Assignee: John Mora >Priority: P2 > Labels: gsoc, gsoc2020, mentor > Time Spent: 40m > Remaining Estimate: 0h > > Mentor email: ruw...@google.com. Feel free to send emails for your questions. > Project Information > - > BeamSQL has a long list of aggregation/aggregation analytics > functionalities to support. > To begin with, you will need to support this syntax: > {code:sql} > analytic_function_name ( [ argument_list ] ) > OVER ( > [ PARTITION BY partition_expression_list ] > [ ORDER BY expression [{ ASC | DESC }] [, ...] ] > [ window_frame_clause ] > ) > {code} > As there is a long list of analytics functions, a good starting point is to support > rank() first. > This will require touching core components of BeamSQL: > 1. SQL parser to support the syntax above. > 2. SQL core to implement the physical relational operator. > 3. Distributed algorithms to implement a list of functions in a distributed > manner. > 4. Enable in ZetaSQL dialect. > To understand what SQL analytics functionality is, you could check this great > explanation doc: > https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts. > To know about Beam's programming model, check: > https://beam.apache.org/documentation/programming-guide/#overview -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9876) Migrate the Beam website from Jekyll to Hugo to enable localization of the site content
[ https://issues.apache.org/jira/browse/BEAM-9876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aizhamal Nurmamat kyzy resolved BEAM-9876. -- Fix Version/s: 2.21.0 Resolution: Fixed > Migrate the Beam website from Jekyll to Hugo to enable localization of the > site content > --- > > Key: BEAM-9876 > URL: https://issues.apache.org/jira/browse/BEAM-9876 > Project: Beam > Issue Type: Task > Components: website >Reporter: Aizhamal Nurmamat kyzy >Assignee: Aizhamal Nurmamat kyzy >Priority: P2 > Fix For: 2.21.0 > > Time Spent: 12.5h > Remaining Estimate: 0h > > Enable internationalization of the Apache Beam website to increase the reach > of the project, and facilitate adoption and growth of its community. > The proposal was to do this by migrating the current Apache Beam website from > Jekyll to Hugo [1]. Hugo supports internationalization out-of-the-box, making > it easier for both contributors and maintainers to support the > internationalization effort. > Further discussion on the implementation can be viewed here [2] > [1] > [https://lists.apache.org/thread.html/rfab4cc1411318c3f4667bee051df68f37be11846ada877f3576c41a9%40%3Cdev.beam.apache.org%3E] > [2] > [https://lists.apache.org/thread.html/r6b999b6d7d1f6cbb94e16bb2deed2b65098a6b14c4ac98707fe0c36a%40%3Cdev.beam.apache.org%3E] > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns
[ https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438462 ] ASF GitHub Bot logged work on BEAM-10097: - Author: ASF GitHub Bot Created on: 28/May/20 21:35 Start Date: 28/May/20 21:35 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635620991 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438462) Time Spent: 2h 40m (was: 2.5h) > Migrate PCollection views to use both iterable and multimap > materializations/access patterns > > > Key: BEAM-10097 > URL: https://issues.apache.org/jira/browse/BEAM-10097 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, sdk-java-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: P2 > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently all the PCollection views have a trivial mapping from KV<K, > Iterable<V>> to the view that is being requested (singleton, iterable, list, > map, multimap). > We should be using the primitive views (iterable, multimap) directly without > going through the naive mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns
[ https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438463=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438463 ] ASF GitHub Bot logged work on BEAM-10097: - Author: ASF GitHub Bot Created on: 28/May/20 21:35 Start Date: 28/May/20 21:35 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635621052 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438463) Time Spent: 2h 50m (was: 2h 40m) > Migrate PCollection views to use both iterable and multimap > materializations/access patterns > > > Key: BEAM-10097 > URL: https://issues.apache.org/jira/browse/BEAM-10097 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, sdk-java-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: P2 > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently all the PCollection views have a trivial mapping from KV<K, > Iterable<V>> to the view that is being requested (singleton, iterable, list, > map, multimap). > We should be using the primitive views (iterable, multimap) directly without > going through the naive mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns
[ https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438461 ] ASF GitHub Bot logged work on BEAM-10097: - Author: ASF GitHub Bot Created on: 28/May/20 21:35 Start Date: 28/May/20 21:35 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635620885 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438461) Time Spent: 2.5h (was: 2h 20m) > Migrate PCollection views to use both iterable and multimap > materializations/access patterns > > > Key: BEAM-10097 > URL: https://issues.apache.org/jira/browse/BEAM-10097 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, sdk-java-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: P2 > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently all the PCollection views have a trivial mapping from KV<K, > Iterable<V>> to the view that is being requested (singleton, iterable, list, > map, multimap). > We should be using the primitive views (iterable, multimap) directly without > going through the naive mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)
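The two primitive access patterns named in the issue (iterable and multimap) and the derived view shapes on top of them can be pictured with ordinary containers. A hypothetical plain-Python sketch, not Beam's actual SDK types, of deriving each requested view from a stream of KV pairs:

```python
def as_multimap(kvs):
    # The multimap materialization: every value reachable by its key.
    mm = {}
    for k, v in kvs:
        mm.setdefault(k, []).append(v)
    return mm

def as_iterable(kvs):
    # The iterable materialization: all values, keys ignored.
    return [v for _, v in kvs]

def as_map(kvs):
    # Map view on top of the multimap: each key must map to one value.
    mm = as_multimap(kvs)
    bad = [k for k, vs in mm.items() if len(vs) != 1]
    if bad:
        raise ValueError('non-unique keys: %r' % bad)
    return {k: vs[0] for k, vs in mm.items()}

def as_singleton(kvs):
    # Singleton view on top of the iterable: exactly one element allowed.
    vals = as_iterable(kvs)
    if len(vals) != 1:
        raise ValueError('expected 1 element, got %d' % len(vals))
    return vals[0]

pairs = [('a', 1), ('b', 2), ('a', 3)]
print(as_multimap(pairs))  # {'a': [1, 3], 'b': [2]}
print(as_iterable(pairs))  # [1, 2, 3]
```

The point of the issue is that singleton, list, and map are cheap projections of the two primitives, so runners only need to materialize iterable and multimap.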
[jira] [Work logged] (BEAM-9935) Resolve differences in allowed_split_point implementations
[ https://issues.apache.org/jira/browse/BEAM-9935?focusedWorklogId=438460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438460 ] ASF GitHub Bot logged work on BEAM-9935: Author: ASF GitHub Bot Created on: 28/May/20 21:31 Start Date: 28/May/20 21:31 Worklog Time Spent: 10m Work Description: youngoli merged pull request #11791: URL: https://github.com/apache/beam/pull/11791 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438460) Time Spent: 3h 20m (was: 3h 10m) > Resolve differences in allowed_split_point implementations > -- > > Key: BEAM-9935 > URL: https://issues.apache.org/jira/browse/BEAM-9935 > Project: Beam > Issue Type: Bug > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Luke Cwik >Assignee: Daniel Oliveira >Priority: P2 > Time Spent: 3h 20m > Remaining Estimate: 0h > > [Java SDK > harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223] > doesn't support it yet, which is also safe. > [Go SDK > harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273] > only supports splits if points are specified and it doesn't use the fraction > at all. > [Python SDK > harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947] > ignores the split points, meaning that it may return an invalid split > location based upon the runner's limitations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
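The disagreement between harnesses is about how a runner-supplied progress fraction should interact with a set of allowed split points. A simplified, hypothetical sketch of honoring both at once (not any harness's actual implementation; the rounding and snap-forward policies here are illustrative assumptions):

```python
def choose_split(current_index, stop_index, fraction, allowed_splits=None):
    # Translate a progress fraction over the remaining elements into the
    # first index of the residual, then snap forward to the first allowed
    # split point at or after it; return None when no legal split remains.
    desired = current_index + 1 + round(fraction * (stop_index - current_index - 1))
    if allowed_splits is not None:
        legal = [p for p in sorted(allowed_splits)
                 if current_index < p < stop_index and p >= desired]
        return legal[0] if legal else None
    if current_index < desired < stop_index:
        return desired
    return None

print(choose_split(-1, 10, 0.5))                         # 5
print(choose_split(-1, 10, 0.5, allowed_splits=[2, 8]))  # 8
print(choose_split(-1, 10, 0.5, allowed_splits=[2]))     # None
```

Snapping forward rather than backward is the conservative choice: splitting before the desired index could hand back work the SDK has already started, which is exactly the kind of invalid split the issue describes.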
[jira] [Work logged] (BEAM-10144) Update pipeline options snippets for best practices
[ https://issues.apache.org/jira/browse/BEAM-10144?focusedWorklogId=438458=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438458 ] ASF GitHub Bot logged work on BEAM-10144: - Author: ASF GitHub Bot Created on: 28/May/20 21:29 Start Date: 28/May/20 21:29 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #11851: URL: https://github.com/apache/beam/pull/11851#issuecomment-635618216 Run Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438458) Time Spent: 1h 10m (was: 1h) > Update pipeline options snippets for best practices > --- > > Key: BEAM-10144 > URL: https://issues.apache.org/jira/browse/BEAM-10144 > Project: Beam > Issue Type: Improvement > Components: examples-python >Reporter: David Cavazos >Assignee: David Cavazos >Priority: P3 > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging
[ https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438453 ] ASF GitHub Bot logged work on BEAM-10078: - Author: ASF GitHub Bot Created on: 28/May/20 21:23 Start Date: 28/May/20 21:23 Worklog Time Spent: 10m Work Description: TheNeuralBit commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635615410 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438453) Time Spent: 2h (was: 1h 50m) > uniquify Dataflow specific jars when staging > > > Key: BEAM-10078 > URL: https://issues.apache.org/jira/browse/BEAM-10078 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 2h > Remaining Estimate: 0h > > After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) > could be overwritten when two or more jobs share the same staging location. > Since they 1) should have specific predefined names AND 2) should have a unique > location to avoid collisions, they need special handling when staging > artifacts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9935) Resolve differences in allowed_split_point implementations
[ https://issues.apache.org/jira/browse/BEAM-9935?focusedWorklogId=438452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438452 ] ASF GitHub Bot logged work on BEAM-9935: Author: ASF GitHub Bot Created on: 28/May/20 21:22 Start Date: 28/May/20 21:22 Worklog Time Spent: 10m Work Description: youngoli commented on pull request #11791: URL: https://github.com/apache/beam/pull/11791#issuecomment-635615190 Run Go PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438452) Time Spent: 3h 10m (was: 3h) > Resolve differences in allowed_split_point implementations > -- > > Key: BEAM-9935 > URL: https://issues.apache.org/jira/browse/BEAM-9935 > Project: Beam > Issue Type: Bug > Components: sdk-go, sdk-java-harness, sdk-py-harness >Reporter: Luke Cwik >Assignee: Daniel Oliveira >Priority: P2 > Time Spent: 3h 10m > Remaining Estimate: 0h > > [Java SDK > harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/BeamFnDataReadRunner.java#L223] > doesn't support it yet, which is also safe. > [Go SDK > harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/go/pkg/beam/core/runtime/exec/datasource.go#L273] > only supports splits if points are specified and it doesn't use the fraction > at all. > [Python SDK > harness|https://github.com/apache/beam/blob/d82d061aa303430f3d2853f397f3130fae6200cd/sdks/python/apache_beam/runners/worker/bundle_processor.py#L947] > ignores the split points, meaning that it may return an invalid split > location based upon the runner's limitations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10097) Migrate PCollection views to use both iterable and multimap materializations/access patterns
[ https://issues.apache.org/jira/browse/BEAM-10097?focusedWorklogId=438450=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438450 ] ASF GitHub Bot logged work on BEAM-10097: - Author: ASF GitHub Bot Created on: 28/May/20 21:19 Start Date: 28/May/20 21:19 Worklog Time Spent: 10m Work Description: lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635613803 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438450) Time Spent: 2h 20m (was: 2h 10m) > Migrate PCollection views to use both iterable and multimap > materializations/access patterns > > > Key: BEAM-10097 > URL: https://issues.apache.org/jira/browse/BEAM-10097 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core, sdk-java-harness >Reporter: Luke Cwik >Assignee: Luke Cwik >Priority: P2 > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently all the PCollection views have a trivial mapping from KV<K, > Iterable<V>> to the view that is being requested (singleton, iterable, list, > map, multimap). > We should be using the primitive views (iterable, multimap) directly without > going through the naive mapping. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-8910) Use AVRO instead of JSON in BigQuery bounded source.
[ https://issues.apache.org/jira/browse/BEAM-8910?focusedWorklogId=438440=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438440 ] ASF GitHub Bot logged work on BEAM-8910: Author: ASF GitHub Bot Created on: 28/May/20 21:05 Start Date: 28/May/20 21:05 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #11086: URL: https://github.com/apache/beam/pull/11086#discussion_r432112443 ## File path: sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py ## @@ -254,11 +256,36 @@ def test_big_query_new_types(self): 'output_schema': NEW_TYPES_OUTPUT_SCHEMA, 'use_standard_sql': False, 'wait_until_finish_duration': WAIT_UNTIL_FINISH_DURATION_MS, +'use_json_exports': True, 'on_success_matcher': all_of(*pipeline_verifiers) } options = self.test_pipeline.get_full_options_as_args(**extra_opts) big_query_query_to_table_pipeline.run_bq_pipeline(options) + @attr('IT') + def test_big_query_new_types(self): Review comment: Looks like we run this test internally using this name. Since this test now changes to `use_beam_bq_sink`, please check that we maintain reasonable internal test coverage for the native sink, or use a new test name for the new behavior. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438440) Time Spent: 14h 20m (was: 14h 10m) > Use AVRO instead of JSON in BigQuery bounded source. 
> > > Key: BEAM-8910 > URL: https://issues.apache.org/jira/browse/BEAM-8910 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Kamil Wasilewski >Assignee: Pablo Estrada >Priority: P3 > Time Spent: 14h 20m > Remaining Estimate: 0h > > The proposed BigQuery bounded source in Python SDK (see PR: > [https://github.com/apache/beam/pull/9772]) uses a BigQuery export job to > take a snapshot of the table and read from each produced JSON file. A > performance improvement can be gained by switching to AVRO instead. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9852) C-ares status is not ARES_SUCCESS: Misformatted domain name
[ https://issues.apache.org/jira/browse/BEAM-9852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119077#comment-17119077 ] Kyle Weaver commented on BEAM-9852: --- The error message was improved in GRPC, so hopefully we will get more debugging information in the future. https://github.com/grpc/grpc/pull/22865 In the meantime, it's difficult to ascertain the root cause. > C-ares status is not ARES_SUCCESS: Misformatted domain name > --- > > Key: BEAM-9852 > URL: https://issues.apache.org/jira/browse/BEAM-9852 > Project: Beam > Issue Type: Sub-task > Components: runner-flink, runner-spark >Reporter: Kyle Weaver >Priority: P2 > Labels: portability-flink, portability-spark > > This affects both Flink and Spark portable runners. It does not appear to > cause pipelines to fail. > Exception in thread read_grpc_client_inputs: > Traceback (most recent call last): > File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner > self.run() > File "/usr/local/lib/python3.7/threading.py", line 870, in run > self._target(*self._args, **self._kwargs) > File > "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 545, in > target=lambda: self._read_inputs(elements_iterator), > File > "/usr/local/lib/python3.7/site-packages/apache_beam/runners/worker/data_plane.py", > line 528, in _read_inputs > for elements in elements_iterator: > File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 388, in > __next__ > return self._next() > File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 365, in > _next > raise self > grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with: > status = StatusCode.UNAVAILABLE > details = "DNS resolution failed" > debug_error_string = > "{"created":"@1587426512.443144965","description":"Failed to pick > 
subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3876,"referenced_errors":[{"created":"@1587426512.443142363","description":"Resolver > transient > failure","file":"src/core/ext/filters/client_channel/resolving_lb_policy.cc","file_line":263,"referenced_errors":[{"created":"@1587426512.443141313","description":"DNS > resolution > failed","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/dns_resolver_ares.cc","file_line":357,"grpc_status":14,"referenced_errors":[{"created":"@1587426512.443136986","description":"C-ares > status is not ARES_SUCCESS: Misformatted domain > name","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc","file_line":244,"referenced_errors":[ > {"created":"@1587426512.443126564","description":"C-ares status is not > ARES_SUCCESS: Misformatted domain > name","file":"src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.cc","file_line":244} > ]}]}]}]}" > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
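The debug_error_string in logs like the one above is nested JSON, and the useful signal (here, the C-ares failure) sits at the innermost level. A small sketch of pulling out just the leaf descriptions, assuming only that the payload keeps gRPC's description/referenced_errors layout:

```python
import json

def leaf_descriptions(debug_error_string):
    # Walk gRPC's nested debug_error_string JSON and collect the
    # descriptions of errors that have no further referenced_errors.
    def walk(node):
        children = node.get('referenced_errors', [])
        if not children and 'description' in node:
            yield node['description']
        for child in children:
            yield from walk(child)
    return list(walk(json.loads(debug_error_string)))

# A trimmed-down stand-in for the payload quoted above.
sample = ('{"description":"Failed to pick subchannel","referenced_errors":'
          '[{"description":"DNS resolution failed","referenced_errors":'
          '[{"description":"C-ares status is not ARES_SUCCESS"}]}]}')
print(leaf_descriptions(sample))  # ['C-ares status is not ARES_SUCCESS']
```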
[jira] [Work logged] (BEAM-7774) Stop using Perfkit Benchmarker tool in Python performance tests
[ https://issues.apache.org/jira/browse/BEAM-7774?focusedWorklogId=438426=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438426 ] ASF GitHub Bot logged work on BEAM-7774: Author: ASF GitHub Bot Created on: 28/May/20 20:36 Start Date: 28/May/20 20:36 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #11661: URL: https://github.com/apache/beam/pull/11661#issuecomment-635593321 @kamilwu - please merge once this looks good to you, I don't have other input here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438426) Time Spent: 5h 50m (was: 5h 40m) > Stop using Perfkit Benchmarker tool in Python performance tests > --- > > Key: BEAM-7774 > URL: https://issues.apache.org/jira/browse/BEAM-7774 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Lukasz Gajowy >Assignee: Piotr Szuberski >Priority: P2 > Time Spent: 5h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7774) Stop using Perfkit Benchmarker tool in Python performance tests
[ https://issues.apache.org/jira/browse/BEAM-7774?focusedWorklogId=438425=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438425 ] ASF GitHub Bot logged work on BEAM-7774: Author: ASF GitHub Bot Created on: 28/May/20 20:34 Start Date: 28/May/20 20:34 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r432107251 ## File path: .test-infra/metrics/grafana/dashboards/perftests_metrics/Python_Performance_Tests.json ## @@ -77,7 +77,7 @@ ], "orderByTime": "ASC", "policy": "default", - "query": "SELECT mean(\"value\") FROM \"wordcount_py27_results\" WHERE metric = 'Python performance test' AND $timeFilter GROUP BY time($__interval), \"metric\"", + "query": "SELECT mean(\"value\") FROM \"wordcount_py27_results\" WHERE metric = 'wordcount_it_runtime' AND $timeFilter GROUP BY time($__interval), \"metric\"", Review comment: Seeing this query now - yes, I would just keep the metric 'runtime', since we already know it is wordcount_py27_results, and it would be simpler if the pipeline does not need to know the name of the suite. In the future we might add different metrics like 'cost' or total cputime consumed by other workers as opposed to runtime. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438425) Time Spent: 5h 40m (was: 5.5h) > Stop using Perfkit Benchmarker tool in Python performance tests > --- > > Key: BEAM-7774 > URL: https://issues.apache.org/jira/browse/BEAM-7774 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Lukasz Gajowy >Assignee: Piotr Szuberski >Priority: P2 > Time Spent: 5h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging
[ https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438424=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438424 ] ASF GitHub Bot logged work on BEAM-10078: - Author: ASF GitHub Bot Created on: 28/May/20 20:32 Start Date: 28/May/20 20:32 Worklog Time Spent: 10m Work Description: chamikaramj commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635591331 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438424) Time Spent: 1h 50m (was: 1h 40m) > uniquify Dataflow specific jars when staging > > > Key: BEAM-10078 > URL: https://issues.apache.org/jira/browse/BEAM-10078 > Project: Beam > Issue Type: Bug > Components: runner-dataflow >Reporter: Heejong Lee >Assignee: Heejong Lee >Priority: P2 > Fix For: 2.22.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) > could be overwritten when two or more jobs share the same staging location. > Since they 1) should have specific predefined names AND 2) should have a unique > location to avoid collisions, they need special handling when staging > artifacts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
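One way to satisfy both constraints in the issue description (a fixed, predefined file name but a unique location per job) is to stage each jar under a per-job subdirectory. A hypothetical sketch; the hashing scheme and path layout are illustrative assumptions, not the actual Dataflow staging code:

```python
import hashlib

def unique_staging_path(staging_root, job_name, file_name):
    # Keep the required file name (e.g. dataflow-worker.jar) but place it
    # in a job-derived subdirectory so concurrent jobs sharing the same
    # staging_root cannot overwrite each other's copy.
    token = hashlib.sha256(job_name.encode('utf-8')).hexdigest()[:12]
    return '%s/%s/%s' % (staging_root.rstrip('/'), token, file_name)

p1 = unique_staging_path('gs://bucket/staging', 'job-a', 'dataflow-worker.jar')
p2 = unique_staging_path('gs://bucket/staging', 'job-b', 'dataflow-worker.jar')
print(p1 != p2)  # True: same required file name, distinct locations
```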
[jira] [Work logged] (BEAM-7774) Stop using Perfkit Benchmarker tool in Python performance tests
[ https://issues.apache.org/jira/browse/BEAM-7774?focusedWorklogId=438423=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438423 ] ASF GitHub Bot logged work on BEAM-7774: Author: ASF GitHub Bot Created on: 28/May/20 20:31 Start Date: 28/May/20 20:31 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r432106093 ## File path: sdks/python/apache_beam/examples/wordcount_it_test.py ## @@ -84,11 +87,45 @@ def _run_wordcount_it(self, run_wordcount, **opts): # Register clean up before pipeline execution self.addCleanup(delete_files, [test_output + '*']) +publish_to_bq = bool( +test_pipeline.get_option('publish_to_big_query') or False) + +# Start measure time for performance test +start_time = time.time() + # Get pipeline options from command argument: --test-pipeline-options, # and start pipeline job by calling pipeline main function. run_wordcount( test_pipeline.get_full_options_as_args(**extra_opts), -save_main_session=False) +save_main_session=False, +) + +end_time = time.time() +run_time = end_time - start_time + +if publish_to_bq: + self._publish_metrics(test_pipeline, run_time) + + def _publish_metrics(self, pipeline, metric_value): +influx_options = InfluxDBMetricsPublisherOptions( +pipeline.get_option('influx_measurement'), +pipeline.get_option('influx_db_name'), +pipeline.get_option('influx_hostname'), +os.getenv('INFLUXDB_USER'), +os.getenv('INFLUXDB_USER_PASSWORD'), +) +metric_reader = MetricsReader( +project_name=pipeline.get_option('project'), +bq_table=pipeline.get_option('metrics_table'), +bq_dataset=pipeline.get_option('metrics_dataset'), +publish_to_bq=True, +influxdb_options=influx_options, +) + +metric_reader.publish_values(( +metric_value, Review comment: Do we need "wordcount_it" in the metric name? 
It depends on how these metrics will be stored; if they are already associated in the database with a test suite that launches the pipeline `Python WordCount IT Benchmarks`, then this information is captured and perhaps we don't need to repeat it in two different places. Leaving this up to you and @kamilwu. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438423) Time Spent: 5.5h (was: 5h 20m) > Stop using Perfkit Benchmarker tool in Python performance tests > --- > > Key: BEAM-7774 > URL: https://issues.apache.org/jira/browse/BEAM-7774 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Lukasz Gajowy >Assignee: Piotr Szuberski >Priority: P2 > Time Spent: 5.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
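The start_time/end_time bookkeeping in the diff under review can be factored into a small context manager, which keeps the measurement out of the test body and records the elapsed time even if the pipeline call raises. A sketch, with the run_wordcount call replaced by a sleep stand-in:

```python
import time
from contextlib import contextmanager

@contextmanager
def measured_runtime(result):
    # Record elapsed wall-clock seconds into result['runtime'] on exit,
    # even when the measured block raises.
    start = time.time()
    try:
        yield
    finally:
        result['runtime'] = time.time() - start

metrics = {}
with measured_runtime(metrics):
    time.sleep(0.01)  # stand-in for run_wordcount(...)
print(metrics['runtime'] > 0)  # True
```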
[jira] [Commented] (BEAM-9082) "Socket closed" Spurious GRPC errors in Flink/Spark runner log output
[ https://issues.apache.org/jira/browse/BEAM-9082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119056#comment-17119056 ] Kyle Weaver commented on BEAM-9082: --- Thanks Daniel. FYI I filed a separate issue for 1. (BEAM-9852). > "Socket closed" Spurious GRPC errors in Flink/Spark runner log output > - > > Key: BEAM-9082 > URL: https://issues.apache.org/jira/browse/BEAM-9082 > Project: Beam > Issue Type: Sub-task > Components: runner-flink, runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: P2 > Labels: portability-flink, portability-spark > > We often see "Socket closed" errors on job shutdown, even though the pipeline > has finished successfully. They are misleading and especially annoying at > scale. > ERROR:root:Failed to read inputs in the data plane. > ... > grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that > terminated with: > status = StatusCode.UNAVAILABLE > details = "Socket closed" > debug_error_string = > "{"created":"@1578597616.309419460","description":"Error received from peer > ipv6:[::1]:37211","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Socket > closed","grpc_status":14}" -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-9220) Add use_runner_v2 argument for dataflow
[ https://issues.apache.org/jira/browse/BEAM-9220?focusedWorklogId=438422&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438422 ] ASF GitHub Bot logged work on BEAM-9220: Author: ASF GitHub Bot Created on: 28/May/20 20:29 Start Date: 28/May/20 20:29 Worklog Time Spent: 10m Work Description: lostluck merged pull request #11207: URL: https://github.com/apache/beam/pull/11207 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438422) Time Spent: 4h (was: 3h 50m) > Add use_runner_v2 argument for dataflow > --- > > Key: BEAM-9220 > URL: https://issues.apache.org/jira/browse/BEAM-9220 > Project: Beam > Issue Type: New Feature > Components: runner-dataflow > Reporter: Ankur Goenka > Priority: P2 > Fix For: 2.20.0 > > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
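The merged change above wires a convenience flag through to Dataflow's experiments list. A hedged sketch of the general idea (the helper name and behavior are assumptions for illustration, not Beam's actual pipeline-options code): setting `use_runner_v2` amounts to ensuring the corresponding experiment string is present exactly once.

```java
import java.util.ArrayList;
import java.util.List;

public class RunnerV2Flag {
    // Hypothetical helper: return a copy of the experiments list that
    // is guaranteed to contain "use_runner_v2" exactly once.
    static List<String> withRunnerV2(List<String> experiments) {
        List<String> out =
                experiments == null ? new ArrayList<>() : new ArrayList<>(experiments);
        if (!out.contains("use_runner_v2")) {
            out.add("use_runner_v2");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(withRunnerV2(List.of("beam_fn_api")));
        // prints [beam_fn_api, use_runner_v2]
    }
}
```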
[jira] [Work logged] (BEAM-10027) Support for Kotlin-based Beam Katas
[ https://issues.apache.org/jira/browse/BEAM-10027?focusedWorklogId=438421&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438421 ] ASF GitHub Bot logged work on BEAM-10027: - Author: ASF GitHub Bot Created on: 28/May/20 20:28 Start Date: 28/May/20 20:28 Worklog Time Spent: 10m Work Description: rionmonster commented on a change in pull request #11761: URL: https://github.com/apache/beam/pull/11761#discussion_r432104332 ## File path: learning/katas/kotlin/Windowing/Fixed Time Window/Fixed Time Window/test/org/apache/beam/learning/katas/windowing/fixedwindow/WindowedEvent.kt ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.learning.katas.windowing.fixedwindow + +import java.io.Serializable +import java.util.* + +class WindowedEvent(private val event: String?, private val count: Long, private val window: String) : Serializable { Review comment: Done! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438421) Time Spent: 11h 20m (was: 11h 10m) > Support for Kotlin-based Beam Katas > --- > > Key: BEAM-10027 > URL: https://issues.apache.org/jira/browse/BEAM-10027 > Project: Beam > Issue Type: Improvement > Components: katas > Reporter: Rion Williams > Assignee: Rion Williams > Priority: P2 > Original Estimate: 8h > Time Spent: 11h 20m > Remaining Estimate: 0h > > Currently, there are a series of examples available demonstrating the use of > Apache Beam with Kotlin. It would be nice to have support for the same Beam > Katas that exist for Python, Go, and Java to also support Kotlin. > The port itself shouldn't be that involved since it can still target the JVM, > so it would likely just require the inclusion of Kotlin dependencies and a > conversion of all of the existing Java examples. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10027) Support for Kotlin-based Beam Katas
[ https://issues.apache.org/jira/browse/BEAM-10027?focusedWorklogId=438420&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438420 ] ASF GitHub Bot logged work on BEAM-10027: - Author: ASF GitHub Bot Created on: 28/May/20 20:25 Start Date: 28/May/20 20:25 Worklog Time Spent: 10m Work Description: rionmonster commented on a change in pull request #11761: URL: https://github.com/apache/beam/pull/11761#discussion_r432103074 ## File path: learning/katas/kotlin/Core Transforms/Combine/CombineFn/src/org/apache/beam/learning/katas/coretransforms/combine/combinefn/Task.kt ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.learning.katas.coretransforms.combine.combinefn + +import org.apache.beam.learning.katas.coretransforms.combine.combinefn.Task.AverageFn.Accum +import org.apache.beam.learning.katas.util.Log +import org.apache.beam.sdk.Pipeline +import org.apache.beam.sdk.options.PipelineOptionsFactory +import org.apache.beam.sdk.transforms.Combine +import org.apache.beam.sdk.transforms.Combine.CombineFn +import org.apache.beam.sdk.transforms.Create +import org.apache.beam.sdk.values.PCollection +import java.io.Serializable +import java.util.* + +object Task { +@JvmStatic +fun main(args: Array<String>) { +val options = PipelineOptionsFactory.fromArgs(*args).create() +val pipeline = Pipeline.create(options) + +val numbers = pipeline.apply(Create.of(10, 20, 50, 70, 90)) + +val output = applyTransform(numbers) + +output.apply(Log.ofElements()) + +pipeline.run() +} + +@JvmStatic +fun applyTransform(input: PCollection<Int>): PCollection<Double> { +return input.apply(Combine.globally(AverageFn())) +} + +internal class AverageFn : CombineFn<Int, Accum, Double>() { + +internal inner class Accum : Serializable { Review comment: Done ## File path: learning/katas/kotlin/util/test/org/apache/beam/learning/katas/util/ContainsKvs.kt ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.learning.katas.util + +import com.google.common.collect.ImmutableList +import com.google.common.collect.Iterables +import org.apache.beam.sdk.transforms.SerializableFunction +import org.apache.beam.sdk.values.KV +import org.hamcrest.CoreMatchers +import org.hamcrest.Matcher +import org.hamcrest.collection.IsIterableContainingInAnyOrder +import org.junit.Assert +import java.util.* + +class ContainsKvs private constructor(private val expectedKvs: List<KV<String, Iterable<String>>>) : SerializableFunction<Iterable<KV<String, Iterable<String>>>, Void?> { + +companion object { +@SafeVarargs +fun containsKvs(vararg kvs: KV<String, Iterable<String>>): SerializableFunction<Iterable<KV<String, Iterable<String>>>, Void?> { +return ContainsKvs(ImmutableList.copyOf(kvs)) +} +} + +override fun apply(input: Iterable<KV<String, Iterable<String>>>): Void? { +val matchers: MutableList<Matcher<in KV<String, Iterable<String>>>> = ArrayList() + +for (expected in expectedKvs) { +val values = Iterables.toArray(expected.value, String::class.java) + matchers.add(KvMatcher.Companion.isKv(CoreMatchers.equalTo(expected.key), IsIterableContainingInAnyOrder.containsInAnyOrder(*values))) Review comment: Done! This is an automated message from the Apache Git Service.
[jira] [Work logged] (BEAM-10144) Update pipeline options snippets for best practices
[ https://issues.apache.org/jira/browse/BEAM-10144?focusedWorklogId=438419&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438419 ] ASF GitHub Bot logged work on BEAM-10144: - Author: ASF GitHub Bot Created on: 28/May/20 20:24 Start Date: 28/May/20 20:24 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #11851: URL: https://github.com/apache/beam/pull/11851#issuecomment-635587627 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438419) Time Spent: 1h (was: 50m) > Update pipeline options snippets for best practices > --- > > Key: BEAM-10144 > URL: https://issues.apache.org/jira/browse/BEAM-10144 > Project: Beam > Issue Type: Improvement > Components: examples-python > Reporter: David Cavazos > Assignee: David Cavazos > Priority: P3 > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-7774) Stop usign Perfkit Benchmarker tool in Python performance tests
[ https://issues.apache.org/jira/browse/BEAM-7774?focusedWorklogId=438418&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438418 ] ASF GitHub Bot logged work on BEAM-7774: Author: ASF GitHub Bot Created on: 28/May/20 20:22 Start Date: 28/May/20 20:22 Worklog Time Spent: 10m Work Description: tvalentyn commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r432101289 ## File path: .test-infra/metrics/grafana/dashboards/perftests_metrics/Python_Performance_Tests.json ## @@ -0,0 +1,297 @@ +{ Review comment: > exporting a new dashboard to JSON file Thanks, does this mean: creating a new dashboard manually via UI? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438418) Time Spent: 5h 20m (was: 5h 10m) > Stop usign Perfkit Benchmarker tool in Python performance tests > --- > > Key: BEAM-7774 > URL: https://issues.apache.org/jira/browse/BEAM-7774 > Project: Beam > Issue Type: Sub-task > Components: testing > Reporter: Lukasz Gajowy > Assignee: Piotr Szuberski > Priority: P2 > Time Spent: 5h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work logged] (BEAM-10078) uniquify Dataflow specific jars when staging
[ https://issues.apache.org/jira/browse/BEAM-10078?focusedWorklogId=438417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438417 ] ASF GitHub Bot logged work on BEAM-10078: - Author: ASF GitHub Bot Created on: 28/May/20 20:19 Start Date: 28/May/20 20:19 Worklog Time Spent: 10m Work Description: ihji commented on a change in pull request #11814: URL: https://github.com/apache/beam/pull/11814#discussion_r432099933 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java ## @@ -397,10 +397,21 @@ public static PackageAttributes forFileToStage( String.format("Non-existent file to stage: %s", file.getAbsolutePath())); } checkState(!file.isDirectory(), "Source file must not be a directory."); + String target; + switch (dest) { +case "dataflow-worker.jar": Review comment: Put some comments and logging This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 438417) Time Spent: 1h 40m (was: 1.5h) > uniquify Dataflow specific jars when staging > > > Key: BEAM-10078 > URL: https://issues.apache.org/jira/browse/BEAM-10078 > Project: Beam > Issue Type: Bug > Components: runner-dataflow > Reporter: Heejong Lee > Assignee: Heejong Lee > Priority: P2 > Fix For: 2.22.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > After BEAM-9383, Dataflow specific jars (dataflow-worker.jar, windmill_main) > could be overwritten when two or more jobs share the same staging location. > Since they 1) should have specific predefined names AND 2) should have a unique > location to avoid collisions, they need special handling when staging > artifacts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
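The review thread above concerns jars that must keep a fixed file name (Dataflow looks them up by name) yet must not collide when two jobs share one staging location. A hedged sketch of the underlying idea (the hash-prefixed layout here is an assumption for illustration, not Beam's actual `PackageUtil` code):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class StagedName {
    // Stage a fixed-name jar under a directory derived from a per-job key,
    // so "dataflow-worker.jar" from two different jobs never overwrites
    // the other while the base file name stays intact.
    static String uniquify(String fixedName, String jobKey) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(jobKey.getBytes(StandardCharsets.UTF_8));
            String prefix = Base64.getUrlEncoder().withoutPadding()
                    .encodeToString(digest).substring(0, 8);
            return prefix + "/" + fixedName;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        // Same fixed name, different jobs: distinct staged paths.
        System.out.println(uniquify("dataflow-worker.jar", "job-a"));
        System.out.println(uniquify("dataflow-worker.jar", "job-b"));
    }
}
```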
[jira] [Created] (BEAM-10146) Clean Docker images after tests
Kyle Weaver created BEAM-10146: -- Summary: Clean Docker images after tests Key: BEAM-10146 URL: https://issues.apache.org/jira/browse/BEAM-10146 Project: Beam Issue Type: Improvement Components: testing Reporter: Kyle Weaver We build Docker images for many tests, but as far as I know we never clean them up. On Jenkins, we prune Docker images every day (https://github.com/apache/beam/blob/b0844c9326841f8ff30950b526015b23e6c3af9b/.test-infra/jenkins/job_Inventory.groovy#L69). But this is an issue for developers' workstations. Over time this can result in O(100GB) disk usage. -- This message was sent by Atlassian Jira (v8.3.4#803005)
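A local teardown step could mirror the daily prune the Jenkins inventory job already performs. A minimal dry-run sketch (the filter value is an assumption, and the command is printed rather than executed, since pruning images is destructive):

```shell
# Candidate post-test cleanup: remove dangling images older than a day.
# "until=24h" is an assumed retention window; adjust before adopting.
prune_cmd='docker image prune --force --filter until=24h'
echo "cleanup would run: ${prune_cmd}"
```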
[jira] [Work logged] (BEAM-10107) beam website PR listed twice in release guide with contradictory instructions
[ https://issues.apache.org/jira/browse/BEAM-10107?focusedWorklogId=438416=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-438416 ] ASF GitHub Bot logged work on BEAM-10107: - Author: ASF GitHub Bot Created on: 28/May/20 20:10 Start Date: 28/May/20 20:10 Worklog Time Spent: 10m Work Description: ibzib opened a new pull request #11852: URL: https://github.com/apache/beam/pull/11852 …ase guide. Just some minor docs cleanup. R: @TheNeuralBit Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). 
[jira] [Updated] (BEAM-10145) Kafka IO performance tests leaving behind unused disks on apache-beam-testing
[ https://issues.apache.org/jira/browse/BEAM-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Udi Meiri updated BEAM-10145: - Attachment: VfRJduZigE1.png > Kafka IO performance tests leaving behind unused disks on apache-beam-testing > - > > Key: BEAM-10145 > URL: https://issues.apache.org/jira/browse/BEAM-10145 > Project: Beam > Issue Type: Bug > Components: testing > Reporter: Udi Meiri > Priority: P2 > Attachments: VfRJduZigE1.png > > > Sample disk description: > {"kubernetes.io/created-for/pv/name":"pvc-97dd8abb-a0ac-11ea-aa65-42010a80013b","kubernetes.io/created-for/pvc/name":"data-pzoo-0","kubernetes.io/created-for/pvc/namespace":"beam-performancetests-kafka-io-826"} -- This message was sent by Atlassian Jira (v8.3.4#803005)
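Orphaned persistent disks like the one described can be located because Kubernetes records the creating PVC and namespace in the disk's description. A hedged dry-run sketch (the project name and filter expression are assumptions; the audit command is printed rather than run, and any deletion should be reviewed by hand):

```shell
# Candidate audit: list unattached disks whose description mentions the
# Kafka perf-test namespace. "-users:*" matches disks with no attached VM.
list_cmd='gcloud compute disks list --project=apache-beam-testing --filter="-users:* AND description~beam-performancetests-kafka-io"'
echo "audit would run: ${list_cmd}"
```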