[GitHub] [beam] mwalenia commented on pull request #11566: [BEAM-9723] Add DLP integration transforms
mwalenia commented on pull request #11566: URL: https://github.com/apache/beam/pull/11566#issuecomment-635791896 run java precommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] mwalenia commented on pull request #11331: [BEAM-9646] Add Google Cloud vision integration transform
mwalenia commented on pull request #11331: URL: https://github.com/apache/beam/pull/11331#issuecomment-635791979 run java precommit
[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635752517 Retest this please
[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing
chamikaramj commented on pull request #11846: URL: https://github.com/apache/beam/pull/11846#issuecomment-635752368 Run Java PreCommit
[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing
chamikaramj commented on pull request #11846: URL: https://github.com/apache/beam/pull/11846#issuecomment-635752434 Run Python2_PVR_Flink PreCommit
[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635752197 Retest this please
[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635752138 Retest this please
[GitHub] [beam] darshanj commented on pull request #11855: [BEAM-10005] | combinefn for ApproximateQuantiles and ApproximateUnique
darshanj commented on pull request #11855: URL: https://github.com/apache/beam/pull/11855#issuecomment-635734663 R: @kennknowles
[GitHub] [beam] darshanj opened a new pull request #11855: [BEAM-10005] | combinefn for ApproximateQuantiles and ApproximateUnique
darshanj opened a new pull request #11855: URL: https://github.com/apache/beam/pull/11855

Added a `combineFn` API that can be used for inputs windowed with non-global windows.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
- [ ] Update `CHANGES.md` with noteworthy changes.
- [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).

See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
[GitHub] [beam] boyuanzz commented on a change in pull request #11642: Replace call to .checkpoint() in SDF direct runner to .try_claim(0)
boyuanzz commented on a change in pull request #11642: URL: https://github.com/apache/beam/pull/11642#discussion_r432230809

## File path: sdks/python/apache_beam/runners/direct/sdf_direct_runner.py ##
@@ -464,7 +464,7 @@ def initiate_checkpoint():
       with self._checkpoint_lock:
         if checkpoint_state.checkpointed:
           return
-        checkpoint_state.residual_restriction = tracker.checkpoint()
+        checkpoint_state.residual_restriction = tracker.try_claim(0)

Review comment: Do you mean `try_split(0)`?
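To make the distinction raised in this review comment concrete, here is a minimal sketch. It uses a hypothetical `ToyOffsetTracker` written for illustration only, not Beam's actual restriction tracker classes, and it assumes the common convention that claiming a position returns a boolean while splitting returns a (primary, residual) pair.

```python
class ToyOffsetTracker:
    """Hypothetical stand-in for an offset-range restriction tracker."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = None

    def try_claim(self, position):
        """Return True if `position` lies inside the restriction, else False."""
        if position < self.start or position >= self.stop:
            return False
        self.last_claimed = position
        return True

    def try_split(self, fraction_of_remainder):
        """Split the unclaimed remainder; return (primary, residual) or None."""
        if self.last_claimed is None:
            next_pos = self.start
        else:
            next_pos = self.last_claimed + 1
        if next_pos >= self.stop:
            return None  # nothing left to hand off
        split_at = next_pos + int((self.stop - next_pos) * fraction_of_remainder)
        residual = (split_at, self.stop)
        self.stop = split_at  # the primary keeps only what is before the split
        return (self.start, split_at), residual


tracker = ToyOffsetTracker(0, 10)
claimed = tracker.try_claim(0)  # a boolean, not a restriction
split = tracker.try_split(0)    # primary and residual ranges
```

In this toy model, storing the result of `try_claim(0)` as a residual restriction would store a boolean, whereas `try_split(0)` hands back the unprocessed remainder, which appears to be the concern behind the question.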
[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635722845 Run Spark ValidatesRunner
[GitHub] [beam] aaltay commented on pull request #11642: Replace call to .checkpoint() in SDF direct runner to .try_claim(0)
aaltay commented on pull request #11642: URL: https://github.com/apache/beam/pull/11642#issuecomment-635705379

> > This is a single-line change and it passes all the tests. If the change makes sense, can we merge it? (question to @boyuanzz )
>
> I don't think the change is correct. I can open a PR and cc the author.

OK. In that case, no action is required from you. We can wait until @epicfaace updates this. Thank you!
[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing
chamikaramj commented on pull request #11846: URL: https://github.com/apache/beam/pull/11846#issuecomment-635702404 Retest this please
[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing
chamikaramj commented on pull request #11846: URL: https://github.com/apache/beam/pull/11846#issuecomment-635702360 Retest this please
[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing
chamikaramj commented on pull request #11846: URL: https://github.com/apache/beam/pull/11846#issuecomment-635702305 LGTM. Thanks.
[GitHub] [beam] steveniemitz commented on pull request #11849: [BEAM-9964] Move workerCacheMb to a user-visible place
steveniemitz commented on pull request #11849: URL: https://github.com/apache/beam/pull/11849#issuecomment-635697264 =/ Looks like the Dataflow precommit succeeded, but the API call to update its status here failed.
[GitHub] [beam] tvalentyn commented on pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests
tvalentyn commented on pull request #11788: URL: https://github.com/apache/beam/pull/11788#issuecomment-635696787 @epicfaace Thanks for your initiative to help with Python 3.8. Please see the discussion on introducing high-priority/low-priority versions: https://lists.apache.org/thread.html/r643cae69e5be136e6bca75bf896991fa313f79623ca056271588c87d%40%3Cdev.beam.apache.org%3E cc: Yoshiki (@lazylynx), who was also working on this and may have some thoughts on how best to integrate these changes.
[GitHub] [beam] steveniemitz commented on pull request #11849: [BEAM-9964] Move workerCacheMb to a user-visible place
steveniemitz commented on pull request #11849: URL: https://github.com/apache/beam/pull/11849#issuecomment-635696194

> gahh so sorry that I missed this. I guess you did have to end up contributing this : )

heh no problem, teamwork! :highfive:
[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635695687 Retest this please
[GitHub] [beam] chamikaramj commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging
chamikaramj commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635695779 Run Java PreCommit
[GitHub] [beam] robertwb commented on a change in pull request #11835: Various fixes to allow Java PAssert to run on Python
robertwb commented on a change in pull request #11835: URL: https://github.com/apache/beam/pull/11835#discussion_r432198245

## File path: sdks/python/apache_beam/transforms/trigger_test.py ##
@@ -518,6 +519,28 @@ def format_result(k_v):
               'B-3': {10, 15, 16},
           }.items(
+  def test_never(self):
+    with TestPipeline() as p:
+
+      def construct_timestamped(k_t):
+        return TimestampedValue((k_t[0], k_t[1]), k_t[1])
+
+      def format_result(k_v):
+        return ('%s-%s' % (k_v[0], len(k_v[1])), set(k_v[1]))
+
+      result = (
+          p
+          | beam.Create([1, 1, 2, 3, 4, 5, 10, 11])
+          | beam.FlatMap(lambda t: [('A', t), ('B', t + 5)])
+          | beam.Map(construct_timestamped)
+          | beam.WindowInto(
+              FixedWindows(10),
+              trigger=Never(),
+              accumulation_mode=AccumulationMode.DISCARDING)
+          | beam.GroupByKey()
+          | beam.Map(format_result))
+      assert_that(result, equal_to([]))

Review comment: I couldn't find an existing bug about closing behaviors, so I filed BEAM-10149. That seems pretty invasive to change as part of this PR, so I added a hack to support its use with proper closing behavior in batch only. PTAL.
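For reference, the per-key, per-window grouping that fixed windows of size 10 would yield for this test input, if every window fired once, can be worked out in plain Python. This is only a sketch and does not use Beam: `window_start` and `group_by_key_and_window` are hypothetical helpers, and the `'B-3'` entry it produces matches the context line visible at the top of the hunk. The `test_never` case above asserts that none of these groups is emitted.

```python
from collections import defaultdict

WINDOW_SIZE = 10


def window_start(timestamp):
    """Start of the fixed window of size WINDOW_SIZE containing `timestamp`."""
    return timestamp - timestamp % WINDOW_SIZE


def group_by_key_and_window(values):
    """Mimic FlatMap -> timestamp -> window -> group -> format, without Beam."""
    groups = defaultdict(list)
    for t in values:
        # Same fan-out as the lambda in the test: ('A', t) and ('B', t + 5),
        # with each element's timestamp equal to its value.
        for key, ts in (('A', t), ('B', t + 5)):
            groups[(key, window_start(ts))].append(ts)
    return {
        '%s-%s' % (key, len(vals)): set(vals)
        for (key, _), vals in groups.items()
    }


grouped = group_by_key_and_window([1, 1, 2, 3, 4, 5, 10, 11])
```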
[GitHub] [beam] boyuanzz commented on pull request #11642: Replace call to .checkpoint() in SDF direct runner to .try_claim(0)
boyuanzz commented on pull request #11642: URL: https://github.com/apache/beam/pull/11642#issuecomment-635680691

> This is a single-line change and it passes all the tests. If the change makes sense, can we merge it? (question to @boyuanzz )

I don't think the change is correct. I can open a PR and cc the author.
[GitHub] [beam] chamikaramj commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging
chamikaramj commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635677737
[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635676513 Retest this please
[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635676452 Retest this please
[GitHub] [beam] aaltay commented on pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests
aaltay commented on pull request #11788: URL: https://github.com/apache/beam/pull/11788#issuecomment-635675870 /cc @tvalentyn
[GitHub] [beam] aaltay commented on pull request #11758: Old Fastjson has a serious security problem
aaltay commented on pull request #11758: URL: https://github.com/apache/beam/pull/11758#issuecomment-635675525 retest this please
[GitHub] [beam] aaltay commented on pull request #11642: Replace call to .checkpoint() in SDF direct runner to .try_claim(0)
aaltay commented on pull request #11642: URL: https://github.com/apache/beam/pull/11642#issuecomment-635675425 This is a single-line change and it passes all the tests. If the change makes sense, can we merge it? (question to @boyuanzz )
[GitHub] [beam] aaltay commented on pull request #11181: [BEAM-9500] [WIP] Refactor load tests
aaltay commented on pull request #11181: URL: https://github.com/apache/beam/pull/11181#issuecomment-635674912 @piotr-szuberski - what is the next step for this PR? Is it still active? Should we close it?
[GitHub] [beam] chamikaramj commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging
chamikaramj commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635674009 I don't think so. We changed from using "key=value" strings to StagedFile objects in https://github.com/apache/beam/pull/11039/files.
[GitHub] [beam] chadrik commented on pull request #11632: [BEAM-7746] Fix type errors and enable checks for apache_beam.dataframe.*
chadrik commented on pull request #11632: URL: https://github.com/apache/beam/pull/11632#issuecomment-635673990 LGTM!
[GitHub] [beam] aaltay commented on pull request #11704: [BEAM-9956] removed unbalanced code markup
aaltay commented on pull request #11704: URL: https://github.com/apache/beam/pull/11704#issuecomment-635673169 Could you rebase? Is this still an issue after the website change? R: @rosetn
[GitHub] [beam] aaltay commented on pull request #11706: [BEAM-8451] annotate python only sections
aaltay commented on pull request #11706: URL: https://github.com/apache/beam/pull/11706#issuecomment-635672892 R: @rosetn
[GitHub] [beam] TheNeuralBit commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging
TheNeuralBit commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635672193 @chamikaramj, @ihji is the `filesToStage` change a problem?
[GitHub] [beam] aaltay commented on pull request #11758: Old Fastjson has a serious security problem
aaltay commented on pull request #11758: URL: https://github.com/apache/beam/pull/11758#issuecomment-635672368 LGTM. I can merge it if tests pass.
[GitHub] [beam] aaltay commented on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms
aaltay commented on pull request #11610: URL: https://github.com/apache/beam/pull/11610#issuecomment-635672074 All tests passed. I do not see an LGTM, though maybe I am missing it. Is this ready to be merged?
[GitHub] [beam] aaltay commented on pull request #11779: [BEAM-10055] Add --region to python examples where it was missing
aaltay commented on pull request #11779: URL: https://github.com/apache/beam/pull/11779#issuecomment-635671668 LGTM. I will merge after tests pass. Thank you @tedromer
[GitHub] [beam] aaltay commented on pull request #11779: [BEAM-10055] Add --region to python examples where it was missing
aaltay commented on pull request #11779: URL: https://github.com/apache/beam/pull/11779#issuecomment-635671566 retest this please
[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
chamikaramj commented on pull request #11847: URL: https://github.com/apache/beam/pull/11847#issuecomment-635670461 LGTM. Thanks.
[GitHub] [beam] aaltay commented on pull request #11819: [BEAM-8371] Remove support for Python 2
aaltay commented on pull request #11819: URL: https://github.com/apache/beam/pull/11819#issuecomment-635669412 Should we close this pull request for now?
[GitHub] [beam] aaltay commented on pull request #11682: [BEAM-9946] | added new api in Partition Transform
aaltay commented on pull request #11682: URL: https://github.com/apache/beam/pull/11682#issuecomment-635668922 Run Java PreCommit
[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635664997 Run Spark ValidatesRunner
[GitHub] [beam] rohdesamuel commented on pull request #11594: [BEAM-9692] Replace apply_WriteToBigQuery with PTransformOverride
rohdesamuel commented on pull request #11594: URL: https://github.com/apache/beam/pull/11594#issuecomment-635664357 Looks like the PreCommit failed with "Exception: Dataflow only supports Python versions 2 and 3.5+, got: (3, 8)". Is that a known failure?
[GitHub] [beam] pabloem commented on pull request #11849: [BEAM-9964] Move workerCacheMb to a user-visible place
pabloem commented on pull request #11849: URL: https://github.com/apache/beam/pull/11849#issuecomment-635663865 gahh so sorry that I missed this. I guess you did have to end up contributing this : )
[GitHub] [beam] robertwb commented on a change in pull request #11632: [BEAM-7746] Fix type errors and enable checks for apache_beam.dataframe.*
robertwb commented on a change in pull request #11632: URL: https://github.com/apache/beam/pull/11632#discussion_r432170796

## File path: sdks/python/apache_beam/dataframe/transforms.py ##
@@ -16,13 +16,28 @@

 from __future__ import absolute_import

+import typing
+from typing import Any
+from typing import Dict
+from typing import List
+from typing import Mapping
+from typing import Tuple
+from typing import TypeVar
+from typing import Union
+
 import pandas as pd

 import apache_beam as beam
 from apache_beam import transforms
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frames  # pylint: disable=unused-import

+if typing.TYPE_CHECKING:

Review comment: +1 for consistency. Changed.
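The hunk above ends at an `if typing.TYPE_CHECKING:` guard. As a minimal sketch of how that pattern generally works (not the contents of this PR): names imported under the guard are visible to static type checkers but are never imported at runtime. The `Decimal` import and the `total` function below are hypothetical, chosen only for illustration, and the type-comment style is assumed because the file still imports from `__future__`.

```python
import typing
from typing import List

if typing.TYPE_CHECKING:
    # Evaluated only by static type checkers; skipped at runtime, so this
    # import adds no runtime dependency or import-cycle risk.
    from decimal import Decimal  # hypothetical example import


def total(values):
    # type: (List[Decimal]) -> Decimal
    """Sum a non-empty list; the annotation lives in a comment, not in code."""
    return sum(values[1:], values[0])


result = total([1, 2, 3])
```

Because the annotation is a comment, `Decimal` never needs to exist at runtime, which is what makes the guarded import safe.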
[GitHub] [beam] chamikaramj opened a new pull request #11854: [BEAM-8019] Cherry pick 11844
chamikaramj opened a new pull request #11854: URL: https://github.com/apache/beam/pull/11854 Without this, cross-language KafkaIO users may have to call pipeline.run(False) instead of pipeline.run() when executing a pipeline using Dataflow. @TheNeuralBit this should be a pretty safe change to cherry-pick if you are OK with it.
[GitHub] [beam] chamikaramj commented on pull request #11854: [BEAM-8019] Cherry pick 11844
chamikaramj commented on pull request #11854: URL: https://github.com/apache/beam/pull/11854#issuecomment-635657611 R: @TheNeuralBit
[GitHub] [beam] robertwb merged pull request #11853: Update multi-language roadmap status.
robertwb merged pull request #11853: URL: https://github.com/apache/beam/pull/11853
[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.
chamikaramj commented on pull request #11844: URL: https://github.com/apache/beam/pull/11844#issuecomment-635651784 Thanks.
[GitHub] [beam] robertwb merged pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.
robertwb merged pull request #11844: URL: https://github.com/apache/beam/pull/11844
[GitHub] [beam] aaltay commented on pull request #11851: [BEAM-10144] Update PipelineOptions snippets for best practices
aaltay commented on pull request #11851: URL: https://github.com/apache/beam/pull/11851#issuecomment-635648807 Run Python PreCommit
[GitHub] [beam] iemejia commented on a change in pull request #11853: Update multi-language roadmap status.
iemejia commented on a change in pull request #11853: URL: https://github.com/apache/beam/pull/11853#discussion_r432163908

## File path: website/www/site/content/en/roadmap/connectors-multi-sdk.md ##
@@ -62,27 +62,29 @@ Work related to making cross-language transforms available for Flink runner.
 Work related to making cross-language transforms available for Dataflow runner.
-* Basic support for executing cross-language transforms on Dataflow runner
+* Basic support for executing cross-language transforms on Dataflow runner
+ This work requires updates to Dataflow service's job submission and job execution logic. This is currently being developed at Google.
 ### Support for Direct runner
 Work related to making cross-language transforms available on Direct runner
-* Basic support for executing cross-language transforms on portable Direct runner - Not started
+* Basic support for executing cross-language transforms on Pyton Direct runner - completed
+* Basic support for executing cross-language transforms on Java Direct runner - Not started
 ### Connector/transform support
 Ongoing and planned work related to making existing connectors/transforms available to other SDKs through the cross-language transforms framework.
-* Java KafkIO - In progress - [BEAM-7029](https://issues.apache.org/jira/browse/BEAM-7029)
+* Java KafkIO - completed - [BEAM-7029](https://issues.apache.org/jira/browse/BEAM-7029)

Review comment:
```suggestion
* Java KafkaIO - completed - [BEAM-7029](https://issues.apache.org/jira/browse/BEAM-7029)
```
[GitHub] [beam] damondouglas commented on pull request #11803: [BEAM-9679] Add a CoGroupByKey lesson to the Core Transforms section
damondouglas commented on pull request #11803: URL: https://github.com/apache/beam/pull/11803#issuecomment-635644834 I updated [the stepik course](https://stepik.org/course/70387) and committed the updated `*-remote.yaml` files to this PR. It is ready to merge into master. Thank you, @henryken and @lostluck, for your help.
[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635644599 Run Spark ValidatesRunner
[GitHub] [beam] pabloem commented on pull request #11594: [BEAM-9692] Replace apply_WriteToBigQuery with PTransformOverride
pabloem commented on pull request #11594: URL: https://github.com/apache/beam/pull/11594#issuecomment-635640892 Run Python 2 PostCommit
[GitHub] [beam] pabloem commented on pull request #11594: [BEAM-9692] Replace apply_WriteToBigQuery with PTransformOverride
pabloem commented on pull request #11594: URL: https://github.com/apache/beam/pull/11594#issuecomment-635640799
[GitHub] [beam] rohdesamuel commented on pull request #11594: [BEAM-9692] Replace apply_WriteToBigQuery with PTransformOverride
rohdesamuel commented on pull request #11594: URL: https://github.com/apache/beam/pull/11594#issuecomment-635640073 R: @pabloem
[GitHub] [beam] udim merged pull request #11070: [BEAM-8280] Blog post: Python typing changes
udim merged pull request #11070: URL: https://github.com/apache/beam/pull/11070
[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635630717 Run Spark ValidatesRunner
[GitHub] [beam] ihji commented on a change in pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
ihji commented on a change in pull request #11847: URL: https://github.com/apache/beam/pull/11847#discussion_r432142814

## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py ##
@@ -0,0 +1,145 @@
+"""Integration test for Python cross-language pipelines for Java KafkaIO."""
+
+from __future__ import absolute_import
+
+import contextlib
+import logging
+import os
+import socket
+import subprocess
+import time
+import typing
+import unittest
+
+import grpc
+
+import apache_beam as beam
+from apache_beam.io.external.kafka import ReadFromKafka
+from apache_beam.io.external.kafka import WriteToKafka
+from apache_beam.metrics import Metrics
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+
+
+class CrossLanguageKafkaIO(object):
+  def __init__(self, bootstrap_servers, topic, expansion_service=None):
+    self.bootstrap_servers = bootstrap_servers
+    self.topic = topic
+    self.expansion_service = expansion_service or (
+        'localhost:%s' % os.environ.get('EXPANSION_PORT'))
+    self.sum_counter = Metrics.counter('source', 'elements_sum')
+
+  def build_write_pipeline(self, pipeline):
+    _ = (
+        pipeline
+        | 'Impulse' >> beam.Impulse()
+        | 'Generate' >> beam.FlatMap(lambda x: range(1000))  # pylint: disable=range-builtin-not-iterating
+        | 'Reshuffle' >> beam.Reshuffle()
+        | 'MakeKV' >> beam.Map(lambda x:
+                               (b'', str(x).encode())).with_output_types(
+                                   typing.Tuple[bytes, bytes])
+        | 'WriteToKafka' >> WriteToKafka(
+            producer_config={'bootstrap.servers': self.bootstrap_servers},
+            topic=self.topic,
+            expansion_service=self.expansion_service))
+
+  def build_read_pipeline(self, pipeline):
+    _ = (
+        pipeline
+        | 'ReadFromKafka' >> ReadFromKafka(
+            consumer_config={
+                'bootstrap.servers': self.bootstrap_servers,
+                'auto.offset.reset': 'earliest'
+            },
+            topics=[self.topic],
+            expansion_service=self.expansion_service)
+        | 'Windowing' >> beam.WindowInto(
+            beam.window.FixedWindows(300),
+            trigger=beam.transforms.trigger.AfterProcessingTime(60),
+            accumulation_mode=beam.transforms.trigger.AccumulationMode.
+            DISCARDING)
+        | 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode()))
+        | 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults()
+        | 'SetSumCounter' >> beam.Map(self.sum_counter.inc))
+
+  def run_xlang_kafkaio(self, pipeline):
+    self.build_write_pipeline(pipeline)
+    self.build_read_pipeline(pipeline)
+    pipeline.run(False)
+
+
+@unittest.skipUnless(
+    os.environ.get('LOCAL_KAFKA_JAR'),
+    "LOCAL_KAFKA_JAR environment var is not provided.")
+@unittest.skipUnless(
+    os.environ.get('EXPANSION_JAR'),
+    "EXPANSION_JAR environment var is not provided.")
+class CrossLanguageKafkaIOTest(unittest.TestCase):
+  def get_open_port(self):
+    s = None
+    try:
+      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    except:  # pylint: disable=bare-except
+      # Above call will fail for nodes that only support IPv6.
+      s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
+    s.bind(('localhost', 0))
+    s.listen(1)
+    port = s.getsockname()[1]
+    s.close()
+    return port
+
+  @contextlib.contextmanager
+  def local_services(self, expansion_service_jar_file, local_kafka_jar_file):
+    expansion_service_port = str(self.get_open_port())
+    kafka_port = str(self.get_open_port())
+    zookeeper_port = str(self.get_open_port())
+
+    expansion_server = None
+    kafka_server = None
+    try:
+      expansion_server = subprocess.Popen(
+          ['java', '-jar', expansion_service_jar_file, expansion_service_port])
+      kafka_server = subprocess.Popen(
+          ['java', '-jar', local_kafka_jar_file, kafka_port, zookeeper_port])
+      time.sleep(3)
+      channel_creds = grpc.local_channel_credentials()
+      with grpc.secure_channel('localhost:%s' % expansion_service_port,
+                               channel_creds) as channel:
+        grpc.channel_ready_future(channel).result()
+
+      yield expansion_service_port, kafka_port
+    finally:
+      if expansion_server:
+        expansion_server.kill()
+      if kafka_server:
+        kafka_server.kill()
+
+  def get_options(self):
+    options = PipelineOptions([
+        '--runner',
+        'FlinkRunner',
+        '--parallelism',
+        '2',
+        '--experiment',
+        'beam_fn_api'
+    ])
+    return options
+
+  def test_kafkaio_write(self):
+    local_kafka_jar = os.environ.get('LOCAL_KAFKA_JAR')
+    expansion_service_jar = os.environ.get('EXPANSION_JAR')
+    with self.local_services(expansion_service_jar,

Review comment: updated.
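For readers following the review, the `get_open_port` helper in the quoted test relies on binding to port 0 so that the operating system picks an unused port. The following is a minimal standard-library sketch of that pattern, not code from the PR: the name `find_free_port` and the choice of `127.0.0.1` are assumptions made for illustration.

```python
import socket


def find_free_port():
  """Hypothetical helper: ask the OS for a currently unused TCP port."""
  # Binding to port 0 lets the kernel choose any free port.
  with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.bind(('127.0.0.1', 0))
    # getsockname() returns (host, port) for an AF_INET socket.
    return s.getsockname()[1]


port = find_free_port()
print('OS-assigned port:', port)
```

The port is released when the socket closes, so another process could in principle claim it before the service under test binds to it. For a small local test that risk is usually accepted.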
[GitHub] [beam] ihji commented on a change in pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
ihji commented on a change in pull request #11847: URL: https://github.com/apache/beam/pull/11847#discussion_r432142992

## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py ##
+  @contextlib.contextmanager
+  def local_services(self, expansion_service_jar_file, local_kafka_jar_file):
+    expansion_service_port = str(self.get_open_port())
+    kafka_port = str(self.get_open_port())
+    zookeeper_port = str(self.get_open_port())
+
+    expansion_server = None
+    kafka_server = None
+    try:
+      expansion_server = subprocess.Popen(

Review comment: updated.
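In the test file under review, `local_services` starts both Java processes, sleeps for a fixed three seconds, and then waits on a gRPC readiness future for the expansion service only. One possible alternative to a fixed sleep, sketched here with the standard library alone, is to poll until a TCP port accepts connections. The helper `wait_for_port` and its parameters are hypothetical and are not part of the PR.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.2):
  """Hypothetical helper: True once (host, port) accepts a TCP connection.

  Returns False if no connection succeeds before `timeout` seconds elapse.
  """
  deadline = time.monotonic() + timeout
  while time.monotonic() < deadline:
    try:
      # create_connection raises OSError while nothing is listening yet.
      with socket.create_connection((host, port), timeout=interval):
        return True
    except OSError:
      time.sleep(interval)
  return False
```

A caller would invoke something like `wait_for_port('localhost', int(kafka_port))` after `subprocess.Popen(...)`. An open port only shows that the process is listening, not that the broker is fully initialized, so this is a weaker check than a protocol-level readiness probe.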
[GitHub] [beam] ihji commented on a change in pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
ihji commented on a change in pull request #11847: URL: https://github.com/apache/beam/pull/11847#discussion_r432142695

## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py ##
+    try:
+      expansion_server = subprocess.Popen(
+          ['java', '-jar', expansion_service_jar_file, expansion_service_port])
+      kafka_server = subprocess.Popen(

Review comment: Yes, this is for external testing (probably only for small-scale correctness tests; we may still need a Kubernetes cluster for large-scale performance tests).
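As a small worked example of the kind of correctness check this test enables: the write pipeline encodes the integers 0 to 999 as byte strings with an empty key, and the read pipeline decodes and sums them. The plain-Python sketch below mirrors those two steps without Beam or Kafka, assuming every record is read back exactly once, to show the total the `elements_sum` counter would be expected to reach across all panes.

```python
# Mirror the 'MakeKV' step: empty key, integer rendered as bytes.
records = [(b'', str(x).encode()) for x in range(1000)]

# Mirror the 'DecodingValue' and 'CombineGlobally' steps.
total = sum(int(value.decode()) for _, value in records)

print(total)  # 499500, i.e. 999 * 1000 / 2
```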
[GitHub] [beam] jhnmora000 commented on pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality
jhnmora000 commented on pull request #11845: URL: https://github.com/apache/beam/pull/11845#issuecomment-635624778 Thanks for your help, @amaliujia. I will close this PR and continue experimenting with BeamSQL/Calcite.
[GitHub] [beam] jhnmora000 closed pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality
jhnmora000 closed pull request #11845: URL: https://github.com/apache/beam/pull/11845
[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635621052
[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635620991 Run Java Flink PortableValidatesRunner Streaming
[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635620885
[GitHub] [beam] chamikaramj commented on pull request #11853: Update multi-language roadmap status.
chamikaramj commented on pull request #11853: URL: https://github.com/apache/beam/pull/11853#issuecomment-635620838 LGTM. Thanks for updating.
[GitHub] [beam] youngoli merged pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.
youngoli merged pull request #11791: URL: https://github.com/apache/beam/pull/11791
[GitHub] [beam] davidcavazos commented on pull request #11851: [BEAM-10144] Update PipelineOptions snippets for best practices
davidcavazos commented on pull request #11851: URL: https://github.com/apache/beam/pull/11851#issuecomment-635618216 Run Python PreCommit
[GitHub] [beam] TheNeuralBit commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging
TheNeuralBit commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635615410 retest this please
[GitHub] [beam] youngoli commented on pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.
youngoli commented on pull request #11791: URL: https://github.com/apache/beam/pull/11791#issuecomment-635615190 Run Go PostCommit
[GitHub] [beam] robertwb opened a new pull request #11853: Update multi-language roadmap status.
robertwb opened a new pull request #11853: URL: https://github.com/apache/beam/pull/11853
[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma
lukecwik commented on pull request #11821: URL: https://github.com/apache/beam/pull/11821#issuecomment-635613803 Run Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
tvalentyn commented on a change in pull request #11086: URL: https://github.com/apache/beam/pull/11086#discussion_r432112443 ## File path: sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py ## @@ -254,11 +256,36 @@ def test_big_query_new_types(self): 'output_schema': NEW_TYPES_OUTPUT_SCHEMA, 'use_standard_sql': False, 'wait_until_finish_duration': WAIT_UNTIL_FINISH_DURATION_MS, +'use_json_exports': True, 'on_success_matcher': all_of(*pipeline_verifiers) } options = self.test_pipeline.get_full_options_as_args(**extra_opts) big_query_query_to_table_pipeline.run_bq_pipeline(options) + @attr('IT') + def test_big_query_new_types(self): Review comment: Looks like we run this test internally using this name. Since this test now changes to `use_beam_bq_sink`, please check that we maintain reasonable internal test coverage for the native sink, or use a new test name for the new behavior.
[GitHub] [beam] tvalentyn commented on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
tvalentyn commented on pull request #11661: URL: https://github.com/apache/beam/pull/11661#issuecomment-635593321 @kamilwu - please merge once this looks good to you, I don't have other input here.
[GitHub] [beam] tvalentyn commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
tvalentyn commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r432107251 ## File path: .test-infra/metrics/grafana/dashboards/perftests_metrics/Python_Performance_Tests.json ## @@ -77,7 +77,7 @@ ], "orderByTime": "ASC", "policy": "default", - "query": "SELECT mean(\"value\") FROM \"wordcount_py27_results\" WHERE metric = 'Python performance test' AND $timeFilter GROUP BY time($__interval), \"metric\"", + "query": "SELECT mean(\"value\") FROM \"wordcount_py27_results\" WHERE metric = 'wordcount_it_runtime' AND $timeFilter GROUP BY time($__interval), \"metric\"", Review comment: Seeing this query now - yes, I would just keep the metric 'runtime', since we already know it is wordcount_py27_results, and it would be simpler if the pipeline did not need to know the name of the suite. In the future we might add different metrics like 'cost' or total cputime consumed by other workers, as opposed to runtime.
[GitHub] [beam] chamikaramj commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging
chamikaramj commented on pull request #11814: URL: https://github.com/apache/beam/pull/11814#issuecomment-635591331
[GitHub] [beam] tvalentyn commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
tvalentyn commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r432106093 ## File path: sdks/python/apache_beam/examples/wordcount_it_test.py ## @@ -84,11 +87,45 @@ def _run_wordcount_it(self, run_wordcount, **opts): # Register clean up before pipeline execution self.addCleanup(delete_files, [test_output + '*']) +publish_to_bq = bool( +test_pipeline.get_option('publish_to_big_query') or False) + +# Start measure time for performance test +start_time = time.time() + # Get pipeline options from command argument: --test-pipeline-options, # and start pipeline job by calling pipeline main function. run_wordcount( test_pipeline.get_full_options_as_args(**extra_opts), -save_main_session=False) +save_main_session=False, +) + +end_time = time.time() +run_time = end_time - start_time + +if publish_to_bq: + self._publish_metrics(test_pipeline, run_time) + + def _publish_metrics(self, pipeline, metric_value): +influx_options = InfluxDBMetricsPublisherOptions( +pipeline.get_option('influx_measurement'), +pipeline.get_option('influx_db_name'), +pipeline.get_option('influx_hostname'), +os.getenv('INFLUXDB_USER'), +os.getenv('INFLUXDB_USER_PASSWORD'), +) +metric_reader = MetricsReader( +project_name=pipeline.get_option('project'), +bq_table=pipeline.get_option('metrics_table'), +bq_dataset=pipeline.get_option('metrics_dataset'), +publish_to_bq=True, +influxdb_options=influx_options, +) + +metric_reader.publish_values(( +metric_value, Review comment: Do we need "wordcount_it" in the metric name? It depends on how these metrics will be stored, if they are already associated in the database with a test suite that launches the pipeline `Python WordCount IT Benchmarks`, then this information is captured and perhaps we don't need to repeat it in two different places. Leaving this up to you and @kamilwu. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
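The timing pattern in the quoted diff, together with the suggestion to keep the metric name generic, can be sketched in plain Python. This is only an illustration under stated assumptions: `timed_run` and `build_metric` are hypothetical helper names that do not exist in Beam, and the `(name, value)` ordering is an assumption rather than the signature of `MetricsReader.publish_values`.

```python
import time


def timed_run(run_fn, *args, **kwargs):
    """Run run_fn and return (result, elapsed_seconds).

    Hypothetical helper: mirrors the start_time/end_time bookkeeping in the
    quoted test, using time.time() as the diff does.
    """
    start_time = time.time()
    result = run_fn(*args, **kwargs)
    run_time = time.time() - start_time
    return result, run_time


def build_metric(run_time, name='runtime'):
    """Return a (name, value) pair.

    Assumption: the suite is identified by where the metric is stored,
    so the suite name is deliberately not repeated in the metric name.
    """
    return (name, run_time)


if __name__ == '__main__':
    _, elapsed = timed_run(sum, range(1000))
    print(build_metric(elapsed))
```

Keeping the suite name out of `build_metric` means the same helper could later emit other metrics (for example a hypothetical 'cost') without the pipeline knowing which suite launched it.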
[GitHub] [beam] lostluck merged pull request #11207: [BEAM-9220] Go Dataflow jobs to use runner v2
lostluck merged pull request #11207: URL: https://github.com/apache/beam/pull/11207
[GitHub] [beam] rionmonster commented on a change in pull request #11761: [BEAM-10027] Support for Kotlin-based Beam Katas
rionmonster commented on a change in pull request #11761: URL: https://github.com/apache/beam/pull/11761#discussion_r432104332 ## File path: learning/katas/kotlin/Windowing/Fixed Time Window/Fixed Time Window/test/org/apache/beam/learning/katas/windowing/fixedwindow/WindowedEvent.kt ## @@ -0,0 +1,45 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.learning.katas.windowing.fixedwindow + +import java.io.Serializable +import java.util.* + +class WindowedEvent(private val event: String?, private val count: Long, private val window: String) : Serializable { Review comment: Done! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] rionmonster commented on a change in pull request #11761: [BEAM-10027] Support for Kotlin-based Beam Katas
rionmonster commented on a change in pull request #11761: URL: https://github.com/apache/beam/pull/11761#discussion_r432103074 ## File path: learning/katas/kotlin/Core Transforms/Combine/CombineFn/src/org/apache/beam/learning/katas/coretransforms/combine/combinefn/Task.kt ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.beam.learning.katas.coretransforms.combine.combinefn + +import org.apache.beam.learning.katas.coretransforms.combine.combinefn.Task.AverageFn.Accum +import org.apache.beam.learning.katas.util.Log +import org.apache.beam.sdk.Pipeline +import org.apache.beam.sdk.options.PipelineOptionsFactory +import org.apache.beam.sdk.transforms.Combine +import org.apache.beam.sdk.transforms.Combine.CombineFn +import org.apache.beam.sdk.transforms.Create +import org.apache.beam.sdk.values.PCollection +import java.io.Serializable +import java.util.* + +object Task { +@JvmStatic +fun main(args: Array) { +val options = PipelineOptionsFactory.fromArgs(*args).create() +val pipeline = Pipeline.create(options) + +val numbers = pipeline.apply(Create.of(10, 20, 50, 70, 90)) + +val output = applyTransform(numbers) + +output.apply(Log.ofElements()) + +pipeline.run() +} + +@JvmStatic +fun applyTransform(input: PCollection): PCollection { +return input.apply(Combine.globally(AverageFn())) +} + +internal class AverageFn : CombineFn() { + +internal inner class Accum : Serializable { Review comment: Done ## File path: learning/katas/kotlin/util/test/org/apache/beam/learning/katas/util/ContainsKvs.kt ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.learning.katas.util + +import com.google.common.collect.ImmutableList +import com.google.common.collect.Iterables +import org.apache.beam.sdk.transforms.SerializableFunction +import org.apache.beam.sdk.values.KV +import org.hamcrest.CoreMatchers +import org.hamcrest.Matcher +import org.hamcrest.collection.IsIterableContainingInAnyOrder +import org.junit.Assert +import java.util.* + +class ContainsKvs private constructor(private val expectedKvs: List>>) : SerializableFunction>>, Void?> { + +companion object { +@SafeVarargs +fun containsKvs(vararg kvs: KV>): SerializableFunction>>, Void?> { +return ContainsKvs(ImmutableList.copyOf(kvs)) +} +} + +override fun apply(input: Iterable>>): Void? { +val matchers: MutableList>>> = ArrayList() + +for (expected in expectedKvs) { +val values = Iterables.toArray(expected.value, String::class.java) + matchers.add(KvMatcher.Companion.isKv(CoreMatchers.equalTo(expected.key), IsIterableContainingInAnyOrder.containsInAnyOrder(*values))) Review comment: Done! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] aaltay commented on pull request #11851: [BEAM-10144] Update PipelineOptions snippets for best practices
aaltay commented on pull request #11851: URL: https://github.com/apache/beam/pull/11851#issuecomment-635587627 retest this please
[GitHub] [beam] tvalentyn commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
tvalentyn commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r432101289 ## File path: .test-infra/metrics/grafana/dashboards/perftests_metrics/Python_Performance_Tests.json ## @@ -0,0 +1,297 @@ +{ Review comment: > exporting a new dashboard to JSON file Thanks, does this mean: creating a new dashboard manually via UI?
[GitHub] [beam] ihji commented on a change in pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging
ihji commented on a change in pull request #11814: URL: https://github.com/apache/beam/pull/11814#discussion_r432099933 ## File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java ## @@ -397,10 +397,21 @@ public static PackageAttributes forFileToStage( String.format("Non-existent file to stage: %s", file.getAbsolutePath())); } checkState(!file.isDirectory(), "Source file must not be a directory."); + String target; + switch (dest) { +case "dataflow-worker.jar": Review comment: Put some comments and logging
[GitHub] [beam] ibzib opened a new pull request #11852: [BEAM-10107] Remove outdated instructions for website updates in rele…
ibzib opened a new pull request #11852: URL: https://github.com/apache/beam/pull/11852 …ase guide. Just some minor docs cleanup. R: @TheNeuralBit
[GitHub] [beam] chamikaramj commented on a change in pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test
chamikaramj commented on a change in pull request #11847: URL: https://github.com/apache/beam/pull/11847#discussion_r432083261 ## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py ## @@ -0,0 +1,145 @@ +"""Integration test for Python cross-language pipelines for Java KafkaIO.""" + +from __future__ import absolute_import + +import contextlib +import logging +import os +import socket +import subprocess +import time +import typing +import unittest + +import grpc + +import apache_beam as beam +from apache_beam.io.external.kafka import ReadFromKafka +from apache_beam.io.external.kafka import WriteToKafka +from apache_beam.metrics import Metrics +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.testing.test_pipeline import TestPipeline + + +class CrossLanguageKafkaIO(object): + def __init__(self, bootstrap_servers, topic, expansion_service=None): +self.bootstrap_servers = bootstrap_servers +self.topic = topic +self.expansion_service = expansion_service or ( +'localhost:%s' % os.environ.get('EXPANSION_PORT')) +self.sum_counter = Metrics.counter('source', 'elements_sum') + + def build_write_pipeline(self, pipeline): +_ = ( +pipeline +| 'Impulse' >> beam.Impulse() +| 'Generate' >> beam.FlatMap(lambda x: range(1000)) # pylint: disable=range-builtin-not-iterating +| 'Reshuffle' >> beam.Reshuffle() +| 'MakeKV' >> beam.Map(lambda x: + (b'', str(x).encode())).with_output_types( + typing.Tuple[bytes, bytes]) +| 'WriteToKafka' >> WriteToKafka( +producer_config={'bootstrap.servers': self.bootstrap_servers}, +topic=self.topic, +expansion_service=self.expansion_service)) + + def build_read_pipeline(self, pipeline): +_ = ( +pipeline +| 'ReadFromKafka' >> ReadFromKafka( +consumer_config={ +'bootstrap.servers': self.bootstrap_servers, +'auto.offset.reset': 'earliest' +}, +topics=[self.topic], +expansion_service=self.expansion_service) +| 'Windowing' >> beam.WindowInto( +beam.window.FixedWindows(300), 
+trigger=beam.transforms.trigger.AfterProcessingTime(60), +accumulation_mode=beam.transforms.trigger.AccumulationMode. +DISCARDING) +| 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode())) +| 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults() +| 'SetSumCounter' >> beam.Map(self.sum_counter.inc)) + + def run_xlang_kafkaio(self, pipeline): +self.build_write_pipeline(pipeline) +self.build_read_pipeline(pipeline) +pipeline.run(False) + + +@unittest.skipUnless( +os.environ.get('LOCAL_KAFKA_JAR'), +"LOCAL_KAFKA_JAR environment var is not provided.") +@unittest.skipUnless( +os.environ.get('EXPANSION_JAR'), +"EXPANSION_JAR environment var is not provided.") +class CrossLanguageKafkaIOTest(unittest.TestCase): + def get_open_port(self): +s = None +try: + s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) +except: # pylint: disable=bare-except + # Above call will fail for nodes that only support IPv6. + s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM) +s.bind(('localhost', 0)) +s.listen(1) +port = s.getsockname()[1] +s.close() +return port + + @contextlib.contextmanager + def local_services(self, expansion_service_jar_file, local_kafka_jar_file): +expansion_service_port = str(self.get_open_port()) +kafka_port = str(self.get_open_port()) +zookeeper_port = str(self.get_open_port()) + +expansion_server = None +kafka_server = None +try: + expansion_server = subprocess.Popen( Review comment: Is this only for internal testing ? Externally, kafkaio.py should automatically startup an expansion service as long as we are in a release or a Beam repo clone. 
## File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py ## @@ -0,0 +1,145 @@ +"""Integration test for Python cross-language pipelines for Java KafkaIO.""" + +from __future__ import absolute_import + +import contextlib +import logging +import os +import socket +import subprocess +import time +import typing +import unittest + +import grpc + +import apache_beam as beam +from apache_beam.io.external.kafka import ReadFromKafka +from apache_beam.io.external.kafka import WriteToKafka +from apache_beam.metrics import Metrics +from apache_beam.options.pipeline_options import PipelineOptions +from apache_beam.testing.test_pipeline import TestPipeline + + +class CrossLanguageKafkaIO(object): + def __init__(self, bootstrap_servers, topic, expansion_service=None): +self.bootstrap_servers = bootstrap_servers +self.topic = topic +self.expansion_service = expansion_service or ( +'l
[GitHub] [beam] robertwb commented on a change in pull request #11835: Various fixes to allow Java PAssert to run on Python
robertwb commented on a change in pull request #11835: URL: https://github.com/apache/beam/pull/11835#discussion_r432093237 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -518,6 +519,28 @@ def format_result(k_v): 'B-3': {10, 15, 16}, }.items( + def test_never(self): +with TestPipeline() as p: + + def construct_timestamped(k_t): +return TimestampedValue((k_t[0], k_t[1]), k_t[1]) + + def format_result(k_v): +return ('%s-%s' % (k_v[0], len(k_v[1])), set(k_v[1])) + + result = ( + p + | beam.Create([1, 1, 2, 3, 4, 5, 10, 11]) + | beam.FlatMap(lambda t: [('A', t), ('B', t + 5)]) + | beam.Map(construct_timestamped) + | beam.WindowInto( + FixedWindows(10), + trigger=Never(), + accumulation_mode=AccumulationMode.DISCARDING) + | beam.GroupByKey() + | beam.Map(format_result)) + assert_that(result, equal_to([])) Review comment: Ack. (Interestingly, I originally was expecting it to fire at EoW.) The ULR doesn't yet support gc timers. I can look into fixing this (or at least making the "Never" trigger correct). As an aside, the non-trivial triggering and windowing in PAssert makes it less than ideal for validating new runners. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] pabloem removed a comment on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem removed a comment on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-626066297
[GitHub] [beam] lostluck merged pull request #11806: [BEAM-9679] Flatten Kata for Go
lostluck merged pull request #11806: URL: https://github.com/apache/beam/pull/11806
[GitHub] [beam] lostluck commented on pull request #11806: [BEAM-9679] Flatten Kata for Go
lostluck commented on pull request #11806: URL: https://github.com/apache/beam/pull/11806#issuecomment-635568870 @damondouglas That sounds correct to me as well, in order to avoid colliding stepik updates.
[GitHub] [beam] pabloem removed a comment on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro
pabloem removed a comment on pull request #11086: URL: https://github.com/apache/beam/pull/11086#issuecomment-634947671
[GitHub] [beam] damondouglas commented on pull request #11806: [BEAM-9679] Flatten Kata for Go
damondouglas commented on pull request #11806: URL: https://github.com/apache/beam/pull/11806#issuecomment-635563478 @henryken Just confirming these steps: 1. @lostluck merges this PR #11806 to master 2. @damondouglas merges new changes from PR #11806 to PR #11803 3. @damondouglas uploads to [Stepik](https://stepik.org/course/70387) 4. @lostluck merges PR #11803 to master
[GitHub] [beam] TheNeuralBit commented on pull request #11777: [BEAM-10054] Fix watermark hold for on_time_pane
TheNeuralBit commented on pull request #11777: URL: https://github.com/apache/beam/pull/11777#issuecomment-635549457 Are those tests sufficient though if they're passing before this PR?
[GitHub] [beam] kennknowles commented on a change in pull request #11835: Various fixes to allow Java PAssert to run on Python
kennknowles commented on a change in pull request #11835: URL: https://github.com/apache/beam/pull/11835#discussion_r432064924 ## File path: sdks/python/apache_beam/transforms/trigger_test.py ## @@ -518,6 +519,28 @@ def format_result(k_v): 'B-3': {10, 15, 16}, }.items( + def test_never(self): +with TestPipeline() as p: + + def construct_timestamped(k_t): +return TimestampedValue((k_t[0], k_t[1]), k_t[1]) + + def format_result(k_v): +return ('%s-%s' % (k_v[0], len(k_v[1])), set(k_v[1])) + + result = ( + p + | beam.Create([1, 1, 2, 3, 4, 5, 10, 11]) + | beam.FlatMap(lambda t: [('A', t), ('B', t + 5)]) + | beam.Map(construct_timestamped) + | beam.WindowInto( + FixedWindows(10), + trigger=Never(), + accumulation_mode=AccumulationMode.DISCARDING) + | beam.GroupByKey() + | beam.Map(format_result)) + assert_that(result, equal_to([])) Review comment: The Never trigger is misnamed. The trigger itself never fires, but output is still produced at window GC. That's why PAssert uses it to gather the full result.
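The semantics described in this review comment (no early firing, output only when the window is garbage collected) can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not Beam's implementation: `window_start`, `group_with_never_trigger`, and `ELEMENTS` are hypothetical names, the watermark is passed in as a plain number, and window expiry is modelled as window end plus allowed lateness.

```python
from collections import defaultdict


def window_start(timestamp, size):
    """Start of the fixed window of the given size that contains timestamp."""
    return timestamp - (timestamp % size)


def group_with_never_trigger(elements, size, watermark, allowed_lateness=0):
    """Toy model: buffer (key, value, timestamp) elements per key and window.

    No pane is emitted early. A window's contents are output only once the
    watermark reaches the window's expiry (end + allowed_lateness).
    """
    buffered = defaultdict(list)
    for key, value, timestamp in elements:
        buffered[(key, window_start(timestamp, size))].append(value)

    output = {}
    for (key, start), values in buffered.items():
        if watermark >= start + size + allowed_lateness:
            output[(key, start)] = sorted(values)
    return output


# Same input shape as the quoted test: each t becomes ('A', t) and ('B', t + 5),
# and the value doubles as the element timestamp.
ELEMENTS = [
    (key, value, value)
    for t in [1, 1, 2, 3, 4, 5, 10, 11]
    for key, value in (('A', t), ('B', t + 5))
]
```

In this model, calling `group_with_never_trigger(ELEMENTS, 10, watermark=5)` yields nothing because no window has expired, while a watermark at the end of input yields every buffered window. Under that reading, an `equal_to([])` expectation would only hold on a runner that never performs the GC-time firing.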
[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r432061809 ## File path: sdks/python/apache_beam/examples/wordcount_it_test.py ## @@ -84,11 +87,45 @@ def _run_wordcount_it(self, run_wordcount, **opts): # Register clean up before pipeline execution self.addCleanup(delete_files, [test_output + '*']) +publish_to_bq = bool( +test_pipeline.get_option('publish_to_big_query') or False) + +# Start measure time for performance test +start_time = time.time() + # Get pipeline options from command argument: --test-pipeline-options, # and start pipeline job by calling pipeline main function. run_wordcount( test_pipeline.get_full_options_as_args(**extra_opts), -save_main_session=False) +save_main_session=False, +) + +end_time = time.time() +run_time = end_time - start_time + +if publish_to_bq: + self._publish_metrics(test_pipeline, run_time) + + def _publish_metrics(self, pipeline, metric_value): +influx_options = InfluxDBMetricsPublisherOptions( +pipeline.get_option('influx_measurement'), +pipeline.get_option('influx_db_name'), +pipeline.get_option('influx_hostname'), +os.getenv('INFLUXDB_USER'), +os.getenv('INFLUXDB_USER_PASSWORD'), +) +metric_reader = MetricsReader( +project_name=pipeline.get_option('project'), +bq_table=pipeline.get_option('metrics_table'), +bq_dataset=pipeline.get_option('metrics_dataset'), +publish_to_bq=True, +influxdb_options=influx_options, +) + +metric_reader.publish_values(( +metric_value, Review comment: Good point, I changed it to wordcount_it_runtime and the order of key, value. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …
piotr-szuberski commented on a change in pull request #11661: URL: https://github.com/apache/beam/pull/11661#discussion_r432061809 ## File path: sdks/python/apache_beam/examples/wordcount_it_test.py ## @@ -84,11 +87,45 @@ def _run_wordcount_it(self, run_wordcount, **opts): # Register clean up before pipeline execution self.addCleanup(delete_files, [test_output + '*']) +publish_to_bq = bool( +test_pipeline.get_option('publish_to_big_query') or False) + +# Start measure time for performance test +start_time = time.time() + # Get pipeline options from command argument: --test-pipeline-options, # and start pipeline job by calling pipeline main function. run_wordcount( test_pipeline.get_full_options_as_args(**extra_opts), -save_main_session=False) +save_main_session=False, +) + +end_time = time.time() +run_time = end_time - start_time + +if publish_to_bq: + self._publish_metrics(test_pipeline, run_time) + + def _publish_metrics(self, pipeline, metric_value): +influx_options = InfluxDBMetricsPublisherOptions( +pipeline.get_option('influx_measurement'), +pipeline.get_option('influx_db_name'), +pipeline.get_option('influx_hostname'), +os.getenv('INFLUXDB_USER'), +os.getenv('INFLUXDB_USER_PASSWORD'), +) +metric_reader = MetricsReader( +project_name=pipeline.get_option('project'), +bq_table=pipeline.get_option('metrics_table'), +bq_dataset=pipeline.get_option('metrics_dataset'), +publish_to_bq=True, +influxdb_options=influx_options, +) + +metric_reader.publish_values(( +metric_value, Review comment: Good point.
[GitHub] [beam] davidcavazos commented on a change in pull request #11851: [BEAM-10144] Update PipelineOptions snippets for best practices
davidcavazos commented on a change in pull request #11851: URL: https://github.com/apache/beam/pull/11851#discussion_r432049685 ## File path: sdks/python/apache_beam/examples/snippets/snippets.py ## @@ -226,35 +227,33 @@ def _add_argparse_args(cls, parser): # [END pipeline_options_define_custom] - from apache_beam.options.pipeline_options import GoogleCloudOptions - from apache_beam.options.pipeline_options import StandardOptions - # [START pipeline_options_dataflow_service] - # Create and set your PipelineOptions. - options = PipelineOptions(flags=argv) + import apache_beam as beam + from apache_beam.options.pipeline_options import PipelineOptions + # Create and set your PipelineOptions. # For Cloud execution, specify DataflowRunner and set the Cloud Platform - # project, job name, staging file location, temp file location, and region. - options.view_as(StandardOptions).runner = 'DataflowRunner' - google_cloud_options = options.view_as(GoogleCloudOptions) - google_cloud_options.project = 'my-project-id' - google_cloud_options.job_name = 'myjob' - google_cloud_options.staging_location = 'gs://my-bucket/binaries' - google_cloud_options.temp_location = 'gs://my-bucket/temp' - google_cloud_options.region = 'us-central1' + # project, job name, temporary files location, and region. + # For more information about regions, check: + # https://cloud.google.com/dataflow/docs/concepts/regional-endpoints + options = PipelineOptions( + flags=argv, + runner='DataflowRunner', + project='my-project-id', + job_name='unique-job-name', + temp_location='gs://my-bucket/temp', + region='us-central1') # Create the Pipeline with the specified options. - p = Pipeline(options=options) + # with beam.Pipeline(options=options) as pipeline: Review comment: *Note:* This is commented out because if we leave it uncommented, even if it doesn't do anything, it makes the test fail with an error. But I still wanted it here for reference. 
```
subprocess.CalledProcessError: Command '['/Users/dcavazos/src/beam/env/bin/python', '-m', 'pip', 'download', '--dest', '/var/folders/z2/zp_k4l5n2cq84fsn4y633mg400dsyy/T/tmpdv09ddqk', 'apache-beam==2.22.0.dev0', '--no-deps', '--no-binary', ':all:']' returned non-zero exit status 1.
Pip install failed for package: apache-beam==2.22.0.dev0
Output from execution of subprocess: b''
ERROR: Could not find a version that satisfies the requirement apache-beam==2.22.0.dev0 (from versions: 0.6.0, 2.0.0, 2.1.0, 2.1.1, 2.2.0, 2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, 2.19.0, 2.20.0, 2.21.0)
ERROR: No matching distribution found for apache-beam==2.22.0.dev0
```
[GitHub] [beam] lukecwik commented on a change in pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness
lukecwik commented on a change in pull request #11792: URL: https://github.com/apache/beam/pull/11792#discussion_r432050850 ## File path: runners/portability/java/build.gradle ## @@ -31,9 +45,123 @@ dependencies { compile project(path: ":sdks:java:harness", configuration: "shadow") compile library.java.vendored_grpc_1_26_0 compile library.java.slf4j_api + testCompile project(path: ":runners:core-construction-java", configuration: "testRuntime") testCompile library.java.hamcrest_core testCompile library.java.junit testCompile library.java.mockito_core testCompile library.java.slf4j_jdk14 + + validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest") + validatesRunner project(path: ":runners:core-java", configuration: "testRuntime") + validatesRunner project(path: project.path, configuration: "testRuntime") +} + + +project.evaluationDependsOn(":sdks:java:core") +project.evaluationDependsOn(":runners:core-java") + +ext.virtualenvDir = "${project.buildDir}/virtualenv" +ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid" +ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? project.property("localJobServicePortFile") : "${project.buildDir}/local_job_service_port" +ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout" + +ext.pythonSdkDir = "${project.rootDir}/sdks/python" + +void execInVirtualenv(String... args) { + String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ") + exec { +workingDir pythonSdkDir +commandLine "sh", "-c", shellCommand + } } + +void execBackgroundInVirtualenv(String... args) { + String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ") + println "execBackgroundInVirtualEnv: ${shellCommand}" + ProcessBuilder pb = new ProcessBuilder().redirectErrorStream(true).directory(new File(pythonSdkDir)).command(["sh", "-c", shellCommand]) + Process proc = pb.start(); + + // redirectIO does not work for connecting to groovy/gradle stdout + BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream())); + String line + while ((line = reader.readLine()) != null) { +println line + } + proc.waitFor(); +} + +task virtualenv { Review comment: I wouldn't want Beam to move to a build setup where each gradle file does its own thing, because the fragmentation will hurt debugging build issues and slow down rolling out build changes that impact more than one project. One example where we decided to split a common setup was between releasing java projects and releasing vendored projects, which led to fixes that weren't done in both places, leading to bugs that lasted for months.
[GitHub] [beam] davidcavazos commented on a change in pull request #11851: [BEAM-10144] Update PipelineOptions snippets for best practices
davidcavazos commented on a change in pull request #11851: URL: https://github.com/apache/beam/pull/11851#discussion_r432049685 ## File path: sdks/python/apache_beam/examples/snippets/snippets.py ## @@ -226,35 +227,33 @@ def _add_argparse_args(cls, parser): # [END pipeline_options_define_custom] - from apache_beam.options.pipeline_options import GoogleCloudOptions - from apache_beam.options.pipeline_options import StandardOptions - # [START pipeline_options_dataflow_service] - # Create and set your PipelineOptions. - options = PipelineOptions(flags=argv) + import apache_beam as beam + from apache_beam.options.pipeline_options import PipelineOptions + # Create and set your PipelineOptions. # For Cloud execution, specify DataflowRunner and set the Cloud Platform - # project, job name, staging file location, temp file location, and region. - options.view_as(StandardOptions).runner = 'DataflowRunner' - google_cloud_options = options.view_as(GoogleCloudOptions) - google_cloud_options.project = 'my-project-id' - google_cloud_options.job_name = 'myjob' - google_cloud_options.staging_location = 'gs://my-bucket/binaries' - google_cloud_options.temp_location = 'gs://my-bucket/temp' - google_cloud_options.region = 'us-central1' + # project, job name, temporary files location, and region. + # For more information about regions, check: + # https://cloud.google.com/dataflow/docs/concepts/regional-endpoints + options = PipelineOptions( + flags=argv, + runner='DataflowRunner', + project='my-project-id', + job_name='unique-job-name', + temp_location='gs://my-bucket/temp', + region='us-central1') # Create the Pipeline with the specified options. - p = Pipeline(options=options) + # with beam.Pipeline(options=options) as pipeline: Review comment: *Note:* This is commented out because if we leave it uncommented, even if it doesn't do anything, it makes the test fail with an error. But I still wanted it here for reference. 
```
subprocess.CalledProcessError: Command '['/Users/dcavazos/src/beam/env/bin/python', '-m', 'pip', 'download', '--dest', '/var/folders/z2/zp_k4l5n2cq84fsn4y633mg400dsyy/T/tmpdv09ddqk', 'apache-beam==2.22.0.dev0', '--no-deps', '--no-binary', ':all:']' returned non-zero exit status 1.
Pip install failed for package: apache-beam==2.22.0.dev0
Output from execution of subprocess: b''
```