[GitHub] [beam] mwalenia commented on pull request #11566: [BEAM-9723] Add DLP integration transforms

2020-05-28 Thread GitBox


mwalenia commented on pull request #11566:
URL: https://github.com/apache/beam/pull/11566#issuecomment-635791896


   run java precommit



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] mwalenia commented on pull request #11331: [BEAM-9646] Add Google Cloud vision integration transform

2020-05-28 Thread GitBox


mwalenia commented on pull request #11331:
URL: https://github.com/apache/beam/pull/11331#issuecomment-635791979


   run java precommit







[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635752517


   Retest this please







[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635752368


   Run Java PreCommit







[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635752434


   Run Python2_PVR_Flink PreCommit







[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635752197


   Retest this please







[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635752138


   Retest this please







[GitHub] [beam] darshanj commented on pull request #11855: [BEAM-10005] | combinefn for ApproximateQuantiles and ApproximateUnique

2020-05-28 Thread GitBox


darshanj commented on pull request #11855:
URL: https://github.com/apache/beam/pull/11855#issuecomment-635734663


   R: @kennknowles







[GitHub] [beam] darshanj opened a new pull request #11855: [BEAM-10005] | combinefn for ApproximateQuantiles and ApproximateUnique

2020-05-28 Thread GitBox


darshanj opened a new pull request #11855:
URL: https://github.com/apache/beam/pull/11855


   Added a `combineFn` API that can be used for inputs windowed with 
non-global windows.
   

[GitHub] [beam] boyuanzz commented on a change in pull request #11642: Replace call to .checkpoint() in SDF direct runner to .try_claim(0)

2020-05-28 Thread GitBox


boyuanzz commented on a change in pull request #11642:
URL: https://github.com/apache/beam/pull/11642#discussion_r432230809



##
File path: sdks/python/apache_beam/runners/direct/sdf_direct_runner.py
##
@@ -464,7 +464,7 @@ def initiate_checkpoint():
       with self._checkpoint_lock:
         if checkpoint_state.checkpointed:
           return
-        checkpoint_state.residual_restriction = tracker.checkpoint()
+        checkpoint_state.residual_restriction = tracker.try_claim(0)

Review comment:
   Do you mean `try_split(0)`?
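
   To make the distinction behind this question concrete, here is a minimal sketch. It assumes a toy tracker over an offset range; `ToyOffsetTracker` and its internals are hypothetical and are not the Beam `RestrictionTracker` implementation. It only illustrates that a claim answers with a boolean, while a split hands back a primary and a residual restriction, which is the kind of value a field named `residual_restriction` would be expected to hold.

```python
class ToyOffsetTracker(object):
    """Hypothetical tracker over the half-open offset range [start, stop)."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
        self.last_claimed = None

    def try_claim(self, position):
        # A claim answers "may I process this position?" with a bool.
        if position >= self.stop:
            return False
        self.last_claimed = position
        return True

    def try_split(self, fraction_of_remainder):
        # A split returns (primary, residual) ranges, or None if nothing is left.
        if self.last_claimed is None:
            next_unclaimed = self.start
        else:
            next_unclaimed = self.last_claimed + 1
        remaining = self.stop - next_unclaimed
        split_at = next_unclaimed + int(remaining * fraction_of_remainder)
        if remaining <= 0 or split_at >= self.stop:
            return None
        primary = (self.start, split_at)
        residual = (split_at, self.stop)
        self.stop = split_at  # the tracker keeps only the primary part
        return primary, residual


tracker = ToyOffsetTracker(0, 10)
claimed_first = tracker.try_claim(0)   # a bool, not a restriction
claimed_second = tracker.try_claim(1)
split_result = tracker.try_split(0)    # splits off everything not yet claimed
```

   In this toy model, a split at fraction 0 behaves like a checkpoint: the tracker keeps what it has already claimed and everything else becomes the residual.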









[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma

2020-05-28 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635722845


   Run Spark ValidatesRunner







[GitHub] [beam] aaltay commented on pull request #11642: Replace call to .checkpoint() in SDF direct runner to .try_claim(0)

2020-05-28 Thread GitBox


aaltay commented on pull request #11642:
URL: https://github.com/apache/beam/pull/11642#issuecomment-635705379


   > > This is a single-line change and it passes all the tests. If the change 
makes sense, can we merge it? (question to @boyuanzz )
   > 
   > I don't think the change is correct. I can open a PR and cc the author.
   
   OK. In that case, no action is required from you. We can wait until 
@epicfaace updates this. Thank you!







[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635702404


   Retest this please







[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635702360


   Retest this please







[GitHub] [beam] chamikaramj commented on pull request #11846: [BEAM-9869] adding self-contained Kafka service jar for testing

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11846:
URL: https://github.com/apache/beam/pull/11846#issuecomment-635702305


   LGTM. Thanks.







[GitHub] [beam] steveniemitz commented on pull request #11849: [BEAM-9964] Move workerCacheMb to a user-visible place

2020-05-28 Thread GitBox


steveniemitz commented on pull request #11849:
URL: https://github.com/apache/beam/pull/11849#issuecomment-635697264


   =/ looks like the dataflow precommit succeeded but the API call to update it 
here failed.







[GitHub] [beam] tvalentyn commented on pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests

2020-05-28 Thread GitBox


tvalentyn commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-635696787


   @epicfaace Thanks for your initiative to help with Python 3.8.
   Please see the discussion on introducing high-priority/low priority 
versions: 
https://lists.apache.org/thread.html/r643cae69e5be136e6bca75bf896991fa313f79623ca056271588c87d%40%3Cdev.beam.apache.org%3E
   cc: Yoshiki (@lazylynx), who was also working on this and may have some 
thoughts on how best to integrate these changes.
   
   







[GitHub] [beam] steveniemitz commented on pull request #11849: [BEAM-9964] Move workerCacheMb to a user-visible place

2020-05-28 Thread GitBox


steveniemitz commented on pull request #11849:
URL: https://github.com/apache/beam/pull/11849#issuecomment-635696194


   > gahh so sorry that I missed this. I guess you did have to end up 
contributing this : )
   
   heh no problem, teamwork! :highfive:
   







[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635695687


   Retest this please







[GitHub] [beam] chamikaramj commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635695779


   Run Java PreCommit







[GitHub] [beam] robertwb commented on a change in pull request #11835: Various fixes to allow Java PAssert to run on Python

2020-05-28 Thread GitBox


robertwb commented on a change in pull request #11835:
URL: https://github.com/apache/beam/pull/11835#discussion_r432198245



##
File path: sdks/python/apache_beam/transforms/trigger_test.py
##
@@ -518,6 +519,28 @@ def format_result(k_v):
   'B-3': {10, 15, 16},
   }.items(
 
+  def test_never(self):
+    with TestPipeline() as p:
+
+      def construct_timestamped(k_t):
+        return TimestampedValue((k_t[0], k_t[1]), k_t[1])
+
+      def format_result(k_v):
+        return ('%s-%s' % (k_v[0], len(k_v[1])), set(k_v[1]))
+
+      result = (
+          p
+          | beam.Create([1, 1, 2, 3, 4, 5, 10, 11])
+          | beam.FlatMap(lambda t: [('A', t), ('B', t + 5)])
+          | beam.Map(construct_timestamped)
+          | beam.WindowInto(
+              FixedWindows(10),
+              trigger=Never(),
+              accumulation_mode=AccumulationMode.DISCARDING)
+          | beam.GroupByKey()
+          | beam.Map(format_result))
+      assert_that(result, equal_to([]))

Review comment:
   I couldn't find an existing bug about closing behaviors, so I filed 
BEAM-10149. That seems pretty invasive to change as part of this PR, so I added 
a hack to support its use with proper closing behavior in batch only. PTAL.









[GitHub] [beam] boyuanzz commented on pull request #11642: Replace call to .checkpoint() in SDF direct runner to .try_claim(0)

2020-05-28 Thread GitBox


boyuanzz commented on pull request #11642:
URL: https://github.com/apache/beam/pull/11642#issuecomment-635680691


   > This is a single-line change and it passes all the tests. If the change 
makes sense, can we merge it? (question to @boyuanzz )
   
   I don't think the change is correct. I can open a PR and cc the author.







[GitHub] [beam] chamikaramj commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635677737











[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635676513


   Retest this please







[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635676452


   Retest this please







[GitHub] [beam] aaltay commented on pull request #11788: [BEAM-9785] Add Python 3.8 postcommit tests

2020-05-28 Thread GitBox


aaltay commented on pull request #11788:
URL: https://github.com/apache/beam/pull/11788#issuecomment-635675870


   /cc @tvalentyn 







[GitHub] [beam] aaltay commented on pull request #11758: Old Fastjson has a serious security problem

2020-05-28 Thread GitBox


aaltay commented on pull request #11758:
URL: https://github.com/apache/beam/pull/11758#issuecomment-635675525


   retest this please







[GitHub] [beam] aaltay commented on pull request #11642: Replace call to .checkpoint() in SDF direct runner to .try_claim(0)

2020-05-28 Thread GitBox


aaltay commented on pull request #11642:
URL: https://github.com/apache/beam/pull/11642#issuecomment-635675425


   This is a single-line change and it passes all the tests. If the change 
makes sense, can we merge it? (question to @boyuanzz )







[GitHub] [beam] aaltay commented on pull request #11181: [BEAM-9500] [WIP] Refactor load tests

2020-05-28 Thread GitBox


aaltay commented on pull request #11181:
URL: https://github.com/apache/beam/pull/11181#issuecomment-635674912


   @piotr-szuberski - what is the next step for this PR? Is it still active? 
Should we close it?







[GitHub] [beam] chamikaramj commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635674009


   I don't think so. We changed from using "key=value" strings to StagedFile 
objects in https://github.com/apache/beam/pull/11039/files.







[GitHub] [beam] chadrik commented on pull request #11632: [BEAM-7746] Fix type errors and enable checks for apache_beam.dataframe.*

2020-05-28 Thread GitBox


chadrik commented on pull request #11632:
URL: https://github.com/apache/beam/pull/11632#issuecomment-635673990


   LGTM!







[GitHub] [beam] aaltay commented on pull request #11704: [BEAM-9956] removed unbalanced code markup

2020-05-28 Thread GitBox


aaltay commented on pull request #11704:
URL: https://github.com/apache/beam/pull/11704#issuecomment-635673169


   Could you rebase? Is this still an issue after the website change?
   
   R: @rosetn 







[GitHub] [beam] aaltay commented on pull request #11706: [BEAM-8451] annotate python only sections

2020-05-28 Thread GitBox


aaltay commented on pull request #11706:
URL: https://github.com/apache/beam/pull/11706#issuecomment-635672892


   R: @rosetn 







[GitHub] [beam] TheNeuralBit commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging

2020-05-28 Thread GitBox


TheNeuralBit commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635672193


   @chamikaramj, @ihji is the `filesToStage` change a problem?







[GitHub] [beam] aaltay commented on pull request #11758: Old Fastjson has a serious security problem

2020-05-28 Thread GitBox


aaltay commented on pull request #11758:
URL: https://github.com/apache/beam/pull/11758#issuecomment-635672368


   LGTM. I can merge it if tests pass.







[GitHub] [beam] aaltay commented on pull request #11610: [BEAM-9825] | Implement Intersect,Union,Except transforms

2020-05-28 Thread GitBox


aaltay commented on pull request #11610:
URL: https://github.com/apache/beam/pull/11610#issuecomment-635672074


   All tests passed. I do not see an LGTM; maybe I am missing it. Is this ready 
to be merged?







[GitHub] [beam] aaltay commented on pull request #11779: [BEAM-10055] Add --region to python examples where it was missing

2020-05-28 Thread GitBox


aaltay commented on pull request #11779:
URL: https://github.com/apache/beam/pull/11779#issuecomment-635671668


   LGTM. I will merge after tests pass. Thank you @tedromer 







[GitHub] [beam] aaltay commented on pull request #11779: [BEAM-10055] Add --region to python examples where it was missing

2020-05-28 Thread GitBox


aaltay commented on pull request #11779:
URL: https://github.com/apache/beam/pull/11779#issuecomment-635671566


   retest this please







[GitHub] [beam] chamikaramj commented on pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11847:
URL: https://github.com/apache/beam/pull/11847#issuecomment-635670461


   LGTM. Thanks.







[GitHub] [beam] aaltay commented on pull request #11819: [BEAM-8371] Remove support for Python 2

2020-05-28 Thread GitBox


aaltay commented on pull request #11819:
URL: https://github.com/apache/beam/pull/11819#issuecomment-635669412


   Should we close this pull request for now?







[GitHub] [beam] aaltay commented on pull request #11682: [BEAM-9946] | added new api in Partition Transform

2020-05-28 Thread GitBox


aaltay commented on pull request #11682:
URL: https://github.com/apache/beam/pull/11682#issuecomment-635668922


   Run Java PreCommit







[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma

2020-05-28 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635664997


   Run Spark ValidatesRunner







[GitHub] [beam] rohdesamuel commented on pull request #11594: [BEAM-9692] Replace apply_WriteToBigQuery with PTransformOverride

2020-05-28 Thread GitBox


rohdesamuel commented on pull request #11594:
URL: https://github.com/apache/beam/pull/11594#issuecomment-635664357


   Looks like the PreCommit failed with "Exception: Dataflow only supports 
Python versions 2 and 3.5+, got: (3, 8)". Is that a known failure?







[GitHub] [beam] pabloem commented on pull request #11849: [BEAM-9964] Move workerCacheMb to a user-visible place

2020-05-28 Thread GitBox


pabloem commented on pull request #11849:
URL: https://github.com/apache/beam/pull/11849#issuecomment-635663865


   gahh so sorry that I missed this. I guess you did have to end up 
contributing this : )







[GitHub] [beam] robertwb commented on a change in pull request #11632: [BEAM-7746] Fix type errors and enable checks for apache_beam.dataframe.*

2020-05-28 Thread GitBox


robertwb commented on a change in pull request #11632:
URL: https://github.com/apache/beam/pull/11632#discussion_r432170796



##
File path: sdks/python/apache_beam/dataframe/transforms.py
##
@@ -16,13 +16,28 @@
 
 from __future__ import absolute_import
 
+import typing
+from typing import Any
+from typing import Dict
+from typing import List
+from typing import Mapping
+from typing import Tuple
+from typing import TypeVar
+from typing import Union
+
 import pandas as pd
 
 import apache_beam as beam
 from apache_beam import transforms
 from apache_beam.dataframe import expressions
 from apache_beam.dataframe import frames  # pylint: disable=unused-import
 
+if typing.TYPE_CHECKING:

Review comment:
   +1 for consistency. Changed. 
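
The hunk above stops at `if typing.TYPE_CHECKING:`. As a rough sketch of what that guard is typically used for: imports needed only for annotations go under it, so they are seen by static checkers but never executed at runtime. `Decimal` is a stand-in chosen here, and `describe` is a hypothetical function; neither comes from the PR.

```python
import typing

if typing.TYPE_CHECKING:
    # Evaluated only by static type checkers; skipped when the module runs.
    from decimal import Decimal  # stand-in for a costly or circular import


def describe(value):
    # type: (Decimal) -> str
    """Format a value; the type comment is never evaluated at runtime."""
    return 'value=%s' % value
```

A type comment is used instead of an inline annotation because the file still imports from `__future__`, which suggests Python 2 compatibility matters.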









[GitHub] [beam] chamikaramj opened a new pull request #11854: [BEAM-8019] Cherry pick 11844

2020-05-28 Thread GitBox


chamikaramj opened a new pull request #11854:
URL: https://github.com/apache/beam/pull/11854


   Without this, cross-language KafkaIO users may have to call
   pipeline.run(False)
   
   instead of
   pipeline.run()
   
   when executing a pipeline using Dataflow.
   
   @TheNeuralBit this should be a pretty safe change to cherry-pick if you are 
OK with it.
   
   
   

[GitHub] [beam] chamikaramj commented on pull request #11854: [BEAM-8019] Cherry pick 11844

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11854:
URL: https://github.com/apache/beam/pull/11854#issuecomment-635657611


   R: @TheNeuralBit 







[GitHub] [beam] robertwb merged pull request #11853: Update multi-language roadmap status.

2020-05-28 Thread GitBox


robertwb merged pull request #11853:
URL: https://github.com/apache/beam/pull/11853


   







[GitHub] [beam] chamikaramj commented on pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11844:
URL: https://github.com/apache/beam/pull/11844#issuecomment-635651784


   Thanks.







[GitHub] [beam] robertwb merged pull request #11844: [BEAM-8019] Enables proto holders when 'test_runner_api' is True.

2020-05-28 Thread GitBox


robertwb merged pull request #11844:
URL: https://github.com/apache/beam/pull/11844


   







[GitHub] [beam] aaltay commented on pull request #11851: [BEAM-10144] Update PipelineOptions snippets for best practices

2020-05-28 Thread GitBox


aaltay commented on pull request #11851:
URL: https://github.com/apache/beam/pull/11851#issuecomment-635648807


   Run Python PreCommit







[GitHub] [beam] iemejia commented on a change in pull request #11853: Update multi-language roadmap status.

2020-05-28 Thread GitBox


iemejia commented on a change in pull request #11853:
URL: https://github.com/apache/beam/pull/11853#discussion_r432163908



##
File path: website/www/site/content/en/roadmap/connectors-multi-sdk.md
##
@@ -62,27 +62,29 @@ Work related to making cross-language transforms available for Flink runner.
 
 Work related to making cross-language transforms available for Dataflow runner.
 
-* Basic support for executing cross-language transforms on Dataflow runner 
+* Basic support for executing cross-language transforms on Dataflow runner
   + This work requires updates to Dataflow service's job submission and job execution logic. This is currently being developed at Google.
 
 ### Support for Direct runner
 
 Work related to making cross-language transforms available on Direct runner
 
-* Basic support for executing cross-language transforms on portable Direct runner - Not started
+* Basic support for executing cross-language transforms on Pyton Direct runner - completed
+* Basic support for executing cross-language transforms on Java Direct runner - Not started
 
 ### Connector/transform support
 
 Ongoing and planned work related to making existing connectors/transforms available to other SDKs through the cross-language transforms framework.
 
-* Java KafkIO - In progress - [BEAM-7029](https://issues.apache.org/jira/browse/BEAM-7029)
+* Java KafkIO - completed - [BEAM-7029](https://issues.apache.org/jira/browse/BEAM-7029)

Review comment:
   ```suggestion
   * Java KafkaIO - completed - [BEAM-7029](https://issues.apache.org/jira/browse/BEAM-7029)
   ```









[GitHub] [beam] damondouglas commented on pull request #11803: [BEAM-9679] Add a CoGroupByKey lesson to the Core Transforms section

2020-05-28 Thread GitBox


damondouglas commented on pull request #11803:
URL: https://github.com/apache/beam/pull/11803#issuecomment-635644834


   I updated [the stepik course](https://stepik.org/course/70387) and committed 
the updated `*-remote.yaml` files to this PR.  It is ready to merge into 
master.  Thank you, @henryken and @lostluck, for your help.







[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma

2020-05-28 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635644599


   Run Spark ValidatesRunner







[GitHub] [beam] pabloem commented on pull request #11594: [BEAM-9692] Replace apply_WriteToBigQuery with PTransformOverride

2020-05-28 Thread GitBox


pabloem commented on pull request #11594:
URL: https://github.com/apache/beam/pull/11594#issuecomment-635640892


   Run Python 2 PostCommit







[GitHub] [beam] pabloem commented on pull request #11594: [BEAM-9692] Replace apply_WriteToBigQuery with PTransformOverride

2020-05-28 Thread GitBox


pabloem commented on pull request #11594:
URL: https://github.com/apache/beam/pull/11594#issuecomment-635640799











[GitHub] [beam] rohdesamuel commented on pull request #11594: [BEAM-9692] Replace apply_WriteToBigQuery with PTransformOverride

2020-05-28 Thread GitBox


rohdesamuel commented on pull request #11594:
URL: https://github.com/apache/beam/pull/11594#issuecomment-635640073


   R: @pabloem 







[GitHub] [beam] udim merged pull request #11070: [BEAM-8280] Blog post: Python typing changes

2020-05-28 Thread GitBox


udim merged pull request #11070:
URL: https://github.com/apache/beam/pull/11070


   







[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma

2020-05-28 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635630717


   Run Spark ValidatesRunner







[GitHub] [beam] ihji commented on a change in pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


ihji commented on a change in pull request #11847:
URL: https://github.com/apache/beam/pull/11847#discussion_r432142814



##
File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py
##
@@ -0,0 +1,145 @@
+"""Integration test for Python cross-language pipelines for Java KafkaIO."""
+
+from __future__ import absolute_import
+
+import contextlib
+import logging
+import os
+import socket
+import subprocess
+import time
+import typing
+import unittest
+
+import grpc
+
+import apache_beam as beam
+from apache_beam.io.external.kafka import ReadFromKafka
+from apache_beam.io.external.kafka import WriteToKafka
+from apache_beam.metrics import Metrics
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+
+
+class CrossLanguageKafkaIO(object):
+  def __init__(self, bootstrap_servers, topic, expansion_service=None):
+    self.bootstrap_servers = bootstrap_servers
+    self.topic = topic
+    self.expansion_service = expansion_service or (
+        'localhost:%s' % os.environ.get('EXPANSION_PORT'))
+    self.sum_counter = Metrics.counter('source', 'elements_sum')
+
+  def build_write_pipeline(self, pipeline):
+    _ = (
+        pipeline
+        | 'Impulse' >> beam.Impulse()
+        | 'Generate' >> beam.FlatMap(lambda x: range(1000)) # pylint: disable=range-builtin-not-iterating
+        | 'Reshuffle' >> beam.Reshuffle()
+        | 'MakeKV' >> beam.Map(lambda x:
+                               (b'', str(x).encode())).with_output_types(
+                                   typing.Tuple[bytes, bytes])
+        | 'WriteToKafka' >> WriteToKafka(
+            producer_config={'bootstrap.servers': self.bootstrap_servers},
+            topic=self.topic,
+            expansion_service=self.expansion_service))
+
+  def build_read_pipeline(self, pipeline):
+    _ = (
+        pipeline
+        | 'ReadFromKafka' >> ReadFromKafka(
+            consumer_config={
+                'bootstrap.servers': self.bootstrap_servers,
+                'auto.offset.reset': 'earliest'
+            },
+            topics=[self.topic],
+            expansion_service=self.expansion_service)
+        | 'Windowing' >> beam.WindowInto(
+            beam.window.FixedWindows(300),
+            trigger=beam.transforms.trigger.AfterProcessingTime(60),
+            accumulation_mode=beam.transforms.trigger.AccumulationMode.
+            DISCARDING)
+        | 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode()))
+        | 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults()
+        | 'SetSumCounter' >> beam.Map(self.sum_counter.inc))
+
+  def run_xlang_kafkaio(self, pipeline):
+    self.build_write_pipeline(pipeline)
+    self.build_read_pipeline(pipeline)
+    pipeline.run(False)
+
+
+@unittest.skipUnless(
+    os.environ.get('LOCAL_KAFKA_JAR'),
+    "LOCAL_KAFKA_JAR environment var is not provided.")
+@unittest.skipUnless(
+    os.environ.get('EXPANSION_JAR'),
+    "EXPANSION_JAR environment var is not provided.")
+class CrossLanguageKafkaIOTest(unittest.TestCase):
+  def get_open_port(self):
+    s = None
+    try:
+      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+    except:  # pylint: disable=bare-except
+      # Above call will fail for nodes that only support IPv6.
+      s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
+    s.bind(('localhost', 0))
+    s.listen(1)
+    port = s.getsockname()[1]
+    s.close()
+    return port
+
+  @contextlib.contextmanager
+  def local_services(self, expansion_service_jar_file, local_kafka_jar_file):
+    expansion_service_port = str(self.get_open_port())
+    kafka_port = str(self.get_open_port())
+    zookeeper_port = str(self.get_open_port())
+
+    expansion_server = None
+    kafka_server = None
+    try:
+      expansion_server = subprocess.Popen(
+          ['java', '-jar', expansion_service_jar_file, expansion_service_port])
+      kafka_server = subprocess.Popen(
+          ['java', '-jar', local_kafka_jar_file, kafka_port, zookeeper_port])
+      time.sleep(3)
+      channel_creds = grpc.local_channel_credentials()
+      with grpc.secure_channel('localhost:%s' % expansion_service_port,
+                               channel_creds) as channel:
+        grpc.channel_ready_future(channel).result()
+
+      yield expansion_service_port, kafka_port
+    finally:
+      if expansion_server:
+        expansion_server.kill()
+      if kafka_server:
+        kafka_server.kill()
+
+  def get_options(self):
+    options = PipelineOptions([
+        '--runner',
+        'FlinkRunner',
+        '--parallelism',
+        '2',
+        '--experiment',
+        'beam_fn_api'
+    ])
+    return options
+
+  def test_kafkaio_write(self):
+    local_kafka_jar = os.environ.get('LOCAL_KAFKA_JAR')
+    expansion_service_jar = os.environ.get('EXPANSION_JAR')
+    with self.local_services(expansion_service_jar,

Review comment:
   updated.
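
As a sanity check on the hunk above: the write side emits the integers 0..999 as byte strings under an empty key, and the read side decodes each value and sums them. A plain-Python sketch of that round trip follows, with no Beam or Kafka involved. `make_kv` and `decode_value` are illustrative names, and the sketch assumes every record ends up in a single pane.

```python
def make_kv(x):
    """Mirror the 'MakeKV' step: empty bytes key, encoded integer value."""
    return (b'', str(x).encode())


def decode_value(elem):
    """Mirror the 'DecodingValue' step: parse the value bytes back to an int."""
    return int(elem[1].decode())


records = [make_kv(x) for x in range(1000)]
total = sum(decode_value(kv) for kv in records)
print(total)  # 0 + 1 + ... + 999 = 499500
```

Under that single-pane assumption, `elements_sum` would be expected to reach 499500. If the trigger fires more than once, the counter is incremented by each partial sum instead, which still adds up to the same total.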




---

[GitHub] [beam] ihji commented on a change in pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


ihji commented on a change in pull request #11847:
URL: https://github.com/apache/beam/pull/11847#discussion_r432142992



##
File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py
##
@@ -0,0 +1,145 @@
+  @contextlib.contextmanager
+  def local_services(self, expansion_service_jar_file, local_kafka_jar_file):
+    expansion_service_port = str(self.get_open_port())
+    kafka_port = str(self.get_open_port())
+    zookeeper_port = str(self.get_open_port())
+
+    expansion_server = None
+    kafka_server = None
+    try:
+      expansion_server = subprocess.Popen(

Review comment:
   updated.
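
For readers following the service start-up in this hunk: ports are chosen by binding to port 0, and the test later waits a fixed three seconds before checking the gRPC channel. One possible alternative to a fixed sleep is to poll the port until it accepts connections. The sketch below is an illustration only, not part of the PR, and `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def get_open_port():
    """Ask the OS for a free TCP port by binding to port 0."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(('localhost', 0))
        return s.getsockname()[1]
    finally:
        s.close()


def wait_for_port(port, host='localhost', timeout=10.0, interval=0.1):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            conn = socket.create_connection((host, port), timeout=interval)
        except OSError:
            # Nothing is listening yet; back off briefly and retry.
            time.sleep(interval)
        else:
            conn.close()
            return True
    return False
```

Note that a port returned by `get_open_port` is only free at the moment of the call; another process could claim it before the service binds, so this pattern suits tests rather than production use.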









[GitHub] [beam] ihji commented on a change in pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


ihji commented on a change in pull request #11847:
URL: https://github.com/apache/beam/pull/11847#discussion_r432142695



##
File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py
##
@@ -0,0 +1,145 @@
+  @contextlib.contextmanager
+  def local_services(self, expansion_service_jar_file, local_kafka_jar_file):
+    expansion_service_port = str(self.get_open_port())
+    kafka_port = str(self.get_open_port())
+    zookeeper_port = str(self.get_open_port())
+
+    expansion_server = None
+    kafka_server = None
+    try:
+      expansion_server = subprocess.Popen(
+          ['java', '-jar', expansion_service_jar_file, expansion_service_port])
+      kafka_server = subprocess.Popen(

Review comment:
   Yes, this is for external testing (probably only for small-scale 
correctness tests; we may still need a Kubernetes cluster for large-scale 
performance tests).
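
The pattern under discussion (start helper processes, hand them to the test, and always tear them down) can be sketched with the standard library alone. This is an illustration under stated assumptions: `local_processes` is a hypothetical name, and a sleeping Python process stands in for the `java -jar ...` services.

```python
import contextlib
import subprocess
import sys


@contextlib.contextmanager
def local_processes(commands):
    """Start each command, yield the Popen handles, and always clean up."""
    procs = []
    try:
        for cmd in commands:
            procs.append(subprocess.Popen(cmd))
        yield procs
    finally:
        for proc in procs:
            if proc.poll() is None:  # still running
                proc.kill()
            proc.wait()  # reap the child so it does not linger as a zombie


if __name__ == '__main__':
    # Stand-in for the Java services: a child process that just sleeps.
    sleeper = [sys.executable, '-c', 'import time; time.sleep(60)']
    with local_processes([sleeper, sleeper]) as running:
        print([p.poll() for p in running])  # None means still running
    print(all(p.returncode is not None for p in running))
```

Collecting the handles in a list means a failure while starting the second process still cleans up the first, which is the same guarantee the `try`/`finally` in the hunk aims for.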









[GitHub] [beam] jhnmora000 commented on pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality

2020-05-28 Thread GitBox


jhnmora000 commented on pull request #11845:
URL: https://github.com/apache/beam/pull/11845#issuecomment-635624778


   Thanks for your help, @amaliujia. I will close this PR and continue 
experimenting with BeamSQL/Calcite.







[GitHub] [beam] jhnmora000 closed pull request #11845: [BEAM-9198] BeamSQL aggregation analytics functionality

2020-05-28 Thread GitBox


jhnmora000 closed pull request #11845:
URL: https://github.com/apache/beam/pull/11845


   







[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma

2020-05-28 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635621052











[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma

2020-05-28 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635620991


   Run Java Flink PortableValidatesRunner Streaming







[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma

2020-05-28 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635620885











[GitHub] [beam] chamikaramj commented on pull request #11853: Update multi-language roadmap status.

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11853:
URL: https://github.com/apache/beam/pull/11853#issuecomment-635620838


   LGTM. Thanks for updating.







[GitHub] [beam] youngoli merged pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.

2020-05-28 Thread GitBox


youngoli merged pull request #11791:
URL: https://github.com/apache/beam/pull/11791


   







[GitHub] [beam] davidcavazos commented on pull request #11851: [BEAM-10144] Update PipelineOptions snippets for best practices

2020-05-28 Thread GitBox


davidcavazos commented on pull request #11851:
URL: https://github.com/apache/beam/pull/11851#issuecomment-635618216


   Run Python PreCommit







[GitHub] [beam] TheNeuralBit commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging

2020-05-28 Thread GitBox


TheNeuralBit commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635615410


   retest this please







[GitHub] [beam] youngoli commented on pull request #11791: [BEAM-9935] Respect allowed split points and fraction in Go.

2020-05-28 Thread GitBox


youngoli commented on pull request #11791:
URL: https://github.com/apache/beam/pull/11791#issuecomment-635615190


   Run Go PostCommit







[GitHub] [beam] robertwb opened a new pull request #11853: Update multi-language roadmap status.

2020-05-28 Thread GitBox


robertwb opened a new pull request #11853:
URL: https://github.com/apache/beam/pull/11853


   
   
   

[GitHub] [beam] lukecwik commented on pull request #11821: [BEAM-10097, BEAM-5982, BEAM-3080] Use primitive views directly instead of transforming KV> to the view type via a naive ma

2020-05-28 Thread GitBox


lukecwik commented on pull request #11821:
URL: https://github.com/apache/beam/pull/11821#issuecomment-635613803


   Run Spark ValidatesRunner







[GitHub] [beam] tvalentyn commented on a change in pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-28 Thread GitBox


tvalentyn commented on a change in pull request #11086:
URL: https://github.com/apache/beam/pull/11086#discussion_r432112443



##
File path: sdks/python/apache_beam/io/gcp/big_query_query_to_table_it_test.py
##
@@ -254,11 +256,36 @@ def test_big_query_new_types(self):
 'output_schema': NEW_TYPES_OUTPUT_SCHEMA,
 'use_standard_sql': False,
 'wait_until_finish_duration': WAIT_UNTIL_FINISH_DURATION_MS,
+'use_json_exports': True,
 'on_success_matcher': all_of(*pipeline_verifiers)
 }
 options = self.test_pipeline.get_full_options_as_args(**extra_opts)
 big_query_query_to_table_pipeline.run_bq_pipeline(options)
 
+  @attr('IT')
+  def test_big_query_new_types(self):

Review comment:
   Looks like we run this test internally using this name. Since this test 
now changes to `use_beam_bq_sink`, please check that we maintain reasonable 
internal test coverage for the native sink, or use a new test name for the new 
behavior.









[GitHub] [beam] tvalentyn commented on pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-28 Thread GitBox


tvalentyn commented on pull request #11661:
URL: https://github.com/apache/beam/pull/11661#issuecomment-635593321


   @kamilwu - please merge once this looks good to you, I don't have other 
input here.







[GitHub] [beam] tvalentyn commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-28 Thread GitBox


tvalentyn commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r432107251



##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_Performance_Tests.json
##
@@ -77,7 +77,7 @@
   ],
   "orderByTime": "ASC",
   "policy": "default",
-  "query": "SELECT mean(\"value\") FROM \"wordcount_py27_results\" 
WHERE metric = 'Python performance test' AND $timeFilter GROUP BY 
time($__interval),  \"metric\"",
+  "query": "SELECT mean(\"value\") FROM \"wordcount_py27_results\" 
WHERE metric = 'wordcount_it_runtime' AND $timeFilter GROUP BY 
time($__interval),  \"metric\"",

Review comment:
   Seeing this query now: yes, I would just keep the metric 'runtime', 
since we already know it is wordcount_py27_results, and it would be simpler 
if the pipeline did not need to know the name of the suite. In the future we 
might add different metrics such as 'cost' or total CPU time consumed by other 
workers, as opposed to runtime.
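
   A minimal sketch of the suggestion above, assuming the measurement name 
alone identifies the suite and the metric is the generic 'runtime'. The helper 
`build_runtime_query` is hypothetical and not part of the PR; it only 
reproduces the shape of the dashboard query shown in the diff.

```python
def build_runtime_query(measurement, metric="runtime"):
    """Builds the Grafana/InfluxQL query string for one measurement.

    The suite is identified by the measurement name only, so the metric
    name can stay generic (hypothetical helper, for illustration).
    """
    return (
        'SELECT mean("value") FROM "{measurement}" '
        "WHERE metric = '{metric}' AND $timeFilter "
        'GROUP BY time($__interval), "metric"'
    ).format(measurement=measurement, metric=metric)


if __name__ == "__main__":
    print(build_runtime_query("wordcount_py27_results"))
```

   With this shape, adding a later metric such as 'cost' would only change 
the `metric` argument, not the measurement.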









[GitHub] [beam] chamikaramj commented on pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging

2020-05-28 Thread GitBox


chamikaramj commented on pull request #11814:
URL: https://github.com/apache/beam/pull/11814#issuecomment-635591331











[GitHub] [beam] tvalentyn commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-28 Thread GitBox


tvalentyn commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r432106093



##
File path: sdks/python/apache_beam/examples/wordcount_it_test.py
##
@@ -84,11 +87,45 @@ def _run_wordcount_it(self, run_wordcount, **opts):
 # Register clean up before pipeline execution
 self.addCleanup(delete_files, [test_output + '*'])
 
+publish_to_bq = bool(
+test_pipeline.get_option('publish_to_big_query') or False)
+
+# Start measure time for performance test
+start_time = time.time()
+
 # Get pipeline options from command argument: --test-pipeline-options,
 # and start pipeline job by calling pipeline main function.
 run_wordcount(
 test_pipeline.get_full_options_as_args(**extra_opts),
-save_main_session=False)
+save_main_session=False,
+)
+
+end_time = time.time()
+run_time = end_time - start_time
+
+if publish_to_bq:
+  self._publish_metrics(test_pipeline, run_time)
+
+  def _publish_metrics(self, pipeline, metric_value):
+influx_options = InfluxDBMetricsPublisherOptions(
+pipeline.get_option('influx_measurement'),
+pipeline.get_option('influx_db_name'),
+pipeline.get_option('influx_hostname'),
+os.getenv('INFLUXDB_USER'),
+os.getenv('INFLUXDB_USER_PASSWORD'),
+)
+metric_reader = MetricsReader(
+project_name=pipeline.get_option('project'),
+bq_table=pipeline.get_option('metrics_table'),
+bq_dataset=pipeline.get_option('metrics_dataset'),
+publish_to_bq=True,
+influxdb_options=influx_options,
+)
+
+metric_reader.publish_values((
+metric_value,

Review comment:
   Do we need "wordcount_it" in the metric name? It depends on how these 
metrics will be stored: if they are already associated in the database with the 
test suite that launches the pipeline, `Python WordCount IT Benchmarks`, then 
this information is already captured and perhaps we don't need to repeat it in 
two different places. Leaving this up to you and @kamilwu.
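
   To illustrate the point, here is a rough sketch (not the code in the PR) 
of timing a run and recording the result under a generic metric name, with the 
suite carried only by the measurement. `timed_run` and `make_metric_record` 
are hypothetical helpers, and the record layout is an assumption made for 
illustration.

```python
import time


def timed_run(fn, clock=time.time):
    """Runs fn() and returns the elapsed wall-clock time in seconds."""
    start = clock()
    fn()
    return clock() - start


def make_metric_record(measurement, value, name="runtime"):
    """Builds one metric record (hypothetical layout).

    The suite is stored once, as the measurement; the metric name does
    not repeat it.
    """
    return {"measurement": measurement, "metric": name, "value": value}


if __name__ == "__main__":
    elapsed = timed_run(lambda: time.sleep(0.01))
    print(make_metric_record("wordcount_py27_results", elapsed))
```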









[GitHub] [beam] lostluck merged pull request #11207: [BEAM-9220] Go Dataflow jobs to use runner v2

2020-05-28 Thread GitBox


lostluck merged pull request #11207:
URL: https://github.com/apache/beam/pull/11207


   







[GitHub] [beam] rionmonster commented on a change in pull request #11761: [BEAM-10027] Support for Kotlin-based Beam Katas

2020-05-28 Thread GitBox


rionmonster commented on a change in pull request #11761:
URL: https://github.com/apache/beam/pull/11761#discussion_r432104332



##
File path: learning/katas/kotlin/Windowing/Fixed Time Window/Fixed Time 
Window/test/org/apache/beam/learning/katas/windowing/fixedwindow/WindowedEvent.kt
##
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.learning.katas.windowing.fixedwindow
+
+import java.io.Serializable
+import java.util.*
+
+class WindowedEvent(private val event: String?, private val count: Long, 
private val window: String) : Serializable {

Review comment:
   Done!









[GitHub] [beam] rionmonster commented on a change in pull request #11761: [BEAM-10027] Support for Kotlin-based Beam Katas

2020-05-28 Thread GitBox


rionmonster commented on a change in pull request #11761:
URL: https://github.com/apache/beam/pull/11761#discussion_r432103074



##
File path: learning/katas/kotlin/Core 
Transforms/Combine/CombineFn/src/org/apache/beam/learning/katas/coretransforms/combine/combinefn/Task.kt
##
@@ -0,0 +1,99 @@
+/*
+ *  Licensed to the Apache Software Foundation (ASF) under one
+ *  or more contributor license agreements.  See the NOTICE file
+ *  distributed with this work for additional information
+ *  regarding copyright ownership.  The ASF licenses this file
+ *  to you under the Apache License, Version 2.0 (the
+ *  "License"); you may not use this file except in compliance
+ *  with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *  Unless required by applicable law or agreed to in writing, software
+ *  distributed under the License is distributed on an "AS IS" BASIS,
+ *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *  See the License for the specific language governing permissions and
+ *  limitations under the License.
+ */
+package org.apache.beam.learning.katas.coretransforms.combine.combinefn
+
+import 
org.apache.beam.learning.katas.coretransforms.combine.combinefn.Task.AverageFn.Accum
+import org.apache.beam.learning.katas.util.Log
+import org.apache.beam.sdk.Pipeline
+import org.apache.beam.sdk.options.PipelineOptionsFactory
+import org.apache.beam.sdk.transforms.Combine
+import org.apache.beam.sdk.transforms.Combine.CombineFn
+import org.apache.beam.sdk.transforms.Create
+import org.apache.beam.sdk.values.PCollection
+import java.io.Serializable
+import java.util.*
+
+object Task {
+@JvmStatic
+fun main(args: Array<String>) {
+val options = PipelineOptionsFactory.fromArgs(*args).create()
+val pipeline = Pipeline.create(options)
+
+val numbers = pipeline.apply(Create.of(10, 20, 50, 70, 90))
+
+val output = applyTransform(numbers)
+
+output.apply(Log.ofElements())
+
+pipeline.run()
+}
+
+@JvmStatic
+fun applyTransform(input: PCollection): PCollection {
+return input.apply(Combine.globally(AverageFn()))
+}
+
+internal class AverageFn : CombineFn() {
+
+internal inner class Accum : Serializable {

Review comment:
   Done

##
File path: 
learning/katas/kotlin/util/test/org/apache/beam/learning/katas/util/ContainsKvs.kt
##
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.learning.katas.util
+
+import com.google.common.collect.ImmutableList
+import com.google.common.collect.Iterables
+import org.apache.beam.sdk.transforms.SerializableFunction
+import org.apache.beam.sdk.values.KV
+import org.hamcrest.CoreMatchers
+import org.hamcrest.Matcher
+import org.hamcrest.collection.IsIterableContainingInAnyOrder
+import org.junit.Assert
+import java.util.*
+
+class ContainsKvs private constructor(private val expectedKvs: List>>) : SerializableFunction>>, Void?> {
+
+companion object {
+@SafeVarargs
+fun containsKvs(vararg kvs: KV>): 
SerializableFunction>>, Void?> {
+return ContainsKvs(ImmutableList.copyOf(kvs))
+}
+}
+
+override fun apply(input: Iterable>>): Void? {
+val matchers: MutableList>>> = 
ArrayList()
+
+for (expected in expectedKvs) {
+val values = Iterables.toArray(expected.value, String::class.java)
+
matchers.add(KvMatcher.Companion.isKv(CoreMatchers.equalTo(expected.key), 
IsIterableContainingInAnyOrder.containsInAnyOrder(*values)))

Review comment:
   Done!









[GitHub] [beam] aaltay commented on pull request #11851: [BEAM-10144] Update PipelineOptions snippets for best practices

2020-05-28 Thread GitBox


aaltay commented on pull request #11851:
URL: https://github.com/apache/beam/pull/11851#issuecomment-635587627


   retest this please







[GitHub] [beam] tvalentyn commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-28 Thread GitBox


tvalentyn commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r432101289



##
File path: 
.test-infra/metrics/grafana/dashboards/perftests_metrics/Python_Performance_Tests.json
##
@@ -0,0 +1,297 @@
+{

Review comment:
   >  exporting a new dashboard to JSON file
   
   Thanks, does this mean: creating a new dashboard manually via UI?









[GitHub] [beam] ihji commented on a change in pull request #11814: [BEAM-10078] uniquify Dataflow specific jars when staging

2020-05-28 Thread GitBox


ihji commented on a change in pull request #11814:
URL: https://github.com/apache/beam/pull/11814#discussion_r432099933



##
File path: 
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
##
@@ -397,10 +397,21 @@ public static PackageAttributes forFileToStage(
 String.format("Non-existent file to stage: %s", 
file.getAbsolutePath()));
   }
   checkState(!file.isDirectory(), "Source file must not be a directory.");
+  String target;
+  switch (dest) {
+case "dataflow-worker.jar":

Review comment:
   Put some comments and logging









[GitHub] [beam] ibzib opened a new pull request #11852: [BEAM-10107] Remove outdated instructions for website updates in rele…

2020-05-28 Thread GitBox


ibzib opened a new pull request #11852:
URL: https://github.com/apache/beam/pull/11852


   …ase guide.
   
   Just some minor docs cleanup.
   
   R: @TheNeuralBit 
   
   
   

[GitHub] [beam] chamikaramj commented on a change in pull request #11847: [BEAM-10125] adding cross-language KafkaIO integration test

2020-05-28 Thread GitBox


chamikaramj commented on a change in pull request #11847:
URL: https://github.com/apache/beam/pull/11847#discussion_r432083261



##
File path: sdks/python/apache_beam/io/external/xlang_kafkaio_it_test.py
##
@@ -0,0 +1,145 @@
+"""Integration test for Python cross-language pipelines for Java KafkaIO."""
+
+from __future__ import absolute_import
+
+import contextlib
+import logging
+import os
+import socket
+import subprocess
+import time
+import typing
+import unittest
+
+import grpc
+
+import apache_beam as beam
+from apache_beam.io.external.kafka import ReadFromKafka
+from apache_beam.io.external.kafka import WriteToKafka
+from apache_beam.metrics import Metrics
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.testing.test_pipeline import TestPipeline
+
+
+class CrossLanguageKafkaIO(object):
+  def __init__(self, bootstrap_servers, topic, expansion_service=None):
+self.bootstrap_servers = bootstrap_servers
+self.topic = topic
+self.expansion_service = expansion_service or (
+'localhost:%s' % os.environ.get('EXPANSION_PORT'))
+self.sum_counter = Metrics.counter('source', 'elements_sum')
+
+  def build_write_pipeline(self, pipeline):
+_ = (
+pipeline
+| 'Impulse' >> beam.Impulse()
+| 'Generate' >> beam.FlatMap(lambda x: range(1000)) # pylint: 
disable=range-builtin-not-iterating
+| 'Reshuffle' >> beam.Reshuffle()
+| 'MakeKV' >> beam.Map(lambda x:
+   (b'', str(x).encode())).with_output_types(
+   typing.Tuple[bytes, bytes])
+| 'WriteToKafka' >> WriteToKafka(
+producer_config={'bootstrap.servers': self.bootstrap_servers},
+topic=self.topic,
+expansion_service=self.expansion_service))
+
+  def build_read_pipeline(self, pipeline):
+_ = (
+pipeline
+| 'ReadFromKafka' >> ReadFromKafka(
+consumer_config={
+'bootstrap.servers': self.bootstrap_servers,
+'auto.offset.reset': 'earliest'
+},
+topics=[self.topic],
+expansion_service=self.expansion_service)
+| 'Windowing' >> beam.WindowInto(
+beam.window.FixedWindows(300),
+trigger=beam.transforms.trigger.AfterProcessingTime(60),
+accumulation_mode=beam.transforms.trigger.AccumulationMode.
+DISCARDING)
+| 'DecodingValue' >> beam.Map(lambda elem: int(elem[1].decode()))
+| 'CombineGlobally' >> beam.CombineGlobally(sum).without_defaults()
+| 'SetSumCounter' >> beam.Map(self.sum_counter.inc))
+
+  def run_xlang_kafkaio(self, pipeline):
+self.build_write_pipeline(pipeline)
+self.build_read_pipeline(pipeline)
+pipeline.run(False)
+
+
+@unittest.skipUnless(
+os.environ.get('LOCAL_KAFKA_JAR'),
+"LOCAL_KAFKA_JAR environment var is not provided.")
+@unittest.skipUnless(
+os.environ.get('EXPANSION_JAR'),
+"EXPANSION_JAR environment var is not provided.")
+class CrossLanguageKafkaIOTest(unittest.TestCase):
+  def get_open_port(self):
+s = None
+try:
+  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
+except:  # pylint: disable=bare-except
+  # Above call will fail for nodes that only support IPv6.
+  s = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
+s.bind(('localhost', 0))
+s.listen(1)
+port = s.getsockname()[1]
+s.close()
+return port
+
+  @contextlib.contextmanager
+  def local_services(self, expansion_service_jar_file, local_kafka_jar_file):
+expansion_service_port = str(self.get_open_port())
+kafka_port = str(self.get_open_port())
+zookeeper_port = str(self.get_open_port())
+
+expansion_server = None
+kafka_server = None
+try:
+  expansion_server = subprocess.Popen(

Review comment:
   Is this only for internal testing?
   Externally, kafkaio.py should automatically start up an expansion service 
as long as we are in a release or a Beam repo clone.


[GitHub] [beam] robertwb commented on a change in pull request #11835: Various fixes to allow Java PAssert to run on Python

2020-05-28 Thread GitBox


robertwb commented on a change in pull request #11835:
URL: https://github.com/apache/beam/pull/11835#discussion_r432093237



##
File path: sdks/python/apache_beam/transforms/trigger_test.py
##
@@ -518,6 +519,28 @@ def format_result(k_v):
   'B-3': {10, 15, 16},
   }.items(
 
+  def test_never(self):
+    with TestPipeline() as p:
+
+      def construct_timestamped(k_t):
+        return TimestampedValue((k_t[0], k_t[1]), k_t[1])
+
+      def format_result(k_v):
+        return ('%s-%s' % (k_v[0], len(k_v[1])), set(k_v[1]))
+
+      result = (
+          p
+          | beam.Create([1, 1, 2, 3, 4, 5, 10, 11])
+          | beam.FlatMap(lambda t: [('A', t), ('B', t + 5)])
+          | beam.Map(construct_timestamped)
+          | beam.WindowInto(
+              FixedWindows(10),
+              trigger=Never(),
+              accumulation_mode=AccumulationMode.DISCARDING)
+          | beam.GroupByKey()
+          | beam.Map(format_result))
+      assert_that(result, equal_to([]))

Review comment:
   Ack. (Interestingly, I originally was expecting it to fire at EoW.) The 
ULR doesn't yet support gc timers. I can look into fixing this (or at least 
making the "Never" trigger correct). 
   
   As an aside, the non-trivial triggering and windowing in PAssert makes it 
less than ideal for validating new runners. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [beam] pabloem removed a comment on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-28 Thread GitBox


pabloem removed a comment on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-626066297











[GitHub] [beam] lostluck merged pull request #11806: [BEAM-9679] Flatten Kata for Go

2020-05-28 Thread GitBox


lostluck merged pull request #11806:
URL: https://github.com/apache/beam/pull/11806


   







[GitHub] [beam] lostluck commented on pull request #11806: [BEAM-9679] Flatten Kata for Go

2020-05-28 Thread GitBox


lostluck commented on pull request #11806:
URL: https://github.com/apache/beam/pull/11806#issuecomment-635568870


   @damondouglas  That sounds correct to me as well, in order to avoid 
colliding stepik updates.







[GitHub] [beam] pabloem removed a comment on pull request #11086: [BEAM-8910] Make custom BQ source read from Avro

2020-05-28 Thread GitBox


pabloem removed a comment on pull request #11086:
URL: https://github.com/apache/beam/pull/11086#issuecomment-634947671











[GitHub] [beam] damondouglas commented on pull request #11806: [BEAM-9679] Flatten Kata for Go

2020-05-28 Thread GitBox


damondouglas commented on pull request #11806:
URL: https://github.com/apache/beam/pull/11806#issuecomment-635563478


   @henryken Just confirming these steps:
   
   1. @lostluck merges this PR #11806 to master
   
   2. @damondouglas merges new changes from PR #11806 to PR #11803 
   
   3. @damondouglas uploads to [Stepik](https://stepik.org/course/70387)
   
   4. @lostluck merges PR #11803 to master







[GitHub] [beam] TheNeuralBit commented on pull request #11777: [BEAM-10054] Fix watermark hold for on_time_pane

2020-05-28 Thread GitBox


TheNeuralBit commented on pull request #11777:
URL: https://github.com/apache/beam/pull/11777#issuecomment-635549457


   Are those tests sufficient though if they're passing before this PR?







[GitHub] [beam] kennknowles commented on a change in pull request #11835: Various fixes to allow Java PAssert to run on Python

2020-05-28 Thread GitBox


kennknowles commented on a change in pull request #11835:
URL: https://github.com/apache/beam/pull/11835#discussion_r432064924



##
File path: sdks/python/apache_beam/transforms/trigger_test.py
##
@@ -518,6 +519,28 @@ def format_result(k_v):
   'B-3': {10, 15, 16},
   }.items(
 
+  def test_never(self):
+    with TestPipeline() as p:
+
+      def construct_timestamped(k_t):
+        return TimestampedValue((k_t[0], k_t[1]), k_t[1])
+
+      def format_result(k_v):
+        return ('%s-%s' % (k_v[0], len(k_v[1])), set(k_v[1]))
+
+      result = (
+          p
+          | beam.Create([1, 1, 2, 3, 4, 5, 10, 11])
+          | beam.FlatMap(lambda t: [('A', t), ('B', t + 5)])
+          | beam.Map(construct_timestamped)
+          | beam.WindowInto(
+              FixedWindows(10),
+              trigger=Never(),
+              accumulation_mode=AccumulationMode.DISCARDING)
+          | beam.GroupByKey()
+          | beam.Map(format_result))
+      assert_that(result, equal_to([]))

Review comment:
   The Never trigger is misnamed. It means that the trigger itself never 
fires, but output is still produced at window GC. That's why PAssert uses it 
to gather the full result.
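
   To make that concrete, here is a plain-Python sketch (no Beam involved) of 
what the pipeline in test_never would emit if every key and window produces 
exactly one pane at window GC. The helper name `group_at_window_gc` and the 
flat simulation are illustrative assumptions, not Beam API.

```python
from collections import defaultdict


def group_at_window_gc(timestamps, window_size=10):
  """Simulates one output per (key, fixed window), emitted at window GC."""
  # Same fan-out as the test: each t becomes ('A', t) and ('B', t + 5),
  # and the value doubles as the element timestamp.
  elements = []
  for t in timestamps:
    elements.append(('A', t))
    elements.append(('B', t + 5))

  # Assign each element to the fixed window that contains its timestamp.
  grouped = defaultdict(list)
  for key, value in elements:
    window_start = (value // window_size) * window_size
    grouped[(key, window_start)].append(value)

  # Same formatting as format_result in the test: 'key-count' -> set of values.
  return {
      '%s-%s' % (key, len(values)): set(values)
      for (key, _), values in grouped.items()
  }


expected = group_at_window_gc([1, 1, 2, 3, 4, 5, 10, 11])
```

   Under that assumption the result has four entries (A-6, A-2, B-5 and B-3) 
rather than the empty list the test currently asserts.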









[GitHub] [beam] piotr-szuberski commented on a change in pull request #11661: [BEAM-7774] Remove perfkit benchmarking tool from python performance …

2020-05-28 Thread GitBox


piotr-szuberski commented on a change in pull request #11661:
URL: https://github.com/apache/beam/pull/11661#discussion_r432061809



##
File path: sdks/python/apache_beam/examples/wordcount_it_test.py
##
@@ -84,11 +87,45 @@ def _run_wordcount_it(self, run_wordcount, **opts):
     # Register clean up before pipeline execution
     self.addCleanup(delete_files, [test_output + '*'])
 
+    publish_to_bq = bool(
+        test_pipeline.get_option('publish_to_big_query') or False)
+
+    # Start measure time for performance test
+    start_time = time.time()
+
     # Get pipeline options from command argument: --test-pipeline-options,
     # and start pipeline job by calling pipeline main function.
     run_wordcount(
         test_pipeline.get_full_options_as_args(**extra_opts),
-        save_main_session=False)
+        save_main_session=False,
+    )
+
+    end_time = time.time()
+    run_time = end_time - start_time
+
+    if publish_to_bq:
+      self._publish_metrics(test_pipeline, run_time)
+
+  def _publish_metrics(self, pipeline, metric_value):
+    influx_options = InfluxDBMetricsPublisherOptions(
+        pipeline.get_option('influx_measurement'),
+        pipeline.get_option('influx_db_name'),
+        pipeline.get_option('influx_hostname'),
+        os.getenv('INFLUXDB_USER'),
+        os.getenv('INFLUXDB_USER_PASSWORD'),
+    )
+    metric_reader = MetricsReader(
+        project_name=pipeline.get_option('project'),
+        bq_table=pipeline.get_option('metrics_table'),
+        bq_dataset=pipeline.get_option('metrics_dataset'),
+        publish_to_bq=True,
+        influxdb_options=influx_options,
+    )
+
+    metric_reader.publish_values((
+        metric_value,

Review comment:
   Good point. I changed it to wordcount_it_runtime and swapped the order of 
key and value.
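
   For reference, the measurement pattern boils down to something like the 
sketch below. It only shows the timing and the key-first ordering; 
`timed_metric` is a hypothetical helper, and the exact argument shape that 
`publish_values` expects is not shown in this hunk, so the tuple here is an 
assumption.

```python
import time


def timed_metric(name, fn, *args, **kwargs):
  """Runs fn and returns (name, elapsed_seconds), with the key first."""
  start_time = time.time()
  fn(*args, **kwargs)
  run_time = time.time() - start_time
  return (name, run_time)


# Stand-in workload; in the test this would be the run_wordcount call.
metric = timed_metric('wordcount_it_runtime', sum, range(1000))
```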









[GitHub] [beam] davidcavazos commented on a change in pull request #11851: [BEAM-10144] Update PipelineOptions snippets for best practices

2020-05-28 Thread GitBox


davidcavazos commented on a change in pull request #11851:
URL: https://github.com/apache/beam/pull/11851#discussion_r432049685



##
File path: sdks/python/apache_beam/examples/snippets/snippets.py
##
@@ -226,35 +227,33 @@ def _add_argparse_args(cls, parser):
 
   # [END pipeline_options_define_custom]
 
-  from apache_beam.options.pipeline_options import GoogleCloudOptions
-  from apache_beam.options.pipeline_options import StandardOptions
-
   # [START pipeline_options_dataflow_service]
-  # Create and set your PipelineOptions.
-  options = PipelineOptions(flags=argv)
+  import apache_beam as beam
+  from apache_beam.options.pipeline_options import PipelineOptions
 
+  # Create and set your PipelineOptions.
   # For Cloud execution, specify DataflowRunner and set the Cloud Platform
-  # project, job name, staging file location, temp file location, and region.
-  options.view_as(StandardOptions).runner = 'DataflowRunner'
-  google_cloud_options = options.view_as(GoogleCloudOptions)
-  google_cloud_options.project = 'my-project-id'
-  google_cloud_options.job_name = 'myjob'
-  google_cloud_options.staging_location = 'gs://my-bucket/binaries'
-  google_cloud_options.temp_location = 'gs://my-bucket/temp'
-  google_cloud_options.region = 'us-central1'
+  # project, job name, temporary files location, and region.
+  # For more information about regions, check:
+  # https://cloud.google.com/dataflow/docs/concepts/regional-endpoints
+  options = PipelineOptions(
+      flags=argv,
+      runner='DataflowRunner',
+      project='my-project-id',
+      job_name='unique-job-name',
+      temp_location='gs://my-bucket/temp',
+      region='us-central1')
 
   # Create the Pipeline with the specified options.
-  p = Pipeline(options=options)
+  # with beam.Pipeline(options=options) as pipeline:

Review comment:
   *Note:* This is commented out because if we leave it uncommented, even 
if it doesn't do anything, it makes the test fail with an error. But I still 
wanted it here for reference.
   
   ```
   subprocess.CalledProcessError: Command 
'['/Users/dcavazos/src/beam/env/bin/python', '-m', 'pip', 'download', '--dest', 
'/var/folders/z2/zp_k4l5n2cq84fsn4y633mg400dsyy/T/tmpdv09ddqk', 
'apache-beam==2.22.0.dev0', '--no-deps', '--no-binary', ':all:']' returned 
non-zero exit status 1.
   
   Pip install failed for package: apache-beam==2.22.0.dev0   
   Output from execution of subprocess: b''
   
   ERROR: Could not find a version that satisfies the requirement 
apache-beam==2.22.0.dev0 (from versions: 0.6.0, 2.0.0, 2.1.0, 2.1.1, 2.2.0, 
2.11.0, 2.12.0, 2.13.0, 2.14.0, 2.15.0, 2.16.0, 2.17.0, 2.18.0, 2.19.0, 2.20.0, 
2.21.0)
   ERROR: No matching distribution found for apache-beam==2.22.0.dev0
   ```
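
   The failure looks like a plain version-availability problem: the pinned 
dev version is not among the versions pip lists. A minimal sketch of that 
check, with the version list copied from the error above; `is_published` is a 
hypothetical helper, not part of pip or Beam.

```python
# Versions listed in the pip error output above.
PUBLISHED = [
    '0.6.0', '2.0.0', '2.1.0', '2.1.1', '2.2.0', '2.11.0', '2.12.0',
    '2.13.0', '2.14.0', '2.15.0', '2.16.0', '2.17.0', '2.18.0', '2.19.0',
    '2.20.0', '2.21.0',
]


def is_published(requirement, published_versions):
  """Returns True if a 'name==version' pin matches a listed version."""
  _, sep, version = requirement.partition('==')
  return sep == '==' and version in published_versions


dev_ok = is_published('apache-beam==2.22.0.dev0', PUBLISHED)
release_ok = is_published('apache-beam==2.21.0', PUBLISHED)
```

   Here `dev_ok` is False and `release_ok` is True, which matches the 
"No matching distribution found" error for the dev build.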









[GitHub] [beam] lukecwik commented on a change in pull request #11792: WIP: Add ValidatesRunner task for local_job_service and Java SDK harness

2020-05-28 Thread GitBox


lukecwik commented on a change in pull request #11792:
URL: https://github.com/apache/beam/pull/11792#discussion_r432050850



##
File path: runners/portability/java/build.gradle
##
@@ -31,9 +45,123 @@ dependencies {
   compile project(path: ":sdks:java:harness", configuration: "shadow")
   compile library.java.vendored_grpc_1_26_0
   compile library.java.slf4j_api
+
   testCompile project(path: ":runners:core-construction-java", configuration: 
"testRuntime")
   testCompile library.java.hamcrest_core
   testCompile library.java.junit
   testCompile library.java.mockito_core
   testCompile library.java.slf4j_jdk14
+
+  validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest")
+  validatesRunner project(path: ":runners:core-java", configuration: 
"testRuntime")
+  validatesRunner project(path: project.path, configuration: "testRuntime")
+}
+
+
+project.evaluationDependsOn(":sdks:java:core")
+project.evaluationDependsOn(":runners:core-java")
+
+ext.virtualenvDir = "${project.buildDir}/virtualenv"
+ext.localJobServicePidFile = "${project.buildDir}/local_job_service_pid"
+ext.localJobServicePortFile = project.hasProperty("localJobServicePortFile") ? 
project.property("localJobServicePortFile") : 
"${project.buildDir}/local_job_service_port"
+ext.localJobServiceStdoutFile = "${project.buildDir}/local_job_service_stdout"
+
+ext.pythonSdkDir = "${project.rootDir}/sdks/python"
+
+void execInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  exec {
+    workingDir pythonSdkDir
+    commandLine "sh", "-c", shellCommand
+  }
 }
+
+void execBackgroundInVirtualenv(String... args) {
+  String shellCommand = ". ${virtualenvDir}/bin/activate && " + args.collect { arg -> "'" + arg.replaceAll("'", "\\'") + "'" }.join(" ")
+  println "execBackgroundInVirtualEnv: ${shellCommand}"
+  ProcessBuilder pb = new ProcessBuilder().redirectErrorStream(true).directory(new File(pythonSdkDir)).command(["sh", "-c", shellCommand])
+  Process proc = pb.start();
+
+  // redirectIO does not work for connecting to groovy/gradle stdout
+  BufferedReader reader = new BufferedReader(new InputStreamReader(proc.getInputStream()));
+  String line
+  while ((line = reader.readLine()) != null) {
+    println line
+  }
+  proc.waitFor();
+}
+
+task virtualenv {

Review comment:
   I wouldn't want Beam to move to a build setup where each gradle file 
does its own thing because the fragmentation will hurt debugging build issues 
and slow down rolling out build changes that impact more than one project.
   
   One example where we decided to split a common setup was between releasing 
java projects and releasing vendored projects, which led to fixes that weren't 
done in both places, leading to bugs that lasted for months.








