[jira] [Work logged] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7424?focusedWorklogId=269660&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269660
 ]

ASF GitHub Bot logged work on BEAM-7424:


Author: ASF GitHub Bot
Created on: 29/Jun/19 01:34
Start Date: 29/Jun/19 01:34
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8933: 
[BEAM-7424] Retry HTTP 429 errors from GCS
URL: https://github.com/apache/beam/pull/8933
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269660)
Time Spent: 3.5h  (was: 3h 20m)

> Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
> ---
>
> Key: BEAM-7424
> URL: https://issues.apache.org/jira/browse/BEAM-7424
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp, sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Heejong Lee
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This has to be done for both Java and Python SDKs.
> It seems the Java SDK already retries 429 errors, but without backoff (please verify): 
> [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185]
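The behavior this issue requests, retrying HTTP 429 responses with exponential backoff, can be sketched as follows. This is an illustrative Python sketch only, not Beam's actual implementation; `make_request`, the status-carrying exception, and the parameter values are all hypothetical.

```python
import random
import time


class TransientHttpError(Exception):
    """Raised by the (hypothetical) request callable with the HTTP status."""

    def __init__(self, status):
        super().__init__("HTTP %d" % status)
        self.status = status


def call_with_backoff(make_request, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Call make_request(), retrying HTTP 429 with exponential backoff and jitter."""
    for attempt in range(max_retries + 1):
        try:
            return make_request()
        except TransientHttpError as e:
            # Only 429 is retried here; other statuses (and the final attempt)
            # propagate to the caller.
            if e.status != 429 or attempt == max_retries:
                raise
            # Exponential backoff: base * 2^attempt, capped, with random jitter.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay * random.uniform(0.5, 1.0))
```

The jitter term spreads retries from many concurrent workers so they do not hammer GCS in lockstep.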



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269655&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269655
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 29/Jun/19 01:16
Start Date: 29/Jun/19 01:16
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8959: [BEAM-7548] 
Cherry pick - fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8959#issuecomment-506915253
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269655)
Time Spent: 8h 20m  (was: 8h 10m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in the Python 3.5 suite, which currently uses 
> the Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = 

[jira] [Updated] (BEAM-7657) sdk worker parallelism comments are misleading

2019-06-28 Thread Kyle Weaver (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-7657:
--
Description: 
The SDK worker parallelism arg is set two places, in pipeline options [1] [2] 
and the job server driver [3].

 
{noformat}
if pipeline.sdk_worker_parallelism > 0:
    pipeline.sdk_worker_parallelism is used.
elif pipeline.sdk_worker_parallelism == 0:
    if jobServerDriver.sdkWorkerParallelism > 0:
        jobServerDriver.sdkWorkerParallelism is used.
    elif jobServerDriver.sdkWorkerParallelism == 0:
the runner chooses parallelism based on cores available.
{noformat}
Somewhat confusingly, the default is 0 for Python pipelines but 1 for Java 
pipelines. In any case, jobServerDriver.sdkWorkerParallelism defaults to 1, so 
the comment "If 0, it will be automatically set by looking at different 
parameters." is misleading; it is actually only true if 
jobServerDriver.sdkWorkerParallelism was also explicitly set to 0.

[1] 
[https://github.com/apache/beam/blob/8b379b475a3c838eb12e9b7809ebd8f386095962/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69-L74]

[2] 
[https://github.com/apache/beam/blob/37b76b67b5d0cbd92e6a3fadee67f9fcf93cbc5d/sdks/python/apache_beam/options/pipeline_options.py#L805-L810]

[3] 
[https://github.com/apache/beam/blob/f3623e8ba2257f7659ccb312dc2574f862ef41b5/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java#L97-L103]

  was:
The SDK worker parallelism arg is set two places, in pipeline options [1] [2] 
and the job server driver [3].

 
{noformat}
if pipeline.sdk_worker_parallelism > 0:
    pipeline.sdk_worker_parallelism is used.
elif pipeline.sdk_worker_parallelism == 0:
    if jobServerDriver.sdkWorkerParallelism > 0:
        jobServerDriver.sdkWorkerParallelism is used.
    else:
the runner chooses parallelism based on cores available.
{noformat}
Somewhat confusingly, the default is 0 for Python pipelines but 1 for Java 
pipelines. In any case, jobServerDriver.sdkWorkerParallelism defaults to 1, so 
the comment "If 0, it will be automatically set by looking at different 
parameters." is misleading; it is actually only true if 
jobServerDriver.sdkWorkerParallelism was also explicitly set to 0.

[1] 
[https://github.com/apache/beam/blob/8b379b475a3c838eb12e9b7809ebd8f386095962/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69-L74]

[2] 
[https://github.com/apache/beam/blob/37b76b67b5d0cbd92e6a3fadee67f9fcf93cbc5d/sdks/python/apache_beam/options/pipeline_options.py#L805-L810]

[3] 
[https://github.com/apache/beam/blob/f3623e8ba2257f7659ccb312dc2574f862ef41b5/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java#L97-L103]


> sdk worker parallelism comments are misleading
> --
>
> Key: BEAM-7657
> URL: https://issues.apache.org/jira/browse/BEAM-7657
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>
> The SDK worker parallelism arg is set two places, in pipeline options [1] [2] 
> and the job server driver [3].
>  
> {noformat}
> if pipeline.sdk_worker_parallelism > 0:
>     pipeline.sdk_worker_parallelism is used.
> elif pipeline.sdk_worker_parallelism == 0:
>     if jobServerDriver.sdkWorkerParallelism > 0:
>         jobServerDriver.sdkWorkerParallelism is used.
>     elif jobServerDriver.sdkWorkerParallelism == 0:
> the runner chooses parallelism based on cores available.
> {noformat}
> Somewhat confusingly, the default is 0 for Python pipelines but 1 for Java 
> pipelines. In any case, jobServerDriver.sdkWorkerParallelism defaults to 1, so 
> the comment "If 0, it will be automatically set by looking at different 
> parameters." is misleading; it is actually only true if 
> jobServerDriver.sdkWorkerParallelism was also explicitly set to 0.
> [1] 
> [https://github.com/apache/beam/blob/8b379b475a3c838eb12e9b7809ebd8f386095962/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69-L74]
> [2] 
> [https://github.com/apache/beam/blob/37b76b67b5d0cbd92e6a3fadee67f9fcf93cbc5d/sdks/python/apache_beam/options/pipeline_options.py#L805-L810]
> [3] 
> [https://github.com/apache/beam/blob/f3623e8ba2257f7659ccb312dc2574f862ef41b5/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java#L97-L103]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7657) sdk worker parallelism comments are misleading

2019-06-28 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-7657:
-

 Summary: sdk worker parallelism comments are misleading
 Key: BEAM-7657
 URL: https://issues.apache.org/jira/browse/BEAM-7657
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Kyle Weaver
Assignee: Kyle Weaver


The SDK worker parallelism arg is set two places, in pipeline options [1] [2] 
and the job server driver [3].

 
{noformat}
if pipeline.sdk_worker_parallelism > 0:
    pipeline.sdk_worker_parallelism is used.
elif pipeline.sdk_worker_parallelism == 0:
    if jobServerDriver.sdkWorkerParallelism > 0:
        jobServerDriver.sdkWorkerParallelism is used.
    else:
the runner chooses parallelism based on cores available.
{noformat}
Somewhat confusingly, the default is 0 for Python pipelines but 1 for Java 
pipelines. In any case, jobServerDriver.sdkWorkerParallelism defaults to 1, so 
the comment "If 0, it will be automatically set by looking at different 
parameters." is misleading; it is actually only true if 
jobServerDriver.sdkWorkerParallelism was also explicitly set to 0.

[1] 
[https://github.com/apache/beam/blob/8b379b475a3c838eb12e9b7809ebd8f386095962/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69-L74]

[2] 
[https://github.com/apache/beam/blob/37b76b67b5d0cbd92e6a3fadee67f9fcf93cbc5d/sdks/python/apache_beam/options/pipeline_options.py#L805-L810]

[3] 
[https://github.com/apache/beam/blob/f3623e8ba2257f7659ccb312dc2574f862ef41b5/runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/JobServerDriver.java#L97-L103]
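The precedence in the pseudocode above can be expressed as a small resolution function. This is an illustrative sketch with made-up names, not actual Beam code:

```python
def resolve_sdk_worker_parallelism(pipeline_value, job_server_value, cores_available):
    """Resolve the effective SDK worker parallelism.

    Precedence: pipeline option, then job server option, then auto-detect.
    A value of 0 at either level means "defer to the next level".
    """
    if pipeline_value > 0:
        return pipeline_value
    if job_server_value > 0:
        return job_server_value
    # Both are 0: the runner chooses based on available cores.
    return cores_available
```

With the job server default of 1, the auto-detect branch is reached only when both values are explicitly 0, which is exactly why the "If 0, it will be automatically set" comment is misleading.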



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7656) Add sdk-worker-parallelism arg to flink job server shadow jar

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7656?focusedWorklogId=269625&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269625
 ]

ASF GitHub Bot logged work on BEAM-7656:


Author: ASF GitHub Bot
Created on: 28/Jun/19 23:32
Start Date: 28/Jun/19 23:32
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #8967: [BEAM-7656] Add 
sdk-worker-parallelism arg to flink job server shadow…
URL: https://github.com/apache/beam/pull/8967
 
 
   … jar
   
   R: @angoenka 
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
[jira] [Created] (BEAM-7656) Add sdk-worker-parallelism arg to flink job server shadow jar

2019-06-28 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-7656:
-

 Summary: Add sdk-worker-parallelism arg to flink job server shadow 
jar
 Key: BEAM-7656
 URL: https://issues.apache.org/jira/browse/BEAM-7656
 Project: Beam
  Issue Type: Improvement
  Components: runner-flink
Reporter: Kyle Weaver
Assignee: Kyle Weaver


It's unfortunate we have to manually add these args, but /shrug



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6692) Spark Translator - RESHUFFLE_URN

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6692?focusedWorklogId=269624&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269624
 ]

ASF GitHub Bot logged work on BEAM-6692:


Author: ASF GitHub Bot
Created on: 28/Jun/19 23:25
Start Date: 28/Jun/19 23:25
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8966: [BEAM-6692] 
portable Spark: reshuffle translation
URL: https://github.com/apache/beam/pull/8966#discussion_r298773945
 
 

 ##
 File path: 
runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkBatchPortablePipelineTranslator.java
 ##
 @@ -418,4 +442,14 @@ private static String getOutputId(PTransformNode 
transformNode) {
   private static String getExecutableStageIntermediateId(PTransformNode 
transformNode) {
 return transformNode.getId();
   }
+
+  /** Predicate to determine whether a URN is a Spark native transform. */
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsSparkNativeTransform implements 
NativeTransforms.IsNativeTransform {
 
 Review comment:
   Interesting, still curious about it, any ideas @angoenka or @mxm ?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269624)
Time Spent: 50m  (was: 40m)

> Spark Translator - RESHUFFLE_URN
> 
>
> Key: BEAM-6692
> URL: https://issues.apache.org/jira/browse/BEAM-6692
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Ankur Goenka
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6692) Spark Translator - RESHUFFLE_URN

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6692?focusedWorklogId=269623&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269623
 ]

ASF GitHub Bot logged work on BEAM-6692:


Author: ASF GitHub Bot
Created on: 28/Jun/19 23:24
Start Date: 28/Jun/19 23:24
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #8966: [BEAM-6692] 
portable Spark: reshuffle translation
URL: https://github.com/apache/beam/pull/8966#discussion_r298773844
 
 

 ##
 File path: 
runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkBatchPortablePipelineTranslator.java
 ##
 @@ -418,4 +442,14 @@ private static String getOutputId(PTransformNode 
transformNode) {
   private static String getExecutableStageIntermediateId(PTransformNode 
transformNode) {
 return transformNode.getId();
   }
+
+  /** Predicate to determine whether a URN is a Spark native transform. */
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsSparkNativeTransform implements 
NativeTransforms.IsNativeTransform {
 
 Review comment:
   I didn't dig into why, but the portable pipeline fuser complains unless we 
mark Reshuffle as a native (or "primitive") transform. 
https://github.com/apache/beam/blob/c565881b3041730f4e1206ed8404e4b0317e5037/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/QueryablePipeline.java#L231-L235
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269623)
Time Spent: 40m  (was: 0.5h)

> Spark Translator - RESHUFFLE_URN
> 
>
> Key: BEAM-6692
> URL: https://issues.apache.org/jira/browse/BEAM-6692
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Ankur Goenka
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7655) Multiple Instances of Beam Table During Query Planning

2019-06-28 Thread Alireza Samadianzakaria (JIRA)
Alireza Samadianzakaria created BEAM-7655:
-

 Summary: Multiple Instances of Beam Table During Query Planning
 Key: BEAM-7655
 URL: https://issues.apache.org/jira/browse/BEAM-7655
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Alireza Samadianzakaria


When Calcite plans a query, it may ask the table provider for the same table 
multiple times, and different alternative query plans may contain different 
instances of the same table.

Since the row-count estimate is stored in the table instance, each time the 
estimate is requested for a new instance, the table computes a new one. 
Estimation may take some time, so this can degrade planning performance and 
increase planning time.

There are two potential ways to solve this problem:

1- Make sure that the table providers do not create multiple instances for the 
same table. 

or

2- Keep the row count estimations in a common data structure or a static Map 
and reuse it in multiple instances when needed.
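Option 2 amounts to memoizing the estimate keyed by a stable table identity, so repeated instances of the same table share one computation. A minimal Python sketch of the idea (names are hypothetical; Beam's actual fix would live in the Java SQL table providers):

```python
# Process-wide cache of row-count estimates, keyed by a stable table identifier.
_row_count_cache = {}


def estimated_row_count(table_id, compute_estimate):
    """Return the cached row-count estimate for table_id.

    compute_estimate is the (possibly expensive) estimation callable; it runs
    only on the first call for a given table_id, even if the planner creates
    several instances of the same table.
    """
    if table_id not in _row_count_cache:
        _row_count_cache[table_id] = compute_estimate()
    return _row_count_cache[table_id]
```

The key point is that the cache key is the table's identity, not the table instance, so the cost is paid once per table per planning session regardless of how many instances Calcite creates.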



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6692) Spark Translator - RESHUFFLE_URN

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6692?focusedWorklogId=269621&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269621
 ]

ASF GitHub Bot logged work on BEAM-6692:


Author: ASF GitHub Bot
Created on: 28/Jun/19 23:18
Start Date: 28/Jun/19 23:18
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8966: [BEAM-6692] 
portable Spark: reshuffle translation
URL: https://github.com/apache/beam/pull/8966#discussion_r298773004
 
 

 ##
 File path: 
runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkBatchPortablePipelineTranslator.java
 ##
 @@ -418,4 +442,14 @@ private static String getOutputId(PTransformNode 
transformNode) {
   private static String getExecutableStageIntermediateId(PTransformNode 
transformNode) {
 return transformNode.getId();
   }
+
+  /** Predicate to determine whether a URN is a Spark native transform. */
+  @AutoService(NativeTransforms.IsNativeTransform.class)
+  public static class IsSparkNativeTransform implements 
NativeTransforms.IsNativeTransform {
 
 Review comment:
   This isn't used, is it? I saw Flink does this too. But I don't get what is 
the issue with a service for this?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269621)
Time Spent: 0.5h  (was: 20m)

> Spark Translator - RESHUFFLE_URN
> 
>
> Key: BEAM-6692
> URL: https://issues.apache.org/jira/browse/BEAM-6692
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Ankur Goenka
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6692) Spark Translator - RESHUFFLE_URN

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6692?focusedWorklogId=269615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269615
 ]

ASF GitHub Bot logged work on BEAM-6692:


Author: ASF GitHub Bot
Created on: 28/Jun/19 23:10
Start Date: 28/Jun/19 23:10
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8966: [BEAM-6692] portable 
Spark: reshuffle translation
URL: https://github.com/apache/beam/pull/8966#issuecomment-506901982
 
 
   Run Java Spark PortableValidatesRunner Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269615)
Time Spent: 20m  (was: 10m)

> Spark Translator - RESHUFFLE_URN
> 
>
> Key: BEAM-6692
> URL: https://issues.apache.org/jira/browse/BEAM-6692
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Ankur Goenka
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread JIRA


[ 
https://issues.apache.org/jira/browse/BEAM-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875288#comment-16875288
 ] 

Ismaël Mejía commented on BEAM-7589:


The PR was merged. [~Brachi], can you try with tomorrow's SNAPSHOT builds and 
confirm whether it fixes your issue? Thanks.

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the setUp function), and multiplied by 
> the number of workers this gives a huge number of producers; I believe this 
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then the 
> error disappears.
> Can you remove the producer creation from the setUp method and move it to a 
> static field in the class, created once when the class is initialized?
> See the similar issue with JdbcIO: the connection pool was created per setUp 
> method, and we moved it to a static member so there is one pool per JVM. Ask 
> [~iemejia] for more detail.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also the "enableStreamingEngine" 
> flag).
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can multiply by 3-4 times.
> When I start the Dataflow job I have latency, so I produce more data, and I 
> fail immediately.
> Also, I have consumers, 3rd party tool, I know that they call describe stream 
> each 30 seconds.
> My pipeline runs on GCP and reads from Pub/Sub at around 20,000 records per 
> second (in regular operation; under latency even 100,000 records per 
> second). It does many aggregations and counts based on some dimensions 
> (using Beam SQL) over a 1-minute sliding window, and writes the aggregation 
> results to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gave you some more context on my problem.
> Another suggestion: can you try fixing the issue as I suggested and provide 
> a specific version for testing, without merging it to master? (I would do it 
> myself, but I had trouble building the huge Apache Beam repository locally.)
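The fix the commenter proposes, one producer per JVM instead of one per bundle, is the standard lazily-initialized, process-wide singleton. A Python sketch of the pattern (the real change belongs in Beam's Java KinesisIO; the holder class and `factory` callable here are purely illustrative):

```python
import threading


class SharedProducerHolder:
    """Hold one expensive client per process instead of one per bundle."""

    _lock = threading.Lock()
    _producer = None

    @classmethod
    def get(cls, factory):
        # Double-checked locking: skip the lock on the fast path once the
        # producer exists; the lock prevents two threads racing to create it.
        if cls._producer is None:
            with cls._lock:
                if cls._producer is None:
                    cls._producer = factory()
        return cls._producer
```

Every bundle then calls `SharedProducerHolder.get(...)` rather than constructing its own client, so the per-producer "update shard map" calls happen once per process, not once per bundle.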



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269613
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 23:06
Start Date: 28/Jun/19 23:06
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8955: [BEAM-7589] 
Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269613)
Time Spent: 5h 20m  (was: 5h 10m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the setUp function), and multiplied by 
> the number of workers this gives a huge number of producers; I believe this 
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it 
> to a static field in the class, created once when the class is initialized?
> See the similar issue with JdbcIO: the connection pool was created per setUp 
> method, and we moved it to a static member so there is one pool per JVM. Ask 
> [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing the full stack trace, but I can also see this in the log:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the flag 
> "enableStreamingEngine").
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can multiply by 3-4 times.
> When I'm starting the DataFlow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers (a 3rd-party tool) that I know call DescribeStream 
> every 30 seconds.
> My job pipeline runs on GCP, reading data from Pub/Sub at around 
> 20,000 records per second (in regular time; during latency spikes even 
> 100,000 records per second). It does many aggregations and counts based on 
> some dimensions (using Beam SQL) over a 1-minute sliding window, and writes 
> the aggregation results to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gives you some more context on my problem.
> Another suggestion: can you try to fix the issue as I suggest and provide me 
> a specific version for testing, without merging it to master? (I would do it 
> myself, but I had trouble building the huge Apache Beam repository locally.)
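The fix described in the comment above — one producer instance per JVM instead of one per bundle's setUp — is the lazily-initialized static-holder pattern. A minimal Python sketch of the idea (hedged: `FakeProducer` and `ProducerHolder` are hypothetical stand-ins for illustration, not Beam or KPL classes):

```python
import threading


class FakeProducer:
    """Hypothetical stand-in for an expensive client such as a
    KinesisProducer; counts how many instances get created."""
    instances_created = 0

    def __init__(self, config):
        FakeProducer.instances_created += 1
        self.config = config


class ProducerHolder:
    """Process-wide holder: every bundle that asks for a producer gets
    the same shared instance instead of constructing its own in setUp."""
    _lock = threading.Lock()
    _producer = None

    @classmethod
    def get(cls, config):
        with cls._lock:  # guard lazy initialization against concurrent bundles
            if cls._producer is None:
                cls._producer = FakeProducer(config)
        return cls._producer
```

With this shape, however many bundles call `ProducerHolder.get(...)`, only one producer exists per process, so the per-process shard-map/DescribeStream traffic stays constant as workers scale.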





[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269612=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269612
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 23:06
Start Date: 28/Jun/19 23:06
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8955: [BEAM-7589] Use only 
one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#issuecomment-506901442
 
 
   Merged manually to fix a typo in the commit message and squash the extra 
review commit.
 



Issue Time Tracking
---

Worklog Id: (was: 269612)
Time Spent: 5h 10m  (was: 5h)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the setUp function), and multiplied by 
> the number of workers this gives a huge number of producers; I believe this 
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it 
> to a static field in the class, created once when the class is initialized?
> See the similar issue with JdbcIO: the connection pool was created per setUp 
> method, and we moved it to a static member so there is one pool per JVM. Ask 
> [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing the full stack trace, but I can also see this in the log:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the flag 
> "enableStreamingEngine").
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can multiply by 3-4 times.
> When I'm starting the DataFlow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers (a 3rd-party tool) that I know call DescribeStream 
> every 30 seconds.
> My job pipeline runs on GCP, reading data from Pub/Sub at around 
> 20,000 records per second (in regular time; during latency spikes even 
> 100,000 records per second). It does many aggregations and counts based on 
> some dimensions (using Beam SQL) over a 1-minute sliding window, and writes 
> the aggregation results to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gives you some more context on my problem.
> Another suggestion: can you try to fix the issue as I suggest and provide me 
> a specific version for testing, without merging it to master? (I would do it 
> myself, but I had trouble building the huge Apache Beam repository locally.)





[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269605=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269605
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 23:02
Start Date: 28/Jun/19 23:02
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8960: [BEAM-7548] Fix 
flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#issuecomment-506900696
 
 
   Thank you @chamikaramj .
 



Issue Time Tracking
---

Worklog Id: (was: 269605)
Time Spent: 8h  (was: 7h 50m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = 

[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269606=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269606
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 23:02
Start Date: 28/Jun/19 23:02
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8959: [BEAM-7548] 
Cherry pick - fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8959#issuecomment-506900779
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 269606)
Time Spent: 8h 10m  (was: 8h)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = 

[jira] [Work logged] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7424?focusedWorklogId=269600=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269600
 ]

ASF GitHub Bot logged work on BEAM-7424:


Author: ASF GitHub Bot
Created on: 28/Jun/19 22:45
Start Date: 28/Jun/19 22:45
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8933: [BEAM-7424] Retry 
HTTP 429 errors from GCS
URL: https://github.com/apache/beam/pull/8933#issuecomment-506898040
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 269600)
Time Spent: 3h 20m  (was: 3h 10m)

> Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
> ---
>
> Key: BEAM-7424
> URL: https://issues.apache.org/jira/browse/BEAM-7424
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp, sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Heejong Lee
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This has to be done for both Java and Python SDKs.
> Seems like Java SDK already retries 429 errors w/o backoff (please verify): 
> [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185]
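The behavior the issue asks for — retrying HTTP 429 ("Too Many Requests") responses with exponential backoff — can be sketched roughly like this (hedged: `HttpError` is a stand-in exception for illustration and the delay constants are illustrative, not the SDK's actual values):

```python
import random
import time


class HttpError(Exception):
    """Stand-in for an HTTP client error carrying a status code."""
    def __init__(self, status_code):
        super().__init__("HTTP %d" % status_code)
        self.status_code = status_code


RETRIABLE_STATUSES = {429}  # rate-limiting responses worth retrying


def call_with_backoff(request, max_retries=5, base_delay=1.0, max_delay=64.0):
    """Invoke request(); on a retriable status, sleep with exponential
    backoff plus jitter and try again, up to max_retries retries."""
    for attempt in range(max_retries + 1):
        try:
            return request()
        except HttpError as e:
            if e.status_code not in RETRIABLE_STATUSES or attempt == max_retries:
                raise  # non-retriable error, or retry budget exhausted
            # Full jitter: sleep uniformly in [0, base * 2^attempt], capped.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))
```

The jitter matters here: without it, many workers throttled at the same moment would all retry at the same moment and hit the rate limit again in lockstep.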





[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269597=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269597
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 22:41
Start Date: 28/Jun/19 22:41
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8960: 
[BEAM-7548] Fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 269597)
Time Spent: 7h 50m  (was: 7h 40m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = self.worker.do_instruction(request)
> 11:57:47   File 

[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=269577=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269577
 ]

ASF GitHub Bot logged work on BEAM-7428:


Author: ASF GitHub Bot
Created on: 28/Jun/19 22:16
Start Date: 28/Jun/19 22:16
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8741: [BEAM-7428] Output 
the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-506892828
 
 
   Sorry for taking so long but I think in this case we should always output 
`max(element timestamp, reader timestamp)` and not update the BoundedSource API 
to return `unknown`.
   
   This way ReadAllViaFileBasedSource will output the current reader timestamp 
in the bounded case since the input will be `-INF` and in the streaming case 
the user will have to make sure that the input timestamp is always <= the 
timestamp of all the records stored in the BoundedSource.
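The rule proposed above — emit each record at `max(element timestamp, reader timestamp)` — is small enough to state directly (a sketch only; timestamps are plain numbers here, `output_timestamp` is an illustrative name, and `MIN_TS` stands in for the `-INF` timestamp of a bounded input):

```python
MIN_TS = float("-inf")  # stands in for the -INF timestamp of a bounded input


def output_timestamp(element_ts, reader_ts):
    """Timestamp to assign to a record read by ReadAllViaFileBasedSource
    under the proposed rule: never earlier than either input."""
    return max(element_ts, reader_ts)
```

In the bounded case the element timestamp is `-INF`, so the reader's timestamp always wins; in the streaming case, a user who keeps the input element timestamp <= all record timestamps likewise gets the record timestamps through unchanged.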
 



Issue Time Tracking
---

Worklog Id: (was: 269577)
Time Spent: 6h  (was: 5h 50m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -
>
> Key: BEAM-7428
> URL: https://issues.apache.org/jira/browse/BEAM-7428
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse that tackles a 
> similar problem but does output the timestamps correctly.





[jira] [Work logged] (BEAM-6692) Spark Translator - RESHUFFLE_URN

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6692?focusedWorklogId=269576=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269576
 ]

ASF GitHub Bot logged work on BEAM-6692:


Author: ASF GitHub Bot
Created on: 28/Jun/19 22:15
Start Date: 28/Jun/19 22:15
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #8966: [BEAM-6692] 
portable Spark: reshuffle translation
URL: https://github.com/apache/beam/pull/8966
 
 
   R: @iemejia 
   
   
   
[jira] [Work logged] (BEAM-7428) ReadAllViaFileBasedSource does not output the timestamps of the read elements

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7428?focusedWorklogId=269574=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269574
 ]

ASF GitHub Bot logged work on BEAM-7428:


Author: ASF GitHub Bot
Created on: 28/Jun/19 22:13
Start Date: 28/Jun/19 22:13
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8741: [BEAM-7428] Output 
the timestamp on elements in ReadAllViaFileBasedSource
URL: https://github.com/apache/beam/pull/8741#issuecomment-506892324
 
 
   The trouble I have is that advancing the output watermark for the specific 
element+restriction doesn't really provide much since the runner needs to be 
able to maintain that the `output watermark <= input watermark` since it is the 
only thing that knows what future element+restriction may be produced. In the 
case when the input to a SplittableDoFn is bounded, once all the 
elements+restrictions are being processed, the runner can then use the min 
watermark of element+restrictions as the output watermark.
 



Issue Time Tracking
---

Worklog Id: (was: 269574)
Time Spent: 5h 50m  (was: 5h 40m)

> ReadAllViaFileBasedSource does not output the timestamps of the read elements
> -
>
> Key: BEAM-7428
> URL: https://issues.apache.org/jira/browse/BEAM-7428
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> This differs from the implementation of JavaReadViaImpulse that tackles a 
> similar problem but does output the timestamps correctly.
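The fix the issue asks for amounts to emitting each read record together with its timestamp rather than at the default timestamp. A minimal pure-Python sketch of that idea, using a hypothetical `TimestampedValue` stand-in instead of Beam's own classes:

```python
from collections import namedtuple

# Stand-in for a timestamped element; illustrative only, not Beam's type.
TimestampedValue = namedtuple('TimestampedValue', ['value', 'timestamp'])

def read_with_timestamps(records):
    """Yield each (value, timestamp) pair as a timestamped element.

    A naive read would drop the per-record timestamp and emit every value
    at a single default timestamp, which is the bug described above.
    """
    for value, ts in records:
        yield TimestampedValue(value, ts)
```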



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269566&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269566
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 21:52
Start Date: 28/Jun/19 21:52
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8960: [BEAM-7548] Fix 
flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#issuecomment-506887940
 
 
   Thanks. Will merge after tests pass.
 



Issue Time Tracking
---

Worklog Id: (was: 269566)
Time Spent: 7h 40m  (was: 7.5h)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = 
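ApproximateUnique estimates distinct counts from a bounded sample of hashes, so its error is probabilistic — which is why an assertion on the observed error can occasionally fail and make this test flaky. A self-contained sketch of the sampling idea (a simple k-minimum-values estimator, not the SDK's actual implementation):

```python
import hashlib

def approximate_unique(values, sample_size=64):
    """Estimate the number of distinct values from the smallest hashes.

    Keeps the sample_size smallest 64-bit hashes. If fewer distinct hashes
    exist, the count is exact; otherwise the k-th smallest hash is used to
    extrapolate: estimate ~= sample_size * 2**64 / kth_hash.
    """
    space = 2 ** 64
    hashes = sorted({
        int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], 'big')
        for v in values
    })
    if len(hashes) <= sample_size:
        return len(hashes)          # small inputs are counted exactly
    kth = hashes[sample_size - 1]   # largest hash kept in the sample
    return int(sample_size * space / kth)
```

The relative error of such estimators scales roughly with 1/sqrt(sample_size), so any test that asserts the observed error stays under a fixed bound will fail on some random inputs unless the bound is generous.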

[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269565&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269565
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 21:52
Start Date: 28/Jun/19 21:52
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8960: [BEAM-7548] Fix 
flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#issuecomment-506887851
 
 
   Run RAT PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 269565)
Time Spent: 7.5h  (was: 7h 20m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7.5h
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = 

[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269564&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269564
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 21:51
Start Date: 28/Jun/19 21:51
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8960: [BEAM-7548] Fix 
flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#issuecomment-506887815
 
 
   Run Portable_Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 269564)
Time Spent: 7h 20m  (was: 7h 10m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = 

[jira] [Resolved] (BEAM-7326) Document that Beam BigQuery IO expects users to pass base64-encoded bytes, and BQ IO serves base64-encoded bytes to the user.

2019-06-28 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev resolved BEAM-7326.
---
   Resolution: Fixed
 Assignee: Juta Staes
Fix Version/s: Not applicable

> Document that Beam BigQuery IO expects users to pass base64-encoded bytes, 
> and BQ IO serves base64-encoded bytes to the user.
> -
>
> Key: BEAM-7326
> URL: https://issues.apache.org/jira/browse/BEAM-7326
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp
>Reporter: Valentyn Tymofieiev
>Assignee: Juta Staes
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> BYTES is one of the Datatypes supported by Google Cloud BigQuery, and Apache 
> Beam BigQuery IO connector.
> Current implementation of BigQuery connector in Java and Python SDKs expects 
> that users base64-encode bytes before passing them to BigQuery IO, see 
> discussion on dev: [1] 
> This needs to be reflected in public documentation, see [2-4]
> cc: [~juta] [~chamikara] [~pabloem] 
> cc: [~lostluck] [~kedin] FYI and to advise whether similar action needs to be 
> done for Go SDK and/or Beam SQL.
> [1] 
> https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/
> [3] 
> https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html
> [4] 
> https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html
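The contract to be documented — users base64-encode BYTES values before passing them to BigQuery IO, and the connector serves base64-encoded bytes back — needs only the standard library. A small sketch; the helper names are made up for illustration:

```python
import base64

def encode_bytes_field(raw):
    """Encode a BYTES value as the base64 string BigQuery IO expects."""
    return base64.b64encode(raw).decode('ascii')

def decode_bytes_field(encoded):
    """Decode a base64 BYTES value as served back by BigQuery IO."""
    return base64.b64decode(encoded)
```

For example, a row dict destined for a BYTES column would carry `encode_bytes_field(b'...')` rather than the raw bytes, and a value read back would be passed through `decode_bytes_field` before use.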



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269557&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269557
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 21:41
Start Date: 28/Jun/19 21:41
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8960: [BEAM-7548] Fix 
flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#issuecomment-506885590
 
 
   resolved.
 



Issue Time Tracking
---

Worklog Id: (was: 269557)
Time Spent: 7h 10m  (was: 7h)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=269555&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269555
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 28/Jun/19 21:36
Start Date: 28/Jun/19 21:36
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8902: 
[BEAM-7389] Add Python snippet for Map transform
URL: https://github.com/apache/beam/pull/8902
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 269555)
Time Spent: 18h 50m  (was: 18h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7535) Create Jenkins jobs for BQ performance tests

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7535?focusedWorklogId=269552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269552
 ]

ASF GitHub Bot logged work on BEAM-7535:


Author: ASF GitHub Bot
Created on: 28/Jun/19 21:35
Start Date: 28/Jun/19 21:35
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #8874: [BEAM-7535] 
Created Jenkins jobs for BQ performance tests
URL: https://github.com/apache/beam/pull/8874#discussion_r298755029
 
 

 ##
 File path: .test-infra/jenkins/job_PerformanceTests_BigQueryIO_Python.groovy
 ##
 @@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import CommonJobProperties as commonJobProperties
+import LoadTestsBuilder as loadTestsBuilder
+import PhraseTriggeringPostCommitBuilder
+
+def now = new Date().format("MMddHHmmss", TimeZone.getTimeZone('UTC'))
+
+def bqio_read_test = [
+title: 'BigQueryIO Read Performance Test Python 10 GB',
+itClass  : 
'apache_beam.io.gcp.bigquery_read_perf_test:BigQueryReadPerfTest.test',
+runner   : CommonTestProperties.Runner.DATAFLOW,
+jobProperties: [
+job_name : 
'performance-tests-bqio-read-python-10gb' + now,
+project  : 'apache-beam-testing',
+temp_location: 
'gs://temp-storage-for-perf-tests/loadtests',
+input_dataset: 'beam_performance',
+input_table  : 'bqio_read_10GB',
+publish_to_big_query : true,
+metrics_dataset  : 'beam_performance',
+metrics_table: 'bqio_read_10GB_results',
+input_options: '\'{"num_records": 10485760,' +
+'"key_size": 1,' +
+'"value_size": 1024}\'',
+autoscaling_algorithm: 'NONE',  // Disable autoscaling of the 
worker pool.
+]
+]
+
+def bqio_write_test = [
+title: 'BigQueryIO Write Performance Test Python Batch 10 GB',
+itClass  : 
'apache_beam.io.gcp.bigquery_write_perf_test:BigQueryWritePerfTest.test',
+runner   : CommonTestProperties.Runner.DATAFLOW,
+jobProperties: [
+job_name : 
'performance-tests-bqio-write-python-batch-10gb' + now,
+project  : 'apache-beam-testing',
+temp_location: 
'gs://temp-storage-for-perf-tests/loadtests',
+output_dataset   : 'beam_performance',
+output_table : 'bqio_write_10GB',
+publish_to_big_query : true,
+metrics_dataset  : 'beam_performance',
+metrics_table: 'bqio_write_10GB_results',
+input_options: '\'{"num_records": 10485760,' +
 
 Review comment:
   Consider starting a new line after `{` for easier readability here and below.
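One way to act on this suggestion is to build the option value from a map and serialize it, instead of hand-concatenating JSON. A sketch in Python (in the actual Groovy job file, `groovy.json.JsonOutput` would play the same role); the key names are copied from the job definition above:

```python
import json

# Build the pipeline's input_options value from a dict rather than by
# string concatenation, so each field sits on its own line.
input_options = json.dumps({
    'num_records': 10485760,
    'key_size': 1,
    'value_size': 1024,
})
```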
 



Issue Time Tracking
---

Worklog Id: (was: 269552)
Time Spent: 6h 50m  (was: 6h 40m)

> Create Jenkins jobs for BQ performance tests
> 
>
> Key: BEAM-7535
> URL: https://issues.apache.org/jira/browse/BEAM-7535
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=269554&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269554
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 28/Jun/19 21:35
Start Date: 28/Jun/19 21:35
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8902: [BEAM-7389] Add 
Python snippet for Map transform
URL: https://github.com/apache/beam/pull/8902#issuecomment-506884353
 
 
   LGTM. Thanks.
 



Issue Time Tracking
---

Worklog Id: (was: 269554)
Time Spent: 18h 40m  (was: 18.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 18h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269546&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269546
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 21:28
Start Date: 28/Jun/19 21:28
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8960: [BEAM-7548] Fix 
flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#issuecomment-506882670
 
 
   Seems like there's a conflict now. Can you please resolve it ?
 



Issue Time Tracking
---

Worklog Id: (was: 269546)
Time Spent: 7h  (was: 6h 50m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
[jira] [Work logged] (BEAM-7326) Document that Beam BigQuery IO expects users to pass base64-encoded bytes, and BQ IO serves base64-encoded bytes to the user.

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7326?focusedWorklogId=269542&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269542
 ]

ASF GitHub Bot logged work on BEAM-7326:


Author: ASF GitHub Bot
Created on: 28/Jun/19 21:27
Start Date: 28/Jun/19 21:27
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #8873: [BEAM-7326] add 
documentation bigquery data types
URL: https://github.com/apache/beam/pull/8873#issuecomment-506882319
 
 
   Thanks, @Juta and @chamikaramj !
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269542)
Time Spent: 3h 50m  (was: 3h 40m)

> Document that Beam BigQuery IO expects users to pass base64-encoded bytes, 
> and BQ IO serves base64-encoded bytes to the user.
> -
>
> Key: BEAM-7326
> URL: https://issues.apache.org/jira/browse/BEAM-7326
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp
>Reporter: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> BYTES is one of the Datatypes supported by Google Cloud BigQuery, and Apache 
> Beam BigQuery IO connector.
> The current implementation of the BigQuery connector in the Java and Python SDKs 
> expects users to base64-encode bytes before passing them to BigQuery IO; see the 
> discussion on dev@: [1] 
> This needs to be reflected in the public documentation; see [2]-[4].
> cc: [~juta] [~chamikara] [~pabloem] 
> cc: [~lostluck] [~kedin] FYI and to advise whether similar action needs to be 
> done for Go SDK and/or Beam SQL.
> [1] 
> https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/
> [3] 
> https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html
> [4] 
> https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7628) Retry createJob requests in Dataflow Runner for retriable errors.

2019-06-28 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-7628:
--
Priority: Major  (was: Minor)

> Retry createJob requests in Dataflow Runner for retriable errors.
> -
>
> Key: BEAM-7628
> URL: https://issues.apache.org/jira/browse/BEAM-7628
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> When Dataflow Runner is sending a job for remote execution, such requests in 
> rare cases might fail with retriable errors. Dataflow Runner could recognize 
> a class of retriable errors and attempt to resubmit the job again when such 
> errors are encountered. Sample retriable error encountered by Beam Java SDK: 
> ```
> java.lang.RuntimeException: Failed to create a workflow job: The operation 
> was cancelled.
> 11:32:14  at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:869)
> 11:32:14  at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:178)
> 11:32:14  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
> 11:32:14  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
> ...
> 11:32:14 Caused by: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 499 Client 
> Closed Request
> 11:32:14 {
> 11:32:14   "code" : 499,
> 11:32:14   "errors" : [ {
> 11:32:14 "domain" : "global",
> 11:32:14 "message" : "The operation was cancelled.",
> 11:32:14 "reason" : "backendError"
> 11:32:14   } ],
> 11:32:14   "message" : "The operation was cancelled.",
> 11:32:14   "status" : "CANCELLED"
> 11:32:14 }
> 11:32:14  at 
> com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:146)
> 11:32:14  at 
> com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
> 11:32:14  at 
> com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
> 11:32:14  at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
> 11:32:14  at 
> com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1067)
> 11:32:14  at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
> 11:32:14  at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
> 11:32:14  at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
> 11:32:14  at 
> org.apache.beam.runners.dataflow.DataflowClient.createJob(DataflowClient.java:61)
> 11:32:14  at 
> org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:855)
> 11:32:14  ... 41 more'
> ```
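A minimal sketch of the proposed behavior, assuming the client can surface an HTTP status code on the raised exception. The retriable-code set and helper below are illustrative, not Dataflow's actual client or its official retry policy.

```python
import random
import time

RETRIABLE_CODES = {429, 499, 500, 503}  # illustrative, not Dataflow's official list

def create_job_with_retries(create_job, max_attempts=5, base_delay=1.0,
                            sleep=time.sleep):
    # Retry `create_job()` on retriable status codes with exponential backoff
    # plus full jitter; re-raise immediately on non-retriable errors or once
    # the attempt budget is exhausted.
    for attempt in range(max_attempts):
        try:
            return create_job()
        except Exception as exc:
            code = getattr(exc, "code", None)
            if code not in RETRIABLE_CODES or attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, base_delay * (2 ** attempt)))
```

Injecting `sleep` makes the backoff testable without real delays.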



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269522&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269522
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:50
Start Date: 28/Jun/19 20:50
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8955: [BEAM-7589] 
Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298743718
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisIOIT.java
 ##
 @@ -41,85 +42,106 @@
 import org.junit.runners.JUnit4;
 
 /**
- * Integration test, that writes and reads data to and from real Kinesis. You 
need to provide all
- * {@link KinesisTestOptions} in order to run this.
+ * Integration test, that writes and reads data to and from real Kinesis. You 
need to provide {@link
+ * KinesisTestOptions} in order to run this.
  */
 @RunWith(JUnit4.class)
 public class KinesisIOIT implements Serializable {
-  public static final int NUM_RECORDS = 1000;
-  public static final int NUM_SHARDS = 2;
+  private static int numberOfShards;
+  private static int numberOfRows;
 
-  @Rule public final transient TestPipeline p = TestPipeline.create();
-  @Rule public final transient TestPipeline p2 = TestPipeline.create();
+  @Rule public TestPipeline pipelineWrite = TestPipeline.create();
+  @Rule public TestPipeline pipelineRead = TestPipeline.create();
 
   private static KinesisTestOptions options;
+  private static final Instant now = Instant.now();
 
   @BeforeClass
   public static void setup() {
 PipelineOptionsFactory.register(KinesisTestOptions.class);
 options = 
TestPipeline.testingPipelineOptions().as(KinesisTestOptions.class);
+numberOfShards = options.getNumberOfShards();
+numberOfRows = options.getNumberOfRecords();
   }
 
+  /** Test which writes and then reads data for a Kinesis stream. */
   @Test
-  public void testWriteThenRead() throws Exception {
-Instant now = Instant.now();
-List inputData = prepareData();
+  public void testWriteThenRead() {
+runWrite();
+runRead();
+  }
 
-// Write data into stream
-p.apply(Create.of(inputData))
+  /** Write test dataset into Kinesis stream. */
+  private void runWrite() {
+pipelineWrite
+.apply("Generate Sequence", GenerateSequence.from(0).to((long) 
numberOfRows))
+.apply("Prepare TestRows", ParDo.of(new 
TestRow.DeterministicallyConstructTestRowFn()))
+.apply("Prepare Kinesis input records", ParDo.of(new ConvertToBytes()))
 .apply(
+"Write to Kinesis",
 KinesisIO.write()
 .withStreamName(options.getAwsKinesisStream())
 .withPartitioner(new RandomPartitioner())
 .withAWSClientsProvider(
 options.getAwsAccessKey(),
 options.getAwsSecretKey(),
Regions.fromName(options.getAwsKinesisRegion())));
-p.run().waitUntilFinish();
-
-// Read new data from stream that was just written before
-PCollection<byte[]> output =
-p2.apply(
-KinesisIO.read()
-.withStreamName(options.getAwsKinesisStream())
-.withAWSClientsProvider(
-options.getAwsAccessKey(),
-options.getAwsSecretKey(),
-Regions.fromName(options.getAwsKinesisRegion()))
-.withMaxNumRecords(inputData.size())
-// to prevent endless running in case of error
-.withMaxReadTime(Duration.standardMinutes(5))
-
.withInitialPositionInStream(InitialPositionInStream.AT_TIMESTAMP)
-.withInitialTimestampInStream(now)
-.withRequestRecordsLimit(1000))
-.apply(
-ParDo.of(
-new DoFn<KinesisRecord, byte[]>() {
-
-  @ProcessElement
-  public void processElement(ProcessContext c) {
-KinesisRecord record = c.element();
-byte[] data = record.getData().array();
-c.output(data);
-  }
-}));
-PAssert.that(output).containsInAnyOrder(inputData);
-p2.run().waitUntilFinish();
+
+pipelineWrite.run().waitUntilFinish();
+  }
+
+  /** Read test dataset from Kinesis stream. */
+  private void runRead() {
+PCollection<byte[]> output =
+pipelineRead.apply(
+KinesisIO.read()
+.withStreamName(options.getAwsKinesisStream())
+.withAWSClientsProvider(
+options.getAwsAccessKey(),
+options.getAwsSecretKey(),
+Regions.fromName(options.getAwsKinesisRegion()))
+ 

[jira] [Work logged] (BEAM-7547) StreamingDataflowWorker can observe inconsistent cache for stale work items

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7547?focusedWorklogId=269521&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269521
 ]

ASF GitHub Bot logged work on BEAM-7547:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:44
Start Date: 28/Jun/19 20:44
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on pull request #8842: [BEAM-7547] 
Avoid WindmillStateCache cache hits for stale work.
URL: https://github.com/apache/beam/pull/8842
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269521)
Time Spent: 2h 40m  (was: 2.5h)

> StreamingDataflowWorker can observe inconsistent cache for stale work items
> ---
>
> Key: BEAM-7547
> URL: https://issues.apache.org/jira/browse/BEAM-7547
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> 1. Dataflow backend generates a work item with a cache token C.
> 2. StreamingDataflowWorker receives the work item and reads the state using 
> C, it either hits the cache or performs a read.
> 3. Dataflow backend sends a retry of the work item (possibly because it 
> thinks original work item never reached the StreamingDataflowWorker).
> 4. StreamingDataflowWorker commits the work item and gets ack from dataflow 
> backend.  It caches the state for the key using C.
> 5. StreamingDataflowWorker receives the retried work item with cache token C. 
>  It uses the cached state and causes possible user consistency failures 
> because the cache view is of after the work item completed processing.
> Note that this will not cause corrupted Dataflow persistent state because the 
> commit of the retried work item using the inconsistent cache will fail. 
> However it may cause failures in user logic for example if they keep the set 
> of all seen items in state and throw an exception on duplicates which should 
> have been removed by an upstream stage.
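The failure mode can be modeled with a toy cache keyed by the backend's cache token. The fix sketched here invalidates the entry on commit so a retried work item carrying the same token is forced to re-read; this is an illustration under stated assumptions, not the actual WindmillStateCache implementation.

```python
class TokenStateCache:
    def __init__(self):
        self._entries = {}  # key -> (cache_token, state)

    def get(self, key, token):
        entry = self._entries.get(key)
        if entry is None or entry[0] != token:
            return None  # miss: caller must read state from the backend
        return entry[1]

    def put(self, key, token, state):
        self._entries[key] = (token, state)

    def on_commit(self, key, token):
        # Drop the entry so a retried work item reusing this token cannot
        # observe post-commit state (step 5 in the description above).
        entry = self._entries.get(key)
        if entry is not None and entry[0] == token:
            del self._entries[key]
```

After `on_commit`, a retry with the stale token misses the cache and re-reads consistent state.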



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269520&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269520
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:41
Start Date: 28/Jun/19 20:41
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8955: [BEAM-7589] Use only 
one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#issuecomment-506870246
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269520)
Time Spent: 4h 50m  (was: 4h 40m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the setUp function), and if I multiply that 
> by the number of workers, it gives a huge number of producers; I believe this 
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, the 
> error disappears.
> Can you remove the producer creation from the setUp method and move it to 
> a static field in the class, created once when the class is initialized?
> See the similar issue with JDBCIO: the connection pool was created per setUp 
> method, and we moved it to a static member so that there is one pool 
> per JVM. Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing the full stack trace, but I can also see this in the log:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the flag 
> "enableStreamingEngine").
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can even multiply by 3-4 times.
> When I start the Dataflow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers (a 3rd-party tool); I know that they call describe stream 
> every 30 seconds.
> My pipeline, running on GCP, reads data from PubSub at around 
> 20,000 records per second (in regular times, and during latency even 100,000 
> records per second), does many aggregations and counts based on some 
> dimensions (using Beam SQL) over a 1-minute sliding window, and 
> writes the result of the aggregations to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gives you some more context on my problem.
> Another suggestion I have: can you fix the issue as I suggested and provide 
> me a specific version for testing, without merging it to master? (I would 
> do it myself, but I had trouble building the huge Apache Beam repository 
> locally.)
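The "one producer per JVM" suggestion amounts to a lazily created, process-wide singleton. Below is a hedged sketch of that pattern; the `Producer` class is a hypothetical stand-in for the heavyweight Kinesis producer client (KinesisIO itself is Java, where a static field plays this role).

```python
import threading

class Producer:
    # Stand-in for the heavyweight Kinesis producer client.
    instances_created = 0

    def __init__(self):
        Producer.instances_created += 1

_producer = None
_producer_lock = threading.Lock()

def get_producer():
    # Double-checked locking: create the shared producer at most once per
    # process, instead of once per bundle/worker setUp call, which is what
    # triggered the shard-map-update throttling described above.
    global _producer
    if _producer is None:
        with _producer_lock:
            if _producer is None:
                _producer = Producer()
    return _producer
```

Every caller gets the same instance, so the per-process client count stays at one regardless of how many bundles run.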



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269519&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269519
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:40
Start Date: 28/Jun/19 20:40
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8955: [BEAM-7589] Use only 
one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#issuecomment-506870246
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269519)
Time Spent: 4h 40m  (was: 4.5h)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the setUp function), and if I multiply that 
> by the number of workers, it gives a huge number of producers; I believe this 
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, the 
> error disappears.
> Can you remove the producer creation from the setUp method and move it to 
> a static field in the class, created once when the class is initialized?
> See the similar issue with JDBCIO: the connection pool was created per setUp 
> method, and we moved it to a static member so that there is one pool 
> per JVM. Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing the full stack trace, but I can also see this in the log:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the flag 
> "enableStreamingEngine").
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can even multiply by 3-4 times.
> When I start the Dataflow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers (a 3rd-party tool); I know that they call describe stream 
> every 30 seconds.
> My pipeline, running on GCP, reads data from PubSub at around 
> 20,000 records per second (in regular times, and during latency even 100,000 
> records per second), does many aggregations and counts based on some 
> dimensions (using Beam SQL) over a 1-minute sliding window, and 
> writes the result of the aggregations to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gives you some more context on my problem.
> Another suggestion I have: can you fix the issue as I suggested and provide 
> me a specific version for testing, without merging it to master? (I would 
> do it myself, but I had trouble building the huge Apache Beam repository 
> locally.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6740) Combine.Globally translation is never called

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6740?focusedWorklogId=269516&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269516
 ]

ASF GitHub Bot logged work on BEAM-6740:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:39
Start Date: 28/Jun/19 20:39
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8964: [BEAM-6740] Add 
PTransformTranslator for Combine.Globally
URL: https://github.com/apache/beam/pull/8964#issuecomment-506801786
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269516)
Time Spent: 1h 40m  (was: 1.5h)

> Combine.Globally translation is never called
> 
>
> Key: BEAM-6740
> URL: https://issues.apache.org/jira/browse/BEAM-6740
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Ismaël Mejía
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Beam translates Combine.Globally as a composite transform composed of:
>  * Map that assigns Void keys
>  * Combine.PerKey
> On Spark: as Combine.PerKey uses a Spark GBK inside it, the runner adds its 
> own translation of Combine.Globally to avoid the less performant GBK. This 
> translation should be called instead of entering the composite transform 
> translation. A pipeline like this: 
> {code:java}
> PCollection<Integer> input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 
> 9, 10));
> input.apply(
>  Combine.globally(new IntegerCombineFn()));
> {code}
> {code:java}
>   private static class IntegerCombineFn extends Combine.CombineFn<Integer, Integer, Integer> {
> @Override
> public Integer createAccumulator() {
>   return 0;
> }
> @Override
> public Integer addInput(Integer accumulator, Integer input) {
>   return accumulator + input;
> }
> @Override
> public Integer mergeAccumulators(Iterable<Integer> accumulators) {
>   Integer result = 0;
>   for (Integer value : accumulators) {
> result += value;
>   }
>   return result;
> }
> @Override
> public Integer extractOutput(Integer accumulator) {
>   return accumulator;
> }
>   }
> {code}
> is translated as the above composite.
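The composite expansion described above (a Map that assigns a single Void key, then Combine.PerKey) can be modeled without Beam. The helper below is a sketch of that expansion, not runner code; `None` stands in for the Void key.

```python
def combine_globally(values, create_accumulator, add_input, extract_output):
    # Step 1: the Map that assigns Void keys (None stands in for Void).
    keyed = [(None, v) for v in values]
    # Step 2: Combine.PerKey over the single shared key.
    accumulators = {}
    for key, value in keyed:
        if key not in accumulators:
            accumulators[key] = create_accumulator()
        accumulators[key] = add_input(accumulators[key], value)
    return extract_output(accumulators.get(None, create_accumulator()))
```

With the IntegerCombineFn from the description expressed as plain functions, `combine_globally(range(1, 11), lambda: 0, lambda a, x: a + x, lambda a: a)` sums 1 through 10 and yields 55.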



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7649) Match Python 3 warning messages in setup.py and __init.py__

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7649?focusedWorklogId=269513&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269513
 ]

ASF GitHub Bot logged work on BEAM-7649:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:36
Start Date: 28/Jun/19 20:36
Worklog Time Spent: 10m 
  Work Description: akedin commented on pull request #8958: [BEAM-7649] 
Cherrypick PR-8956 onto 2.14.0 release branch
URL: https://github.com/apache/beam/pull/8958
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269513)
Time Spent: 0.5h  (was: 20m)

> Match Python 3 warning messages in setup.py and __init.py__
> ---
>
> Key: BEAM-7649
> URL: https://issues.apache.org/jira/browse/BEAM-7649
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7590) Convert PipelineOptionsMap to PipelineOption

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7590?focusedWorklogId=269512&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269512
 ]

ASF GitHub Bot logged work on BEAM-7590:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:35
Start Date: 28/Jun/19 20:35
Worklog Time Spent: 10m 
  Work Description: riazela commented on issue #8928: [DO NOT MERGE] 
[BEAM-7590] Converting JDBC Pipeline Options Map to PipelineOptions.
URL: https://github.com/apache/beam/pull/8928#issuecomment-506868970
 
 
   run java postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269512)
Time Spent: 4h 10m  (was: 4h)

> Convert PipelineOptionsMap to PipelineOption
> 
>
> Key: BEAM-7590
> URL: https://issues.apache.org/jira/browse/BEAM-7590
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Currently, BeamCalciteTable keeps a map version of PipelineOptions, and that 
> map version is used in JDBCConnection and the RelNodes as well. This map is 
> empty when the pipeline is constructed from SQLTransform, and it holds the 
> parameters passed from the JDBC client when the pipeline is started via the 
> JDBC path. Since we need PipelineOptions (or its sub-classes) for row-count 
> estimation, and we cannot convert a map created from a PipelineOptions 
> subclass back to PipelineOptions, it is better to keep the PipelineOptions 
> object itself.
> Another thing that will change as a result is the SET command. Currently, if 
> we use the SET command in JDBC for a pipeline option, it only changes that 
> option in the map. This means that even if the option is incorrect, no 
> exception is thrown until the actual PipelineOptions is created. However, if 
> we keep the PipelineOptions object itself, then we need to actually set the 
> passed parameters (using reflection), which will throw an exception at the 
> time of setting them. 
>  
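The fail-fast behavior described for the SET command can be sketched with `setattr` standing in for Java reflection. This is an illustration only: the options class and option names below are hypothetical, not Beam's PipelineOptions API.

```python
class ToyPipelineOptions:
    # Stand-in for a typed PipelineOptions object; the Java implementation
    # would use reflection over getters/setters rather than setattr.
    def __init__(self):
        self.runner = "DirectRunner"
        self.streaming = False

def apply_set_command(options, name, value):
    # Setting eagerly on the options object surfaces a bad option name at
    # SET time; with a plain map, the typo would go unnoticed until the
    # actual PipelineOptions object is built.
    if not hasattr(options, name):
        raise ValueError("Unknown pipeline option: %s" % name)
    setattr(options, name, value)
```

A misspelled option name now fails at the moment of the SET command instead of at pipeline construction.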



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269508&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269508
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:23
Start Date: 28/Jun/19 20:23
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8960: [BEAM-7548] Fix 
flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#issuecomment-506865522
 
 
   can we merge it?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269508)
Time Spent: 6h 50m  (was: 6h 40m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = 

[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269501&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269501
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:08
Start Date: 28/Jun/19 20:08
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8959: [BEAM-7548] 
Cherry pick - fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8959#issuecomment-506861853
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269501)
Time Spent: 6h 40m  (was: 6.5h)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in run_stage
> 11:57:47 data_input, data_output)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 1235, in process_bundle
> 11:57:47 result_future = 
> self._controller.control_handler.push(process_bundle)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 851, in push
> 11:57:47 response = 

[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269500&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269500
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 20:08
Start Date: 28/Jun/19 20:08
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8959: [BEAM-7548] 
Cherry pick - fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8959#issuecomment-506861808
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269500)
Time Spent: 6.5h  (was: 6h 20m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-6675) The JdbcIO sink should accept schemas

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6675?focusedWorklogId=269457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269457
 ]

ASF GitHub Bot logged work on BEAM-6675:


Author: ASF GitHub Bot
Created on: 28/Jun/19 18:18
Start Date: 28/Jun/19 18:18
Worklog Time Spent: 10m 
  Work Description: JawadHyder commented on issue #8962: [BEAM-6675] 
Generate JDBC statement and preparedStatementSetter automatically when schema 
is available
URL: https://github.com/apache/beam/pull/8962#issuecomment-506831150
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269457)
Time Spent: 40m  (was: 0.5h)

> The JdbcIO sink should accept schemas
> -
>
> Key: BEAM-6675
> URL: https://issues.apache.org/jira/browse/BEAM-6675
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> If the input has a schema, there should be a default mapping to a 
> PreparedStatement for writing based on that schema.
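The schema-to-statement idea in the description can be sketched in a few lines. The PR itself is Java (JdbcIO generates a JDBC statement and PreparedStatementSetter), so this Python sketch is only an illustration of the mapping, and the function name is hypothetical, not Beam's API:

```python
def generate_insert_statement(table, field_names):
    """Build a parameterized INSERT from a row schema's field names.

    Python sketch of the idea behind the Java PR: the statement is
    derived from the schema instead of being hand-written by the user.
    """
    columns = ', '.join(field_names)
    placeholders = ', '.join('?' for _ in field_names)
    return 'INSERT INTO %s(%s) VALUES(%s)' % (table, columns, placeholders)
```

For a row schema with fields `id`, `name`, `email` this yields `INSERT INTO users(id, name, email) VALUES(?, ?, ?)`; the setter side would then bind each row value to its placeholder by position.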



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7424) Retry HTTP 429 errors from GCS w/ exponential backoff when reading data

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7424?focusedWorklogId=269454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269454
 ]

ASF GitHub Bot logged work on BEAM-7424:


Author: ASF GitHub Bot
Created on: 28/Jun/19 18:04
Start Date: 28/Jun/19 18:04
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8933: [BEAM-7424] Retry 
HTTP 429 errors from GCS
URL: https://github.com/apache/beam/pull/8933#issuecomment-506826576
 
 
   Run Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269454)
Time Spent: 3h 10m  (was: 3h)

> Retry HTTP 429 errors from GCS w/ exponential backoff when reading data
> ---
>
> Key: BEAM-7424
> URL: https://issues.apache.org/jira/browse/BEAM-7424
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp, sdk-py-core
>Reporter: Chamikara Jayalath
>Assignee: Heejong Lee
>Priority: Blocker
> Fix For: 2.14.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This has to be done for both Java and Python SDKs.
> Seems like Java SDK already retries 429 errors w/o backoff (please verify): 
> [https://github.com/apache/beam/blob/master/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/util/RetryHttpRequestInitializer.java#L185]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=269451&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269451
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 28/Jun/19 18:01
Start Date: 28/Jun/19 18:01
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8895: 
[BEAM-7586] Add Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269451)
Time Spent: 8h 50m  (was: 8h 40m)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7586) Add Integration Test for MongoDbIO in python sdk

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7586?focusedWorklogId=269450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269450
 ]

ASF GitHub Bot logged work on BEAM-7586:


Author: ASF GitHub Bot
Created on: 28/Jun/19 18:01
Start Date: 28/Jun/19 18:01
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8895: [BEAM-7586] Add 
Integration test for python mongodb io
URL: https://github.com/apache/beam/pull/8895#issuecomment-506825817
 
 
   LGTM. Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269450)
Time Spent: 8h 40m  (was: 8.5h)

> Add Integration Test for MongoDbIO in python sdk 
> -
>
> Key: BEAM-7586
> URL: https://issues.apache.org/jira/browse/BEAM-7586
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Yichi Zhang
>Assignee: Yichi Zhang
>Priority: Major
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269448
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:56
Start Date: 28/Jun/19 17:56
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8960: [BEAM-7548] Fix 
flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#issuecomment-506824016
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269448)
Time Spent: 6h 20m  (was: 6h 10m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269447
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:55
Start Date: 28/Jun/19 17:55
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8960: [BEAM-7548] Fix 
flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#issuecomment-506823988
 
 
   LGTM. Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269447)
Time Spent: 6h 10m  (was: 6h)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269444&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269444
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:53
Start Date: 28/Jun/19 17:53
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #8960: 
[BEAM-7548] Fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#discussion_r298692679
 
 

 ##
 File path: sdks/python/apache_beam/transforms/stats_test.py
 ##
 @@ -156,17 +156,17 @@ def test_get_sample_size_from_est_error(self):
 assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
 assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 
4
 
-  @unittest.skipIf(sys.version_info < (3, 0, 0),
-   'Skip with py27 because hash function is not good enough.')
-  @retry(reraise=True, stop=stop_after_attempt(3))
+  @unittest.skipIf(sys.version_info < (4, 0, 0),
 
 Review comment:
   fixed it, PTAL.
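The diff under review keeps tenacity's `@retry(reraise=True, stop=stop_after_attempt(3))` on the statistically flaky assertion. A minimal stand-in for that decorator (written here without the tenacity dependency, purely to illustrate the mechanism) looks like:

```python
import functools


def retry_flaky(attempts=3):
    """Minimal stand-in for tenacity's
    @retry(reraise=True, stop=stop_after_attempt(attempts)):
    rerun a flaky test body, re-raising the last failure."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except AssertionError as exc:
                    last_error = exc
            raise last_error
        return wrapper
    return decorator
```

Because ApproximateUnique's estimate only holds within its error bound with high probability, an occasional out-of-bound sample is expected; rerunning up to three times makes the chance of a spurious test failure vanishingly small without hiding a real regression.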
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269444)
Time Spent: 6h  (was: 5h 50m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-7326) Document that Beam BigQuery IO expects users to pass base64-encoded bytes, and BQ IO serves base64-encoded bytes to the user.

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7326?focusedWorklogId=269439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269439
 ]

ASF GitHub Bot logged work on BEAM-7326:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:49
Start Date: 28/Jun/19 17:49
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8873: [BEAM-7326] add 
documentation bigquery data types
URL: https://github.com/apache/beam/pull/8873#issuecomment-506821594
 
 
   Thanks. This is great. LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269439)
Time Spent: 3.5h  (was: 3h 20m)

> Document that Beam BigQuery IO expects users to pass base64-encoded bytes, 
> and BQ IO serves base64-encoded bytes to the user.
> -
>
> Key: BEAM-7326
> URL: https://issues.apache.org/jira/browse/BEAM-7326
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp
>Reporter: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> BYTES is one of the Datatypes supported by Google Cloud BigQuery, and Apache 
> Beam BigQuery IO connector.
> Current implementation of BigQuery connector in Java and Python SDKs expects 
> that users base64-encode bytes before passing them to BigQuery IO, see 
> discussion on dev: [1] 
> This needs to be reflected in public documentation, see [2-4]
> cc: [~juta] [~chamikara] [~pabloem] 
> cc: [~lostluck] [~kedin] FYI and to advise whether similar action needs to be 
> done for Go SDK and/or Beam SQL.
> [1] 
> https://lists.apache.org/thread.html/f35c836887014e059527ed1a806e730321e2f9726164a3030575f455@%3Cdev.beam.apache.org%3E
> [2] https://beam.apache.org/documentation/io/built-in/google-bigquery/
> [3] 
> https://beam.apache.org/releases/pydoc/2.12.0/apache_beam.io.gcp.bigquery.html
> [4] 
> https://beam.apache.org/releases/javadoc/2.12.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html
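Per the description, BYTES values cross BigQuery IO as base64 text: the user encodes before writing and decodes after reading. A minimal round-trip sketch (the field name `payload` is illustrative, not part of any Beam API):

```python
import base64


def encode_bytes_field(raw):
    """Base64-encode raw bytes before handing the row to BigQuery IO."""
    return base64.b64encode(raw).decode('ascii')


def decode_bytes_field(value):
    """Decode the base64 text BigQuery IO serves back into raw bytes."""
    return base64.b64decode(value)


row = {'payload': encode_bytes_field(b'\x00\xff')}
```

This is the contract the documentation changes in [2-4] are meant to spell out, so that users do not pass raw bytes and get mangled values back.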



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7326) Document that Beam BigQuery IO expects users to pass base64-encoded bytes, and BQ IO serves base64-encoded bytes to the user.

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7326?focusedWorklogId=269440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269440
 ]

ASF GitHub Bot logged work on BEAM-7326:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:49
Start Date: 28/Jun/19 17:49
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8873: 
[BEAM-7326] add documentation bigquery data types
URL: https://github.com/apache/beam/pull/8873
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 269440)
Time Spent: 3h 40m  (was: 3.5h)

> Document that Beam BigQuery IO expects users to pass base64-encoded bytes, 
> and BQ IO serves base64-encoded bytes to the user.
> -
>
> Key: BEAM-7326
> URL: https://issues.apache.org/jira/browse/BEAM-7326
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp, io-python-gcp
>Reporter: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h





[jira] [Created] (BEAM-7654) Enable skipped tests with ApproximateUnique when hash() is improved.

2019-06-28 Thread Hannah Jiang (JIRA)
Hannah Jiang created BEAM-7654:
--

 Summary: Enable skipped tests with ApproximateUnique when hash() 
is improved.
 Key: BEAM-7654
 URL: https://issues.apache.org/jira/browse/BEAM-7654
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Hannah Jiang


test_approximate_unique_global_by_sample_size

test_approximate_unique_global_by_error

 

The two tests above are skipped because the built-in hash function is not good 
enough for them to pass every time. Re-enable them once hash() is sufficiently 
uniform in a newer Python version.
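One concrete reason the built-in hash() is problematic here: in CPython, hash() of a small non-negative int is the int itself, so hashing structured inputs such as consecutive integers produces anything but the uniform distribution that a sampling-based estimator like ApproximateUnique assumes. A quick illustration:

```python
# In CPython, hash() is the identity on small non-negative ints, so the
# "hashes" of consecutive integers are just the integers themselves --
# far from uniformly distributed over the hash space.
values = list(range(100))
hashes = [hash(v) for v in values]
assert hashes == values
```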





[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269436
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:42
Start Date: 28/Jun/19 17:42
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8960: 
[BEAM-7548] Fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8960#discussion_r298688907
 
 

 ##
 File path: sdks/python/apache_beam/transforms/stats_test.py
 ##
 @@ -156,17 +156,17 @@ def test_get_sample_size_from_est_error(self):
 assert beam.ApproximateUnique._get_sample_size_from_est_error(0.05) == 1600
 assert beam.ApproximateUnique._get_sample_size_from_est_error(0.01) == 
4
 
-  @unittest.skipIf(sys.version_info < (3, 0, 0),
-   'Skip with py27 because hash function is not good enough.')
-  @retry(reraise=True, stop=stop_after_attempt(3))
+  @unittest.skipIf(sys.version_info < (4, 0, 0),
 
 Review comment:
   Please create a JIRA to re-enable the tests and add it here with a TODO.
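For context on the assertions in the diff above: the quoted values are consistent with the standard estimation-error bound for sampling-based distinct-count estimators, error ≈ 2/√sample_size, i.e. sample_size = (2/error)². The sketch below is my own reconstruction of that relationship, not the SDK's implementation:

```python
import math

def sample_size_from_est_error(est_err):
    # error ~ 2 / sqrt(sample_size)  =>  sample_size = (2 / error)^2
    return math.ceil(4.0 / est_err ** 2)

# Matches the assertion quoted in the diff: a 5% estimation error
# requires a sample of 1600 elements.
assert sample_size_from_est_error(0.05) == 1600
```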
 



Issue Time Tracking
---

Worklog Id: (was: 269436)
Time Spent: 5h 50m  (was: 5h 40m)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The error happened on Jenkins in Python 3.5 suite, which currently uses 
> Python 3.5.2 interpreter:
> {noformat}
> 11:57:47 
> ==
> 11:57:47 ERROR: test_approximate_unique_global_by_error 
> (apache_beam.transforms.stats_test.ApproximateUniqueTest)
> 11:57:47 
> --
> 11:57:47 Traceback (most recent call last):
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/transforms/stats_test.py",
>  line 236, in test_approximate_unique_global_by_error
> 11:57:47 pipeline.run()
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/testing/test_pipeline.py",
>  line 107, in run
> 11:57:47 else test_runner_api))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 406, in run
> 11:57:47 self._options).run(False)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/pipeline.py",
>  line 419, in run
> 11:57:47 return self.runner.run_pipeline(self, self._options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/direct/direct_runner.py",
>  line 128, in run_pipeline
> 11:57:47 return runner.run_pipeline(pipeline, options)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 289, in run_pipeline
> 11:57:47 default_environment=self._default_environment))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 293, in run_via_runner_api
> 11:57:47 return self.run_stages(*self.create_stages(pipeline_proto))
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 369, in run_stages
> 11:57:47 stage_context.safe_coders)
> 11:57:47   File 
> "/home/jenkins/jenkins-slave/workspace/beam_PreCommit_Python_Commit/src/sdks/python/test-suites/tox/py35/build/srcs/sdks/python/apache_beam/runners/portability/fn_api_runner.py",
>  line 531, in 

[jira] [Work logged] (BEAM-7547) StreamingDataflowWorker can observe inconsistent cache for stale work items

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7547?focusedWorklogId=269422&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269422
 ]

ASF GitHub Bot logged work on BEAM-7547:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:29
Start Date: 28/Jun/19 17:29
Worklog Time Spent: 10m 
  Work Description: dpmills commented on issue #8842: [BEAM-7547] Avoid 
WindmillStateCache cache hits for stale work.
URL: https://github.com/apache/beam/pull/8842#issuecomment-506815477
 
 
   LGTM
 



Issue Time Tracking
---

Worklog Id: (was: 269422)
Time Spent: 2.5h  (was: 2h 20m)

> StreamingDataflowWorker can observe inconsistent cache for stale work items
> ---
>
> Key: BEAM-7547
> URL: https://issues.apache.org/jira/browse/BEAM-7547
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> 1. Dataflow backend generates a work item with a cache token C.
> 2. StreamingDataflowWorker receives the work item and reads the state using 
> C, it either hits the cache or performs a read.
> 3. Dataflow backend sends a retry of the work item (possibly because it 
> thinks original work item never reached the StreamingDataflowWorker).
> 4. StreamingDataflowWorker commits the work item and gets ack from dataflow 
> backend.  It caches the state for the key using C.
> 5. StreamingDataflowWorker receives the retried work item with cache token C. 
>  It uses the cached state and causes possible user consistency failures 
> because the cache view is of after the work item completed processing.
> Note that this will not cause corrupted Dataflow persistent state because the 
> commit of the retried work item using the inconsistent cache will fail. 
> However it may cause failures in user logic for example if they keep the set 
> of all seen items in state and throw an exception on duplicates which should 
> have been removed by an upstream stage.
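The five-step failure above can be modeled with a toy state cache in which entries are looked up by (key, cache token) and — mirroring the fix under review — a (key, token) pair that has already been committed stops serving hits, so a retried work item carrying the same token is forced to re-read fresh state. This is a simplified Python model for illustration, not the actual Java WindmillStateCache:

```python
class ToyStateCache:
    """Toy model of a per-key state cache keyed by work-item cache tokens."""

    def __init__(self):
        self._entries = {}       # key -> (cache_token, state)
        self._committed = set()  # (key, cache_token) pairs already committed

    def get(self, key, token):
        # A token that was already committed may belong to a stale retry
        # (step 5 above): refuse the cached view and force a fresh read.
        if (key, token) in self._committed:
            return None
        entry = self._entries.get(key)
        if entry is not None and entry[0] == token:
            return entry[1]
        return None

    def put(self, key, token, state):
        self._entries[key] = (token, state)

    def commit(self, key, token):
        # Step 4: the work item committed; remember its token so a retry
        # with the same token cannot observe post-commit cached state.
        self._committed.add((key, token))


cache = ToyStateCache()
cache.put("k", "C", {"seen": {"a"}})       # step 2: state read and cached
assert cache.get("k", "C") == {"seen": {"a"}}
cache.commit("k", "C")                     # step 4: commit acked
assert cache.get("k", "C") is None         # step 5: stale retry must re-read
```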





[jira] [Closed] (BEAM-7643) Nearly all PostCommits failing due to Google Cloud issues

2019-06-28 Thread Mark Liu (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Liu closed BEAM-7643.
--
   Resolution: Resolved
Fix Version/s: Not applicable

> Nearly all PostCommits failing due to Google Cloud issues
> -
>
> Key: BEAM-7643
> URL: https://issues.apache.org/jira/browse/BEAM-7643
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Daniel Oliveira
>Assignee: Mark Liu
>Priority: Major
>  Labels: currently-failing
> Fix For: Not applicable
>
>
> Multiple Beam PostCommits seem to be failing due to a variety of errors, such 
> as "503 Service unavailable",  "429 Too Many Requests", or "404 Not Found". 
> It's hard to add a complete description of the issue here since there are too 
> many tests failing. Also made worse because some other flakes and issues seem 
> to be masking these failures. But I'll try to find a few examples:
> [https://builds.apache.org/job/beam_PostCommit_Py_ValCont/3661/]
>  
> {noformat}
> 11:17:01 BadStatusCodeError: HttpError accessing 
> :
>  response: <{'status': '429', 'content-length': '598', 'x-xss-protection': 
> '0', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 
> 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 
> 'gzip', 'cache-control': 'private', 'date': 'Wed, 26 Jun 2019 18:17:01 GMT', 
> 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; 
> charset=UTF-8'}>, content <{
> 11:17:01   "error": {
> 11:17:01 "code": 429,
> 11:17:01 "message": "Quota exceeded for quota metric 
> 'dataflow.googleapis.com/create_requests' and limit 
> 'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com' for 
> consumer 'project_number:844138762903'.",
> 11:17:01 "status": "RESOURCE_EXHAUSTED",
> 11:17:01 "details": [
> 11:17:01   {
> 11:17:01 "@type": "type.googleapis.com/google.rpc.Help",
> 11:17:01 "links": [
> 11:17:01   {
> 11:17:01 "description": "Google developer console API key",
> 11:17:01 "url": 
> "https://console.developers.google.com/project/844138762903/apiui/credential"
> 11:17:01   }
> 11:17:01 ]
> 11:17:01   }
> 11:17:01 ]
> 11:17:01   }
> 11:17:01 }{noformat}
>  
>  
> [https://builds.apache.org/job/beam_PostCommit_Python_Verify/8598/]
>  
> {noformat}
> BadStatusCodeError: HttpError accessing 
> :
>  response: <{'status': '429', 'content-length': '598', 'x-xss-protection': 
> '0', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 
> 'vary': 'Origin, X-Origin, Referer', 'se
> rver': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 
> 'date': 'Wed, 26 Jun 2019 18:10:42 GMT', 'x-frame-options': 'SAMEORIGIN', 
> 'content-type': 'application/json; charset=UTF-8'}>, content <{
> "error": {
> "code": 429,
> "message": "Quota exceeded for quota metric 
> 'dataflow.googleapis.com/create_requests' and limit 
> 'CreateRequestsPerMinutePerUser' of service 'dataflow.googleapis.com' for 
> consumer 'project_number:844138762903'.",
> "status": "RESOURCE_EXHAUSTED",
> "details": [
> {
> "@type": "type.googleapis.com/google.rpc.Help",
> "links": [
> {
> "description": "Google developer console API key",
> "url": 
> "https://console.developers.google.com/project/844138762903/apiui/credential"
> }
> {noformat}
>  
> [https://builds.apache.org/job/beam_PostCommit_Java/3617/]
> {noformat}
> java.lang.RuntimeException : Failed to create a workflow job: The service is 
> currently unavailable.
> Caused by: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException:
> 503 Service Unavailable
> {
>   "code" : 503,
>   "errors" : [ {
> "domain" : "global",
> "message" : "The service is currently unavailable.",
> "reason" : "backendError"
>   } ],
>   "message" : "The service is currently unavailable.",
>   "status" : "UNAVAILABLE"
> }
> Dataflow SDK version: 2.15.0-SNAPSHOT{noformat}
>  





[jira] [Commented] (BEAM-7643) Nearly all PostCommits failing due to Google Cloud issues

2019-06-28 Thread Mark Liu (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875083#comment-16875083
 ] 

Mark Liu commented on BEAM-7643:


The Dataflow service is back to normal and builds are passing again on Jenkins. 
Closing this ticket.

> Nearly all PostCommits failing due to Google Cloud issues
> -
>
> Key: BEAM-7643
> URL: https://issues.apache.org/jira/browse/BEAM-7643
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Daniel Oliveira
>Assignee: Mark Liu
>Priority: Major
>  Labels: currently-failing





[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=269414&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269414
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:25
Start Date: 28/Jun/19 17:25
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8872: [BEAM-3645] add 
ParallelBundleManager
URL: https://github.com/apache/beam/pull/8872#issuecomment-506579569
 
 
   Run Python PreCommit - tests got stuck.
   https://builds.apache.org/job/beam_PreCommit_Python_Phrase/584/console
 



Issue Time Tracking
---

Worklog Id: (was: 269414)
Time Spent: 18.5h  (was: 18h 20m)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 18.5h
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance 
> gain over the previous DirectRunner.  We can do even better in multi-core 
> environments by supporting multi-process execution in the FnApiRunner, to 
> scale past Python GIL limitations.





[jira] [Work logged] (BEAM-3645) Support multi-process execution on the FnApiRunner

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3645?focusedWorklogId=269409&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269409
 ]

ASF GitHub Bot logged work on BEAM-3645:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:24
Start Date: 28/Jun/19 17:24
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8872: [BEAM-3645] add 
ParallelBundleManager
URL: https://github.com/apache/beam/pull/8872#issuecomment-506548287
 
 
   Run Python PreCommit - the tests that run on Dataflow could not execute 
because of a Dataflow service issue. No failures in the other tests.
 



Issue Time Tracking
---

Worklog Id: (was: 269409)
Time Spent: 18h 20m  (was: 18h 10m)

> Support multi-process execution on the FnApiRunner
> --
>
> Key: BEAM-3645
> URL: https://issues.apache.org/jira/browse/BEAM-3645
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Charles Chen
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 18h 20m
>  Remaining Estimate: 0h





[jira] [Work logged] (BEAM-7548) test_approximate_unique_global_by_error is flaky

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7548?focusedWorklogId=269398&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269398
 ]

ASF GitHub Bot logged work on BEAM-7548:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:15
Start Date: 28/Jun/19 17:15
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #8959: [BEAM-7548] 
Cherry pick - fix flaky tests for ApproximateUnique
URL: https://github.com/apache/beam/pull/8959#issuecomment-506811071
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 269398)
Time Spent: 5h 40m  (was: 5.5h)

> test_approximate_unique_global_by_error is flaky
> 
>
> Key: BEAM-7548
> URL: https://issues.apache.org/jira/browse/BEAM-7548
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, test-failures
>Reporter: Valentyn Tymofieiev
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h

[jira] [Work logged] (BEAM-7547) StreamingDataflowWorker can observe inconsistent cache for stale work items

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7547?focusedWorklogId=269394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269394
 ]

ASF GitHub Bot logged work on BEAM-7547:


Author: ASF GitHub Bot
Created on: 28/Jun/19 17:09
Start Date: 28/Jun/19 17:09
Worklog Time Spent: 10m 
  Work Description: scwhittle commented on issue #8842: [BEAM-7547] Avoid 
WindmillStateCache cache hits for stale work.
URL: https://github.com/apache/beam/pull/8842#issuecomment-506808925
 
 
   @lukecwik PTAL, the tests are passing now. The build issue was due to a 
recent commit that I hadn't merged but that Jenkins apparently merged.
 



Issue Time Tracking
---

Worklog Id: (was: 269394)
Time Spent: 2h 20m  (was: 2h 10m)

> StreamingDataflowWorker can observe inconsistent cache for stale work items
> ---
>
> Key: BEAM-7547
> URL: https://issues.apache.org/jira/browse/BEAM-7547
> Project: Beam
>  Issue Type: Bug
>  Components: runner-dataflow
>Reporter: Sam Whittle
>Assignee: Sam Whittle
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h





[jira] [Work logged] (BEAM-7590) Convert PipelineOptionsMap to PipelineOption

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7590?focusedWorklogId=269388&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269388
 ]

ASF GitHub Bot logged work on BEAM-7590:


Author: ASF GitHub Bot
Created on: 28/Jun/19 16:54
Start Date: 28/Jun/19 16:54
Worklog Time Spent: 10m 
  Work Description: riazela commented on pull request #8928: [DO NOT MERGE] 
[BEAM-7590] Converting JDBC Pipeline Options Map to PipelineOptions.
URL: https://github.com/apache/beam/pull/8928#discussion_r298671093
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsReflectionSetter.java
 ##
 @@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import java.beans.IntrospectionException;
+import java.beans.Introspector;
+import java.beans.PropertyDescriptor;
+import java.lang.reflect.InvocationTargetException;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import org.apache.beam.sdk.util.StringUtils;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableListMultimap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ListMultimap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Sets;
+
+/** This is a utility class to set and remove options individually. */
+public class PipelineOptionsReflectionSetter {
+  private static final boolean STRICT_PARSING = true;
+
+  @SuppressWarnings("unchecked")
+  public static Class<? extends PipelineOptions> getPipelineOptionsInterface(
+  PipelineOptions options) {
+if (options.getClass().getInterfaces().length != 1) {
 
 Review comment:
   If I call options.getClass() it will not return DataflowPipelineOptions, it 
will return a proxy class. For instance BigQueryOptions also extends multiple 
interfaces (Similar to DataflowPipelineOptions); however, if I run the 
following code:
   
   BigQueryOptions options = 
PipelineOptionsFactory.as(BigQueryOptions.class);
   System.out.println(options.getClass());
   System.out.println(options.getClass().isInterface());
   
System.out.println(PipelineOptionsReflectionSetter.getPipelineOptionsInterface(options));
   
   There will be no exception and the output will be:
   class com.sun.proxy.$Proxy25
   false
   interface org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions
   
   The reason that I need this method: when the user uses the reset 
command, I should set the option back to its default value. To do that, I first 
determine which PipelineOptions interface this object implements, and then, using 
PipelineOptionsFactory, I construct an instance of that options type and read its 
default value. 
   
   Currently, with our use cases and the expected behavior of 
PipelineOptionsFactory, this should not fail in any use case, because the 
method .as() in PipelineOptions and PipelineOptionsFactory returns a proxy 
object that implements exactly one interface. I think currently, if users need 
their own PipelineOptions, they need to declare an interface extending 
PipelineOptions and then use PipelineOptionsFactory to create an instance of it. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269388)
Time Spent: 4h  (was: 3h 50m)

> Convert PipelineOptionsMap to PipelineOption
> 
>
> Key: BEAM-7590
> URL: https://issues.apache.org/jira/browse/BEAM-7590
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza Samadianzakaria
>Priority: 

[jira] [Work logged] (BEAM-7590) Convert PipelineOptionsMap to PipelineOption

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7590?focusedWorklogId=269382&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269382
 ]

ASF GitHub Bot logged work on BEAM-7590:


Author: ASF GitHub Bot
Created on: 28/Jun/19 16:49
Start Date: 28/Jun/19 16:49
Worklog Time Spent: 10m 
  Work Description: riazela commented on pull request #8928: [DO NOT MERGE] 
[BEAM-7590] Converting JDBC Pipeline Options Map to PipelineOptions.
URL: https://github.com/apache/beam/pull/8928#discussion_r298671093
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/options/PipelineOptionsReflectionSetter.java
 ##
 @@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.options;
+
+import java.beans.IntrospectionException;
+import java.beans.Introspector;
+import java.beans.PropertyDescriptor;
+import java.lang.reflect.InvocationTargetException;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import org.apache.beam.sdk.util.StringUtils;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ImmutableListMultimap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Iterables;
+import 
org.apache.beam.vendor.guava.v20_0.com.google.common.collect.ListMultimap;
+import org.apache.beam.vendor.guava.v20_0.com.google.common.collect.Sets;
+
+/** This is a utility class to set and remove options individually. */
+public class PipelineOptionsReflectionSetter {
+  private static final boolean STRICT_PARSING = true;
+
+  @SuppressWarnings("unchecked")
+  public static Class<? extends PipelineOptions> getPipelineOptionsInterface(
+  PipelineOptions options) {
+if (options.getClass().getInterfaces().length != 1) {
 
 Review comment:
   If I call options.getClass() it will not return DataflowPipelineOptions; it 
will return a proxy class. For instance, BigQueryOptions also extends multiple 
interfaces (similar to DataflowPipelineOptions); however, if I run the 
following code:
   
   BigQueryOptions options = 
PipelineOptionsFactory.as(BigQueryOptions.class);
   System.out.println(options.getClass());
   System.out.println(options.getClass().isInterface());
   
System.out.println(PipelineOptionsReflectionSetter.getPipelineOptionsInterface(options));
   
   There will be no exception and the output will be:
   class com.sun.proxy.$Proxy25
   false
   interface org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions
   
   The reason that I need this method: when the user uses the reset 
command, I should set the option back to its default value. To do that, I first 
determine which PipelineOptions interface this object implements, and then, using 
PipelineOptionsFactory, I construct an instance of that options type and read its 
default value. 
   
   Currently, with our use cases and the expected behavior of 
PipelineOptionsFactory, this should not fail in any use case, because the 
method .as() in PipelineOptions and PipelineOptionsFactory returns a proxy 
object that implements exactly one interface. I think currently, if users need 
their own PipelineOptions, they need to declare an interface extending 
PipelineOptions and then use PipelineOptionsFactory to create an instance of it. 
   
   Hope this makes sense.
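The proxy behavior described above can be reproduced with plain java.lang.reflect.Proxy, without any Beam dependency. A minimal sketch, where OptionsLike, makeProxy, and singleInterfaceOf are hypothetical stand-ins for a PipelineOptions sub-interface, PipelineOptionsFactory.as(), and getPipelineOptionsInterface():

```java
import java.lang.reflect.Proxy;

public class ProxyInterfaceDemo {

  // Hypothetical stand-in for a PipelineOptions sub-interface.
  public interface OptionsLike {
    String getRunner();
  }

  // Mimics what PipelineOptionsFactory.as() does internally: build a dynamic
  // proxy class that implements exactly one declared interface.
  public static OptionsLike makeProxy() {
    return (OptionsLike)
        Proxy.newProxyInstance(
            OptionsLike.class.getClassLoader(),
            new Class<?>[] {OptionsLike.class},
            (proxy, method, args) -> "DirectRunner");
  }

  // Recovers the single implemented interface from the proxy instance.
  public static Class<?> singleInterfaceOf(Object options) {
    return options.getClass().getInterfaces()[0];
  }

  public static void main(String[] args) {
    OptionsLike options = makeProxy();
    System.out.println(options.getClass());               // a generated proxy class
    System.out.println(options.getClass().isInterface()); // false
    System.out.println(singleInterfaceOf(options));       // interface ...OptionsLike
  }
}
```

getClass() yields the generated proxy class, not the interface, which is why the interface has to be recovered via getInterfaces().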
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269382)
Time Spent: 3h 50m  (was: 3h 40m)

> Convert PipelineOptionsMap to PipelineOption
> 
>
> Key: BEAM-7590
> URL: https://issues.apache.org/jira/browse/BEAM-7590
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Alireza Samadianzakaria
>Assignee: Alireza 

[jira] [Work logged] (BEAM-6740) Combine.Globally translation is never called

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6740?focusedWorklogId=269378&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269378
 ]

ASF GitHub Bot logged work on BEAM-6740:


Author: ASF GitHub Bot
Created on: 28/Jun/19 16:47
Start Date: 28/Jun/19 16:47
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8964: [BEAM-6740] Add 
PTransformTranslator for Combine.Globally
URL: https://github.com/apache/beam/pull/8964#issuecomment-506801786
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269378)
Time Spent: 1.5h  (was: 1h 20m)

> Combine.Globally translation is never called
> 
>
> Key: BEAM-6740
> URL: https://issues.apache.org/jira/browse/BEAM-6740
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Ismaël Mejía
>Priority: Major
>  Labels: portability
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Beam translates Combine.Globally as a composite transform composed of:
>  * Map that assigns Void keys
>  * Combine.PerKey
> On Spark: as Combine.PerKey uses a Spark GBK inside it, the runner adds its 
> own translation of Combine.Globally to avoid a less performant GBK. This 
> translation should be called instead of entering the composite transform 
> translation. A pipeline like this: 
> {code:java}
> PCollection<Integer> input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 
> 9, 10));
> input.apply(
>  Combine.globally(new IntegerCombineFn()));
> {code}
> {code:java}
>   private static class IntegerCombineFn extends Combine.CombineFn<Integer, Integer, Integer> {
> @Override
> public Integer createAccumulator() {
>   return 0;
> }
> @Override
> public Integer addInput(Integer accumulator, Integer input) {
>   return accumulator + input;
> }
> @Override
> public Integer mergeAccumulators(Iterable<Integer> accumulators) {
>   Integer result = 0;
>   for (Integer value : accumulators) {
> result += value;
>   }
>   return result;
> }
> @Override
> public Integer extractOutput(Integer accumulator) {
>   return accumulator;
> }
>   }
> {code}
> is translated as the above composite.
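The reason a runner can replace the composite with its own GBK-free translation is that the CombineFn contract is partition-independent: per-bundle accumulators can be merged regardless of how the input was split. A plain-Java sketch of that property, with no Beam dependency (combineGlobally is a hypothetical helper, not a Beam API):

```java
import java.util.Arrays;
import java.util.List;

public class CombineFnSketch {

  // The four methods of IntegerCombineFn from the issue, restated as statics.
  static Integer createAccumulator() { return 0; }
  static Integer addInput(Integer accumulator, Integer input) { return accumulator + input; }
  static Integer mergeAccumulators(Iterable<Integer> accumulators) {
    Integer result = 0;
    for (Integer value : accumulators) {
      result += value;
    }
    return result;
  }
  static Integer extractOutput(Integer accumulator) { return accumulator; }

  // A runner may split the input into bundles, fold each bundle into a local
  // accumulator, then merge the per-bundle accumulators -- no GroupByKey needed.
  public static Integer combineGlobally(List<List<Integer>> bundles) {
    Integer[] partials = new Integer[bundles.size()];
    for (int i = 0; i < bundles.size(); i++) {
      Integer accumulator = createAccumulator();
      for (Integer value : bundles.get(i)) {
        accumulator = addInput(accumulator, value);
      }
      partials[i] = accumulator;
    }
    return extractOutput(mergeAccumulators(Arrays.asList(partials)));
  }

  public static void main(String[] args) {
    // The elements of Create.of(1..10) from the issue, split into two bundles.
    List<List<Integer>> bundles =
        Arrays.asList(Arrays.asList(1, 2, 3, 4, 5), Arrays.asList(6, 7, 8, 9, 10));
    System.out.println(combineGlobally(bundles)); // 55, regardless of the split
  }
}
```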



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6740) Combine.Globally translation is never called

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6740?focusedWorklogId=269366&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269366
 ]

ASF GitHub Bot logged work on BEAM-6740:


Author: ASF GitHub Bot
Created on: 28/Jun/19 16:16
Start Date: 28/Jun/19 16:16
Worklog Time Spent: 10m 
  Work Description: iemejia commented on issue #8964: [BEAM-6740] Add 
PTransformTranslator for Combine.Globally
URL: https://github.com/apache/beam/pull/8964#issuecomment-506791858
 
 
   Run Spark ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269366)
Time Spent: 1h 20m  (was: 1h 10m)

> Combine.Globally translation is never called
> 
>
> Key: BEAM-6740
> URL: https://issues.apache.org/jira/browse/BEAM-6740
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Ismaël Mejía
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Beam translates Combine.Globally as a composite transform composed of:
>  * Map that assigns Void keys
>  * Combine.PerKey
> On Spark: as Combine.PerKey uses a Spark GBK inside it, the runner adds its 
> own translation of Combine.Globally to avoid a less performant GBK. This 
> translation should be called instead of entering the composite transform 
> translation. A pipeline like this: 
> {code:java}
> PCollection<Integer> input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 
> 9, 10));
> input.apply(
>  Combine.globally(new IntegerCombineFn()));
> {code}
> {code:java}
>   private static class IntegerCombineFn extends Combine.CombineFn<Integer, Integer, Integer> {
> @Override
> public Integer createAccumulator() {
>   return 0;
> }
> @Override
> public Integer addInput(Integer accumulator, Integer input) {
>   return accumulator + input;
> }
> @Override
> public Integer mergeAccumulators(Iterable<Integer> accumulators) {
>   Integer result = 0;
>   for (Integer value : accumulators) {
> result += value;
>   }
>   return result;
> }
> @Override
> public Integer extractOutput(Integer accumulator) {
>   return accumulator;
> }
>   }
> {code}
> is translated as the above composite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-6740) Combine.Globally translation is never called

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6740?focusedWorklogId=269365=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269365
 ]

ASF GitHub Bot logged work on BEAM-6740:


Author: ASF GitHub Bot
Created on: 28/Jun/19 16:15
Start Date: 28/Jun/19 16:15
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8964: [BEAM-6740] 
Add PTransformTranslator for Combine.Globally
URL: https://github.com/apache/beam/pull/8964
 
 
   It seems that we missed the payloads for Combine.Globally, which can have 
specific translations, as is the case with the Spark runner.
   
   R: @lukecwik 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269365)
Time Spent: 1h 10m  (was: 1h)

> Combine.Globally translation is never called
> 
>
> Key: BEAM-6740
> URL: https://issues.apache.org/jira/browse/BEAM-6740
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Ismaël Mejía
>Priority: Major
>  Labels: portability
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Beam translates Combine.Globally as a composite transform composed of:
>  * Map that assigns Void keys
>  * Combine.PerKey
> On Spark: as Combine.PerKey uses a Spark GBK inside it, the runner adds its 
> own translation of Combine.Globally to avoid a less performant GBK. This 
> translation should be called instead of entering the composite transform 
> translation. A pipeline like this: 
> {code:java}
> PCollection<Integer> input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 
> 9, 10));
> input.apply(
>  Combine.globally(new IntegerCombineFn()));
> {code}
> {code:java}
>   private static class IntegerCombineFn extends Combine.CombineFn<Integer, Integer, Integer> {
> @Override
> public Integer createAccumulator() {
>   return 0;
> }
> @Override
> public Integer addInput(Integer accumulator, Integer input) {
>   return accumulator + input;
> }
> @Override
> public Integer mergeAccumulators(Iterable<Integer> accumulators) {
>   Integer result = 0;
>   for (Integer value : accumulators) {
> result += value;
>   }
>   return result;
> }
> @Override
> public Integer extractOutput(Integer accumulator) {
>   return accumulator;
> }
>   }
> {code}
> is translated as the above composite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269361&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269361
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:54
Start Date: 28/Jun/19 15:54
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298653010
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -587,20 +589,35 @@ public PDone expand(PCollection<byte[]> input) {
 
 private static class KinesisWriterFn extends DoFn<byte[], Void> {
 
-  private static final int MAX_NUM_RECORDS = 100 * 1000;
   private static final int MAX_NUM_FAILURES = 10;
 
   private final KinesisIO.Write spec;
-  private transient IKinesisProducer producer;
+  private static transient IKinesisProducer producer = null;
 
 Review comment:
   Ok
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269361)
Time Spent: 4.5h  (was: 4h 20m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply it by 
> the number of workers, this gives a huge amount of producers; I believe this 
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method, and move it to 
> a static field in the class that is created once the class is initialized?
> See the similar issue with JdbcIO: the connection pool was created per setUp 
> method, and we moved it to be a static member, so there is one pool per 
> JVM. Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment  -14/Jun/19 14:31-  edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using DataFlow runner, java SDK 2.11.
> 60 workers initially (with auto scaling and also with the flag 
> "enableStreamingEngine")
> Normally, I'm producing 4-5k per second, but when I have latency, this can 
> even be multiplied by 3-4 times.
> When I'm starting the DataFlow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers (a 3rd-party tool); I know that they call describe 
> stream every 30 seconds.
> My pipeline, running on GCP, reads data from PubSub at around 
> 20,000 records per second (in regular time; in latency time even 100,000 
> records per second), does many aggregations and counts based on some 
> dimensions (using Beam SQL) over a 1 minute sliding window, and 
> writes the aggregation results to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
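The one-producer-per-JVM pattern suggested in the comment above can be sketched with a lazily initialized static field. Here Producer is a hypothetical stand-in for the Kinesis Producer Library's IKinesisProducer, and SharedProducerSketch for the DoFn holding it; the actual fix lives in KinesisIO.java:

```java
public class SharedProducerSketch {

  // Hypothetical stand-in for IKinesisProducer from the Kinesis Producer Library.
  public interface Producer {}

  // One instance per JVM: a static field guarded by double-checked locking,
  // instead of creating a fresh producer in every DoFn's @Setup.
  private static volatile Producer producer;

  public static Producer getOrCreate() {
    if (producer == null) {
      synchronized (SharedProducerSketch.class) {
        if (producer == null) {
          // Real code would build and configure the KPL producer here.
          producer = new Producer() {};
        }
      }
    }
    return producer;
  }

  public static void main(String[] args) {
    // Every bundle running in this JVM sees the same producer instance, so the
    // shard map is refreshed by one client rather than one per bundle.
    System.out.println(getOrCreate() == getOrCreate()); // true
  }
}
```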

[jira] [Updated] (BEAM-7653) Combine.GroupedValues translation is never called

2019-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7653:
---
Labels: portability  (was: )

> Combine.GroupedValues translation is never called
> -
>
> Key: BEAM-7653
> URL: https://issues.apache.org/jira/browse/BEAM-7653
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: portability
>
> This issue is similar to BEAM-6740. When a runner overrides the translation 
> of Combine.Values, it is not being called.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7653) Combine.GroupedValues translation is never called

2019-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7653:
---
Description: This issue is similar to BEAM-6740. When a runner overrides 
the translation of Combine.Values using URNs the translator is not being 
called.  (was: This issue is similar to BEAM-6740. When a runner overrides the 
translation of Combine.Values is not being called.)

> Combine.GroupedValues translation is never called
> -
>
> Key: BEAM-7653
> URL: https://issues.apache.org/jira/browse/BEAM-7653
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Labels: portability
>
> This issue is similar to BEAM-6740. When a runner overrides the translation 
> of Combine.Values using URNs, the translator is not being called.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6740) Combine.Globally translation is never called

2019-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-6740:
---
Summary: Combine.Globally translation is never called  (was: 
Combine.globally translation is never called)

> Combine.Globally translation is never called
> 
>
> Key: BEAM-6740
> URL: https://issues.apache.org/jira/browse/BEAM-6740
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Ismaël Mejía
>Priority: Major
>  Labels: portability
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Beam translates Combine.Globally as a composite transform composed of:
>  * Map that assigns Void keys
>  * Combine.PerKey
> On Spark: as Combine.PerKey uses a Spark GBK inside it, the runner adds its 
> own translation of Combine.Globally to avoid a less performant GBK. This 
> translation should be called instead of entering the composite transform 
> translation. A pipeline like this: 
> {code:java}
> PCollection<Integer> input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 
> 9, 10));
> input.apply(
>  Combine.globally(new IntegerCombineFn()));
> {code}
> {code:java}
>   private static class IntegerCombineFn extends Combine.CombineFn<Integer, Integer, Integer> {
> @Override
> public Integer createAccumulator() {
>   return 0;
> }
> @Override
> public Integer addInput(Integer accumulator, Integer input) {
>   return accumulator + input;
> }
> @Override
> public Integer mergeAccumulators(Iterable<Integer> accumulators) {
>   Integer result = 0;
>   for (Integer value : accumulators) {
> result += value;
>   }
>   return result;
> }
> @Override
> public Integer extractOutput(Integer accumulator) {
>   return accumulator;
> }
>   }
> {code}
> is translated as the above composite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7653) Combine.GroupedValues translation is never called

2019-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7653:
---
Summary: Combine.GroupedValues translation is never called  (was: 
Combine.Values translation is never called)

> Combine.GroupedValues translation is never called
> -
>
> Key: BEAM-7653
> URL: https://issues.apache.org/jira/browse/BEAM-7653
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> This issue is similar to BEAM-6740. When a runner overrides the translation 
> of Combine.Values, it is not being called.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269356&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269356
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:35
Start Date: 28/Jun/19 15:35
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8955: [BEAM-7589] 
Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298645866
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -741,28 +750,11 @@ private void checkForFailures() throws IOException {
 }
 failures.clear();
 
-String message =
+String errorMessage =
 String.format(
 "Some errors occurred writing to Kinesis. First %d errors: %s",
 i, logEntry.toString());
-throw new IOException(message);
-  }
-
-  private class UserRecordResultFutureCallback implements 
FutureCallback<UserRecordResult> {
-
-@Override
-public void onFailure(Throwable cause) {
-  failures.offer(new KinesisWriteException(cause));
-}
-
-@Override
-public void onSuccess(UserRecordResult result) {
-  if (!result.isSuccessful()) {
-failures.offer(
-new KinesisWriteException(
-"Put record was not successful.", new 
UserRecordFailedException(result)));
-  }
-}
+throw new IOException(errorMessage);
 
 Review comment:
   For the refactor to return String: it is just to have a more composable (and 
testable) signature. For the rename: it is just because errorMessage is 
never used afterwards.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269356)
Time Spent: 4h 20m  (was: 4h 10m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply it by 
> the number of workers, this gives a huge amount of producers; I believe this 
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method, and move it to 
> a static field in the class that is created once the class is initialized?
> See the similar issue with JdbcIO: the connection pool was created per setUp 
> method, and we moved it to be a static member, so there is one pool per 
> JVM. Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment  -14/Jun/19 14:31-  edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using DataFlow runner, java SDK 2.11.
> 60 workers initially (with auto scaling and also with the flag 
> "enableStreamingEngine")
> Normally, I'm producing 4-5k per second, but when I have latency, this can be 
> 

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269355&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269355
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:28
Start Date: 28/Jun/19 15:28
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298642872
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -741,28 +750,11 @@ private void checkForFailures() throws IOException {
 }
 failures.clear();
 
-String message =
+String errorMessage =
 String.format(
 "Some errors occurred writing to Kinesis. First %d errors: %s",
 i, logEntry.toString());
-throw new IOException(message);
-  }
-
-  private class UserRecordResultFutureCallback implements 
FutureCallback<UserRecordResult> {
-
-@Override
-public void onFailure(Throwable cause) {
-  failures.offer(new KinesisWriteException(cause));
-}
-
-@Override
-public void onSuccess(UserRecordResult result) {
-  if (!result.isSuccessful()) {
-failures.offer(
-new KinesisWriteException(
-"Put record was not successful.", new 
UserRecordFailedException(result)));
-  }
-}
+throw new IOException(errorMessage);
 
 Review comment:
   What is a reason for that? Not clear for me
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269355)
Time Spent: 4h 10m  (was: 4h)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply it by 
> the number of workers, this gives a huge amount of producers; I believe this 
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method, and move it to 
> a static field in the class that is created once the class is initialized?
> See the similar issue with JdbcIO: the connection pool was created per setUp 
> method, and we moved it to be a static member, so there is one pool per 
> JVM. Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment  -14/Jun/19 14:31-  edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using DataFlow runner, java SDK 2.11.
> 60 workers initially (with auto scaling and also with the flag 
> "enableStreamingEngine")
> Normally, I'm producing 4-5k per second, but when I have latency, this can 
> even be multiplied by 3-4 times.
> When I'm starting the DataFlow job I have latency, so I produce more data, 
> and I fail 
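The fix proposed in this thread — one producer shared per worker JVM via a static field — can be sketched as below. This is a minimal illustration with a stand-in stub class, not Beam's or the KPL's actual KinesisProducer:

```java
public class ProducerHolder {
    // Stand-in stub for the real KPL producer class; illustration only.
    static class KinesisProducer {}

    private static volatile KinesisProducer producer;

    // Lazily create one producer shared by every DoFn instance (and every
    // bundle) running in this JVM, using double-checked locking.
    public static KinesisProducer getOrCreate() {
        if (producer == null) {
            synchronized (ProducerHolder.class) {
                if (producer == null) {
                    producer = new KinesisProducer();
                }
            }
        }
        return producer;
    }
}
```

With this shape, a DoFn's setup method only calls `ProducerHolder.getOrCreate()`, so the number of producers no longer scales with the number of bundles or workers' DoFn instances.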

[jira] [Created] (BEAM-7653) Combine.Values translation is never called

2019-06-28 Thread JIRA
Ismaël Mejía created BEAM-7653:
--

 Summary: Combine.Values translation is never called
 Key: BEAM-7653
 URL: https://issues.apache.org/jira/browse/BEAM-7653
 Project: Beam
  Issue Type: Bug
  Components: runner-core
Reporter: Ismaël Mejía
Assignee: Ismaël Mejía


This issue is similar to BEAM-6740. When a runner overrides the translation of 
Combine.Values, the override is never called.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7653) Combine.Values translation is never called

2019-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7653:
---
Status: Open  (was: Triage Needed)

> Combine.Values translation is never called
> --
>
> Key: BEAM-7653
> URL: https://issues.apache.org/jira/browse/BEAM-7653
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>
> This issue is similar to BEAM-6740. When a runner overrides the translation 
> of Combine.Values, the override is never called.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-6740) Combine.globally translation is never called

2019-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-6740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-6740:
---
Labels: portability  (was: )

> Combine.globally translation is never called
> 
>
> Key: BEAM-6740
> URL: https://issues.apache.org/jira/browse/BEAM-6740
> Project: Beam
>  Issue Type: Bug
>  Components: runner-core
>Reporter: Etienne Chauchot
>Assignee: Ismaël Mejía
>Priority: Major
>  Labels: portability
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Beam translates Combine.Globally as a composite transform composed of:
>  * Map that assigns Void keys
>  * Combine.PerKey
> On Spark: as Combine.PerKey uses a Spark GBK inside it, the runner adds its 
> own translation of Combine.Globally to avoid the less performant GBK. This 
> translation should be called in place of entering the composite transform 
> translation. A pipeline like this: 
> {code:java}
> PCollection<Integer> input = pipeline.apply(Create.of(1, 2, 3, 4, 5, 6, 7, 8, 
> 9, 10));
> input.apply(
>  Combine.globally(new IntegerCombineFn()));
> {code}
> {code:java}
>   private static class IntegerCombineFn extends Combine.CombineFn<Integer, 
> Integer, Integer> {
> @Override
> public Integer createAccumulator() {
>   return 0;
> }
> @Override
> public Integer addInput(Integer accumulator, Integer input) {
>   return accumulator + input;
> }
> @Override
> public Integer mergeAccumulators(Iterable<Integer> accumulators) {
>   Integer result = 0;
>   for (Integer value : accumulators) {
> result += value;
>   }
>   return result;
> }
> @Override
> public Integer extractOutput(Integer accumulator) {
>   return accumulator;
> }
>   }
> {code}
> is translated as the above composite.
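The CombineFn lifecycle that the composite (and any runner-specific override) must honor can be exercised outside Beam with plain Java. This stand-alone sketch mirrors the IntegerCombineFn above: per-"bundle" accumulation followed by a merge, as a runner would perform it; the method names are illustrative, not Beam's API:

```java
import java.util.Arrays;
import java.util.List;

public class CombineFnSketch {
    // addInput: fold one element into an accumulator.
    static int addInput(int acc, int input) { return acc + input; }

    // mergeAccumulators: combine partial results from separate bundles.
    static int mergeAccumulators(List<Integer> accs) {
        int result = 0;
        for (int a : accs) result += a;
        return result;
    }

    // Simulates Combine.globally over two bundles of input elements.
    public static int combineGlobally(List<Integer> bundle1, List<Integer> bundle2) {
        int acc1 = 0; // createAccumulator()
        for (int v : bundle1) acc1 = addInput(acc1, v);
        int acc2 = 0; // createAccumulator() for the second bundle
        for (int v : bundle2) acc2 = addInput(acc2, v);
        // extractOutput(mergeAccumulators(...))
        return mergeAccumulators(Arrays.asList(acc1, acc2));
    }
}
```

For the pipeline above (elements 1 through 10), any correct translation — composite or runner-specific — must yield the same global sum.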



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (BEAM-7637) Migrate S3FileSystem to AWS SDK for Java 2

2019-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-7637:
--

Assignee: Ismaël Mejía

> Migrate S3FileSystem to AWS SDK for Java 2
> --
>
> Key: BEAM-7637
> URL: https://issues.apache.org/jira/browse/BEAM-7637
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (BEAM-7652) Allow to parametrize the ClientConfiguration and ApacheHttpClient for AWS operations

2019-06-28 Thread JIRA
Ismaël Mejía created BEAM-7652:
--

 Summary: Allow to parametrize the ClientConfiguration and 
ApacheHttpClient for AWS operations
 Key: BEAM-7652
 URL: https://issues.apache.org/jira/browse/BEAM-7652
 Project: Beam
  Issue Type: Sub-task
  Components: io-java-aws
Reporter: Ismaël Mejía


Hand tuning of HTTP connections and client configuration has been divided into 
two different objects in AWS SDK for Java 2. We need to expose these as part of 
AWSOptions so users who need to refine connections or other client details 
may use them.
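For illustration, a sketch of the two objects in question for an S3 client, assuming AWS SDK for Java 2 on the classpath; the tuning values are arbitrary examples, not recommendations:

```java
import java.time.Duration;

import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
import software.amazon.awssdk.http.SdkHttpClient;
import software.amazon.awssdk.http.apache.ApacheHttpClient;
import software.amazon.awssdk.services.s3.S3Client;

public class TunedS3Client {
  public static S3Client build() {
    // HTTP-level tuning lives in the SdkHttpClient builder...
    SdkHttpClient httpClient =
        ApacheHttpClient.builder()
            .maxConnections(100)
            .connectionTimeout(Duration.ofSeconds(10))
            .build();

    // ...while request-level tuning lives in ClientOverrideConfiguration.
    ClientOverrideConfiguration overrides =
        ClientOverrideConfiguration.builder()
            .apiCallTimeout(Duration.ofMinutes(2))
            .build();

    return S3Client.builder()
        .httpClient(httpClient)
        .overrideConfiguration(overrides)
        .build();
  }
}
```

Exposing both builders (or their key knobs) through pipeline options would let users tune connections without forking the IO.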



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (BEAM-7652) Allow to parametrize the ClientConfiguration and ApacheHttpClient for AWS operations

2019-06-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/BEAM-7652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated BEAM-7652:
---
Status: Open  (was: Triage Needed)

> Allow to parametrize the ClientConfiguration and ApacheHttpClient for AWS 
> operations
> 
>
> Key: BEAM-7652
> URL: https://issues.apache.org/jira/browse/BEAM-7652
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-aws
>Reporter: Ismaël Mejía
>Priority: Minor
>
> Hand tuning of HTTP connections and client configuration has been divided 
> into two different objects in AWS SDK for Java 2. We need to expose these as 
> part of AWSOptions so users who need to refine connections or other client 
> details may use them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269350&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269350
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:19
Start Date: 28/Jun/19 15:19
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298639114
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/test/java/org/apache/beam/sdk/io/kinesis/KinesisProducerMock.java
 ##
 @@ -120,8 +125,6 @@ public void flush() {
 
   @Override
   public synchronized void flushSync() {
-if (getOutstandingRecordsCount() > 0) {
-  flush();
-}
+throw new RuntimeException("Not implemented");
 
 Review comment:
   Ok
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269350)
Time Spent: 4h  (was: 3h 50m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what causes the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply it 
> by the number of workers, this gives a huge amount of producers; I believe 
> this makes the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it to 
> some static field in the class that is created once when the class is 
> initialized?
> See the similar issue with JdbcIO: the connection pool was created per setup 
> method, and we moved it to be a static member, so there is one pool per JVM. 
> Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using DataFlow runner, java SDK 2.11.
> 60 workers initially (with auto scaling and also with the flag 
> "enableStreamingEngine")
> Normally, I'm producing 4-5k per second, but when I have latency, this can 
> even be multiplied by 3-4 times.
> When I'm starting the DataFlow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers, 3rd party tool, I know that they call describe stream 
> each 30 seconds.
> My job pipeline runs on GCP, reading data from PubSub. It reads around 
> 20,000 records per second (in regular time, and in latency time even 100,000 
> records per second). It does many aggregations and counting based on some 
> dimensions (using Beam SQL). This is done for a 1 minute sliding window, 
> writing the result of the aggregations to a Kinesis stream.
> My stream has 10 shards, and my partition key logic is generating a UUID for 
> each record: 
> UUID.randomUUID().toString()
> Hope this gave you some more context on my problem.
> Another suggestion: can you fix the issue as I suggested and provide 
> me a specific version for testing? 

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269348
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:16
Start Date: 28/Jun/19 15:16
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298638372
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -741,28 +750,11 @@ private void checkForFailures() throws IOException {
 }
 failures.clear();
 
 Review comment:
   Ok
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269348)
Time Spent: 3h 50m  (was: 3h 40m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what causes the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply it 
> by the number of workers, this gives a huge amount of producers; I believe 
> this makes the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it to 
> some static field in the class that is created once when the class is 
> initialized?
> See the similar issue with JdbcIO: the connection pool was created per setup 
> method, and we moved it to be a static member, so there is one pool per JVM. 
> Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using DataFlow runner, java SDK 2.11.
> 60 workers initially (with auto scaling and also with the flag 
> "enableStreamingEngine")
> Normally, I'm producing 4-5k per second, but when I have latency, this can 
> even be multiplied by 3-4 times.
> When I'm starting the DataFlow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers, 3rd party tool, I know that they call describe stream 
> each 30 seconds.
> My job pipeline runs on GCP, reading data from PubSub. It reads around 
> 20,000 records per second (in regular time, and in latency time even 100,000 
> records per second). It does many aggregations and counting based on some 
> dimensions (using Beam SQL). This is done for a 1 minute sliding window, 
> writing the result of the aggregations to a Kinesis stream.
> My stream has 10 shards, and my partition key logic is generating a UUID for 
> each record: 
> UUID.randomUUID().toString()
> Hope this gave you some more context on my problem.
> Another suggestion: can you fix the issue as I suggested and provide 
> me a specific version for testing, without merging it to master? (I would 
> do it myself, but I had troubles building the huge repository locally 

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269344&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269344
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:12
Start Date: 28/Jun/19 15:12
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298636706
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -587,20 +589,35 @@ public PDone expand(PCollection<byte[]> input) {
 
private static class KinesisWriterFn extends DoFn<byte[], Void> {
 
-  private static final int MAX_NUM_RECORDS = 100 * 1000;
   private static final int MAX_NUM_FAILURES = 10;
 
   private final KinesisIO.Write spec;
-  private transient IKinesisProducer producer;
+  private static transient IKinesisProducer producer = null;
   private transient KinesisPartitioner partitioner;
  private transient LinkedBlockingDeque<KinesisWriteException> failures;
+  private transient List<Future<UserRecordResult>> putFutures;
 
   public KinesisWriterFn(KinesisIO.Write spec) {
 
 Review comment:
   Ok
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269344)
Time Spent: 3h 40m  (was: 3.5h)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what causes the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply it 
> by the number of workers, this gives a huge amount of producers; I believe 
> this makes the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it to 
> some static field in the class that is created once when the class is 
> initialized?
> See the similar issue with JdbcIO: the connection pool was created per setup 
> method, and we moved it to be a static member, so there is one pool per JVM. 
> Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using DataFlow runner, java SDK 2.11.
> 60 workers initially (with auto scaling and also with the flag 
> "enableStreamingEngine")
> Normally, I'm producing 4-5k per second, but when I have latency, this can 
> even be multiplied by 3-4 times.
> When I'm starting the DataFlow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers, 3rd party tool, I know that they call describe stream 
> each 30 seconds.
> My job pipeline runs on GCP, reading data from PubSub. It reads around 
> 20,000 records per second (in regular time, and in latency time even 100,000 
> records per second). It does many aggregations and counting based on some 
> dimensions (using Beam SQL). This is done 

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269338&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269338
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:11
Start Date: 28/Jun/19 15:11
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298636277
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -657,67 +661,72 @@ public void processElement(ProcessContext c) throws 
Exception {
 
 ListenableFuture<UserRecordResult> f =
 producer.addUserRecord(spec.getStreamName(), partitionKey, 
explicitHashKey, data);
-Futures.addCallback(f, new UserRecordResultFutureCallback());
+putFutures.add(f);
   }
 
   @FinishBundle
   public void finishBundle() throws Exception {
-// Flush all outstanding records, blocking call
-flushAll();
-
-checkForFailures();
-  }
-
-  @Teardown
-  public void tearDown() throws Exception {
-if (producer != null) {
-  producer.destroy();
-  producer = null;
-}
+flushBundle();
   }
 
   /**
-   * Flush outstanding records until the total number will be less than 
required or the number
-   * of retries will be exhausted. The retry timeout starts from 1 second 
and it doubles on
-   * every iteration.
+   * Flush outstanding records until the total number of failed records 
will be less than 0 or
+   * the number of retries will be exhausted. The retry timeout starts 
from 1 second and it
+   * doubles on every iteration.
*/
-  private void flush(int numMax) throws InterruptedException, IOException {
+  private void flushBundle() throws InterruptedException, 
ExecutionException, IOException {
 int retries = spec.getRetries();
-int numOutstandingRecords = producer.getOutstandingRecordsCount();
+int numFailedRecords;
 int retryTimeout = 1000; // initial timeout, 1 sec
+String message = "";
 
-while (numOutstandingRecords > numMax && retries-- > 0) {
+do {
+  numFailedRecords = 0;
   producer.flush();
+
+  // Wait for puts to finish and check the results
+  for (Future<UserRecordResult> f : putFutures) {
+UserRecordResult result = f.get(); // this does block
+if (!result.isSuccessful()) {
+  numFailedRecords++;
+}
+  }
+
   // wait until outstanding records will be flushed
   Thread.sleep(retryTimeout);
-  numOutstandingRecords = producer.getOutstandingRecordsCount();
   retryTimeout *= 2; // exponential backoff
-}
+} while (numFailedRecords > 0 && retries-- > 0);
+
+if (numFailedRecords > 0) {
+  for (Future<UserRecordResult> f : putFutures) {
+UserRecordResult result = f.get();
+if (!result.isSuccessful()) {
+  failures.offer(
+  new KinesisWriteException(
+  "Put record was not successful.", new 
UserRecordFailedException(result)));
+}
+  }
 
-if (numOutstandingRecords > numMax) {
-  String message =
+  message =
   String.format(
-  "After [%d] retries, number of outstanding records [%d] is 
still greater than "
-  + "required [%d].",
-  spec.getRetries(), numOutstandingRecords, numMax);
+  "After [%d] retries, number of failed records [%d] is still 
greater than 0",
+  spec.getRetries(), numFailedRecords);
   LOG.error(message);
-  throw new IOException(message);
 }
-  }
 
-  private void flushAll() throws InterruptedException, IOException {
-flush(0);
+checkForFailures(message);
   }
 
   /** If any write has asynchronously failed, fail the bundle with a 
useful error. */
-  private void checkForFailures() throws IOException {
-// Note that this function is never called by multiple threads and is 
the only place that
-// we remove from failures, so this code is safe.
+  private void checkForFailures(String message)
+  throws IOException, InterruptedException, ExecutionException {
 if (failures.isEmpty()) {
   return;
 }
 
 StringBuilder logEntry = new StringBuilder();
+logEntry.append(message).append("\n");
 
 Review comment:
   Ok
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL 

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269336&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269336
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:09
Start Date: 28/Jun/19 15:09
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298635576
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -657,67 +661,72 @@ public void processElement(ProcessContext c) throws 
Exception {
 
 ListenableFuture<UserRecordResult> f =
 producer.addUserRecord(spec.getStreamName(), partitionKey, 
explicitHashKey, data);
-Futures.addCallback(f, new UserRecordResultFutureCallback());
+putFutures.add(f);
   }
 
   @FinishBundle
   public void finishBundle() throws Exception {
-// Flush all outstanding records, blocking call
-flushAll();
-
-checkForFailures();
-  }
-
-  @Teardown
-  public void tearDown() throws Exception {
-if (producer != null) {
-  producer.destroy();
-  producer = null;
-}
+flushBundle();
   }
 
   /**
-   * Flush outstanding records until the total number will be less than 
required or the number
-   * of retries will be exhausted. The retry timeout starts from 1 second 
and it doubles on
-   * every iteration.
+   * Flush outstanding records until the total number of failed records 
will be less than 0 or
+   * the number of retries will be exhausted. The retry timeout starts 
from 1 second and it
+   * doubles on every iteration.
*/
-  private void flush(int numMax) throws InterruptedException, IOException {
+  private void flushBundle() throws InterruptedException, 
ExecutionException, IOException {
 int retries = spec.getRetries();
-int numOutstandingRecords = producer.getOutstandingRecordsCount();
+int numFailedRecords;
 int retryTimeout = 1000; // initial timeout, 1 sec
+String message = "";
 
-while (numOutstandingRecords > numMax && retries-- > 0) {
+do {
+  numFailedRecords = 0;
   producer.flush();
+
+  // Wait for puts to finish and check the results
+  for (Future<UserRecordResult> f : putFutures) {
+UserRecordResult result = f.get(); // this does block
+if (!result.isSuccessful()) {
+  numFailedRecords++;
+}
+  }
+
   // wait until outstanding records will be flushed
   Thread.sleep(retryTimeout);
-  numOutstandingRecords = producer.getOutstandingRecordsCount();
   retryTimeout *= 2; // exponential backoff
-}
+} while (numFailedRecords > 0 && retries-- > 0);
+
+if (numFailedRecords > 0) {
+  for (Future<UserRecordResult> f : putFutures) {
+UserRecordResult result = f.get();
+if (!result.isSuccessful()) {
+  failures.offer(
+  new KinesisWriteException(
+  "Put record was not successful.", new 
UserRecordFailedException(result)));
+}
+  }
 
-if (numOutstandingRecords > numMax) {
-  String message =
+  message =
   String.format(
-  "After [%d] retries, number of outstanding records [%d] is 
still greater than "
-  + "required [%d].",
-  spec.getRetries(), numOutstandingRecords, numMax);
+  "After [%d] retries, number of failed records [%d] is still 
greater than 0",
+  spec.getRetries(), numFailedRecords);
   LOG.error(message);
-  throw new IOException(message);
 }
-  }
 
-  private void flushAll() throws InterruptedException, IOException {
-flush(0);
+checkForFailures(message);
   }
 
   /** If any write has asynchronously failed, fail the bundle with a 
useful error. */
-  private void checkForFailures() throws IOException {
-// Note that this function is never called by multiple threads and is 
the only place that
-// we remove from failures, so this code is safe.
+  private void checkForFailures(String message)
+  throws IOException, InterruptedException, ExecutionException {
 
 Review comment:
   Ok
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---
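The do/while flush-and-retry loop in the patch reviewed above amounts to exponential backoff: retry the flush while any records failed, doubling the timeout each round (1s, 2s, 4s, ...). A minimal stand-alone sketch of that schedule follows; `flushAttempt` and `backoffSchedule` are illustrative names, not Beam's API, and the sketch records the delays instead of sleeping so the schedule is observable:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntSupplier;

public class FlushRetry {
    // Runs flushAttempt (which returns the number of failed records) until
    // it reports zero failures or retries are exhausted, doubling the
    // timeout on every iteration, mirroring the patch's do/while loop.
    // Returns the backoff delays (in ms) that would have been slept.
    public static List<Integer> backoffSchedule(IntSupplier flushAttempt, int retries) {
        List<Integer> delays = new ArrayList<>();
        int timeout = 1000; // initial timeout, 1 sec
        int failed;
        do {
            failed = flushAttempt.getAsInt();
            delays.add(timeout);
            timeout *= 2; // exponential backoff
        } while (failed > 0 && retries-- > 0);
        return delays;
    }
}
```

For an attempt that fails twice and then succeeds, the schedule is 1000, 2000, 4000 ms, matching the doubling described in the patch's javadoc.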


[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269334=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269334
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:07
Start Date: 28/Jun/19 15:07
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8955: [BEAM-7589] 
Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298634768
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -657,67 +661,72 @@ public void processElement(ProcessContext c) throws 
Exception {
 
 ListenableFuture<UserRecordResult> f =
 producer.addUserRecord(spec.getStreamName(), partitionKey, 
explicitHashKey, data);
-Futures.addCallback(f, new UserRecordResultFutureCallback());
+putFutures.add(f);
   }
 
   @FinishBundle
   public void finishBundle() throws Exception {
-// Flush all outstanding records, blocking call
-flushAll();
-
-checkForFailures();
-  }
-
-  @Teardown
-  public void tearDown() throws Exception {
-if (producer != null) {
-  producer.destroy();
-  producer = null;
-}
+flushBundle();
   }
 
   /**
-   * Flush outstanding records until the total number will be less than 
required or the number
-   * of retries will be exhausted. The retry timeout starts from 1 second 
and it doubles on
-   * every iteration.
+   * Flush outstanding records until the total number of failed records 
will be less than 0 or
+   * the number of retries will be exhausted. The retry timeout starts 
from 1 second and it
+   * doubles on every iteration.
*/
-  private void flush(int numMax) throws InterruptedException, IOException {
+  private void flushBundle() throws InterruptedException, 
ExecutionException, IOException {
 int retries = spec.getRetries();
-int numOutstandingRecords = producer.getOutstandingRecordsCount();
+int numFailedRecords;
 int retryTimeout = 1000; // initial timeout, 1 sec
+String message = "";
 
-while (numOutstandingRecords > numMax && retries-- > 0) {
+do {
+  numFailedRecords = 0;
 
 Review comment:
   oh yes sorry my bad.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269334)
Time Spent: 3h 10m  (was: 3h)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what causes the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply it 
> by the number of workers, this gives a huge amount of producers; I believe 
> this makes the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it to 
> some static field in the class that is created once when the class is 
> initialized?
> See the similar issue with JdbcIO: the connection pool was created per setup 
> method, and we moved it to be a static member, so there is one pool per JVM. 
> Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> 

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269332&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269332
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:06
Start Date: 28/Jun/19 15:06
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298634184
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -639,12 +649,6 @@ public void setup() throws Exception {
*/
   @ProcessElement
   public void processElement(ProcessContext c) throws Exception {
 
 Review comment:
   Ok
 



Issue Time Tracking
---

Worklog Id: (was: 269332)
Time Spent: 3h  (was: 2h 50m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what makes the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply 
> it by the number of workers, this gives a huge number of producers; I believe 
> this makes the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it 
> to some static field in the class, created once the class is initialized?
> See the similar issue with JDBCIO: the connection pool was created per setup 
> method, and we moved it to be a static member, so there is one pool per JVM. 
> Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment  -14/Jun/19 14:31-  edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the flag 
> "enableStreamingEngine").
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can be multiplied by 3-4 times.
> When I'm starting the Dataflow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers, a 3rd-party tool; I know that they call describe 
> stream every 30 seconds.
> My job pipeline, running on GCP, reads data from PubSub at around 20,000 
> records per second (in regular time, and in latency time even 100,000 
> records per second). It does many aggregations and counts based on some 
> dimensions (using Beam SQL). This is done on a 1-minute sliding window, 
> writing the result of the aggregations to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gives you some more context on my problem.
> Another suggestion I have: can you try to fix the issue as I suggest and 
> provide me a specific version for testing, without merging it to master? (I 
> would do it myself, but I had

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269331&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269331
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:06
Start Date: 28/Jun/19 15:06
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298634035
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -587,20 +589,35 @@ public PDone expand(PCollection<byte[]> input) {
 
 private static class KinesisWriterFn extends DoFn<byte[], Void> {
 
-  private static final int MAX_NUM_RECORDS = 100 * 1000;
   private static final int MAX_NUM_FAILURES = 10;
 
   private final KinesisIO.Write spec;
-  private transient IKinesisProducer producer;
+  private static transient IKinesisProducer producer = null;
   private transient KinesisPartitioner partitioner;
   private transient LinkedBlockingDeque<KinesisWriteException> failures;
+  private transient List<ListenableFuture<UserRecordResult>> putFutures;
 
   public KinesisWriterFn(KinesisIO.Write spec) {
 this.spec = spec;
+initKinesisProducer();
   }
 
   @Setup
   public void setup() throws Exception {
 
 Review comment:
   Ok
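The change under review makes the producer field static so that every bundle in the JVM shares one producer. A minimal sketch of that lazy, thread-safe one-per-JVM pattern (illustrative only: `Producer`, `ProducerHolder`, and `createProducer` are stand-ins I introduce here, not the real KPL `IKinesisProducer` wiring):

```java
// Stand-in for the KPL producer interface; illustrative only.
interface Producer {
    void addUserRecord(String stream, String partitionKey, byte[] data);
}

final class ProducerHolder {
    // One shared instance per JVM, no matter how many DoFn
    // instances (bundles) the runner creates on this worker.
    private static volatile Producer instance;

    static Producer getOrCreate() {
        Producer p = instance;
        if (p == null) {
            synchronized (ProducerHolder.class) {
                p = instance;
                if (p == null) {
                    p = createProducer();
                    instance = p;
                }
            }
        }
        return p;
    }

    // Hypothetical factory; the real code would build a KinesisProducer
    // from the configuration carried by the Write spec.
    private static Producer createProducer() {
        return (stream, partitionKey, data) -> { /* no-op in the sketch */ };
    }

    private ProducerHolder() {}
}
```

With this shape, `@Setup` (or the constructor, as in the diff above) calls `ProducerHolder.getOrCreate()`, so only the first caller pays the initialization cost and there is a single shard-map refresh loop per worker.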
 



Issue Time Tracking
---

Worklog Id: (was: 269331)
Time Spent: 2h 50m  (was: 2h 40m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269330&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269330
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:05
Start Date: 28/Jun/19 15:05
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298633706
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -657,67 +661,72 @@ public void processElement(ProcessContext c) throws 
Exception {
 
 ListenableFuture<UserRecordResult> f =
 producer.addUserRecord(spec.getStreamName(), partitionKey, 
explicitHashKey, data);
-Futures.addCallback(f, new UserRecordResultFutureCallback());
+putFutures.add(f);
   }
 
   @FinishBundle
   public void finishBundle() throws Exception {
-// Flush all outstanding records, blocking call
-flushAll();
-
-checkForFailures();
-  }
-
-  @Teardown
-  public void tearDown() throws Exception {
-if (producer != null) {
-  producer.destroy();
-  producer = null;
-}
+flushBundle();
   }
 
   /**
-   * Flush outstanding records until the total number will be less than 
required or the number
-   * of retries will be exhausted. The retry timeout starts from 1 second 
and it doubles on
-   * every iteration.
+   * Flush outstanding records until the total number of failed records 
will be less than 0 or
+   * the number of retries will be exhausted. The retry timeout starts 
from 1 second and it
+   * doubles on every iteration.
*/
-  private void flush(int numMax) throws InterruptedException, IOException {
+  private void flushBundle() throws InterruptedException, 
ExecutionException, IOException {
 int retries = spec.getRetries();
 
 Review comment:
   If it's empty (which is unlikely, since it's called from the 
`@FinishBundle` method), then the other part of the code will be executed very 
quickly. And anyway, we need to call `checkForFailures` beforehand to make 
sure that there are no other failures.
 



Issue Time Tracking
---

Worklog Id: (was: 269330)
Time Spent: 2h 40m  (was: 2.5h)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269329&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269329
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:05
Start Date: 28/Jun/19 15:05
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298633706
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -657,67 +661,72 @@ public void processElement(ProcessContext c) throws 
Exception {
 
 ListenableFuture<UserRecordResult> f =
 producer.addUserRecord(spec.getStreamName(), partitionKey, 
explicitHashKey, data);
-Futures.addCallback(f, new UserRecordResultFutureCallback());
+putFutures.add(f);
   }
 
   @FinishBundle
   public void finishBundle() throws Exception {
-// Flush all outstanding records, blocking call
-flushAll();
-
-checkForFailures();
-  }
-
-  @Teardown
-  public void tearDown() throws Exception {
-if (producer != null) {
-  producer.destroy();
-  producer = null;
-}
+flushBundle();
   }
 
   /**
-   * Flush outstanding records until the total number will be less than 
required or the number
-   * of retries will be exhausted. The retry timeout starts from 1 second 
and it doubles on
-   * every iteration.
+   * Flush outstanding records until the total number of failed records 
will be less than 0 or
+   * the number of retries will be exhausted. The retry timeout starts 
from 1 second and it
+   * doubles on every iteration.
*/
-  private void flush(int numMax) throws InterruptedException, IOException {
+  private void flushBundle() throws InterruptedException, 
ExecutionException, IOException {
 int retries = spec.getRetries();
 
 Review comment:
   If it's empty (which is unlikely, since it's called from the 
`@FinishBundle` method), then the other part of the code will be executed very 
quickly. And anyway, we need to call `checkForFailures` beforehand to make 
sure that there are no other failures.
 



Issue Time Tracking
---

Worklog Id: (was: 269329)
Time Spent: 2.5h  (was: 2h 20m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269327&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269327
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 15:00
Start Date: 28/Jun/19 15:00
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #8955: 
[BEAM-7589] Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298631868
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -657,67 +661,72 @@ public void processElement(ProcessContext c) throws 
Exception {
 
 ListenableFuture<UserRecordResult> f =
 producer.addUserRecord(spec.getStreamName(), partitionKey, 
explicitHashKey, data);
-Futures.addCallback(f, new UserRecordResultFutureCallback());
+putFutures.add(f);
   }
 
   @FinishBundle
   public void finishBundle() throws Exception {
-// Flush all outstanding records, blocking call
-flushAll();
-
-checkForFailures();
-  }
-
-  @Teardown
-  public void tearDown() throws Exception {
-if (producer != null) {
-  producer.destroy();
-  producer = null;
-}
+flushBundle();
   }
 
   /**
-   * Flush outstanding records until the total number will be less than 
required or the number
-   * of retries will be exhausted. The retry timeout starts from 1 second 
and it doubles on
-   * every iteration.
+   * Flush outstanding records until the total number of failed records 
will be less than 0 or
+   * the number of retries will be exhausted. The retry timeout starts 
from 1 second and it
+   * doubles on every iteration.
*/
-  private void flush(int numMax) throws InterruptedException, IOException {
+  private void flushBundle() throws InterruptedException, 
ExecutionException, IOException {
 int retries = spec.getRetries();
-int numOutstandingRecords = producer.getOutstandingRecordsCount();
+int numFailedRecords;
 int retryTimeout = 1000; // initial timeout, 1 sec
+String message = "";
 
-while (numOutstandingRecords > numMax && retries-- > 0) {
+do {
+  numFailedRecords = 0;
 
 Review comment:
   We need to reset the value of `numFailedRecords` on every loop iteration, 
otherwise the final value won't be correct.
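The loop the reviewer describes can be sketched in isolation: reset the failure count on every pass, retry while any records failed and retries remain, and double the wait between passes. `Sender.sendAll` is a hypothetical stand-in for re-adding failed records to the producer; the real DoFn works with `UserRecordResult` futures instead.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

class BundleFlusher {

    interface Sender {
        // Returns the subset of records that failed and must be retried.
        List<byte[]> sendAll(List<byte[]> records) throws InterruptedException;
    }

    static void flushWithRetries(
            Sender sender, List<byte[]> records, int retries, long initialTimeoutMs)
            throws InterruptedException {
        long retryTimeoutMs = initialTimeoutMs; // 1000 ms in the PR
        List<byte[]> pending = records;
        int numFailedRecords;
        do {
            // Reset on every iteration, otherwise failures from earlier
            // passes would be double-counted (the reviewer's point).
            pending = sender.sendAll(pending);
            numFailedRecords = pending.size();
            if (numFailedRecords > 0 && retries > 0) {
                TimeUnit.MILLISECONDS.sleep(retryTimeoutMs);
                retryTimeoutMs *= 2; // exponential backoff
            }
        } while (numFailedRecords > 0 && retries-- > 0);
        if (numFailedRecords > 0) {
            throw new IllegalStateException(
                    numFailedRecords + " records failed after exhausting retries");
        }
    }
}
```

A sender that fails once and then succeeds completes after two passes and one backoff sleep; once retries are exhausted, the remaining failures surface as an exception, in the spirit of the `checkForFailures` call discussed above.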
 



Issue Time Tracking
---

Worklog Id: (was: 269327)
Time Spent: 2h 20m  (was: 2h 10m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269325&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269325
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 14:57
Start Date: 28/Jun/19 14:57
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #8955: [BEAM-7589] 
Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#issuecomment-50894
 
 
   I also ran the `KinesisIOIT` test on my local environment against a real 
Kinesis instance with `targetParallelism=8`. No issues have been seen so far.
 



Issue Time Tracking
---

Worklog Id: (was: 269325)
Time Spent: 2h 10m  (was: 2h)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what makes the shard map update now.
> You create a producer per bundle (in the SetUp function), and if I multiply 
> it by the number of workers, this gives a huge number of producers; I believe 
> this makes the "update shard map" call.
> If I copy your code and create *one* producer for every worker, then this 
> error disappears.
> Can you just remove the producer creation from the setUp method and move it 
> to some static field in the class, created once the class is initialized?
> See the similar issue with JDBCIO: the connection pool was created per setup 
> method, and we moved it to be a static member, so there is one pool per JVM. 
> Ask [~iemejia] for more details.
> 
> Alexey Romanenko added a comment  -14/Jun/19 14:31-  edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the flag 
> "enableStreamingEngine").
> Normally I'm producing 4-5k records per second, but when I have latency this 
> can be multiplied by 3-4 times.
> When I'm starting the Dataflow job I have latency, so I produce more data, 
> and I fail immediately.
> Also, I have consumers, a 3rd-party tool; I know that they call describe 
> stream every 30 seconds.
> My job pipeline, running on GCP, reads data from PubSub at around 20,000 
> records per second (in regular time, and in latency time even 100,000 
> records per second). It does many aggregations and counts based on some 
> dimensions (using Beam SQL). This is done on a 1-minute sliding window, 
> writing the result of the aggregations to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per 
> record: 
> UUID.randomUUID().toString()
> Hope this gives you some more context on my problem.
> Another suggestion I have: can you try to fix the issue as I suggest and 
> provide me a specific version for testing, without merging it to master? (I 
> would do it myself, but I had trouble building the huge repository of Apache 
> Beam locally.)
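The partition-key scheme described above, a fresh random UUID per record, spreads writes uniformly across all shards, at the cost of giving up per-key ordering. A minimal sketch (the `partitionKeyFor` helper is mine, not Beam's):

```java
import java.util.UUID;

class RandomPartitionKeys {
    // A new UUID per record distributes records uniformly over the
    // shard hash-key space; no two records share a key, so Kinesis'
    // per-key ordering guarantee is effectively given up.
    static String partitionKeyFor(byte[] record) {
        return UUID.randomUUID().toString();
    }
}
```

With keys this uniform, a hot shard is unlikely; as the thread suggests, the LimitExceededException here comes from shard-map/DescribeStream control-plane calls rather than from a skewed partition key.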



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7640) Create amazon-web-services2 module and AwsOptions

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7640?focusedWorklogId=269324&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269324
 ]

ASF GitHub Bot logged work on BEAM-7640:


Author: ASF GitHub Bot
Created on: 28/Jun/19 14:55
Start Date: 28/Jun/19 14:55
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8963: [BEAM-7640] 
Create amazon-web-services2 module and AwsOptions
URL: https://github.com/apache/beam/pull/8963
 
 
   This creates the canvas for the new Amazon Web Services IO module, based on 
the AWS SDK for Java 2.
   So far the PR includes the AwsOptions definition as well as the module that 
maps objects from/to JSON.
   
   R: @aromanenko-dev 
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
[jira] [Work logged] (BEAM-7414) RabbitMqMessage can't be serialized due to LongString in headers

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7414?focusedWorklogId=269307&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269307
 ]

ASF GitHub Bot logged work on BEAM-7414:


Author: ASF GitHub Bot
Created on: 28/Jun/19 14:26
Start Date: 28/Jun/19 14:26
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8677: [BEAM-7414] fix for 
message being not serializable due to LongString in headers
URL: https://github.com/apache/beam/pull/8677#issuecomment-506753561
 
 
   There have been issues with the Jenkins executor randomly causing job failures.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 269307)
Time Spent: 2h  (was: 1h 50m)

> RabbitMqMessage can't be serialized due to LongString in headers
> 
>
> Key: BEAM-7414
> URL: https://issues.apache.org/jira/browse/BEAM-7414
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.12.0
> Environment: dataflow runner
>Reporter: Nicolas Delsaux
>Assignee: Nicolas Delsaux
>Priority: Major
>  Labels: rabbitmq, serializable
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When trying to read messages from RabbitMq, I get a systematic
> java.lang.IllegalArgumentException: Unable to encode element 
> 'ValueWithRecordId\{id=[], 
> value=org.apache.beam.sdk.io.rabbitmq.RabbitMqMessage@234080e1}' with coder 
> 'ValueWithRecordId$ValueWithRecordIdCoder(org.apache.beam.sdk.coders.SerializableCoder@206641ef)'.
>  org.apache.beam.sdk.coders.Coder.getEncodedElementByteSize(Coder.java:300) 
> org.apache.beam.sdk.coders.Coder.registerByteSizeObserver(Coder.java:291) 
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:564)
>  
> org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:480)
>  
> org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:400)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:125)
>  
> org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:64)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:43)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:201)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:159)
>  
> org.apache.beam.runners.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
>  
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1283)
>  
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:147)
>  
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1020)
>  
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  java.lang.Thread.run(Thread.java:745) Caused by: 
> java.io.NotSerializableException: 
> com.rabbitmq.client.impl.LongStringHelper$ByteArrayLongString 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1184) 
> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) 
> java.util.HashMap.internalWriteEntries(HashMap.java:1785) 
> java.util.HashMap.writeObject(HashMap.java:1362) 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498) 
> java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028) 
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496) 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) 
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) 
> 
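The root cause in the trace above is that RabbitMQ hands back header values as `com.rabbitmq.client.impl.LongStringHelper$ByteArrayLongString`, which does not implement `Serializable`, so `SerializableCoder` fails on the whole message. The usual fix, and roughly what PR #8677 does on the Java side, is to coerce such header values into plain strings before the message is encoded. A hedged Python analogue of that sanitization step (the class and function names here are illustrative, not Beam or RabbitMQ API):

```python
import pickle

class ByteArrayLongString:
    """Stand-in for RabbitMQ's non-serializable LongString (illustrative)."""
    def __init__(self, data: bytes):
        self.data = data

    def __reduce__(self):
        # Mimic java.io.NotSerializableException: refuse to be serialized.
        raise TypeError("ByteArrayLongString is not serializable")

def sanitize_headers(headers: dict) -> dict:
    """Coerce unserializable header values to plain strings before encoding."""
    return {
        k: v.data.decode("utf-8") if isinstance(v, ByteArrayLongString) else v
        for k, v in headers.items()
    }

raw = {"trace-id": ByteArrayLongString(b"abc123"), "retries": 2}
clean = sanitize_headers(raw)
assert clean == {"trace-id": "abc123", "retries": 2}
# the sanitized headers round-trip through serialization
assert pickle.loads(pickle.dumps(clean)) == clean
```

A `RabbitMqMessage`-style wrapper would run such a sanitizer in its constructor, so everything it holds is serializable by construction.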

[jira] [Work logged] (BEAM-7414) RabbitMqMessage can't be serialized due to LongString in headers

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7414?focusedWorklogId=269308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269308
 ]

ASF GitHub Bot logged work on BEAM-7414:


Author: ASF GitHub Bot
Created on: 28/Jun/19 14:26
Start Date: 28/Jun/19 14:26
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #8677: [BEAM-7414] fix for 
message being not serializable due to LongString in headers
URL: https://github.com/apache/beam/pull/8677#issuecomment-506753610
 
 
   Run Java_Examples_Dataflow PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 269308)
Time Spent: 2h 10m  (was: 2h)

> RabbitMqMessage can't be serialized due to LongString in headers
> 
>
> Key: BEAM-7414
> URL: https://issues.apache.org/jira/browse/BEAM-7414
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.12.0
> Environment: dataflow runner
>Reporter: Nicolas Delsaux
>Assignee: Nicolas Delsaux
>Priority: Major
>  Labels: rabbitmq, serializable
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When trying to read messages from RabbitMq, I get a systematic
> java.lang.IllegalArgumentException: Unable to encode element, caused by
> java.io.NotSerializableException:
> com.rabbitmq.client.impl.LongStringHelper$ByteArrayLongString
> (full stack trace identical to the first BEAM-7414 entry above).

[jira] [Work logged] (BEAM-6611) A Python Sink for BigQuery with File Loads in Streaming

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=269303&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269303
 ]

ASF GitHub Bot logged work on BEAM-6611:


Author: ASF GitHub Bot
Created on: 28/Jun/19 14:17
Start Date: 28/Jun/19 14:17
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery 
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-506750497
 
 
   Run Python PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 269303)
Time Spent: 1h 20m  (was: 1h 10m)

> A Python Sink for BigQuery with File Loads in Streaming
> ---
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Tanay Tummalapalli
>Priority: Major
>  Labels: gsoc, gsoc2019, mentor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The Java SDK supports a bunch of methods for writing data into BigQuery, 
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and 
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR 
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick-and-dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads 
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts 
> (although they also are slower for the records to show up in the table).
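The trade-off described above — streaming inserts make rows visible almost immediately but are billed per row, while file loads are cheap but batched — can be sketched as a tiny selector. `pick_bq_write_method` and its return values are illustrative stand-ins, not Beam's actual API (in the Python SDK this choice eventually surfaces as a write-method option on the BigQuery sink):

```python
def pick_bq_write_method(streaming: bool, max_staleness_secs: float) -> str:
    """Pick a BigQuery write strategy (hypothetical helper, not Beam's API).

    Streaming inserts make rows visible almost immediately but are billed
    per row; file loads are far cheaper but rows only appear once a load
    job completes, so they suit pipelines that tolerate some staleness.
    """
    if streaming and max_staleness_secs < 60:
        return "STREAMING_INSERTS"  # low latency, higher cost
    return "FILE_LOADS"             # batched, cheaper, slower to appear

# A latency-sensitive streaming pipeline pays for streaming inserts;
# batch pipelines, or streaming ones that tolerate staleness, load files.
assert pick_bq_write_method(streaming=True, max_staleness_secs=5) == "STREAMING_INSERTS"
assert pick_bq_write_method(streaming=False, max_staleness_secs=0) == "FILE_LOADS"
```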



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Work logged] (BEAM-7437) Integration Test for BQ streaming inserts for streaming pipelines

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7437?focusedWorklogId=269302&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269302
 ]

ASF GitHub Bot logged work on BEAM-7437:


Author: ASF GitHub Bot
Created on: 28/Jun/19 14:15
Start Date: 28/Jun/19 14:15
Worklog Time Spent: 10m 
  Work Description: ttanay commented on issue #8934: [BEAM-7437] Add 
streaming flag to BQ streaming inserts IT test
URL: https://github.com/apache/beam/pull/8934#issuecomment-506749987
 
 
   Hi @udim 
   Made the changes. PTAL.
 



Issue Time Tracking
---

Worklog Id: (was: 269302)
Time Spent: 4h 10m  (was: 4h)

> Integration Test for BQ streaming inserts for streaming pipelines
> -
>
> Key: BEAM-7437
> URL: https://issues.apache.org/jira/browse/BEAM-7437
> Project: Beam
>  Issue Type: Test
>  Components: io-python-gcp
>Affects Versions: 2.12.0
>Reporter: Tanay Tummalapalli
>Assignee: Tanay Tummalapalli
>Priority: Minor
>  Labels: test
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Integration Test for BigQuery Sink using Streaming Inserts for streaming 
> pipelines.
> Integration tests currently exist for batch pipelines, it can also be added 
> for streaming pipelines using TestStream. This will be a precursor to the 
> failing integration test to be added for [BEAM-6611| 
> https://issues.apache.org/jira/browse/BEAM-6611].





[jira] [Work logged] (BEAM-7414) RabbitMqMessage can't be serialized due to LongString in headers

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7414?focusedWorklogId=269257&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269257
 ]

ASF GitHub Bot logged work on BEAM-7414:


Author: ASF GitHub Bot
Created on: 28/Jun/19 13:06
Start Date: 28/Jun/19 13:06
Worklog Time Spent: 10m 
  Work Description: Riduidel commented on issue #8677: [BEAM-7414] fix for 
message being not serializable due to LongString in headers
URL: https://github.com/apache/beam/pull/8677#issuecomment-506726851
 
 
   Why did the first check fail? It seems unrelated to my code, no?
 



Issue Time Tracking
---

Worklog Id: (was: 269257)
Time Spent: 1h 50m  (was: 1h 40m)

> RabbitMqMessage can't be serialized due to LongString in headers
> 
>
> Key: BEAM-7414
> URL: https://issues.apache.org/jira/browse/BEAM-7414
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-rabbitmq
>Affects Versions: 2.12.0
> Environment: dataflow runner
>Reporter: Nicolas Delsaux
>Assignee: Nicolas Delsaux
>Priority: Major
>  Labels: rabbitmq, serializable
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When trying to read messages from RabbitMq, I get a systematic
> java.lang.IllegalArgumentException: Unable to encode element, caused by
> java.io.NotSerializableException:
> com.rabbitmq.client.impl.LongStringHelper$ByteArrayLongString
> (full stack trace identical to the first BEAM-7414 entry above).

[jira] [Work logged] (BEAM-6675) The JdbcIO sink should accept schemas

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6675?focusedWorklogId=269258&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269258
 ]

ASF GitHub Bot logged work on BEAM-6675:


Author: ASF GitHub Bot
Created on: 28/Jun/19 13:06
Start Date: 28/Jun/19 13:06
Worklog Time Spent: 10m 
  Work Description: JawadHyder commented on issue #8962: [BEAM-6675] 
Generate JDBC statement and preparedStatementSetter automatically when schema 
is available
URL: https://github.com/apache/beam/pull/8962#issuecomment-506727007
 
 
   R: @reuvenlax 
   R: @jbonofre 
 



Issue Time Tracking
---

Worklog Id: (was: 269258)
Time Spent: 0.5h  (was: 20m)

> The JdbcIO sink should accept schemas
> -
>
> Key: BEAM-6675
> URL: https://issues.apache.org/jira/browse/BEAM-6675
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-jdbc
>Reporter: Reuven Lax
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> If the input has a schema, there should be a default mapping to a 
> PreparedStatement for writing based on that schema.





[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269254&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269254
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 13:02
Start Date: 28/Jun/19 13:02
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8955: [BEAM-7589] 
Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298566035
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -657,67 +661,72 @@ public void processElement(ProcessContext c) throws 
Exception {
 
 ListenableFuture<UserRecordResult> f =
 producer.addUserRecord(spec.getStreamName(), partitionKey, 
explicitHashKey, data);
-Futures.addCallback(f, new UserRecordResultFutureCallback());
+putFutures.add(f);
   }
 
   @FinishBundle
   public void finishBundle() throws Exception {
-// Flush all outstanding records, blocking call
-flushAll();
-
-checkForFailures();
-  }
-
-  @Teardown
-  public void tearDown() throws Exception {
-if (producer != null) {
-  producer.destroy();
-  producer = null;
-}
+flushBundle();
   }
 
   /**
-   * Flush outstanding records until the total number will be less than 
required or the number
-   * of retries will be exhausted. The retry timeout starts from 1 second 
and it doubles on
-   * every iteration.
+   * Flush outstanding records until the total number of failed records 
will be less than 0 or
+   * the number of retries will be exhausted. The retry timeout starts 
from 1 second and it
+   * doubles on every iteration.
*/
-  private void flush(int numMax) throws InterruptedException, IOException {
+  private void flushBundle() throws InterruptedException, 
ExecutionException, IOException {
 int retries = spec.getRetries();
-int numOutstandingRecords = producer.getOutstandingRecordsCount();
+int numFailedRecords;
 int retryTimeout = 1000; // initial timeout, 1 sec
+String message = "";
 
-while (numOutstandingRecords > numMax && retries-- > 0) {
+do {
+  numFailedRecords = 0;
 
 Review comment:
   move the initialization of numFailedRecords to the place where it is defined
 



Issue Time Tracking
---

Worklog Id: (was: 269254)
Time Spent: 2h  (was: 1h 50m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the setUp function), and multiplied by
> the number of workers this gives a huge number of producers; I believe this
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, the error
> disappears.
> Can you remove the producer creation from the setUp method and move it to a
> static field in the class, created once when the class is initialized?
> A similar issue existed in JdbcIO: the connection pool was created per setUp
> method, and we moved it to a static member so there is one pool per JVM.
> Ask [~iemejia] for more detail.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> 
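The `flushBundle` loop in the diff above retries with a timeout that starts at 1 second and doubles on each iteration — classic exponential backoff. A minimal, framework-free sketch of that shape (the function names are illustrative, not the KinesisIO API; `sleep` is injectable so the loop is testable without real waiting):

```python
import time

def flush_with_backoff(flush, count_failed, retries=10, initial_timeout=1.0,
                       sleep=time.sleep):
    """Flush until no records fail or retries are exhausted, doubling the wait.

    `flush` pushes outstanding records; `count_failed` reports how many did
    not go through. The timeout starts at 1 second and doubles on every
    iteration, mirroring the retry loop in the patch.
    """
    timeout = initial_timeout
    while True:
        flush()
        failed = count_failed()
        if failed == 0 or retries <= 0:
            break
        sleep(timeout)   # give in-flight records time to settle
        timeout *= 2     # exponential backoff: 1s, 2s, 4s, ...
        retries -= 1
    if failed > 0:
        raise IOError(f"{failed} records still failing after retries")
    return failed
```

Capping the doubled timeout (and adding jitter) is a common refinement when many workers retry against the same backend at once.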

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269253&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269253
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 13:02
Start Date: 28/Jun/19 13:02
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8955: [BEAM-7589] 
Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298565241
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -587,20 +589,35 @@ public PDone expand(PCollection<byte[]> input) {
 
private static class KinesisWriterFn extends DoFn<byte[], Void> {
 
-  private static final int MAX_NUM_RECORDS = 100 * 1000;
   private static final int MAX_NUM_FAILURES = 10;
 
   private final KinesisIO.Write spec;
-  private transient IKinesisProducer producer;
+  private static transient IKinesisProducer producer = null;
 
 Review comment:
   No need to do ` = null;`
 



Issue Time Tracking
---

Worklog Id: (was: 269253)
Time Spent: 1h 50m  (was: 1h 40m)

> Kinesis IO.write throws LimitExceededException
> --
>
> Key: BEAM-7589
> URL: https://issues.apache.org/jira/browse/BEAM-7589
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.11.0
>Reporter: Anton Kedin
>Assignee: Alexey Romanenko
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Follow up from https://issues.apache.org/jira/browse/BEAM-7357:
>  
> 
> Brachi Packter added a comment - 13/Jun/19 09:05
>  [~aromanenko] I think I found what triggers the shard map update now.
> You create a producer per bundle (in the setUp function), and multiplied by
> the number of workers this gives a huge number of producers; I believe this
> triggers the "update shard map" call.
> If I copy your code and create *one* producer for every worker, the error
> disappears.
> Can you remove the producer creation from the setUp method and move it to a
> static field in the class, created once when the class is initialized?
> A similar issue existed in JdbcIO: the connection pool was created per setUp
> method, and we moved it to a static member so there is one pool per JVM.
> Ask [~iemejia] for more detail.
> 
> Alexey Romanenko added a comment - 14/Jun/19 14:31 - edited
>   
>  [~brachi_packter] What kind of error do you have in this case? Could you 
> post an error stacktrace / exception message? 
>  Also, it would be helpful (if it's possible) if you could provide more 
> details about your environment and pipeline, like what is your pipeline 
> topology, which runner do you use, number of workers in your cluster, etc. 
>  For now, I can't reproduce it on my side, so all additional info will be 
> helpful.
> 
> Brachi Packter added a comment - 16/Jun/19 06:44
>  I get the same error:
> {code:java}
> [0x1728][0x7f13ed4c4700] [error] [shard_map.cc:150] Shard map update 
> for stream "**" failed. Code: LimitExceededException Message: Rate exceeded 
> for stream poc-test under account **.; retrying in 5062 ms
> {code}
> I'm not seeing full stack trace, but can see in log also this:
> {code:java}
> [2019-06-13 08:29:09.427018] [0x07e1][0x7f8d508d3700] [warning] [AWS 
> Log: WARN](AWSErrorMarshaller)Encountered AWSError Throttling Rate exceeded
> {code}
> More details:
>  I'm using the Dataflow runner, Java SDK 2.11.
> 60 workers initially (with autoscaling and also with the flag
> "enableStreamingEngine").
> Normally I'm producing 4-5k records per second, but under latency this can
> be multiplied by 3-4 times.
> When I start the Dataflow job I have latency, so I produce more data and
> fail immediately.
> Also, I have consumers (a 3rd-party tool); I know that they call
> DescribeStream every 30 seconds.
> My pipeline, running on GCP, reads data from Pub/Sub at around 20,000
> records per second (in regular periods; under latency even 100,000 records
> per second), does a lot of aggregation and counting based on some dimensions
> (using Beam SQL) over a 1-minute sliding window, and writes the result of
> the aggregations to a Kinesis stream.
> My stream has 10 shards, and my partition key logic generates a UUID per
> record: 

[jira] [Work logged] (BEAM-7589) Kinesis IO.write throws LimitExceededException

2019-06-28 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7589?focusedWorklogId=269250&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-269250
 ]

ASF GitHub Bot logged work on BEAM-7589:


Author: ASF GitHub Bot
Created on: 28/Jun/19 13:02
Start Date: 28/Jun/19 13:02
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #8955: [BEAM-7589] 
Use only one KinesisProducer instance per JVM 
URL: https://github.com/apache/beam/pull/8955#discussion_r298569841
 
 

 ##
 File path: 
sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisIO.java
 ##
 @@ -657,67 +661,72 @@ public void processElement(ProcessContext c) throws 
Exception {
 
 ListenableFuture<UserRecordResult> f =
 producer.addUserRecord(spec.getStreamName(), partitionKey, 
explicitHashKey, data);
-Futures.addCallback(f, new UserRecordResultFutureCallback());
+putFutures.add(f);
   }
 
   @FinishBundle
   public void finishBundle() throws Exception {
-// Flush all outstanding records, blocking call
-flushAll();
-
-checkForFailures();
-  }
-
-  @Teardown
-  public void tearDown() throws Exception {
-if (producer != null) {
-  producer.destroy();
-  producer = null;
-}
+flushBundle();
   }
 
   /**
-   * Flush outstanding records until the total number will be less than 
required or the number
-   * of retries will be exhausted. The retry timeout starts from 1 second 
and it doubles on
-   * every iteration.
+   * Flush outstanding records until the total number of failed records 
will be less than 0 or
+   * the number of retries will be exhausted. The retry timeout starts 
from 1 second and it
+   * doubles on every iteration.
*/
-  private void flush(int numMax) throws InterruptedException, IOException {
+  private void flushBundle() throws InterruptedException, 
ExecutionException, IOException {
 int retries = spec.getRetries();
-int numOutstandingRecords = producer.getOutstandingRecordsCount();
+int numFailedRecords;
 int retryTimeout = 1000; // initial timeout, 1 sec
+String message = "";
 
-while (numOutstandingRecords > numMax && retries-- > 0) {
+do {
+  numFailedRecords = 0;
   producer.flush();
+
+  // Wait for puts to finish and check the results
+  for (Future<UserRecordResult> f : putFutures) {
+UserRecordResult result = f.get(); // this does block
+if (!result.isSuccessful()) {
+  numFailedRecords++;
+}
+  }
+
   // wait until outstanding records will be flushed
   Thread.sleep(retryTimeout);
-  numOutstandingRecords = producer.getOutstandingRecordsCount();
   retryTimeout *= 2; // exponential backoff
-}
+} while (numFailedRecords > 0 && retries-- > 0);
+
+if (numFailedRecords > 0) {
+  for (Future<UserRecordResult> f : putFutures) {
+UserRecordResult result = f.get();
+if (!result.isSuccessful()) {
+  failures.offer(
+  new KinesisWriteException(
+  "Put record was not successful.", new UserRecordFailedException(result)));
+}
+  }
 
-if (numOutstandingRecords > numMax) {
-  String message =
+  message =
   String.format(
-  "After [%d] retries, number of outstanding records [%d] is still greater than "
-  + "required [%d].",
-  spec.getRetries(), numOutstandingRecords, numMax);
+  "After [%d] retries, number of failed records [%d] is still greater than 0",
+  spec.getRetries(), numFailedRecords);
   LOG.error(message);
-  throw new IOException(message);
 }
-  }
 
-  private void flushAll() throws InterruptedException, IOException {
-flush(0);
+checkForFailures(message);
   }
 
   /** If any write has asynchronously failed, fail the bundle with a useful error. */
-  private void checkForFailures() throws IOException {
-// Note that this function is never called by multiple threads and is the only place that
-// we remove from failures, so this code is safe.
+  private void checkForFailures(String message)
+  throws IOException, InterruptedException, ExecutionException {
 if (failures.isEmpty()) {
   return;
 }
 
 StringBuilder logEntry = new StringBuilder();
+logEntry.append(message).append("\n");
 
 Review comment:
   Maybe use `System.lineSeparator()` instead of "\n" here and in the other uses.
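For readers unfamiliar with the reviewer's suggestion: `System.lineSeparator()` returns the platform line terminator (`"\n"` on Unix-like systems, `"\r\n"` on Windows), so built-up log entries match the host platform's convention. A minimal sketch of the idea; the class and the `buildLogEntry` helper are invented for illustration and are not part of the Beam code:

```java
public class LineSeparatorDemo {
    // Builds a multi-line log entry using the platform line terminator
    // instead of a hard-coded "\n".
    static String buildLogEntry(String message, int failureCount) {
        StringBuilder logEntry = new StringBuilder();
        logEntry.append(message).append(System.lineSeparator());
        logEntry.append("Failed puts: ").append(failureCount);
        return logEntry.toString();
    }

    public static void main(String[] args) {
        // On Unix this prints two lines separated by "\n";
        // on Windows the separator would be "\r\n".
        System.out.println(buildLogEntry("Some puts failed", 3));
    }
}
```

The trade-off is cosmetic: hard-coded "\n" is fine for logs consumed by Unix tooling, while `System.lineSeparator()` is the more portable choice the reviewer is pointing at.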
 
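For context, the control flow the diff above introduces — flush, block on every pending put future, count failures, and retry with a doubling sleep — can be sketched in isolation roughly as follows. This is a simplified stand-in under stated assumptions: plain booleans replace the Kinesis `UserRecordResult` futures, and nothing here is Beam or KPL API:

```java
import java.util.List;

public class FlushBundleSketch {
    // Re-checks a batch of put outcomes until none fail or the retry budget
    // is exhausted; the sleep doubles each round (exponential backoff).
    // Returns the number of records still failing at the end.
    static int flushWithBackoff(List<Boolean> results, int retries, long initialTimeoutMs)
            throws InterruptedException {
        int numFailedRecords;
        long retryTimeout = initialTimeoutMs;
        do {
            numFailedRecords = 0;
            for (boolean successful : results) {
                if (!successful) {
                    numFailedRecords++;
                }
            }
            if (numFailedRecords > 0) {
                Thread.sleep(retryTimeout); // wait before the next attempt
                retryTimeout *= 2;          // exponential backoff
            }
        } while (numFailedRecords > 0 && retries-- > 0);
        return numFailedRecords;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(flushWithBackoff(List.of(true, true), 3, 1));  // prints 0
        System.out.println(flushWithBackoff(List.of(true, false), 2, 1)); // prints 1
    }
}
```

Because the sketch's outcomes are static, a failed record never recovers and the loop simply spends its retry budget; in the real transform each round calls `producer.flush()` again, giving outstanding records a chance to succeed.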

