[jira] [Work logged] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8415?focusedWorklogId=330844&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330844
 ]

ASF GitHub Bot logged work on BEAM-8415:


Author: ASF GitHub Bot
Created on: 19/Oct/19 02:21
Start Date: 19/Oct/19 02:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9812: [BEAM-8415] Improving 
error message when applying PTransform with a n…
URL: https://github.com/apache/beam/pull/9812#issuecomment-544055770
 
 
   Thanks David!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330844)
Time Spent: 1.5h  (was: 1h 20m)

> Improve error message when adding a PTransform with a name that already 
> exists in the pipeline
> --
>
> Key: BEAM-8415
> URL: https://issues.apache.org/jira/browse/BEAM-8415
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: David Yan
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently, when trying to apply a PTransform with a name that already exists 
> in the pipeline, it returns a confusing error:
> Transform "XXX" does not have a stable unique label. This will prevent 
> updating of pipelines. To apply a transform with a specified label write 
> pvalue | "label" >> transform
> We'd like to improve this error message.
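A toy sketch of the improvement being requested (this is a hypothetical stand-in, not Beam's actual Pipeline implementation): a label registry that raises an explicit "name already exists" error instead of the confusing "stable unique label" message quoted above.

```python
# Toy sketch, not Beam's real Pipeline: track applied transform labels and
# raise a duplicate-specific error with the suggested fix in the message.
class Pipeline:
    def __init__(self):
        self._labels = set()

    def apply(self, label):
        if label in self._labels:
            raise RuntimeError(
                'A transform with label "%s" already exists in the pipeline. '
                'To apply a transform with a new label, write '
                'pvalue | "label" >> transform' % label)
        self._labels.add(label)


p = Pipeline()
p.apply('Count')
try:
    p.apply('Count')  # duplicate transform name
except RuntimeError as e:
    message = str(e)
```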



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8415?focusedWorklogId=330845&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330845
 ]

ASF GitHub Bot logged work on BEAM-8415:


Author: ASF GitHub Bot
Created on: 19/Oct/19 02:21
Start Date: 19/Oct/19 02:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9812: [BEAM-8415] 
Improving error message when applying PTransform with a n…
URL: https://github.com/apache/beam/pull/9812
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330845)
Time Spent: 1h 40m  (was: 1.5h)

> Improve error message when adding a PTransform with a name that already 
> exists in the pipeline
> --
>
> Key: BEAM-8415
> URL: https://issues.apache.org/jira/browse/BEAM-8415
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: David Yan
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, when trying to apply a PTransform with a name that already exists 
> in the pipeline, it returns a confusing error:
> Transform "XXX" does not have a stable unique label. This will prevent 
> updating of pipelines. To apply a transform with a specified label write 
> pvalue | "label" >> transform
> We'd like to improve this error message.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=330838&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330838
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 19/Oct/19 01:20
Start Date: 19/Oct/19 01:20
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r336716237
 
 

 ##
 File path: sdks/python/apache_beam/coders/standard_coders_test.py
 ##
 @@ -133,11 +171,17 @@ def parse_coder(self, spec):
  for c in spec.get('components', ())]
 context.coders.put_proto(coder_id, beam_runner_api_pb2.Coder(
 spec=beam_runner_api_pb2.FunctionSpec(
-urn=spec['urn'], payload=spec.get('payload')),
+urn=spec['urn'], payload=spec.get('payload', '').encode()),
 
 Review comment:
   Done
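The one-line change in the diff above guards against specs with no payload; a minimal standalone sketch of that fallback (the spec dict here is illustrative):

```python
# A YAML-loaded coder spec may omit 'payload'; default to '' and encode it
# so the proto field always receives bytes, mirroring the diff above.
spec = {'urn': 'beam:coder:varint:v1'}  # hypothetical spec without a payload
payload = spec.get('payload', '').encode()
# payload is b''
```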
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330838)
Time Spent: 13h 50m  (was: 13h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8438) Update Python/Streaming IO Documentation

2019-10-18 Thread Brian Hulette (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955026#comment-16955026
 ] 

Brian Hulette commented on BEAM-8438:
-

[~altay] do we still support a different set of IOs for Python streaming vs. 
batch? Pablo pointed out on slack that at least apache_beam.io.fileio should 
work for streaming now.

I'm happy to update this documentation; I'm just not sure what the correct set 
of IOs is for Python/Streaming.

> Update Python/Streaming IO Documentation
> 
>
> Key: BEAM-8438
> URL: https://issues.apache.org/jira/browse/BEAM-8438
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Brian Hulette
>Priority: Major
>
> Built-in IO documentation states that Python/Streaming only supports pubsub 
> and BQ, which is out of date.
> https://beam.apache.org/documentation/io/built-in/
> This came up on 
> [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p157141041000]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8438) Update Python/Streaming IO Documentation

2019-10-18 Thread Brian Hulette (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated BEAM-8438:

Description: 
Built-in IO documentation states that Python/Streaming only supports pubsub and 
BQ, which is out of date.

https://beam.apache.org/documentation/io/built-in/

This came up on 
[slack|https://the-asf.slack.com/archives/CBDNLQZM1/p157141041000]

  was:
Built-in IO documentation states that Python/Streaming only supports pubsub and 
BQ, which is out of date.

https://beam.apache.org/documentation/io/built-in/

This came up on [slack|
https://the-asf.slack.com/archives/CBDNLQZM1/p157141041000]


> Update Python/Streaming IO Documentation
> 
>
> Key: BEAM-8438
> URL: https://issues.apache.org/jira/browse/BEAM-8438
> Project: Beam
>  Issue Type: Task
>  Components: website
>Reporter: Brian Hulette
>Priority: Major
>
> Built-in IO documentation states that Python/Streaming only supports pubsub 
> and BQ, which is out of date.
> https://beam.apache.org/documentation/io/built-in/
> This came up on 
> [slack|https://the-asf.slack.com/archives/CBDNLQZM1/p157141041000]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8438) Update Python/Streaming IO Documentation

2019-10-18 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-8438:
---

 Summary: Update Python/Streaming IO Documentation
 Key: BEAM-8438
 URL: https://issues.apache.org/jira/browse/BEAM-8438
 Project: Beam
  Issue Type: Task
  Components: website
Reporter: Brian Hulette


Built-in IO documentation states that Python/Streaming only supports pubsub and 
BQ, which is out of date.

https://beam.apache.org/documentation/io/built-in/

This came up on [slack|
https://the-asf.slack.com/archives/CBDNLQZM1/p157141041000]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8434) Allow trigger transcript tests to be run as ValidatesRunner tests.

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8434?focusedWorklogId=330835&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330835
 ]

ASF GitHub Bot logged work on BEAM-8434:


Author: ASF GitHub Bot
Created on: 19/Oct/19 00:32
Start Date: 19/Oct/19 00:32
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9832: [BEAM-8434] 
Translate trigger transcripts into validates runner tests.
URL: https://github.com/apache/beam/pull/9832#issuecomment-544023904
 
 
   R: @liumomo315
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330835)
Time Spent: 20m  (was: 10m)

> Allow trigger transcript tests to be run as ValidatesRunner tests. 
> ---
>
> Key: BEAM-8434
> URL: https://issues.apache.org/jira/browse/BEAM-8434
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8435) Allow access to PaneInfo from Python DoFns

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8435?focusedWorklogId=330834&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330834
 ]

ASF GitHub Bot logged work on BEAM-8435:


Author: ASF GitHub Bot
Created on: 19/Oct/19 00:27
Start Date: 19/Oct/19 00:27
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9836: [BEAM-8435] 
Implement PaneInfo computation for Python.
URL: https://github.com/apache/beam/pull/9836
 
 
   Also enable retrieval as a DoFn parameter.
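As a rough illustration of the feature's shape (the names and fields here are illustrative, not the SDK's actual PaneInfo API): a DoFn-style function receiving pane metadata as a parameter so it can distinguish a window's firings.

```python
# Hypothetical sketch of pane metadata reaching a DoFn as a parameter; the
# real Beam PaneInfo differs, this only illustrates the idea of the feature.
from collections import namedtuple

PaneInfo = namedtuple('PaneInfo', ['is_first', 'is_last', 'timing', 'index'])

def process(element, pane):
    # Tag each element with whether it came from the window's first firing.
    tag = 'first-pane' if pane.is_first else 'later-pane'
    return (element, tag, pane.index)

result = process('word', PaneInfo(is_first=True, is_last=False,
                                  timing='EARLY', index=0))
```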
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330832&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330832
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 19/Oct/19 00:16
Start Date: 19/Oct/19 00:16
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-544019379
 
 
   isort and pylint are crazy and contradictory about where you should put 
`import apache_beam as beam`. And the suggested order changes over time.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330832)
Time Spent: 6h 40m  (was: 6.5h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)
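A minimal text-only sketch of what such auto-charting could do (visualize and its behavior here are illustrative, following the names in the issue description, not the actual Interactive Beam API):

```python
# Crude stand-in for visualize(pcoll): count the materialized elements and
# render a text bar chart. Real Interactive Beam would produce rich charts.
from collections import Counter

def visualize(materialized):
    counts = Counter(materialized)
    rows = ['%s | %s' % (value, '#' * n)
            for value, n in sorted(counts.items())]
    return '\n'.join(rows)

chart = visualize(['a', 'b', 'a', 'c', 'a'])
# chart:
# a | ###
# b | #
# c | #
```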



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=330831&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330831
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 19/Oct/19 00:13
Start Date: 19/Oct/19 00:13
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r336712062
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
 ##
 @@ -278,41 +290,90 @@ private static Object convertValue(Object value, 
CommonCoder coderSpec, Coder co
   return WindowedValue.of(windowValue, timestamp, windows, paneInfo);
 } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
   return Double.parseDouble((String) value);
+} else if (s.equals(getUrn(StandardCoders.Enum.ROW))) {
+  Schema schema;
+  try {
+schema = 
SchemaTranslation.fromProto(SchemaApi.Schema.parseFrom(coderSpec.getPayload()));
+  } catch (InvalidProtocolBufferException e) {
+throw new RuntimeException("Failed to parse schema payload for row 
coder", e);
+  }
+
+  return parseField(value, Schema.FieldType.row(schema));
 } else {
   throw new IllegalStateException("Unknown coder URN: " + 
coderSpec.getUrn());
 }
   }
 
+  private static Object parseField(Object value, Schema.FieldType fieldType) {
+switch (fieldType.getTypeName()) {
+  case BYTE:
+return ((Number) value).byteValue();
+  case INT16:
+return ((Number) value).shortValue();
+  case INT32:
+return ((Number) value).intValue();
+  case INT64:
+return ((Number) value).longValue();
+  case FLOAT:
+return Float.parseFloat((String) value);
+  case DOUBLE:
+return Double.parseDouble((String) value);
 
 Review comment:
   Ok, I created [BEAM-8437](https://issues.apache.org/jira/browse/BEAM-8437); 
let's take this conversation over there.
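A hedged Python analogue of the Java parseField helper in the diff above, only to show the coercion idea (the type names and behavior are simplified, not the test's full switch):

```python
# Simplified analogue of CommonCoderTest.parseField: coerce values loaded
# from the YAML test file into the field's expected runtime type. Note the
# Java test parses floating-point values from strings, mirrored here.
def parse_field(value, type_name):
    if type_name in ('byte', 'int16', 'int32', 'int64'):
        return int(value)
    if type_name in ('float', 'double'):
        return float(value)
    if type_name == 'string':
        return str(value)
    raise ValueError('unknown field type: %r' % type_name)

parsed = parse_field('2.5', 'double')
```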
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330831)
Time Spent: 13h 40m  (was: 13.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 13h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8437) Consider using native doubles in standard_coders_test.yaml

2019-10-18 Thread Brian Hulette (Jira)
Brian Hulette created BEAM-8437:
---

 Summary: Consider using native doubles in standard_coders_test.yaml
 Key: BEAM-8437
 URL: https://issues.apache.org/jira/browse/BEAM-8437
 Project: Beam
  Issue Type: Task
  Components: testing
Reporter: Brian Hulette


Context: https://github.com/apache/beam/pull/9188#discussion_r332233044



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330829&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330829
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 19/Oct/19 00:10
Start Date: 19/Oct/19 00:10
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336711762
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -219,11 +227,15 @@ def run(self):
 install_requires=REQUIRED_PACKAGES,
 python_requires=python_requires,
 test_suite='nose.collector',
-tests_require=REQUIRED_TEST_PACKAGES,
+tests_require=[
+REQUIRED_TEST_PACKAGES,
+INTERACTIVE_BEAM,
+],
 extras_require={
 'docs': ['Sphinx>=1.5.2,<2.0'],
 'test': REQUIRED_TEST_PACKAGES,
 'gcp': GCP_REQUIREMENTS,
+'ib': INTERACTIVE_BEAM,
 
 Review comment:
   Agreed!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330829)
Time Spent: 6.5h  (was: 6h 20m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330828&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330828
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 19/Oct/19 00:10
Start Date: 19/Oct/19 00:10
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336711748
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -219,11 +227,15 @@ def run(self):
 install_requires=REQUIRED_PACKAGES,
 python_requires=python_requires,
 test_suite='nose.collector',
-tests_require=REQUIRED_TEST_PACKAGES,
+tests_require=[
+REQUIRED_TEST_PACKAGES,
+INTERACTIVE_BEAM,
 
 Review comment:
   Because there are unit tests around the interactive beam packages.
   When those packages are under REQUIRED_PACKAGES, they are always installed 
and available during tests.
   Once we move them into an extras_require group, they are neither installed 
nor available during tests anymore.
   For example, *-gcp tests will install [gcp,test] to pick up the 
extras_require group GCP_REQUIREMENTS.
   Since we don't have/need *-interactive test suites, we can make the group 
always be installed as part of the required test packages.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330828)
Time Spent: 6h 20m  (was: 6h 10m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330826
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 23:56
Start Date: 18/Oct/19 23:56
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336710275
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -219,11 +227,15 @@ def run(self):
 install_requires=REQUIRED_PACKAGES,
 python_requires=python_requires,
 test_suite='nose.collector',
-tests_require=REQUIRED_TEST_PACKAGES,
+tests_require=[
+REQUIRED_TEST_PACKAGES,
+INTERACTIVE_BEAM,
 
 Review comment:
   Why tests would require interactive beam packages?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330826)
Time Spent: 6h 10m  (was: 6h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330824
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 23:55
Start Date: 18/Oct/19 23:55
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336710191
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -219,11 +227,15 @@ def run(self):
 install_requires=REQUIRED_PACKAGES,
 python_requires=python_requires,
 test_suite='nose.collector',
-tests_require=REQUIRED_TEST_PACKAGES,
+tests_require=[
+REQUIRED_TEST_PACKAGES,
+INTERACTIVE_BEAM,
+],
 extras_require={
 'docs': ['Sphinx>=1.5.2,<2.0'],
 'test': REQUIRED_TEST_PACKAGES,
 'gcp': GCP_REQUIREMENTS,
+'ib': INTERACTIVE_BEAM,
 
 Review comment:
   how about `interactive` instead of `ib`. This leads to a more readable `pip 
install apache_beam[interactive]` instead of `pip install apache_beam[ib]`
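The suggestion maps to a setup.py extras entry like the sketch below (the requirement names are placeholders, not Beam's actual interactive dependency list):

```python
# Hypothetical extras_require wiring: an 'interactive' group is what makes
# `pip install apache_beam[interactive]` pull in the notebook dependencies.
INTERACTIVE_BEAM = [
    'facets-overview>=1.0.0,<2',  # placeholder requirement names
    'ipython>=5.8.0,<6',
]

extras_require = {
    'interactive': INTERACTIVE_BEAM,
}
```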
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330824)
Time Spent: 6h  (was: 5h 50m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8146) SchemaCoder/RowCoder have no equals() function

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8146?focusedWorklogId=330819&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330819
 ]

ASF GitHub Bot logged work on BEAM-8146:


Author: ASF GitHub Bot
Created on: 18/Oct/19 23:36
Start Date: 18/Oct/19 23:36
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9493: 
[BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and 
RowCoder
URL: https://github.com/apache/beam/pull/9493#issuecomment-544007362
 
 
   That's a good point; I don't think so. Initially I added the type descriptor 
as something we can verify as a proxy for `toRow`/`fromRow`, but now that I've 
made the functions created by our `SchemaProvider` implementations comparable, 
that check is redundant. Do you think I should back out the type descriptor 
changes to make this simpler? It's mostly its own commit, so it wouldn't be too 
hard.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330819)
Time Spent: 2h 50m  (was: 2h 40m)

> SchemaCoder/RowCoder have no equals() function
> --
>
> Key: BEAM-8146
> URL: https://issues.apache.org/jira/browse/BEAM-8146
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-core
>Affects Versions: 2.15.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> SchemaCoder has no equals function, so it can't be compared in tests, like 
> CloudComponentsTests$DefaultCoders, which is being re-enabled in 
> https://github.com/apache/beam/pull/9446
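A Python sketch of the idea behind the fix (Beam's actual SchemaCoder is Java; this only illustrates value-based equality keyed on the schema):

```python
# Value-based equality for a coder: two coders over the same schema compare
# equal, so tests like CloudComponentsTests$DefaultCoders can assert on
# expected coder instances instead of falling back to identity comparison.
class SchemaCoder:
    def __init__(self, schema):
        self.schema = schema

    def __eq__(self, other):
        return isinstance(other, SchemaCoder) and self.schema == other.schema

    def __hash__(self):
        # equal objects must hash equal, so derive the hash from the schema
        return hash(self.schema)

a = SchemaCoder(('id', 'name'))
b = SchemaCoder(('id', 'name'))
# a == b and hash(a) == hash(b)
```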



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8396) Default to LOOPBACK mode for local flink (spark, ...) runner.

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8396?focusedWorklogId=330815&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330815
 ]

ASF GitHub Bot logged work on BEAM-8396:


Author: ASF GitHub Bot
Created on: 18/Oct/19 23:10
Start Date: 18/Oct/19 23:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9833: [BEAM-8396] Default 
to LOOPBACK mode for local flink runner.
URL: https://github.com/apache/beam/pull/9833#issuecomment-543997995
 
 
   Thanks for the quick review!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330815)
Time Spent: 20m  (was: 10m)

> Default to LOOPBACK mode for local flink (spark, ...) runner.
> -
>
> Key: BEAM-8396
> URL: https://issues.apache.org/jira/browse/BEAM-8396
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As well as being lower overhead, this will avoid surprises about workers 
> operating within the docker filesystem, etc. 
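
A minimal sketch of the defaulting logic being proposed (the function and constant names here are illustrative, not Beam's actual code): choose LOOPBACK when the master address indicates an in-process cluster, and keep the containerized default for remote clusters:

```python
def default_environment_type(flink_master):
    # Illustrative only: '[local]'/'[auto]' masters run workers in the
    # submitting process (LOOPBACK), which is lower overhead and avoids
    # surprises about workers seeing the docker container's filesystem.
    if flink_master in ('[local]', '[auto]'):
        return 'LOOPBACK'
    # A real remote cluster keeps the containerized worker default.
    return 'DOCKER'
```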





[jira] [Updated] (BEAM-8436) Interactive runner incompatible with experiments=beam_fn_api

2019-10-18 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw updated BEAM-8436:
--
Description: 
When this is enabled one gets


{code}
ERROR: test_wordcount 
(apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest)
--
Traceback (most recent call last):
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner_test.py",
 line 85, in test_wordcount
result = p.run()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 406, in run
self._options).run(False)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 419, in run
return self.runner.run_pipeline(self, self._options)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py",
 line 136, in run_pipeline
self._desired_cache_labels)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 73, in __init__
self._analyze_pipeline()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 93, in _analyze_pipeline
desired_pcollections = self._desired_pcollections(self._pipeline_info)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 313, in _desired_pcollections
cache_label = pipeline_info.cache_label(pcoll_id)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 397, in cache_label
return self._derivation(pcoll_id).cache_label()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
...
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
  File 
"/Users/robertwb/Work/beam/venv3/bin/../lib/python3.6/_collections_abc.py", 
line 678, in items
return ItemsView(self)
RecursionError: maximum recursion depth exceeded while calling a Python object
{code}


  was:
When this is enabled one gets

{{ERROR: test_wordcount 
(apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest)
--
Traceback (most recent call last):
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner_test.py",
 line 85, in test_wordcount
result = p.run()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 406, in run
self._options).run(False)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 419, in run
return self.runner.run_pipeline(self, self._options)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py",
 line 136, in run_pipeline
self._desired_cache_labels)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 73, in __init__
self._analyze_pipeline()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 93, in _analyze_pipeline
desired_pcollections = self._desired_pcollections(self._pipeline_info)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 313, in _desired_pcollections
cache_label = pipeline_info.cache_label(pcoll_id)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 397, in cache_label
return self._derivation(pcoll_id).cache_label()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
...
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
  File 
"/Users/robertwb/Work/beam/venv3/bin/../lib/python3.6/_collections_abc.py", 
line 678, in items
return ItemsView(self)
RecursionError: maximum recursion depth exceeded while calling a Python object
}}


> Interactive runner incompatible with experiments=beam_fn_api
> 
>
> 

[jira] [Updated] (BEAM-8436) Interactive runner incompatible with experiments=beam_fn_api

2019-10-18 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw updated BEAM-8436:
--
Description: 
When this is enabled one gets

{{ERROR: test_wordcount 
(apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest)
--
Traceback (most recent call last):
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner_test.py",
 line 85, in test_wordcount
result = p.run()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 406, in run
self._options).run(False)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 419, in run
return self.runner.run_pipeline(self, self._options)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py",
 line 136, in run_pipeline
self._desired_cache_labels)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 73, in __init__
self._analyze_pipeline()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 93, in _analyze_pipeline
desired_pcollections = self._desired_pcollections(self._pipeline_info)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 313, in _desired_pcollections
cache_label = pipeline_info.cache_label(pcoll_id)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 397, in cache_label
return self._derivation(pcoll_id).cache_label()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
...
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
  File 
"/Users/robertwb/Work/beam/venv3/bin/../lib/python3.6/_collections_abc.py", 
line 678, in items
return ItemsView(self)
RecursionError: maximum recursion depth exceeded while calling a Python object
}}

  was:
When this is enabled one gets

{{
ERROR: test_wordcount 
(apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest)
--
Traceback (most recent call last):
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner_test.py",
 line 85, in test_wordcount
result = p.run()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 406, in run
self._options).run(False)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 419, in run
return self.runner.run_pipeline(self, self._options)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py",
 line 136, in run_pipeline
self._desired_cache_labels)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 73, in __init__
self._analyze_pipeline()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 93, in _analyze_pipeline
desired_pcollections = self._desired_pcollections(self._pipeline_info)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 313, in _desired_pcollections
cache_label = pipeline_info.cache_label(pcoll_id)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 397, in cache_label
return self._derivation(pcoll_id).cache_label()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
...
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
  File 
"/Users/robertwb/Work/beam/venv3/bin/../lib/python3.6/_collections_abc.py", 
line 678, in items
return ItemsView(self)
RecursionError: maximum recursion depth exceeded while calling a Python object
}}


> Interactive runner incompatible with experiments=beam_fn_api
> 
>
> Key: 

[jira] [Created] (BEAM-8436) Interactive runner incompatible with experiments=beam_fn_api

2019-10-18 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8436:
-

 Summary: Interactive runner incompatible with 
experiments=beam_fn_api
 Key: BEAM-8436
 URL: https://issues.apache.org/jira/browse/BEAM-8436
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Robert Bradshaw


When this is enabled one gets

{{
ERROR: test_wordcount 
(apache_beam.runners.interactive.interactive_runner_test.InteractiveRunnerTest)
--
Traceback (most recent call last):
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner_test.py",
 line 85, in test_wordcount
result = p.run()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 406, in run
self._options).run(False)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/pipeline.py", 
line 419, in run
return self.runner.run_pipeline(self, self._options)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/interactive_runner.py",
 line 136, in run_pipeline
self._desired_cache_labels)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 73, in __init__
self._analyze_pipeline()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 93, in _analyze_pipeline
desired_pcollections = self._desired_pcollections(self._pipeline_info)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 313, in _desired_pcollections
cache_label = pipeline_info.cache_label(pcoll_id)
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 397, in cache_label
return self._derivation(pcoll_id).cache_label()
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
...
  File 
"/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/runners/interactive/pipeline_analyzer.py",
 line 405, in _derivation
for input_tag, input_id in transform_proto.inputs.items()
  File 
"/Users/robertwb/Work/beam/venv3/bin/../lib/python3.6/_collections_abc.py", 
line 678, in items
return ItemsView(self)
RecursionError: maximum recursion depth exceeded while calling a Python object
}}
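
The traceback bounces between the same `_derivation` frames until Python's recursion limit trips, the signature of walking a transform graph that ends up referencing itself. A hedged sketch of the defensive pattern (names such as `inputs_of` are hypothetical, not the analyzer's real data structure): track visited ids so a cyclic graph raises a clear error instead of a RecursionError:

```python
def derivation(inputs_of, pcoll_id, _seen=None):
    # inputs_of: hypothetical mapping of pcoll id -> list of input pcoll ids.
    # Tracking visited ids turns infinite recursion into an explicit error.
    seen = set() if _seen is None else _seen
    if pcoll_id in seen:
        raise ValueError('cycle detected at %r' % pcoll_id)
    seen.add(pcoll_id)
    for input_id in inputs_of.get(pcoll_id, ()):
        derivation(inputs_of, input_id, seen)
```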





[jira] [Work logged] (BEAM-8433) DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8433?focusedWorklogId=330814&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330814
 ]

ASF GitHub Bot logged work on BEAM-8433:


Author: ASF GitHub Bot
Created on: 18/Oct/19 22:53
Start Date: 18/Oct/19 22:53
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9831: 
[BEAM-8433] DataCatalogBigQueryIT runs for both Calcite and ZetaSQL d…
URL: https://github.com/apache/beam/pull/9831
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 330814)
Time Spent: 1h 20m  (was: 1h 10m)

> DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects
> 
>
> Key: BEAM-8433
> URL: https://issues.apache.org/jira/browse/BEAM-8433
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8415) Improve error message when adding a PTransform with a name that already exists in the pipeline

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8415?focusedWorklogId=330813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330813
 ]

ASF GitHub Bot logged work on BEAM-8415:


Author: ASF GitHub Bot
Created on: 18/Oct/19 22:47
Start Date: 18/Oct/19 22:47
Worklog Time Spent: 10m 
  Work Description: davidyan74 commented on issue #9812: [BEAM-8415] 
Improving error message when applying PTransform with a n…
URL: https://github.com/apache/beam/pull/9812#issuecomment-543990209
 
 
   Fixed. Thanks Robert!
 



Issue Time Tracking
---

Worklog Id: (was: 330813)
Time Spent: 1h 20m  (was: 1h 10m)

> Improve error message when adding a PTransform with a name that already 
> exists in the pipeline
> --
>
> Key: BEAM-8415
> URL: https://issues.apache.org/jira/browse/BEAM-8415
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: David Yan
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, when trying to apply a PTransform with a name that already exists 
> in the pipeline, it returns a confusing error:
> Transform "XXX" does not have a stable unique label. This will prevent 
> updating of pipelines. To apply a transform with a specified label write 
> pvalue | "label" >> transform
> We'd like to improve this error message.
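
A sketch of the clearer check (function and variable names here are illustrative, not the SDK's actual internals): detect the duplicate label up front and say so directly, instead of surfacing the confusing "stable unique label" message:

```python
def check_label(applied_labels, label):
    # Illustrative: fail fast with an explicit duplicate-name message.
    if label in applied_labels:
        raise RuntimeError(
            'A transform with label %r already exists in the pipeline. '
            'To apply a transform with a specified label, write '
            'pvalue | "label" >> transform' % label)
    applied_labels.add(label)
```

Applying the same label twice then produces an error that names the actual problem, a duplicate transform name, rather than hinting at pipeline-update stability.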





[jira] [Work logged] (BEAM-8396) Default to LOOPBACK mode for local flink (spark, ...) runner.

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8396?focusedWorklogId=330808&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330808
 ]

ASF GitHub Bot logged work on BEAM-8396:


Author: ASF GitHub Bot
Created on: 18/Oct/19 22:12
Start Date: 18/Oct/19 22:12
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9833: [BEAM-8396] 
Default to LOOPBACK mode for local flink runner.
URL: https://github.com/apache/beam/pull/9833
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build

[jira] [Assigned] (BEAM-8396) Default to LOOPBACK mode for local flink (spark, ...) runner.

2019-10-18 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw reassigned BEAM-8396:
-

Assignee: Robert Bradshaw

> Default to LOOPBACK mode for local flink (spark, ...) runner.
> -
>
> Key: BEAM-8396
> URL: https://issues.apache.org/jira/browse/BEAM-8396
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>
> As well as being lower overhead, this will avoid surprises about workers 
> operating within the docker filesystem, etc. 





[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=330799&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330799
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 18/Oct/19 21:57
Start Date: 18/Oct/19 21:57
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9803: [BEAM-8372] 
Follow-up to Flink UberJar submission.
URL: https://github.com/apache/beam/pull/9803#discussion_r336690127
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner.py
 ##
 @@ -32,18 +32,18 @@
 
 class FlinkRunner(portable_runner.PortableRunner):
   def default_job_server(self, options):
-flink_master_url = options.view_as(FlinkRunnerOptions).flink_master_url
-if flink_master_url == '[local]' or sys.version_info < (3, 6):
+flink_master = options.view_as(FlinkRunnerOptions).flink_master
 
 Review comment:
   Thanks, let me know when you have a PR. I'd like to get this in by 2.17 so 
we can advertise this as a release that's really easy to use with Flink. 
 



Issue Time Tracking
---

Worklog Id: (was: 330799)
Time Spent: 5h 10m  (was: 5h)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8418?focusedWorklogId=330793&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330793
 ]

ASF GitHub Bot logged work on BEAM-8418:


Author: ASF GitHub Bot
Created on: 18/Oct/19 21:45
Start Date: 18/Oct/19 21:45
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9822: [BEAM-8418] Use 
base64 string for representing impulse payload in DF runner legacy codepath.
URL: https://github.com/apache/beam/pull/9822#issuecomment-543410624
 
 
   Run Python 3.6 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 330793)
Time Spent: 2h 20m  (was: 2h 10m)

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Following pipeline fails on Dataflow runner unless we use beam_fn_api 
> experiment.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
> return element
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode the Impulse payload using url-escaping in [1], while 
> Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF 
> runner expects URL escaping.
> We should fix or reconcile the encoding in non-FnAPI path, and add a 
> ValidatesRunner test that catches this error.   
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633
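
The mismatch is easy to reproduce with the standard library: URL escaping and base64 produce different strings for the same payload bytes, so a runner expecting one scheme cannot decode output produced with the other (the payload value below is made up for illustration):

```python
import base64
import urllib.parse

payload = b'\x00\x01impulse-payload\xff'

# What url-escaping produces vs. what base64 produces for the same bytes.
url_escaped = urllib.parse.quote(payload)                 # '%00%01impulse-payload%FF'
b64_encoded = base64.b64encode(payload).decode('ascii')

# Each scheme round-trips its own output, but the two strings differ,
# so decoding one with the other's decoder cannot recover the payload.
assert urllib.parse.unquote_to_bytes(url_escaped) == payload
assert base64.b64decode(b64_encoded) == payload
assert url_escaped != b64_encoded
```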





[jira] [Work logged] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8418?focusedWorklogId=330792&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330792
 ]

ASF GitHub Bot logged work on BEAM-8418:


Author: ASF GitHub Bot
Created on: 18/Oct/19 21:45
Start Date: 18/Oct/19 21:45
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9822: [BEAM-8418] Use 
base64 string for representing impulse payload in DF runner legacy codepath.
URL: https://github.com/apache/beam/pull/9822#issuecomment-543440776
 
 
   Run Python 3.6 PostCommit
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 330792)
Time Spent: 2h 10m  (was: 2h)

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Following pipeline fails on Dataflow runner unless we use beam_fn_api 
> experiment.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
> return element
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode the Impulse payload using url-escaping in [1], while 
> Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF 
> runner expects URL escaping.
> We should fix or reconcile the encoding in non-FnAPI path, and add a 
> ValidatesRunner test that catches this error.   
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633





[jira] [Work logged] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8418?focusedWorklogId=330791&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330791
 ]

ASF GitHub Bot logged work on BEAM-8418:


Author: ASF GitHub Bot
Created on: 18/Oct/19 21:44
Start Date: 18/Oct/19 21:44
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9822: [BEAM-8418] Use 
base64 string for representing impulse payload in DF runner legacy codepath.
URL: https://github.com/apache/beam/pull/9822#issuecomment-543477168
 
 
   Run Python 3.6 PostCommit
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 330791)
Time Spent: 2h  (was: 1h 50m)

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Following pipeline fails on Dataflow runner unless we use beam_fn_api 
> experiment.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
> return element
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode the Impulse payload using url-escaping in [1], while 
> Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF 
> runner expects URL escaping.
> We should fix or reconcile the encoding in non-FnAPI path, and add a 
> ValidatesRunner test that catches this error.   
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633





[jira] [Work logged] (BEAM-8418) Fix handling of Impulse transform in Dataflow runner.

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8418?focusedWorklogId=330790&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330790
 ]

ASF GitHub Bot logged work on BEAM-8418:


Author: ASF GitHub Bot
Created on: 18/Oct/19 21:44
Start Date: 18/Oct/19 21:44
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9822: [BEAM-8418] Use 
base64 string for representing impulse payload in DF runner legacy codepath.
URL: https://github.com/apache/beam/pull/9822#issuecomment-543507060
 
 
   Run Python 3.6 PostCommit
 



Issue Time Tracking
---

Worklog Id: (was: 330790)
Time Spent: 1h 50m  (was: 1h 40m)

> Fix handling of Impulse transform in Dataflow runner. 
> --
>
> Key: BEAM-8418
> URL: https://issues.apache.org/jira/browse/BEAM-8418
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Following pipeline fails on Dataflow runner unless we use beam_fn_api 
> experiment.
> {noformat}
> class NoOpDoFn(beam.DoFn):
>   def process(self, element):
> return element
> p = beam.Pipeline(options=pipeline_options)
> _ = p | beam.Impulse() | beam.ParDo(NoOpDoFn())
> result = p.run()
> {noformat}
> The reason is that we encode the Impulse payload using url-escaping in [1], while 
> Dataflow runner expects base64 encoding in non-fnapi mode. In FnApi mode, DF 
> runner expects URL escaping.
> We should fix or reconcile the encoding in non-FnAPI path, and add a 
> ValidatesRunner test that catches this error.   
> [1] 
> https://github.com/apache/beam/blob/12d07745835e1b9c1e824b83beeeadf63ab4b234/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py#L633





[jira] [Work logged] (BEAM-8434) Allow trigger transcript tests to be run as ValidatesRunner tests.

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8434?focusedWorklogId=330780&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330780
 ]

ASF GitHub Bot logged work on BEAM-8434:


Author: ASF GitHub Bot
Created on: 18/Oct/19 21:31
Start Date: 18/Oct/19 21:31
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9832: [BEAM-8434] 
Translate trigger transcripts into validates runner tests.
URL: https://github.com/apache/beam/pull/9832
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | 

[jira] [Work logged] (BEAM-8433) DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8433?focusedWorklogId=330778&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330778
 ]

ASF GitHub Bot logged work on BEAM-8433:


Author: ASF GitHub Bot
Created on: 18/Oct/19 21:25
Start Date: 18/Oct/19 21:25
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9831: [BEAM-8433] 
DataCatalogBigQueryIT runs for both Calcite and ZetaSQL d…
URL: https://github.com/apache/beam/pull/9831#issuecomment-543926793
 
 
   I tried the `JUnit Parameterized Test` idea. It turned out that that
framework adds `[` and `]` to the test name, and our testing infra uses the
test name for the temporary BQ table name.
   
   The problem is that, I believe, BQ table names do not allow `[` and `]`.
That's why I ended up with the implementation in this PR.
   
   Duplicating more tests like this for both dialects in the future will
require taking care of such details.
   
   
   A sample error message:
   ```
   {
 "code" : 400,
 "errors" : [ {
   "domain" : "global",
   "message" : "Invalid table ID 
\"DataCatalogBigQueryIT_testReadWrite[org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner]_2019_10_18_20_10_21_376_5037052097078144632\".",
   "reason" : "invalid"
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330778)
Time Spent: 1h 10m  (was: 1h)

> DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects
> 
>
> Key: BEAM-8433
> URL: https://issues.apache.org/jira/browse/BEAM-8433
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
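The constraint described in the comment above — bracketed JUnit parameterized test names vs. BigQuery's table-ID character set — could also be worked around by sanitizing the generated name. A minimal sketch (the helper name and exact character policy are illustrative, not Beam code):

```python
import re

def sanitize_for_bq_table(test_name: str) -> str:
    """Hypothetical helper: map a JUnit-style parameterized test name,
    e.g. 'testReadWrite[CalciteQueryPlanner]', to a valid BigQuery
    table ID, which may contain only letters, digits, and underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "_", test_name)

name = "DataCatalogBigQueryIT_testReadWrite[CalciteQueryPlanner]"
print(sanitize_for_bq_table(name))
# DataCatalogBigQueryIT_testReadWrite_CalciteQueryPlanner_
```

Collapsing every disallowed character to `_` keeps the name readable while guaranteeing it is a legal table ID; note it can map distinct test names to the same string, so the timestamp/random suffix the infra already appends remains necessary.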


[jira] [Created] (BEAM-8435) Allow access to PaneInfo from Python DoFns

2019-10-18 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8435:
-

 Summary: Allow access to PaneInfo from Python DoFns
 Key: BEAM-8435
 URL: https://issues.apache.org/jira/browse/BEAM-8435
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Robert Bradshaw


PaneInfoParam exists, but the plumbing to actually populate it at runtime was 
never added. (Nor, clearly, were any tests...)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8434) Allow trigger transcript tests to be run as ValidatesRunner tests.

2019-10-18 Thread Robert Bradshaw (Jira)
Robert Bradshaw created BEAM-8434:
-

 Summary: Allow trigger transcript tests to be run as 
ValidatesRunner tests. 
 Key: BEAM-8434
 URL: https://issues.apache.org/jira/browse/BEAM-8434
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Robert Bradshaw
Assignee: Robert Bradshaw






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8433) DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8433?focusedWorklogId=330767&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330767
 ]

ASF GitHub Bot logged work on BEAM-8433:


Author: ASF GitHub Bot
Created on: 18/Oct/19 20:41
Start Date: 18/Oct/19 20:41
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9831: [BEAM-8433] 
DataCatalogBigQueryIT runs for both Calcite and ZetaSQL d…
URL: https://github.com/apache/beam/pull/9831#issuecomment-543934812
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330767)
Time Spent: 1h  (was: 50m)

> DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects
> 
>
> Key: BEAM-8433
> URL: https://issues.apache.org/jira/browse/BEAM-8433
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8433) DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8433?focusedWorklogId=330758&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330758
 ]

ASF GitHub Bot logged work on BEAM-8433:


Author: ASF GitHub Bot
Created on: 18/Oct/19 20:31
Start Date: 18/Oct/19 20:31
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9831: [BEAM-8433] 
DataCatalogBigQueryIT runs for both Calcite and ZetaSQL d…
URL: https://github.com/apache/beam/pull/9831#issuecomment-543926793
 
 
   I tried the `JUnit Parameterized Test` idea. It turned out that that
framework adds `[` and `]` to the test name, and our testing infra uses the
test name for the temporary BQ table name.
   
   The problem is that, I believe, BQ table names do not allow `[` and `]`.
That's why I ended up with the implementation in this PR.
   
   Duplicating more tests like this for both dialects in the future will
require taking care of such details.
   
   
   A sample error message:
   ```
   {
 "code" : 400,
 "errors" : [ {
   "domain" : "global",
   "message" : "Invalid table ID 
\"DataCatalogBigQueryIT_testReadWrite[org.apache.beam.sdk.extensions.sql.impl.CalciteQueryPlanner]_2019_10_18_20_10_21_376_5037052097078144632\".",
   "reason" : "invalid"
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330758)
Time Spent: 50m  (was: 40m)

> DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects
> 
>
> Key: BEAM-8433
> URL: https://issues.apache.org/jira/browse/BEAM-8433
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8433) DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8433?focusedWorklogId=330750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330750
 ]

ASF GitHub Bot logged work on BEAM-8433:


Author: ASF GitHub Bot
Created on: 18/Oct/19 20:25
Start Date: 18/Oct/19 20:25
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9831: [BEAM-8433] 
DataCatalogBigQueryIT runs for both Calcite and ZetaSQL d…
URL: https://github.com/apache/beam/pull/9831#issuecomment-543926793
 
 
   I tried the `JUnit Parameterized Test` idea. It turned out that that
framework adds `[` and `]` to the test name, and our testing infra uses the
test name for the temporary BQ table name.
   
   The problem is that, I believe, BQ table names do not allow `[` and `]`.
That's why I ended up with the implementation in this PR.
   
   Duplicating more tests like this for both dialects in the future will
require taking care of such details.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330750)
Time Spent: 0.5h  (was: 20m)

> DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects
> 
>
> Key: BEAM-8433
> URL: https://issues.apache.org/jira/browse/BEAM-8433
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8433) DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8433?focusedWorklogId=330752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330752
 ]

ASF GitHub Bot logged work on BEAM-8433:


Author: ASF GitHub Bot
Created on: 18/Oct/19 20:25
Start Date: 18/Oct/19 20:25
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9831: [BEAM-8433] 
DataCatalogBigQueryIT runs for both Calcite and ZetaSQL d…
URL: https://github.com/apache/beam/pull/9831#issuecomment-543926877
 
 
   R: @kennknowles 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330752)
Time Spent: 40m  (was: 0.5h)

> DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects
> 
>
> Key: BEAM-8433
> URL: https://issues.apache.org/jira/browse/BEAM-8433
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8433) DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8433?focusedWorklogId=330747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330747
 ]

ASF GitHub Bot logged work on BEAM-8433:


Author: ASF GitHub Bot
Created on: 18/Oct/19 20:23
Start Date: 18/Oct/19 20:23
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9831: [BEAM-8433] 
DataCatalogBigQueryIT runs for both Calcite and ZetaSQL d…
URL: https://github.com/apache/beam/pull/9831#issuecomment-543925505
 
 
   Run SQL Postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330747)
Time Spent: 20m  (was: 10m)

> DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects
> 
>
> Key: BEAM-8433
> URL: https://issues.apache.org/jira/browse/BEAM-8433
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8433) DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8433?focusedWorklogId=330746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330746
 ]

ASF GitHub Bot logged work on BEAM-8433:


Author: ASF GitHub Bot
Created on: 18/Oct/19 20:22
Start Date: 18/Oct/19 20:22
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9831: [BEAM-8433] 
DataCatalogBigQueryIT runs for both Calcite and ZetaSQL d…
URL: https://github.com/apache/beam/pull/9831
 
 
   …ialects.
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 

[jira] [Created] (BEAM-8433) DataCatalogBigQueryIT runs for both Calcite and ZetaSQL dialects

2019-10-18 Thread Rui Wang (Jira)
Rui Wang created BEAM-8433:
--

 Summary: DataCatalogBigQueryIT runs for both Calcite and ZetaSQL 
dialects
 Key: BEAM-8433
 URL: https://issues.apache.org/jira/browse/BEAM-8433
 Project: Beam
  Issue Type: Improvement
  Components: dsl-sql
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?focusedWorklogId=330740&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330740
 ]

ASF GitHub Bot logged work on BEAM-8424:


Author: ASF GitHub Bot
Created on: 18/Oct/19 20:08
Start Date: 18/Oct/19 20:08
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on issue #9819: [BEAM-8424] Prevent 
Java 11 VR tests from timeouting
URL: https://github.com/apache/beam/pull/9819#issuecomment-543918486
 
 
   Sounds good. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330740)
Time Spent: 1h  (was: 50m)

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> These jobs take more than the currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330700&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330700
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 18:26
Start Date: 18/Oct/19 18:26
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336621115
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -0,0 +1,133 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.display.pcoll_visualization."""
+from __future__ import absolute_import
+
+import sys
+import time
+import unittest
+
+import timeloop
+
+import apache_beam as beam
+from apache_beam.runners import runner
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.display import pcoll_visualization as pv
+
+# Work around nose tests using Python2 without unittest.mock module.
+try:
+  from unittest.mock import patch
+except ImportError:
+  from mock import patch
+
+
+class PCollVisualizationTest(unittest.TestCase):
+
+  def setUp(self):
+    self._p = beam.Pipeline()
+    # pylint: disable=range-builtin-not-iterating
+    self._pcoll = self._p | 'Create' >> beam.Create(range(1000))
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  def test_raise_error_for_non_pcoll_input(self):
+    class Foo(object):
+      pass
+
+    with self.assertRaises(ValueError) as ctx:
+      pv.PCollVisualization(Foo())
+    self.assertTrue('pcoll should be apache_beam.pvalue.PCollection' in
+                    ctx.exception)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  def test_pcoll_visualization_generate_unique_display_id(self):
+    pv_1 = pv.PCollVisualization(self._pcoll)
+    pv_2 = pv.PCollVisualization(self._pcoll)
+    self.assertNotEqual(pv_1._dive_display_id, pv_2._dive_display_id)
+    self.assertNotEqual(pv_1._overview_display_id, pv_2._overview_display_id)
+    self.assertNotEqual(pv_1._df_display_id, pv_2._df_display_id)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+         '.PCollVisualization._to_element_list', lambda x: [1, 2, 3])
+  def test_one_shot_visualization_not_return_handle(self):
+    self.assertIsNone(pv.visualize(self._pcoll))
+
+  def _mock_to_element_list(self):
+    yield [1, 2, 3]
+    yield [1, 2, 3, 4]
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+         '.PCollVisualization._to_element_list', _mock_to_element_list)
+  def test_dynamical_plotting_return_handle(self):
+    h = pv.visualize(self._pcoll, dynamical_plotting_interval=1)
+    self.assertIsInstance(h, timeloop.Timeloop)
+    h.stop()
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+         '.PCollVisualization._to_element_list', _mock_to_element_list)
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+         '.PCollVisualization.display_facets')
+  def test_dynamical_plotting_update_same_display(self,
+                                                  mocked_display_facets):
+    # Starts async dynamical plotting.
+    h = pv.visualize(self._pcoll, dynamical_plotting_interval=0.001)
+    # Blocking so the above async task can execute a few iterations.
+    time.sleep(0.1)
 
 Review comment:
   Using a while loop that queries for an ending condition to block instead.
Also added a timeout to ensure the while loop never exceeds 0.1 
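The pattern the reviewer describes — polling an end condition with a deadline instead of a fixed sleep — is generic; a sketch with illustrative names (not Beam APIs):

```python
import time

def wait_for(condition, timeout=0.1, interval=0.005):
    """Poll `condition` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout. Unlike a
    fixed time.sleep(), this returns as soon as the condition holds and
    is bounded by the deadline, which makes tests both faster and less
    flaky.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one last check at the deadline

# Example: wait until a counter maintained by some background work
# (simulated here by the condition itself) reaches a target.
state = {"iterations": 0}

def tick():
    state["iterations"] += 1
    return state["iterations"] >= 3

assert wait_for(tick, timeout=1.0)
```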

[jira] [Work logged] (BEAM-8428) [SQL] BigQuery should support project push-down in DIRECT_READ mode

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8428?focusedWorklogId=330694&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330694
 ]

ASF GitHub Bot logged work on BEAM-8428:


Author: ASF GitHub Bot
Created on: 18/Oct/19 18:03
Start Date: 18/Oct/19 18:03
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9823: [BEAM-8428] [SQL] 
Add project push-down for BigQuery
URL: https://github.com/apache/beam/pull/9823#issuecomment-543864425
 
 
   Run Direct Runner Nexmark Tests
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330694)
Time Spent: 50m  (was: 40m)

> [SQL] BigQuery should support project push-down in DIRECT_READ mode
> ---
>
> Key: BEAM-8428
> URL: https://issues.apache.org/jira/browse/BEAM-8428
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> BigQuery should perform project push-down for read pipelines when applicable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8428) [SQL] BigQuery should support project push-down in DIRECT_READ mode

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8428?focusedWorklogId=330695&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330695
 ]

ASF GitHub Bot logged work on BEAM-8428:


Author: ASF GitHub Bot
Created on: 18/Oct/19 18:03
Start Date: 18/Oct/19 18:03
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on issue #9823: [BEAM-8428] [SQL] 
Add project push-down for BigQuery
URL: https://github.com/apache/beam/pull/9823#issuecomment-543864425
 
 
   Run Direct Runner Nexmark Tests
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330695)
Time Spent: 1h  (was: 50m)

> [SQL] BigQuery should support project push-down in DIRECT_READ mode
> ---
>
> Key: BEAM-8428
> URL: https://issues.apache.org/jira/browse/BEAM-8428
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> BigQuery should perform project push-down for read pipelines when applicable.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4132?focusedWorklogId=330691&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330691
 ]

ASF GitHub Bot logged work on BEAM-4132:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:55
Start Date: 18/Oct/19 17:55
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9810: [BEAM-4132] 
Support multi-output type inference
URL: https://github.com/apache/beam/pull/9810#discussion_r336608361
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -571,6 +573,10 @@ def _infer_result_type(self, transform, inputs, 
result_pcollection):
   else:
 result_pcollection.element_type = transform.infer_output_type(
 input_element_type)
+elif isinstance(result_pcollection, pvalue.DoOutputsTuple):
+  # Single-input, multi-output inference.
+  for pcoll in result_pcollection:
+self._infer_result_type(transform, inputs, pcoll)
 
 Review comment:
   This indeed does set the same output type for each pcoll. Maybe there's a 
way to infer more deeply. I can look at this later if this would be of value 
(I'm thinking that we should come up with a way to specify multi-output type 
hints).
   Currently this is better than leaving pcoll.element_type as None, which 
fails type checking.
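The recursion described in this comment can be sketched without Beam; the classes below are illustrative stand-ins for the SDK's PCollection and pvalue.DoOutputsTuple, not the real types:

```python
class PColl:
    """Stand-in for a PCollection carrying an element_type attribute."""
    def __init__(self):
        self.element_type = None

class DoOutputsTuple:
    """Stand-in for pvalue.DoOutputsTuple: an iterable of tagged outputs."""
    def __init__(self, tags):
        self._pcolls = {tag: PColl() for tag in tags}
    def __iter__(self):
        return iter(self._pcolls.values())
    def __getitem__(self, tag):
        return self._pcolls[tag]

def infer_result_type(inferred_output_type, result):
    # Single-output case: set the inferred element type directly.
    if isinstance(result, PColl):
        result.element_type = inferred_output_type
    # Multi-output case: recurse, so every tagged output gets the same
    # inferred type rather than being left as None.
    elif isinstance(result, DoOutputsTuple):
        for pcoll in result:
            infer_result_type(inferred_output_type, pcoll)

outputs = DoOutputsTuple(['odd', 'even'])
infer_result_type(int, outputs)
print(outputs['odd'].element_type)   # <class 'int'>
```

As the comment notes, this assigns one shared type to all outputs; per-tag type hints would need a separate mechanism.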
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330691)
Time Spent: 1h 10m  (was: 1h)

> Element type inference doesn't work for multi-output DoFns
> --
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.4.0
>Reporter: Chuan Yu Foo
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> TLDR: if you have a multi-output DoFn, then the non-main PCollections will 
> incorrectly have their element types set to None. This affects type checking 
> for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
> yield elem
> if elem % 2 == 0:
>   yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
> if elem % 3 == 0:
>   yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>   
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
> self._multiplier = multiplier
>   def process(self, elem):
> return elem * self._multiplier
>   
> def main():
>   with beam.Pipeline() as p:
> x, a, b = (
>   p
>   | 'Create' >> beam.Create([1, 2, 3])
>   | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
> 'ten_times', 'hundred_times', main='main_output'))
> _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
> if __name__ == '__main__':
>   main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} 
> instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element 
> types of {{int}} rather than {{None}}, and I would also expect Beam to 
> correctly figure out that the element types of {{x}} are compatible with 
> {{int}}.
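Stripped of Beam, the routing this report describes looks like the sketch below (the class and function names are stand-ins, not the SDK's). The raw generator yields a mix of plain values and tagged wrappers, which is why naive inference on the undivided output sees Union[TaggedOutput, int], while the routed per-tag outputs contain plain ints:

```python
class TaggedOutput:
    """Stand-in for beam.pvalue.TaggedOutput: wraps a value with a tag."""
    def __init__(self, tag, value):
        self.tag = tag
        self.value = value

def triple_do(elem):
    # Mirrors TripleDoFn.process: untagged values go to the main output,
    # tagged values to their named outputs.
    yield elem
    if elem % 2 == 0:
        yield TaggedOutput('ten_times', elem * 10)
    if elem % 3 == 0:
        yield TaggedOutput('hundred_times', elem * 100)

def route(elems):
    outputs = {'main': [], 'ten_times': [], 'hundred_times': []}
    for elem in elems:
        for out in triple_do(elem):
            if isinstance(out, TaggedOutput):
                outputs[out.tag].append(out.value)  # unwrap before emitting
            else:
                outputs['main'].append(out)
    return outputs

print(route([1, 2, 3]))
# {'main': [1, 2, 3], 'ten_times': [20], 'hundred_times': [300]}
```

After routing, every output list holds ints, so inferring int for the tagged PCollections (rather than None or the pre-routing union) matches what actually flows through them.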



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8405) Python: Datastore: add support for embedded entities

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8405?focusedWorklogId=330686&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330686
 ]

ASF GitHub Bot logged work on BEAM-8405:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:49
Start Date: 18/Oct/19 17:49
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9805: 
[BEAM-8405] Support embedded Datastore entities
URL: https://github.com/apache/beam/pull/9805#discussion_r336605472
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/datastore/v1new/types_test.py
 ##
 @@ -47,30 +49,52 @@ def setUp(self):
 # Don't do any network requests.
 _http=mock.MagicMock())
 
+  def _assert_keys_equal(self, beam_type, client_type, expected_project):
+self.assertEqual(beam_type.path_elements[0], client_type.kind)
+self.assertEqual(beam_type.path_elements[1], client_type.id)
+self.assertEqual(expected_project, client_type.project)
+
   def testEntityToClientEntity(self):
+# Test conversion from Beam type to client type.
 k = Key(['kind', 1234], project=self._PROJECT)
 kc = k.to_client_key()
-exclude_from_indexes = ('efi1', 'efi2')
+exclude_from_indexes = ('datetime', 'key')
 e = Entity(k, exclude_from_indexes=exclude_from_indexes)
-ref = Key(['kind2', 1235])
-e.set_properties({'efi1': 'value', 'property': 'value', 'ref': ref})
+properties = {
+  'datetime': datetime.datetime.utcnow(),
+  'key_ref': Key(['kind2', 1235]),
+  'bool': True,
+  'float': 1.21,
+  'int': 1337,
+  'unicode': 'text',
+  'bytes': b'bytes',
+  'geopoint': GeoPoint(0.123, 0.456),
+  'none': None,
+  'list': [1, 2, 3],
+  'entity': Entity(Key(['kind', 111])),
+  'dict': {'property': 5},
+}
+e.set_properties(properties)
 ec = e.to_client_entity()
 self.assertEqual(kc, ec.key)
 self.assertSetEqual(set(exclude_from_indexes), ec.exclude_from_indexes)
 self.assertEqual('kind', ec.kind)
 self.assertEqual(1234, ec.id)
-self.assertEqual('kind2', ec['ref'].kind)
-self.assertEqual(1235, ec['ref'].id)
-self.assertEqual(self._PROJECT, ec['ref'].project)
-
-  def testEntityFromClientEntity(self):
 
 Review comment:
   We don't need this test anymore?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330686)
Time Spent: 0.5h  (was: 20m)

> Python: Datastore: add support for embedded entities 
> -
>
> Key: BEAM-8405
> URL: https://issues.apache.org/jira/browse/BEAM-8405
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The conversion methods to/from the client entity type should be updated to 
> support an embedded Entity.
> https://github.com/apache/beam/blob/603d68aafe9bdcd124d28ad62ad36af01e7a7403/sdks/python/apache_beam/io/gcp/datastore/v1new/types.py#L216-L240



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8406) TextTable support JSON format

2019-10-18 Thread Kirill Kozlov (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954862#comment-16954862
 ] 

Kirill Kozlov commented on BEAM-8406:
-

I think utilizing the JsonToRow.withSchema(schema) transform may be useful here.
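The idea behind such a transform — parsing each JSON line into a typed row according to a declared schema — can be sketched without Beam. The `Schema` mapping and field names below are illustrative, not the Beam SQL API:

```python
import json
from collections import namedtuple

def make_json_to_row(schema):
    """schema: ordered mapping of field name -> Python type.
    Returns a parser turning one JSON line into a typed Row tuple."""
    Row = namedtuple('Row', list(schema))
    def parse(line):
        obj = json.loads(line)
        # Coerce each declared field to its schema type; a missing
        # field raises KeyError, surfacing malformed records early.
        return Row(**{name: typ(obj[name]) for name, typ in schema.items()})
    return parse

parse = make_json_to_row({'id': int, 'name': str})
row = parse('{"id": "7", "name": "beam"}')
print(row)  # Row(id=7, name='beam')
```

A JSON-backed TextTable would apply such a parser per line, analogously to how the CSV TextTable maps lines to Rows.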

> TextTable support JSON format
> -
>
> Key: BEAM-8406
> URL: https://issues.apache.org/jira/browse/BEAM-8406
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Rui Wang
>Priority: Major
>
> Have a JSON table implementation similar to [1].
> [1]: 
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/text/TextTable.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4132?focusedWorklogId=330684&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330684
 ]

ASF GitHub Bot logged work on BEAM-4132:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:44
Start Date: 18/Oct/19 17:44
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9810: [BEAM-4132] 
Support multi-output type inference
URL: https://github.com/apache/beam/pull/9810#discussion_r336603704
 
 

 ##
 File path: sdks/python/apache_beam/typehints/typed_pipeline_test.py
 ##
 @@ -131,6 +131,41 @@ def filter_fn(data):
 
 self.assertEqual([1, 3], [1, 2, 3] | beam.Filter(filter_fn))
 
+  def test_partition(self):
+p = TestPipeline()
+even, odd = (p
+ | beam.Create([1, 2, 3])
+ | 'even_odd' >> beam.Partition(lambda e, _: e % 2, 2))
+# Test that the element type of even and odd is int.
+res_even = (even
+| 'id_even' >> beam.ParDo(lambda e: [e]).with_input_types(int))
+res_odd = (odd
+   | 'id_odd' >> beam.ParDo(lambda e: [e]).with_input_types(int))
+assert_that(res_even, equal_to([2]), label='even_check')
+assert_that(res_odd, equal_to([1, 3]), label='odd_check')
+p.run()
+
+  def test_typed_dofn_multi_output(self):
+class MyDoFn(beam.DoFn):
+  def process(self, element):
+if element % 2:
+  yield beam.pvalue.TaggedOutput('odd', element)
+else:
+  yield beam.pvalue.TaggedOutput('even', element)
+
+p = TestPipeline()
+res = (p
+   | beam.Create([1, 2, 3])
+   | beam.ParDo(MyDoFn()).with_outputs('odd', 'even'))
+# Test that the element type of even and odd is int.
 
 Review comment:
   More precisely, it should say that we're testing that it's not None and is 
consistent with int.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330684)
Time Spent: 1h  (was: 50m)

> Element type inference doesn't work for multi-output DoFns
> --
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.4.0
>Reporter: Chuan Yu Foo
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> TLDR: if you have a multi-output DoFn, then the non-main PCollections will 
> incorrectly have their element types set to None. This affects type checking 
> for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
> yield elem
> if elem % 2 == 0:
>   yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
> if elem % 3 == 0:
>   yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>   
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
> self._multiplier = multiplier
>   def process(self, elem):
> return elem * self._multiplier
>   
> def main():
>   with beam.Pipeline() as p:
> x, a, b = (
>   p
>   | 'Create' >> beam.Create([1, 2, 3])
>   | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
> 'ten_times', 'hundred_times', main='main_output'))
> _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
> if __name__ == '__main__':
>   main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} 
> instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires <type 'int'> but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element 
> types of {{int}} rather than {{None}}, and I would also expect Beam to 
> correctly figure out that the element types of {{x}} are compatible with 
> {{int}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7732) Allow access to SpannerOptions in Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7732?focusedWorklogId=330669&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330669
 ]

ASF GitHub Bot logged work on BEAM-7732:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:24
Start Date: 18/Oct/19 17:24
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9048: [BEAM-7732] 
Enable setting custom SpannerOptions.
URL: https://github.com/apache/beam/pull/9048#issuecomment-543847351
 
 
   Can we get additional options as KVs and generate SpannerOptions from that? 
I would like not to leak client library classes through the Beam SpannerIO API 
if possible.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330669)
Time Spent: 3.5h  (was: 3h 20m)

> Allow access to SpannerOptions in Beam
> --
>
> Key: BEAM-7732
> URL: https://issues.apache.org/jira/browse/BEAM-7732
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.12.0, 2.13.0
>Reporter: Niel Markwick
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Beam hides the 
> [SpannerOptions|https://github.com/googleapis/google-cloud-java/blob/master/google-cloud-clients/google-cloud-spanner/src/main/java/com/google/cloud/spanner/SpannerOptions.java]
>  object behind a 
> [SpannerConfig|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/spanner/SpannerConfig.java]
>  object because the SpannerOptions object is not serializable. 
> This means that the only options that can be set are those that can be 
> specified in SpannerConfig - limited to host, project, instance, database.
> Suggestion: add the possibility to set a SpannerOptionsFactory in 
> SpannerConfig:
> {code:java}
> public interface SpannerOptionsFactory extends Serializable {
>    public SpannerOptions create();
> }
> {code}
> This would allow the user use this factory class to specify custom 
> SpannerOptions before they are passed onto the connectToSpanner() method; 
> connectToSpanner() would then become: 
> {code:java}
> public SpannerAccessor connectToSpanner() {
>   
>   SpannerOptions.Builder builder = spannerOptionsFactory.create().toBuilder();
>   // rest of connectToSpanner follows, setting project, host, etc.
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330668&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330668
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:23
Start Date: 18/Oct/19 17:23
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336595299
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from facets_overview.generic_feature_statistics_generator import 
GenericFeatureStatisticsGenerator
+from IPython.core.display import HTML
+from IPython.core.display import Javascript
+from IPython.core.display import display
+from IPython.core.display import display_javascript
+from IPython.core.display import update_display
+from pandas.io.json import json_normalize
+from timeloop import Timeloop
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support Python < 3.5. Work around with json for legacy tests.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
+https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html;>
+
+
+  document.querySelector("#{display_id}").data = {jsonstr};
+"""
+_OVERVIEW_SCRIPT_TEMPLATE = """
+  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  """
+_OVERVIEW_HTML_TEMPLATE = """
+https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
+https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html;>
+
+
+  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+"""
+_DATAFRAME_PAGINATION_TEMPLATE = """
+https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js";>
 
+https://cdn.datatables.net/1.10.16/js/jquery.dataTables.js";> 
+https://cdn.datatables.net/1.10.16/css/jquery.dataTables.css;>
+{dataframe_html}
+
+  $("#{table_id}").DataTable();
+"""
+
+
+def visualize(pcoll, dynamical_plotting_interval=None):
 
 Review comment:
   It automatically checks the running job's pipeline result and stops itself 
when the job is in a terminated state.
   The end user could also assign the returned handle from visualize() to a 
variable, say `handle = visualize(pcoll)`, and call `handle.stop()` to 
explicitly stop the visualization from anywhere in the notebook.
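The lifecycle described here — a handle that polls the job's state and stops itself on termination, or on a manual stop() — can be sketched with the stdlib alone. FakePipelineResult and the state strings are illustrative stand-ins, not Beam's PipelineResult API:

```python
import threading

class FakePipelineResult:
    """Stand-in for a Beam PipelineResult; state names are illustrative."""
    def __init__(self):
        self.state = 'RUNNING'

class VisualizationHandle:
    """Sketch of a handle that re-renders on an interval and stops itself
    once the tracked job reaches a terminal state (or on manual stop())."""
    def __init__(self, result, interval=0.01):
        self._result = result
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._loop, args=(interval,))
        self._thread.start()

    def _loop(self, interval):
        # wait() returns False on timeout, so each tick we re-check the job.
        while not self._stopped.wait(interval):
            if self._result.state in ('DONE', 'CANCELLED', 'FAILED'):
                self._stopped.set()  # auto-stop on terminal job state

    def stop(self):
        self._stopped.set()
        self._thread.join()

result = FakePipelineResult()
handle = VisualizationHandle(result)
result.state = 'DONE'            # the job terminates...
handle._thread.join(timeout=5)   # ...and the handle notices and stops
print(handle._stopped.is_set())  # True
```

In the real module the periodic tick would re-render the plot instead of being a no-op; the stop conditions are the part this sketch demonstrates.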
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330668)
Time Spent: 5.5h  (was: 5h 20m)

> Visualize PCollection with Interactive 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330670&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330670
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:25
Start Date: 18/Oct/19 17:25
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336592169
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -105,3 +113,33 @@ def set_cache_manager(self, cache_manager):
   def cache_manager(self):
 """Gets the cache manager held by current Interactive Environment."""
 return self._cache_manager
+
+  def set_pipeline_result(self, pipeline, result):
 
 Review comment:
   It's not used yet. It should be invoked by the InteractiveRunner once we 
have all the building blocks ready.
   What we are planning to do is to only allow one job per user pipeline 
running at the same time within a runner instance.
   So if the user re-executes p.run() for an async running job, the runner 
should cancel the running job using the currently tracked pipeline_result and 
start a new one, then track the new pipeline_result.
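The cancel-then-track policy described above can be sketched as follows; the class names and the cancel()/state surface are illustrative stand-ins, not the interactive_environment API:

```python
class FakeResult:
    """Stand-in for a pipeline result with a state and a cancel() call."""
    def __init__(self):
        self.state = 'RUNNING'
    def cancel(self):
        self.state = 'CANCELLED'

class Environment:
    """Sketch: track at most one running job (result) per user pipeline."""
    def __init__(self):
        self._results = {}

    def set_pipeline_result(self, pipeline, result):
        # Cancel any job still running for this pipeline before tracking
        # the new one, so only one job per pipeline runs at a time.
        previous = self._results.get(pipeline)
        if previous is not None and previous.state == 'RUNNING':
            previous.cancel()
        self._results[pipeline] = result

    def pipeline_result(self, pipeline):
        return self._results.get(pipeline)

env = Environment()
p = object()           # stand-in for a user pipeline
first = FakeResult()
env.set_pipeline_result(p, first)
second = FakeResult()  # user re-executes p.run()
env.set_pipeline_result(p, second)
print(first.state)                        # CANCELLED
print(env.pipeline_result(p) is second)   # True
```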
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330670)
Time Spent: 5h 40m  (was: 5.5h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8430) sdk_worker_parallelism default is inconsistent between Py and Java SDKs

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8430?focusedWorklogId=330666&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330666
 ]

ASF GitHub Bot logged work on BEAM-8430:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:21
Start Date: 18/Oct/19 17:21
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #9829: [BEAM-8430] Change py 
default sdk_worker_parallelism to 1
URL: https://github.com/apache/beam/pull/9829#issuecomment-543845796
 
 
   The mismatch actually goes back to 
https://github.com/apache/beam/commit/f3623e8ba2257f7659ccb312dc2574f862ef41b5#diff-525d5d65bedd7ea5e6fce6e4cd57e153
 and was masked by the job server option that has now been removed.
   
   Please also add a comment to `PortableOptions` that they need to remain in 
sync with the Java version.
   
   Longer term it would be nice to define such options in proto.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330666)
Time Spent: 0.5h  (was: 20m)

> sdk_worker_parallelism default is inconsistent between Py and Java SDKs
> ---
>
> Key: BEAM-8430
> URL: https://issues.apache.org/jira/browse/BEAM-8430
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Default is currently 1 in Java, 0 in Python.
> https://github.com/apache/beam/blob/7b67a926b8939ede8f2e33c85579b540d18afccf/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L73
> https://github.com/apache/beam/blob/7b67a926b8939ede8f2e33c85579b540d18afccf/sdks/python/apache_beam/options/pipeline_options.py#L848



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330665=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330665
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:21
Start Date: 18/Oct/19 17:21
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336594249
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from facets_overview.generic_feature_statistics_generator import 
GenericFeatureStatisticsGenerator
+from IPython.core.display import HTML
+from IPython.core.display import Javascript
+from IPython.core.display import display
+from IPython.core.display import display_javascript
+from IPython.core.display import update_display
+from pandas.io.json import json_normalize
+from timeloop import Timeloop
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support Python < 3.5. Work around with json for legacy tests.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
+https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html;>
+
+
+  document.querySelector("#{display_id}").data = {jsonstr};
+"""
+_OVERVIEW_SCRIPT_TEMPLATE = """
+  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  """
+_OVERVIEW_HTML_TEMPLATE = """
+https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
+https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html;>
+
+
+  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+"""
+_DATAFRAME_PAGINATION_TEMPLATE = """
+https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js";>
 
+https://cdn.datatables.net/1.10.16/js/jquery.dataTables.js";> 
+https://cdn.datatables.net/1.10.16/css/jquery.dataTables.css;>
+{dataframe_html}
+
+  $("#{table_id}").DataTable();
+"""
+
+
+def visualize(pcoll, dynamical_plotting_interval=None):
+  """Visualizes the data of a given PCollection. Optionally enables dynamical
+  plotting with interval in seconds if the PCollection is being produced by a
+  running pipeline or the pipeline is streaming indefinitely. The function
+  always returns immediately and is asynchronous when dynamical plotting is on.
+
+  If dynamical plotting is enabled, the visualization is updated continuously 
until
+  the pipeline producing the PCollection is in an end state. The visualization
+  would be anchored to the notebook cell output area. The function
+  asynchronously returns a handle to the visualization job immediately. The 
user
+  could manually do::
+
+# In one notebook cell, enable dynamical plotting every 1 second:
+handle = visualize(pcoll, dynamical_plotting_interval=1)
+# Visualization anchored to the cell's 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330663&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330663
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:18
Start Date: 18/Oct/19 17:18
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336593305
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from facets_overview.generic_feature_statistics_generator import 
GenericFeatureStatisticsGenerator
+from IPython.core.display import HTML
+from IPython.core.display import Javascript
+from IPython.core.display import display
+from IPython.core.display import display_javascript
+from IPython.core.display import update_display
+from pandas.io.json import json_normalize
+from timeloop import Timeloop
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support Python < 3.5. Work around with json for legacy tests.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
+https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html;>
+
+
+  document.querySelector("#{display_id}").data = {jsonstr};
+"""
+_OVERVIEW_SCRIPT_TEMPLATE = """
+  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+  """
+_OVERVIEW_HTML_TEMPLATE = """
+https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js";>
+https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html;>
+
+
+  document.querySelector("#{display_id}").protoInput = 
"{protostr}";
+"""
+_DATAFRAME_PAGINATION_TEMPLATE = """
+https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js";>
 
+https://cdn.datatables.net/1.10.16/js/jquery.dataTables.js";> 
+https://cdn.datatables.net/1.10.16/css/jquery.dataTables.css;>
+{dataframe_html}
+
+  $("#{table_id}").DataTable();
+"""
+
+
+def visualize(pcoll, dynamical_plotting_interval=None):
+  """Visualizes the data of a given PCollection. Optionally enables dynamical
+  plotting with interval in seconds if the PCollection is being produced by a
+  running pipeline or the pipeline is streaming indefinitely. The function
+  always returns immediately and is asynchronous when dynamical plotting is on.
+
+  If dynamical plotting is enabled, the visualization is updated continuously 
until
+  the pipeline producing the PCollection is in an end state. The visualization
+  would be anchored to the notebook cell output area. The function
+  asynchronously returns a handle to the visualization job immediately. The 
user
+  could manually do::
+
+# In one notebook cell, enable dynamical plotting every 1 second:
+handle = visualize(pcoll, dynamical_plotting_interval=1)
+# Visualization anchored to the cell's 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330661=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330661
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:17
Start Date: 18/Oct/19 17:17
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336592842
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -0,0 +1,133 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.display.pcoll_visualization."""
+from __future__ import absolute_import
+
+import sys
+import time
+import unittest
+
+import timeloop
+
+import apache_beam as beam
+from apache_beam.runners import runner
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.display import pcoll_visualization as pv
+
+# Work around nose tests using Python2 without unittest.mock module.
+try:
+  from unittest.mock import patch
+except ImportError:
+  from mock import patch
+
+
+class PCollVisualizationTest(unittest.TestCase):
+
+  def setUp(self):
+    self._p = beam.Pipeline()
+    # pylint: disable=range-builtin-not-iterating
+    self._pcoll = self._p | 'Create' >> beam.Create(range(1000))
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  def test_raise_error_for_non_pcoll_input(self):
+    class Foo(object):
+      pass
+
+    with self.assertRaises(ValueError) as ctx:
+      pv.PCollVisualization(Foo())
+      self.assertTrue('pcoll should be apache_beam.pvalue.PCollection' in
+                      ctx.exception)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
 
 Review comment:
   Yes, you can do it by putting skipIf at the class level.
   It's just that one of the tests depends on (3, 6, 3) (where assert_called 
is introduced in unittest.mock), so not all tests depend on the same minimum 
version. Decorating at the function level gives more flexibility.
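The trade-off described above can be sketched with plain unittest. The class and test names here are illustrative, not from the Beam test suite; only the skipIf pattern mirrors the discussion:

```python
import sys
import unittest


# Class-level gate: every test in the class shares one minimum version.
@unittest.skipIf(sys.version_info < (3, 5, 3), 'Not supported on Python 2.')
class ClassGatedTest(unittest.TestCase):

  def test_visualization(self):
    self.assertTrue(True)


# Function-level gates: tests with different minimum versions can coexist
# in the same class, which a single class-level decorator cannot express.
class FunctionGatedTest(unittest.TestCase):

  @unittest.skipIf(sys.version_info < (3, 5, 3), 'Needs Python 3.5.3+.')
  def test_visualization(self):
    self.assertTrue(True)

  @unittest.skipIf(sys.version_info < (3, 6, 3),
                   'assert_called requires a newer unittest.mock.')
  def test_mock_assert_called(self):
    self.assertTrue(True)
```

Skipped tests are reported as skips rather than failures, so the suite stays green on older interpreters either way.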
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330661)
Time Spent: 4h 50m  (was: 4h 40m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330660=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330660
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:15
Start Date: 18/Oct/19 17:15
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336592169
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -105,3 +113,33 @@ def set_cache_manager(self, cache_manager):
   def cache_manager(self):
 """Gets the cache manager held by current Interactive Environment."""
 return self._cache_manager
+
+  def set_pipeline_result(self, pipeline, result):
 
 Review comment:
   It's not used yet. It should be invoked by the InteractiveRunner once we 
have all the building blocks ready.
   What we are planning to do is to allow only one job per user pipeline 
running at the same time within a runner instance.
   So if the user re-executes p.run() for an async running job, the runner 
should cancel the running job using the currently tracked pipeline_result, 
start a new one, and then track the new pipeline_result.
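A minimal sketch of that "one running job per user pipeline" bookkeeping. JobTracker and FakeResult are hypothetical names for illustration, not Beam APIs:

```python
class FakeResult(object):
  """Stand-in for a PipelineResult exposing just state and cancel()."""

  def __init__(self):
    self.state = 'RUNNING'

  def cancel(self):
    self.state = 'CANCELLED'


class JobTracker(object):
  """Tracks at most one running result per user pipeline.

  Setting a new result for a pipeline first cancels the previously
  tracked running job, mirroring the behavior described above.
  """

  def __init__(self):
    self._results = {}  # pipeline -> most recently tracked result

  def set_pipeline_result(self, pipeline, result):
    previous = self._results.get(pipeline)
    if previous is not None and previous.state == 'RUNNING':
      previous.cancel()
    self._results[pipeline] = result

  def pipeline_result(self, pipeline):
    return self._results.get(pipeline)
```

With this shape, re-executing p.run() simply calls set_pipeline_result again and the stale job is cancelled as a side effect.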
 



Issue Time Tracking
---

Worklog Id: (was: 330660)
Time Spent: 4h 40m  (was: 4.5h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data
> as materialized pcoll.
> e.g., visualize(pcoll)





[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330658=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330658
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:13
Start Date: 18/Oct/19 17:13
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336591010
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -107,13 +107,17 @@ def get_version():
 'crcmod>=1.7,<2.0',
 # Dill doesn't guarantee compatibility between releases within a minor version.
 'dill>=0.3.0,<0.3.1',
+'facets-overview>=1.0.0,<2',
 
 Review comment:
   Thanks for the suggestions!
   Yes, I feel the same way but didn't know what to do about it. Will make the 
change.
 



Issue Time Tracking
---

Worklog Id: (was: 330658)
Time Spent: 4.5h  (was: 4h 20m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data
> as materialized pcoll.
> e.g., visualize(pcoll)





[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330653
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:04
Start Date: 18/Oct/19 17:04
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9741: 
[BEAM-7926] Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336587840
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -0,0 +1,133 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.display.pcoll_visualization."""
+from __future__ import absolute_import
+
+import sys
+import time
+import unittest
+
+import timeloop
+
+import apache_beam as beam
+from apache_beam.runners import runner
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.display import pcoll_visualization as pv
+
+# Work around nose tests using Python2 without unittest.mock module.
+try:
+  from unittest.mock import patch
+except ImportError:
+  from mock import patch
+
+
+class PCollVisualizationTest(unittest.TestCase):
+
+  def setUp(self):
+    self._p = beam.Pipeline()
+    # pylint: disable=range-builtin-not-iterating
+    self._pcoll = self._p | 'Create' >> beam.Create(range(1000))
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  def test_raise_error_for_non_pcoll_input(self):
+    class Foo(object):
+      pass
+
+    with self.assertRaises(ValueError) as ctx:
+      pv.PCollVisualization(Foo())
+      self.assertTrue('pcoll should be apache_beam.pvalue.PCollection' in
+                      ctx.exception)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  def test_pcoll_visualization_generate_unique_display_id(self):
+    pv_1 = pv.PCollVisualization(self._pcoll)
+    pv_2 = pv.PCollVisualization(self._pcoll)
+    self.assertNotEqual(pv_1._dive_display_id, pv_2._dive_display_id)
+    self.assertNotEqual(pv_1._overview_display_id, pv_2._overview_display_id)
+    self.assertNotEqual(pv_1._df_display_id, pv_2._df_display_id)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+         '.PCollVisualization._to_element_list', lambda x: [1, 2, 3])
+  def test_one_shot_visualization_not_return_handle(self):
+    self.assertIsNone(pv.visualize(self._pcoll))
+
+  def _mock_to_element_list(self):
+    yield [1, 2, 3]
+    yield [1, 2, 3, 4]
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+         '.PCollVisualization._to_element_list', _mock_to_element_list)
+  def test_dynamical_plotting_return_handle(self):
+    h = pv.visualize(self._pcoll, dynamical_plotting_interval=1)
+    self.assertIsInstance(h, timeloop.Timeloop)
+    h.stop()
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+         '.PCollVisualization._to_element_list', _mock_to_element_list)
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+         '.PCollVisualization.display_facets')
+  def test_dynamical_plotting_update_same_display(self,
+                                                  mocked_display_facets):
+    # Starts async dynamical plotting.
+    h = pv.visualize(self._pcoll, dynamical_plotting_interval=0.001)
+    # Blocking so the above async task can execute a few iterations.
+    time.sleep(0.1)
 
 Review comment:
   Please don't use time.sleep() in unit tests. Not only does it slow down the 
testing, but it also generally leads to flaky tests.
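A common stdlib-only alternative is to poll for the condition with a deadline instead of sleeping a fixed amount: the test wakes up as soon as the condition holds and only times out when something is genuinely wrong. wait_for below is a generic helper for illustration, not Beam's actual fix:

```python
import threading
import time


def wait_for(predicate, timeout=5.0, interval=0.01):
  """Polls predicate() until it is true or the timeout elapses.

  Unlike a fixed time.sleep(0.1), this returns immediately once the
  condition holds, making the test faster and less timing-sensitive.
  """
  deadline = time.monotonic() + timeout
  while time.monotonic() < deadline:
    if predicate():
      return True
    time.sleep(interval)  # short poll interval, bounded by the deadline
  return predicate()


# Example: wait for a background task to record a few iterations.
calls = []
worker = threading.Thread(target=lambda: [calls.append(i) for i in range(3)])
worker.start()
assert wait_for(lambda: len(calls) >= 3, timeout=2.0)
worker.join()
```

The timeout acts as an upper bound for the pathological case, while the common case finishes in a few milliseconds.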

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330649=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330649
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:02
Start Date: 18/Oct/19 17:02
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9741: 
[BEAM-7926] Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336582418
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -0,0 +1,133 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.display.pcoll_visualization."""
+from __future__ import absolute_import
+
+import sys
+import time
+import unittest
+
+import timeloop
+
+import apache_beam as beam
+from apache_beam.runners import runner
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.display import pcoll_visualization as pv
+
+# Work around nose tests using Python2 without unittest.mock module.
+try:
+  from unittest.mock import patch
+except ImportError:
+  from mock import patch
+
+
+class PCollVisualizationTest(unittest.TestCase):
+
+  def setUp(self):
+    self._p = beam.Pipeline()
+    # pylint: disable=range-builtin-not-iterating
+    self._pcoll = self._p | 'Create' >> beam.Create(range(1000))
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+                   'PCollVisualization is not supported on Python 2.')
+  def test_raise_error_for_non_pcoll_input(self):
+    class Foo(object):
+      pass
+
+    with self.assertRaises(ValueError) as ctx:
+      pv.PCollVisualization(Foo())
+      self.assertTrue('pcoll should be apache_beam.pvalue.PCollection' in
+                      ctx.exception)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
 
 Review comment:
   Is there a way to do this check only once? Maybe in the module's
"if __name__ == '__main__'" block?
 



Issue Time Tracking
---

Worklog Id: (was: 330649)
Time Spent: 4h 10m  (was: 4h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data
> as materialized pcoll.
> e.g., visualize(pcoll)





[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330648=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330648
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 17:01
Start Date: 18/Oct/19 17:01
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9741: 
[BEAM-7926] Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336586553
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from facets_overview.generic_feature_statistics_generator import 
GenericFeatureStatisticsGenerator
+from IPython.core.display import HTML
+from IPython.core.display import Javascript
+from IPython.core.display import display
+from IPython.core.display import display_javascript
+from IPython.core.display import update_display
+from pandas.io.json import json_normalize
+from timeloop import Timeloop
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support < Python 3.5. Work around with json for legacy tests.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-dive id="{display_id}" height="600"></facets-dive>
+<script>
+  document.querySelector("#{display_id}").data = {jsonstr};
+</script>"""
+_OVERVIEW_SCRIPT_TEMPLATE = """
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+  """
+_OVERVIEW_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-overview id="{display_id}"></facets-overview>
+<script>
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+</script>"""
+_DATAFRAME_PAGINATION_TEMPLATE = """
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js"></script>
+<script src="https://cdn.datatables.net/1.10.16/js/jquery.dataTables.js"></script>
+<link rel="stylesheet" href="https://cdn.datatables.net/1.10.16/css/jquery.dataTables.css">
+{dataframe_html}
+<script>
+  $("#{table_id}").DataTable();
+</script>"""
+
+
+def visualize(pcoll, dynamical_plotting_interval=None):
 
 Review comment:
   How does one stop visualizing a PCollection?
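Per the tests quoted in a later comment in this thread, visualize() with a dynamical_plotting_interval returns a timeloop.Timeloop handle, and the caller stops plotting with handle.stop(). A stdlib-only sketch of that handle pattern (PlottingHandle is a hypothetical stand-in, not Beam or timeloop code):

```python
import threading


class PlottingHandle(object):
  """Calls update_fn every `interval` seconds until stop() is invoked."""

  def __init__(self, update_fn, interval):
    self._update_fn = update_fn
    self._interval = interval
    self._stopped = threading.Event()
    self._thread = threading.Thread(target=self._run)
    self._thread.daemon = True
    self._thread.start()

  def _run(self):
    # Event.wait doubles as an interruptible sleep between updates.
    while not self._stopped.wait(self._interval):
      self._update_fn()

  def stop(self):
    self._stopped.set()
    self._thread.join()


# Usage mirroring the API shape under review:
updates = []
handle = PlottingHandle(lambda: updates.append(1), interval=0.01)
# ... the visualization keeps refreshing in the background ...
handle.stop()  # no further updates after this returns
```

Returning a handle keeps the notebook cell non-blocking while still giving the user an explicit way to end the refresh loop.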
 



Issue Time Tracking
---

Worklog Id: (was: 330648)
Time Spent: 4h  (was: 3h 50m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330647=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330647
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 16:59
Start Date: 18/Oct/19 16:59
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9741: 
[BEAM-7926] Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336586086
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from facets_overview.generic_feature_statistics_generator import 
GenericFeatureStatisticsGenerator
+from IPython.core.display import HTML
+from IPython.core.display import Javascript
+from IPython.core.display import display
+from IPython.core.display import display_javascript
+from IPython.core.display import update_display
+from pandas.io.json import json_normalize
+from timeloop import Timeloop
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support < Python 3.5. Work around with json for legacy tests.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-dive id="{display_id}" height="600"></facets-dive>
+<script>
+  document.querySelector("#{display_id}").data = {jsonstr};
+</script>"""
+_OVERVIEW_SCRIPT_TEMPLATE = """
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+  """
+_OVERVIEW_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-overview id="{display_id}"></facets-overview>
+<script>
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+</script>"""
+_DATAFRAME_PAGINATION_TEMPLATE = """
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js"></script>
+<script src="https://cdn.datatables.net/1.10.16/js/jquery.dataTables.js"></script>
+<link rel="stylesheet" href="https://cdn.datatables.net/1.10.16/css/jquery.dataTables.css">
+{dataframe_html}
+<script>
+  $("#{table_id}").DataTable();
+</script>"""
+
+
+def visualize(pcoll, dynamical_plotting_interval=None):
+  """Visualizes the data of a given PCollection. Optionally enables dynamical
+  plotting with interval in seconds if the PCollection is being produced by a
+  running pipeline or the pipeline is streaming indefinitely. The function
+  always returns immediately and is asynchronous when dynamical plotting is on.
+
+  If dynamical plotting is enabled, the visualization is updated continuously
+  until the pipeline producing the PCollection is in an end state. The
+  visualization is anchored to the notebook cell output area. The function
+  asynchronously returns a handle to the visualization job immediately. The
+  user could manually do::
+
+# In one notebook cell, enable dynamical plotting every 1 second:
+handle = visualize(pcoll, dynamical_plotting_interval=1)
+# Visualization anchored to the 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330646=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330646
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 16:57
Start Date: 18/Oct/19 16:57
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9741: 
[BEAM-7926] Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336585148
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from facets_overview.generic_feature_statistics_generator import 
GenericFeatureStatisticsGenerator
+from IPython.core.display import HTML
+from IPython.core.display import Javascript
+from IPython.core.display import display
+from IPython.core.display import display_javascript
+from IPython.core.display import update_display
+from pandas.io.json import json_normalize
+from timeloop import Timeloop
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support < Python 3.5. Work around with json for legacy tests.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-dive id="{display_id}" height="600"></facets-dive>
+<script>
+  document.querySelector("#{display_id}").data = {jsonstr};
+</script>"""
+_OVERVIEW_SCRIPT_TEMPLATE = """
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+  """
+_OVERVIEW_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-overview id="{display_id}"></facets-overview>
+<script>
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+</script>"""
+_DATAFRAME_PAGINATION_TEMPLATE = """
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js"></script>
+<script src="https://cdn.datatables.net/1.10.16/js/jquery.dataTables.js"></script>
+<link rel="stylesheet" href="https://cdn.datatables.net/1.10.16/css/jquery.dataTables.css">
+{dataframe_html}
+<script>
+  $("#{table_id}").DataTable();
+</script>"""
+
+
+def visualize(pcoll, dynamical_plotting_interval=None):
+  """Visualizes the data of a given PCollection. Optionally enables dynamical
+  plotting with interval in seconds if the PCollection is being produced by a
+  running pipeline or the pipeline is streaming indefinitely. The function
+  always returns immediately and is asynchronous when dynamical plotting is on.
+
+  If dynamical plotting is enabled, the visualization is updated continuously
+  until the pipeline producing the PCollection is in an end state. The
+  visualization is anchored to the notebook cell output area. The function
+  asynchronously returns a handle to the visualization job immediately. The
+  user could manually do::
+
+# In one notebook cell, enable dynamical plotting every 1 second:
+handle = visualize(pcoll, dynamical_plotting_interval=1)
+# Visualization anchored to the 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330645=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330645
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 16:54
Start Date: 18/Oct/19 16:54
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9741: 
[BEAM-7926] Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336583985
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
+from IPython.core.display import HTML
+from IPython.core.display import Javascript
+from IPython.core.display import display
+from IPython.core.display import display_javascript
+from IPython.core.display import update_display
+from pandas.io.json import json_normalize
+from timeloop import Timeloop
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support < Python 3.5. Work around with json for legacy tests.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-dive id="{display_id}"></facets-dive>
+<script>
+  document.querySelector("#{display_id}").data = {jsonstr};
+</script>"""
+_OVERVIEW_SCRIPT_TEMPLATE = """
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+  """
+_OVERVIEW_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-overview id="{display_id}"></facets-overview>
+<script>
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+</script>"""
+_DATAFRAME_PAGINATION_TEMPLATE = """
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js"></script>
+<script src="https://cdn.datatables.net/1.10.16/js/jquery.dataTables.js"></script>
+<link rel="stylesheet" href="https://cdn.datatables.net/1.10.16/css/jquery.dataTables.css">
+{dataframe_html}
+<script>
+  $("#{table_id}").DataTable();
+</script>"""
+
+
+def visualize(pcoll, dynamical_plotting_interval=None):
+  """Visualizes the data of a given PCollection. Optionally enables dynamical
+  plotting with interval in seconds if the PCollection is being produced by a
+  running pipeline or the pipeline is streaming indefinitely. The function
+  always returns immediately and is asynchronous when dynamical plotting is on.
+
+  If dynamical plotting is enabled, the visualization is updated continuously
+  until the pipeline producing the PCollection is in an end state. The
+  visualization would be anchored to the notebook cell output area. The
+  function asynchronously returns a handle to the visualization job
+  immediately. The user could manually do::
+
+# In one notebook cell, enable dynamical plotting every 1 second:
+handle = visualize(pcoll, dynamical_plotting_interval=1)
+# Visualization anchored to the 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330642=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330642
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 16:52
Start Date: 18/Oct/19 16:52
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9741: 
[BEAM-7926] Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336583208
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,258 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
+from IPython.core.display import HTML
+from IPython.core.display import Javascript
+from IPython.core.display import display
+from IPython.core.display import display_javascript
+from IPython.core.display import update_display
+from pandas.io.json import json_normalize
+from timeloop import Timeloop
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support < Python 3.5. Work around with json for legacy tests.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-dive id="{display_id}"></facets-dive>
+<script>
+  document.querySelector("#{display_id}").data = {jsonstr};
+</script>"""
+_OVERVIEW_SCRIPT_TEMPLATE = """
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+  """
+_OVERVIEW_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
+<link rel="import" href="https://raw.githubusercontent.com/PAIR-code/facets/1.0.0/facets-dist/facets-jupyter.html">
+<facets-overview id="{display_id}"></facets-overview>
+<script>
+  document.querySelector("#{display_id}").protoInput = "{protostr}";
+</script>"""
+_DATAFRAME_PAGINATION_TEMPLATE = """
+<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.2.2/jquery.min.js"></script>
+<script src="https://cdn.datatables.net/1.10.16/js/jquery.dataTables.js"></script>
+<link rel="stylesheet" href="https://cdn.datatables.net/1.10.16/css/jquery.dataTables.css">
+{dataframe_html}
+<script>
+  $("#{table_id}").DataTable();
+</script>"""
+
+
+def visualize(pcoll, dynamical_plotting_interval=None):
+  """Visualizes the data of a given PCollection. Optionally enables dynamical
+  plotting with interval in seconds if the PCollection is being produced by a
+  running pipeline or the pipeline is streaming indefinitely. The function
+  always returns immediately and is asynchronous when dynamical plotting is on.
+
+  If dynamical plotting is enabled, the visualization is updated continuously
+  until the pipeline producing the PCollection is in an end state. The
+  visualization would be anchored to the notebook cell output area. The
+  function asynchronously returns a handle to the visualization job
+  immediately. The user could manually do::
+
+# In one notebook cell, enable dynamical plotting every 1 second:
+handle = visualize(pcoll, dynamical_plotting_interval=1)
 
 Review comment:
   Please change 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330639=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330639
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 16:50
Start Date: 18/Oct/19 16:50
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9741: 
[BEAM-7926] Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336582418
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -0,0 +1,133 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.display.pcoll_visualization."""
+from __future__ import absolute_import
+
+import sys
+import time
+import unittest
+
+import timeloop
+
+import apache_beam as beam
+from apache_beam.runners import runner
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.display import pcoll_visualization as pv
+
+# Work around nose tests using Python2 without unittest.mock module.
+try:
+  from unittest.mock import patch
+except ImportError:
+  from mock import patch
+
+
+class PCollVisualizationTest(unittest.TestCase):
+
+  def setUp(self):
+self._p = beam.Pipeline()
+# pylint: disable=range-builtin-not-iterating
+self._pcoll = self._p | 'Create' >> beam.Create(range(1000))
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  def test_raise_error_for_non_pcoll_input(self):
+class Foo(object):
+  pass
+
+with self.assertRaises(ValueError) as ctx:
+  pv.PCollVisualization(Foo())
+  self.assertTrue('pcoll should be apache_beam.pvalue.PCollection' in
+  ctx.exception)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
 
 Review comment:
   Is there a way to do this check only once? Maybe in the module's "if 
__name__ == "main""?
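
One way to perform the version check only once, as asked above, is to put the `skipIf` on the TestCase class itself so every test method inherits it. A sketch (class and test names are illustrative):

```python
import sys
import unittest

# Applying skipIf at class level runs the version check once and skips
# every test method in the class on unsupported interpreters.
@unittest.skipIf(sys.version_info < (3, 5, 3),
                 'PCollVisualization is not supported on Python 2.')
class PCollVisualizationTest(unittest.TestCase):

    def test_placeholder(self):
        self.assertTrue(True)

# Run the suite programmatically just to show the decorator in action.
result = unittest.TestResult()
unittest.TestLoader().loadTestsFromTestCase(PCollVisualizationTest).run(result)
print(result.wasSuccessful())  # -> True
```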
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330639)
Time Spent: 3h 10m  (was: 3h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=330638=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330638
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 18/Oct/19 16:48
Start Date: 18/Oct/19 16:48
Worklog Time Spent: 10m 
  Work Description: rohdesamuel commented on pull request #9741: 
[BEAM-7926] Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r336581554
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/interactive_environment.py
 ##
 @@ -105,3 +113,33 @@ def set_cache_manager(self, cache_manager):
   def cache_manager(self):
 """Gets the cache manager held by current Interactive Environment."""
 return self._cache_manager
+
+  def set_pipeline_result(self, pipeline, result):
 
 Review comment:
   Why do we need this method? It looks like it's only being used in tests.
 



Issue Time Tracking
---

Worklog Id: (was: 330638)
Time Spent: 3h  (was: 2h 50m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)





[jira] [Work logged] (BEAM-4132) Element type inference doesn't work for multi-output DoFns

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4132?focusedWorklogId=330637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330637
 ]

ASF GitHub Bot logged work on BEAM-4132:


Author: ASF GitHub Bot
Created on: 18/Oct/19 16:46
Start Date: 18/Oct/19 16:46
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9810: [BEAM-4132] 
Support multi-output type inference
URL: https://github.com/apache/beam/pull/9810#discussion_r336580986
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -571,6 +573,10 @@ def _infer_result_type(self, transform, inputs, 
result_pcollection):
   else:
 result_pcollection.element_type = transform.infer_output_type(
 input_element_type)
+elif isinstance(result_pcollection, pvalue.DoOutputsTuple):
+  # Single-input, multi-output inference.
+  for pcoll in result_pcollection:
+self._infer_result_type(transform, inputs, pcoll)
 
 Review comment:
   I do not believe it does, I missed this while reviewing.
   
   I have a more fundamental question. Is it possible to infer types for 
multiple outputs without explicit typehints from the user? DoFn will choose 
their taggedoutput dynamically based on the element. And I am not sure how we 
can infer different output types only by knowing the input type and all list of 
possible return types from a function. In other words, is it possible to infer 
output types (without explicit hints) to be anything other than union of all 
possible output types from that transform?
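
The union point above can be made concrete: because tags are chosen per element at runtime, a static view of the DoFn only sees a single return annotation, so each tagged output can at best be assumed to carry the union of all yielded types. A self-contained sketch (`TaggedOutput` here is a stand-in, not the real `beam.pvalue.TaggedOutput`):

```python
from typing import Iterator, Union

class TaggedOutput(object):
    # Stand-in for beam.pvalue.TaggedOutput, for illustration only.
    def __init__(self, tag, value):
        self.tag = tag
        self.value = value

def process(elem):
    # type: (int) -> Iterator[Union[int, TaggedOutput]]
    # The tag choice depends on the element's value, so it is invisible
    # to static inference; only the union above is recoverable.
    yield elem
    if elem % 2 == 0:
        yield TaggedOutput('ten_times', elem * 10)

outputs = {}
for out in process(4):
    if isinstance(out, TaggedOutput):
        outputs.setdefault(out.tag, []).append(out.value)
    else:
        outputs.setdefault('main', []).append(out)

print(outputs)  # -> {'main': [4], 'ten_times': [40]}
```

Per-tag element types narrower than this union would require explicit user-supplied hints.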
 



Issue Time Tracking
---

Worklog Id: (was: 330637)
Time Spent: 50m  (was: 40m)

> Element type inference doesn't work for multi-output DoFns
> --
>
> Key: BEAM-4132
> URL: https://issues.apache.org/jira/browse/BEAM-4132
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.4.0
>Reporter: Chuan Yu Foo
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> TLDR: if you have a multi-output DoFn, then the non-main PCollections will 
> incorrectly have their element types set to None. This affects type checking 
> for pipelines involving these PCollections.
> Minimal example:
> {code}
> import apache_beam as beam
> class TripleDoFn(beam.DoFn):
>   def process(self, elem):
> yield elem
> if elem % 2 == 0:
>   yield beam.pvalue.TaggedOutput('ten_times', elem * 10)
> if elem % 3 == 0:
>   yield beam.pvalue.TaggedOutput('hundred_times', elem * 100)
>   
> @beam.typehints.with_input_types(int)
> @beam.typehints.with_output_types(int)
> class MultiplyBy(beam.DoFn):
>   def __init__(self, multiplier):
> self._multiplier = multiplier
>   def process(self, elem):
> return elem * self._multiplier
>   
> def main():
>   with beam.Pipeline() as p:
> x, a, b = (
>   p
>   | 'Create' >> beam.Create([1, 2, 3])
>   | 'TripleDo' >> beam.ParDo(TripleDoFn()).with_outputs(
> 'ten_times', 'hundred_times', main='main_output'))
> _ = a | 'MultiplyBy2' >> beam.ParDo(MultiplyBy(2))
> if __name__ == '__main__':
>   main()
> {code}
> Running this yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires  but got None for elem
> {noformat}
> Replacing {{a}} with {{b}} yields the same error. Replacing {{a}} with {{x}} 
> instead yields the following error:
> {noformat}
> apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 
> 'MultiplyBy2': requires  but got Union[TaggedOutput, int] for elem
> {noformat}
> I would expect Beam to correctly infer that {{a}} and {{b}} have element 
> types of {{int}} rather than {{None}}, and I would also expect Beam to 
> correctly figure out that the element types of {{x}} are compatible with 
> {{int}}.





[jira] [Work logged] (BEAM-8432) Parametrize source & target compatibility for beam Java modules

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8432?focusedWorklogId=330636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330636
 ]

ASF GitHub Bot logged work on BEAM-8432:


Author: ASF GitHub Bot
Created on: 18/Oct/19 16:45
Start Date: 18/Oct/19 16:45
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #9830: [BEAM-8432] 
Move javaVersion to gradle.properties
URL: https://github.com/apache/beam/pull/9830
 
 
   This is done in order to be able to change this parameter easily using the 
command line, eg:
   
   ./gradlew clean build -PjavaVersion=11
   
   It's needed for providing java 11 support.
   
   @kennknowles could you take a look?
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   

[jira] [Created] (BEAM-8432) Parametrize source & target compatibility for beam Java modules

2019-10-18 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8432:
---

 Summary: Parametrize source & target compatibility for beam Java 
modules
 Key: BEAM-8432
 URL: https://issues.apache.org/jira/browse/BEAM-8432
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Reporter: Lukasz Gajowy
Assignee: Lukasz Gajowy
 Fix For: Not applicable


Currently, "javaVersion" property is hardcoded in BeamModulePlugin in 
[JavaNatureConfiguration|https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L82].

For the sake of migrating the project to Java 11 we could use a mechanism that 
will allow parametrizing the version from the command line, e.g:
{code:java}
// this could set source and target compatibility to 11:

./gradlew clean build -PjavaVersion=11{code}
 





[jira] [Work logged] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?focusedWorklogId=330597=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330597
 ]

ASF GitHub Bot logged work on BEAM-5690:


Author: ASF GitHub Bot
Created on: 18/Oct/19 16:13
Start Date: 18/Oct/19 16:13
Worklog Time Spent: 10m 
  Work Description: bmv126 commented on issue #9567: [BEAM-5690] Fix Zero 
value issue with GroupByKey/CountByKey in SparkRunner
URL: https://github.com/apache/beam/pull/9567#issuecomment-543815179
 
 
   Run Java PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 330597)
Time Spent: 3h  (was: 2h 50m)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}





[jira] [Work logged] (BEAM-8430) sdk_worker_parallelism default is inconsistent between Py and Java SDKs

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8430?focusedWorklogId=330589=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330589
 ]

ASF GitHub Bot logged work on BEAM-8430:


Author: ASF GitHub Bot
Created on: 18/Oct/19 15:55
Start Date: 18/Oct/19 15:55
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9829: [BEAM-8430] Change py 
default sdk_worker_parallelism to 1
URL: https://github.com/apache/beam/pull/9829#issuecomment-543808313
 
 
   Run Python2_PVR_Flink PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 330589)
Time Spent: 20m  (was: 10m)

> sdk_worker_parallelism default is inconsistent between Py and Java SDKs
> ---
>
> Key: BEAM-8430
> URL: https://issues.apache.org/jira/browse/BEAM-8430
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Default is currently 1 in Java, 0 in Python.
> https://github.com/apache/beam/blob/7b67a926b8939ede8f2e33c85579b540d18afccf/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L73
> https://github.com/apache/beam/blob/7b67a926b8939ede8f2e33c85579b540d18afccf/sdks/python/apache_beam/options/pipeline_options.py#L848





[jira] [Work logged] (BEAM-8348) Portable Python job name hard-coded to "job"

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8348?focusedWorklogId=330581=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330581
 ]

ASF GitHub Bot logged work on BEAM-8348:


Author: ASF GitHub Bot
Created on: 18/Oct/19 15:47
Start Date: 18/Oct/19 15:47
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9789: [BEAM-8348] set 
job_name in portable_runner.py job request
URL: https://github.com/apache/beam/pull/9789#discussion_r336557297
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/portable_runner.py
 ##
 @@ -296,7 +297,8 @@ def add_runner_options(parser):
 
 prepare_response = job_service.Prepare(
 beam_job_api_pb2.PrepareJobRequest(
-job_name='job', pipeline=proto_pipeline,
+job_name=options.view_as(GoogleCloudOptions).job_name or 'job',
 
 Review comment:
   It's a good idea. Unfortunately, it looks like the override of `__setattr__` 
would take precedence over the property's setter. I'm not too familiar with how 
all these Python internals work, though, so I could be missing something 
obvious?
   
   
https://github.com/apache/beam/blob/16b4afa4469ee2d2fd5a5f4aafd77a156ee14e7b/sdks/python/apache_beam/options/pipeline_options.py#L336
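
That precedence can be checked in a few lines: a `__setattr__` defined on the class intercepts every attribute assignment before the descriptor protocol would invoke a property's setter, which matches how `PipelineOptions.__setattr__` routes values into its flags dict. A sketch (class and attribute names are hypothetical, not the real `PipelineOptions` internals):

```python
class FakeOptions(object):
    def __init__(self):
        # Write through __dict__ to bypass our own __setattr__.
        self.__dict__['_flags'] = {}

    def __setattr__(self, name, value):
        # Intercepts *every* assignment, like PipelineOptions does.
        self._flags[name] = value

    @property
    def job_name(self):
        return self._flags.get('job_name')

    @job_name.setter
    def job_name(self, value):
        # Never reached: __setattr__ wins over the descriptor's __set__.
        self._flags['job_name'] = value.lower()

opts = FakeOptions()
opts.job_name = 'MyJob'
print(opts.job_name)  # -> MyJob, not 'myjob': the setter was bypassed
```

Reads still go through the property's getter, because `__getattribute__` was not overridden; only writes are diverted.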
 



Issue Time Tracking
---

Worklog Id: (was: 330581)
Time Spent: 2h 20m  (was: 2h 10m)

> Portable Python job name hard-coded to "job"
> 
>
> Key: BEAM-8348
> URL: https://issues.apache.org/jira/browse/BEAM-8348
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> See [1]. `job_name` is already taken by Google Cloud options [2], so I guess 
> we should create a new option (maybe `portable_job_name` to avoid disruption).
> [[1] 
> https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294|https://github.com/apache/beam/blob/55588e91ed8e3e25bb661a6202c31e99297e0e79/sdks/python/apache_beam/runners/portability/portable_runner.py#L294]
> [2] 
> [https://github.com/apache/beam/blob/c5bbb51014f7506a2651d6070f27fb3c3dc0da8f/sdks/python/apache_beam/options/pipeline_options.py#L438]
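The suggested fix - declaring a separate option instead of reusing the Google 
Cloud one - can be sketched with plain argparse (Beam's PipelineOptions classes 
ultimately register flags on an argparse parser; `portable_job_name` is only 
the name floated in this issue, not a merged option).

```python
import argparse

def add_portable_options(parser):
    # "--portable_job_name" is the hypothetical flag suggested above,
    # keeping the currently hard-coded value "job" as the default.
    parser.add_argument(
        "--portable_job_name",
        default="job",
        help="Job name sent in the PrepareJobRequest for portable runners.")

parser = argparse.ArgumentParser()
add_portable_options(parser)
args = parser.parse_args(["--portable_job_name", "wordcount"])
print(args.portable_job_name)   # -> wordcount
```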



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8382) Add polling interval to KinesisIO.Read

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8382?focusedWorklogId=330578=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330578
 ]

ASF GitHub Bot logged work on BEAM-8382:


Author: ASF GitHub Bot
Created on: 18/Oct/19 15:40
Start Date: 18/Oct/19 15:40
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9765: [BEAM-8382] 
Add polling interval to KinesisIO.Read
URL: https://github.com/apache/beam/pull/9765#issuecomment-543801925
 
 
   @jfarr Regarding the backoff strategy - I think it's a good approach when 
your backend doesn't provide any statistics in advance about the rate at which 
you should perform subsequent queries (we don't know about other consumers of 
the same shard). We only know the maximum limits and the result of the last 
`kinesis.getRecords()` execution. So, starting with the recommended default 
value and using a proper backoff strategy in case of failures could be an 
optimal solution when you need to limit the rate of requests. Otherwise, there 
is a big chance that backend resources won't be utilised efficiently. 
`FluentBackoff` is used in many Beam IOs for similar reasons.
   
   I'd like to ask you some questions about your use case. Let's suppose we add 
the ability to set a static polling interval with 
`withPollingInterval(Duration)` (as implemented in the current PR). What would 
the default value be in your case? 1 sec? What would you do if that turned out 
not to be enough for an already running pipeline? Would you need to change the 
value and restart the pipeline manually?
   
   Btw, a default delay of 1 sec looks quite pessimistic, since according to 
the AWS docs `each shard can support up to five read transactions per second.` 
So, in the case of one consumer per shard (which I'd guess is quite common), 
200 ms should be the default value. 
   
   PS: The goal of minimizing the number of tuning knobs is to reduce the 
number of possible config combinations (which grows exponentially) whenever 
it's possible to configure the behavior automatically [1]. Where that isn't 
possible, the better solution is to provide a flexible API, especially for 
unbounded sources (such as Kinesis, Kafka, etc.) that are used in long-running 
pipelines.  
   
   [1] https://beam.apache.org/contribute/ptransform-style-guide/#configuration
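   The strategy described above - start at a recommended default rate and back 
off only on failures - can be sketched as follows (plain Python for 
illustration; KinesisIO itself is Java and uses Beam's `FluentBackoff`, and 
`ThrottlingError` here is a stand-in for the real throttling exception).

```python
import random
import time

class ThrottlingError(Exception):
    """Stand-in for ReadProvisionedThroughputExceeded."""

def poll_records(get_records, base_delay=0.2, max_delay=10.0):
    """Yield batches from get_records(), backing off on throttling.

    A base_delay of 200 ms reflects the AWS limit of five read
    transactions per shard per second for a single consumer.
    """
    delay = base_delay
    while True:
        try:
            batch = get_records()
        except ThrottlingError:
            # Failure: double the delay (with jitter), capped at max_delay.
            delay = min(delay * 2, max_delay)
            time.sleep(delay * random.uniform(0.5, 1.0))
            continue
        delay = base_delay              # success: return to the default rate
        yield batch
        time.sleep(delay)
```

   On success the delay snaps back to the default, so a transient throttling 
burst doesn't permanently slow the reader down.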
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330578)
Time Spent: 2h 10m  (was: 2h)

> Add polling interval to KinesisIO.Read
> --
>
> Key: BEAM-8382
> URL: https://issues.apache.org/jira/browse/BEAM-8382
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-kinesis
>Affects Versions: 2.13.0, 2.14.0, 2.15.0
>Reporter: Jonothan Farr
>Assignee: Jonothan Farr
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> With the current implementation we are observing Kinesis throttling due to 
> ReadProvisionedThroughputExceeded on the order of hundreds of times per 
> second, regardless of the actual Kinesis throughput. This is because the 
> ShardReadersPool readLoop() method is polling getRecords() as fast as 
> possible.
> From the KDS documentation:
> {quote}Each shard can support up to five read transactions per second.
> {quote}
> and
> {quote}For best results, sleep for at least 1 second (1,000 milliseconds) 
> between calls to getRecords to avoid exceeding the limit on getRecords 
> frequency.
> {quote}
> [https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html]
> [https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-sdk.html]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=330565=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330565
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 18/Oct/19 15:20
Start Date: 18/Oct/19 15:20
Worklog Time Spent: 10m 
  Work Description: zfraa commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r336525227
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -126,22 +145,49 @@ public static void deleteDataset() throws Exception {
   }
 
   /**
-   * Test that HLL++ sketch computed in BigQuery can be processed by Beam. Hll 
sketch is computed by
-   * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies 
that we can run {@link
-   * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam 
to get the correct
-   * estimated count.
+   * Test that non-empty HLL++ sketch computed in BigQuery can be processed by 
Beam.
+   *
+   * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read 
into Beam; the test
+   * verifies that we can run {@link HllCount.MergePartial} and {@link 
HllCount.Extract} on the
+   * sketch in Beam to get the correct estimated count.
+   */
+  @Test
+  public void testReadNonEmptySketchFromBigQuery() {
+readSketchFromBigQuery(DATA_TABLE_ID_NON_EMPTY, EXPECTED_COUNT_NON_EMPTY);
+  }
+
+  /**
+   * Test that empty HLL++ sketch computed in BigQuery can be processed by 
Beam.
 
 Review comment:
   (same here: "Test that an empty...", "The HLL sketch...")
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330565)
Time Spent: 35h 20m  (was: 35h 10m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 35h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=330563=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330563
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 18/Oct/19 15:20
Start Date: 18/Oct/19 15:20
Worklog Time Spent: 10m 
  Work Description: zfraa commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r336535846
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -96,28 +106,37 @@
   public static void prepareDatasetAndDataTable() throws Exception {
 BIGQUERY_CLIENT.createNewDataset(PROJECT_ID, DATASET_ID);
 
-// Create Data Table
 TableSchema dataTableSchema =
 new TableSchema()
 .setFields(
 Collections.singletonList(
 new 
TableFieldSchema().setName(DATA_FIELD_NAME).setType(DATA_FIELD_TYPE)));
-Table dataTable =
+
+Table dataTableNonEmpty =
 new Table()
 .setSchema(dataTableSchema)
 .setTableReference(
 new TableReference()
 .setProjectId(PROJECT_ID)
 .setDatasetId(DATASET_ID)
-.setTableId(DATA_TABLE_ID));
-BIGQUERY_CLIENT.createNewTable(PROJECT_ID, DATASET_ID, dataTable);
-
+.setTableId(DATA_TABLE_ID_NON_EMPTY));
+BIGQUERY_CLIENT.createNewTable(PROJECT_ID, DATASET_ID, dataTableNonEmpty);
 // Prepopulate test data to Data Table
 
 Review comment:
   "Prepopulate data tables with test data"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330563)
Time Spent: 35h 10m  (was: 35h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 35h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=330564=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330564
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 18/Oct/19 15:20
Start Date: 18/Oct/19 15:20
Worklog Time Spent: 10m 
  Work Description: zfraa commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r336534833
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -96,28 +106,37 @@
   public static void prepareDatasetAndDataTable() throws Exception {
 
 Review comment:
   Maybe rename to "...AndDataTables()"?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330564)
Time Spent: 35h 20m  (was: 35h 10m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 35h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=330566=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330566
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 18/Oct/19 15:20
Start Date: 18/Oct/19 15:20
Worklog Time Spent: 10m 
  Work Description: zfraa commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r336524398
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -126,22 +145,49 @@ public static void deleteDataset() throws Exception {
   }
 
   /**
-   * Test that HLL++ sketch computed in BigQuery can be processed by Beam. Hll 
sketch is computed by
-   * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies 
that we can run {@link
-   * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam 
to get the correct
-   * estimated count.
+   * Test that non-empty HLL++ sketch computed in BigQuery can be processed by 
Beam.
+   *
+   * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read 
into Beam; the test
 
 Review comment:
   Nit: "Test that a HLL++ sketch...", and "The HLL sketch is computed...". 
Otherwise, LGTM!
   Also, all Javadoc should be in third person ("Tests that..." instead of 
"Test that"; see 
https://www.oracle.com/technetwork/articles/java/index-137868.html, "Use 3rd 
person..."). Sorry that I missed this in the first version of this code! 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330566)
Time Spent: 35.5h  (was: 35h 20m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 35.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=330561=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330561
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 18/Oct/19 15:20
Start Date: 18/Oct/19 15:20
Worklog Time Spent: 10m 
  Work Description: zfraa commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r336542489
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -65,23 +66,32 @@
  private static final List<String> TEST_DATA =
   Arrays.asList("Apple", "Orange", "Banana", "Orange");
 
-  // Data Table: used by testReadSketchFromBigQuery())
+  // Data Table: used by tests reading sketches from BigQuery
   // Schema: only one STRING field named "data".
-  // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
   private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
-  private static final Long EXPECTED_COUNT = 3L;
 
-  // Sketch Table: used by testWriteSketchToBigQuery()
+  // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
+  private static final String DATA_TABLE_ID_NON_EMPTY = "hll_data_non_empty";
+  private static final Long EXPECTED_COUNT_NON_EMPTY = 3L;
+
+  // Content: empty
+  private static final String DATA_TABLE_ID_EMPTY = "hll_data_empty";
 
 Review comment:
   Does the aggregation (HLL_COUNT.INIT) over an empty table return a NULL 
sketch, as expected? I.e., did the test fail before you modified 
'parseQueryResultToByteArray' to deal with NULLs? I think it should (according 
to 
https://plx.corp.google.com/scripts2/script_5d._6fe78c__2144_8a38_883d24fc4a60,
 last SELECT), but double-checking. 
   I wish we could add an assert, but that doesn't work with the utility 
method. 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330561)
Time Spent: 34h 50m  (was: 34h 40m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 34h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=330562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330562
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 18/Oct/19 15:20
Start Date: 18/Oct/19 15:20
Worklog Time Spent: 10m 
  Work Description: zfraa commented on pull request #9778: [BEAM-7013] 
Update BigQueryHllSketchCompatibilityIT to cover empty sketch cases
URL: https://github.com/apache/beam/pull/9778#discussion_r336529565
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -126,22 +145,49 @@ public static void deleteDataset() throws Exception {
   }
 
   /**
-   * Test that HLL++ sketch computed in BigQuery can be processed by Beam. Hll 
sketch is computed by
-   * {@code HLL_COUNT.INIT} in BigQuery and read into Beam; the test verifies 
that we can run {@link
-   * HllCount.MergePartial} and {@link HllCount.Extract} on the sketch in Beam 
to get the correct
-   * estimated count.
+   * Test that non-empty HLL++ sketch computed in BigQuery can be processed by 
Beam.
+   *
+   * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read 
into Beam; the test
+   * verifies that we can run {@link HllCount.MergePartial} and {@link 
HllCount.Extract} on the
+   * sketch in Beam to get the correct estimated count.
+   */
+  @Test
+  public void testReadNonEmptySketchFromBigQuery() {
+readSketchFromBigQuery(DATA_TABLE_ID_NON_EMPTY, EXPECTED_COUNT_NON_EMPTY);
+  }
+
+  /**
+   * Test that empty HLL++ sketch computed in BigQuery can be processed by 
Beam.
+   *
+   * Hll sketch is computed by {@code HLL_COUNT.INIT} in BigQuery and read 
into Beam; the test
+   * verifies that we can run {@link HllCount.MergePartial} and {@link 
HllCount.Extract} on the
+   * sketch in Beam to get the correct estimated count.
*/
   @Test
-  public void testReadSketchFromBigQuery() {
-String tableSpec = String.format("%s.%s", DATASET_ID, DATA_TABLE_ID);
+  public void testReadEmptySketchFromBigQuery() {
+readSketchFromBigQuery(DATA_TABLE_ID_EMPTY, EXPECTED_COUNT_EMPTY);
+  }
+
+  private void readSketchFromBigQuery(String tableId, Long expectedCount) {
+String tableSpec = String.format("%s.%s", DATASET_ID, tableId);
 String query =
 String.format(
 "SELECT HLL_COUNT.INIT(%s) AS %s FROM %s",
 DATA_FIELD_NAME, QUERY_RESULT_FIELD_NAME, tableSpec);
+
 SerializableFunction<SchemaAndRecord, byte[]> parseQueryResultToByteArray =
-(SchemaAndRecord schemaAndRecord) ->
-// BigQuery BYTES type corresponds to Java java.nio.ByteBuffer type
-((ByteBuffer) 
schemaAndRecord.getRecord().get(QUERY_RESULT_FIELD_NAME)).array();
+input -> {
+  // BigQuery BYTES type corresponds to Java java.nio.ByteBuffer type
+  ByteBuffer sketch = (ByteBuffer) 
input.getRecord().get(QUERY_RESULT_FIELD_NAME);
+  if (sketch == null) {
+// Empty sketch is represented by null in BigQuery and by empty 
byte array in Beam
+return new byte[0];
+  } else {
+byte[] result = new byte[sketch.remaining()];
 
 Review comment:
   why not 
   `return sketch.array()`
   as previously? Since we can't be 100% sure that the ByteBuffer is backed by 
an accessible array?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330562)
Time Spent: 35h  (was: 34h 50m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 35h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8431) Option job_name should be in StandardOptions

2019-10-18 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8431:
--
Status: Open  (was: Triage Needed)

> Option job_name should be in StandardOptions
> 
>
> Key: BEAM-8431
> URL: https://issues.apache.org/jira/browse/BEAM-8431
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>
> jobName is used by multiple runners on the Java side of things, so it should 
> not be limited to GoogleCloudOptions in Python.
> https://github.com/apache/beam/blob/16b4afa4469ee2d2fd5a5f4aafd77a156ee14e7b/sdks/python/apache_beam/options/pipeline_options.py#L438



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8430) sdk_worker_parallelism default is inconsistent between Py and Java SDKs

2019-10-18 Thread Kyle Weaver (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyle Weaver updated BEAM-8430:
--
Status: Open  (was: Triage Needed)

> sdk_worker_parallelism default is inconsistent between Py and Java SDKs
> ---
>
> Key: BEAM-8430
> URL: https://issues.apache.org/jira/browse/BEAM-8430
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core, sdk-py-core
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Default is currently 1 in Java, 0 in Python.
> https://github.com/apache/beam/blob/7b67a926b8939ede8f2e33c85579b540d18afccf/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L73
> https://github.com/apache/beam/blob/7b67a926b8939ede8f2e33c85579b540d18afccf/sdks/python/apache_beam/options/pipeline_options.py#L848



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8431) Option job_name should be in StandardOptions

2019-10-18 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8431:
-

 Summary: Option job_name should be in StandardOptions
 Key: BEAM-8431
 URL: https://issues.apache.org/jira/browse/BEAM-8431
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Kyle Weaver
Assignee: Kyle Weaver


jobName is used by multiple runners on the Java side of things, so it should 
not be limited to GoogleCloudOptions in Python.

https://github.com/apache/beam/blob/16b4afa4469ee2d2fd5a5f4aafd77a156ee14e7b/sdks/python/apache_beam/options/pipeline_options.py#L438



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8410) JdbcIO should support setConnectionInitSqls in its DataSource

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8410?focusedWorklogId=330539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330539
 ]

ASF GitHub Bot logged work on BEAM-8410:


Author: ASF GitHub Bot
Created on: 18/Oct/19 14:36
Start Date: 18/Oct/19 14:36
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9808: [BEAM-8410] 
JdbcIO should support setConnectionInitSqls in its DataSource
URL: https://github.com/apache/beam/pull/9808#issuecomment-543609627
 
 
   I think having a test for the failing case should be enough. Please do just 
two things before it is ready for merging:
   - The Jenkins job fails because of `spotless` - run `gradlew spotlessApply` 
before committing to fix it.
   - Squash the commits into one with an appropriate commit message.
   
   Hint: run `./gradlew -p path/to/changed/package check` every time before 
pushing to verify that the build is green.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330539)
Time Spent: 1h 20m  (was: 1h 10m)

> JdbcIO should support setConnectionInitSqls in its DataSource
> -
>
> Key: BEAM-8410
> URL: https://issues.apache.org/jira/browse/BEAM-8410
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Cam Mach
>Assignee: Cam Mach
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This property, connectionInitSqls, is very handy for anyone who uses MySQL or 
> MariaDB to set init SQL statements to be executed at connection time. 
> Note: it's not applicable across all databases.
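The idea - a fixed list of SQL statements run on every new connection - can be 
illustrated in Python with sqlite3 (a sketch of the pattern only; JdbcIO 
configures `connectionInitSqls` on its Java Commons DBCP `BasicDataSource`, and 
the function name here is made up for the example).

```python
import sqlite3

def connect_with_init(database, init_sqls):
    """Open a connection and run each init statement on it.

    Mirrors what connectionInitSqls does in Apache Commons DBCP;
    sqlite3 is used here only so the sketch is runnable.
    """
    conn = sqlite3.connect(database)
    for stmt in init_sqls:
        conn.execute(stmt)
    return conn

conn = connect_with_init(":memory:", ["PRAGMA foreign_keys = ON"])
print(conn.execute("PRAGMA foreign_keys").fetchone()[0])   # -> 1
```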



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8430) sdk_worker_parallelism default is inconsistent between Py and Java SDKs

2019-10-18 Thread Kyle Weaver (Jira)
Kyle Weaver created BEAM-8430:
-

 Summary: sdk_worker_parallelism default is inconsistent between Py 
and Java SDKs
 Key: BEAM-8430
 URL: https://issues.apache.org/jira/browse/BEAM-8430
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core, sdk-py-core
Reporter: Kyle Weaver
Assignee: Kyle Weaver


Default is currently 1 in Java, 0 in Python.

https://github.com/apache/beam/blob/7b67a926b8939ede8f2e33c85579b540d18afccf/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L73

https://github.com/apache/beam/blob/7b67a926b8939ede8f2e33c85579b540d18afccf/sdks/python/apache_beam/options/pipeline_options.py#L848



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?focusedWorklogId=330497=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330497
 ]

ASF GitHub Bot logged work on BEAM-5690:


Author: ASF GitHub Bot
Created on: 18/Oct/19 13:31
Start Date: 18/Oct/19 13:31
Worklog Time Spent: 10m 
  Work Description: bmv126 commented on issue #9567: [BEAM-5690] Fix Zero 
value issue with GroupByKey/CountByKey in SparkRunner
URL: https://github.com/apache/beam/pull/9567#issuecomment-543747540
 
 
   @echauchot I have addressed the review comment; could you take a look?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330497)
Time Spent: 2h 50m  (was: 2h 40m)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?focusedWorklogId=330496&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330496
 ]

ASF GitHub Bot logged work on BEAM-5690:


Author: ASF GitHub Bot
Created on: 18/Oct/19 13:29
Start Date: 18/Oct/19 13:29
Worklog Time Spent: 10m 
  Work Description: bmv126 commented on pull request #9567: [BEAM-5690] Fix 
Zero value issue with GroupByKey/CountByKey in SparkRunner
URL: https://github.com/apache/beam/pull/9567#discussion_r336490749
 
 

 ##
 File path: 
runners/spark/src/main/java/org/apache/beam/runners/spark/stateful/SparkGroupAlsoByWindowViaWindowSet.java
 ##
 @@ -338,6 +339,18 @@ public void outputWindowedValue(
           outputHolder.getWindowedValues();
 
       if (!outputs.isEmpty() || !stateInternals.getState().isEmpty()) {
+        Collection<TimerInternals.TimerData> filteredTimers =
+            timerInternals.getTimers().stream()
+                .filter(
+                    timer ->
+                        timer
+                            .getTimestamp()
+                            .plus(windowingStrategy.getAllowedLateness())
+                            .isBefore(timerInternals.currentInputWatermarkTime()))
+                .collect(Collectors.toList());
+
+        filteredTimers.forEach(timerInternals::deleteTimer);
 
 Review comment:
   I have moved the code to a method in LateDataUtils.
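The filtering in the quoted diff can be sketched as a small Python illustration (hypothetical names; the actual change is the Java code above, which deletes timers whose firing time plus the allowed lateness is already behind the input watermark):

```python
from collections import namedtuple

# Hypothetical stand-in for Beam's TimerInternals.TimerData, for illustration only.
Timer = namedtuple("Timer", ["name", "timestamp"])

def filter_expired_timers(timers, allowed_lateness, input_watermark):
    """Return the timers the diff would delete: those whose timestamp plus
    the allowed lateness is already before the input watermark."""
    return [t for t in timers
            if t.timestamp + allowed_lateness < input_watermark]

timers = [Timer("a", 100), Timer("b", 950)]
expired = filter_expired_timers(timers, allowed_lateness=50, input_watermark=200)
# only timer "a" is expired: 100 + 50 < 200, while 950 + 50 >= 200
```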
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330496)
Time Spent: 2h 40m  (was: 2.5h)

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to set up a pipeline using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-3718) ClassNotFoundException on CloudResourceManager$Builder

2019-10-18 Thread Luis (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954587#comment-16954587
 ] 

Luis commented on BEAM-3718:


It looks like this issue still happens on Beam SDK 2.16 with JVM 11.
{code:java}
java.lang.RuntimeException: Failed to construct instance from factory method 
DataflowRunner#fromOptions(interface 
org.apache.beam.sdk.options.PipelineOptions)
at 
org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:224)
at 
org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:155)
at 
org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:55)
at org.apache.beam.sdk.Pipeline.create(Pipeline.java:147)
at com.spotify.dbeam.jobs.JdbcAvroJob.create(JdbcAvroJob.java:70)
at com.spotify.dbeam.jobs.JdbcAvroJob.create(JdbcAvroJob.java:77)
at com.spotify.dbeam.jobs.JdbcAvroJob.create(JdbcAvroJob.java:83)
at com.spotify.dbeam.jobs.JdbcAvroJob.main(JdbcAvroJob.java:153)
Caused by: java.lang.reflect.InvocationTargetException
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:214)
... 7 more
Caused by: java.lang.IllegalArgumentException: Unable to use ClassLoader to 
detect classpath elements. Current ClassLoader is 
jdk.internal.loader.ClassLoaders$AppClassLoader@799f7e29, only URLClassLoaders 
are supported.
at 
org.apache.beam.runners.core.construction.PipelineResources.detectClassPathResourcesToStage(PipelineResources.java:58)
at 
org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:285)
... 12 more
 {code}

> ClassNotFoundException on CloudResourceManager$Builder
> --
>
> Key: BEAM-3718
> URL: https://issues.apache.org/jira/browse/BEAM-3718
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Reporter: Yunis Arif Said
>Priority: Trivial
>
> In a spring boot application running google cloud dataflow code. The dataflow 
> takes data from google PubSub, transform incoming data and output result to 
> bigquery for storage. The code does not have any syntax errors. The problem 
> is when the application is run, the following exception is thrown. 
>  
> {code:java}
>  Exception in thread "main" java.lang.RuntimeException: Failed to construct 
> instance from factory method DataflowRunner#fromOptions(interface 
> org.apache.beam.sdk.options.PipelineOptions)
>  at 
> org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:233)
>  at org.apache.beam.sdk.util.InstanceBuilder.build(InstanceBuilder.java:162)
>  at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:52)
>  at org.apache.beam.sdk.Pipeline.create(Pipeline.java:142)
>  at com.trackers.exlon.ExlonApplication.main(ExlonApplication.java:69)
>  
>  Caused by: java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> org.apache.beam.sdk.util.InstanceBuilder.buildFromMethod(InstanceBuilder.java:222)
>  ... 4 more
> Caused by: java.lang.NoClassDefFoundError: 
> com/google/api/services/cloudresourcemanager/CloudResourceManager$Builder
>  at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.newCloudResourceManagerClient(GcpOptions.java:369)
>  at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:240)
>  at 
> org.apache.beam.sdk.extensions.gcp.options.GcpOptions$GcpTempLocationFactory.create(GcpOptions.java:228)
>  at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.returnDefaultHelper(ProxyInvocationHandler.java:592)
>  at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.getDefault(ProxyInvocationHandler.java:533)
>  at 
> org.apache.beam.sdk.options.ProxyInvocationHandler.invoke(ProxyInvocationHandler.java:156)
>  at com.sun.proxy.$Proxy85.getGcpTempLocation(Unknown Source)
>  at 
> org.apache.beam.runners.dataflow.DataflowRunner.fromOptions(DataflowRunner.java:223)
>  ... 9 more
> Caused by: java.lang.ClassNotFoundException: 
> com.google.api.services.cloudresourcemanager.CloudResourceManager$Builder
>  at 

[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=330482&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330482
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 18/Oct/19 12:52
Start Date: 18/Oct/19 12:52
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer 
firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-543727429
 
 
   Run Java_Examples_Dataflow PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330482)
Time Spent: 15h 40m  (was: 15.5h)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager.extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.
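The out-of-order firing described above can be reproduced with a small Python simulation (illustrative names only, not Beam's actual WatermarkManager API): naive batch extraction fires every currently eligible timer before any timer set during processing, while a priority queue re-checks after each firing and stays strictly time ordered.

```python
import heapq

def fire_batch(timers, set_timer_on):
    """Naive: extract all eligible timers once, process them in order, and
    only afterwards fire any timer set during processing (the reported bug)."""
    order = []
    pending = []
    for ts, name in sorted(timers):
        order.append((ts, name))
        if name == set_timer_on:
            pending.append((ts + 1, name + "'"))  # timer set from within timerB
    order.extend(sorted(pending))
    return order

def fire_heap(timers, set_timer_on):
    """Strictly ordered: always pop the earliest outstanding timer, so a
    newly set timer can still fire before later already-scheduled timers."""
    heap = list(timers)
    heapq.heapify(heap)
    order = []
    while heap:
        ts, name = heapq.heappop(heap)
        order.append((ts, name))
        if name == set_timer_on:
            heapq.heappush(heap, (ts + 1, name + "'"))
            set_timer_on = None  # set the follow-up timer only once
    return order

timers = [(10, "timerB"), (100, "timerA")]  # timerA at window.maxTimestamp() + 1
# batch extraction fires: timerB, timerA, timerB'  (not in timestamp order)
# heap extraction fires:  timerB, timerB', timerA  (strictly ordered)
```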



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=330483&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330483
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 18/Oct/19 12:52
Start Date: 18/Oct/19 12:52
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer 
firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-543727505
 
 
   Run Flink ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330483)
Time Spent: 15h 50m  (was: 15h 40m)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager.extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8410) JdbcIO should support setConnectionInitSqls in its DataSource

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8410?focusedWorklogId=330470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330470
 ]

ASF GitHub Bot logged work on BEAM-8410:


Author: ASF GitHub Bot
Created on: 18/Oct/19 12:28
Start Date: 18/Oct/19 12:28
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9808: [BEAM-8410] 
JdbcIO should support setConnectionInitSqls in its DataSource
URL: https://github.com/apache/beam/pull/9808#issuecomment-543609627
 
 
   I think having a test for the failing case should be enough. Please do just two 
things before it is ready for merging:
   - The Jenkins job fails because of `spotless` - run `gradlew spotlessApply` 
before committing to fix it.
   - Squash the commits into one with an appropriate commit message.
   
   Hint: run `./gradlew -p path/to/changed/package check` every time before 
pushing to verify that the build is green.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330470)
Time Spent: 1h 10m  (was: 1h)

> JdbcIO should support setConnectionInitSqls in its DataSource
> -
>
> Key: BEAM-8410
> URL: https://issues.apache.org/jira/browse/BEAM-8410
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-jdbc
>Reporter: Cam Mach
>Assignee: Cam Mach
>Priority: Minor
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This property, connectionInitSqls, is very handy for anyone who use MySql and 
> Mariadb, to set any init sql statements to be executed at connection time. 
> Note: but it's not applicable across databases
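The effect of `connectionInitSqls` - SQL statements executed once when a pooled connection is created - can be sketched with the standard-library `sqlite3` module. This is only an illustration of the concept; the actual change configures commons-dbcp's DataSource inside JdbcIO, and the function name here is made up:

```python
import sqlite3

def connect_with_init_sqls(database, init_sqls):
    """Open a connection and run each init statement once, mimicking what a
    DataSource's connectionInitSqls setting does at connection time."""
    conn = sqlite3.connect(database)
    for stmt in init_sqls:
        conn.execute(stmt)
    return conn

# PRAGMA plays the role of a MySQL/MariaDB session-init statement here.
conn = connect_with_init_sqls(":memory:", ["PRAGMA foreign_keys = ON"])
row = conn.execute("PRAGMA foreign_keys").fetchone()
# row[0] == 1: the init statement took effect on this connection
```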



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8383) Add metrics to Python state cache

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8383?focusedWorklogId=330417&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330417
 ]

ASF GitHub Bot logged work on BEAM-8383:


Author: ASF GitHub Bot
Created on: 18/Oct/19 10:38
Start Date: 18/Oct/19 10:38
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9769: [BEAM-8383] Add metrics 
to the Python state cache
URL: https://github.com/apache/beam/pull/9769#issuecomment-543664340
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330417)
Time Spent: 40m  (was: 0.5h)

> Add metrics to Python state cache
> -
>
> Key: BEAM-8383
> URL: https://issues.apache.org/jira/browse/BEAM-8383
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> For more insight into how effectively the cache works, metrics should be 
> added to the Python SDK. All the state operations should be counted, as well 
> as metrics like the current size of the cache, cache hits/misses, and the 
> capacity of the cache.
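The kind of metrics the ticket asks for can be illustrated with a minimal metered LRU cache in Python. Class and attribute names here are hypothetical, not the actual Beam statecache API:

```python
from collections import OrderedDict

class MeteredLRUCache:
    """Minimal LRU cache that counts hits/misses and exposes its current
    size and capacity - the metrics the ticket proposes to track."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self.hits += 1
            self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

    def size(self):
        return len(self._data)

cache = MeteredLRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # hit; "a" becomes most recently used
cache.put("c", 3)  # evicts "b", the least recently used entry
cache.get("b")     # miss
# at this point: hits == 1, misses == 1, size() == 2
```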



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8383) Add metrics to Python state cache

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8383?focusedWorklogId=330416&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330416
 ]

ASF GitHub Bot logged work on BEAM-8383:


Author: ASF GitHub Bot
Created on: 18/Oct/19 10:38
Start Date: 18/Oct/19 10:38
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9769: [BEAM-8383] Add metrics 
to the Python state cache
URL: https://github.com/apache/beam/pull/9769#issuecomment-543664281
 
 
   Run Python2_PVR_Flink PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330416)
Time Spent: 0.5h  (was: 20m)

> Add metrics to Python state cache
> -
>
> Key: BEAM-8383
> URL: https://issues.apache.org/jira/browse/BEAM-8383
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For more insight into how effectively the cache works, metrics should be 
> added to the Python SDK. All the state operations should be counted, as well 
> as metrics like the current size of the cache, cache hits/misses, and the 
> capacity of the cache.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8412) Py2 post-commit test crossLanguagePortableWordCount failing

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8412?focusedWorklogId=330410&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330410
 ]

ASF GitHub Bot logged work on BEAM-8412:


Author: ASF GitHub Bot
Created on: 18/Oct/19 10:30
Start Date: 18/Oct/19 10:30
Worklog Time Spent: 10m 
  Work Description: mxm commented on pull request #9817:  [BEAM-8412] xlang 
test: set sdk worker parallelism to 1
URL: https://github.com/apache/beam/pull/9817#discussion_r336424962
 
 

 ##
 File path: sdks/python/test-suites/portable/py2/build.gradle
 ##
 @@ -128,6 +128,7 @@ task crossLanguagePortableWordCount {
 "--shutdown_sources_on_final_watermark",
 "--environment_cache_millis=1",
 "--expansion_service_jar=${testServiceExpansionJar}",
+"--sdk_worker_parallelism=1"
 
 Review comment:
   Let's add a comment here why this is necessary. Otherwise LGTM.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330410)
Time Spent: 1h 40m  (was: 1.5h)

> Py2 post-commit test crossLanguagePortableWordCount  failing
> 
>
> Key: BEAM-8412
> URL: https://issues.apache.org/jira/browse/BEAM-8412
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness, test-failures
>Reporter: Ahmet Altay
>Assignee: Chamikara Madhusanka Jayalath
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Error is: 13:09:02 RuntimeError: IOError: [Errno 2] No such file or 
> directory: 
> '/tmp/beam-temp-py-wordcount-portable-67ce50e0eebe11e9892842010a80009c/fe77e28c-2e44-4914-8460-76ab4dbb8579.py-wordcount-portable'
>  [while running 'write/Write/WriteImpl/WriteBundles']
> Log:  
> https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Python2/706/consoleFull
> Last passing run was 705. Between 705 and 706 the change is 
> (https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Python2/706/)
>  : https://github.com/apache/beam/pull/9785 -- Although this PR looks 
> unrelated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8281) Resize file-based IOITs

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8281?focusedWorklogId=330405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330405
 ]

ASF GitHub Bot logged work on BEAM-8281:


Author: ASF GitHub Bot
Created on: 18/Oct/19 10:13
Start Date: 18/Oct/19 10:13
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #9638: [BEAM-8281] 
Resize IOITs datasets
URL: https://github.com/apache/beam/pull/9638
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330405)
Time Spent: 10h  (was: 9h 50m)

> Resize file-based IOITs 
> 
>
> Key: BEAM-8281
> URL: https://issues.apache.org/jira/browse/BEAM-8281
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> Resize the IOITs to use known amounts of data saved on GCS so that it's 
> possible to report dataset size and measure throughput as part of the test 
> metrics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=330394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330394
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 18/Oct/19 09:51
Start Date: 18/Oct/19 09:51
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer 
firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-543640928
 
 
   Run Samza ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330394)
Time Spent: 15h 20m  (was: 15h 10m)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager.extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=330395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330395
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 18/Oct/19 09:51
Start Date: 18/Oct/19 09:51
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer 
firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-543641135
 
 
   Run Flink ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330395)
Time Spent: 15.5h  (was: 15h 20m)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager.extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=330393&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330393
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 18/Oct/19 09:50
Start Date: 18/Oct/19 09:50
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer 
firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-543640860
 
 
   Run Flink ValidaatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330393)
Time Spent: 15h 10m  (was: 15h)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - stateful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set anywhere between  timerB.timestamp
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, the WatermarkManager.extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7520) DirectRunner timers are not strictly time ordered

2019-10-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7520?focusedWorklogId=330392&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-330392
 ]

ASF GitHub Bot logged work on BEAM-7520:


Author: ASF GitHub Bot
Created on: 18/Oct/19 09:50
Start Date: 18/Oct/19 09:50
Worklog Time Spent: 10m 
  Work Description: je-ik commented on issue #9190: [BEAM-7520] Fix timer 
firing order in DirectRunner
URL: https://github.com/apache/beam/pull/9190#issuecomment-543640775
 
 
   Run Dataflow ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 330392)
Time Spent: 15h  (was: 14h 50m)

> DirectRunner timers are not strictly time ordered
> -
>
> Key: BEAM-7520
> URL: https://issues.apache.org/jira/browse/BEAM-7520
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Affects Versions: 2.13.0
>Reporter: Jan Lukavský
>Assignee: Jan Lukavský
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Let's suppose we have the following situation:
>  - statful ParDo with two timers - timerA and timerB
>  - timerA is set for window.maxTimestamp() + 1
>  - timerB is set for a timestamp earlier than timerA's (referred to below as 
> timerB.timestamp)
>  - input watermark moves to BoundedWindow.TIMESTAMP_MAX_VALUE
> Then the order of timers is as follows (correct):
>  - timerB
>  - timerA
> But, if timerB sets another timer (say for timerB.timestamp + 1), then the 
> order of timers will be:
>  - timerB (timerB.timestamp)
>  - timerA (BoundedWindow.TIMESTAMP_MAX_VALUE)
>  - timerB (timerB.timestamp + 1)
> Which is not ordered by timestamp. The reason for this is that when the input 
> watermark update is evaluated, WatermarkManager.extractFiredTimers() will 
> produce both timerA and timerB. That would be correct, but when timerB sets 
> another timer, that breaks the ordering.
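The ordering discipline the issue asks for can be sketched outside of Beam: instead of firing every timer extracted at a watermark update as one batch, always fire the single earliest pending timer, so that timers set during firing are interleaved by timestamp. This is a minimal illustration of the scenario above, not the DirectRunner's actual implementation; the names `run_timers`, `on_fire`, and `timerB2` are hypothetical.

```python
import heapq

def run_timers(initial, on_fire):
    """Fire timers strictly in timestamp order, even when a firing timer
    sets new timers. `initial` is a list of (timestamp, name) pairs;
    `on_fire(name, set_timer)` may call set_timer(ts, name) to register
    additional timers. A sketch only, not Beam's WatermarkManager."""
    heap = list(initial)
    heapq.heapify(heap)
    fired = []
    while heap:
        ts, name = heapq.heappop(heap)  # always take the earliest pending timer
        fired.append((ts, name))
        on_fire(name, lambda t, n: heapq.heappush(heap, (t, n)))
    return fired

# Reproduce the scenario from the issue: timerB (ts=5) fires before
# timerA (ts=10), and firing timerB sets another timer at ts=6.
def on_fire(name, set_timer):
    if name == "timerB":
        set_timer(6, "timerB2")

order = run_timers([(5, "timerB"), (10, "timerA")], on_fire)
# order == [(5, "timerB"), (6, "timerB2"), (10, "timerA")]
```

Because each pop re-consults the heap, the newly set timer at ts=6 fires before timerA at ts=10, which is exactly the ordering the batch-extraction approach described above violates.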





[jira] [Commented] (BEAM-8399) Python HDFS implementation should support filenames of the format "hdfs://namenodehost/parent/child"

2019-10-18 Thread alionkun (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954414#comment-16954414
 ] 

alionkun commented on BEAM-8399:


We are facing the same problem in a machine learning pipeline based on 
TensorFlow and Beam.

TensorFlow and other components that work with HDFS accept URLs like 
"hdfs://namenodehost/parent/child", which are semantically complete.

Beam, however, only accepts URLs like "hdfs://parent/child".

As a result, we have to maintain two styles for every HDFS URL.

> Python HDFS implementation should support filenames of the format 
> "hdfs://namenodehost/parent/child"
> 
>
> Key: BEAM-8399
> URL: https://issues.apache.org/jira/browse/BEAM-8399
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chamikara Madhusanka Jayalath
>Priority: Major
>
> "hdfs://namenodehost/parent/child" and "/parent/child" seem to be the 
> correct filename formats for HDFS based on [1], but we currently support the 
> format "hdfs://parent/child".
> To avoid breaking existing users, we have to either (1) somehow support both 
> versions by default (based on [2], HDFS does not seem to allow colons in 
> file paths, so this might be possible) or (2) make 
> "hdfs://namenodehost/parent/child" optional for now and change it to the 
> default after a few versions.
> We should also make sure that Beam Java and Python HDFS file-system 
> implementations are consistent in this regard.
>  
> [1][https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html]
> [2] https://issues.apache.org/jira/browse/HDFS-13
>  
> cc: [~udim]
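The conflict between the two styles can be seen with standard URL parsing: in "hdfs://namenodehost/parent/child" the component after the scheme is the authority (the namenode host), whereas the currently supported "hdfs://parent/child" style puts the top-level directory in that position, so a generic parser misreads it as a host. A small sketch, using only Python's standard library:

```python
from urllib.parse import urlparse

# The format described in the HDFS shell docs: authority + absolute path.
full = urlparse("hdfs://namenodehost/parent/child")
# full.netloc == "namenodehost", full.path == "/parent/child"

# The format Beam's Python SDK currently accepts: the top-level
# directory lands in the authority slot.
short = urlparse("hdfs://parent/child")
# short.netloc == "parent", short.path == "/child"
```

This is why supporting both by default requires disambiguation: given "hdfs://X/Y", X may be either a namenode host or a directory, and only the absence of colons in HDFS paths (per [2]) makes telling them apart plausible.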




