[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=334495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334495
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 26/Oct/19 05:24
Start Date: 26/Oct/19 05:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9844: [BEAM-8372] 
Support both flink_master and flink_master_url parameter
URL: https://github.com/apache/beam/pull/9844#discussion_r339286935
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner.py
 ##
 @@ -43,7 +43,13 @@ def default_job_server(self, options):
 class FlinkRunnerOptions(pipeline_options.PipelineOptions):
   @classmethod
   def _add_argparse_args(cls, parser):
-    parser.add_argument('--flink_master', default='[local]')
+    parser.add_argument('--flink_master',
+                        default='[auto]',
+                        help='Flink master address (host:port) to submit the'
+                             ' job against. Use "[local]" to start a local'
+                             ' cluster for the execution. Use "[auto]" if you'
+                             ' plan to either execute locally or submit through'
 
 Review comment:
   How is this choice made if `[auto]` is passed? Should it really be the 
default? 
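One way to make the reviewer's question concrete: a runner could resolve `[auto]` by preferring an explicitly configured remote endpoint and otherwise behaving like `[local]`. This is only an illustrative sketch; `resolve_flink_master` and its arguments are hypothetical names, not Beam's actual API.

```python
# Hypothetical sketch of how an '[auto]' default might be resolved.
# resolve_flink_master and its arguments are illustrative, not Beam's API.
def resolve_flink_master(flink_master, remote_endpoint=None):
    """Return the host:port the job should be submitted to."""
    if flink_master == '[local]':
        # Pretend an embedded local cluster was started here.
        return 'localhost:8081'
    if flink_master == '[auto]':
        # Prefer a configured remote endpoint; otherwise behave like '[local]'.
        return remote_endpoint if remote_endpoint else 'localhost:8081'
    # An explicit host:port wins.
    return flink_master

print(resolve_flink_master('[auto]'))                     # localhost:8081
print(resolve_flink_master('[auto]', 'flink-host:8081'))  # flink-host:8081
```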
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334495)
Time Spent: 7h  (was: 6h 50m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=334493&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334493
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 26/Oct/19 05:24
Start Date: 26/Oct/19 05:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9844: [BEAM-8372] Support 
both flink_master and flink_master_url parameter
URL: https://github.com/apache/beam/pull/9844#issuecomment-546571130
 
 
   I guess it's `--flink_master` that's new (in Python's FlinkRunner). But if 
we don't have the http then we need to change 
https://github.com/apache/beam/pull/9775 to add it. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334493)
Time Spent: 6h 50m  (was: 6h 40m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=334494&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334494
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 26/Oct/19 05:24
Start Date: 26/Oct/19 05:24
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9844: [BEAM-8372] 
Support both flink_master and flink_master_url parameter
URL: https://github.com/apache/beam/pull/9844#discussion_r339286838
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/flink_runner.py
 ##
 @@ -43,7 +43,13 @@ def default_job_server(self, options):
 class FlinkRunnerOptions(pipeline_options.PipelineOptions):
   @classmethod
   def _add_argparse_args(cls, parser):
-    parser.add_argument('--flink_master', default='[local]')
+    parser.add_argument('--flink_master',
+                        default='[auto]',
 
 Review comment:
   `[local]` is also referenced at 
https://github.com/apache/beam/blob/release-2.17.0/sdks/python/apache_beam/runners/portability/flink_runner.py#L36
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334494)
Time Spent: 7h  (was: 6h 50m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8495) Change Dataflow byte size estimate for compressed files for FileBasedSources

2019-10-25 Thread Ahmet Altay (Jira)
Ahmet Altay created BEAM-8495:
-

 Summary: Change Dataflow byte size estimate for compressed files 
for FileBasedSources
 Key: BEAM-8495
 URL: https://issues.apache.org/jira/browse/BEAM-8495
 Project: Beam
  Issue Type: Bug
  Components: io-java-files, io-py-files
Reporter: Ahmet Altay
Assignee: Chamikara Madhusanka Jayalath


Consider changing the estimated byte size for file-based sources for better 
initial autoscaling. Currently the estimate is based on total size; for 
compressed files we could apply a factor multiplier (~5) to produce a better 
estimate.
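The proposed heuristic could be sketched as follows; the helper name and the exact factor are assumptions taken from the issue text, not Beam code.

```python
# Illustrative sketch of the proposed estimate (not Beam code): scale the
# on-disk size of compressed files by a rough expansion factor so initial
# autoscaling sees something closer to the decompressed size.
COMPRESSION_FACTOR = 5  # the ~5x multiplier suggested in the issue

def estimate_total_bytes(files):
    """files: iterable of (size_in_bytes, is_compressed) pairs."""
    return sum(size * COMPRESSION_FACTOR if compressed else size
               for size, compressed in files)

print(estimate_total_bytes([(100, True), (200, False)]))  # 700
```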



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to infinite recursion during pickling on Python 3.7.

2019-10-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8397:
--
Summary: DataflowRunnerTest.test_remote_runner_display_data sometimes fails 
due to infinite recursion during pickling on Python 3.7.  (was: 
DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
recursion during pickling.)

> DataflowRunnerTest.test_remote_runner_display_data sometimes fails due to 
> infinite recursion during pickling on Python 3.7.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1.`python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data`
>  fails currently if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 114 in wrapper
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1137 in save_cell
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
> ...
> {noformat}
> cc: [~yoshiki.obata]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8494) Python 3.8 Support

2019-10-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960250#comment-16960250
 ] 

Valentyn Tymofieiev edited comment on BEAM-8494 at 10/26/19 3:14 AM:
-

Anticipated work here (we'll need to create subtasks for it):
- Chase all Beam dependencies that do not support Python 3.8.
- Add tox suites.
- Add Post Commit suites.
- Add Python 3.8 containers. 
- Inventory new syntactic features of Python 3.8 and add tests to make sure we 
support them. For example: positional-only arguments.  
- Add Python 3.8 to release qualification/automation.
- Update supported Python version qualifiers in setup.py.
- Add support for Python 3.8 in Beam runners.


was (Author: tvalentyn):
Anticipated work here (we'll need to create subtasks for it):
- Chase all Beam dependencies that do not support Python 3.8.
- Add tox suites.
- Add Post Commit suites.
- Add Python 3.8 containers. 
- Inventory new syntactic features of Python 3.8 and add tests to make sure we 
support them. For example: positional-only arguments.  
- Add Python 3.8 to release qualification/automation.
- Add support for Python 3.8 in Beam runners.

> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8494) Python 3.8 Support

2019-10-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960250#comment-16960250
 ] 

Valentyn Tymofieiev edited comment on BEAM-8494 at 10/26/19 3:12 AM:
-

Anticipated work here (we'll need to create subtasks for it):
- Chase all Beam dependencies that do not support Python 3.8.
- Add tox suites.
- Add Post Commit suites.
- Add Python 3.8 containers. 
- Inventory new syntactic features of Python 3.8 and add tests to make sure we 
support them. For example: positional-only arguments.  
- Add Python 3.8 to release qualification/automation.
- Add support for Python 3.8 in Beam runners.


was (Author: tvalentyn):
Anticipated work here (we'll need to create subtasks for it):
- Chase all Beam dependencies that do not support Python 3.8.
- Add tox suites.
- Add Post Commit suites.
- Add Python 3.8 containers. 
- Inventory new syntactic features of Python 3.8 and add tests to make sure we 
support them. For example: positional-only arguments.  
- Add Python 3.8 to release qualification/automation.


> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8494) Python 3.8 Support

2019-10-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960250#comment-16960250
 ] 

Valentyn Tymofieiev commented on BEAM-8494:
---

Anticipated work here (we'll need to create subtasks for it):
- Chase all Beam dependencies that do not support Python 3.8.
- Add tox suites.
- Add Post Commit suites.
- Add Python 3.8 containers. 
- Inventory new syntactic features of Python 3.8 and add tests to make sure we 
support them. For example: positional-only arguments.  
- Add Python 3.8 to release qualification/automation.
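As an illustration of the positional-only arguments item above (PEP 570 syntax, new in Python 3.8; the function itself is a made-up example, not Beam code):

```python
# Python 3.8 positional-only parameters (PEP 570): names before '/' cannot
# be passed as keywords. Tests for 3.8 support would need to cover this.
def divide(numerator, denominator, /, *, precision=2):
    return round(numerator / denominator, precision)

print(divide(1, 3))  # 0.33
# divide(numerator=1, denominator=3) raises TypeError: positional-only.
```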


> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8494) Python 3.8 Support

2019-10-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8494:
--
Summary: Python 3.8 Support  (was: Support Python 3.8)

> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8494) Python 3.8 Support

2019-10-25 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8494:
--
Status: Open  (was: Triage Needed)

> Python 3.8 Support
> --
>
> Key: BEAM-8494
> URL: https://issues.apache.org/jira/browse/BEAM-8494
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8494) Support Python 3.8

2019-10-25 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8494:
-

 Summary: Support Python 3.8
 Key: BEAM-8494
 URL: https://issues.apache.org/jira/browse/BEAM-8494
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Valentyn Tymofieiev






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-10-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960246#comment-16960246
 ] 

Valentyn Tymofieiev commented on BEAM-5878:
---

This was expected to complete in 2.16.0, but we were not able to release this 
feature with 2.16.0, due to a well-masked but unrelated issue (BEAM-8397). 
https://github.com/apache/beam/pull/9881 should address BEAM-8397 and unblock 
this issue.

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python 3.0 inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add Py3-only unit tests that cover DoFns with keyword-only 
> arguments once Beam Python 3 tests are in good shape.
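The getargspec/getfullargspec difference the description mentions can be seen with a short example (the DoFn-like function below is illustrative; on Python 3, `inspect.getargspec()` raised ValueError for such signatures before being removed in 3.11):

```python
import inspect

# A Python 3-only signature with a keyword-only argument, similar to the
# DoFns the issue is about (the function itself is illustrative).
def process(element, *, window=None):
    return element, window

spec = inspect.getfullargspec(process)  # handles keyword-only arguments
print(spec.args)        # ['element']
print(spec.kwonlyargs)  # ['window']
```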



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=334481&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334481
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 26/Oct/19 01:58
Start Date: 26/Oct/19 01:58
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9896: [BEAM-8481] 
Increasing Py3.7 postcommit job timeout after 120 minutes instead of 100
URL: https://github.com/apache/beam/pull/9896
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334481)
Time Spent: 50m  (was: 40m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite 
> seems to be timing out frequently. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10, and we cannot 
> pinpoint the cause because history is lost.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334475&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334475
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 26/Oct/19 01:07
Start Date: 26/Oct/19 01:07
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339279510
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -0,0 +1,156 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.display.pcoll_visualization."""
+from __future__ import absolute_import
+
+import sys
+import time
+import unittest
+
+import apache_beam as beam
+from apache_beam.runners import runner
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.display import pcoll_visualization as pv
+
+# Work around nose tests using Python2 without unittest.mock module.
 
 Review comment:
   I think this part is OK. We will later have to search the codebase for all 
comments including Python2, Py2, Python 3, etc for this cleanup. If you want a 
JIRA, feel free to use BEAM-7372.
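The workaround referenced above is commonly written as a try/except import shim; this is the generic pattern, not necessarily Beam's exact code:

```python
# Generic Python 2/3 compatibility shim for mock (the pattern the comment
# refers to; Beam's exact code may differ):
try:
    from unittest import mock  # Python 3: mock is in the standard library
except ImportError:
    import mock  # Python 2: requires the external 'mock' backport package

m = mock.MagicMock(return_value=42)
print(m())  # 42
```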
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334475)
Time Spent: 16h 50m  (was: 16h 40m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 16h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334470&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334470
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:41
Start Date: 26/Oct/19 00:41
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9881: [BEAM-8397] 
Fix infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#discussion_r339277535
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
 ##
 @@ -235,40 +259,19 @@ def test_remote_runner_display_data(self):
     p = Pipeline(remote_runner,
                  options=PipelineOptions(self.default_properties))
 
-    # TODO: Should not subclass ParDo. Switch to PTransform as soon as
-    # composite transforms support display data.
-    class SpecialParDo(beam.ParDo):
-      def __init__(self, fn, now):
-        super(SpecialParDo, self).__init__(fn)
-        self.fn = fn
-        self.now = now
-
-      # Make this a list to be accessible within closure
-      def display_data(self):
-        return {'asubcomponent': self.fn,
-                'a_class': SpecialParDo,
-                'a_time': self.now}
-
-    class SpecialDoFn(beam.DoFn):
-      def display_data(self):
-        return {'dofn_value': 42}
-
-      def process(self):
-        pass
-
     now = datetime.now()
     # pylint: disable=expression-not-assigned
     (p | ptransform.Create([1, 2, 3, 4, 5])
      | 'Do' >> SpecialParDo(SpecialDoFn(), now))
 
-    p.run()
+    # TODO(BEAM-366) Enable runner API on this test.
+    p.run(test_runner_api=False)
 
 Review comment:
   As per BEAM-366, the Runner API does not plumb through display_data for 
composite transforms. With this change we no longer have recursion errors. The 
recursion errors were caught in [1] and triggered the same codepath as when we 
set test_runner_api=False in [2]; that's why the test was passing. However, 
sometimes the recursion error was not caught in [1] but was caught by a Python 
IDE/debugger, and sometimes Python 3.7 does not emit a catchable 
RecursionError at all [3]. In these scenarios, the test used to fail.
   [1] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L599
   [2] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L402
   [3] https://bugs.python.org/issue38593
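A minimal, non-Beam illustration of the distinction being made: CPython normally raises a catchable RecursionError before the C stack overflows, but as [3] notes this is not guaranteed, and an unrecoverable overflow aborts the interpreter instead.

```python
# Minimal illustration (not Beam code): deep recursion normally surfaces as
# a catchable RecursionError once sys.getrecursionlimit() is hit.
def depth(n=0):
    try:
        return depth(n + 1)
    except RecursionError:
        return n  # how deep we got before the limit fired

print(depth() > 100)  # True under the default recursion limit
```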
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334470)
Time Spent: 2h 10m  (was: 2h)

> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
> recursion during pickling.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1.`python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data`
>  fails currently if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in 

[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334469&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334469
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:41
Start Date: 26/Oct/19 00:41
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9881: [BEAM-8397] 
Fix infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#discussion_r339277535
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
 ##
 @@ -235,40 +259,19 @@ def test_remote_runner_display_data(self):
     p = Pipeline(remote_runner,
                  options=PipelineOptions(self.default_properties))
 
-    # TODO: Should not subclass ParDo. Switch to PTransform as soon as
-    # composite transforms support display data.
-    class SpecialParDo(beam.ParDo):
-      def __init__(self, fn, now):
-        super(SpecialParDo, self).__init__(fn)
-        self.fn = fn
-        self.now = now
-
-      # Make this a list to be accessible within closure
-      def display_data(self):
-        return {'asubcomponent': self.fn,
-                'a_class': SpecialParDo,
-                'a_time': self.now}
-
-    class SpecialDoFn(beam.DoFn):
-      def display_data(self):
-        return {'dofn_value': 42}
-
-      def process(self):
-        pass
-
     now = datetime.now()
     # pylint: disable=expression-not-assigned
     (p | ptransform.Create([1, 2, 3, 4, 5])
      | 'Do' >> SpecialParDo(SpecialDoFn(), now))
 
-    p.run()
+    # TODO(BEAM-366) Enable runner API on this test.
+    p.run(test_runner_api=False)
 
 Review comment:
   As per BEAM-366, the Runner API does not plumb through display_data for 
composite transforms. With this change we no longer have recursion errors. The 
recursion errors were caught in [1] and triggered the same codepath as when we 
set test_runner_api=False in [2]; that's why the test was passing. However, 
sometimes the recursion error was not caught in [1] but was caught by a Python 
IDE/debugger, and sometimes [3] Python 3.7 does not emit a catchable 
RecursionError at all. In these scenarios, the test used to fail.
   [1] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L599
   [2] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L402
   [3] https://bugs.python.org/issue38593
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334469)
Time Spent: 2h  (was: 1h 50m)

> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
> recursion during pickling.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1.`python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data`
>  fails currently if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
> 

[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334467&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334467
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:40
Start Date: 26/Oct/19 00:40
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9881: [BEAM-8397] 
Fix infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#discussion_r339277943
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
 ##
 @@ -279,8 +282,7 @@ def process(self):
  {'type': 'INTEGER', 'namespace': nspace+'SpecialDoFn',
   'value': 42, 'key': 'dofn_value'}]
 expected_data = sorted(expected_data, key=lambda x: 
x['namespace']+x['key'])
 
 Review comment:
   ah good catch, not required. PTAL
 



Issue Time Tracking
---

Worklog Id: (was: 334467)
Time Spent: 1h 50m  (was: 1h 40m)


[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334465&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334465
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:36
Start Date: 26/Oct/19 00:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9881: [BEAM-8397] 
Fix infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#discussion_r339277698
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
 ##
 @@ -235,40 +259,19 @@ def test_remote_runner_display_data(self):
 p = Pipeline(remote_runner,
  options=PipelineOptions(self.default_properties))
 
-# TODO: Should not subclass ParDo. Switch to PTransform as soon as
-# composite transforms support display data.
-class SpecialParDo(beam.ParDo):
-  def __init__(self, fn, now):
-super(SpecialParDo, self).__init__(fn)
-self.fn = fn
-self.now = now
-
-  # Make this a list to be accessible within closure
-  def display_data(self):
-return {'asubcomponent': self.fn,
-'a_class': SpecialParDo,
-'a_time': self.now}
-
-class SpecialDoFn(beam.DoFn):
-  def display_data(self):
-return {'dofn_value': 42}
-
-  def process(self):
-pass
-
 now = datetime.now()
 # pylint: disable=expression-not-assigned
 (p | ptransform.Create([1, 2, 3, 4, 5])
  | 'Do' >> SpecialParDo(SpecialDoFn(), now))
 
-p.run()
+# TODO(BEAM-366) Enable runner API on this test.
+p.run(test_runner_api=False)
 
 Review comment:
   Overall, I think BEAM-366 should capture the underlying reason here.
 



Issue Time Tracking
---

Worklog Id: (was: 334465)
Time Spent: 1h 40m  (was: 1.5h)


[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334464&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334464
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:36
Start Date: 26/Oct/19 00:36
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9881: [BEAM-8397] 
Fix infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#discussion_r339277535
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
 ##
 @@ -235,40 +259,19 @@ def test_remote_runner_display_data(self):
 p = Pipeline(remote_runner,
  options=PipelineOptions(self.default_properties))
 
-# TODO: Should not subclass ParDo. Switch to PTransform as soon as
-# composite transforms support display data.
-class SpecialParDo(beam.ParDo):
-  def __init__(self, fn, now):
-super(SpecialParDo, self).__init__(fn)
-self.fn = fn
-self.now = now
-
-  # Make this a list to be accessible within closure
-  def display_data(self):
-return {'asubcomponent': self.fn,
-'a_class': SpecialParDo,
-'a_time': self.now}
-
-class SpecialDoFn(beam.DoFn):
-  def display_data(self):
-return {'dofn_value': 42}
-
-  def process(self):
-pass
-
 now = datetime.now()
 # pylint: disable=expression-not-assigned
 (p | ptransform.Create([1, 2, 3, 4, 5])
  | 'Do' >> SpecialParDo(SpecialDoFn(), now))
 
-p.run()
+# TODO(BEAM-366) Enable runner API on this test.
+p.run(test_runner_api=False)
 
 Review comment:
   As per BEAM-366, Runner API does not plumb through display_data for 
composite transforms. With this change we no longer have recursion errors. The 
recursion errors are caught in [1] and triggered the same codepath as when we 
set test_runner_api=False in [2], that's why the test was passing. However 
sometimes recursion error was not caught in [1], but was caught by Python 
IDE/debugger, and sometimes [3] Python 3.7 does not emit/catch RecursionError. In 
these scenarios, the test used to fail.
   [1] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L599
   [2] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L402
   [3] https://bugs.python.org/issue38593
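The failure mode described above (pickle recursing through the object graph until the interpreter's recursion limit is hit) can be reproduced in miniature without Beam or dill at all. This is only an illustrative sketch of the mechanism, not Beam's actual object graph:

```python
import pickle

def make_deep_chain(depth):
    # Build a deeply nested tuple; pickle recurses at least once per level,
    # so a chain far deeper than sys.getrecursionlimit() cannot be pickled.
    obj = None
    for _ in range(depth):
        obj = (obj,)
    return obj

try:
    pickle.dumps(make_deep_chain(5000))
    hit_recursion_error = False
except RecursionError:
    hit_recursion_error = True

print("RecursionError during pickling:", hit_recursion_error)
```

The bug report in [3] concerns the cases where, instead of raising a catchable RecursionError as above, the interpreter aborts with "Fatal Python error: Cannot recover from stack overflow."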
 



Issue Time Tracking
---

Worklog Id: (was: 334464)
Time Spent: 1.5h  (was: 1h 20m)


[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334463&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334463
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:35
Start Date: 26/Oct/19 00:35
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9881: [BEAM-8397] 
Fix infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#discussion_r339277535
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
 ##
 @@ -235,40 +259,19 @@ def test_remote_runner_display_data(self):
 p = Pipeline(remote_runner,
  options=PipelineOptions(self.default_properties))
 
-# TODO: Should not subclass ParDo. Switch to PTransform as soon as
-# composite transforms support display data.
-class SpecialParDo(beam.ParDo):
-  def __init__(self, fn, now):
-super(SpecialParDo, self).__init__(fn)
-self.fn = fn
-self.now = now
-
-  # Make this a list to be accessible within closure
-  def display_data(self):
-return {'asubcomponent': self.fn,
-'a_class': SpecialParDo,
-'a_time': self.now}
-
-class SpecialDoFn(beam.DoFn):
-  def display_data(self):
-return {'dofn_value': 42}
-
-  def process(self):
-pass
-
 now = datetime.now()
 # pylint: disable=expression-not-assigned
 (p | ptransform.Create([1, 2, 3, 4, 5])
  | 'Do' >> SpecialParDo(SpecialDoFn(), now))
 
-p.run()
+# TODO(BEAM-366) Enable runner API on this test.
+p.run(test_runner_api=False)
 
 Review comment:
   As per BEAM-366, Runner API does not plumb through display_data for 
composite transforms. With this change we no longer have recursion errors. The 
recursion errors are caught in [1] and triggered the same codepath as when we 
set test_runner_api=False in [2], that's why the test was passing. However 
sometimes recursion error was not caught in [1], but was caught by Python 
IDE/debugger, and sometimes [3] Python 3.7 does not catch it. In these 
scenarios, the test used to fail.
   [1] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L599
   [2] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L402
   [3] https://bugs.python.org/issue38593
 



Issue Time Tracking
---

Worklog Id: (was: 334463)
Time Spent: 1h 20m  (was: 1h 10m)


[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334462&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334462
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:34
Start Date: 26/Oct/19 00:34
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9881: [BEAM-8397] 
Fix infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#discussion_r339277535
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
 ##
 @@ -235,40 +259,19 @@ def test_remote_runner_display_data(self):
 p = Pipeline(remote_runner,
  options=PipelineOptions(self.default_properties))
 
-# TODO: Should not subclass ParDo. Switch to PTransform as soon as
-# composite transforms support display data.
-class SpecialParDo(beam.ParDo):
-  def __init__(self, fn, now):
-super(SpecialParDo, self).__init__(fn)
-self.fn = fn
-self.now = now
-
-  # Make this a list to be accessible within closure
-  def display_data(self):
-return {'asubcomponent': self.fn,
-'a_class': SpecialParDo,
-'a_time': self.now}
-
-class SpecialDoFn(beam.DoFn):
-  def display_data(self):
-return {'dofn_value': 42}
-
-  def process(self):
-pass
-
 now = datetime.now()
 # pylint: disable=expression-not-assigned
 (p | ptransform.Create([1, 2, 3, 4, 5])
  | 'Do' >> SpecialParDo(SpecialDoFn(), now))
 
-p.run()
+# TODO(BEAM-366) Enable runner API on this test.
+p.run(test_runner_api=False)
 
 Review comment:
   As per BEAM-366, Runner API does not plumb through display_data for 
composite transforms. With this change we no longer have recursion errors. The 
recursion errors are caught in [1] and triggered the same codepath as when we 
set test_runner_api=False in [2], that's why the test was passing. However 
sometimes recursion error was not caught in [1], but was caught by Python 
IDE/debugger, and sometimes [3] Python 3.7 does not catch it. In these 
scenarios, the test used to fail.
   [1] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L402
   [2] 
https://github.com/apache/beam/blob/c3c5999e3a82d865810f799767c179e2c17d304b/sdks/python/apache_beam/pipeline.py#L599
   [3] https://bugs.python.org/issue38593
 



Issue Time Tracking
---

Worklog Id: (was: 334462)
Time Spent: 1h 10m  (was: 1h)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334459&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334459
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:31
Start Date: 26/Oct/19 00:31
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339277056
 
 

 ##
 File path: sdks/python/test-suites/tox/py37/build.gradle
 ##
 @@ -40,6 +40,10 @@ test.dependsOn testPy37Gcp
 
 toxTask "testPy37Cython", "py37-cython"
 test.dependsOn testPy37Cython
+
+toxTask "testPy37Interactive", "py37-interactive"
 
 Review comment:
   Maybe restrict the tests to Python 3.7 to limit the added initial test time.
 



Issue Time Tracking
---

Worklog Id: (was: 334459)
Time Spent: 16h 40m  (was: 16.5h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 16h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)
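The names `create_pipeline`, `transform`, and `visualize` above come from the issue description. As a plain-Python stand-in, the `visualize` below is a hypothetical placeholder that merely summarizes the materialized elements rather than implementing Interactive Beam's actual charting:

```python
def visualize(materialized_rows):
    # Hypothetical stand-in for the proposed visualize(pcoll): instead of
    # rendering charts, return summary statistics of the materialized data.
    n = len(materialized_rows)
    return {"count": n,
            "min": min(materialized_rows),
            "max": max(materialized_rows),
            "mean": sum(materialized_rows) / n}

summary = visualize([1, 2, 3, 4, 5])
print(summary)
```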



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334458&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334458
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:31
Start Date: 26/Oct/19 00:31
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339277198
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -0,0 +1,156 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.display.pcoll_visualization."""
+from __future__ import absolute_import
+
+import sys
+import time
+import unittest
+
+import apache_beam as beam
+from apache_beam.runners import runner
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.display import pcoll_visualization as pv
+
+# Work around nose tests using Python2 without unittest.mock module.
 
 Review comment:
   Could you add a py2 TODO here, so this can be removed later? You might be 
able to use BEAM-1251, or maybe @tvalentyn can suggest a better JIRA.
 



Issue Time Tracking
---

Worklog Id: (was: 334458)
Time Spent: 16.5h  (was: 16h 20m)






[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334457&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334457
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:31
Start Date: 26/Oct/19 00:31
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339276997
 
 

 ##
 File path: sdks/python/apache_beam/runners/runner.py
 ##
 @@ -84,6 +85,10 @@ def create_runner(runner_name):
 raise ImportError(
 'Google Cloud Dataflow runner not available, '
 'please install apache_beam[gcp]')
+  elif 'interactive' in runner_name.lower():
+raise ImportError(
 
 Review comment:
   Have you tested this to be working as expected?
 



Issue Time Tracking
---

Worklog Id: (was: 334457)
Time Spent: 16h 20m  (was: 16h 10m)






[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=334455&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334455
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:28
Start Date: 26/Oct/19 00:28
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9896: [BEAM-8481] 
Increasing Py3.7 postcommit job timeout after 120 minutes instead of 100
URL: https://github.com/apache/beam/pull/9896#issuecomment-546551151
 
 
   I couldn't find any examples: 
https://github.com/apache/beam/search?p=1&q=setTopLevelMainJobProperties&unscoped_q=setTopLevelMainJobProperties
   
   : /
 



Issue Time Tracking
---

Worklog Id: (was: 334455)
Time Spent: 40m  (was: 0.5h)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite seems 
> to be timing out frequently. Other suites are not affected by these timeouts. 
> From the history, the issues started before Oct 10, but we cannot pinpoint 
> the cause because the history is lost.





[jira] [Work logged] (BEAM-8474) A microbenchmark that exercises the FnAPI runner functionality

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8474?focusedWorklogId=334456&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334456
 ]

ASF GitHub Bot logged work on BEAM-8474:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:28
Start Date: 26/Oct/19 00:28
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9891: [BEAM-8474] A 
microbenchmark for Python FnApiRunner:
URL: https://github.com/apache/beam/pull/9891#issuecomment-546551178
 
 
   Alright, it seems reasonable. I'll look into that on Monday.
 



Issue Time Tracking
---

Worklog Id: (was: 334456)
Time Spent: 1.5h  (was: 1h 20m)

> A microbenchmark that exercises the FnAPI runner functionality
> --
>
> Key: BEAM-8474
> URL: https://issues.apache.org/jira/browse/BEAM-8474
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The main data paths for the Fn API runner are exercised by:
>  * Side inputs
>  * GBK
>  * State
>  * Timers
>  * SDF?
> A microbenchmark would have a number of stages that exercise one or more of 
> these data paths.
> A microbenchmark suite may have more than one pipeline.
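A microbenchmark suite of this shape typically wraps pipeline construction and execution in a small timing harness. A minimal, dependency-free sketch in plain Python (the `run_benchmark` name and structure are illustrative, not Beam's actual benchmark API):

```python
import time

def run_benchmark(build_and_run, repeats=3):
    """Time a callable that builds and runs one benchmark pipeline.

    Returns the best wall-clock time over `repeats` runs, which damps
    one-off noise from warm-up effects.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        build_and_run()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Example: a stand-in workload for a stage exercising one data path.
best = run_benchmark(lambda: sum(x * x for x in range(100_000)))
```

A real suite would pass one such callable per pipeline, each exercising a single data path (side inputs, GBK, state, timers), so regressions can be attributed to a specific code path.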





[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=334454&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334454
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:26
Start Date: 26/Oct/19 00:26
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9896: [BEAM-8481] 
Increasing Py3.7 postcommit job timeout after 120 minutes instead of 100
URL: https://github.com/apache/beam/pull/9896#issuecomment-546551039
 
 
   Run Seed Job
 



Issue Time Tracking
---

Worklog Id: (was: 334454)
Time Spent: 0.5h  (was: 20m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite seems 
> to be timing out frequently. Other suites are not affected by these timeouts. 
> From the history, the issues started before Oct 10, but we cannot pinpoint 
> the cause because the history is lost.





[jira] [Work logged] (BEAM-8493) Add standard double coder to Go SDK

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8493?focusedWorklogId=334453&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334453
 ]

ASF GitHub Bot logged work on BEAM-8493:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:18
Start Date: 26/Oct/19 00:18
Worklog Time Spent: 10m 
  Work Description: youngoli commented on issue #9897: [BEAM-8493] Add 
standard double coder to Go SDK.
URL: https://github.com/apache/beam/pull/9897#issuecomment-546550023
 
 
   R: @lostluck 
 



Issue Time Tracking
---

Worklog Id: (was: 334453)
Time Spent: 20m  (was: 10m)

> Add standard double coder to Go SDK
> ---
>
> Key: BEAM-8493
> URL: https://issues.apache.org/jira/browse/BEAM-8493
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For upcoming features, we need to add support for the standard double coder: 
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L573]
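For context, the standard double coder referenced above encodes a float64 as 8 big-endian IEEE-754 bytes. A quick Python sketch of that wire format (illustrative only; the Go SDK implements this through its own coder framework):

```python
import struct

def encode_double(value):
    # Beam's standard double coder: 8 bytes, big-endian IEEE-754 float64.
    return struct.pack('>d', value)

def decode_double(data):
    return struct.unpack('>d', data)[0]

encoded = encode_double(1.0)
# 1.0 is 0x3FF0000000000000 in IEEE-754 binary64.
```

Round-tripping any exactly representable value through encode/decode returns it unchanged, which is the property a cross-SDK standard coder must guarantee.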





[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334450&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334450
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:12
Start Date: 26/Oct/19 00:12
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-546549335
 
 
   > Looks cleaner. Could you ping once all open comments are resolved.
   > 
   > (I will be OOO most of next week but @pabloem can finish the review.)
   
   Sure, thanks! I think all the open comments are marked as resolved. I'll let 
Pablo go over it once more next week. Have a nice weekend~
 



Issue Time Tracking
---

Worklog Id: (was: 334450)
Time Spent: 16h 10m  (was: 16h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 16h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as the materialized pcoll, e.g., visualize(pcoll).





[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334448
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:11
Start Date: 26/Oct/19 00:11
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9881: [BEAM-8397] Fix 
infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#discussion_r339275364
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
 ##
 @@ -235,40 +259,19 @@ def test_remote_runner_display_data(self):
 p = Pipeline(remote_runner,
  options=PipelineOptions(self.default_properties))
 
-# TODO: Should not subclass ParDo. Switch to PTransform as soon as
-# composite transforms support display data.
-class SpecialParDo(beam.ParDo):
-  def __init__(self, fn, now):
-super(SpecialParDo, self).__init__(fn)
-self.fn = fn
-self.now = now
-
-  # Make this a list to be accessible within closure
-  def display_data(self):
-return {'asubcomponent': self.fn,
-'a_class': SpecialParDo,
-'a_time': self.now}
-
-class SpecialDoFn(beam.DoFn):
-  def display_data(self):
-return {'dofn_value': 42}
-
-  def process(self):
-pass
-
 now = datetime.now()
 # pylint: disable=expression-not-assigned
 (p | ptransform.Create([1, 2, 3, 4, 5])
  | 'Do' >> SpecialParDo(SpecialDoFn(), now))
 
-p.run()
+# TODO(BEAM-366) Enable runner API on this test.
+p.run(test_runner_api=False)
 
 Review comment:
   Could you add the reason why? IIRC, it's because Runner API does a 
pickle-unpickle test which can crash under certain combinations of IDE/debugger 
settings and Python versions.
 



Issue Time Tracking
---

Worklog Id: (was: 334448)
Time Spent: 1h  (was: 50m)

> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
> recursion during pickling.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1. `python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'`
>  currently fails if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> 
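The "Cannot recover from stack overflow" failure in the traceback above is the classic symptom of pickling a structure whose traversal depth exceeds the interpreter's recursion limit: `pickle`'s `save()` recurses once per nesting level. A minimal, self-contained reproduction of that failure mode (unrelated to Beam's specific object graph):

```python
import pickle
import sys

# Build a list nested deeper than the recursion limit; pickling it
# drives pickle's save() past the limit and blows the stack.
nested = []
for _ in range(sys.getrecursionlimit() * 2):
    nested = [nested]

try:
    pickle.dumps(nested)
    outcome = "pickled"
except RecursionError:
    outcome = "RecursionError"
```

In the Beam case the depth comes not from nested lists but from a cyclic chain of closures, class references, and module dicts that dill traverses, which is why the crash appears only with certain dill versions and debugger settings.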

[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334449&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334449
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:11
Start Date: 26/Oct/19 00:11
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9881: [BEAM-8397] Fix 
infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#discussion_r339274728
 
 

 ##
 File path: sdks/python/apache_beam/runners/dataflow/dataflow_runner_test.py
 ##
 @@ -279,8 +282,7 @@ def process(self):
  {'type': 'INTEGER', 'namespace': nspace+'SpecialDoFn',
   'value': 42, 'key': 'dofn_value'}]
 expected_data = sorted(expected_data, key=lambda x: 
x['namespace']+x['key'])
 
 Review comment:
   Is this invocation of sorted still necessary, since you've removed the other 
one?
 



Issue Time Tracking
---

Worklog Id: (was: 334449)
Time Spent: 1h  (was: 50m)

> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
> recursion during pickling.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1. `python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'`
>  currently fails if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 114 in wrapper
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1137 in save_cell
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334447
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:11
Start Date: 26/Oct/19 00:11
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339275341
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.display.pcoll_visualization."""
+from __future__ import absolute_import
+
+import sys
+import time
+import unittest
+
+import apache_beam as beam  # pylint: disable=ungrouped-imports
+import timeloop
+from apache_beam.runners import runner
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.display import pcoll_visualization as pv
+
+# Work around nose tests using Python2 without unittest.mock module.
+try:
+  from unittest.mock import patch
+except ImportError:
+  from mock import patch
+
+
+class PCollVisualizationTest(unittest.TestCase):
+
+  def setUp(self):
+self._p = beam.Pipeline()
+# pylint: disable=range-builtin-not-iterating
+self._pcoll = self._p | 'Create' >> beam.Create(range(1000))
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  def test_raise_error_for_non_pcoll_input(self):
+class Foo(object):
+  pass
+
+with self.assertRaises(AssertionError) as ctx:
+  pv.PCollVisualization(Foo())
+  self.assertTrue('pcoll should be apache_beam.pvalue.PCollection' in
+  ctx.exception)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  def test_pcoll_visualization_generate_unique_display_id(self):
+pv_1 = pv.PCollVisualization(self._pcoll)
+pv_2 = pv.PCollVisualization(self._pcoll)
+self.assertNotEqual(pv_1._dive_display_id, pv_2._dive_display_id)
+self.assertNotEqual(pv_1._overview_display_id, pv_2._overview_display_id)
+self.assertNotEqual(pv_1._df_display_id, pv_2._df_display_id)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollVisualization._to_element_list', lambda x: [1, 2, 3])
+  def test_one_shot_visualization_not_return_handle(self):
+self.assertIsNone(pv.visualize(self._pcoll))
+
+  def _mock_to_element_list(self):
+yield [1, 2, 3]
+yield [1, 2, 3, 4]
+yield [1, 2, 3, 4, 5]
+yield [1, 2, 3, 4, 5, 6]
+yield [1, 2, 3, 4, 5, 6, 7]
+yield [1, 2, 3, 4, 5, 6, 7, 8]
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollVisualization._to_element_list', _mock_to_element_list)
+  def test_dynamic_plotting_return_handle(self):
+h = pv.visualize(self._pcoll, dynamic_plotting_interval=1)
+self.assertIsInstance(h, timeloop.Timeloop)
+h.stop()
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollVisualization._to_element_list', _mock_to_element_list)
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollVisualization.display_facets')
+  def test_dynamic_plotting_update_same_display(self,
+mocked_display_facets):
+fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING)
+ie.current_env().set_pipeline_result(self._p, fake_pipeline_result)
+# Starts async dynamic plotting that never 

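The dynamic-plotting tests quoted above rely on a timer loop (timeloop) that periodically re-renders the display and returns a handle the caller can stop. A dependency-free sketch of that pattern, with hypothetical names (`DynamicPlotter` is not part of the Beam API):

```python
import threading
import time

class DynamicPlotter:
    """Call `render` every `interval` seconds until stop() is invoked."""

    def __init__(self, render, interval):
        self._render = render
        self._interval = interval
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def _loop(self):
        # Event.wait doubles as an interruptible sleep.
        while not self._stopped.wait(self._interval):
            self._render()

    def stop(self):
        self._stopped.set()
        self._thread.join()

calls = []
handle = DynamicPlotter(lambda: calls.append(1), interval=0.01)
time.sleep(0.1)
handle.stop()
```

Returning the handle (as `pv.visualize(..., dynamic_plotting_interval=...)` does with its Timeloop) lets the notebook user cancel the background refresh explicitly, which is what `test_dynamic_plotting_return_handle` asserts.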
[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334446&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334446
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:10
Start Date: 26/Oct/19 00:10
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339275261
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,279 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from pandas.io.json import json_normalize
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support < Python 3.5. Work around with json for legacy tests.
+# TODO(BEAM-8288): clean up once Py2 is deprecated from Beam.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+try:
+  from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
+  _facets_gfsg_ready = True
+except ImportError:
+  _facets_gfsg_ready = False
+
+try:
+  from IPython.core.display import HTML
+  from IPython.core.display import Javascript
+  from IPython.core.display import display
+  from IPython.core.display import display_javascript
+  from IPython.core.display import update_display
+  _ipython_ready = True
+except ImportError:
+  _ipython_ready = False
+
+try:
+  from timeloop import Timeloop
+  _tl_ready = True
+except ImportError:
+  _tl_ready = False
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
 
 Review comment:
   Yes, we'll do the integration tests as soon as the framework is ready, and 
OSS those tests when the framework supports OSS.
 



Issue Time Tracking
---

Worklog Id: (was: 334446)
Time Spent: 15h 50m  (was: 15h 40m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline is defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as the materialized pcoll, e.g., visualize(pcoll).





[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=334445&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334445
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 26/Oct/19 00:01
Start Date: 26/Oct/19 00:01
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9896: [BEAM-8481] 
Increasing Py3.7 postcommit job timeout after 120 minutes instead of 100
URL: https://github.com/apache/beam/pull/9896#discussion_r339274361
 
 

 ##
 File path: .test-infra/jenkins/job_PostCommit_Python37.groovy
 ##
 @@ -27,7 +27,7 @@ 
PostcommitJobBuilder.postCommitJob('beam_PostCommit_Python37', 'Run Python 3.7 P
   previousNames('/beam_PostCommit_Python3_Verify/')
 
   // Set common parameters.
-  commonJobProperties.setTopLevelMainJobProperties(delegate)
+  commonJobProperties.setTopLevelMainJobProperties(delegate, 'master', 120)
 
 Review comment:
   Does Groovy allow passing key-value pairs here, so that we can avoid 
passing 'master'?
 



Issue Time Tracking
---

Worklog Id: (was: 334445)
Time Spent: 20m  (was: 10m)

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite seems 
> to be timing out frequently. Other suites are not affected by these timeouts. 
> From the history, the issues started before Oct 10, but we cannot pinpoint 
> the cause because the history is lost.





[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334439&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334439
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:59
Start Date: 25/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r339267118
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.Serializable;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
+import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.JsonToRow;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollection.IsBounded;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting;
+import org.bson.Document;
+import org.bson.json.JsonMode;
+import org.bson.json.JsonWriterSettings;
+
+@Experimental
+public class MongoDbTable extends SchemaBaseBeamTable implements Serializable {
+  // Should match: 
mongodb://username:password@localhost:27017/database/collection
+  @VisibleForTesting
+  final Pattern locationPattern =
+  Pattern.compile(
+  
"(?<credsHostPort>mongodb://(.*:.*@)?.+:\\d+)/(?<database>.+)/(?<collection>.+)");
+
+  @VisibleForTesting final String dbCollection;
+  @VisibleForTesting final String dbName;
+  @VisibleForTesting final String dbUri;
+
+  MongoDbTable(Table table) {
+super(table.getSchema());
+
+String location = table.getLocation();
+Matcher matcher = locationPattern.matcher(location);
+checkArgument(
+matcher.matches(),
+"MongoDb location must be in the following format: 
'mongodb://(username:password@)?localhost:27017/database/collection'");
+this.dbUri = matcher.group("credsHostPort"); // "mongodb://localhost:27017"
+this.dbName = matcher.group("database");
+this.dbCollection = matcher.group("collection");
+  }
+
+  @Override
+  public PCollection<Row> buildIOReader(PBegin begin) {
+// Read MongoDb Documents
+PCollection<Document> readDocuments =
+MongoDbIO.read()
+.withUri(dbUri)
+.withDatabase(dbName)
+.withCollection(dbCollection)
+.expand(begin);
+
+return readDocuments
+// TODO: figure out a way convert Document directly to Row.
+.apply("Convert Document to JSON", createParserParDo())
+.apply("Transform JSON to Row", JsonToRow.withSchema(getSchema()))
+.setRowSchema(getSchema());
+  }
+
+  @Override
+  public POutput buildIOWriter(PCollection<Row> input) {
+throw new UnsupportedOperationException("Writing to a MongoDB is not supported");
+  }
+
+  @Override
+  public IsBounded isBounded() {
+return IsBounded.BOUNDED;
+  }
+
+  @Override
+  public BeamTableStatistics getTableStatistics(PipelineOptions options) {
+return BeamTableStatistics.BOUNDED_UNKNOWN;
 
 Review comment:
   Does MongoDB have any interface we could use to implement this? If so, a
TODO/Jira pointing in the right direction would be nice. If not, maybe add a
comment stating why.
 

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334443&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334443
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:59
Start Date: 25/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r339272952
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/package-info.java
 ##
 @@ -0,0 +1,20 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/** Table schema for MongoDb. */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
 
 Review comment:
   I think these package-info files are supposed to have 
`@DefaultAnnotation(NonNull.class)` like 
[this](https://github.com/apache/beam/blob/928c4df93491f1d15ccfe36fe7887a27e4bdfbdc/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/pubsub/package-info.java#L20).
 (Although it looks like not all of them [do right 
now](https://github.com/apache/beam/blob/928c4df93491f1d15ccfe36fe7887a27e4bdfbdc/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/bigquery/package-info.java#L20)).
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334443)
Time Spent: 50m  (was: 40m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334441&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334441
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:59
Start Date: 25/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r339273986
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.Serializable;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
+import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.JsonToRow;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollection.IsBounded;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting;
+import org.bson.Document;
+import org.bson.json.JsonMode;
+import org.bson.json.JsonWriterSettings;
+
+@Experimental
+public class MongoDbTable extends SchemaBaseBeamTable implements Serializable {
+  // Should match: mongodb://username:password@localhost:27017/database/collection
+  @VisibleForTesting
+  final Pattern locationPattern =
+  Pattern.compile(
+  "(?<credsHostPort>mongodb://(?<credentials>.*:.*@)?.+:\\d+)/(?<database>.+)/(?<collection>.+)");
+
+  @VisibleForTesting final String dbCollection;
+  @VisibleForTesting final String dbName;
+  @VisibleForTesting final String dbUri;
+
+  MongoDbTable(Table table) {
+super(table.getSchema());
+
+String location = table.getLocation();
+Matcher matcher = locationPattern.matcher(location);
+checkArgument(
+matcher.matches(),
+"MongoDb location must be in the following format: 'mongodb://(username:password@)?localhost:27017/database/collection'");
 
 Review comment:
   I think maybe this `?` is a copy-pasta error. That's not actually part of
the format, is it?
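A quick standalone check of the location pattern supports keeping the `?`: it makes the credentials group optional, which is exactly what `(username:password@)?` in the error message conveys. (The group names `credsHostPort`, `database`, and `collection` come from the `matcher.group(...)` calls in the PR; `credentials` is an assumed name for the unnamed credentials group.)

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocationPatternDemo {
    // Reconstructed location pattern; "credentials" is an assumed group name.
    static final Pattern LOCATION = Pattern.compile(
        "(?<credsHostPort>mongodb://(?<credentials>.*:.*@)?.+:\\d+)"
            + "/(?<database>.+)/(?<collection>.+)");

    public static void main(String[] args) {
        Matcher withCreds = LOCATION.matcher("mongodb://user:pass@localhost:27017/db/coll");
        Matcher noCreds = LOCATION.matcher("mongodb://localhost:27017/db/coll");
        // Both forms match because the trailing '?' makes credentials optional.
        System.out.println(withCreds.matches()); // true
        System.out.println(noCreds.matches());   // true
        System.out.println(noCreds.group("database")); // db
    }
}
```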
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334441)
Time Spent: 50m  (was: 40m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=33=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-33
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:59
Start Date: 25/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r339273805
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.Serializable;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
+import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.JsonToRow;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollection.IsBounded;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting;
+import org.bson.Document;
+import org.bson.json.JsonMode;
+import org.bson.json.JsonWriterSettings;
+
+@Experimental
+public class MongoDbTable extends SchemaBaseBeamTable implements Serializable {
+  // Should match: mongodb://username:password@localhost:27017/database/collection
+  @VisibleForTesting
+  final Pattern locationPattern =
+  Pattern.compile(
+  "(?<credsHostPort>mongodb://(?<credentials>.*:.*@)?.+:\\d+)/(?<database>.+)/(?<collection>.+)");
+
+  @VisibleForTesting final String dbCollection;
+  @VisibleForTesting final String dbName;
+  @VisibleForTesting final String dbUri;
+
+  MongoDbTable(Table table) {
+super(table.getSchema());
+
+String location = table.getLocation();
+Matcher matcher = locationPattern.matcher(location);
+checkArgument(
+matcher.matches(),
+"MongoDb location must be in the following format: 'mongodb://(username:password@)?localhost:27017/database/collection'");
+this.dbUri = matcher.group("credsHostPort"); // "mongodb://localhost:27017"
+this.dbName = matcher.group("database");
+this.dbCollection = matcher.group("collection");
+  }
+
+  @Override
+  public PCollection<Row> buildIOReader(PBegin begin) {
+// Read MongoDb Documents
+PCollection<Document> readDocuments =
+MongoDbIO.read()
+.withUri(dbUri)
+.withDatabase(dbName)
+.withCollection(dbCollection)
+.expand(begin);
+
+return readDocuments
+// TODO: figure out a way to convert Document directly to Row.
+.apply("Convert Document to JSON", createParserParDo())
+.apply("Transform JSON to Row", JsonToRow.withSchema(getSchema()))
+.setRowSchema(getSchema());
+  }
+
+  @Override
+  public POutput buildIOWriter(PCollection<Row> input) {
+throw new UnsupportedOperationException("Writing to a MongoDB is not supported");
+  }
+
+  @Override
+  public IsBounded isBounded() {
+return IsBounded.BOUNDED;
+  }
+
+  @Override
+  public BeamTableStatistics getTableStatistics(PipelineOptions options) {
+return BeamTableStatistics.BOUNDED_UNKNOWN;
+  }
+
+  @VisibleForTesting
+  SingleOutput<Document, String> createParserParDo() {
+return ParDo.of(new DocumentToJsonStringConverter());
+  }
+
+  // TODO: add support for complex fields.
+  

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334440
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:59
Start Date: 25/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r339264449
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.Serializable;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
+import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.JsonToRow;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollection.IsBounded;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting;
+import org.bson.Document;
+import org.bson.json.JsonMode;
+import org.bson.json.JsonWriterSettings;
+
+@Experimental
+public class MongoDbTable extends SchemaBaseBeamTable implements Serializable {
+  // Should match: mongodb://username:password@localhost:27017/database/collection
+  @VisibleForTesting
+  final Pattern locationPattern =
+  Pattern.compile(
+  "(?<credsHostPort>mongodb://(?<credentials>.*:.*@)?.+:\\d+)/(?<database>.+)/(?<collection>.+)");
+
+  @VisibleForTesting final String dbCollection;
+  @VisibleForTesting final String dbName;
+  @VisibleForTesting final String dbUri;
+
+  MongoDbTable(Table table) {
+super(table.getSchema());
+
+String location = table.getLocation();
+Matcher matcher = locationPattern.matcher(location);
+checkArgument(
+matcher.matches(),
+"MongoDb location must be in the following format: 'mongodb://(username:password@)?localhost:27017/database/collection'");
+this.dbUri = matcher.group("credsHostPort"); // "mongodb://localhost:27017"
+this.dbName = matcher.group("database");
+this.dbCollection = matcher.group("collection");
+  }
+
+  @Override
+  public PCollection<Row> buildIOReader(PBegin begin) {
+// Read MongoDb Documents
+PCollection<Document> readDocuments =
+MongoDbIO.read()
+.withUri(dbUri)
+.withDatabase(dbName)
+.withCollection(dbCollection)
+.expand(begin);
+
+return readDocuments
+// TODO: figure out a way to convert Document directly to Row.
 
 Review comment:
   Can you make a jira for this and change this to `TODO(BEAM-)`? Also 
maybe add a note there about how it should probably use `RowWithGetters` rather 
than `RowWithStorage`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334440)
Time Spent: 40m  (was: 0.5h)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: 

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334438&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334438
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:59
Start Date: 25/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r339268863
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java
 ##
 @@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BOOLEAN;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BYTE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.DOUBLE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.FLOAT;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT16;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT32;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT64;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.STRING;
+import static org.junit.Assert.assertEquals;
+
+import com.mongodb.MongoClient;
+import java.util.Arrays;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils;
+import org.apache.beam.sdk.io.mongodb.MongoDBIOIT.MongoDBPipelineOptions;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.transforms.SimpleFunction;
+import org.apache.beam.sdk.transforms.ToJson;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.bson.Document;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/**
+ * A test of {@link 
org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbTable} on an
+ * independent Mongo instance.
+ *
+ * This test requires a running instance of MongoDB. Pass in connection 
information using
+ * PipelineOptions:
+ *
+ * 
+ *  ./gradlew integrationTest -p sdks/java/extensions/sql/integrationTest 
-DintegrationTestPipelineOptions='[
+ *  "--mongoDBHostName=1.2.3.4",
+ *  "--mongoDBPort=27017",
+ *  "--mongoDBDatabaseName=mypass",
+ *  "--numberOfRecords=1000" ]'
+ *  --tests 
org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbReadWriteIT
+ *  -DintegrationTestRunner=direct
+ * 
+ *
+ * Please see 'build_rules.gradle' file for instructions regarding running 
this test using Beam
+ * performance testing framework.
+ */
+@RunWith(JUnit4.class)
+public class MongoDbReadWriteIT {
+  private static final Schema SOURCE_SCHEMA =
+  Schema.builder()
+  .addNullableField("_id", STRING)
+  .addNullableField("c_bigint", INT64)
+  .addNullableField("c_tinyint", BYTE)
+  .addNullableField("c_smallint", INT16)
+  .addNullableField("c_integer", INT32)
+  .addNullableField("c_float", FLOAT)
+  .addNullableField("c_double", DOUBLE)
+  .addNullableField("c_boolean", BOOLEAN)
+  .addNullableField("c_varchar", STRING)
+  .addNullableField("c_arr", FieldType.array(STRING))
+  .build();
+  private static final String collection = "collection";
+  private static MongoDBPipelineOptions options;
+
+  @Rule public final TestPipeline writePipeline = TestPipeline.create();
+  @Rule public final 

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334436
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:59
Start Date: 25/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r339272432
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTableTest.java
 ##
 @@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BOOLEAN;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BYTE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.DOUBLE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.FLOAT;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT16;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT32;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT64;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.STRING;
+
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.bson.Document;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+@RunWith(JUnit4.class)
+public class MongoDbTableTest {
+
+  private static final Schema SCHEMA =
+  Schema.builder()
+  .addNullableField("long", INT64)
+  .addNullableField("int32", INT32)
+  .addNullableField("int16", INT16)
+  .addNullableField("byte", BYTE)
+  .addNullableField("bool", BOOLEAN)
+  .addNullableField("double", DOUBLE)
+  .addNullableField("float", FLOAT)
+  .addNullableField("string", STRING)
+  .addNullableField("arr", FieldType.array(STRING))
+  .build();
+  private static final String JSON_ROW =
+  "{ "
+  + "\"long\" : 9223372036854775807, "
+  + "\"int32\" : 2147483647, "
+  + "\"int16\" : 32767, "
+  + "\"byte\" : 127, "
+  + "\"bool\" : true, "
+  + "\"double\" : 1.0, "
+  + "\"float\" : 1.0, "
+  + "\"string\" : \"string\", "
+  + "\"arr\" : [\"str1\", \"str2\", \"str3\"]"
+  + " }";
+  private static final MongoDbTable SQL_TABLE =
+  (MongoDbTable)
+  new MongoDbTableProvider()
+  .buildBeamSqlTable(
+  fakeTable("TEST", "mongodb://localhost:27017/database/collection"));
+
+  @Rule public transient TestPipeline pipeline = TestPipeline.create();
+
+  @Test
+  public void testDocumentToRowConverter() {
+PCollection<String> output =
+pipeline
+.apply("Create document from JSON", Create.of(Document.parse(JSON_ROW)))
+.apply("Decode document back into JSON", SQL_TABLE.createParserParDo());
+
+// Make sure JSON was decoded correctly
+PAssert.that(output).containsInAnyOrder(JSON_ROW);
 
 Review comment:
   I think this is brittle, since the string comparison can fail if the keys 
are written out in a different order, and I don't think there's anything 
guaranteeing the order. Maybe instead verify that `output` and `JSON_ROW` can 
be parsed into equivalent rows?
   
   This is a problem I keep running into as I'm writing tests with JSON, it 
would be nice to have some general purpose way to assert that two strings are 
equivalent JSON. Maybe we just need to write our own `Matcher` that parses to 
jackson nodes and compares. (You don't need to do that here 
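The reviewer's point can be illustrated with a minimal standalone sketch: two JSON strings with reordered keys fail string equality but compare equal once canonicalized. The canonicalizer below is a deliberate toy that handles only flat JSON objects with no commas or colons inside values; a real matcher would parse with a JSON library (e.g. Jackson tree nodes, as suggested) and compare the trees.

```java
import java.util.Map;
import java.util.TreeMap;

public class JsonCompareDemo {
    // Toy canonicalizer: flat JSON objects only, no nesting, no commas or
    // colons inside values. Real code should build and compare parsed trees.
    static Map<String, String> flatJsonToMap(String json) {
        Map<String, String> out = new TreeMap<>();
        String body = json.trim();
        body = body.substring(1, body.length() - 1); // strip outer braces
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":", 2); // key and value
            out.put(kv[0].trim(), kv[1].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        String a = "{\"x\": 1, \"y\": 2}";
        String b = "{\"y\": 2, \"x\": 1}"; // same object, keys reordered
        System.out.println(a.equals(b));                               // false
        System.out.println(flatJsonToMap(a).equals(flatJsonToMap(b))); // true
    }
}
```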

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334442&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334442
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:59
Start Date: 25/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r339272927
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbReadWriteIT.java
 ##
 @@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BOOLEAN;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.BYTE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.DOUBLE;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.FLOAT;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT16;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT32;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.INT64;
+import static org.apache.beam.sdk.schemas.Schema.FieldType.STRING;
+import static org.junit.Assert.assertEquals;
+
+import com.mongodb.MongoClient;
+import java.util.Arrays;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv;
+import org.apache.beam.sdk.extensions.sql.impl.rel.BeamSqlRelUtils;
+import org.apache.beam.sdk.io.mongodb.MongoDBIOIT.MongoDBPipelineOptions;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptionsFactory;
+import org.apache.beam.sdk.schemas.Schema;
+import org.apache.beam.sdk.schemas.Schema.FieldType;
+import org.apache.beam.sdk.testing.PAssert;
+import org.apache.beam.sdk.testing.TestPipeline;
+import org.apache.beam.sdk.transforms.Create;
+import org.apache.beam.sdk.transforms.MapElements;
+import org.apache.beam.sdk.transforms.SimpleFunction;
+import org.apache.beam.sdk.transforms.ToJson;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.Row;
+import org.bson.Document;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+
+/**
+ * A test of {@link 
org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbTable} on an
+ * independent Mongo instance.
+ *
+ * This test requires a running instance of MongoDB. Pass in connection 
information using
+ * PipelineOptions:
+ *
+ * 
+ *  ./gradlew integrationTest -p sdks/java/extensions/sql/integrationTest 
-DintegrationTestPipelineOptions='[
+ *  "--mongoDBHostName=1.2.3.4",
+ *  "--mongoDBPort=27017",
+ *  "--mongoDBDatabaseName=mypass",
+ *  "--numberOfRecords=1000" ]'
+ *  --tests 
org.apache.beam.sdk.extensions.sql.meta.provider.mongodb.MongoDbReadWriteIT
+ *  -DintegrationTestRunner=direct
+ * 
+ *
+ * Please see 'build_rules.gradle' file for instructions regarding running 
this test using Beam
+ * performance testing framework.
+ */
+@RunWith(JUnit4.class)
+public class MongoDbReadWriteIT {
+  private static final Schema SOURCE_SCHEMA =
+  Schema.builder()
+  .addNullableField("_id", STRING)
+  .addNullableField("c_bigint", INT64)
+  .addNullableField("c_tinyint", BYTE)
+  .addNullableField("c_smallint", INT16)
+  .addNullableField("c_integer", INT32)
+  .addNullableField("c_float", FLOAT)
+  .addNullableField("c_double", DOUBLE)
+  .addNullableField("c_boolean", BOOLEAN)
+  .addNullableField("c_varchar", STRING)
+  .addNullableField("c_arr", FieldType.array(STRING))
+  .build();
+  private static final String collection = "collection";
+  private static MongoDBPipelineOptions options;
+
+  @Rule public final TestPipeline writePipeline = TestPipeline.create();
+  @Rule public final 

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334437&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334437
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:59
Start Date: 25/Oct/19 23:59
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9806: 
[BEAM-8427] Create a table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#discussion_r339265846
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/mongodb/MongoDbTable.java
 ##
 @@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.sdk.extensions.sql.meta.provider.mongodb;
+
+import static 
org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkArgument;
+
+import java.io.Serializable;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.beam.sdk.annotations.Experimental;
+import org.apache.beam.sdk.extensions.sql.impl.BeamTableStatistics;
+import org.apache.beam.sdk.extensions.sql.meta.SchemaBaseBeamTable;
+import org.apache.beam.sdk.extensions.sql.meta.Table;
+import org.apache.beam.sdk.io.mongodb.MongoDbIO;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.transforms.DoFn;
+import org.apache.beam.sdk.transforms.JsonToRow;
+import org.apache.beam.sdk.transforms.ParDo;
+import org.apache.beam.sdk.transforms.ParDo.SingleOutput;
+import org.apache.beam.sdk.values.PBegin;
+import org.apache.beam.sdk.values.PCollection;
+import org.apache.beam.sdk.values.PCollection.IsBounded;
+import org.apache.beam.sdk.values.POutput;
+import org.apache.beam.sdk.values.Row;
+import 
org.apache.beam.vendor.calcite.v1_20_0.com.google.common.annotations.VisibleForTesting;
+import org.bson.Document;
+import org.bson.json.JsonMode;
+import org.bson.json.JsonWriterSettings;
+
+@Experimental
+public class MongoDbTable extends SchemaBaseBeamTable implements Serializable {
+  // Should match: mongodb://username:password@localhost:27017/database/collection
+  @VisibleForTesting
+  final Pattern locationPattern =
+  Pattern.compile(
+  "(?<credsHostPort>mongodb://(?:.*:.*@)?.+:\\d+)/(?<database>.+)/(?<collection>.+)");
+
+  @VisibleForTesting final String dbCollection;
+  @VisibleForTesting final String dbName;
+  @VisibleForTesting final String dbUri;
+
+  MongoDbTable(Table table) {
+super(table.getSchema());
+
+String location = table.getLocation();
+Matcher matcher = locationPattern.matcher(location);
+checkArgument(
+matcher.matches(),
+"MongoDb location must be in the following format: 
'mongodb://(username:password@)?localhost:27017/database/collection'");
+this.dbUri = matcher.group("credsHostPort"); // "mongodb://localhost:27017"
+this.dbName = matcher.group("database");
+this.dbCollection = matcher.group("collection");
+  }
+
+  @Override
+  public PCollection<Row> buildIOReader(PBegin begin) {
+// Read MongoDb Documents
+PCollection<Document> readDocuments =
+MongoDbIO.read()
+.withUri(dbUri)
+.withDatabase(dbName)
+.withCollection(dbCollection)
+.expand(begin);
+
+return readDocuments
+// TODO: figure out a way convert Document directly to Row.
+.apply("Convert Document to JSON", createParserParDo())
+.apply("Transform JSON to Row", JsonToRow.withSchema(getSchema()))
+.setRowSchema(getSchema());
+  }
+
+  @Override
+  public POutput buildIOWriter(PCollection<Row> input) {
+throw new UnsupportedOperationException("Writing to MongoDB is not supported");
+  }
+
+  @Override
+  public IsBounded isBounded() {
+return IsBounded.BOUNDED;
+  }
+
+  @Override
+  public BeamTableStatistics getTableStatistics(PipelineOptions options) {
+return BeamTableStatistics.BOUNDED_UNKNOWN;
+  }
+
+  @VisibleForTesting
+  SingleOutput<Document, String> createParserParDo() {
+return ParDo.of(new DocumentToJsonStringConverter());
+  }
+
+  // TODO: add support for complex fields.
 
 Review comment:
 

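The location-parsing contract above (named groups read back via `matcher.group`) can be exercised with an equivalent Python sketch; the group names come from the Java code, while the sample URI is made up:

```python
import re

# Mirrors the Java locationPattern: credsHostPort, database and collection
# are the named groups that MongoDbTable reads back via matcher.group(...).
LOCATION_PATTERN = re.compile(
    r"(?P<credsHostPort>mongodb://(?:.*:.*@)?.+:\d+)"
    r"/(?P<database>.+)/(?P<collection>.+)")

def parse_location(location):
    """Split a MongoDB table location into (uri, database, collection)."""
    m = LOCATION_PATTERN.match(location)
    if not m:
        raise ValueError(
            "MongoDb location must look like "
            "'mongodb://(username:password@)?host:port/database/collection'")
    return m.group("credsHostPort"), m.group("database"), m.group("collection")

print(parse_location("mongodb://user:pass@localhost:27017/beamdb/orders"))
# ('mongodb://user:pass@localhost:27017', 'beamdb', 'orders')
```

The credentials part is optional, so `mongodb://localhost:27017/db/coll` parses the same way with an empty user/password prefix.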
[jira] [Work logged] (BEAM-8481) Python 3.7 Postcommit test -- frequent timeouts

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8481?focusedWorklogId=334435&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334435
 ]

ASF GitHub Bot logged work on BEAM-8481:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:57
Start Date: 25/Oct/19 23:57
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9896: [BEAM-8481] 
Increasing Py3.7 postcommit job timeout after 120 minutes instead of 100
URL: https://github.com/apache/beam/pull/9896#issuecomment-546547343
 
 
   Thank you, @pabloem !
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334435)
Remaining Estimate: 0h
Time Spent: 10m

> Python 3.7 Postcommit test -- frequent timeouts
> ---
>
> Key: BEAM-8481
> URL: https://issues.apache.org/jira/browse/BEAM-8481
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Ahmet Altay
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/job/beam_PostCommit_Python37/] – this suite is 
> seemingly timing out frequently. Other suites are not affected by these 
> timeouts. From the history, the issues started before Oct 10, and we cannot 
> pinpoint the cause because the history is lost.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8474) A microbenchmark that exercises the FnAPI runner functionality

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8474?focusedWorklogId=334434&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334434
 ]

ASF GitHub Bot logged work on BEAM-8474:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:56
Start Date: 25/Oct/19 23:56
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9891: [BEAM-8474] A 
microbenchmark for Python FnApiRunner:
URL: https://github.com/apache/beam/pull/9891#issuecomment-546547206
 
 
   > Hm I think `BenchmarkConfig` runs the benchmark multiple times on a fixed 
size. I think I prefer running the varying sizes to get a linear regression of 
runtime vs number of elements. That allows to extract the per-element overhead, 
and the fixed cost of running the NxM stages...
   
   Linear regression sounds like something that can be reused in other 
benchmarks, so you can consider generalizing the launcher in utils.py to allow 
that.
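
The regression being discussed is straightforward to sketch in plain Python. The timings below are hypothetical, and `fit_runtime` is not part of Beam; it just shows how a least-squares fit of runtime against input size separates the fixed cost from the per-element cost:

```python
def fit_runtime(sizes, runtimes):
    """Least-squares fit of runtime ~ fixed_cost + per_element * size."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(runtimes) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, runtimes))
    per_element = sxy / sxx            # slope: cost per element
    fixed_cost = mean_y - per_element * mean_x  # intercept: startup overhead
    return fixed_cost, per_element

# Hypothetical timings: 0.5 s of startup plus 2 microseconds per element.
sizes = [1_000, 10_000, 100_000, 1_000_000]
runtimes = [0.5 + 2e-6 * s for s in sizes]
fixed, per_elem = fit_runtime(sizes, runtimes)
print(f"fixed cost ~ {fixed:.3f} s, per-element ~ {per_elem * 1e6:.2f} us")
```

A generalized launcher in utils.py could run the benchmark at each size and feed the measurements into a fit like this.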
 



Issue Time Tracking
---

Worklog Id: (was: 334434)
Time Spent: 1h 20m  (was: 1h 10m)

> A microbenchmark that exercises the FnAPI runner functionality
> --
>
> Key: BEAM-8474
> URL: https://issues.apache.org/jira/browse/BEAM-8474
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The main data paths for the Fn API runner are exercised by:
>  * Side inputs
>  * GBK
>  * State
>  * Timers
>  * SDF?
> A microbenchmark would have a number of stages that exercise one or more of 
> these data paths.
> A microbenchmark suite may have more than one pipeline.





[jira] [Work logged] (BEAM-8493) Add standard double coder to Go SDK

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8493?focusedWorklogId=334433&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334433
 ]

ASF GitHub Bot logged work on BEAM-8493:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:55
Start Date: 25/Oct/19 23:55
Worklog Time Spent: 10m 
  Work Description: youngoli commented on pull request #9897: [BEAM-8493] 
Add standard double coder to Go SDK.
URL: https://github.com/apache/beam/pull/9897
 
 
   For upcoming features (in this case, SDF), we need to support the
   standard double coder from beam_runner_api.proto. This commit adds
   relevant support.
   
   This change is heavily based off #9809 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334432&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334432
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:52
Start Date: 25/Oct/19 23:52
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-546546758
 
 
   Looks cleaner. Could you ping once all open comments are resolved.
   
   (I will be OOO most of next week but @pabloem can finish the review.)
 



Issue Time Tracking
---

Worklog Id: (was: 334432)
Time Spent: 15h 40m  (was: 15.5h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as the materialized pcoll.
> e.g., visualize(pcoll)





[jira] [Work started] (BEAM-8493) Add standard double coder to Go SDK

2019-10-25 Thread Daniel Oliveira (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8493 started by Daniel Oliveira.
-
> Add standard double coder to Go SDK
> ---
>
> Key: BEAM-8493
> URL: https://issues.apache.org/jira/browse/BEAM-8493
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
>
> For upcoming features, we need to add support for the standard double coder: 
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L573]





[jira] [Updated] (BEAM-8493) Add standard double coder to Go SDK

2019-10-25 Thread Daniel Oliveira (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira updated BEAM-8493:
--
Status: Open  (was: Triage Needed)

> Add standard double coder to Go SDK
> ---
>
> Key: BEAM-8493
> URL: https://issues.apache.org/jira/browse/BEAM-8493
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Minor
>
> For upcoming features, we need to add support for the standard double coder: 
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L573]





[jira] [Created] (BEAM-8493) Add standard double coder to Go SDK

2019-10-25 Thread Daniel Oliveira (Jira)
Daniel Oliveira created BEAM-8493:
-

 Summary: Add standard double coder to Go SDK
 Key: BEAM-8493
 URL: https://issues.apache.org/jira/browse/BEAM-8493
 Project: Beam
  Issue Type: New Feature
  Components: sdk-go
Reporter: Daniel Oliveira
Assignee: Daniel Oliveira


For upcoming features, we need to add support for the standard double coder: 
[https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L573]
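
For context, the standard double coder referenced here encodes a float64 as eight big-endian IEEE 754 bytes. A Python sketch of the wire format (not the Go implementation being added in this issue):

```python
import struct

def encode_double(value: float) -> bytes:
    """Standard double coder wire format: 8 bytes, big-endian IEEE 754."""
    return struct.pack(">d", value)

def decode_double(data: bytes) -> float:
    """Inverse of encode_double."""
    return struct.unpack(">d", data)[0]

encoded = encode_double(1.0)
print(encoded.hex())  # 3ff0000000000000
assert decode_double(encoded) == 1.0
```

Any SDK that produces these exact eight bytes for a given float64 interoperates with runners expecting the standard coder.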





[jira] [Resolved] (BEAM-7970) Regenerate Go SDK proto files in correct version

2019-10-25 Thread Daniel Oliveira (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira resolved BEAM-7970.
---
Fix Version/s: Not applicable
   Resolution: Won't Fix

> Regenerate Go SDK proto files in correct version
> 
>
> Key: BEAM-7970
> URL: https://issues.apache.org/jira/browse/BEAM-7970
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Generated proto files in the Go SDK currently include this bit:
> {{// This is a compile-time assertion to ensure that this generated file}}
> {{// is compatible with the proto package it is being compiled against.}}
> {{// A compilation error at this line likely means your copy of the}}
> {{// proto package needs to be updated.}}
> {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}}
>  
> This indicates that the protos are being generated as proto v2 for whatever 
> reason. Most likely, as mentioned in this post by someone with a similar 
> issue, it is because the proto generation binary needs to be rebuilt before 
> generating the files again: 
> [https://github.com/golang/protobuf/issues/449#issuecomment-340884839]
> This hasn't caused any errors so far, but might eventually cause errors if we 
> hit version differences between the v2 and v3 protos.





[jira] [Resolved] (BEAM-8167) Replace old Postcommit_Python_Verify test with new python postcommits in Grafana.

2019-10-25 Thread Daniel Oliveira (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira resolved BEAM-8167.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Replace old Postcommit_Python_Verify test with new python postcommits in 
> Grafana.
> -
>
> Key: BEAM-8167
> URL: https://issues.apache.org/jira/browse/BEAM-8167
> Project: Beam
>  Issue Type: Bug
>  Components: project-management, testing
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The old postcommit was replaced with multiple new postcommits for each 
> supported python version (Python2, Python35, Python36, Python37). These 
> aren't being reflected on the Grafana dashboards yet, so that should be fixed.





[jira] [Resolved] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-25 Thread Pablo Estrada (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-8446.
-
Fix Version/s: Not applicable
   Resolution: Fixed

Fixed. The issue now is that the build is timing out :(

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/





[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334431
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:39
Start Date: 25/Oct/19 23:39
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-546545161
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 334431)
Time Spent: 3h 40m  (was: 3.5h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an options 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be run directly by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).





[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=334430&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334430
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:37
Start Date: 25/Oct/19 23:37
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9858: [BEAM-8446] 
Adding a test checking the wait for BQ jobs
URL: https://github.com/apache/beam/pull/9858
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 334430)
Time Spent: 6h 10m  (was: 6h)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/





[jira] [Work logged] (BEAM-8492) Python typehints: don't try to strip_iterable from None

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8492?focusedWorklogId=334425&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334425
 ]

ASF GitHub Bot logged work on BEAM-8492:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:14
Start Date: 25/Oct/19 23:14
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9895: [BEAM-8492] Allow 
None return hints for DoFn.process
URL: https://github.com/apache/beam/pull/9895
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   

[jira] [Updated] (BEAM-8492) Python typehints: don't try to strip_iterable from None

2019-10-25 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-8492:

Status: Open  (was: Triage Needed)

> Python typehints: don't try to strip_iterable from None
> ---
>
> Key: BEAM-8492
> URL: https://issues.apache.org/jira/browse/BEAM-8492
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> The return value of DoFn.process can be an iterable of elements or None.
> Handle the case when the output type hint of process is None.
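
A minimal Python illustration of the guard being described (simplified; Beam's actual strip_iterable lives in apache_beam.typehints and handles many more hint forms):

```python
import typing

def strip_iterable(hint):
    """Return the element type of an iterable output hint.

    Guard: DoFn.process may be hinted to return None (it emits nothing
    directly), and None has no element type to strip, so pass it through.
    """
    if hint is None or hint is type(None):
        return hint
    args = typing.get_args(hint)  # e.g. Iterable[int] -> (int,); Python 3.8+
    return args[0] if args else hint

print(strip_iterable(typing.Iterable[int]))  # <class 'int'>
print(strip_iterable(None))                  # None
```

Without the first check, code that unconditionally unpacks the hint would fail on a `process` annotated with `-> None`.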





[jira] [Created] (BEAM-8492) Python typehints: don't try to strip_iterable from None

2019-10-25 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8492:
---

 Summary: Python typehints: don't try to strip_iterable from None
 Key: BEAM-8492
 URL: https://issues.apache.org/jira/browse/BEAM-8492
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri
Assignee: Udi Meiri


The return value of DoFn.process can be an iterable of elements or None.
Handle the case when the output type hint of process is None.





[jira] [Work logged] (BEAM-8474) A microbenchmark that exercises the FnAPI runner functionality

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8474?focusedWorklogId=334423&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334423
 ]

ASF GitHub Bot logged work on BEAM-8474:


Author: ASF GitHub Bot
Created on: 25/Oct/19 23:10
Start Date: 25/Oct/19 23:10
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9891: [BEAM-8474] A 
microbenchmark for Python FnApiRunner:
URL: https://github.com/apache/beam/pull/9891#issuecomment-546540625
 
 
   Hm, I think `BenchmarkConfig` runs the benchmark multiple times on a fixed 
size. I'd prefer running varying sizes to get a linear regression of 
runtime vs. number of elements. That allows us to extract the per-element 
overhead and the fixed cost of running the NxM stages...
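The regression described here can be sketched without any Beam dependency: timing the benchmark at several input sizes and fitting a line of runtime against element count separates the per-element overhead (the slope) from the fixed cost (the intercept). The function name and the sample timings below are illustrative, not part of the benchmark code under review.

```python
def fit_cost_model(sizes, times):
    """Ordinary least-squares fit of time = fixed_cost + per_element * n.

    Benchmarking at several input sizes and regressing runtime on the
    element count separates the per-element overhead (the slope) from
    the fixed cost of running the stages (the intercept).
    """
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    var = sum((x - mean_x) ** 2 for x in sizes)
    per_element = cov / var
    fixed_cost = mean_y - per_element * mean_x
    return fixed_cost, per_element

# Hypothetical timings: 2 s of fixed setup plus 1 ms per element.
sizes = [1000, 2000, 4000, 8000]
times = [2.0 + 0.001 * s for s in sizes]
fixed, per_elem = fit_cost_model(sizes, times)
```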
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334423)
Time Spent: 1h 10m  (was: 1h)

> A microbenchmark that exercises the FnAPI runner functionality
> --
>
> Key: BEAM-8474
> URL: https://issues.apache.org/jira/browse/BEAM-8474
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The main data paths for the Fn API runner are exercised by:
>  * Side inputs
>  * GBK
>  * State
>  * Timers
>  * SDF?
> A microbenchmark would have a number of stages that exercise one or more of 
> these data paths.
> A microbenchmark suite may have more than one pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8474) A microbenchmark that exercises the FnAPI runner functionality

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8474?focusedWorklogId=334415&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334415
 ]

ASF GitHub Bot logged work on BEAM-8474:


Author: ASF GitHub Bot
Created on: 25/Oct/19 22:53
Start Date: 25/Oct/19 22:53
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9891: [BEAM-8474] A 
microbenchmark for Python FnApiRunner:
URL: https://github.com/apache/beam/pull/9891#issuecomment-546537886
 
 
   Thanks for the tip Valentyn. I'll add that
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334415)
Time Spent: 1h  (was: 50m)

> A microbenchmark that exercises the FnAPI runner functionality
> --
>
> Key: BEAM-8474
> URL: https://issues.apache.org/jira/browse/BEAM-8474
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The main data paths for the Fn API runner are exercised by:
>  * Side inputs
>  * GBK
>  * State
>  * Timers
>  * SDF?
> A microbenchmark would have a number of stages that exercise one or more of 
> these data paths.
> A microbenchmark suite may have more than one pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8491) Add ability for multiple output PCollections from composites

2019-10-25 Thread Sam Rohde (Jira)
Sam Rohde created BEAM-8491:
---

 Summary: Add ability for multiple output PCollections from 
composites
 Key: BEAM-8491
 URL: https://issues.apache.org/jira/browse/BEAM-8491
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Sam Rohde
Assignee: Sam Rohde


The Python SDK has DoOutputsTuple, which allows a single transform to have 
multiple outputs. However, this does not include the ability for a composite 
transform to have multiple output PCollections from different transforms.
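As a Beam-free illustration of the requested feature (all names below are hypothetical stand-ins, not Beam's API): a composite would apply several inner transforms and hand back a tagged bundle of outputs instead of a single collection.

```python
class MultiOutputComposite:
    """Illustrative stand-in for a composite transform (not Beam's API).

    expand() applies two inner "transforms" and returns a tagged bundle
    of outputs instead of a single collection, which is the shape this
    issue asks composites to support.
    """

    def expand(self, elements):
        evens = [x for x in elements if x % 2 == 0]  # one inner transform
        squares = [x * x for x in elements]          # a different transform
        return {'evens': evens, 'squares': squares}

result = MultiOutputComposite().expand([1, 2, 3, 4])
```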



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8490) Python typehints: properly resolve empty dict type

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8490?focusedWorklogId=334406&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334406
 ]

ASF GitHub Bot logged work on BEAM-8490:


Author: ASF GitHub Bot
Created on: 25/Oct/19 22:33
Start Date: 25/Oct/19 22:33
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9894: [BEAM-8490] Fix 
instance_to_type of empty dict
URL: https://github.com/apache/beam/pull/9894
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8474) A microbenchmark that exercises the FnAPI runner functionality

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8474?focusedWorklogId=334404&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334404
 ]

ASF GitHub Bot logged work on BEAM-8474:


Author: ASF GitHub Bot
Created on: 25/Oct/19 22:31
Start Date: 25/Oct/19 22:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9891: [BEAM-8474] A 
microbenchmark for Python FnApiRunner:
URL: https://github.com/apache/beam/pull/9891#issuecomment-546533905
 
 
   Consider reusing 
https://github.com/apache/beam/blob/51efa2efbd850838092d9b4f5db12da1eb54d6e6/sdks/python/apache_beam/tools/utils.py#L47.
 You can look at coders_microbenchmark for an example. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334404)
Time Spent: 50m  (was: 40m)

> A microbenchmark that exercises the FnAPI runner functionality
> --
>
> Key: BEAM-8474
> URL: https://issues.apache.org/jira/browse/BEAM-8474
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The main data paths for the Fn API runner are exercised by:
>  * Side inputs
>  * GBK
>  * State
>  * Timers
>  * SDF?
> A microbenchmark would have a number of stages that exercise one or more of 
> these data paths.
> A microbenchmark suite may have more than one pipeline.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334405&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334405
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 25/Oct/19 22:31
Start Date: 25/Oct/19 22:31
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9881: [BEAM-8397] Fix 
infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#issuecomment-546534027
 
 
   All tests passed, this is ready for review.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334405)
Time Spent: 50m  (was: 40m)

> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
> recursion during pickling.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1.
> `python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'`
>  currently fails if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 114 in wrapper
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1137 in save_cell
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> 

[jira] [Created] (BEAM-8490) Python typehints: properly resolve empty dict type

2019-10-25 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8490:
---

 Summary: Python typehints: properly resolve empty dict type
 Key: BEAM-8490
 URL: https://issues.apache.org/jira/browse/BEAM-8490
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri
Assignee: Udi Meiri


Currently:
{code}
trivial_inference.instance_to_type({})
{code}
returns
{code}
Dict[Union[], Union[]]
{code}
instead of
{code}
Dict[Any, Any]
{code}
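A minimal, Beam-free sketch of the desired behavior (the function name is hypothetical; Beam's actual implementation lives in trivial_inference.py): an empty dict carries no evidence about key or value types, so inference should fall back to Dict[Any, Any] rather than build a union over zero types.

```python
from typing import Any, Dict, Union

def instance_to_type_sketch(value):
    """Hypothetical sketch of the dict case (not Beam's implementation).

    An empty dict carries no evidence about its key or value types, so
    the sound answer is Dict[Any, Any]; Dict[Union[], Union[]] is a
    union over zero types and matches nothing.
    """
    if isinstance(value, dict):
        if not value:
            return Dict[Any, Any]
        key_types = tuple(type(k) for k in value)
        value_types = tuple(type(v) for v in value.values())
        return Dict[Union[key_types], Union[value_types]]
    return type(value)
```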



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8490) Python typehints: properly resolve empty dict type

2019-10-25 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-8490:

Status: Open  (was: Triage Needed)

> Python typehints: properly resolve empty dict type
> --
>
> Key: BEAM-8490
> URL: https://issues.apache.org/jira/browse/BEAM-8490
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> Currently:
> {code}
> trivial_inference.instance_to_type({})
> {code}
> returns
> {code}
> Dict[Union[], Union[]]
> {code}
> instead of
> {code}
> Dict[Any, Any]
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8489) Python typehints: filter callable output type hint should not be used

2019-10-25 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-8489:

Status: Open  (was: Triage Needed)

> Python typehints: filter callable output type hint should not be used
> -
>
> Key: BEAM-8489
> URL: https://issues.apache.org/jira/browse/BEAM-8489
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> A filter function returns bool, while the Filter() transform outputs the same 
> element type as the input.
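The typing rule in this issue can be illustrated without Beam (the function below is a plain-Python stand-in, not Beam's Filter transform): the predicate's bool return value only decides membership, so the output element type must come from the input, not from the callable's return annotation.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar('T')

def filter_transform(fn: Callable[[T], bool], elements: Iterable[T]) -> List[T]:
    """Plain-Python stand-in for Filter (not Beam's implementation).

    fn returns bool, but that bool only decides membership: the output
    element type is the input element type T, so fn's return annotation
    must not be used as the output type hint.
    """
    return [x for x in elements if fn(x)]
```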



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334396&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334396
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 25/Oct/19 22:21
Start Date: 25/Oct/19 22:21
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9741: [BEAM-7926] Visualize 
PCollection
URL: https://github.com/apache/beam/pull/9741#issuecomment-546531751
 
 
   > Thank you.
   > 
   > Could you add your changes in new commits, instead of force pushing. 
Reviewing incremental changes that way is easier.
   
   Yes, I'll do that from now on. Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334396)
Time Spent: 15.5h  (was: 15h 20m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 15.5h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334394&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334394
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 25/Oct/19 22:20
Start Date: 25/Oct/19 22:20
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339258080
 
 

 ##
 File path: sdks/python/tox.ini
 ##
 @@ -189,7 +189,7 @@ commands =
   time {toxinidir}/scripts/run_pylint.sh
 
 [testenv:docs]
-extras = test,gcp,docs
+extras = test,gcp,docs,interactive
 
 Review comment:
   Yes, `docs` needs it because it's generating documentation for all source 
code. Kind of like a static scan. That's why `test` and `gcp` packages have 
also been included here.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334394)
Time Spent: 15h 10m  (was: 15h)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 15h 10m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334395&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334395
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 25/Oct/19 22:20
Start Date: 25/Oct/19 22:20
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339258088
 
 

 ##
 File path: sdks/python/tox.ini
 ##
 @@ -208,3 +208,13 @@ commands =
   coverage report --skip-covered
   # Generate report in xml format
   coverage xml
+
+[testenv:py36-interactive]
+extras = test,interactive
+commands =
+  python setup.py nosetests --where 'apache_beam/runners/interactive' {posargs}
 
 Review comment:
   TL;DR: For simplicity, I'll just change it to run all tests.
   
   It's possible that after setting up a Py36 (or any other) environment with 
interactive dependencies, some dependencies would conflict or be incompatible. 
For example, in the rollback we had, some test setup (possibly quite different 
from the tox envs in this repo) resolved a newer IPython through a transitive 
dependency, causing package imports to fail wherever there was no try-except 
around the relative import path.
   To ensure that all test environments at least work without dependency 
conflicts, we can use the same `commands` used by `testenv:py36-gcp`. It just 
means we rerun many tests, because the result of `pip install ...` might 
differ between environments.
   But since the test suites are executed in parallel, this only adds more 
workload to Jenkins without increasing the end-to-end elapsed time, so it 
should be fine.
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334395)
Time Spent: 15h 20m  (was: 15h 10m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 15h 20m
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized pcoll.
> e.g., visualize(pcoll)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334343&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334343
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 19:58
Start Date: 25/Oct/19 19:58
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9887: 
[release-2.17.0] Revert "Merge pull request #9854 from [BEAM-8457] Label 
Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887
 
 
   … from Notebook"
   
   This reverts commit 1a8391da9222ab8d0493b0007bd60bdbeeb5e275.
   
   **This is a cherry pick of PR #9879**
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-8487) Python typehints: support forward references

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8487?focusedWorklogId=334345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334345
 ]

ASF GitHub Bot logged work on BEAM-8487:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:03
Start Date: 25/Oct/19 20:03
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9888: [BEAM-8487] 
Convert forward references to Any
URL: https://github.com/apache/beam/pull/9888
 
 
   Changes convert_to_beam_type to support string literals, but requires #9602 
to actually call convert_to_beam_type on typehints.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Updated] (BEAM-8487) Python typehints: support forward references

2019-10-25 Thread Udi Meiri (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Udi Meiri updated BEAM-8487:

Status: Open  (was: Triage Needed)

> Python typehints: support forward references
> 
>
> Key: BEAM-8487
> URL: https://issues.apache.org/jira/browse/BEAM-8487
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Udi Meiri
>Assignee: Udi Meiri
>Priority: Major
>
> Typehints may be given as string literals: 
> https://www.python.org/dev/peps/pep-0484/#forward-references
> These are currently not evaluated and result in errors.
> Example 1:
> {code}
>   def test_typed_callable_string_hints(self):
> def do_fn(element: 'int') -> 'typehints.List[str]':
>   return [[str(element)] * 2]
> result = [1, 2] | beam.ParDo(do_fn)
> self.assertEqual([['1', '1'], ['2', '2']], sorted(result))
> {code}
> This results in:
> {code}
> > return issubclass(sub, base)
> E TypeError: issubclass() arg 2 must be a class or tuple of classes
> typehints.py:1168: TypeError
> {code}
> Example 2:
> {code}
>   def test_typed_dofn_string_hints(self):
> class MyDoFn(beam.DoFn):
>   def process(self, element: 'int') -> 'typehints.List[str]':
> return [[str(element)] * 2]
> result = [1, 2] | beam.ParDo(MyDoFn())
> self.assertEqual([['1', '1'], ['2', '2']], sorted(result))
> {code}
> This results in:
> {code}
> > raise ValueError('%s is not iterable' % type_hint)
> E ValueError: typehints.List[str] is not iterable
> typehints.py:1194: ValueError
> {code}
> where the non-iterable entity the error refers to is a string literal 
> ("typehints.List[str]").
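As background, string-literal annotations like the ones above can be resolved with the standard library before any subclass checks are attempted. A minimal sketch using plain `typing` (not Beam's `typehints` module):

```python
import typing

# PEP 484 forward references: annotations given as string literals must be
# evaluated before being used as types. typing.get_type_hints() resolves
# them against the function's module globals.
def do_fn(element: 'int') -> 'typing.List[str]':
    return [str(element)] * 2

hints = typing.get_type_hints(do_fn)
print(hints['element'])   # <class 'int'>
print(hints['return'])    # typing.List[str]
```

A fix along these lines would evaluate the string hints once, then hand the resulting real types to the existing type-compatibility machinery.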



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work logged] (BEAM-8474) A microbenchmark that exercises the FnAPI runner functionality

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8474?focusedWorklogId=334381&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334381
 ]

ASF GitHub Bot logged work on BEAM-8474:


Author: ASF GitHub Bot
Created on: 25/Oct/19 21:39
Start Date: 25/Oct/19 21:39
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #9891: [BEAM-8474] A 
microbenchmark for Python FnApiRunner:
URL: https://github.com/apache/beam/pull/9891#issuecomment-546522382
 
 
   Thanks for starting to create microbenchmarks for more components! Would you 
mind explaining more about what kind of regression this microbenchmark is trying 
to catch? A regression in a core SDK component? The SDK harness? Or just the FnApiRunner?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334381)
Time Spent: 0.5h  (was: 20m)

> A microbenchmark that exercises the FnAPI runner functionality
> --
>
> Key: BEAM-8474
> URL: https://issues.apache.org/jira/browse/BEAM-8474
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The main data paths for the Fn API runner are exercised by:
>  * Side inputs
>  * GBK
>  * State
>  * Timers
>  * SDF?
> A microbenchmark would have a number of stages that exercise one or more of 
> these data paths.
> A microbenchmark suite may have more than one pipeline.





[jira] [Work logged] (BEAM-8474) A microbenchmark that exercises the FnAPI runner functionality

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8474?focusedWorklogId=334390&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334390
 ]

ASF GitHub Bot logged work on BEAM-8474:


Author: ASF GitHub Bot
Created on: 25/Oct/19 22:03
Start Date: 25/Oct/19 22:03
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9891: [BEAM-8474] A 
microbenchmark for Python FnApiRunner:
URL: https://github.com/apache/beam/pull/9891#issuecomment-546527944
 
 
   This is for the Fn API runner. I'm going to be making changes to it, and I 
want to make sure that performance will not worsen : )
 



Issue Time Tracking
---

Worklog Id: (was: 334390)
Time Spent: 40m  (was: 0.5h)

> A microbenchmark that exercises the FnAPI runner functionality
> --
>
> Key: BEAM-8474
> URL: https://issues.apache.org/jira/browse/BEAM-8474
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The main data paths for the Fn API runner are exercised by:
>  * Side inputs
>  * GBK
>  * State
>  * Timers
>  * SDF?
> A microbenchmark would have a number of stages that exercise one or more of 
> these data paths.
> A microbenchmark suite may have more than one pipeline.





[jira] [Commented] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16960153#comment-16960153
 ] 

Valentyn Tymofieiev commented on BEAM-8397:
---

Filed https://bugs.python.org/issue38593 for uncaught recursion errors in 
Python 3.7.
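For context on where the recursion comes from (a generic stdlib illustration, not Beam code): the pickler already breaks ordinary reference cycles with its memo table, so cyclic data is not the problem; the overflow in the traceback below instead loops through dill's function and module-dict handling, which the memo does not cut.

```python
import pickle

# Plain pickle handles self-referential *data* fine: the memo table records
# each object the first time it is seen and emits a back-reference after.
x = []
x.append(x)                      # self-referential list
y = pickle.loads(pickle.dumps(x))
print(y[0] is y)                 # True: the cycle round-trips intact
```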

> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
> recursion during pickling.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1.
> `python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data'`
>  currently fails if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 114 in wrapper
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1137 in save_cell
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
> ...
> {noformat}
> cc: [~yoshiki.obata]





[jira] [Closed] (BEAM-8480) Explicitly set restriction coder for bounded reader wrapper SDF.

2019-10-25 Thread Robert Bradshaw (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Bradshaw closed BEAM-8480.
-
Fix Version/s: 2.18.0
   Resolution: Fixed

> Explicitly set restriction coder for bounded reader wrapper SDF.
> 
>
> Key: BEAM-8480
> URL: https://issues.apache.org/jira/browse/BEAM-8480
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Restrictions are serialized by default with the default (Pickle) coder. In 
> this case, we try to serialize the source, which may require Dill (e.g. if it 
> references any counters or other non-trivial objects). 





[jira] [Work logged] (BEAM-8480) Explicitly set restriction coder for bounded reader wrapper SDF.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8480?focusedWorklogId=334385&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334385
 ]

ASF GitHub Bot logged work on BEAM-8480:


Author: ASF GitHub Bot
Created on: 25/Oct/19 21:45
Start Date: 25/Oct/19 21:45
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9877: [BEAM-8480] 
Explicitly set restriction coder for bounded reader wrapper SDF.
URL: https://github.com/apache/beam/pull/9877
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 334385)
Time Spent: 0.5h  (was: 20m)

> Explicitly set restriction coder for bounded reader wrapper SDF.
> 
>
> Key: BEAM-8480
> URL: https://issues.apache.org/jira/browse/BEAM-8480
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Restrictions are serialized by default with the default (Pickle) coder. In 
> this case, we try to serialize the source, which may require Dill (e.g. if it 
> references any counters or other non-trivial objects). 





[jira] [Work logged] (BEAM-8480) Explicitly set restriction coder for bounded reader wrapper SDF.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8480?focusedWorklogId=334384&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334384
 ]

ASF GitHub Bot logged work on BEAM-8480:


Author: ASF GitHub Bot
Created on: 25/Oct/19 21:44
Start Date: 25/Oct/19 21:44
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9877: [BEAM-8480] 
Explicitly set restriction coder for bounded reader wrapper SDF.
URL: https://github.com/apache/beam/pull/9877#issuecomment-546523761
 
 
   Yes, the default coder is FastPrimitivesCoder, which delegates (as a last 
resort) to PickleCoder, which can't handle pickling arbitrary Source objects. 
But this is still a better default most of the time--we use FastPrimitivesCoder 
for data and dill for UserFns. 
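A minimal sketch of why the Pickle-backed default falls over on sources (generic stdlib behavior, not Beam's coder code): locally defined closures, like a source object holding counters, are exactly what plain pickle refuses to serialize, and what dill exists to handle.

```python
import pickle

# A closure standing in for a source that references non-trivial state.
def make_source():
    hits = [0]
    def read():               # local function; plain pickle cannot name it
        hits[0] += 1
        return hits[0]
    return read

try:
    pickle.dumps(make_source())
    failed = False
except (AttributeError, pickle.PicklingError):
    failed = True
print(failed)  # True: the stdlib pickler rejects the local closure
```

Explicitly pinning a dill-backed coder for the restriction, as this PR does, sidesteps the problem while keeping the faster default for ordinary data.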
 



Issue Time Tracking
---

Worklog Id: (was: 334384)
Time Spent: 20m  (was: 10m)

> Explicitly set restriction coder for bounded reader wrapper SDF.
> 
>
> Key: BEAM-8480
> URL: https://issues.apache.org/jira/browse/BEAM-8480
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Restrictions are serialized by default with the default (Pickle) coder. In 
> this case, we try to serialize the source, which may require Dill (e.g. if it 
> references any counters or other non-trivial objects). 





[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334377&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334377
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 21:20
Start Date: 25/Oct/19 21:20
Worklog Time Spent: 10m 
  Work Description: 11moon11 commented on pull request #9892: [BEAM-8427] 
[SQL] buildIOWrite from MongoDb Table
URL: https://github.com/apache/beam/pull/9892
 
 
   - Implemented write functionality for MongoDbTable.
   - Updated conversion logic for RowJsonSerializer.
   
   Based on top of #9806.
   
   

[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334374&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334374
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 21:15
Start Date: 25/Oct/19 21:15
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9854: [BEAM-8457] Label 
Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9854#issuecomment-546516062
 
 
   +1. We should not be importing the interactive runner (it's causing problems 
with tests as well), and interactivity should not be a property of the 
pipeline but of the runner (and I'd prefer a design that avoids passing an 
interactive bit around everywhere). 
 



Issue Time Tracking
---

Worklog Id: (was: 334374)
Time Spent: 3.5h  (was: 3h 20m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an options 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).





[jira] [Work logged] (BEAM-8474) A microbenchmark that exercises the FnAPI runner functionality

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8474?focusedWorklogId=334372&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334372
 ]

ASF GitHub Bot logged work on BEAM-8474:


Author: ASF GitHub Bot
Created on: 25/Oct/19 21:13
Start Date: 25/Oct/19 21:13
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9891: [BEAM-8474] A 
microbenchmark for Python FnApiRunner:
URL: https://github.com/apache/beam/pull/9891#issuecomment-546515515
 
 
   r: @boyuanzz 
 



Issue Time Tracking
---

Worklog Id: (was: 334372)
Time Spent: 20m  (was: 10m)

> A microbenchmark that exercises the FnAPI runner functionality
> --
>
> Key: BEAM-8474
> URL: https://issues.apache.org/jira/browse/BEAM-8474
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Pablo Estrada
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The main data paths for the Fn API runner are exercised by:
>  * Side inputs
>  * GBK
>  * State
>  * Timers
>  * SDF?
> A microbenchmark would have a number of stages that exercise one or more of 
> these data paths.
> A microbenchmark suite may have more than one pipeline.





[jira] [Work logged] (BEAM-8474) A microbenchmark that exercises the FnAPI runner functionality

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8474?focusedWorklogId=334371&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334371
 ]

ASF GitHub Bot logged work on BEAM-8474:


Author: ASF GitHub Bot
Created on: 25/Oct/19 21:13
Start Date: 25/Oct/19 21:13
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9891: [BEAM-8474] A 
microbenchmark for Python FnApiRunner:
URL: https://github.com/apache/beam/pull/9891
 
 
   Overall runtime of this microbenchmark is about one minute.
   
   This runs a series of N parallel pipelines with M parallel stages each. Each
   stage does the following:
   
   1) Put all the PCollection elements in state
   2) Set a timer for the future
   3) When the timer fires, change the key and output all the elements 
downstream
   
   Results:
   ```
   python -m apache_beam.tools.fn_api_runner_microbenchmark
1 element  3.97874 sec
  101 elements 5.81925 sec
  201 elements 6.12566 sec
  301 elements 6.57774 sec
  401 elements 7.43154 sec
  501 elements 7.97614 sec
  601 elements 8.63536 sec
  701 elements 9.10238 sec
  801 elements 9.17786 sec
  901 elements 10.0178 sec
   
   Fixed cost   4.765602442423502
   Per-element  0.00017222933748583773
   R^2  0.9607312619599335
   ```
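The `Fixed cost` / `Per-element` / `R^2` summary above is an ordinary least-squares fit of runtime against element count. A sketch of that fit on the numbers above (an assumed model, not Beam's actual `fn_api_runner_microbenchmark` code; the intercept and R^2 match the output, while the reported `Per-element` figure is evidently normalized differently from the raw slope, e.g. per stage):

```python
# Least-squares fit of t = fixed + per_element * n, plus R^2.
def fit_cost_model(ns, ts):
    k = len(ns)
    mean_n = sum(ns) / k
    mean_t = sum(ts) / k
    cov = sum((n - mean_n) * (t - mean_t) for n, t in zip(ns, ts))
    var = sum((n - mean_n) ** 2 for n in ns)
    per_element = cov / var                      # raw slope, sec/element
    fixed = mean_t - per_element * mean_n        # intercept, sec
    ss_res = sum((t - (fixed + per_element * n)) ** 2 for n, t in zip(ns, ts))
    ss_tot = sum((t - mean_t) ** 2 for t in ts)
    return fixed, per_element, 1 - ss_res / ss_tot

ns = [1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
ts = [3.97874, 5.81925, 6.12566, 6.57774, 7.43154,
      7.97614, 8.63536, 9.10238, 9.17786, 10.0178]
fixed, per_element, r2 = fit_cost_model(ns, ts)
print(fixed, r2)  # fixed ~= 4.7656, r2 ~= 0.9607, matching the output above
```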
   
   
   

[jira] [Work logged] (BEAM-8427) [SQL] Add support for MongoDB source

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8427?focusedWorklogId=334373&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334373
 ]

ASF GitHub Bot logged work on BEAM-8427:


Author: ASF GitHub Bot
Created on: 25/Oct/19 21:13
Start Date: 25/Oct/19 21:13
Worklog Time Spent: 10m 
  Work Description: apilloud commented on issue #9806: [BEAM-8427] Create a 
table and a table provider for MongoDB
URL: https://github.com/apache/beam/pull/9806#issuecomment-546515561
 
 
   R: @TheNeuralBit 
   Brian, can you take a look at this? You are probably more familiar with JSON 
stuff than I am.
 



Issue Time Tracking
---

Worklog Id: (was: 334373)
Time Spent: 20m  (was: 10m)

> [SQL] Add support for MongoDB source
> 
>
> Key: BEAM-8427
> URL: https://issues.apache.org/jira/browse/BEAM-8427
> Project: Beam
>  Issue Type: New Feature
>  Components: dsl-sql
>Reporter: Kirill Kozlov
>Assignee: Kirill Kozlov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In progress:
>  * Create a MongoDB table and table provider.
>  * Implement buildIOReader
>  * Support primitive types
> Still needs to be done:
>  * Implement buildIOWrite
>  * improve getTableStatistics





[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334368&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334368
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:55
Start Date: 25/Oct/19 20:55
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9885: [BEAM-8457] 
Label Dataflow jobs from Notebook
URL: https://github.com/apache/beam/pull/9885#discussion_r339235798
 
 

 ##
 File path: sdks/python/apache_beam/pipeline.py
 ##
 @@ -396,28 +400,57 @@ def replace_all(self, replacements):
 for override in replacements:
   self._check_replacement(override)
 
-  def run(self, test_runner_api=True):
-"""Runs the pipeline. Returns whatever our runner returns after running."""
-
+  def run(self, test_runner_api=True, runner=None, options=None,
+  interactive=None):
+"""Runs the pipeline. Returns whatever our runner returns after running.
+
+If another runner instance and options are provided, that runner will
+execute the pipeline with the given options. If only one of the two is set,
+a ValueError is raised. The usage is similar to directly invoking
+`runner.run_pipeline(pipeline, options)`.
+Additionally, an interactive field can be set to override the pipeline's
+self.interactive field to mark the current pipeline as being initiated from
+an interactive environment.
+"""
+if interactive:
+  self.interactive = interactive
+elif (type(self.runner).__module__
+  == 'apache_beam.runners.interactive.interactive_runner' and
+  type(self.runner).__name__ == 'InteractiveRunner'):
 
 Review comment:
   This is the difference from previous 
[PR](https://www.google.com/url?q=https://github.com/apache/beam/pull/9854). 
   All runners use "new-style" classes in Python, so 
`type(obj).__module__`/`__name__` should always work. Please let me know if 
there are any backward-incompatible cases. 
   Thanks!
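A minimal sketch of the string-based check discussed above (the helper name is ours, not Beam's): comparing `type(obj).__module__` and `type(obj).__name__` against string constants avoids importing the interactive-runner module just to do an `isinstance` check.

```python
def is_interactive_runner(runner):
    # Compare module and class name as strings so we never have to
    # import apache_beam.runners.interactive just for an isinstance check.
    return (type(runner).__module__
            == 'apache_beam.runners.interactive.interactive_runner'
            and type(runner).__name__ == 'InteractiveRunner')
```

This works for any new-style class, which is why the review comment above argues it is safe across runners.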
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 334368)
Time Spent: 3h 20m  (was: 3h 10m)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an options 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).
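The intended API change above can be sketched as follows. This is a hypothetical simplification, not the code under review; it only illustrates the "both runner and options, or neither" contract described in the PR's docstring.

```python
# Hypothetical sketch of the extended Pipeline.run() signature: a pipeline
# built with one runner can later be handed to a different runner with its
# own options (e.g. p.run(runner=DataflowRunner(), options=opts)).
class Pipeline:
    def __init__(self, runner):
        self.runner = runner

    def run(self, runner=None, options=None):
        # runner and options must be given together, or not at all.
        if (runner is None) != (options is None):
            raise ValueError('runner and options must be set together')
        runner = runner or self.runner
        return runner.run_pipeline(self, options)
```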





[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=334357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334357
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:38
Start Date: 25/Oct/19 20:38
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9844: [BEAM-8372] Support both 
flink_master and flink_master_url parameter
URL: https://github.com/apache/beam/pull/9844#issuecomment-546504944
 
 
   >`flink_master_url` is very new; I think we can consolidate on one option 
for now. 
   
   The instance in `FlinkJobServerDriver` has been there for over a year. It 
might be sufficient to convert it to an alias only there.
   
   >I think it would be good to get this resolved in the current release, if 
possible.
   
   +1
   
   >So, if I understand correctly, if `flink_master` is just host:port, there 
is no way to (simultaneously) submit a job to two different clusters (one with 
ssl, one without) without editing the config file in the middle? This seems 
quite unfortunate. Also, the code here does not read the config file.
   
   The `FlinkJobServerDriver` reads the config file before it submits the Flink 
job to the cluster. So the Python code does not have to worry about it. Yes, 
there is no way to submit against two clusters without changing the 
configuration file. You can start a second job server though, which uses a 
different configuration file.
   
   >What if, instead, we made the `http[s]://` part optional, where it would be 
stripped in Java and assumed to be just `http://` in Python if not specified? 
(Or it could try https and then fall back to http on failure.)
   
   That would be a new feature. Flink does not accept URL schemes like 
`http://`; it just expects host:port. The config file determines whether the 
REST service running at the port uses SSL or not.
   
 



Issue Time Tracking
---

Worklog Id: (was: 334357)
Time Spent: 6h 40m  (was: 6.5h)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 6h 40m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8489) Python typehints: filter callable output type hint should not be used

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8489?focusedWorklogId=334356=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334356
 ]

ASF GitHub Bot logged work on BEAM-8489:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:36
Start Date: 25/Oct/19 20:36
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9890: [BEAM-8489] 
Filter: don't use callable's output type
URL: https://github.com/apache/beam/pull/9890
 
 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more 
tips on [how to make review process 
smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | Java | Python rows: per-runner build-status badge links (table truncated in the archive).

[jira] [Created] (BEAM-8489) Python typehints: filter callable output type hint should not be used

2019-10-25 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8489:
---

 Summary: Python typehints: filter callable output type hint should 
not be used
 Key: BEAM-8489
 URL: https://issues.apache.org/jira/browse/BEAM-8489
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Udi Meiri
Assignee: Udi Meiri


A filter function returns bool, while the Filter() transform outputs the same 
element type as the input.
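The issue above can be illustrated in plain Python (not Beam internals): the predicate's return annotation is `bool`, but the filtered output keeps the input element type, so inferring the transform's output type from the callable's hint would be wrong.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar('T')

def my_filter(pred: Callable[[T], bool], xs: Iterable[T]) -> List[T]:
    # The output element type is T (same as the input), NOT the
    # predicate's return type bool -- which is why Filter() should
    # ignore the callable's output type hint.
    return [x for x in xs if pred(x)]

def is_even(n: int) -> bool:
    return n % 2 == 0
```

For example, `my_filter(is_even, [1, 2, 3, 4])` returns `[2, 4]` — ints, not bools.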





[jira] [Work logged] (BEAM-8446) apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types is flaky

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8446?focusedWorklogId=334354=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334354
 ]

ASF GitHub Bot logged work on BEAM-8446:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:27
Start Date: 25/Oct/19 20:27
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9858: [BEAM-8446] 
Adding a test checking the wait for BQ jobs
URL: https://github.com/apache/beam/pull/9858#discussion_r339226407
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py
 ##
 @@ -424,6 +425,90 @@ def test_records_traverse_transform_with_mocks(self):
   assert_that(jobs,
   equal_to([job_reference]), label='CheckJobs')
 
+  @unittest.skipIf(sys.version_info[0] == 2, 
+   'Mock pickling problems in Py 2')
+  @mock.patch('time.sleep')
+  def test_wait_for_job_completion(self, sleep_mock):
+job_references = [bigquery_api.JobReference(),
+  bigquery_api.JobReference()]
+job_references[0].projectId = 'project1'
+job_references[0].jobId = 'jobId1'
+job_references[1].projectId = 'project1'
+job_references[1].jobId = 'jobId2'
+
+job_1_waiting = mock.Mock()
+job_1_waiting.status.state = 'RUNNING'
+job_2_done = mock.Mock()
+job_2_done.status.state = 'DONE'
+job_2_done.status.errorResult = None
+
+job_1_done = mock.Mock()
+job_1_done.status.state = 'DONE'
+job_1_done.status.errorResult = None
+
+bq_client = mock.Mock()
+bq_client.jobs.Get.side_effect = [
+job_1_waiting,
+job_2_done,
+job_1_done,
+job_2_done]
+
+waiting_dofn = bqfl.WaitForBQJobs(bq_client)
+
+dest_list = [(i, job) for i, job in enumerate(job_references)]
+
+with TestPipeline('DirectRunner') as p:
+  references = beam.pvalue.AsList(p | 'job_ref' >> beam.Create(dest_list))
+  outputs = (p
+ | beam.Create([''])
+ | beam.ParDo(waiting_dofn, references))
+
+  assert_that(outputs,
+  equal_to(dest_list))
+
+sleep_mock.assert_called_once()
+
+  @unittest.skipIf(sys.version_info[0] == 2, 
+   'Mock pickling problems in Py 2')
+  @mock.patch('time.sleep')
+  def test_one_job_failed_after_waiting(self, sleep_mock):
+job_references = [bigquery_api.JobReference(),
+  bigquery_api.JobReference()]
+job_references[0].projectId = 'project1'
+job_references[0].jobId = 'jobId1'
+job_references[1].projectId = 'project1'
+job_references[1].jobId = 'jobId2'
+
+job_1_waiting = mock.Mock()
+job_1_waiting.status.state = 'RUNNING'
+job_2_done = mock.Mock()
+job_2_done.status.state = 'DONE'
+job_2_done.status.errorResult = None
+
+job_1_error = mock.Mock()
+job_1_error.status.state = 'DONE'
 
 Review comment:
   yes, it returns DONE with an error report
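The convention confirmed above — a failed BigQuery job still reaches state `DONE` but carries a non-empty `errorResult` — suggests a check like the following. The field names match the BigQuery job-status schema; the helper name is ours.

```python
def job_outcome(job):
    """Classify a BigQuery job status object: 'running', 'done' or 'failed'.

    A failed job is reported as DONE with a non-empty errorResult, so
    checking the state alone is not enough.
    """
    if job.status.state != 'DONE':
        return 'running'
    if job.status.errorResult is not None:
        return 'failed'
    return 'done'
```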
 



Issue Time Tracking
---

Worklog Id: (was: 334354)
Time Spent: 6h  (was: 5h 50m)

> apache_beam.io.gcp.bigquery_write_it_test.BigQueryWriteIntegrationTests.test_big_query_write_new_types
>  is flaky
> ---
>
> Key: BEAM-8446
> URL: https://issues.apache.org/jira/browse/BEAM-8446
> Project: Beam
>  Issue Type: New Feature
>  Components: test-failures
>Reporter: Boyuan Zhang
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> test_big_query_write_new_types appears to be flaky in 
> beam_PostCommit_Python37 test suite.
> https://builds.apache.org/job/beam_PostCommit_Python37/733/
> https://builds.apache.org/job/beam_PostCommit_Python37/739/





[jira] [Work logged] (BEAM-8456) BigQuery to Beam SQL timestamp has the wrong default: truncation makes the most sense

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8456?focusedWorklogId=334355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334355
 ]

ASF GitHub Bot logged work on BEAM-8456:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:27
Start Date: 25/Oct/19 20:27
Worklog Time Spent: 10m 
  Work Description: kennknowles commented on pull request #9849: 
[BEAM-8456] Add pipeline option to have Data Catalog truncate sub-millisecond 
precision
URL: https://github.com/apache/beam/pull/9849
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 334355)
Time Spent: 3h 20m  (was: 3h 10m)

> BigQuery to Beam SQL timestamp has the wrong default: truncation makes the 
> most sense
> -
>
> Key: BEAM-8456
> URL: https://issues.apache.org/jira/browse/BEAM-8456
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kenneth Knowles
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Most of the time, a user reading a timestamp from BigQuery with 
> higher-than-millisecond precision timestamps may not even realize that the 
> data source created these high precision timestamps. They are probably 
> timestamps on log entries generated by a system with higher precision.
> If they are using it with Beam SQL, which only supports millisecond 
> precision, it makes sense to "just work" by default.
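Truncation to millisecond precision, as proposed above, amounts to dropping (not rounding) the sub-millisecond part. A minimal sketch on microsecond-precision epoch values (the helper name is ours):

```python
def truncate_to_millis(epoch_micros: int) -> int:
    # Drop sub-millisecond precision: 1 ms = 1000 us.
    # Integer floor division discards, rather than rounds, the remainder.
    return (epoch_micros // 1000) * 1000
```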





[jira] [Work logged] (BEAM-8457) Instrument Dataflow jobs that are launched from Notebooks

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8457?focusedWorklogId=334352=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334352
 ]

ASF GitHub Bot logged work on BEAM-8457:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:21
Start Date: 25/Oct/19 20:21
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #9887: [release-2.17.0] 
Revert "Merge pull request #9854 from [BEAM-8457] Label Dataflow jobs…
URL: https://github.com/apache/beam/pull/9887#issuecomment-546498911
 
 
   cc: @Ardagan 
 



Issue Time Tracking
---

Worklog Id: (was: 334352)
Time Spent: 3h 10m  (was: 3h)

> Instrument Dataflow jobs that are launched from Notebooks
> -
>
> Key: BEAM-8457
> URL: https://issues.apache.org/jira/browse/BEAM-8457
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
> Fix For: 2.17.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Dataflow needs the capability to tell how many Dataflow jobs are launched 
> from the Notebook environment, i.e., the Interactive Runner.
>  # Change the pipeline.run() API to allow supplying a runner and an options 
> parameter so that a pipeline initially bundled w/ an interactive runner can 
> be directly run by other runners from a notebook.
>  # Implicitly add the necessary source information through user labels when 
> the user does p.run(runner=DataflowRunner()).





[jira] [Work logged] (BEAM-8397) DataflowRunnerTest.test_remote_runner_display_data fails due to infinite recursion during pickling.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8397?focusedWorklogId=334348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334348
 ]

ASF GitHub Bot logged work on BEAM-8397:


Author: ASF GitHub Bot
Created on: 25/Oct/19 20:15
Start Date: 25/Oct/19 20:15
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #9881: [BEAM-8397] Fix 
infinite recursion errors in test_remote_runner_display_data_test.
URL: https://github.com/apache/beam/pull/9881#issuecomment-546496935
 
 
   Run Python PreCommit
 



Issue Time Tracking
---

Worklog Id: (was: 334348)
Time Spent: 40m  (was: 0.5h)

> DataflowRunnerTest.test_remote_runner_display_data fails due to infinite 
> recursion during pickling.
> ---
>
> Key: BEAM-8397
> URL: https://issues.apache.org/jira/browse/BEAM-8397
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> `python ./setup.py test -s 
> apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest.test_remote_runner_display_data`
>  passes.
> `tox -e py37-gcp` passes if Beam depends on dill==0.3.0, but fails if Beam 
> depends on dill==0.3.1.1.`python ./setup.py nosetests --tests 
> 'apache_beam/runners/dataflow/dataflow_runner_test.py:DataflowRunnerTest.test_remote_runner_display_data`
>  fails currently if run on master.
> The failure indicates infinite recursion during pickling:
> {noformat}
> test_remote_runner_display_data 
> (apache_beam.runners.dataflow.dataflow_runner_test.DataflowRunnerTest) ... 
> Fatal Python error: Cannot recover from stack overflow.
> Current thread 0x7f9d700ed740 (most recent call first):
>   File "/usr/lib/python3.7/pickle.py", line 479 in get
>   File "/usr/lib/python3.7/pickle.py", line 497 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 in new_save_module_dict
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 114 in wrapper
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1137 in save_cell
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 771 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 786 in save_tuple
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 638 in save_reduce
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 1394 in save_function
>   File "/usr/lib/python3.7/pickle.py", line 504 in save
>   File "/usr/lib/python3.7/pickle.py", line 882 in _batch_setitems
>   File "/usr/lib/python3.7/pickle.py", line 856 in save_dict
>   File 
> "/usr/local/google/home/valentyn/tmp/py37env/lib/python3.7/site-packages/dill/_dill.py",
>  line 910 in save_module_dict
>   File 
> "/usr/local/google/home/valentyn/projects/beam/clean/beam/sdks/python/apache_beam/internal/pickler.py",
>  line 198 

[jira] [Created] (BEAM-8488) DynamoDBIO write batch size should be configurable

2019-10-25 Thread Pradeep Bhosale (Jira)
Pradeep Bhosale created BEAM-8488:
-

 Summary: DynamoDBIO write batch size should be configurable
 Key: BEAM-8488
 URL: https://issues.apache.org/jira/browse/BEAM-8488
 Project: Beam
  Issue Type: Bug
  Components: io-java-aws
Reporter: Pradeep Bhosale


The DynamoDBIO write batch size is hard-coded in DynamoDBIO.java:

      private static final int BATCH_SIZE = 25;

Please provide a way to override this value.





[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=334328=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334328
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 25/Oct/19 19:01
Start Date: 25/Oct/19 19:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9844: [BEAM-8372] Support 
both flink_master and flink_master_url parameter
URL: https://github.com/apache/beam/pull/9844#issuecomment-546473901
 
 
   I think it would be good to get this resolved in the current release, if 
possible. 
 



Issue Time Tracking
---

Worklog Id: (was: 334328)
Time Spent: 6.5h  (was: 6h 20m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 6.5h
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-8372) Allow submission of Flink UberJar directly to flink cluster.

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8372?focusedWorklogId=334327=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334327
 ]

ASF GitHub Bot logged work on BEAM-8372:


Author: ASF GitHub Bot
Created on: 25/Oct/19 19:01
Start Date: 25/Oct/19 19:01
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9844: [BEAM-8372] Support 
both flink_master and flink_master_url parameter
URL: https://github.com/apache/beam/pull/9844#issuecomment-546473642
 
 
   `flink_master_url` is very new; I think we can consolidate on one option for 
now. 
   
   So, if I understand correctly, if `flink_master` is just host:port, there is 
no way to (simultaneously) submit a job to two different clusters (one with 
ssl, one without) without editing the config file in the middle? This seems 
quite unfortunate. Also, the code here does not read the config file. 
   
   What if, instead, we made the `http[s]://` part optional, where it would be 
stripped in Java and assumed to be just `http://` in Python if not specified? 
(Or it could try https and then fall back to http on failure.)
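The optional-scheme idea above could be sketched as follows on the Python side (a hypothetical helper, not the code under review), stripping `http://`/`https://` down to the bare `host:port` that Flink expects, while remembering whether SSL was requested:

```python
def parse_flink_master(flink_master: str):
    """Split an optional http/https scheme off a Flink master address.

    Returns (host_port, use_ssl). A bare 'host:port' is assumed to mean
    plain http, per the proposal above.
    """
    for scheme, use_ssl in (('https://', True), ('http://', False)):
        if flink_master.startswith(scheme):
            return flink_master[len(scheme):], use_ssl
    return flink_master, False
```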
 



Issue Time Tracking
---

Worklog Id: (was: 334327)
Time Spent: 6h 20m  (was: 6h 10m)

> Allow submission of Flink UberJar directly to flink cluster.
> 
>
> Key: BEAM-8372
> URL: https://issues.apache.org/jira/browse/BEAM-8372
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>






[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334319=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334319
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 25/Oct/19 18:49
Start Date: 25/Oct/19 18:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339189834
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,279 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from pandas.io.json import json_normalize
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support < Python 3.5. Work around with json for legacy tests.
+# TODO(BEAM-8288): clean up once Py2 is deprecated from Beam.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+try:
+  from facets_overview.generic_feature_statistics_generator import 
GenericFeatureStatisticsGenerator
+  _facets_gfsg_ready = True
+except ImportError:
+  _facets_gfsg_ready = False
+
+try:
+  from IPython.core.display import HTML
+  from IPython.core.display import Javascript
+  from IPython.core.display import display
+  from IPython.core.display import display_javascript
+  from IPython.core.display import update_display
+  _ipython_ready = True
+except ImportError:
+  _ipython_ready = False
+
+try:
+  from timeloop import Timeloop
+  _tl_ready = True
+except ImportError:
+  _tl_ready = False
+
+# 1-d types that need additional normalization to be compatible with DataFrame.
+_one_dimension_types = (int, float, str, bool, list, tuple)
+
+_DIVE_SCRIPT_TEMPLATE = """
+document.querySelector("#{display_id}").data = {jsonstr};"""
+_DIVE_HTML_TEMPLATE = """
+<script src="https://cdnjs.cloudflare.com/ajax/libs/webcomponentsjs/1.3.3/webcomponents-lite.js"></script>
 
 Review comment:
   Integration tests sounds good. If not with this PR, I will suggest doing 
sooner.
 



Issue Time Tracking
---

Worklog Id: (was: 334319)
Time Spent: 15h  (was: 14h 50m)

> Visualize PCollection with Interactive Beam
> ---
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-py-interactive
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 15h
>  Remaining Estimate: 0h
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with Interactive Beam.
> Say an Interactive Beam pipeline defined as
> p = create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the 
> materialized pcoll's data.
> e.g., visualize(pcoll)





[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334315=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334315
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 25/Oct/19 18:49
Start Date: 25/Oct/19 18:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339191310
 
 

 ##
 File path: sdks/python/tox.ini
 ##
 @@ -189,7 +189,7 @@ commands =
   time {toxinidir}/scripts/run_pylint.sh
 
 [testenv:docs]
-extras = test,gcp,docs
+extras = test,gcp,docs,interactive
 
 Review comment:
   Does docs need it?
 



Issue Time Tracking
---

Worklog Id: (was: 334315)
Time Spent: 14h 40m  (was: 14.5h)






[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334314
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 25/Oct/19 18:49
Start Date: 25/Oct/19 18:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339189616
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py
 ##
 @@ -0,0 +1,279 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module visualizes PCollection data.
+
+For internal use only; no backwards-compatibility guarantees.
+Only works with Python 3.5+.
+"""
+from __future__ import absolute_import
+
+import base64
+import logging
+from datetime import timedelta
+
+from pandas.io.json import json_normalize
+
+from apache_beam import pvalue
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive import pipeline_instrument as instr
+
+# jsons doesn't support < Python 3.5. Work around with json for legacy tests.
+# TODO(BEAM-8288): clean up once Py2 is deprecated from Beam.
+try:
+  import jsons
+  _pv_jsons_load = jsons.load
+  _pv_jsons_dump = jsons.dump
+except ImportError:
+  import json
+  _pv_jsons_load = json.load
+  _pv_jsons_dump = json.dump
+
+try:
+  from facets_overview.generic_feature_statistics_generator import GenericFeatureStatisticsGenerator
+  _facets_gfsg_ready = True
+except ImportError:
+  _facets_gfsg_ready = False
 
 Review comment:
   I would suggest the first option: require installation of the full 
[interactive] extra package.
 



Issue Time Tracking
---

Worklog Id: (was: 334314)
Time Spent: 14h 40m  (was: 14.5h)






[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334316
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 25/Oct/19 18:49
Start Date: 25/Oct/19 18:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339191223
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -166,6 +166,14 @@ def get_version():
 'google-cloud-bigtable>=0.31.1,<1.1.0',
 ]
 
+INTERACTIVE_BEAM = [
+'facets-overview>=1.0.0,<2',
 
 Review comment:
   Nevermind. This looks good.
 
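For context, a list like the `INTERACTIVE_BEAM` one in the diff above is typically wired into setuptools' `extras_require` mapping so that `pip install apache-beam[interactive]` pulls in the extra pins. The sketch below is illustrative, not the real `sdks/python/setup.py`: only the `facets-overview` pin is visible in the diff, and the mapping shown here assumes that shape.

```python
# Sketch of wiring an optional-feature dependency list into setuptools.
# Only the 'facets-overview' pin appears in the diff under review; the
# rest of the real list is elided there, so it is omitted here too.
INTERACTIVE_BEAM = [
    'facets-overview>=1.0.0,<2',
]

# setuptools.setup(..., extras_require=extras_require) would consume this,
# making `pip install apache-beam[interactive]` install the extra pins.
extras_require = {
    'interactive': INTERACTIVE_BEAM,
}
```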



Issue Time Tracking
---

Worklog Id: (was: 334316)






[jira] [Work logged] (BEAM-7926) Visualize PCollection with Interactive Beam

2019-10-25 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?focusedWorklogId=334317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-334317
 ]

ASF GitHub Bot logged work on BEAM-7926:


Author: ASF GitHub Bot
Created on: 25/Oct/19 18:49
Start Date: 25/Oct/19 18:49
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9741: [BEAM-7926] 
Visualize PCollection
URL: https://github.com/apache/beam/pull/9741#discussion_r339190776
 
 

 ##
 File path: 
sdks/python/apache_beam/runners/interactive/display/pcoll_visualization_test.py
 ##
 @@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Tests for apache_beam.runners.interactive.display.pcoll_visualization."""
+from __future__ import absolute_import
+
+import sys
+import time
+import unittest
+
+import apache_beam as beam  # pylint: disable=ungrouped-imports
+import timeloop
+from apache_beam.runners import runner
+from apache_beam.runners.interactive import interactive_environment as ie
+from apache_beam.runners.interactive.display import pcoll_visualization as pv
+
+# Work around nose tests using Python2 without unittest.mock module.
+try:
+  from unittest.mock import patch
+except ImportError:
+  from mock import patch
+
+
+class PCollVisualizationTest(unittest.TestCase):
+
+  def setUp(self):
+self._p = beam.Pipeline()
+# pylint: disable=range-builtin-not-iterating
+self._pcoll = self._p | 'Create' >> beam.Create(range(1000))
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  def test_raise_error_for_non_pcoll_input(self):
+class Foo(object):
+  pass
+
+with self.assertRaises(AssertionError) as ctx:
+  pv.PCollVisualization(Foo())
+  self.assertTrue('pcoll should be apache_beam.pvalue.PCollection' in
+  ctx.exception)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  def test_pcoll_visualization_generate_unique_display_id(self):
+pv_1 = pv.PCollVisualization(self._pcoll)
+pv_2 = pv.PCollVisualization(self._pcoll)
+self.assertNotEqual(pv_1._dive_display_id, pv_2._dive_display_id)
+self.assertNotEqual(pv_1._overview_display_id, pv_2._overview_display_id)
+self.assertNotEqual(pv_1._df_display_id, pv_2._df_display_id)
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollVisualization._to_element_list', lambda x: [1, 2, 3])
+  def test_one_shot_visualization_not_return_handle(self):
+self.assertIsNone(pv.visualize(self._pcoll))
+
+  def _mock_to_element_list(self):
+yield [1, 2, 3]
+yield [1, 2, 3, 4]
+yield [1, 2, 3, 4, 5]
+yield [1, 2, 3, 4, 5, 6]
+yield [1, 2, 3, 4, 5, 6, 7]
+yield [1, 2, 3, 4, 5, 6, 7, 8]
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollVisualization._to_element_list', _mock_to_element_list)
+  def test_dynamic_plotting_return_handle(self):
+h = pv.visualize(self._pcoll, dynamic_plotting_interval=1)
+self.assertIsInstance(h, timeloop.Timeloop)
+h.stop()
+
+  @unittest.skipIf(sys.version_info < (3, 5, 3),
+   'PCollVisualization is not supported on Python 2.')
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollVisualization._to_element_list', _mock_to_element_list)
+  @patch('apache_beam.runners.interactive.display.pcoll_visualization'
+ '.PCollVisualization.display_facets')
+  def test_dynamic_plotting_update_same_display(self,
+mocked_display_facets):
+fake_pipeline_result = runner.PipelineResult(runner.PipelineState.RUNNING)
+ie.current_env().set_pipeline_result(self._p, fake_pipeline_result)
+# Starts async dynamic plotting that never 
