[jira] [Commented] (BEAM-2535) Allow explicit output time independent of firing specification for all timers

2019-08-07 Thread Muhammad Talha Jamil (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902701#comment-16902701
 ] 

Muhammad Talha Jamil commented on BEAM-2535:


We were looking into BEAM-2535. It looks like an old ticket and already has a 
stale PR. Is there any change of context on this ticket, or should we carry on 
from the existing PR?

> Allow explicit output time independent of firing specification for all timers
> -
>
> Key: BEAM-2535
> URL: https://issues.apache.org/jira/browse/BEAM-2535
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model, sdk-java-core
>Reporter: Kenneth Knowles
>Assignee: Shehzaad Nakhoda
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Today, we have insufficient control over the event time timestamp of elements 
> output from a timer callback.
> 1. For an event time timer, it is the timestamp of the timer itself.
>  2. For a processing time timer, it is the current input watermark at the 
> time of processing.
> But for both of these, we may want to reserve the right to output a 
> particular time, aka set a "watermark hold".
> A naive implementation of a {{TimerWithWatermarkHold}} would work for making 
> sure output is not droppable, but does not fully explain window expiration 
> and late data/timer dropping.
> In the natural interpretation of a timer as a feedback loop on a transform, 
> timers should be viewed as another channel of input, with a watermark, and 
> items on that channel _all need event time timestamps even if they are 
> delivered according to a different time domain_.
> I propose that the specification for when a timer should fire should be 
> separated (with nice defaults) from the specification of the event time of 
> resulting outputs. These timestamps will determine a side channel with a new 
> "timer watermark" that constrains the output watermark.
>  - We still need to fire event time timers according to the input watermark, 
> so that event time timers fire.
>  - Late data dropping and window expiration will be in terms of the minimum 
> of the input watermark and the timer watermark. In this way, whenever a timer 
> is set, the window is not going to be garbage collected.
>  - We will need to make sure we have a way to "wake up" a window once it is 
> expired; this may be as simple as exhausting the timer channel as soon as the 
> input watermark indicates expiration of a window
> This is mostly aimed at end-user timers in a stateful+timely {{DoFn}}. It 
> seems reasonable to use timers as an implementation detail (e.g. in 
> runners-core utilities) without wanting any of this additional machinery. For 
> example, if there is no possibility of output from the timer callback.
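
For illustration, a minimal Python sketch of the current stateful-DoFn timer API (the ticket itself targets the Beam model and the Java SDK): in today's callback there is no way to declare an output timestamp separately from the firing specification, which is exactly what this proposal adds.

{code}
import apache_beam as beam
from apache_beam.transforms.timeutil import TimeDomain
from apache_beam.transforms.userstate import TimerSpec, on_timer


class EmitOnTimer(beam.DoFn):
  """Sketch only; requires keyed input, since timers are per key and window."""
  EMIT = TimerSpec('emit', TimeDomain.WATERMARK)

  def process(self, kv,
              ts=beam.DoFn.TimestampParam,
              emit=beam.DoFn.TimerParam(EMIT)):
    # Schedule an event-time timer 60s past the element's timestamp.
    emit.set(ts + 60)

  @on_timer(EMIT)
  def on_emit(self):
    # The event-time timestamp of this output is currently tied to the firing
    # specification (case 1 above); the proposal would allow setting it
    # independently, e.g. holding it back to an earlier timestamp.
    yield 'fired'
{code}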



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290991=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290991
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 08/Aug/19 04:37
Start Date: 08/Aug/19 04:37
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519359891
 
 
   R: @udim 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290991)
Time Spent: 3h 20m  (was: 3h 10m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console
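
For context, a minimal sketch of how view_as() is normally used; StandardOptions is purely illustrative here and not tied to the failing line above.

{code}
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# view_as() reinterprets the flags already parsed by a PipelineOptions object
# through another options subclass; the argument must be the class itself.
opts = PipelineOptions(['--streaming'])
print(opts.view_as(StandardOptions).streaming)  # True
{code}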



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-6114) SQL join selection should be done in planner, not in expansion to PTransform

2019-08-07 Thread Rahul Patwari (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902628#comment-16902628
 ] 

Rahul Patwari commented on BEAM-6114:
-

[~amaliujia]

Sure Rui.

I am exploring RelOptRule and ConverterRule. I created a rule for 
SideInputJoin that works when the inputs to the Join are BeamIOSourceRel. I am 
trying to figure out the rule for the case where one of the inputs to the 
SideInputJoin is itself a Join. 

> SQL join selection should be done in planner, not in expansion to PTransform
> 
>
> Key: BEAM-6114
> URL: https://issues.apache.org/jira/browse/BEAM-6114
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Rahul Patwari
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently Beam SQL joins all go through a single physical operator which has 
> a single PTransform that does all join algorithms based on properties of its 
> input PCollections as well as the relational algebra.
> A first step is to make the needed information part of the relational 
> algebra, so it can choose a PTransform based on that, and the PTransforms can 
> be simpler.
> Second step is to have separate (physical) relational operators for different 
> join algorithms.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290959=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290959
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 08/Aug/19 02:49
Start Date: 08/Aug/19 02:49
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519312722
 
 
   run python 2 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290959)
Time Spent: 3h 10m  (was: 3h)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290958=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290958
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 08/Aug/19 02:49
Start Date: 08/Aug/19 02:49
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519341956
 
 
   run python 2 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290958)
Time Spent: 3h  (was: 2h 50m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7927) Add ability to get the list of submitted jobs from gRPC JobService

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7927?focusedWorklogId=290919=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290919
 ]

ASF GitHub Bot logged work on BEAM-7927:


Author: ASF GitHub Bot
Created on: 08/Aug/19 01:33
Start Date: 08/Aug/19 01:33
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9293: [BEAM-7927] 
Add JobService.GetJobs to the job API
URL: https://github.com/apache/beam/pull/9293#discussion_r311824689
 
 

 ##
 File path: model/job-management/src/main/proto/beam_job_api.proto
 ##
 @@ -123,6 +126,22 @@ message CancelJobResponse {
   JobState.Enum state = 1; // (required)
 }
 
+// A subset of info provided by ProvisionApi.ProvisionInfo
+message JobInfo {
+  string job_id = 1; // (required)
+  string job_name = 2; // (required)
+  google.protobuf.Struct pipeline_options = 3; // (required)
+  JobState.Enum state = 4; // (required)
 
 Review comment:
   I debated whether or not to include the state here.  
   
   Reason for including it:  It's cheap to get, and it's the kind of info that 
any web client would want to provide when displaying an overview of jobs (see 
flink web ui).  Providing it obviates the need for these clients to issue a 
followup query to get the state of each job.
   
   Reason for excluding it: It's the only value here that is not an intrinsic, 
immutable property of a job.  i.e. it changes over time, and so conceptually, 
including it means that the JobInfo becomes invalid over time.  Removing it 
would make it a close fit for `beam.runners.fnexecution.provisioning.JobInfo`.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290919)
Time Spent: 0.5h  (was: 20m)

> Add ability to get the list of submitted jobs from gRPC JobService
> --
>
> Key: BEAM-7927
> URL: https://issues.apache.org/jira/browse/BEAM-7927
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As a developer building a client for monitoring running jobs via the 
> JobService, I want the ability to get a list of jobs – particularly their job 
> ids – so that I can use this as an entry point for getting other information 
> about a job already offered by the JobService, such as the pipeline 
> definition, a stream of status changes, etc. 
> Currently, the JobService is only useful if you already have a valid job id. 
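
For illustration, a rough sketch (assumed endpoint and flow, not from the issue) of how a client talks to the JobService today using the Python portability stubs, and where a listing call would slot in:

{code}
import grpc

from apache_beam.portability.api import beam_job_api_pb2, beam_job_api_pb2_grpc

# Hypothetical job endpoint; a real deployment would supply its own address.
channel = grpc.insecure_channel('localhost:8099')
stub = beam_job_api_pb2_grpc.JobServiceStub(channel)

# Today, every useful call needs a job id you already know:
state = stub.GetState(beam_job_api_pb2.GetJobStateRequest(job_id='some-job-id'))

# The linked PR adds a GetJobs call to fill this gap; once available, it would
# be the natural entry point, e.g. jobs = stub.GetJobs(...), with the exact
# request/response message names defined by that PR.
{code}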



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7927) Add ability to get the list of submitted jobs from gRPC JobService

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7927?focusedWorklogId=290914=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290914
 ]

ASF GitHub Bot logged work on BEAM-7927:


Author: ASF GitHub Bot
Created on: 08/Aug/19 01:27
Start Date: 08/Aug/19 01:27
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9293: [BEAM-7927] Add 
JobService.GetJobs to the job API
URL: https://github.com/apache/beam/pull/9293#issuecomment-519326670
 
 
   R: @herohde
   R: @lukecwik
   R: @angoenka
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290914)
Time Spent: 20m  (was: 10m)

> Add ability to get the list of submitted jobs from gRPC JobService
> --
>
> Key: BEAM-7927
> URL: https://issues.apache.org/jira/browse/BEAM-7927
> Project: Beam
>  Issue Type: New Feature
>  Components: beam-model
>Reporter: Chad Dombrova
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> As a developer building a client for monitoring running jobs via the 
> JobService, I want the ability to get a list of jobs – particularly their job 
> ids – so that I can use this as an entry point for getting other information 
> about a job already offered by the JobService, such as the pipeline 
> definition, a stream of status changes, etc. 
> Currently, the JobService is only useful if you already have a valid job id. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7885) DoFn.setup() doesn't run for streaming jobs.

2019-08-07 Thread Ahmet Altay (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902580#comment-16902580
 ] 

Ahmet Altay commented on BEAM-7885:
---

[~nikenano] could you add a log statement to your setup() and check whether 
the Dataflow logs show it or not?

The related PR (https://github.com/apache/beam/pull/7994) added a ValidatesRunner 
test (DoFnLifecycleTest). I checked a recent run 
(https://builds.apache.org/view/A-D/view/Beam/view/PostCommit/job/beam_PostCommit_Py_VR_Dataflow/4226/consoleFull)
 and verified that it worked for both batch and streaming. This is the streaming one 
(https://console.cloud.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2019-08-06_17_48_28-1792094113770579496?project=apache-beam-testing)

(For everyone's benefit, there is additional context on the dev@ list 
https://lists.apache.org/thread.html/ce2a149e978bde252da8a1cc1c5257465a0e456e56453824594908a3@%3Cdev.beam.apache.org%3E)
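
A minimal sketch of what that check could look like (PredictDoFn and load_model are illustrative names, not taken from the linked repository):

{code}
import logging

import apache_beam as beam


class PredictDoFn(beam.DoFn):
  def setup(self):
    # If setup() really runs, this line should appear in the Dataflow worker logs.
    logging.info('PredictDoFn.setup() called, loading model')
    self._model = load_model()  # hypothetical loader

  def process(self, element):
    yield self._model.predict([element])
{code}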

> DoFn.setup() doesn't run for streaming jobs. 
> ---
>
> Key: BEAM-7885
> URL: https://issues.apache.org/jira/browse/BEAM-7885
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Affects Versions: 2.14.0
> Environment: Python
>Reporter: niklas Hansson
>Priority: Minor
>
> Version 2.14.0 of the Python SDK introduced setup and teardown for DoFn, 
> documented as: "Called to prepare an instance for processing bundles of 
> elements. This is a good place to initialize transient in-memory resources, 
> such as network connections."
> However, when trying to use it in an unbounded job (Pub/Sub source), it seems 
> like DoFn.setup() is never called and the resources are never 
> initialized. Instead I get:
>  
> AttributeError: 'NoneType' object has no attribute 'predict' [while running 
> 'transform the data']
> My source code: [https://github.com/NikeNano/DataflowSklearnStreaming]
>  
> I am happy to contribute example code for how to use setup as soon as I 
> get it running :)  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7927) Add ability to get the list of submitted jobs from gRPC JobService

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7927?focusedWorklogId=290913=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290913
 ]

ASF GitHub Bot logged work on BEAM-7927:


Author: ASF GitHub Bot
Created on: 08/Aug/19 01:25
Start Date: 08/Aug/19 01:25
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9293: [BEAM-7927] 
Add JobService.GetJobs to the job API
URL: https://github.com/apache/beam/pull/9293
 
 
   As a developer building a client for monitoring running jobs via the 
JobService, I want the ability to get a list of submitted jobs – particularly 
their job ids – so that I can use this as an entry point for getting other 
information about a job that is already offered by the JobService, such as the 
pipeline definition, a stream of status changes, etc. 
   
   Currently, the JobService is only useful if you already have a valid job id. 
   
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 

[jira] [Created] (BEAM-7927) Add ability to get the list of submitted jobs from gRPC JobService

2019-08-07 Thread Chad Dombrova (JIRA)
Chad Dombrova created BEAM-7927:
---

 Summary: Add ability to get the list of submitted jobs from gRPC 
JobService
 Key: BEAM-7927
 URL: https://issues.apache.org/jira/browse/BEAM-7927
 Project: Beam
  Issue Type: New Feature
  Components: beam-model
Reporter: Chad Dombrova


As a developer building a client for monitoring running jobs via the 
JobService, I want the ability to get a list of jobs – particularly their job 
ids – so that I can use this as an entry point for getting other information 
about a job already offered by the JobService, such as the pipeline definition, 
a stream of status changes, etc. 

Currently, the JobService is only useful if you already have a valid job id. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=290912=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290912
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 08/Aug/19 01:14
Start Date: 08/Aug/19 01:14
Worklog Time Spent: 10m 
  Work Description: chadrik commented on pull request #9179: [BEAM-7060] 
Use typing in type decorators of core.py
URL: https://github.com/apache/beam/pull/9179#discussion_r311821796
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -89,9 +90,9 @@
 ]
 
 # Type variables
-T = typehints.TypeVariable('T')
-K = typehints.TypeVariable('K')
-V = typehints.TypeVariable('V')
+T = typing.TypeVar('T')
 
 Review comment:
   Looks good to me!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290912)
Time Spent: 10h 10m  (was: 10h)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> The existing [typehints implementation in 
> Beam|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/] 
> heavily relies on internal details of the CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6; see for 
> example https://issues.apache.org/jira/browse/BEAM-6877, which makes 
> typehints support unusable on Python 3.6 for now. The [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate the in-house typehints implementation.
> - Continue to support the in-house implementation, which at this point is stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type annotations, like Pytype, Mypy, or PyAnnotate.
> With regard to this decision we also need to plan immediate next steps to unblock 
> adoption of Beam for Python 3.6+ users. One potential option may be to have 
> the Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].
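
For illustration, the kind of Py3-native annotation such a design would need to understand, regardless of which option above is chosen:

{code}
import typing

import apache_beam as beam


class SplitWords(beam.DoFn):
  # Native Python 3 annotations; the open question is how Beam's type-hint
  # machinery should interpret these instead of the legacy decorator-based hints.
  def process(self, line: str) -> typing.Iterable[str]:
    return line.split()
{code}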



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7760) Interactive Beam Caching PCollections bound to user defined vars in notebook

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7760?focusedWorklogId=290907=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290907
 ]

ASF GitHub Bot logged work on BEAM-7760:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:55
Start Date: 08/Aug/19 00:55
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9278: [BEAM-7760] Added 
iBeam module
URL: https://github.com/apache/beam/pull/9278#issuecomment-519320986
 
 
   Updated the PR with a second commit, and also sent out an email describing the 
interactive Beam work for this PR and the future plan.
   
   We should expect many PRs in the near future rewriting interactive Beam 
(how the cache is used, how DOT renders, how streaming behaves differently from 
batch interactively), since we changed how the underlying magic caching works.
   
   The other main piece of work is PCollection data visualization.
   
   However, except for the interactive_beam module, nothing else should concern 
either Beam users or developers. The change is within the interactive Beam 
scope and any magic implemented is implicit. We will update the README 
under the interactive package as the work evolves. 
   
   PTAL
   R: @aaltay 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290907)
Time Spent: 2h 10m  (was: 2h)

> Interactive Beam Caching PCollections bound to user defined vars in notebook
> 
>
> Key: BEAM-7760
> URL: https://issues.apache.org/jira/browse/BEAM-7760
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-python
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Cache only PCollections bound to user-defined variables in a pipeline when 
> running the pipeline with the interactive runner in Jupyter notebooks.
> [Interactive 
> Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]
>  has been caching and using caches of "leaf" PCollections for interactive 
> execution in Jupyter notebooks.
> The interactive execution is currently supported so that when appending new 
> transforms to an existing pipeline for a new run, the already-executed part of 
> the pipeline doesn't need to be re-executed. 
> A PCollection is a "leaf" when it is never used as input to any PTransform in 
> the pipeline.
> The problem with building caches and the pipeline to execute around "leaf" 
> PCollections is that when a PCollection is consumed by a sink with no output, 
> the pipeline built for execution will miss the subgraph generating and 
> consuming that PCollection.
> For example, "ReadFromPubSub --> WriteToPubSub" will result in an empty 
> pipeline.
> Caching the PCollections bound to user-defined variables, and replacing 
> transforms with the source and sink of those caches, resolves the pipeline to 
> execute properly under the interactive execution scenario. Also, a cached 
> PCollection can now be traced back to user code and can be used for data 
> visualization if the user wants to do it.
> E.g.,
> {code:java}
> // ...
> p = beam.Pipeline(interactive_runner.InteractiveRunner(),
>   options=pipeline_options)
> messages = p | "Read" >> beam.io.ReadFromPubSub(subscription='...')
> messages | "Write" >> beam.io.WriteToPubSub(topic_path)
> result = p.run()
> // ...
> visualize(messages){code}
>  The interactive runner automatically figures out that PCollection
> {code:java}
> messages{code}
> created by
> {code:java}
> p | "Read" >> beam.io.ReadFromPubSub(subscription='...'){code}
> should be cached and reused if the notebook user appends more transforms.
>  And once the pipeline gets executed, the user could use any 
> visualize(PCollection) module to visualize the data statically (batch) or 
> dynamically (stream)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7820) Add hot key detection to Dataflow Runner

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7820?focusedWorklogId=290906=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290906
 ]

ASF GitHub Bot logged work on BEAM-7820:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:50
Start Date: 08/Aug/19 00:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9270: [BEAM-7820] 
HotKeyDetection
URL: https://github.com/apache/beam/pull/9270#discussion_r311817499
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##
 @@ -1020,6 +1026,17 @@ private void scheduleWorkItem(
 Preconditions.checkState(
 outputDataWatermark == null || 
!outputDataWatermark.isAfter(inputDataWatermark));
 SdkWorkerHarness worker = 
sdkHarnessRegistry.getAvailableWorkerAndAssignWork();
+
+if (workItem.hasHotKeyInfo()) {
+  Windmill.HotKeyInfo hotKeyInfo = workItem.getHotKeyInfo();
+  Duration hotKeyAge = Duration.millis(hotKeyInfo.getHotKeyAgeUsec() / 
1000);
+
+  // The MapTask instruction is ordered by dependencies, such that the 
first element is
+  // always going to be the shuffle task.
+  String stepName = 
computationState.getMapTask().getInstructions().get(0).getName();
+  hotKeyLogger.logHotKeyDetection(stepName, hotKeyAge);
 
 Review comment:
   Is the `stepName` the correct step name that we want? (e.g. not `s2`, nor 
`GroupByKey/ReadFromShuffle`, but rather, the user-known `GroupByKey` step?)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290906)
Time Spent: 3h  (was: 2h 50m)

> Add hot key detection to Dataflow Runner
> 
>
> Key: BEAM-7820
> URL: https://issues.apache.org/jira/browse/BEAM-7820
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> This tracks adding hot key detection in the Dataflow Runner. 
> There are times when a user's pipeline spuriously slows down due to hot keys. 
> During these times, users are unable to see under the hood what the 
> pipeline is doing. This adds hot key detection to show users when their 
> pipeline has a hot key.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7820) Add hot key detection to Dataflow Runner

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7820?focusedWorklogId=290905=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290905
 ]

ASF GitHub Bot logged work on BEAM-7820:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:50
Start Date: 08/Aug/19 00:50
Worklog Time Spent: 10m 
  Work Description: pabloem commented on pull request #9270: [BEAM-7820] 
HotKeyDetection
URL: https://github.com/apache/beam/pull/9270#discussion_r311817803
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/HotKeyLogger.java
 ##
 @@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.dataflow.worker;
+
+import com.google.api.client.util.Clock;
+import java.text.MessageFormat;
+import org.apache.beam.runners.dataflow.util.TimeUtil;
+import org.joda.time.Duration;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class HotKeyLogger {
+  Logger LOG = LoggerFactory.getLogger(HotKeyLogger.class);
+
+  /** Clock used to either provide real system time or mocked to virtualize 
time for testing. */
+  private Clock clock = Clock.SYSTEM;
+
+  /**
+   * The previous time the HotKeyDetection was logged. This is used to 
throttle logging to every 5
+   * minutes.
+   */
+  private long prevHotKeyDetectionLogMs = 0;
+
+  /** Throttles logging the detection to every loggingPeriod */
+  private final Duration loggingPeriod = Duration.standardMinutes(5);
+
+  HotKeyLogger() {}
+
+  HotKeyLogger(Clock clock) {
+this.clock = clock;
+  }
+
+  /** Logs a detection of the hot key every 5 minutes. */
+  public void logHotKeyDetection(String userStepName, Duration hotKeyAge) {
+if (isThrottled()) {
+  return;
+}
+LOG.warn(getHotKeyMessage(userStepName, 
TimeUtil.toCloudDuration(hotKeyAge)));
 
 Review comment:
   Is warning the right log level for this? It may be "normal", and acceptable 
to have a hot key - and when users see a warning they'll think something bad is 
happening. What do you think?
   
   The same thing has happened with lull logging, where users think something's 
wrong - though that's not the case.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290905)
Time Spent: 2h 50m  (was: 2h 40m)

> Add hot key detection to Dataflow Runner
> 
>
> Key: BEAM-7820
> URL: https://issues.apache.org/jira/browse/BEAM-7820
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This tracks adding hot key detection in the Dataflow Runner. 
> There are times when a user's pipeline spuriously slows down due to hot keys. 
> During these times, users are unable to see under the hood what the 
> pipeline is doing. This adds hot key detection to show users when their 
> pipeline has a hot key.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7760) Interactive Beam Caching PCollections bound to user defined vars in notebook

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7760?focusedWorklogId=290904=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290904
 ]

ASF GitHub Bot logged work on BEAM-7760:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:44
Start Date: 08/Aug/19 00:44
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on pull request #9278: [BEAM-7760] 
Added iBeam module
URL: https://github.com/apache/beam/pull/9278#discussion_r311687253
 
 

 ##
 File path: sdks/python/apache_beam/runners/interactive/interactive_beam.py
 ##
 @@ -0,0 +1,199 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""Module of the current iBeam (interactive Beam) environment.
+
+The purpose of the module is to reduce the learning curve for iBeam users,
+provide a single place for imports, and add syntactic sugar for all iBeam
+components. It gives users the ability to manipulate the existing environment
+for interactive Beam, TODO(ningk) run an interactive pipeline on a selected
+runner as a normal pipeline, create a pipeline with the interactive runner, and
+visualize PCollections as bounded datasets.
+
+Note: iBeam works the same as normal Beam with the DirectRunner when not in an
+interactive environment such as JupyterLab or Jupyter Notebook. You can also
+run a pipeline created by iBeam as a normal Beam pipeline via run_pipeline()
+with the desired runner.
+"""
+
+import importlib
+
+import apache_beam as beam
+from apache_beam.runners.interactive import interactive_runner
+
+_ibeam_env = None
+
+
+def watch(watchable):
+  """Watches a watchable so that iBeam can understand your pipeline.
+
+  If you write your Beam pipeline in a notebook or directly in the __main__
+  module, you don't have to instruct iBeam, since the __main__ module is always
+  watched by default. However, if your Beam pipeline is defined in some module
+  other than __main__, e.g., inside a class method or a unit test, you can
+  watch() the scope to instruct iBeam to apply its magic to your pipeline when
+  running the pipeline interactively.
+
+For example:
+
+class Foo(object)
+  def build_pipeline(self):
+p = create_pipeline()
+init_pcoll = p |  'Init Create' >> beam.Create(range(10))
+watch(locals())
+return p
+Foo().build_pipeline().run()
+
+iBeam will cache init_pcoll for the first run. You can use:
+
+visualize(init_pcoll)
+
+to visualize data from init_pcoll once the pipeline is executed. And if you
+make a change to the original pipeline by adding:
+
+squares = init_pcoll | 'Square' >> beam.Map(lambda x: x*x)
+
+When you re-run the pipeline from the line you just added, squares will
+use the cached init_pcoll data so you can have an interactive experience.
+
+  Currently the implementation mainly watches for PCollection variables defined
+  in user code. A watchable can be a dictionary of variable metadata such as
+  locals(), a str name of a module, a module object or an instance of a class.
+  The variable can come from any scope even local variables in a method of a
+  class defined in a module.
+
+Below are all valid:
+
+watch(__main__)  # if import __main__ is already invoked
+watch('__main__')  # does not require invoking import __main__ beforehand
+watch(self)  # inside a class
+watch(SomeInstance())  # an instance of a class
+watch(locals())  # inside a function, watching local variables within
+  """
+  current_env().watch(watchable)
+
+
+def create_pipeline(runner=None, options=None, argv=None):
+  """Creates a pipeline with interactive runner by default.
+
+  You can use run_pipeline() provided within this module to execute the iBeam
+  pipeline with other runners.
+
+  Args:
+runner (~apache_beam.runners.runner.PipelineRunner): An object of
+  type :class:`~apache_beam.runners.runner.PipelineRunner` that will be
+  used to execute the pipeline. For registered runners, the runner name
+  can be specified, otherwise a runner object must be supplied.
+options (~apache_beam.options.pipeline_options.PipelineOptions):
+  A configured
+  

[jira] [Created] (BEAM-7926) Visualize PCollection with iBeam

2019-08-07 Thread Ning Kang (JIRA)
Ning Kang created BEAM-7926:
---

 Summary: Visualize PCollection with iBeam
 Key: BEAM-7926
 URL: https://issues.apache.org/jira/browse/BEAM-7926
 Project: Beam
  Issue Type: New Feature
  Components: examples-python
Reporter: Ning Kang
Assignee: Ning Kang


Support auto plotting / charting of materialized data of a given PCollection 
with interactive Beam.

Say an iBeam pipeline is defined as

p = ibeam.create_pipeline()

pcoll = p | 'Transform' >> transform()

The user can call a single function and get auto-magical charting of the data as 
materialized in pcoll.

e.g., visualize(pcoll)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7926) Visualize PCollection with iBeam

2019-08-07 Thread Ning Kang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ning Kang updated BEAM-7926:

Description: 
Support auto plotting / charting of materialized data of a given PCollection 
with interactive Beam.

Say an iBeam pipeline is defined as

p = ibeam.create_pipeline()

pcoll = p | 'Transform' >> transform()

The user can call a single function and get auto-magical charting of the data as 
materialized in pcoll.

e.g., ibeam.visualize(pcoll)

  was:
Support auto plotting / charting of materialized data of a given PCollection 
with interactive Beam.

Say an iBeam pipeline defined as

p = ibeam.create_pipeline()

pcoll = p | 'Transform' >> transform()

The use can call a single function and get auto-magical charting of the data as 
materialized pcoll.

e.g., visualize(pcoll)


> Visualize PCollection with iBeam
> 
>
> Key: BEAM-7926
> URL: https://issues.apache.org/jira/browse/BEAM-7926
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-python
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>
> Support auto plotting / charting of materialized data of a given PCollection 
> with interactive Beam.
> Say an iBeam pipeline is defined as
> p = ibeam.create_pipeline()
> pcoll = p | 'Transform' >> transform()
> The user can call a single function and get auto-magical charting of the data 
> as materialized in pcoll.
> e.g., ibeam.visualize(pcoll)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=290902=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290902
 ]

ASF GitHub Bot logged work on BEAM-7495:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:32
Start Date: 08/Aug/19 00:32
Worklog Time Spent: 10m 
  Work Description: aryann commented on issue #9156: [BEAM-7495] Add 
fine-grained progress reporting
URL: https://github.com/apache/beam/pull/9156#issuecomment-519317006
 
 
   @chamikaramj thank you for your patience. I've fixed the bug that I spotted 
related to splits. There is a unit test that covers this with a comment 
explaining the reasoning behind the numbers. Please take another look.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290902)
Time Spent: 11h 40m  (was: 11.5h)
Remaining Estimate: 492h 20m  (was: 492.5h)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> ---
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Aryan Naraghi
>Assignee: Aryan Naraghi
>Priority: Major
>   Original Estimate: 504h
>  Time Spent: 11h 40m
>  Remaining Estimate: 492h 20m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not support any of the facilities on the source for Dataflow to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-08-07 Thread Rui Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902560#comment-16902560
 ] 

Rui Wang commented on BEAM-7049:


Actually, I think this JIRA's title was misleading: multiple UNIONs work, but 
not efficiently.

The current plan will contain two binary Unions, and each Union will have a shuffle 
(due to GroupByKey).   


This JIRA wants to implement a way to make sure all inputs are in the same 
BeamUnionRel, so that a single shuffle can union all inputs.



> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>
> BeamUnionRel assumes exactly two inputs and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If 
> BeamUnionRel can handle multiple inputs, we will have only one shuffle.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-08-07 Thread Rui Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Wang updated BEAM-7049:
---
Summary: Merge multiple input to one BeamUnionRel  (was: BeamUnionRel 
should work on mutiple input )

> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>
> BeamUnionRel assumes exactly two inputs and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If 
> BeamUnionRel can handle multiple inputs, we will have only one shuffle.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (BEAM-7049) BeamUnionRel should work on multiple input

2019-08-07 Thread Rui Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902556#comment-16902556
 ] 

Rui Wang edited comment on BEAM-7049 at 8/8/19 12:26 AM:
-

I see. Can you confirm how many instances of 
[BeamUnionRel|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java]
  were created for your UNION query in your test?

You can set a breakpoint at: 
[CalciteQueryPlanner.java#L167|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteQueryPlanner.java#L167]
 to observe the generated physical plan (check the structure of beamRelNode). 


was (Author: amaliujia):
I see. Can you confirm how many instances of [BeamUnionRel 
title|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java]
  created for your UNION query in your test?

You can set a breakpoint at: 
[CalciteQueryPlanner.java#L167|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteQueryPlanner.java#L167]
 to observe generated physical plan (check the structure of beamRelNode). 

> BeamUnionRel should work on multiple input 
> --
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>
> BeamUnionRel assumes exactly two inputs and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If 
> BeamUnionRel can handle multiple inputs, we will have only one shuffle.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7049) BeamUnionRel should work on multiple input

2019-08-07 Thread Rui Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902556#comment-16902556
 ] 

Rui Wang commented on BEAM-7049:


I see. Can you confirm how many instances of [BeamUnionRel 
title|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java]
  were created for your UNION query in your test?

You can set a breakpoint at: 
[CalciteQueryPlanner.java#L167|https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/CalciteQueryPlanner.java#L167]
 to observe the generated physical plan (check the structure of beamRelNode). 

> BeamUnionRel should work on multiple input 
> --
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>
> BeamUnionRel assumes exactly two inputs and rejects more. So `a UNION b UNION c` 
> will have to be created as UNION(a, UNION(b, c)) and have two shuffles. If 
> BeamUnionRel can handle multiple inputs, we will have only one shuffle.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290888=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290888
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519312722
 
 
   run python 2 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290888)
Time Spent: 2h 40m  (was: 2.5h)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290895&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290895
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311179927
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = 
"beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
+// instance of Schema from schema.proto.
+//
+// A row is encoded as the concatenation of:
+//   - The number of attributes in the schema, encoded with
+// beam:coder:varint:v1. This is useful for detecting supported schema
+// changes (column additions/deletions).
+//   - A packed bitset indicating null fields (a 1 indicating a null)
 
 Review comment:
   Can we be a bit more specific here? Something like "A byte array 
representing a packed bitset ...". And mention that the "padding bits" are 0's?
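
   For illustration, a minimal sketch of the bitset layout being asked about (a hypothetical helper, not the actual Beam implementation; the LSB-first bit order and zero padding bits shown here are assumptions):

{code}
def null_bitset(values):
    """Pack one bit per field (1 = null) into a byte array, LSB-first per byte.

    Returns an empty byte string when no field is null, matching the
    "empty byte array" case in the proto comment above; padding bits stay 0.
    """
    if not any(v is None for v in values):
        return b''
    bits = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is None:
            bits[i // 8] |= 1 << (i % 8)
    return bytes(bits)

# Second of three fields is null -> a single byte 0b00000010.
assert null_bitset([1, 2, 3]) == b''
assert null_bitset([1, None, 3]) == b'\x02'
{code}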
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290895)
Time Spent: 1.5h  (was: 1h 20m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290894&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290894
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311292446
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java
 ##
 @@ -74,6 +78,14 @@
   FullWindowedValueCoder.of(
   IterableCoder.of(VarLongCoder.of()), 
IntervalWindowCoder.of()))
   .add(DoubleCoder.of())
+  .add(
 
 Review comment:
   It seems that the tests here won't be able to compare `RowCoder`'s equality 
properly. The root of the problem is not in this file though. It is because 
`RowCoder` currently does not define a correct `equals()` method (any two 
`RowCoder` instances are considered equal now).
   
   Details: `RowCoder` currently inherits `StructuredCoder`'s `equals()` 
definition, which only checks whether the coder's `component`s are the same, but the 
schema associated with a `RowCoder` is not one of its `component`s.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290894)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290884&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290884
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519287208
 
 
   run java postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290884)
Time Spent: 2h 10m  (was: 2h)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290896&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290896
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519287102
 
 
   run python 2 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290896)
Time Spent: 2h 50m  (was: 2h 40m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290890&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290890
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311179276
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = 
"beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
+// instance of Schema from schema.proto.
+//
+// A row is encoded as the concatenation of:
+//   - The number of attributes in the schema, encoded with
+// beam:coder:varint:v1. This is useful for detecting supported schema
+// changes (column additions/deletions).
 
 Review comment:
   Can we explain in more detail what "supported schema changes" means here? 
Is it only useful for the streaming pipeline update case?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290890)
Time Spent: 1h  (was: 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290891&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290891
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311196553
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = 
"beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
+// instance of Schema from schema.proto.
+//
+// A row is encoded as the concatenation of:
+//   - The number of attributes in the schema, encoded with
+// beam:coder:varint:v1. This is useful for detecting supported schema
+// changes (column additions/deletions).
+//   - A packed bitset indicating null fields (a 1 indicating a null)
+// encoded with beam:coder:bytes:v1. If there are no nulls an empty 
byte
+// array is encoded.
+//   - An encoding for each non-null field, concatenated together.
+//
+// Schema types are mapped to coders as follows:
+//   AtomicType:
+// BYTE:  not yet a standard coder
 
 Review comment:
   Maybe add a JIRA ticket link here saying we do need to close these gaps 
before making row coder truly standard?
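
   Putting the pieces of the quoted layout together, a rough end-to-end sketch (hypothetical helpers only; the varint format, the length prefix implied by beam:coder:bytes:v1 in a nested context, and the stand-in field encoder are all assumptions, not Beam's actual coders):

{code}
def _varint(n):
    # Simple unsigned LEB128-style varint for the non-negative values used here.
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)

def encode_row(values, field_encoders):
    """Concatenate: varint attribute count, null bitset (as a length-prefixed
    byte array, empty when there are no nulls), then each non-null field."""
    count = _varint(len(values))
    bits = bytearray((len(values) + 7) // 8)
    has_null = False
    for i, v in enumerate(values):
        if v is None:
            has_null = True
            bits[i // 8] |= 1 << (i % 8)
    bitset = bytes(bits) if has_null else b''
    body = b''.join(enc(v) for v, enc in zip(values, field_encoders) if v is not None)
    return count + _varint(len(bitset)) + bitset + body

# Stand-in length-prefixed UTF-8 string encoder, not beam:coder:string_utf8:v1.
encode_str = lambda s: _varint(len(s.encode('utf-8'))) + s.encode('utf-8')
encoded = encode_row([u'Beam', None, u'Row'], [encode_str, encode_str, encode_str])
{code}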
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290891)
Time Spent: 1h 10m  (was: 1h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290892&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290892
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311280635
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java
 ##
 @@ -118,6 +122,32 @@ public T fromComponents(List> components) {
 };
   }
 
+  static CoderTranslator row() {
+return new CoderTranslator() {
+  @Override
+  public List> getComponents(RowCoder from) {
+return ImmutableList.of();
+  }
+
+  @Override
+  public byte[] getPayload(RowCoder from) {
+return SchemaTranslation.toProto(from.getSchema()).toByteArray();
+  }
+
+  @Override
+  public RowCoder fromComponents(List> components, byte[] 
payload) {
+// Assert that components are empty?
 
 Review comment:
   I think adding an assert here won't hurt.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290892)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290885&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290885
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519286690
 
 
   run xvr_flink postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290885)
Time Spent: 2h 20m  (was: 2h 10m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290893&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290893
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311808396
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CommonCoderTest.java
 ##
 @@ -278,41 +290,85 @@ private static Object convertValue(Object value, 
CommonCoder coderSpec, Coder co
   return WindowedValue.of(windowValue, timestamp, windows, paneInfo);
 } else if (s.equals(getUrn(StandardCoders.Enum.DOUBLE))) {
   return Double.parseDouble((String) value);
+} else if (s.equals(getUrn(StandardCoders.Enum.ROW))) {
+  Schema schema;
+  try {
+schema = 
SchemaTranslation.fromProto(SchemaApi.Schema.parseFrom(coderSpec.getPayload()));
+  } catch (InvalidProtocolBufferException e) {
+throw new RuntimeException("Failed to parse schema payload for row 
coder", e);
+  }
+
+  return parseField(value, Schema.FieldType.row(schema));
 } else {
   throw new IllegalStateException("Unknown coder URN: " + 
coderSpec.getUrn());
 }
   }
 
+  private static Object parseField(Object value, Schema.FieldType fieldType) {
+switch (fieldType.getTypeName()) {
+  case BYTE:
+return ((Number) value).byteValue();
+  case INT16:
+return ((Number) value).shortValue();
+  case INT32:
+return ((Number) value).intValue();
+  case INT64:
+return ((Number) value).longValue();
+  case FLOAT:
+return Float.parseFloat((String) value);
+  case DOUBLE:
+return Double.parseDouble((String) value);
+  case STRING:
+return (String) value;
+  case BOOLEAN:
+return (Boolean) value;
+  case BYTES:
+// extract String as byte[]
+return ((String) value).getBytes(StandardCharsets.ISO_8859_1);
+  case ARRAY:
+return ((List) value)
+.stream()
+.map((element) -> parseField(element, 
fieldType.getCollectionElementType()))
+.collect(toImmutableList());
+  case MAP:
+Map kvMap = (Map) value;
+return kvMap.entrySet().stream()
+.collect(
+toImmutableMap(
+(pair) -> parseField(pair.getKey(), 
fieldType.getMapKeyType()),
+(pair) -> parseField(pair.getValue(), 
fieldType.getMapValueType(;
+  case ROW:
+Map rowMap = (Map) value;
+Schema schema = fieldType.getRowSchema();
+Row.Builder row = Row.withSchema(schema);
+for (Schema.Field field : schema.getFields()) {
+  Object element = rowMap.remove(field.getName());
+  if (element != null) {
+element = parseField(element, field.getType());
+  }
+  row.addValue(element);
+}
+
+if (!rowMap.isEmpty()) {
+  throw new IllegalArgumentException(
+  "Value contains keys that are not in the schema: " + 
rowMap.keySet());
+}
+
+return row.build();
+  default: // DECIMAL, DATETIME, LOGICAL_TYPE
+throw new IllegalArgumentException("Unsupported type name: " + 
fieldType.getTypeName());
+}
+  }
+
   private static Coder instantiateCoder(CommonCoder coder) {
 List> components = new ArrayList<>();
 for (CommonCoder innerCoder : coder.getComponents()) {
   components.add(instantiateCoder(innerCoder));
 }
-String s = coder.getUrn();
-if (s.equals(getUrn(StandardCoders.Enum.BYTES))) {
-  return ByteArrayCoder.of();
-} else if (s.equals(getUrn(StandardCoders.Enum.STRING_UTF8))) {
-  return StringUtf8Coder.of();
-} else if (s.equals(getUrn(StandardCoders.Enum.KV))) {
-  return KvCoder.of(components.get(0), components.get(1));
-} else if (s.equals(getUrn(StandardCoders.Enum.VARINT))) {
-  return VarLongCoder.of();
-} else if (s.equals(getUrn(StandardCoders.Enum.INTERVAL_WINDOW))) {
-  return IntervalWindowCoder.of();
-} else if (s.equals(getUrn(StandardCoders.Enum.ITERABLE))) {
-  return IterableCoder.of(components.get(0));
-} else if (s.equals(getUrn(StandardCoders.Enum.TIMER))) {
-  return Timer.Coder.of(components.get(0));
-} else if (s.equals(getUrn(StandardCoders.Enum.GLOBAL_WINDOW))) {
-  return GlobalWindow.Coder.INSTANCE;
-} else if (s.equals(getUrn(StandardCoders.Enum.WINDOWED_VALUE))) {
-  return WindowedValue.FullWindowedValueCoder.of(
-  components.get(0), (Coder) components.get(1));
-} else if 

[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290887&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290887
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519312706
 
 
   run xvr_flink postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290887)
Time Spent: 2.5h  (was: 2h 20m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=290889&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290889
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:09
Start Date: 08/Aug/19 00:09
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9188: [BEAM-7886] 
Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r311178383
 
 

 ##
 File path: model/pipeline/src/main/proto/beam_runner_api.proto
 ##
 @@ -642,6 +642,45 @@ message StandardCoders {
 // Components: Coder for a single element.
 // Experimental.
 STATE_BACKED_ITERABLE = 9 [(beam_urn) = 
"beam:coder:state_backed_iterable:v1"];
+
+// Additional Standard Coders
+// --
+// The following coders are not required to be implemented for an SDK or
+// runner to support the Beam model, but can enable additional
+// functionality.
+
+// Encodes a "row", an element with a known schema, defined by an
 
 Review comment:
   Maybe make it clear here what "additional functionality" can be enabled by 
implementing this coder?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290889)
Time Spent: 50m  (was: 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290883&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290883
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 08/Aug/19 00:08
Start Date: 08/Aug/19 00:08
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519287174
 
 
   run python 3.5 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290883)
Time Spent: 2h  (was: 1h 50m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7049) BeamUnionRel should work on multiple inputs

2019-08-07 Thread sridhar Reddy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16902548#comment-16902548
 ] 

sridhar Reddy commented on BEAM-7049:
-

[~amaliujia] Thanks for checking in with me. 

 

I just finished a couple of tests; can you please confirm that I am doing the 
right tests?

I created three PCollections:

---

PCollection outputStreamOne =
 inputTable.apply(SqlTransform.query("select c1, c2, c3 from PCOLLECTION "));
PCollection outputStreamTwo =
 inputTable2.apply(SqlTransform.query("select c1, c2, c3 from PCOLLECTION "));
PCollection outputStreamThree =
 inputTable3.apply(SqlTransform.query("select c1, c2, c3 from PCOLLECTION "));



Then I used UNION on the 3 PCollections:

--

PCollection outputStreamResult =
 PCollectionTuple.of(new TupleTag<>("FirstOne"), outputStreamOne)
 .and(new TupleTag<>("SecondOne"), outputStreamTwo)
 .and(new TupleTag<>("ThirdOne"), outputStreamThree)
 .apply(SqlTransform.query("select c2 from SecondOne union " +
 "select c2 from FirstOne union select c2 from ThirdOne" ));

-

 

It seems to be working fine. If I use a simple union, that seems to work 
fine too:

---

PCollection outputStreamResult =
 PCollectionTuple.of(new TupleTag<>("FirstOne"), outputStreamOne)
 .and(new TupleTag<>("SecondOne"), outputStreamTwo)
 .and(new TupleTag<>("ThirdOne"), outputStreamThree)
 .apply(SqlTransform.query("select 1 from SecondOne union " +
 "select 2 from FirstOne union select 3 from ThirdOne" ));

-

 

Please let me know your thoughts. 

> BeamUnionRel should work on multiple inputs 
> --
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>
> BeamUnionRel assumes there are exactly two inputs and rejects more. So `a UNION b UNION c` 
> has to be created as UNION(a, UNION(b, c)) and incurs two shuffles. If 
> BeamUnionRel can handle multiple inputs, we will have only one shuffle.
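
As a core-SDK analogy of the shuffle count described above (not how BeamSQL implements UNION; Flatten plus Distinct merely stand in for a de-duplicating UNION, and all names here are made up):

{code}
import apache_beam as beam
from apache_beam.transforms.util import Distinct

with beam.Pipeline() as p:
    a = p | 'a' >> beam.Create([1, 2, 3])
    b = p | 'b' >> beam.Create([3, 4])
    c = p | 'c' >> beam.Create([4, 5])

    # n-ary shape: one Flatten feeding one Distinct, i.e. a single shuffle.
    union_flat = ((a, b, c)
                  | 'FlattenAll' >> beam.Flatten()
                  | 'DedupOnce' >> Distinct())

    # Nested shape from the ticket, UNION(a, UNION(b, c)): two Distincts,
    # hence two shuffles.
    inner = ((b, c) | 'FlattenInner' >> beam.Flatten() | 'DedupInner' >> Distinct())
    union_nested = ((a, inner)
                    | 'FlattenOuter' >> beam.Flatten()
                    | 'DedupOuter' >> Distinct())
{code}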



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7874) FnApi only supports up to 10 workers

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7874?focusedWorklogId=290865&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290865
 ]

ASF GitHub Bot logged work on BEAM-7874:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:21
Start Date: 07/Aug/19 23:21
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9218: 
[BEAM-7874], [BEAM-7873] Distributed FnApiRunner bugfixs
URL: https://github.com/apache/beam/pull/9218#discussion_r311801800
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1319,55 +1320,62 @@ def stop_worker(self):
 @WorkerHandler.register_environment(common_urns.environments.DOCKER.urn,
 beam_runner_api_pb2.DockerPayload)
 class DockerSdkWorkerHandler(GrpcWorkerHandler):
+
+  _lock = threading.Lock()
 
 Review comment:
   Yes, this is a workaround for subprocess.
   
   I tried to add a function to handle calls to subprocess, but I was not able 
to find a good way to do it. With `DockerSdkWorkerHandler` we can lock 
`start_worker()`, but we cannot lock `start_worker()` with the 
`SubprocessSdkWorkerHandler` handler. With `SubprocessSdkWorkerHandler` we should lock 
only `subprocess.Popen()`; if we lock the whole `start_worker()` 
function, it waits for a worker to finish its job, so the pipeline gets stuck.
   
   In fact, I doubt `DockerSdkWorkerHandler` works with FnApi at the moment. 
FnApi doesn't stage any artifacts, so the Docker bootstrap fails and the container 
cannot be created. I will make a change to `DockerSdkWorkerHandler` soon to 
create only one container and handle multithreading/processing while a job is 
running within the container, so we can get rid of the lock here for 
`DockerSdkWorkerHandler`.
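
   A minimal sketch of the narrower locking described above, serializing only the process launch rather than all of `start_worker()` (class and attribute names besides `subprocess.Popen` are taken from the discussion, but the body is illustrative and not the actual fn_api_runner code):

{code}
import subprocess
import threading

class SubprocessSdkWorkerHandler(object):
  # Shared across handler instances so concurrent start_worker() calls
  # serialize only the fork/exec of the worker process.
  _popen_lock = threading.Lock()

  def __init__(self, worker_command_line):
    self._worker_command_line = worker_command_line
    self._process = None

  def start_worker(self):
    with SubprocessSdkWorkerHandler._popen_lock:
      # Only the launch is guarded; the worker then runs outside the lock,
      # so the pipeline does not block waiting for it to finish.
      self._process = subprocess.Popen(self._worker_command_line, shell=True)
{code}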
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290865)
Time Spent: 2h  (was: 1h 50m)

> FnApi only supports up to 10 workers
> 
>
> Key: BEAM-7874
> URL: https://issues.apache.org/jira/browse/BEAM-7874
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Because max_workers of the grpc servers is hardcoded to 10, the runner only supports up 
> to 10 workers; if we pass a direct_num_workers value greater than 10, the 
> pipeline hangs, because not all workers get connected to the runner.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1141]
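
A short sketch of the fix direction the description implies: size the gRPC thread pool from the requested parallelism instead of the hard-coded 10 (made-up function, not the actual fn_api_runner code):

{code}
from concurrent import futures
import grpc

def create_control_server(direct_num_workers, port=0):
  # At least one thread per connecting SDK worker, with a floor of 10 so the
  # existing single-worker behaviour is unchanged.
  max_workers = max(10, direct_num_workers)
  server = grpc.server(futures.ThreadPoolExecutor(max_workers=max_workers))
  bound_port = server.add_insecure_port('[::]:%d' % port)
  server.start()
  return server, bound_port
{code}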



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290855&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290855
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:20
Start Date: 07/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r311799555
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##
 @@ -19,10 +19,151 @@ limitations under the License.
 -->
 
 # Regex
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Filters input string elements based on a regex. May also transform them based 
on the matching groups.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) 
applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text 
strings.
+Then, we use the `re` module to search, replace, and split through the text 
elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions such as
 
 Review comment:
   Add comma
   
   "expressions such as"->"expressions, such as"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290855)
Time Spent: 35h 40m  (was: 35.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 35h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290862&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290862
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:20
Start Date: 07/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r311800566
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##
 @@ -19,10 +19,151 @@ limitations under the License.
 -->
 
 # Regex
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Filters input string elements based on a regex. May also transform them based 
on the matching groups.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) 
applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text 
strings.
+Then, we use the `re` module to search, replace, and split through the text 
elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions such as
+[regex101](https://regex101.com/),
+make sure to specify the Python flavor at the left side bar.
+
+### Example 1: Regex match
+
+[`re.match`](https://docs.python.org/3/library/re.html#re.match)
+will try to match the regular expression from the beginning of the string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_match %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 2: Regex search
+
+[`re.search`](https://docs.python.org/3/library/re.html#re.search)
+will try to search for the first occurrence the regular expression in the 
string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_search %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 3: Regex find all
+
+[`re.finditer`](https://docs.python.org/3/library/re.html#re.finditer)
+will try to search for all the occurrence the regular expression in the string.
+This returns an iterator of match objects.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_find_all %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:words %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 4: Regex replace
+
+[`re.sub`](https://docs.python.org/3/library/re.html#re.sub)
+will substitute occurrences the regular expression in the string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_replace %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plants_csv %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 5: Regex split
+

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290856&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290856
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:20
Start Date: 07/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r311799866
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##
 @@ -19,10 +19,151 @@ limitations under the License.
 -->
 
 # Regex
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Filters input string elements based on a regex. May also transform them based 
on the matching groups.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) 
applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text 
strings.
+Then, we use the `re` module to search, replace, and split through the text 
elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions such as
+[regex101](https://regex101.com/),
+make sure to specify the Python flavor at the left side bar.
+
+### Example 1: Regex match
+
+[`re.match`](https://docs.python.org/3/library/re.html#re.match)
+will try to match the regular expression from the beginning of the string.
 
 Review comment:
   "will try to"->"tries to"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290856)
Time Spent: 35h 50m  (was: 35h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 35h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290857&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290857
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:20
Start Date: 07/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r311799769
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##
 @@ -19,10 +19,151 @@ limitations under the License.
 -->
 
 # Regex
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Filters input string elements based on a regex. May also transform them based 
on the matching groups.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) 
applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text 
strings.
+Then, we use the `re` module to search, replace, and split through the text 
elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions such as
+[regex101](https://regex101.com/),
+make sure to specify the Python flavor at the left side bar.
 
 Review comment:
   Split into a new sentence.
   
   "Make sure to specify the Python flavor at the left side bar."
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290857)
Time Spent: 36h  (was: 35h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 36h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290861&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290861
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:20
Start Date: 07/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r311801274
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##
 @@ -19,10 +19,151 @@ limitations under the License.
 -->
 
 # Regex
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Filters input string elements based on a regex. May also transform them based 
on the matching groups.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) 
applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text 
strings.
+Then, we use the `re` module to search, replace, and split through the text 
elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions such as
+[regex101](https://regex101.com/),
+make sure to specify the Python flavor at the left side bar.
+
+### Example 1: Regex match
+
+[`re.match`](https://docs.python.org/3/library/re.html#re.match)
+will try to match the regular expression from the beginning of the string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_match %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 2: Regex search
+
+[`re.search`](https://docs.python.org/3/library/re.html#re.search)
+will try to search for the first occurrence the regular expression in the 
string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_search %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 3: Regex find all
+
+[`re.finditer`](https://docs.python.org/3/library/re.html#re.finditer)
+will try to search for all the occurrence the regular expression in the string.
 
 Review comment:
   "for all the"->"for all of the"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290861)
Time Spent: 36h 40m  (was: 36.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 36h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290863
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:20
Start Date: 07/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r311801058
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##
 @@ -19,10 +19,151 @@ limitations under the License.
 -->
 
 # Regex
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Filters input string elements based on a regex. May also transform them based 
on the matching groups.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) 
applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text 
strings.
+Then, we use the `re` module to search, replace, and split through the text 
elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions such as
+[regex101](https://regex101.com/),
+make sure to specify the Python flavor at the left side bar.
+
+### Example 1: Regex match
+
+[`re.match`](https://docs.python.org/3/library/re.html#re.match)
+will try to match the regular expression from the beginning of the string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_match %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 2: Regex search
+
+[`re.search`](https://docs.python.org/3/library/re.html#re.search)
+will try to search for the first occurrence the regular expression in the 
string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_search %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 3: Regex find all
+
+[`re.finditer`](https://docs.python.org/3/library/re.html#re.finditer)
+will try to search for all the occurrence the regular expression in the string.
+This returns an iterator of match objects.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_find_all %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:words %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 4: Regex replace
+
+[`re.sub`](https://docs.python.org/3/library/re.html#re.sub)
+will substitute occurrences the regular expression in the string.
 
 Review comment:
   "occurrences the"->"occurrences of the"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290863)
Time Spent: 37h  (was: 36h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290859=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290859
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:20
Start Date: 07/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r311800064
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##
 @@ -19,10 +19,151 @@ limitations under the License.
 -->
 
 # Regex
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Filters input string elements based on a regex. May also transform them based 
on the matching groups.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) 
applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text 
strings.
+Then, we use the `re` module to search, replace, and split through the text 
elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions such as
+[regex101](https://regex101.com/),
+make sure to specify the Python flavor at the left side bar.
+
+### Example 1: Regex match
+
+[`re.match`](https://docs.python.org/3/library/re.html#re.match)
+will try to match the regular expression from the beginning of the string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_match %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 2: Regex search
+
+[`re.search`](https://docs.python.org/3/library/re.html#re.search)
+will try to search for the first occurrence the regular expression in the 
string.
 
 Review comment:
   "will try to search"->"tries to search"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290859)
Time Spent: 36h 20m  (was: 36h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 36h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290858=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290858
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:20
Start Date: 07/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r311800194
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##
 @@ -19,10 +19,151 @@ limitations under the License.
 -->
 
 # Regex
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Filters input string elements based on a regex. May also transform them based 
on the matching groups.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) 
applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text 
strings.
+Then, we use the `re` module to search, replace, and split through the text 
elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions such as
+[regex101](https://regex101.com/),
+make sure to specify the Python flavor at the left side bar.
+
+### Example 1: Regex match
+
+[`re.match`](https://docs.python.org/3/library/re.html#re.match)
+will try to match the regular expression from the beginning of the string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_match %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 2: Regex search
+
+[`re.search`](https://docs.python.org/3/library/re.html#re.search)
+will try to search for the first occurrence the regular expression in the 
string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_search %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 3: Regex find all
+
+[`re.finditer`](https://docs.python.org/3/library/re.html#re.finditer)
+will try to search for all the occurrence the regular expression in the string.
 
 Review comment:
   "will try to search"->"tries to search"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290858)
Time Spent: 36h 10m  (was: 36h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 36h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290860=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290860
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:20
Start Date: 07/Aug/19 23:20
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#discussion_r311800664
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/regex.md
 ##
 @@ -19,10 +19,151 @@ limitations under the License.
 -->
 
 # Regex
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Filters input string elements based on a regex. May also transform them based 
on the matching groups.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Map]({{ site.baseurl }}/documentation/transforms/python/elementwise/map) 
applies a simple 1-to-1 mapping function over each element in the collection
\ No newline at end of file
+In the following examples, we create a pipeline with a `PCollection` of text 
strings.
+Then, we use the `re` module to search, replace, and split through the text 
elements using
+[regular expressions](https://docs.python.org/3/library/re.html).
+
+You can use tools to help you create and test your regular expressions such as
+[regex101](https://regex101.com/),
+make sure to specify the Python flavor at the left side bar.
+
+### Example 1: Regex match
+
+[`re.match`](https://docs.python.org/3/library/re.html#re.match)
+will try to match the regular expression from the beginning of the string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_match %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 2: Regex search
+
+[`re.search`](https://docs.python.org/3/library/re.html#re.search)
+will try to search for the first occurrence the regular expression in the 
string.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_search %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:plant_matches %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 3: Regex find all
+
+[`re.finditer`](https://docs.python.org/3/library/re.html#re.finditer)
+will try to search for all the occurrence the regular expression in the string.
+This returns an iterator of match objects.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py
 tag:regex_find_all %}```
+
+Output `PCollection` after regex:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex_test.py
 tag:words %}```
+
+
+  
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/regex.py;>
+  https://www.tensorflow.org/images/GitHub-Mark-32px.png;
+width="20px" height="20px" alt="View on GitHub" />
+  View on GitHub
+
+  
+
+
+
+### Example 4: Regex replace
+
+[`re.sub`](https://docs.python.org/3/library/re.html#re.sub)
+will substitute occurrences the regular expression in the string.
 
 Review comment:
   "will substitute"->"substitutes"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290860)
Time Spent: 36.5h  (was: 36h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: 

[jira] [Work logged] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6907?focusedWorklogId=290854=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290854
 ]

ASF GitHub Bot logged work on BEAM-6907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:18
Start Date: 07/Aug/19 23:18
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9277: [BEAM-6907] Reuse 
Python tarball in tox & dataflow integration tests
URL: https://github.com/apache/beam/pull/9277#issuecomment-519302920
 
 
   Run Python 2 PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290854)
Time Spent: 2h 40m  (was: 2.5h)

> Standardize Gradle projects/tasks structure for Python SDK
> --
>
> Key: BEAM-6907
> URL: https://issues.apache.org/jira/browse/BEAM-6907
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> As Gradle parallelism was applied to Python tests and more Python versions were 
> added to the tests, the way Gradle manages projects/tasks changed a lot. Friction 
> arises during Gradle refactoring since some projects define separate build 
> scripts under their source directories. Thus, it will be better to standardize how 
> we use Gradle. This will help to manage Python tests/builds/tasks across different 
> versions and runners, and also make it easier for people to learn/use/develop.
> In general, we may want to:
> - Apply parallel execution
> - Share common tasks
> - Centralize test related tasks
> - Have a clear Gradle structure for projects/tasks



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290850=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290850
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:08
Start Date: 07/Aug/19 23:08
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9265: [BEAM-7389] Add 
code examples for Map page
URL: https://github.com/apache/beam/pull/9265#discussion_r311796559
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/map.md
 ##
 @@ -19,24 +19,258 @@ limitations under the License.
 -->
 
 # Map
-
-
+localStorage.setItem('language', 'language-py')
+
+
+
+  
+https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Map;>
-  https://beam.apache.org/images/logos/sdks/python.png; 
width="20px" height="20px"
-   alt="Pydoc" />
- Pydoc
+  https://beam.apache.org/images/logos/sdks/python.png;
+  width="20px" height="20px" alt="Pydoc" />
+  Pydoc
 
+  
 
 
+
 Applies a simple 1-to-1 mapping function over each element in the collection.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
+In the following examples, we create a pipeline with a `PCollection` of 
produce their icon, name, and duration.
 
 Review comment:
   "produce their icon"->"produce with their icon"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290850)
Time Spent: 35h 10m  (was: 35h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 35h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290852=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290852
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:08
Start Date: 07/Aug/19 23:08
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9265: [BEAM-7389] Add 
code examples for Map page
URL: https://github.com/apache/beam/pull/9265#discussion_r311797092
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/map.md
 ##
 @@ -19,24 +19,258 @@ limitations under the License.
 -->
 
 # Map
-
-
+localStorage.setItem('language', 'language-py')
+
+
+
+  
+https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Map;>
-  https://beam.apache.org/images/logos/sdks/python.png; 
width="20px" height="20px"
-   alt="Pydoc" />
- Pydoc
+  https://beam.apache.org/images/logos/sdks/python.png;
+  width="20px" height="20px" alt="Pydoc" />
+  Pydoc
 
+  
 
 
+
 Applies a simple 1-to-1 mapping function over each element in the collection.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
+In the following examples, we create a pipeline with a `PCollection` of 
produce their icon, name, and duration.
+Then, we apply `Map` in multiple ways to transform every element in the 
`PCollection`.
+
+`Map` accepts a function that returns a single element for every input element 
in the `PCollection`.
+
+### Example 1: Map with a predefined function
+
+We use the function `str.strip` which takes a single `str` element and outputs 
a `str`.
+It will strip the input element's whitespaces, including newlines and tabs.
 
 Review comment:
   "will strip"->"strips"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290852)
Time Spent: 35.5h  (was: 35h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 35.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290851=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290851
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 23:08
Start Date: 07/Aug/19 23:08
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9265: [BEAM-7389] Add 
code examples for Map page
URL: https://github.com/apache/beam/pull/9265#discussion_r311796761
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/map.md
 ##
 @@ -19,24 +19,258 @@ limitations under the License.
 -->
 
 # Map
-
-
+localStorage.setItem('language', 'language-py')
+
+
+
+  
+https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.core.html#apache_beam.transforms.core.Map;>
-  https://beam.apache.org/images/logos/sdks/python.png; 
width="20px" height="20px"
-   alt="Pydoc" />
- Pydoc
+  https://beam.apache.org/images/logos/sdks/python.png;
+  width="20px" height="20px" alt="Pydoc" />
+  Pydoc
 
+  
 
 
+
 Applies a simple 1-to-1 mapping function over each element in the collection.
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
+In the following examples, we create a pipeline with a `PCollection` of 
produce their icon, name, and duration.
 
 Review comment:
   (This is my typo)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290851)
Time Spent: 35h 20m  (was: 35h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 35h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290847=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290847
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:58
Start Date: 07/Aug/19 22:58
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519298358
 
 
   run python precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290847)
Time Spent: 1h 50m  (was: 1h 40m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console
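
For context on the failing call, `view_as` normally takes an options class imported from the `pipeline_options` module rather than an attribute looked up on a `PipelineOptions` instance; a minimal sketch of the usual usage (illustrative, not the worker-harness code):

```py
from apache_beam.options.pipeline_options import PipelineOptions, ProfilingOptions

options = PipelineOptions(['--profile_cpu'])
# view_as re-interprets the same underlying flags through another options class.
profiling_options = options.view_as(ProfilingOptions)
print(profiling_options.profile_cpu)  # True
```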



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290846=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290846
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:57
Start Date: 07/Aug/19 22:57
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519298358
 
 
   run python precommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290846)
Time Spent: 1h 40m  (was: 1.5h)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7596) List is not parallel: Make it parallel

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7596?focusedWorklogId=290845=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290845
 ]

ASF GitHub Bot logged work on BEAM-7596:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:54
Start Date: 07/Aug/19 22:54
Worklog Time Spent: 10m 
  Work Description: jobegrabber commented on pull request #8912: 
[BEAM-7596] Align wording of list in web docs
URL: https://github.com/apache/beam/pull/8912#discussion_r311795617
 
 

 ##
 File path: website/src/documentation/index.md
 ##
 @@ -29,7 +29,7 @@ This section provides in-depth conceptual information and 
reference material for
 
 Learn about the Beam Programming Model and the concepts common to all Beam 
SDKs and Runners.
 
-* The [Programming Guide]({{ site.baseurl }}/documentation/programming-guide/) 
introduces all the key Beam concepts.
+* Read through the [Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/) to get introduced to all the key Beam 
concepts.
 * Learn about Beam's [execution model]({{ site.baseurl 
}}/documentation/execution-model/) to better understand how pipelines execute.
 * Visit [Learning Resources]({{ site.baseurl 
}}/documentation/resources/learning-resources) for some of our favorite 
articles and talks about Beam.
 
 Review comment:
   Thanks for the pointer; done!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290845)
Time Spent: 1h  (was: 50m)

> List is not parallel: Make it parallel
> --
>
> Key: BEAM-7596
> URL: https://issues.apache.org/jira/browse/BEAM-7596
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Riona MacNamara
>Assignee: Jonas Grabber
>Priority: Trivial
>  Labels: Starter
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> In [https://beam.apache.org/documentation/], the list under *Concepts* is not 
> parallel:
>  * The [Programming 
> Guide|https://beam.apache.org/documentation/programming-guide/] introduces 
> all the key Beam concepts.
>  * Learn about Beam’s [execution 
> model|https://beam.apache.org/documentation/execution-model/] to better 
> understand how pipelines execute.
>  * Visit [Learning 
> Resources|https://beam.apache.org/documentation/resources/learning-resources] 
> for some of our favorite articles and talks about Beam.
> The first item should either begin with a verb, or the second two items 
> should be framed as a description of the doc.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7918) UNNEST does not work with nested records

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7918?focusedWorklogId=290843=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290843
 ]

ASF GitHub Bot logged work on BEAM-7918:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:52
Start Date: 07/Aug/19 22:52
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9288: [BEAM-7918] 
adding nested row implementation for unnest and uncollect
URL: https://github.com/apache/beam/pull/9288
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290843)
Time Spent: 1h  (was: 50m)

> UNNEST does not work with nested records
> 
>
> Key: BEAM-7918
> URL: https://issues.apache.org/jira/browse/BEAM-7918
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.15.0, 2.16.0
>Reporter: Sahith Nallapareddy
>Assignee: Sahith Nallapareddy
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> UNNEST seems to have problems with nested rows. It assumes that the values 
> will be primitives and adds them to the resulting row, but for a nested row it 
> must go one level deeper and add the row values. 
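
As a rough illustration of the "one level deeper" behaviour described above (plain Python dicts standing in for rows; this is not the actual SQL rel node code):

```py
def unnest_value(value):
    # A primitive is emitted as-is; a nested row contributes its field values.
    if isinstance(value, dict):
        return list(value.values())
    return [value]

print(unnest_value(42))                      # [42]
print(unnest_value({'name': 'a', 'n': 1}))   # ['a', 1]
```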



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6907?focusedWorklogId=290842=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290842
 ]

ASF GitHub Bot logged work on BEAM-6907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:50
Start Date: 07/Aug/19 22:50
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9277: 
[BEAM-6907] Reuse Python tarball in tox & dataflow integration tests
URL: https://github.com/apache/beam/pull/9277#discussion_r311794874
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1884,6 +1884,13 @@ class BeamModulePlugin implements Plugin<Project> {
   }
 }
   }
+  // Set run order for basic tasks.
+  // This should be called after applyPythonNature() since TaskContainer
+  // requires task instances created first before setting the order.
+  project.ext.setTaskOrder = {
 
 Review comment:
   It seems we can do `installGcpTest.mustRunAfter configurations.distTarBall`. 
[This build](https://scans.gradle.com/s/64ylkwsnyt4go) shows the correct run order. 
I'll go ahead and change the code.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290842)
Time Spent: 2.5h  (was: 2h 20m)

> Standardize Gradle projects/tasks structure for Python SDK
> --
>
> Key: BEAM-6907
> URL: https://issues.apache.org/jira/browse/BEAM-6907
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> As Gradle parallelism was applied to Python tests and more Python versions were 
> added to the tests, the way Gradle manages projects/tasks changed a lot. Friction 
> arises during Gradle refactoring since some projects define separate build 
> scripts under their source directories. Thus, it will be better to standardize how 
> we use Gradle. This will help to manage Python tests/builds/tasks across different 
> versions and runners, and also make it easier for people to learn/use/develop.
> In general, we may want to:
> - Apply parallel execution
> - Share common tasks
> - Centralize test related tasks
> - Have a clear Gradle structure for projects/tasks



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7874) FnApi only supports up to 10 workers

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7874?focusedWorklogId=290841=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290841
 ]

ASF GitHub Bot logged work on BEAM-7874:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:50
Start Date: 07/Aug/19 22:50
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9218: 
[BEAM-7874], [BEAM-7873] Distributed FnApiRunner bugfixs
URL: https://github.com/apache/beam/pull/9218#discussion_r311794701
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1134,11 +1134,12 @@ class GrpcServer(object):
 
   _DEFAULT_SHUTDOWN_TIMEOUT_SECS = 5
 
-  def __init__(self, state, provision_info):
+  def __init__(self, state, provision_info, num_workers):
 self.state = state
 self.provision_info = provision_info
+max_workers = max(10, num_workers)
 
 Review comment:
   There is no guarantee it's safe; instead, I throw an error when we need more 
threads than we have.
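
   A rough sketch of that approach, outside of the actual `fn_api_runner` code (the helper names here are hypothetical):

```py
from concurrent import futures

import grpc

def make_grpc_server(num_workers, minimum=10):
    # Size the thread pool from the requested worker count, keeping the old
    # hard-coded value of 10 as a floor.
    max_workers = max(minimum, num_workers)
    return grpc.server(futures.ThreadPoolExecutor(max_workers=max_workers)), max_workers

def check_capacity(needed, max_workers):
    # Fail loudly instead of letting the pipeline hang when more worker
    # connections are needed than the server has threads for.
    if needed > max_workers:
        raise RuntimeError(
            'Need %d worker threads but the gRPC server only has %d.'
            % (needed, max_workers))
```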
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290841)
Time Spent: 1h 50m  (was: 1h 40m)

> FnApi only supports up to 10 workers
> 
>
> Key: BEAM-7874
> URL: https://issues.apache.org/jira/browse/BEAM-7874
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Because max_workers of the grpc servers is hardcoded to 10, only up to 10 workers 
> are supported, and if we pass a direct_num_workers greater than 10, the 
> pipeline hangs because not all workers get connected to the runner.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1141]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7874) FnApi only supports up to 10 workers

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7874?focusedWorklogId=290840=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290840
 ]

ASF GitHub Bot logged work on BEAM-7874:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:49
Start Date: 07/Aug/19 22:49
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9218: 
[BEAM-7874], [BEAM-7873] Distributed FnApiRunner bugfixs
URL: https://github.com/apache/beam/pull/9218#discussion_r311794392
 
 

 ##
 File path: sdks/python/apache_beam/runners/portability/fn_api_runner.py
 ##
 @@ -1386,7 +1386,8 @@ def get_worker_handlers(self, environment_id, 
num_workers):
 # assume it's using grpc if environment is not EMBEDDED_PYTHON.
 if environment.urn != python_urns.EMBEDDED_PYTHON and \
 self._grpc_server is None:
-  self._grpc_server = GrpcServer(self._state, self._job_provision_info)
+  self._grpc_server = GrpcServer(
+  self._state, self._job_provision_info, num_workers)
 
 Review comment:
   Now I think I understand your comment; hopefully my understanding is correct.
   Here is how I decided to throw an error:
   keep `max_workers` with GrpcServer(), and whenever we need more threads than 
`max_workers`, it throws an error, with 
   `max_workers = num_workers * len(self._environments)`.
   I still don't understand why we need to multiply by `len(self._environments)` 
or how `num_workers` may change at each stage. Isn't it fixed?
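
   The sizing rule being discussed, written out as a standalone helper (hypothetical name, not the FnApiRunner code):

```py
def required_threads(num_workers, environments):
    # One worker handler per environment, each needing its own server thread.
    return num_workers * len(environments)
```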
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290840)
Time Spent: 1h 40m  (was: 1.5h)

> FnApi only supports up to 10 workers
> 
>
> Key: BEAM-7874
> URL: https://issues.apache.org/jira/browse/BEAM-7874
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Because max_workers of the grpc servers is hardcoded to 10, only up to 10 workers 
> are supported, and if we pass a direct_num_workers greater than 10, the 
> pipeline hangs because not all workers get connected to the runner.
> [https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L1141]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290838=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290838
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:48
Start Date: 07/Aug/19 22:48
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9289: [BEAM-7389] Add 
code examples for ToString page
URL: https://github.com/apache/beam/pull/9289#discussion_r311793350
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/tostring.md
 ##
 @@ -19,9 +19,38 @@ limitations under the License.
 -->
 
 # ToString
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Transforms every element in an input collection a string.
 
-## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
+## Example
+
+Any non-string element can be converted to a string using sandard Python 
functions and methods.
+Many I/O transforms such as `TextIO` expect their input elements to be strings.
 
 Review comment:
   "transforms, such as `TextIO`, expect"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290838)
Time Spent: 35h  (was: 34h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 35h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290839=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290839
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:48
Start Date: 07/Aug/19 22:48
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9289: [BEAM-7389] Add 
code examples for ToString page
URL: https://github.com/apache/beam/pull/9289#discussion_r311794222
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/tostring.md
 ##
 @@ -19,9 +19,38 @@ limitations under the License.
 -->
 
 # ToString
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Transforms every element in an input collection a string.
 
-## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
+## Example
+
+Any non-string element can be converted to a string using sandard Python 
functions and methods.
 
 Review comment:
   @aaltay Should we comment that this is not a Beam transform for Python? It's 
obvious if you look at the example, but my concern is that people would hunt 
around for the Pydoc.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290839)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 35h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=290836=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290836
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:48
Start Date: 07/Aug/19 22:48
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9289: [BEAM-7389] Add 
code examples for ToString page
URL: https://github.com/apache/beam/pull/9289#discussion_r311793504
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/tostring.md
 ##
 @@ -19,9 +19,38 @@ limitations under the License.
 -->
 
 # ToString
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Transforms every element in an input collection a string.
 
-## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
+## Example
+
+Any non-string element can be converted to a string using sandard Python 
functions and methods.
+Many I/O transforms such as `TextIO` expect their input elements to be strings.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/to_string.py
 tag:to_string %}```
+
+Output `PCollection` after *to string*:
 
 Review comment:
   "The output `PCollection` from this pipeline is the following:"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290836)
Time Spent: 34h 40m  (was: 34.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 34h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7918) UNNEST does not work with nested records

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7918?focusedWorklogId=290831=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290831
 ]

ASF GitHub Bot logged work on BEAM-7918:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:42
Start Date: 07/Aug/19 22:42
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9288: [BEAM-7918] adding 
nested row implementation for unnest and uncollect
URL: https://github.com/apache/beam/pull/9288#issuecomment-519295083
 
 
   run SQL PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290831)
Time Spent: 50m  (was: 40m)

> UNNEST does not work with nested records
> 
>
> Key: BEAM-7918
> URL: https://issues.apache.org/jira/browse/BEAM-7918
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.15.0, 2.16.0
>Reporter: Sahith Nallapareddy
>Assignee: Sahith Nallapareddy
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> UNNEST seems to have problems with nested rows. It assumes that the values 
> will be primitives and adds them to the resulting row, but for a nested row it 
> must go one level deeper and add the row values. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6907?focusedWorklogId=290827=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290827
 ]

ASF GitHub Bot logged work on BEAM-6907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:42
Start Date: 07/Aug/19 22:42
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9277: 
[BEAM-6907] Reuse Python tarball in tox & dataflow integration tests
URL: https://github.com/apache/beam/pull/9277#discussion_r311792681
 
 

 ##
 File path: sdks/python/test-suites/dataflow/py2/build.gradle
 ##
 @@ -48,7 +52,7 @@ task preCommitIT(dependsOn: ['sdist', 'installGcpTest']) {
 ]
 def cmdArgs = project.mapToArgString([
 "test_opts": testOpts,
-"sdk_location": "${project.buildDir}/apache-beam.tar.gz",
+"sdk_location": files(configurations.distTarBall.files).singleFile,
 
 Review comment:
    I'm afraid not. We need the full path of this tarball, but 
`configurations.distTarBall` returns the string `configuration 
':sdks:python:test-suites:dataflow:py2:distTarBall'`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290827)
Time Spent: 2h 20m  (was: 2h 10m)

> Standardize Gradle projects/tasks structure for Python SDK
> --
>
> Key: BEAM-6907
> URL: https://issues.apache.org/jira/browse/BEAM-6907
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As Gradle parallelism was applied to Python tests and more Python versions were 
> added to the tests, the way Gradle manages projects/tasks changed a lot. Friction 
> arises during Gradle refactoring since some projects define separate build 
> scripts under their source directories. Thus, it will be better to standardize how 
> we use Gradle. This will help to manage Python tests/builds/tasks across different 
> versions and runners, and also make it easier for people to learn/use/develop.
> In general, we may want to:
> - Apply parallel execution
> - Share common tasks
> - Centralize test related tasks
> - Have a clear Gradle structure for projects/tasks



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6907?focusedWorklogId=290828=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290828
 ]

ASF GitHub Bot logged work on BEAM-6907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:42
Start Date: 07/Aug/19 22:42
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9277: 
[BEAM-6907] Reuse Python tarball in tox & dataflow integration tests
URL: https://github.com/apache/beam/pull/9277#discussion_r311792100
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1884,6 +1884,13 @@ class BeamModulePlugin implements Plugin<Project> {
   }
 }
   }
+  // Set run order for basic tasks.
+  // This should be called after applyPythonNature() since TaskContainer
+  // requires task instances created first before setting the order.
+  project.ext.setTaskOrder = {
 
 Review comment:
   You are right, making `setupVirtualenv` run after `installGcpTest` is not 
required here. The main purpose is to make `installGcpTest` run after 
`sdks:python:sdist` in each project.
   
   I don't know whether depending on `distTarBall` will work, but it's worth a try, 
so that we may be able to get rid of `setTaskOrder`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290828)
Time Spent: 2h 20m  (was: 2h 10m)

> Standardize Gradle projects/tasks structure for Python SDK
> --
>
> Key: BEAM-6907
> URL: https://issues.apache.org/jira/browse/BEAM-6907
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As Gradle parallelism was applied to Python tests and more Python versions were 
> added to the tests, the way Gradle manages projects/tasks changed a lot. Friction 
> arises during Gradle refactoring since some projects define separate build 
> scripts under their source directories. Thus, it will be better to standardize how 
> we use Gradle. This will help to manage Python tests/builds/tasks across different 
> versions and runners, and also make it easier for people to learn/use/develop.
> In general, we may want to:
> - Apply parallel execution
> - Share common tasks
> - Centralize test related tasks
> - Have a clear Gradle structure for projects/tasks



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7596) List is not parallel: Make it parallel

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7596?focusedWorklogId=290819=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290819
 ]

ASF GitHub Bot logged work on BEAM-7596:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:31
Start Date: 07/Aug/19 22:31
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #8912: [BEAM-7596] 
Align wording of list in web docs
URL: https://github.com/apache/beam/pull/8912#discussion_r311789903
 
 

 ##
 File path: website/src/documentation/index.md
 ##
 @@ -29,7 +29,7 @@ This section provides in-depth conceptual information and 
reference material for
 
 Learn about the Beam Programming Model and the concepts common to all Beam 
SDKs and Runners.
 
-* The [Programming Guide]({{ site.baseurl }}/documentation/programming-guide/) 
introduces all the key Beam concepts.
+* Read through the [Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/) to get introduced to all the key Beam 
concepts.
 * Learn about Beam's [execution model]({{ site.baseurl 
}}/documentation/execution-model/) to better understand how pipelines execute.
 * Visit [Learning Resources]({{ site.baseurl 
}}/documentation/resources/learning-resources) for some of our favorite 
articles and talks about Beam.
 
 Review comment:
   Replace "for"->"to view" to finish off this list's parallel structure.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290819)
Time Spent: 50m  (was: 40m)

> List is not parallel: Make it parallel
> --
>
> Key: BEAM-7596
> URL: https://issues.apache.org/jira/browse/BEAM-7596
> Project: Beam
>  Issue Type: Bug
>  Components: website
>Reporter: Riona MacNamara
>Assignee: Jonas Grabber
>Priority: Trivial
>  Labels: Starter
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In [https://beam.apache.org/documentation/], the list under *Concepts* is not 
> parallel:
>  * The [Programming 
> Guide|https://beam.apache.org/documentation/programming-guide/] introduces 
> all the key Beam concepts.
>  * Learn about Beam’s [execution 
> model|https://beam.apache.org/documentation/execution-model/] to better 
> understand how pipelines execute.
>  * Visit [Learning 
> Resources|https://beam.apache.org/documentation/resources/learning-resources] 
> for some of our favorite articles and talks about Beam.
> The first item should either begin with a verb, or the second two items 
> should be framed as a description of the doc.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7849) UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow runner

2019-08-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902513#comment-16902513
 ] 

Valentyn Tymofieiev commented on BEAM-7849:
---

Yes, closing.

> UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow 
> runner
> --
>
> Key: BEAM-7849
> URL: https://issues.apache.org/jira/browse/BEAM-7849
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in <module>
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run
> self._load_main_session(self.local_staging_directory)
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session
>
> pickler.load_session(session_file)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", 
> line 280, in load_session 
>
> return dill.load_session(file_path)
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 410, in 
> load_session
> module = unpickler.load()
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in 
> find_class
> return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ParseGameEventFn' on <module 'dataflow_worker.start' from 
> '/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py'> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Closed] (BEAM-7849) UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow runner

2019-08-07 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev closed BEAM-7849.
-
Resolution: Duplicate

> UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow 
> runner
> --
>
> Key: BEAM-7849
> URL: https://issues.apache.org/jira/browse/BEAM-7849
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in <module>
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run
> self._load_main_session(self.local_staging_directory)
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session
>
> pickler.load_session(session_file)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", 
> line 280, in load_session 
>
> return dill.load_session(file_path)
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 410, in 
> load_session
> module = unpickler.load()
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in 
> find_class
> return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ParseGameEventFn' on <module 'dataflow_worker.start' from 
> '/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py'> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7849) UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow runner

2019-08-07 Thread Ahmet Altay (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902512#comment-16902512
 ] 

Ahmet Altay commented on BEAM-7849:
---

Valentyn, should we close this as a duplicate?

> UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow 
> runner
> --
>
> Key: BEAM-7849
> URL: https://issues.apache.org/jira/browse/BEAM-7849
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in <module>
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run
> self._load_main_session(self.local_staging_directory)
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session
>
> pickler.load_session(session_file)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", 
> line 280, in load_session 
>
> return dill.load_session(file_path)
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 410, in 
> load_session
> module = unpickler.load()
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in 
> find_class
> return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ParseGameEventFn' on <module 'dataflow_worker.start' from 
> '/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py'> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has superclass constructor calls.

2019-08-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902505#comment-16902505
 ] 

Valentyn Tymofieiev edited comment on BEAM-6158 at 8/7/19 10:23 PM:


To clarify, this error is still happening. I updated the title and description 
to reflect this. 

https://github.com/apache/beam/pull/7710 removed this error from the wordcount 
example, but it will still happen in other examples and may affect users 
migrating to Python 3 with currently released Beam SDKs. 

Removing the superclass call works around the issue. In particular, calling a 
superclass constructor for the DoFn class [1] is not critical in the current Beam 
SDK. A call to the DoFn constructor triggers object initialization in [2], but 
there is also lazy initialization in place [3]. 

Nevertheless, this issue may cause friction in other use cases and we should 
address it in future releases.

[1] 
https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/transforms/core.py#L422
[2] 
https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/typehints/decorators.py#L201
[3] 
https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/typehints/decorators.py#L204


was (Author: tvalentyn):
To clarify, this error is still happening. 
https://github.com/apache/beam/pull/7710 removed this error from the wordcount 
example, but it will still happen in other examples and may affect users 
migrating to Python 3 with currently released Beam SDKs. 

Removing the superclass call works around the issue. In particular, calling a 
superclass constructor for the DoFn class [1] is not critical in the current Beam 
SDK. A call to the DoFn constructor triggers object initialization in [2], but 
there is also lazy initialization in place [3]. 

Nevertheless, this issue may cause friction in other use cases and we should 
address it in future releases.

[1] 
https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/transforms/core.py#L422
[2] 
https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/typehints/decorators.py#L201
[3] 
https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/typehints/decorators.py#L204

> Using --save_main_session fails on Python 3 when main module has superclass 
> constructor calls.
> --
>
> Key: BEAM-6158
> URL: https://issues.apache.org/jira/browse/BEAM-6158
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> A typical manifestation of this failure, which can be observed on several 
> Beam examples:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in <module>
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run
> self._load_main_session(self.local_staging_directory)
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session   

[jira] [Work logged] (BEAM-2103) Document Python 3 support in Beam starting from 2.14.0

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-2103?focusedWorklogId=290817=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290817
 ]

ASF GitHub Bot logged work on BEAM-2103:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:23
Start Date: 07/Aug/19 22:23
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9290: [BEAM-2103] 
Clarify supported Python 3 versions on roadmap
URL: https://github.com/apache/beam/pull/9290
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290817)
Time Spent: 2h 10m  (was: 2h)

> Document Python 3 support in Beam starting from 2.14.0
> --
>
> Key: BEAM-2103
> URL: https://issues.apache.org/jira/browse/BEAM-2103
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core, website
>Affects Versions: 0.6.0
>Reporter: Tobias Kaymak
>Assignee: Rose Nguyen
>Priority: Blocker
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Beam website documentation should mention Python 3.5 - 3.7 support in 
> addition to Python 2.7. Available user documentation (e.g. quickstarts) 
> should be adjusted where needed, to accommodate for Beam Python 3 users.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7849) UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow runner

2019-08-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902506#comment-16902506
 ] 

Valentyn Tymofieiev commented on BEAM-7849:
---

It's https://issues.apache.org/jira/browse/BEAM-6158. Several other examples 
are also affected. Added details on that issue.

> UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow 
> runner
> --
>
> Key: BEAM-7849
> URL: https://issues.apache.org/jira/browse/BEAM-7849
> Project: Beam
>  Issue Type: Sub-task
>  Components: examples-python
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.16.0
>
>
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in <module>
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run
> self._load_main_session(self.local_staging_directory)
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session
>
> pickler.load_session(session_file)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", 
> line 280, in load_session 
>
> return dill.load_session(file_path)
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 410, in 
> load_session
> module = unpickler.load()
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in 
> find_class
> return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ParseGameEventFn' on <module 'dataflow_worker.start' from 
> '/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py'> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has superclass constructor calls.

2019-08-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902505#comment-16902505
 ] 

Valentyn Tymofieiev commented on BEAM-6158:
---

To clarify, this error is still happening. 
https://github.com/apache/beam/pull/7710 removed this error from the wordcount 
example, but it will still happen in other examples and may affect users 
migrating to Python 3 with currently released Beam SDKs. 

Removing the superclass call works around the issue. In particular, calling a 
superclass constructor for the DoFn class [1] is not critical in the current Beam 
SDK. A call to the DoFn constructor triggers object initialization in [2], but 
there is also lazy initialization in place [3]. 

Nevertheless, this issue may cause friction in other use cases and we should 
address it in future releases.

[1] 
https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/transforms/core.py#L422
[2] 
https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/typehints/decorators.py#L201
[3] 
https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/typehints/decorators.py#L204
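
A minimal sketch of this workaround (simplified and illustrative only, not the exact code from the 
Beam examples): define the DoFn without the explicit superclass constructor call and rely on the 
lazy initialization noted above.

{code:python}
# Illustrative workaround sketch (assumes apache_beam is installed).
# Omitting the explicit superclass constructor call sidesteps the
# --save_main_session pickling failure described in this issue on Python 3;
# DoFn initialization also happens lazily, so the call is not required by
# the current SDK.
import apache_beam as beam


class ParseGameEventFn(beam.DoFn):
  # No super(ParseGameEventFn, self).__init__() here.
  def process(self, element):
    yield element
{code}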

> Using --save_main_session fails on Python 3 when main module has superclass 
> constructor calls.
> --
>
> Key: BEAM-6158
> URL: https://issues.apache.org/jira/browse/BEAM-6158
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> A typical manifestation of this failure, which can be observed on several 
> Beam examples:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 164, in <module>
> run()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
>  line 158, in run 
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
>  line 426, in __exit__
>  
> self.run().wait_until_finish()
>   File 
> "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
>  line 1338, in wait_until_finish   
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: 
> Dataflow pipeline failed. State: FAILED, Error:   
>  
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 773, in run
> self._load_main_session(self.local_staging_directory)
>   File 
> "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 
> 489, in _load_main_session
>
> pickler.load_session(session_file)
>   File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", 
> line 280, in load_session 
>
> return dill.load_session(file_path)
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 410, in 
> load_session
> module = unpickler.load()
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in 
> find_class
> return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ParseGameEventFn' on <module 'dataflow_worker.start' from 
> '/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py'> {noformat}
>  
> Note that the example has the following code [1]:
> {code:python}
> class ParseGameEventFn(beam.DoFn):
>   def __init__(self):
>     super(ParseGameEventFn, self).__init__()
> {code}
> https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/examples/complete/game/user_score.py#L81
> +cc: [~tvalentyn] 

[jira] [Work logged] (BEAM-7916) Change ElasticsearchIO query parameter to be a ValueProvider

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7916?focusedWorklogId=290815=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290815
 ]

ASF GitHub Bot logged work on BEAM-7916:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:19
Start Date: 07/Aug/19 22:19
Worklog Time Spent: 10m 
  Work Description: oliverhenlich commented on issue #9285: [BEAM-7916] - 
Change ElasticsearchIO query parameter to be a ValueProvider
URL: https://github.com/apache/beam/pull/9285#issuecomment-519289516
 
 
   Thanks @RyanSkraba. 
   The only other person in the OWNERS file is @timrobertson100. Is that ok? 
How would I go about trying to find another reviewer?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290815)
Time Spent: 50m  (was: 40m)

> Change ElasticsearchIO query parameter to be a ValueProvider
> 
>
> Key: BEAM-7916
> URL: https://issues.apache.org/jira/browse/BEAM-7916
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Affects Versions: 2.14.0
>Reporter: Oliver Henlich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> We need to be able to perform Elasticsearch queries that are dynamic. The 
> problem is {{ElasticsearchIO.read().withQuery()}} only accepts a string which 
> means the query must be known when the pipeline/Google Dataflow Template is 
> built.
> It would be great if we could change the parameter on the {{withQuery()}} 
> method from {{String}} to {{ValueProvider}}.
> Pull request: https://github.com/apache/beam/pull/9285



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7916) Change ElasticsearchIO query parameter to be a ValueProvider

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7916?focusedWorklogId=290813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290813
 ]

ASF GitHub Bot logged work on BEAM-7916:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:16
Start Date: 07/Aug/19 22:16
Worklog Time Spent: 10m 
  Work Description: oliverhenlich commented on pull request #9285: 
[BEAM-7916] - Change ElasticsearchIO query parameter to be a ValueProvider
URL: https://github.com/apache/beam/pull/9285#discussion_r311786159
 
 

 ##
 File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
 ##
 @@ -499,9 +500,8 @@ public Read 
withConnectionConfiguration(ConnectionConfiguration connectionConfig
  * DSL
  * @return a {@link PTransform} reading data from Elasticsearch.
  */
-public Read withQuery(String query) {
+public Read withQuery(ValueProvider<String> query) {
 
 Review comment:
   Hi @RyanSkraba. 
   Thanks for taking a look. Yes, I should have done as you suggested in the 
first place to avoid people's code breaking because of my change. Pushed my 
changes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290813)
Time Spent: 40m  (was: 0.5h)

> Change ElasticsearchIO query parameter to be a ValueProvider
> 
>
> Key: BEAM-7916
> URL: https://issues.apache.org/jira/browse/BEAM-7916
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Affects Versions: 2.14.0
>Reporter: Oliver Henlich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> We need to be able to perform Elasticsearch queries that are dynamic. The 
> problem is {{ElasticsearchIO.read().withQuery()}} only accepts a string which 
> means the query must be known when the pipeline/Google Dataflow Template is 
> built.
> It would be great if we could change the parameter on the {{withQuery()}} 
> method from {{String}} to {{ValueProvider}}.
> Pull request: https://github.com/apache/beam/pull/9285



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7916) Change ElasticsearchIO query parameter to be a ValueProvider

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7916?focusedWorklogId=290812=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290812
 ]

ASF GitHub Bot logged work on BEAM-7916:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:16
Start Date: 07/Aug/19 22:16
Worklog Time Spent: 10m 
  Work Description: oliverhenlich commented on pull request #9285: 
[BEAM-7916] - Change ElasticsearchIO query parameter to be a ValueProvider
URL: https://github.com/apache/beam/pull/9285#discussion_r311786159
 
 

 ##
 File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
 ##
 @@ -499,9 +500,8 @@ public Read 
withConnectionConfiguration(ConnectionConfiguration connectionConfig
  * DSL
  * @return a {@link PTransform} reading data from Elasticsearch.
  */
-public Read withQuery(String query) {
+public Read withQuery(ValueProvider<String> query) {
 
 Review comment:
   Hi @RyanSkraba. 
   Thanks for taking a look. Yes, I should have done as you suggested in the 
first place to avoid people's code breaking because of my change. Pushed my 
changes.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290812)
Time Spent: 0.5h  (was: 20m)

> Change ElasticsearchIO query parameter to be a ValueProvider
> 
>
> Key: BEAM-7916
> URL: https://issues.apache.org/jira/browse/BEAM-7916
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Affects Versions: 2.14.0
>Reporter: Oliver Henlich
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We need to be able to perform Elasticsearch queries that are dynamic. The 
> problem is {{ElasticsearchIO.read().withQuery()}} only accepts a string which 
> means the query must be known when the pipeline/Google Dataflow Template is 
> built.
> It would be great if we could change the parameter on the {{withQuery()}} 
> method from {{String}} to {{ValueProvider}}.
> Pull request: https://github.com/apache/beam/pull/9285



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290809=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290809
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:09
Start Date: 07/Aug/19 22:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519287208
 
 
   run java postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290809)
Time Spent: 1.5h  (was: 1h 20m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290806=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290806
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:09
Start Date: 07/Aug/19 22:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519286964
 
 
   run python2 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290806)
Time Spent: 1h  (was: 50m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290808=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290808
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:09
Start Date: 07/Aug/19 22:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519287174
 
 
   run python 3.5 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290808)
Time Spent: 1h 20m  (was: 1h 10m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290807=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290807
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:09
Start Date: 07/Aug/19 22:09
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519287102
 
 
   run python 2 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290807)
Time Spent: 1h 10m  (was: 1h)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290804=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290804
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:08
Start Date: 07/Aug/19 22:08
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519286732
 
 
   run python postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290804)
Time Spent: 40m  (was: 0.5h)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290805=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290805
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:08
Start Date: 07/Aug/19 22:08
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519286964
 
 
   run python2 postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290805)
Time Spent: 50m  (was: 40m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290802=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290802
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:07
Start Date: 07/Aug/19 22:07
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519286690
 
 
   run xvr_flink postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290802)
Time Spent: 20m  (was: 10m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290803
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:07
Start Date: 07/Aug/19 22:07
Worklog Time Spent: 10m 
  Work Description: ihji commented on issue #9292: [BEAM-7924] Failure in 
Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292#issuecomment-519286732
 
 
   run python postcommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290803)
Time Spent: 0.5h  (was: 20m)

> Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
> 
>
> Key: BEAM-7924
> URL: https://issues.apache.org/jira/browse/BEAM-7924
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Udi Meiri
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This seems to be the root cause:
> {code}
> 11:32:59 [grpc-default-executor-1] WARN pipeline_options.get_all_options - 
> Discarding unparseable args: [u'--app_name=None', 
> u'--shutdown_sources_on_final_watermark', u'--flink_master=[auto]', 
> u'--direct_runner_use_stacked_bundle', u'--options_id=1', 
> u'--fail_on_checkpointing_errors', u'--enable_metrics', 
> u'--pipeline_type_check', u'--parallelism=2'] 
> 11:32:59 [grpc-default-executor-1] INFO sdk_worker_main.main - Python sdk 
> harness started with pipeline_options: {'runner': u'None', 'experiments': 
> [u'worker_threads=100', u'beam_fn_api'], 'environment_cache_millis': 
> u'1', 'sdk_location': u'container', 'job_name': 
> u'BeamApp-root-0807183253-57a72c22', 'save_main_session': True, 'region': 
> u'us-central1', 'sdk_worker_parallelism': u'1'}
> 11:32:59 [grpc-default-executor-1] ERROR sdk_worker_main.main - Python sdk 
> harness failed: 
> 11:32:59 Traceback (most recent call last):
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 153, in main
> 11:32:59 sdk_pipeline_options.view_as(pipeline_options.ProfilingOptions))
> 11:32:59   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/pipeline_options.py",
>  line 334, in __getattr__
> 11:32:59 (type(self).__name__, name))
> 11:32:59 AttributeError: 'PipelineOptions' object has no attribute 
> 'ProfilingOptions' 
> {code}
> https://builds.apache.org/job/beam_PostCommit_Python2_PR/58/console



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7924) Failure in Python 2 postcommit: crossLanguagePythonJavaFlink

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7924?focusedWorklogId=290801=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290801
 ]

ASF GitHub Bot logged work on BEAM-7924:


Author: ASF GitHub Bot
Created on: 07/Aug/19 22:06
Start Date: 07/Aug/19 22:06
Worklog Time Spent: 10m 
  Work Description: ihji commented on pull request #9292: [BEAM-7924] 
Failure in Python 2 postcommit: crossLanguagePythonJavaFlink
URL: https://github.com/apache/beam/pull/9292
 
 
   
   
   

[jira] [Updated] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has superclass constructor calls.

2019-08-07 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-6158:
--
Description: 
A typical manifestation of this failure, which can be observed on several Beam 
examples:

{noformat}
Traceback (most recent call last):
  File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
  File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
  File 
"/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
 line 164, in <module>
run()
  File 
"/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py",
 line 158, in run 
| 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
  File 
"/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py",
 line 426, in __exit__  
   
self.run().wait_until_finish()
  File 
"/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
 line 1338, in wait_until_finish   
(self.state, getattr(self._runner, 'last_error_msg', None)), self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow 
pipeline failed. State: FAILED, Error:  
  
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", 
line 773, in run
self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", 
line 489, in _load_main_session 
  
pickler.load_session(session_file)
  File 
"/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 
280, in load_session

return dill.load_session(file_path)
  File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 410, in 
load_session
module = unpickler.load()
  File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in 
find_class
return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'ParseGameEventFn' on <module ...>
{noformat}
 
Note that the example has the following code [1]:

{code:python}
class ParseGameEventFn(beam.DoFn):
  def __init__(self):
    super(ParseGameEventFn, self).__init__()
{code}

https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/examples/complete/game/user_score.py#L81

+cc: [~tvalentyn] [~robertwb] [~altay]

  was:
This happened when I run wordcount example with portable Dataflow runner in 
Python 3.5. The failure shows in worker log (unfortunately unformatted) of 
[this 
job|https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-11-29_11_47_38-6731484595556255542?project=google.com:clouddfe]:
{code:java}
Could not load main session: Traceback (most recent call last): File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 125, in main _load_main_session(semi_persistent_directory) File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 201, in _load_main_session pickler.load_session(session_file) File 
"/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 
269, in load_session return dill.load_session(file_path) File 
"/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 402, in 
load_session module = unpickler.load() File 
"/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 465, in find_class 
return StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
attribute 'WordExtractingDoFn' on <module 
'apache_beam.runners.worker.sdk_worker_main' from 
'/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py'>
 Traceback (most recent call last): File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 125, in main _load_main_session(semi_persistent_directory) File 
"/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
 line 201, in _load_main_session pickler.load_session(session_file) File 
"/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 
269, in load_session return dill.load_session(file_path) File 
"/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 402, in 
load_session module = unpickler.load() File 
"/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 465, in find_class 
return StockUnpickler.find_class(self, module, name) AttributeError: Can't get 
attribute 'WordExtractingDoFn' on <module 
'apache_beam.runners.worker.sdk_worker_main' from 
'/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py'>
{code}
Looks like saved main session didn't work properly in Python 3.

+cc: [~tvalentyn] [~robertwb] [~altay]

[jira] [Commented] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-07 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902495#comment-16902495
 ] 

Udi Meiri commented on BEAM-7860:
-

merged

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> -
>
> Key: BEAM-7860
> URL: https://issues.apache.org/jira/browse/BEAM-7860
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.13.0
> Environment: Python 2.7
> Python 3.7
>Reporter: Niels Stender
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In the presence of mixed type keys, v1new ReadFromDatastore may return 
> duplicate items. The attached example returns 4 records, not the expected 3.
>  
> {code:java}
> // code placeholder
> from __future__ import unicode_literals
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
> from apache_beam.io.gcp.datastore.v1new import datastoreio
> config = dict(project='your-google-project', namespace='test')
> def test_mixed():
> keys = [
> Key(['mixed', '10038260-iperm_eservice'], **config),
> Key(['mixed', 4812224868188160], **config),
> Key(['mixed', '99152975-pointshop'], **config)
> ]
> entities = map(lambda key: Entity(key=key), keys)
> with beam.Pipeline() as p:
> (p
> | beam.Create(entities)
> | datastoreio.WriteToDatastore(project=config['project'])
> )
> query = Query(kind='mixed', **config)
> with beam.Pipeline() as p:
> (p
> | datastoreio.ReadFromDatastore(query=query, num_splits=4)
> | beam.io.WriteToText('tmp.txt', num_shards=1, 
> shard_name_template='')
> )
> items = open('tmp.txt').read().strip().split('\n')
> assert len(items) == 3, 'incorrect number of items'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has superclass constructor calls.

2019-08-07 Thread Valentyn Tymofieiev (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-6158:
--
Summary: Using --save_main_session fails on Python 3 when main module has 
superclass constructor calls.  (was: Enable support for save_main_session in 
Python 3)

> Using --save_main_session fails on Python 3 when main module has superclass 
> constructor calls.
> --
>
> Key: BEAM-6158
> URL: https://issues.apache.org/jira/browse/BEAM-6158
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This happened when I run wordcount example with portable Dataflow runner in 
> Python 3.5. The failure shows in worker log (unfortunately unformatted) of 
> [this 
> job|https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-11-29_11_47_38-6731484595556255542?project=google.com:clouddfe]:
> {code:java}
> Could not load main session: Traceback (most recent call last): File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 125, in main _load_main_session(semi_persistent_directory) File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 201, in _load_main_session pickler.load_session(session_file) File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", 
> line 269, in load_session return dill.load_session(file_path) File 
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 402, in 
> load_session module = unpickler.load() File 
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 465, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'WordExtractingDoFn' on  'apache_beam.runners.worker.sdk_worker_main' from 
> '/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py'>
>  Traceback (most recent call last): File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 125, in main _load_main_session(semi_persistent_directory) File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 201, in _load_main_session pickler.load_session(session_file) File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", 
> line 269, in load_session return dill.load_session(file_path) File 
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 402, in 
> load_session module = unpickler.load() File 
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 465, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'WordExtractingDoFn' on  'apache_beam.runners.worker.sdk_worker_main' from 
> '/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py'>
> {code}
> Looks like saved main session didn't work properly in Python 3.
> +cc: [~tvalentyn] [~robertwb] [~altay]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (BEAM-6158) Enable support for save_main_session in Python 3

2019-08-07 Thread Valentyn Tymofieiev (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760602#comment-16760602
 ] 

Valentyn Tymofieiev edited comment on BEAM-6158 at 8/7/19 9:49 PM:
---

[~ccy] rootcaused this to: [https://github.com/uqfoundation/dill/issues/300.] 

A workaround is to not call superclass constructor in classes in main module, 
example: [https://github.com/apache/beam/pull/7710]
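
For anyone hitting the same error, a minimal sketch of the failing pattern and 
the workaround (assuming the dill behavior described in the linked issue; the 
class names here are hypothetical, not taken from the Beam examples):

{code:python}
import apache_beam as beam

class ParseFnFailing(beam.DoFn):
  def __init__(self):
    # Explicit superclass constructor call in a class defined in __main__;
    # with --save_main_session on Python 3 this is the pattern that fails
    # to unpickle on the worker (see the dill issue above).
    super(ParseFnFailing, self).__init__()

class ParseFnWorkaround(beam.DoFn):
  # Workaround: drop the explicit super().__init__() call (usually not
  # needed for DoFn subclasses), or define the class in an importable
  # module instead of the main module.
  def process(self, element):
    yield element
{code}

Moving the class out of the main module also side-steps the problem entirely, 
since it then no longer depends on pickling the main session.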


was (Author: tvalentyn):
[~ccy]rootcaused this to: [https://github.com/uqfoundation/dill/issues/300.] 

A workaround is to not call superclass constructor in classes in main module, 
example: https://github.com/apache/beam/pull/7710

> Enable support for save_main_session in Python 3
> 
>
> Key: BEAM-6158
> URL: https://issues.apache.org/jira/browse/BEAM-6158
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Mark Liu
>Assignee: Valentyn Tymofieiev
>Priority: Major
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This happened when I run wordcount example with portable Dataflow runner in 
> Python 3.5. The failure shows in worker log (unfortunately unformatted) of 
> [this 
> job|https://pantheon.corp.google.com/dataflow/jobsDetail/locations/us-central1/jobs/2018-11-29_11_47_38-6731484595556255542?project=google.com:clouddfe]:
> {code:java}
> Could not load main session: Traceback (most recent call last): File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 125, in main _load_main_session(semi_persistent_directory) File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 201, in _load_main_session pickler.load_session(session_file) File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", 
> line 269, in load_session return dill.load_session(file_path) File 
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 402, in 
> load_session module = unpickler.load() File 
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 465, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'WordExtractingDoFn' on  'apache_beam.runners.worker.sdk_worker_main' from 
> '/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py'>
>  Traceback (most recent call last): File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 125, in main _load_main_session(semi_persistent_directory) File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py",
>  line 201, in _load_main_session pickler.load_session(session_file) File 
> "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", 
> line 269, in load_session return dill.load_session(file_path) File 
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 402, in 
> load_session module = unpickler.load() File 
> "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 465, in 
> find_class return StockUnpickler.find_class(self, module, name) 
> AttributeError: Can't get attribute 'WordExtractingDoFn' on  'apache_beam.runners.worker.sdk_worker_main' from 
> '/usr/local/lib/python3.5/site-packages/apache_beam/runners/worker/sdk_worker_main.py'>
> {code}
> Looks like saved main session didn't work properly in Python 3.
> +cc: [~tvalentyn] [~robertwb] [~altay]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6907?focusedWorklogId=290792=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290792
 ]

ASF GitHub Bot logged work on BEAM-6907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 21:48
Start Date: 07/Aug/19 21:48
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9277: 
[BEAM-6907] Reuse Python tarball in tox & dataflow integration tests
URL: https://github.com/apache/beam/pull/9277#discussion_r311777665
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1820,15 +1794,23 @@ class BeamModulePlugin implements Plugin {
 
   project.ext.toxTask = { name, tox_env ->
 project.tasks.create(name) {
-  dependsOn = ['sdist']
+  dependsOn 'setupVirtualenv'
+  dependsOn ':sdks:python:sdist'
+
   doLast {
+// Python source directory is also tox execution workspace, We want
+// to isolate them per tox suite to avoid conflict when running
+// multiple tox suites in parallel.
+project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
+
 def copiedPyRoot = "${copiedSrcRoot}/sdks/python"
+def distTarBall = "${pythonRootDir}/build/apache-beam.tar.gz"
 project.exec {
   executable 'sh'
-  args '-c', ". ${project.ext.envdir}/bin/activate && cd 
${copiedPyRoot} && scripts/run_tox.sh $tox_env 
${project.buildDir}/apache-beam.tar.gz"
+  args '-c', ". ${project.ext.envdir}/bin/activate && cd 
${copiedPyRoot} && scripts/run_tox.sh $tox_env $distTarBall"
 
 Review comment:
   tox will build a tarball for the venv install automatically if one is not 
provided. That build depends on a shared file when suites run in parallel, 
which makes our tests flaky. So we prebuild the tarball and pass it via the 
`--installpkg` flag to avoid that issue.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290792)
Time Spent: 2h 10m  (was: 2h)

> Standardize Gradle projects/tasks structure for Python SDK
> --
>
> Key: BEAM-6907
> URL: https://issues.apache.org/jira/browse/BEAM-6907
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> As Gradle parallelism applied to Python tests and more python versions added 
> to tests, the way Gradle manages projects/tasks changed a lot. Frictions are 
> generated during Gradle refactor since some projects defined separate build 
> script under source directory. Thus, It will be better to standardize how we 
> use Gradle. This will help to manage Python tests/builds/tasks across 
> different versions and runners, and also easy for people to learn/use/develop.
> In general, we may want to:
> - Apply parallel execution
> - Share common tasks
> - Centralize test related tasks
> - Have a clear Gradle structure for projects/tasks



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6907?focusedWorklogId=290791=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290791
 ]

ASF GitHub Bot logged work on BEAM-6907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 21:47
Start Date: 07/Aug/19 21:47
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9277: 
[BEAM-6907] Reuse Python tarball in tox & dataflow integration tests
URL: https://github.com/apache/beam/pull/9277#discussion_r311777665
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1820,15 +1794,23 @@ class BeamModulePlugin implements Plugin {
 
   project.ext.toxTask = { name, tox_env ->
 project.tasks.create(name) {
-  dependsOn = ['sdist']
+  dependsOn 'setupVirtualenv'
+  dependsOn ':sdks:python:sdist'
+
   doLast {
+// Python source directory is also tox execution workspace, We want
+// to isolate them per tox suite to avoid conflict when running
+// multiple tox suites in parallel.
+project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
+
 def copiedPyRoot = "${copiedSrcRoot}/sdks/python"
+def distTarBall = "${pythonRootDir}/build/apache-beam.tar.gz"
 project.exec {
   executable 'sh'
-  args '-c', ". ${project.ext.envdir}/bin/activate && cd 
${copiedPyRoot} && scripts/run_tox.sh $tox_env 
${project.buildDir}/apache-beam.tar.gz"
+  args '-c', ". ${project.ext.envdir}/bin/activate && cd 
${copiedPyRoot} && scripts/run_tox.sh $tox_env $distTarBall"
 
 Review comment:
   tox will build a tarball for the venv install automatically if one is not 
provided. That build depends on a shared file when suites run in parallel, 
which makes our tests flaky. So we prebuild the tarball and pass it via the 
`--installpkg` flag.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290791)
Time Spent: 2h  (was: 1h 50m)

> Standardize Gradle projects/tasks structure for Python SDK
> --
>
> Key: BEAM-6907
> URL: https://issues.apache.org/jira/browse/BEAM-6907
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> As Gradle parallelism applied to Python tests and more python versions added 
> to tests, the way Gradle manages projects/tasks changed a lot. Frictions are 
> generated during Gradle refactor since some projects defined separate build 
> script under source directory. Thus, It will be better to standardize how we 
> use Gradle. This will help to manage Python tests/builds/tasks across 
> different versions and runners, and also easy for people to learn/use/develop.
> In general, we may want to:
> - Apply parallel execution
> - Share common tasks
> - Centralize test related tasks
> - Have a clear Gradle structure for projects/tasks



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7925) ParquetIO supports neither column projection nor filter predicate

2019-08-07 Thread Neville Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Li updated BEAM-7925:
-
Issue Type: Bug  (was: Improvement)

> ParquetIO supports neither column projection nor filter predicate
> -
>
> Key: BEAM-7925
> URL: https://issues.apache.org/jira/browse/BEAM-7925
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-parquet
>Affects Versions: 2.14.0
>Reporter: Neville Li
>Priority: Major
>
> Current {{ParquetIO}} supports neither column projection nor filter predicate 
> which defeats the performance motivation of using Parquet in the first place. 
> That's why we have our own implementation of 
> [ParquetIO|https://github.com/spotify/scio/tree/master/scio-parquet/src] in 
> Scio.
> Reading Parquet as Avro with column projection has some complications, 
> namely, the resulting Avro records may be incomplete and will not survive 
> ser/de. A workaround maybe provide a {{TypedRead}} interface that takes a 
> {{Function}} that maps invalid Avro {{A}} into user defined type {{B}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7860) v1new ReadFromDatastore returns duplicates if keys are of mixed types

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7860?focusedWorklogId=290786=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290786
 ]

ASF GitHub Bot logged work on BEAM-7860:


Author: ASF GitHub Bot
Created on: 07/Aug/19 21:30
Start Date: 07/Aug/19 21:30
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9240: [BEAM-7860] 
Python Datastore: fix key sort order
URL: https://github.com/apache/beam/pull/9240
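
As background on the fix's title, a rough, illustrative sketch (my own reading 
of Datastore's documented key ordering, not the contents of PR #9240) of a 
type-aware sort key for key path elements that mix numeric IDs and string 
names:

{code:python}
def _path_element_sort_key(id_or_name):
  """Illustrative sort key for one Datastore key path element.

  Datastore orders elements with numeric IDs before elements with string
  names; if client-side splitting sorts mixed keys without taking the type
  into account, split ranges can overlap and the same entity may be read
  by more than one split.
  """
  if isinstance(id_or_name, int):
    return (0, id_or_name)  # numeric IDs first, compared numerically
  return (1, id_or_name)    # then names, compared lexicographically

# The mixed ids/names from the repro in this issue:
mixed = ['10038260-iperm_eservice', 4812224868188160, '99152975-pointshop']
print(sorted(mixed, key=_path_element_sort_key))
# [4812224868188160, '10038260-iperm_eservice', '99152975-pointshop']
{code}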
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290786)
Time Spent: 2h 10m  (was: 2h)

> v1new ReadFromDatastore returns duplicates if keys are of mixed types
> -
>
> Key: BEAM-7860
> URL: https://issues.apache.org/jira/browse/BEAM-7860
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Affects Versions: 2.13.0
> Environment: Python 2.7
> Python 3.7
>Reporter: Niels Stender
>Assignee: Udi Meiri
>Priority: Blocker
> Fix For: 2.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> In the presence of mixed type keys, v1new ReadFromDatastore may return 
> duplicate items. The attached example returns 4 records, not the expected 3.
>  
> {code:java}
> // code placeholder
> from __future__ import unicode_literals
> import apache_beam as beam
> from apache_beam.io.gcp.datastore.v1new.types import Key, Entity, Query
> from apache_beam.io.gcp.datastore.v1new import datastoreio
> config = dict(project='your-google-project', namespace='test')
> def test_mixed():
> keys = [
> Key(['mixed', '10038260-iperm_eservice'], **config),
> Key(['mixed', 4812224868188160], **config),
> Key(['mixed', '99152975-pointshop'], **config)
> ]
> entities = map(lambda key: Entity(key=key), keys)
> with beam.Pipeline() as p:
> (p
> | beam.Create(entities)
> | datastoreio.WriteToDatastore(project=config['project'])
> )
> query = Query(kind='mixed', **config)
> with beam.Pipeline() as p:
> (p
> | datastoreio.ReadFromDatastore(query=query, num_splits=4)
> | beam.io.WriteToText('tmp.txt', num_shards=1, 
> shard_name_template='')
> )
> items = open('tmp.txt').read().strip().split('\n')
> assert len(items) == 3, 'incorrect number of items'
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7925) ParquetIO supports neither column projection nor filter predicate

2019-08-07 Thread Neville Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Li updated BEAM-7925:
-
Description: 
Current {{ParquetIO}} supports neither column projection nor filter predicates, 
which defeats the performance motivation of using Parquet in the first place. 
That's why we have our own implementation of 
[ParquetIO|https://github.com/spotify/scio/tree/master/scio-parquet/src] in 
Scio.

Reading Parquet as Avro with column projection has some complications; namely, 
the resulting Avro records may be incomplete and will not survive ser/de. A 
workaround may be to provide a {{TypedRead}} interface that takes a 
{{Function}} that maps an invalid Avro {{A}} into a user-defined type {{B}}.

  was:
Current `ParquetIO` supports neither column projection nor filter predicate 
which defeats the performance motivation of using Parquet in the first place. 
That's why we have our own implementation of 
[ParquetIO|https://github.com/spotify/scio/tree/master/scio-parquet/src] in 
Scio.

Reading Parquet as Avro with column projection has some complications, namely, 
the resulting Avro records may be incomplete and will not survive ser/de. A 
workaround maybe provide a {{TypedRead}} interface that takes a {{Function}} that maps invalid Avro {{A}} into user defined type {{B}}.


> ParquetIO supports neither column projection nor filter predicate
> -
>
> Key: BEAM-7925
> URL: https://issues.apache.org/jira/browse/BEAM-7925
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-parquet
>Affects Versions: 2.14.0
>Reporter: Neville Li
>Priority: Major
>
> Current {{ParquetIO}} supports neither column projection nor filter predicate 
> which defeats the performance motivation of using Parquet in the first place. 
> That's why we have our own implementation of 
> [ParquetIO|https://github.com/spotify/scio/tree/master/scio-parquet/src] in 
> Scio.
> Reading Parquet as Avro with column projection has some complications, 
> namely, the resulting Avro records may be incomplete and will not survive 
> ser/de. A workaround maybe provide a {{TypedRead}} interface that takes a 
> {{Function}} that maps invalid Avro {{A}} into user defined type {{B}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (BEAM-7747) ERROR: test_sink_transform (apache_beam.io.avroio_test.TestFastAvro) Fails on Windows

2019-08-07 Thread Pablo Estrada (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pablo Estrada resolved BEAM-7747.
-
   Resolution: Fixed
Fix Version/s: 2.15.0

> ERROR: test_sink_transform (apache_beam.io.avroio_test.TestFastAvro) Fails on 
> Windows
> -
>
> Key: BEAM-7747
> URL: https://issues.apache.org/jira/browse/BEAM-7747
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-avro, test-failures
>Reporter: Valentyn Tymofieiev
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> ==
> ERROR: test_sink_transform (apache_beam.io.avroio_test.TestFastAvro)
> --
> Traceback (most recent call last):
>   File "C:\projects\beam\sdks\python\apache_beam\io\avroio_test.py", line 
> 436, in test_sink_transform
> | avroio.WriteToAvro(path, self.SCHEMA, use_fastavro=self.use_fastavro)
>   File "C:\projects\beam\sdks\python\apache_beam\pipeline.py", line 426, in 
> __exit__
> self.run().wait_until_finish()
>   File "C:\projects\beam\sdks\python\apache_beam\testing\test_pipeline.py", 
> line 107, in run
> else test_runner_api))
>   File "C:\projects\beam\sdks\python\apache_beam\pipeline.py", line 406, in 
> run
> self._options).run(False)
>   File "C:\projects\beam\sdks\python\apache_beam\pipeline.py", line 419, in 
> run
> return self.runner.run_pipeline(self, self._options)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\direct\direct_runner.py", 
> line 128, in run_pipeline
> return runner.run_pipeline(pipeline, options)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 319, in run_pipeline
> default_environment=self._default_environment))
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 326, in run_via_runner_api
> return self.run_stages(stage_context, stages)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 408, in run_stages
> stage_context.safe_coders)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 681, in _run_stage
> result, splits = bundle_manager.process_bundle(data_input, data_output)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 1562, in process_bundle
> part_inputs):
>   File "C:\venv\newenv1\lib\site-packages\concurrent\futures\_base.py", line 
> 641, in result_iterator
> yield fs.pop().result()
>   File "C:\venv\newenv1\lib\site-packages\concurrent\futures\_base.py", line 
> 462, in result
> return self.__get_result()
>   File "C:\venv\newenv1\lib\site-packages\concurrent\futures\thread.py", line 
> 63, in run
> result = self.fn(*self.args, **self.kwargs)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 1561, in 
> self._registered).process_bundle(part, expected_outputs),
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 1500, in process_bundle
> result_future = self._controller.control_handler.push(process_bundle_req)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\portability\fn_api_runner.py",
>  line 1017, in push
> response = self.worker.do_instruction(request)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\sdk_worker.py", line 
> 342, in do_instruction
> request.instruction_id)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\sdk_worker.py", line 
> 368, in process_bundle
> bundle_processor.process_bundle(instruction_id))
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\bundle_processor.py",
>  line 593, in process_bundle
> data.ptransform_id].process_encoded(data.data)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\bundle_processor.py",
>  line 143, in process_encoded
> self.output(decoded_value)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\operations.py", line 
> 256, in output
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\operations.py", line 
> 143, in receive
> self.consumer.process(windowed_value)
>   File 
> "C:\projects\beam\sdks\python\apache_beam\runners\worker\operations.py", line 
> 594, in process
> delayed_application = self.dofn_receiver.receive(o)
>   File "C:\projects\beam\sdks\python\apache_beam\runners\common.py", line 
> 795, in receive
> 

[jira] [Commented] (BEAM-7846) add test for BEAM-7689

2019-08-07 Thread Pablo Estrada (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902484#comment-16902484
 ] 

Pablo Estrada commented on BEAM-7846:
-

Has this been fixed?

> add test for BEAM-7689
> --
>
> Key: BEAM-7846
> URL: https://issues.apache.org/jira/browse/BEAM-7846
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-files, io-py-files
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> add test for BEAM-7689 and also Python counterpart so make sure that it won't 
> come back :)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7925) ParquetIO supports neither column projection nor filter predicate

2019-08-07 Thread Neville Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neville Li updated BEAM-7925:
-
Description: 
Current `ParquetIO` supports neither column projection nor filter predicate 
which defeats the performance motivation of using Parquet in the first place. 
That's why we have our own implementation of 
[ParquetIO|https://github.com/spotify/scio/tree/master/scio-parquet/src] in 
Scio.

Reading Parquet as Avro with column projection has some complications, namely, 
the resulting Avro records may be incomplete and will not survive ser/de. A 
workaround maybe provide a {{TypedRead}} interface that takes a {{Function}} that maps invalid Avro {{A}} into user defined type {{B}}.

  was:
Current `ParquetIO` supports neither column projection nor filter predicate 
which defeats the performance motivation of using Parquet in the first place. 
That's why we have our own implementation of 
[ParquetIO|https://github.com/spotify/scio/tree/master/scio-parquet/src] in 
Scio.

Reading Parquet as Avro with column projection has some complications, namely, 
the resulting Avro records may be incomplete and will not survive ser/de. A 
workaround maybe provide a {{TypedRead}} interface that takes a {{Function}} that maps invalid Avro {{A}} into user defined type {{B}}


> ParquetIO supports neither column projection nor filter predicate
> -
>
> Key: BEAM-7925
> URL: https://issues.apache.org/jira/browse/BEAM-7925
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-parquet
>Affects Versions: 2.14.0
>Reporter: Neville Li
>Priority: Major
>
> Current `ParquetIO` supports neither column projection nor filter predicate 
> which defeats the performance motivation of using Parquet in the first place. 
> That's why we have our own implementation of 
> [ParquetIO|https://github.com/spotify/scio/tree/master/scio-parquet/src] in 
> Scio.
> Reading Parquet as Avro with column projection has some complications, 
> namely, the resulting Avro records may be incomplete and will not survive 
> ser/de. A workaround maybe provide a {{TypedRead}} interface that takes a 
> {{Function}} that maps invalid Avro {{A}} into user defined type {{B}}.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7925) ParquetIO supports neither column projection nor filter predicate

2019-08-07 Thread Neville Li (JIRA)
Neville Li created BEAM-7925:


 Summary: ParquetIO supports neither column projection nor filter 
predicate
 Key: BEAM-7925
 URL: https://issues.apache.org/jira/browse/BEAM-7925
 Project: Beam
  Issue Type: Improvement
  Components: io-java-parquet
Affects Versions: 2.14.0
Reporter: Neville Li


Current `ParquetIO` supports neither column projection nor filter predicate 
which defeats the performance motivation of using Parquet in the first place. 
That's why we have our own implementation of 
[ParquetIO|https://github.com/spotify/scio/tree/master/scio-parquet/src] in 
Scio.

Reading Parquet as Avro with column projection has some complications, namely, 
the resulting Avro records may be incomplete and will not survive ser/de. A 
workaround maybe provide a {{TypedRead}} interface that takes a {{Function}} that maps invalid Avro {{A}} into user defined type {{B}}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7918) UNNEST does not work with nested records

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7918?focusedWorklogId=290779=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290779
 ]

ASF GitHub Bot logged work on BEAM-7918:


Author: ASF GitHub Bot
Created on: 07/Aug/19 21:15
Start Date: 07/Aug/19 21:15
Worklog Time Spent: 10m 
  Work Description: snallapa commented on pull request #9288: [BEAM-7918] 
adding nested row implementation for unnest and uncollect
URL: https://github.com/apache/beam/pull/9288#discussion_r311766376
 
 

 ##
 File path: 
sdks/java/extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUncollectRelTest.java
 ##
 @@ -44,4 +57,56 @@ public void testNodeStats() {
 Assert.assertEquals(4d, estimate.getWindow(), 0.001);
 Assert.assertEquals(0., estimate.getRate(), 0.001);
   }
+
+  @Test
+  public void testUncollectPrimitive() {
+registerTable(
+"PRIMITIVE",
+TestBoundedTable.of(
+Schema.FieldType.STRING,
+"user_id",
+Schema.FieldType.array(Schema.FieldType.INT32),
+"ints")
+.addRows("1", Arrays.asList(1, 2, 3)));
+
+String sql = "SELECT * FROM unnest(SELECT ints from PRIMITIVE)";
 
 Review comment:
   Yup, that's better, thank you.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290779)
Time Spent: 40m  (was: 0.5h)

> UNNEST does not work with nested records
> 
>
> Key: BEAM-7918
> URL: https://issues.apache.org/jira/browse/BEAM-7918
> Project: Beam
>  Issue Type: Bug
>  Components: dsl-sql
>Affects Versions: 2.15.0, 2.16.0
>Reporter: Sahith Nallapareddy
>Assignee: Sahith Nallapareddy
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> UNNEST seems to have problems with nested rows. It assumes that the values 
> will be primitives and adds it to the resulting row, but for a nested row it 
> must go one level deeper and add the row values. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6907?focusedWorklogId=290777=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290777
 ]

ASF GitHub Bot logged work on BEAM-6907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 21:12
Start Date: 07/Aug/19 21:12
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9277: [BEAM-6907] 
Reuse Python tarball in tox & dataflow integration tests
URL: https://github.com/apache/beam/pull/9277#discussion_r311752794
 
 

 ##
 File path: sdks/python/test-suites/dataflow/py2/build.gradle
 ##
 @@ -48,7 +52,7 @@ task preCommitIT(dependsOn: ['sdist', 'installGcpTest']) {
 ]
 def cmdArgs = project.mapToArgString([
 "test_opts": testOpts,
-"sdk_location": "${project.buildDir}/apache-beam.tar.gz",
+"sdk_location": files(configurations.distTarBall.files).singleFile,
 
 Review comment:
   I am ok to keep this. It feels like there may be a shorter way to reference 
the file defined by this configuration. Did you try 
`configurations.distTarBall`? Somehow it seems to work in 
https://github.com/apache/beam/blob/28a40572778883128c8f486d460003e98bb0e67e/sdks/python/container/build.gradle#L46
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290777)
Time Spent: 1h 40m  (was: 1.5h)

> Standardize Gradle projects/tasks structure for Python SDK
> --
>
> Key: BEAM-6907
> URL: https://issues.apache.org/jira/browse/BEAM-6907
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> As Gradle parallelism applied to Python tests and more python versions added 
> to tests, the way Gradle manages projects/tasks changed a lot. Frictions are 
> generated during Gradle refactor since some projects defined separate build 
> script under source directory. Thus, It will be better to standardize how we 
> use Gradle. This will help to manage Python tests/builds/tasks across 
> different versions and runners, and also easy for people to learn/use/develop.
> In general, we may want to:
> - Apply parallel execution
> - Share common tasks
> - Centralize test related tasks
> - Have a clear Gradle structure for projects/tasks



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6907?focusedWorklogId=290778=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290778
 ]

ASF GitHub Bot logged work on BEAM-6907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 21:12
Start Date: 07/Aug/19 21:12
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9277: [BEAM-6907] 
Reuse Python tarball in tox & dataflow integration tests
URL: https://github.com/apache/beam/pull/9277#discussion_r311761858
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1884,6 +1884,13 @@ class BeamModulePlugin implements Plugin {
   }
 }
   }
+  // Set run order for basic tasks.
+  // This should be called after applyPythonNature() since TaskContainer
+  // requires task instances created first before setting the order.
+  project.ext.setTaskOrder = {
 
 Review comment:
   Hm... this feels a bit hacky. setupVirtualEnv is already in `dependsOn` of 
installGcpTest. 
   
   As for `sdks:python:sdist`, is it possible to add a dependency on 
`distTarBall` configuration instead?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290778)
Time Spent: 1h 50m  (was: 1h 40m)

> Standardize Gradle projects/tasks structure for Python SDK
> --
>
> Key: BEAM-6907
> URL: https://issues.apache.org/jira/browse/BEAM-6907
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> As Gradle parallelism applied to Python tests and more python versions added 
> to tests, the way Gradle manages projects/tasks changed a lot. Frictions are 
> generated during Gradle refactor since some projects defined separate build 
> script under source directory. Thus, It will be better to standardize how we 
> use Gradle. This will help to manage Python tests/builds/tasks across 
> different versions and runners, and also easy for people to learn/use/develop.
> In general, we may want to:
> - Apply parallel execution
> - Share common tasks
> - Centralize test related tasks
> - Have a clear Gradle structure for projects/tasks



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-6907) Standardize Gradle projects/tasks structure for Python SDK

2019-08-07 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-6907?focusedWorklogId=290776=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-290776
 ]

ASF GitHub Bot logged work on BEAM-6907:


Author: ASF GitHub Bot
Created on: 07/Aug/19 21:12
Start Date: 07/Aug/19 21:12
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9277: [BEAM-6907] 
Reuse Python tarball in tox & dataflow integration tests
URL: https://github.com/apache/beam/pull/9277#discussion_r311743039
 
 

 ##
 File path: 
buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy
 ##
 @@ -1820,15 +1794,23 @@ class BeamModulePlugin implements Plugin {
 
   project.ext.toxTask = { name, tox_env ->
 project.tasks.create(name) {
-  dependsOn = ['sdist']
+  dependsOn 'setupVirtualenv'
+  dependsOn ':sdks:python:sdist'
+
   doLast {
+// Python source directory is also tox execution workspace, We want
+// to isolate them per tox suite to avoid conflict when running
+// multiple tox suites in parallel.
+project.copy { from project.pythonSdkDeps; into copiedSrcRoot }
+
 def copiedPyRoot = "${copiedSrcRoot}/sdks/python"
+def distTarBall = "${pythonRootDir}/build/apache-beam.tar.gz"
 project.exec {
   executable 'sh'
-  args '-c', ". ${project.ext.envdir}/bin/activate && cd 
${copiedPyRoot} && scripts/run_tox.sh $tox_env 
${project.buildDir}/apache-beam.tar.gz"
+  args '-c', ". ${project.ext.envdir}/bin/activate && cd 
${copiedPyRoot} && scripts/run_tox.sh $tox_env $distTarBall"
 
 Review comment:
   Unrelated to this PR - do you remember why we pass a tarball to the tox 
suite? It looks like we started doing that with 
https://github.com/apache/beam/pull/8067.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 290776)
Time Spent: 1.5h  (was: 1h 20m)

> Standardize Gradle projects/tasks structure for Python SDK
> --
>
> Key: BEAM-6907
> URL: https://issues.apache.org/jira/browse/BEAM-6907
> Project: Beam
>  Issue Type: Task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> As Gradle parallelism applied to Python tests and more python versions added 
> to tests, the way Gradle manages projects/tasks changed a lot. Frictions are 
> generated during Gradle refactor since some projects defined separate build 
> script under source directory. Thus, It will be better to standardize how we 
> use Gradle. This will help to manage Python tests/builds/tasks across 
> different versions and runners, and also easy for people to learn/use/develop.
> In general, we may want to:
> - Apply parallel execution
> - Share common tasks
> - Centralize test related tasks
> - Have a clear Gradle structure for projects/tasks



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

