[jira] [Commented] (BEAM-7821) Wheels build on osx fails in Travis

2019-08-13 yifan zou (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906905#comment-16906905
 ] 

yifan zou commented on BEAM-7821:
---------------------------------

Got the same error in the 2.15 wheel build. Changing the Xcode version to 9.4 
solved this problem.
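
For reference, the change amounts to pinning the macOS image in the build's 
Travis configuration. A minimal sketch, assuming the osx entry lives in a 
.travis.yml build matrix:

{code}
matrix:
  include:
    - os: osx
      # Assumption: pin the image; xcode9.4's trust store verifies the
      # Sectigo-issued dist.apache.org certificate that the default image
      # rejected.
      osx_image: xcode9.4
{code}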

> Wheels build on osx fails in Travis 
> 
>
> Key: BEAM-7821
> URL: https://issues.apache.org/jira/browse/BEAM-7821
> Project: Beam
>  Issue Type: Bug
>  Components: build-system, sdk-py-core
>Reporter: Anton Kedin
>Priority: Major
> Fix For: 2.16.0
>
>
> Attempt to build wheels on OSX in Travis for 2.14.0 RC1 failed due to 
> inability to check the certificate of dist.apache.org: 
> {code}
> --2019-07-25 18:28:31--  
> https://dist.apache.org/repos/dist/dev/beam/2.14.0/python/apache-beam-2.14.0.zip
> Resolving dist.apache.org... 209.188.14.144
> Connecting to dist.apache.org|209.188.14.144|:443... connected.
> ERROR: cannot verify dist.apache.org's certificate, issued by ‘CN=Sectigo 
> RSA Domain Validation Secure Server CA,O=Sectigo Limited,L=Salford,ST=Greater 
> Manchester,C=GB’:
>   Unable to locally verify the issuer's authority.
> To connect to dist.apache.org insecurely, use `--no-check-certificate'.
> {code}
> Full Travis Log: 
> https://gist.github.com/akedin/a5b50dbd0ecacff538186cbb9d7f6bca
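
One hedged way to confirm that the failure is the image's stale trust store 
rather than a bad certificate is to retry the download against an explicitly 
up-to-date CA bundle (the path below assumes a Homebrew OpenSSL install and is 
illustrative only):

{code}
wget --ca-certificate=/usr/local/etc/openssl/cert.pem \
  https://dist.apache.org/repos/dist/dev/beam/2.14.0/python/apache-beam-2.14.0.zip
{code}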



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7976) Documentation for word count example is unclear about inputFile vs pom.xml

2019-08-13 niklas Hansson (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niklas Hansson updated BEAM-7976:
---------------------------------
Description: 
Currently the documentation for the word count example mentions the input file 
and gives the following example:

```Java
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
 -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
```

As far as I can tell, the pom.xml is not supposed to be the input file in this 
case. I will create a PR if someone could confirm that my assumption is 
correct :)

/Niklas

  was:
Currently the documentation for the word count example mentions the input file 
and gives the following example:

```Java
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
 -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
```

As far as I can tell, the pom.xml is not supposed to be the input file in this 
case. I will create a PR if some one could confirm that my assumption is 
correct :)

/Niklas


> Documentation for word count example is unclear about inputFile vs pom.xml
> --
>
> Key: BEAM-7976
> URL: https://issues.apache.org/jira/browse/BEAM-7976
> Project: Beam
>  Issue Type: Bug
>  Components: examples-java
>Affects Versions: 2.14.0
>Reporter: niklas Hansson
>Assignee: niklas Hansson
>Priority: Minor
>
> Currently the documentation for the word count example mentions the input 
> file and gives the following example:
> ```Java
> $ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
>  -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
> ```
> As far as I can tell, the pom.xml is not supposed to be the input file in 
> this case. I will create a PR if someone could confirm that my assumption is 
> correct :)
> /Niklas



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7976) Documentation for word count example is unclear about inputFile vs pom.xml

2019-08-13 niklas Hansson (JIRA)
niklas Hansson created BEAM-7976:


 Summary: Documentation for word count example is unclear about 
inputFile vs pom.xml
 Key: BEAM-7976
 URL: https://issues.apache.org/jira/browse/BEAM-7976
 Project: Beam
  Issue Type: Bug
  Components: examples-java
Affects Versions: 2.14.0
Reporter: niklas Hansson
Assignee: niklas Hansson


Currently the documentation for the word count example mentions the input file 
and gives the following example:

```Java
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
 -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
```

As far as I can tell, the pom.xml is not supposed to be the input file in this 
case. I will create a PR if some one could confirm that my assumption is 
correct :)

/Niklas
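
For comparison, the same invocation with an ordinary text file as input (the 
path is a placeholder; any local text file works) would look like:

```
$ mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
 -Dexec.args="--inputFile=/tmp/input.txt --output=counts" -Pdirect-runner
```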



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7856) BigQuery table creation race condition error when executing pipeline on multiple workers

2019-08-13 ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7856?focusedWorklogId=294340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294340
 ]

ASF GitHub Bot logged work on BEAM-7856:


Author: ASF GitHub Bot
Created on: 14/Aug/19 02:05
Start Date: 14/Aug/19 02:05
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9204: [BEAM-7856] 
Suppress error when bigquery table already exists
URL: https://github.com/apache/beam/pull/9204#discussion_r313679387
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
 ##
 @@ -659,12 +659,19 @@ def get_or_create_table(
     if found_table and write_disposition != BigQueryDisposition.WRITE_TRUNCATE:
       return found_table
     else:
-      created_table = self._create_table(
-          project_id=project_id,
-          dataset_id=dataset_id,
-          table_id=table_id,
-          schema=schema or found_table.schema,
-          additional_parameters=additional_create_parameters)
+      created_table = None
+      try:
+        created_table = self._create_table(
+            project_id=project_id,
+            dataset_id=dataset_id,
+            table_id=table_id,
+            schema=schema or found_table.schema,
+            additional_parameters=additional_create_parameters)
+      except HttpError as exn:
+        if exn.status_code == 409:
 
 Review comment:
   I agree. Using a single element transform to create the table would be ideal.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294340)
Time Spent: 1h 10m  (was: 1h)

> BigQuery table creation race condition error when executing pipeline on 
> multiple workers
> 
>
> Key: BEAM-7856
> URL: https://issues.apache.org/jira/browse/BEAM-7856
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> This is non-fatal issue and just prints error in the logs as far as I can 
> tell.
> The issue is when we check and create big query table on multiple workers at 
> the same time. This causes the race condition.
>  
> {noformat}
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 157, in _execute response = task() File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 190, in <lambda> self._execute(lambda: worker.do_instruction(work), 
> work) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction request.instruction_id) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle bundle_processor.process_bundle(instruction_id)) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 593, in process_bundle data.ptransform_id].process_encoded(data.data) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 143, in process_encoded self.output(decoded_value) File 
> "apache_beam/runners/worker/operations.py", line 255, in 
> apache_beam.runners.worker.operations.Operation.output def output(self, 
> windowed_value, output_index=0): File 
> "apache_beam/runners/worker/operations.py", line 256, in 
> apache_beam.runners.worker.operations.Operation.output cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 143, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive 
> self.consumer.process(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 593, in 
> apache_beam.runners.worker.operations.DoOperation.process with 
> self.scoped_process_state: File "apache_beam/runners/worker/operations.py", 
> line 594, in apache_beam.runners.worker.operations.DoOperation.process 
> delayed_application = self.dofn_receiver.receive(o) File 
> "apache_beam/runners/common.py", line 799, in 
> apache_beam.runners.common.DoFnRunner.receive self.process(windowed_value) 
> File "apache_beam/runners/common.py", line 805, in 
> apache_beam.runners.common.DoFnRunner.process self._reraise_augmented(exn) 
> File "apache_beam/runners/common.py", line 857, in 
> 

[jira] [Work logged] (BEAM-7856) BigQuery table creation race condition error when executing pipeline on multiple workers

2019-08-13 ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7856?focusedWorklogId=294341&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294341
 ]

ASF GitHub Bot logged work on BEAM-7856:


Author: ASF GitHub Bot
Created on: 14/Aug/19 02:05
Start Date: 14/Aug/19 02:05
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9204: [BEAM-7856] 
Suppress error when bigquery table already exists
URL: https://github.com/apache/beam/pull/9204
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294341)
Time Spent: 1h 20m  (was: 1h 10m)

> BigQuery table creation race condition error when executing pipeline on 
> multiple workers
> 
>
> Key: BEAM-7856
> URL: https://issues.apache.org/jira/browse/BEAM-7856
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is non-fatal issue and just prints error in the logs as far as I can 
> tell.
> The issue is when we check and create big query table on multiple workers at 
> the same time. This causes the race condition.
>  
> {noformat}
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 157, in _execute response = task() File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 190, in <lambda> self._execute(lambda: worker.do_instruction(work), 
> work) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction request.instruction_id) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle bundle_processor.process_bundle(instruction_id)) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 593, in process_bundle data.ptransform_id].process_encoded(data.data) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 143, in process_encoded self.output(decoded_value) File 
> "apache_beam/runners/worker/operations.py", line 255, in 
> apache_beam.runners.worker.operations.Operation.output def output(self, 
> windowed_value, output_index=0): File 
> "apache_beam/runners/worker/operations.py", line 256, in 
> apache_beam.runners.worker.operations.Operation.output cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 143, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive 
> self.consumer.process(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 593, in 
> apache_beam.runners.worker.operations.DoOperation.process with 
> self.scoped_process_state: File "apache_beam/runners/worker/operations.py", 
> line 594, in apache_beam.runners.worker.operations.DoOperation.process 
> delayed_application = self.dofn_receiver.receive(o) File 
> "apache_beam/runners/common.py", line 799, in 
> apache_beam.runners.common.DoFnRunner.receive self.process(windowed_value) 
> File "apache_beam/runners/common.py", line 805, in 
> apache_beam.runners.common.DoFnRunner.process self._reraise_augmented(exn) 
> File "apache_beam/runners/common.py", line 857, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented raise File 
> "apache_beam/runners/common.py", line 803, in 
> apache_beam.runners.common.DoFnRunner.process return 
> self.do_fn_invoker.invoke_process(windowed_value) File 
> "apache_beam/runners/common.py", line 610, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process 
> self._invoke_process_per_window( File "apache_beam/runners/common.py", line 
> 682, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window 
> output_processor.process_outputs( File "apache_beam/runners/common.py", line 
> 903, in apache_beam.runners.common._OutputProcessor.process_outputs def 
> process_outputs(self, windowed_input_element, results): File 
> "apache_beam/runners/common.py", line 942, in 
> apache_beam.runners.common._OutputProcessor.process_outputs 
> self.main_receivers.receive(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 143, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive 
> 

[jira] [Commented] (BEAM-7856) BigQuery table creation race condition error when executing pipeline on multiple workers

2019-08-13 Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906800#comment-16906800
 ] 

Ankur Goenka commented on BEAM-7856:


The right fix for this would be to use a single element transform to create the 
table before writing to it, something similar to what the Java SDK does here: 
https://github.com/apache/beam/blob/08d0146791e38be4641ff80ffb2539cdc81f5b6d/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingInserts.java#L178

For now, PR #9204 is a stopgap solution to mitigate this error.
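
For illustration, a minimal sketch of that single-element pattern in the Python 
SDK. This is not what bigquery_tools.py does today: the table reference and 
schema below are placeholders, and a recent google-cloud-bigquery client is 
assumed because its create_table(..., exists_ok=True) sidesteps the 409 race 
entirely:

{code:python}
import apache_beam as beam


def ensure_table(element, table_ref):
  # Create the table if it does not already exist. exists_ok=True swallows
  # the 409 Conflict that racing workers would otherwise trigger.
  from google.cloud import bigquery
  client = bigquery.Client()
  table = bigquery.Table(
      table_ref,
      schema=[bigquery.SchemaField('word', 'STRING'),
              bigquery.SchemaField('count', 'INTEGER')])
  client.create_table(table, exists_ok=True)
  return element


with beam.Pipeline() as p:
  # A one-element PCollection makes the creation step run exactly once,
  # instead of once per worker inside the write path.
  ready = (p
           | 'Once' >> beam.Create([None])
           | 'EnsureTable' >> beam.Map(
               ensure_table, 'my-project.my_dataset.my_table'))
  # The write itself would then be sequenced after `ready`, for example by
  # passing `ready` as a side input to the step that produces the rows.
{code}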

> BigQuery table creation race condition error when executing pipeline on 
> multiple workers
> 
>
> Key: BEAM-7856
> URL: https://issues.apache.org/jira/browse/BEAM-7856
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is non-fatal issue and just prints error in the logs as far as I can 
> tell.
> The issue is when we check and create big query table on multiple workers at 
> the same time. This causes the race condition.
>  
> {noformat}
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 157, in _execute response = task() File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 190, in <lambda> self._execute(lambda: worker.do_instruction(work), 
> work) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction request.instruction_id) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle bundle_processor.process_bundle(instruction_id)) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 593, in process_bundle data.ptransform_id].process_encoded(data.data) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 143, in process_encoded self.output(decoded_value) File 
> "apache_beam/runners/worker/operations.py", line 255, in 
> apache_beam.runners.worker.operations.Operation.output def output(self, 
> windowed_value, output_index=0): File 
> "apache_beam/runners/worker/operations.py", line 256, in 
> apache_beam.runners.worker.operations.Operation.output cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 143, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive 
> self.consumer.process(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 593, in 
> apache_beam.runners.worker.operations.DoOperation.process with 
> self.scoped_process_state: File "apache_beam/runners/worker/operations.py", 
> line 594, in apache_beam.runners.worker.operations.DoOperation.process 
> delayed_application = self.dofn_receiver.receive(o) File 
> "apache_beam/runners/common.py", line 799, in 
> apache_beam.runners.common.DoFnRunner.receive self.process(windowed_value) 
> File "apache_beam/runners/common.py", line 805, in 
> apache_beam.runners.common.DoFnRunner.process self._reraise_augmented(exn) 
> File "apache_beam/runners/common.py", line 857, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented raise File 
> "apache_beam/runners/common.py", line 803, in 
> apache_beam.runners.common.DoFnRunner.process return 
> self.do_fn_invoker.invoke_process(windowed_value) File 
> "apache_beam/runners/common.py", line 610, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process 
> self._invoke_process_per_window( File "apache_beam/runners/common.py", line 
> 682, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window 
> output_processor.process_outputs( File "apache_beam/runners/common.py", line 
> 903, in apache_beam.runners.common._OutputProcessor.process_outputs def 
> process_outputs(self, windowed_input_element, results): File 
> "apache_beam/runners/common.py", line 942, in 
> apache_beam.runners.common._OutputProcessor.process_outputs 
> self.main_receivers.receive(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 143, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive 
> self.consumer.process(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 593, in 
> apache_beam.runners.worker.operations.DoOperation.process with 
> self.scoped_process_state: File "apache_beam/runners/worker/operations.py", 
> line 594, in apache_beam.runners.worker.operations.DoOperation.process 
> delayed_application = self.dofn_receiver.receive(o) File 
> 

[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-13 James Hutchison (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Hutchison updated BEAM-7975:
---------------------------------
Description: 
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
this when running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.

  was:
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 or 7 workers in streaming mode. I don't see this when 
running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.


> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c (" name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod= name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups seems to have minimal 
> improvement. Perhaps this is part of the problem?
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 or 7 (or more) workers in streaming mode. I don't see 
> this when running with 1 worker
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-13 James Hutchison (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Hutchison updated BEAM-7975:
---------------------------------
Description: 
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 or 7 workers in streaming mode. I don't see this when 
running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.

  was:
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 and 7 workers in streaming mode. I don't see this when 
running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.


> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c (" name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod= name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups seems to have minimal 
> improvement. Perhaps this is part of the problem?
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 or 7 workers in streaming mode. I don't see this when 
> running with 1 worker
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-13 James Hutchison (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Hutchison updated BEAM-7975:
---------------------------------
Description: 
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 and 7 workers in streaming mode. I don't see this when 
running with 1 worker

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.

  was:
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 and 7 workers in streaming mode. I haven't checked to 
see if number of workers is a factor.

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.


> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c (" name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod= name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups seems to have minimal 
> improvement. Perhaps this is part of the problem?
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 and 7 workers in streaming mode. I don't see this when 
> running with 1 worker
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-13 James Hutchison (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Hutchison updated BEAM-7975:
---------------------------------
Description: 
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 5 and 7 workers in streaming mode. I haven't checked to 
see if number of workers is a factor.

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.

  was:
{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 7 workers in streaming mode. I haven't checked to see if 
number of workers is a factor.

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.


> error syncing pod - failed to start container artifact (python SDK)
> ---
>
> Key: BEAM-7975
> URL: https://issues.apache.org/jira/browse/BEAM-7975
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-harness
>Affects Versions: 2.13.0
>Reporter: James Hutchison
>Priority: Major
>
> {code:java}
> Error syncing pod 5966e59c (" name>-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
> "StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
> restarting failed container=artifact pod= name>-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
> Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
> seeing anything. Messages appear about every 0.5 - 5 seconds
> I've been trying to efficiently scale my streaming pipeline and found that 
> adding more workers / dividing into more groups seems to have minimal 
> improvement. Perhaps this is part of the problem?
> One pipeline which never completed (got to one of the last steps and then log 
> messages simply ceased without error on the workers) had this going on in the 
> kubelet logs. I checked some of my other streaming pipelines and found the 
> same thing going on, even though they would complete.
> In a couple of my streaming pipelines, I've gotten the following error 
> message, despite the pipeline eventually finishing:
> {code:java}
> Processing stuck in step s01 for at least 05m00s without outputting or 
> completing in state process{code}
> Perhaps they are related?
> This is running with 5 and 7 workers in streaming mode. I haven't checked to 
> see if number of workers is a factor.
> The pipeline uses requirements.txt and setup.py, as well as using an extra 
> package and using save_main_session.



--
This message was sent by Atlassian JIRA

[jira] [Created] (BEAM-7975) error syncing pod - failed to start container artifact (python SDK)

2019-08-13 James Hutchison (JIRA)
James Hutchison created BEAM-7975:
---------------------------------

 Summary: error syncing pod - failed to start container artifact 
(python SDK)
 Key: BEAM-7975
 URL: https://issues.apache.org/jira/browse/BEAM-7975
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Affects Versions: 2.13.0
Reporter: James Hutchison


{code:java}
Error syncing pod 5966e59c ("-08131110-7hcg-harness-fbm2_default(5966e59c)"), skipping: failed to 
"StartContainer" for "artifact" with CrashLoopBackOff: "Back-off 5m0s 
restarting failed container=artifact pod=-08131110-7hcg-harness-fbm2_default(5966.e59c)"{code}
Seeing these in streaming pipeline. Running pipeline in batch mode I'm not 
seeing anything. Messages appear about every 0.5 - 5 seconds

I've been trying to efficiently scale my streaming pipeline and found that 
adding more workers / dividing into more groups seems to have minimal 
improvement. Perhaps this is part of the problem?

One pipeline which never completed (got to one of the last steps and then log 
messages simply ceased without error on the workers) had this going on in the 
kubelet logs. I checked some of my other streaming pipelines and found the same 
thing going on, even though they would complete.

In a couple of my streaming pipelines, I've gotten the following error message, 
despite the pipeline eventually finishing:
{code:java}
Processing stuck in step s01 for at least 05m00s without outputting or 
completing in state process{code}
Perhaps they are related?

This is running with 7 workers in streaming mode. I haven't checked to see if 
number of workers is a factor.

The pipeline uses requirements.txt and setup.py, as well as using an extra 
package and using save_main_session.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5191) Add support for writing to BigQuery clustered tables

2019-08-13 ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=294319&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294319
 ]

ASF GitHub Bot logged work on BEAM-5191:


Author: ASF GitHub Bot
Created on: 14/Aug/19 01:24
Start Date: 14/Aug/19 01:24
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #7061: [BEAM-5191] 
Support for BigQuery clustering
URL: https://github.com/apache/beam/pull/7061#issuecomment-521067100
 
 
   Closing since this was included in https://github.com/apache/beam/pull/8945
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294319)
Time Spent: 16h 20m  (was: 16h 10m)

> Add support for writing to BigQuery clustered tables
> 
>
> Key: BEAM-5191
> URL: https://issues.apache.org/jira/browse/BEAM-5191
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.6.0
>Reporter: Robert Sahlin
>Assignee: Jeff Klukas
>Priority: Minor
>  Labels: features, newbie
> Fix For: 2.15.0
>
>  Time Spent: 16h 20m
>  Remaining Estimate: 0h
>
> Google recently added support for clustered tables in BigQuery. It would be 
> useful to set clustering columns the same way as for partitioning. It should 
> support multiple fields (up to 4) for clustering.
> For example:
> [BigQueryIO.Write<T>|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]
>  .withClustering(new Clustering().setField("productId").setType("STRING"))



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5191) Add support for writing to BigQuery clustered tables

2019-08-13 ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5191?focusedWorklogId=294320&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294320
 ]

ASF GitHub Bot logged work on BEAM-5191:


Author: ASF GitHub Bot
Created on: 14/Aug/19 01:24
Start Date: 14/Aug/19 01:24
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #7061: 
[BEAM-5191] Support for BigQuery clustering
URL: https://github.com/apache/beam/pull/7061
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294320)
Time Spent: 16.5h  (was: 16h 20m)

> Add support for writing to BigQuery clustered tables
> 
>
> Key: BEAM-5191
> URL: https://issues.apache.org/jira/browse/BEAM-5191
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-gcp
>Affects Versions: 2.6.0
>Reporter: Robert Sahlin
>Assignee: Jeff Klukas
>Priority: Minor
>  Labels: features, newbie
> Fix For: 2.15.0
>
>  Time Spent: 16.5h
>  Remaining Estimate: 0h
>
> Google recently added support for clustered tables in BigQuery. It would be 
> useful to set clustering columns the same way as for partitioning. It should 
> support multiple fields (up to 4) for clustering.
> For example:
> [BigQueryIO.Write<T>|https://beam.apache.org/documentation/sdks/javadoc/2.6.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html]
>  .withClustering(new Clustering().setField("productId").setType("STRING"))



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7667) report GCS throttling time to Dataflow autoscaler

2019-08-13 ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7667?focusedWorklogId=294318&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294318
 ]

ASF GitHub Bot logged work on BEAM-7667:


Author: ASF GitHub Bot
Created on: 14/Aug/19 01:22
Start Date: 14/Aug/19 01:22
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #8973: 
[BEAM-7667] report GCS throttling time to Dataflow autoscaler
URL: https://github.com/apache/beam/pull/8973
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294318)
Time Spent: 2h 10m  (was: 2h)

> report GCS throttling time to Dataflow autoscaler
> -
>
> Key: BEAM-7667
> URL: https://issues.apache.org/jira/browse/BEAM-7667
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> report GCS throttling time to Dataflow autoscaler.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7866) Python MongoDB IO performance and correctness issues

2019-08-13 ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7866?focusedWorklogId=294317&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294317
 ]

ASF GitHub Bot logged work on BEAM-7866:


Author: ASF GitHub Bot
Created on: 14/Aug/19 01:19
Start Date: 14/Aug/19 01:19
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9233:  
[BEAM-7866] Fix python ReadFromMongoDB potential data loss issue
URL: https://github.com/apache/beam/pull/9233
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294317)
Time Spent: 11h 20m  (was: 11h 10m)

> Python MongoDB IO performance and correctness issues
> 
>
> Key: BEAM-7866
> URL: https://issues.apache.org/jira/browse/BEAM-7866
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 11h 20m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/mongodbio.py
>  splits the query result by computing the number of results in the 
> constructor, and then having each reader re-execute the whole query and take 
> an index sub-range of those results.
> This is broken in several critical ways:
> - The order of query results returned by find() is not necessarily 
> deterministic, so the idea of index ranges on it is meaningless: each shard 
> may basically get random, possibly overlapping subsets of the total results
> - Even if you add order by `_id`, the database may be changing concurrently 
> to reading and splitting. E.g. if the database contained documents with ids 
> 10 20 30 40 50, and this was split into shards 0..2 and 3..5 (under the 
> assumption that these shards would contain respectively 10 20 30, and 40 50), 
> and then suppose shard 10 20 30 is read and then document 25 is inserted - 
> then the 3..5 shard will read 30 40 50, i.e. document 30 is duplicated and 
> document 25 is lost.
> - Every shard re-executes the query and skips the first start_offset items, 
> which in total is quadratic complexity
> - The query is first executed in the constructor in order to count results, 
> which 1) means the constructor can be super slow and 2) it won't work at all 
> if the database is unavailable at the time the pipeline is constructed (e.g. 
> if this is a template).
> Unfortunately, none of these issues are caught by SourceTestUtils: this class 
> has extensive coverage with it, and the tests pass. This is because the tests 
> return the same results in the same order. I don't know how to catch this 
> automatically, and I don't know how to catch the performance issue 
> automatically, but these would all be important follow-up items after the 
> actual fix.
> CC: [~chamikara] as reviewer.
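
To make the first two failure modes concrete, a small sketch contrasting the 
two splitting strategies in plain pymongo (the connection and collection names 
are hypothetical; this is not the mongodbio.py code itself):

{code:python}
from pymongo import MongoClient

coll = MongoClient()['mydb']['mycoll']  # hypothetical database/collection


def read_shard_by_offset(start, end):
  # What index-range splitting amounts to: every shard re-runs the full query
  # and slices it. find() order is not guaranteed, and inserts landing before
  # `start` shift the slice, so shards can overlap or drop documents; skip()
  # also rescans from the beginning, making total work quadratic in shards.
  return list(coll.find().skip(start).limit(end - start))


def read_shard_by_id_range(lo_id, hi_id):
  # A stable alternative: filter on an _id range. A shard's contents no longer
  # depend on result order or on documents outside [lo_id, hi_id), and the
  # _id index serves the range directly instead of skipping past rows.
  return list(coll.find({'_id': {'$gte': lo_id, '$lt': hi_id}}))
{code}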



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7866) Python MongoDB IO performance and correctness issues

2019-08-13 ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7866?focusedWorklogId=294316&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294316
 ]

ASF GitHub Bot logged work on BEAM-7866:


Author: ASF GitHub Bot
Created on: 14/Aug/19 01:18
Start Date: 14/Aug/19 01:18
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9233:  [BEAM-7866] Fix 
python ReadFromMongoDB potential data loss issue
URL: https://github.com/apache/beam/pull/9233#issuecomment-521066083
 
 
   Eugene seems to be OOO, so I'll go ahead and merge this so that it can be 
cherry-picked to the 2.15.0 release.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294316)
Time Spent: 11h 10m  (was: 11h)

> Python MongoDB IO performance and correctness issues
> 
>
> Key: BEAM-7866
> URL: https://issues.apache.org/jira/browse/BEAM-7866
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Yichi Zhang
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/mongodbio.py
>  splits the query result by computing the number of results in the 
> constructor, and then having each reader re-execute the whole query and take 
> an index sub-range of those results.
> This is broken in several critical ways:
> - The order of query results returned by find() is not necessarily 
> deterministic, so the idea of index ranges on it is meaningless: each shard 
> may basically get random, possibly overlapping subsets of the total results
> - Even if you add order by `_id`, the database may be changing concurrently 
> to reading and splitting. E.g. if the database contained documents with ids 
> 10 20 30 40 50, and this was split into shards 0..2 and 3..5 (under the 
> assumption that these shards would contain respectively 10 20 30, and 40 50), 
> and then suppose shard 10 20 30 is read and then document 25 is inserted - 
> then the 3..5 shard will read 30 40 50, i.e. document 30 is duplicated and 
> document 25 is lost.
> - Every shard re-executes the query and skips the first start_offset items, 
> which in total is quadratic complexity
> - The query is first executed in the constructor in order to count results, 
> which 1) means the constructor can be super slow and 2) it won't work at all 
> if the database is unavailable at the time the pipeline is constructed (e.g. 
> if this is a template).
> Unfortunately, none of these issues are caught by SourceTestUtils: this class 
> has extensive coverage with it, and the tests pass. This is because the tests 
> return the same results in the same order. I don't know how to catch this 
> automatically, and I don't know how to catch the performance issue 
> automatically, but these would all be important follow-up items after the 
> actual fix.
> CC: [~chamikara] as reviewer.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-13 ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=294313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294313
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 14/Aug/19 01:11
Start Date: 14/Aug/19 01:11
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r313671666
 
 

 ##
 File path: 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/CoderTranslationTest.java
 ##
 @@ -74,6 +78,14 @@
           FullWindowedValueCoder.of(
               IterableCoder.of(VarLongCoder.of()), IntervalWindowCoder.of()))
           .add(DoubleCoder.of())
+          .add(
 
 Review comment:
   Oh wow good catch! I've actually changed the coder translation now to create 
instances of `PortableSchemaCoder`, which shouldn't have that problem. I also 
reverted all the changes to RowCoder, so it's back to being a `CustomCoder`.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294313)
Time Spent: 2h 50m  (was: 2h 40m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-13 ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=294314&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294314
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 14/Aug/19 01:11
Start Date: 14/Aug/19 01:11
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on pull request #9188: 
[BEAM-7886] Make row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#discussion_r313671690
 
 

 ##
 File path: 
runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/CoderTranslators.java
 ##
 @@ -118,6 +122,32 @@ public T fromComponents(List<Coder<?>> components) {
 };
   }
 
+  static CoderTranslator<RowCoder> row() {
+    return new CoderTranslator<RowCoder>() {
+      @Override
+      public List<? extends Coder<?>> getComponents(RowCoder from) {
+        return ImmutableList.of();
+      }
+
+      @Override
+      public byte[] getPayload(RowCoder from) {
+        return SchemaTranslation.toProto(from.getSchema()).toByteArray();
+      }
+
+      @Override
+      public RowCoder fromComponents(List<Coder<?>> components, byte[] payload) {
+        // Assert that components are empty?
 
 Review comment:
   Done
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294314)
Time Spent: 3h  (was: 2h 50m)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (BEAM-7974) Make RowCoder package-private

2019-08-13 Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette reassigned BEAM-7974:
---------------------------------

Assignee: Brian Hulette

> Make RowCoder package-private
> -
>
> Key: BEAM-7974
> URL: https://issues.apache.org/jira/browse/BEAM-7974
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Minor
>
> RowCoder is currently public in sdk.coders, tempting people to use it 
> directly. But the Schemas API is written such that everyone should be using 
> SchemaCoder, and RowCoder should be an implementation detail.
> Unfortunately this isn't a trivial change. I tried to do it and resolve the 
> few dependencies that cropped up, but running RowCoderTest yielded the 
> following error:
> {code:java}
> tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
> java.lang.IllegalAccessError: tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:159)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:54)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.encode(CoderProperties.java:334)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.decodeEncode(CoderProperties.java:362)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqualInContext(CoderProperties.java:104)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqual(CoderProperties.java:94)
> {code}
> My attempt is available at 
> https://github.com/TheNeuralBit/beam/commit/869b8c6ba2f554bf56d8df70a754b76ef38dbc89



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7974) Make RowCoder package-private

2019-08-13 Thread Brian Hulette (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906771#comment-16906771
 ] 

Brian Hulette commented on BEAM-7974:
-

[~reuvenlax] any ideas on how to resolve this?

> Make RowCoder package-private
> -
>
> Key: BEAM-7974
> URL: https://issues.apache.org/jira/browse/BEAM-7974
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Brian Hulette
>Priority: Minor
>
> RowCoder is currently public in sdk.coders, tempting people to use it 
> directly. But the Schemas API is written such that everyone should be using 
> SchemaCoder, and RowCoder should be an implementation detail.
> Unfortunately this isn't a trivial change. I tried to do it and resolve the 
> few dependencies that cropped up, but running RowCoderTest yielded the 
> following error:
> {code:java}
> tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
> java.lang.IllegalAccessError: tried to access class 
> org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
> org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
> Source)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:159)
>   at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:54)
>   at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.encode(CoderProperties.java:334)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.decodeEncode(CoderProperties.java:362)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqualInContext(CoderProperties.java:104)
>   at 
> org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqual(CoderProperties.java:94)
> {code}
> My attempt is available at 
> https://github.com/TheNeuralBit/beam/commit/869b8c6ba2f554bf56d8df70a754b76ef38dbc89



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7974) Make RowCoder package-private

2019-08-13 Thread Brian Hulette (JIRA)
Brian Hulette created BEAM-7974:
---

 Summary: Make RowCoder package-private
 Key: BEAM-7974
 URL: https://issues.apache.org/jira/browse/BEAM-7974
 Project: Beam
  Issue Type: Improvement
  Components: sdk-java-core
Reporter: Brian Hulette


RowCoder is currently public in sdk.coders, tempting people to use it directly. 
But the Schemas API is written such that everyone should be using SchemaCoder, 
and RowCoder should be an implementation detail.

Unfortunately this isn't a trivial change. I tried to do it and resolve the few 
dependencies that cropped up, but running RowCoderTest yielded the following 
error:

{code:java}
tried to access class 
org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
java.lang.IllegalAccessError: tried to access class 
org.apache.beam.sdk.schemas.RowCoderGenerator$EncodeInstruction from class 
org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3
at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
Source)
at org.apache.beam.sdk.coders.Coder$ByteBuddy$abBJo3R3.encode(Unknown 
Source)
at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:159)
at org.apache.beam.sdk.schemas.RowCoder.encode(RowCoder.java:54)
at org.apache.beam.sdk.coders.Coder.encode(Coder.java:136)
at 
org.apache.beam.sdk.testing.CoderProperties.encode(CoderProperties.java:334)
at 
org.apache.beam.sdk.testing.CoderProperties.decodeEncode(CoderProperties.java:362)
at 
org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqualInContext(CoderProperties.java:104)
at 
org.apache.beam.sdk.testing.CoderProperties.coderDecodeEncodeEqual(CoderProperties.java:94)
{code}

My attempt is available at 
https://github.com/TheNeuralBit/beam/commit/869b8c6ba2f554bf56d8df70a754b76ef38dbc89




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7886) Make row coder a standard coder and implement in python

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7886?focusedWorklogId=294307=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294307
 ]

ASF GitHub Bot logged work on BEAM-7886:


Author: ASF GitHub Bot
Created on: 14/Aug/19 00:42
Start Date: 14/Aug/19 00:42
Worklog Time Spent: 10m 
  Work Description: TheNeuralBit commented on issue #9188: [BEAM-7886] Make 
row coder a standard coder and implement in Python
URL: https://github.com/apache/beam/pull/9188#issuecomment-521060327
 
 
   R: @reuvenlax would you mind taking a look at the model changes and the Java 
changes? I can separate out the relevant commits in their own PR if that helps.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294307)
Time Spent: 2h 40m  (was: 2.5h)

> Make row coder a standard coder and implement in python
> ---
>
> Key: BEAM-7886
> URL: https://issues.apache.org/jira/browse/BEAM-7886
> Project: Beam
>  Issue Type: Improvement
>  Components: beam-model, sdk-java-core, sdk-py-core
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-08-13 Thread Rui Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906758#comment-16906758
 ] 

Rui Wang commented on BEAM-7049:


[~sridharG]

I would also encourage you to keep a WIP PR open at 
https://github.com/apache/beam/pulls. A few benefits:

1. You can always see the impact of your changes on CI.
2. People who care can preview the work early.


 

> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>
> BeamUnionRel assumes there are exactly two inputs and rejects more. So `a UNION b UNION c` 
> has to be constructed as UNION(a, UNION(b, c)), which requires two shuffles. If 
> BeamUnionRel could handle multiple inputs, we would have only one shuffle.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7973) Python doesn't shut down job server properly

2019-08-13 Thread Kyle Weaver (JIRA)
Kyle Weaver created BEAM-7973:
-

 Summary: Python doesn't shut down job server properly
 Key: BEAM-7973
 URL: https://issues.apache.org/jira/browse/BEAM-7973
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-harness
Reporter: Kyle Weaver


Using the new Python FlinkRunner [1], a new job server is created and the job 
succeeds, but the job server is seemingly not shut down properly when the Python 
command exits. Specifically, the java -jar command that started the job server is 
still running in the background, eating up memory.

Relevant args:

python ...
 --runner FlinkRunner \ 
 --flink_job_server_jar $FLINK_JOB_SERVER_JAR ...

[1] [https://github.com/apache/beam/pull/9043]
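For illustration, a minimal sketch of the scenario (an editor's addition, not from
the original report; the jar path is a hypothetical placeholder):

{code:python}
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Options as described above; the jar path is a hypothetical placeholder.
options = PipelineOptions([
    '--runner=FlinkRunner',
    '--flink_job_server_jar=/path/to/beam-runners-flink-job-server.jar',
])

with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)

# After the pipeline finishes, the spawned `java -jar ...` job server process
# should exit as well; per this issue, it keeps running in the background.
{code}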



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7760) Interactive Beam Caching PCollections bound to user defined vars in notebook

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7760?focusedWorklogId=294288=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294288
 ]

ASF GitHub Bot logged work on BEAM-7760:


Author: ASF GitHub Bot
Created on: 13/Aug/19 23:59
Start Date: 13/Aug/19 23:59
Worklog Time Spent: 10m 
  Work Description: KevinGG commented on issue #9278: [BEAM-7760] Added 
iBeam module
URL: https://github.com/apache/beam/pull/9278#issuecomment-521052414
 
 
   Hi all, I'll leave another 3 days for 
[design](https://docs.google.com/document/d/1DYWrT6GL_qDCXhRMoxpjinlVAfHeVilK5Mtf8gO6zxQ/edit?usp=sharing)
 review. Then we can have a vote session if there are no objections.
   
   Thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294288)
Time Spent: 2h 50m  (was: 2h 40m)

> Interactive Beam Caching PCollections bound to user defined vars in notebook
> 
>
> Key: BEAM-7760
> URL: https://issues.apache.org/jira/browse/BEAM-7760
> Project: Beam
>  Issue Type: New Feature
>  Components: examples-python
>Reporter: Ning Kang
>Assignee: Ning Kang
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Cache only the PCollections bound to user-defined variables in a pipeline when 
> running the pipeline with the interactive runner in Jupyter notebooks.
> [Interactive 
> Beam|https://github.com/apache/beam/tree/master/sdks/python/apache_beam/runners/interactive]
>  has been caching and using caches of "leaf" PCollections for interactive 
> execution in Jupyter notebooks.
> The interactive execution is currently supported so that when new transforms are 
> appended to an existing pipeline for a new run, the already-executed part of the 
> pipeline doesn't need to be re-executed. 
> A PCollection is a "leaf" when it is never used as input to any PTransform in 
> the pipeline.
> The problem with building the caches and the pipeline-to-execute around "leaf" 
> PCollections is that when a PCollection is consumed by a sink with no output, the 
> resulting pipeline-to-execute will miss the subgraph that generates and consumes 
> that PCollection.
> For example, "ReadFromPubSub --> WriteToPubSub" will result in an empty 
> pipeline.
> Caching the PCollections bound to user-defined variables, and replacing 
> transforms with cache sources and sinks, would resolve the pipeline-to-execute 
> properly under the interactive execution scenario. Also, a cached PCollection 
> can now be traced back to user code and used for user data visualization if the 
> user wants to.
> E.g.,
> {code:python}
> # ...
> p = beam.Pipeline(interactive_runner.InteractiveRunner(),
>   options=pipeline_options)
> messages = p | "Read" >> beam.io.ReadFromPubSub(subscription='...')
> messages | "Write" >> beam.io.WriteToPubSub(topic_path)
> result = p.run()
> # ...
> visualize(messages){code}
>  The interactive runner automatically figures out that the PCollection
> {code:python}
> messages{code}
> created by
> {code:python}
> p | "Read" >> beam.io.ReadFromPubSub(subscription='...'){code}
> should be cached and reused if the notebook user appends more transforms.
>  And once the pipeline has been executed, the user can use any 
> visualize(PCollection) module to visualize the data statically (batch) or 
> dynamically (stream).
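
As an editor's illustration (a sketch under the design above, not code from the
design doc), the append-and-rerun flow that this caching is meant to support:

{code:python}
import apache_beam as beam
from apache_beam.runners.interactive import interactive_runner

p = beam.Pipeline(interactive_runner.InteractiveRunner())
words = p | 'Create' >> beam.Create(['to', 'be', 'or', 'not', 'to', 'be'])
p.run()  # `words` is bound to a user-defined variable, so it gets cached

# Appending a transform and re-running should reuse the cached `words`
# rather than re-executing the Create step.
counts = words | 'Count' >> beam.combiners.Count.PerElement()
p.run()
{code}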



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7840) Create MapTuple and FlatMapTuple to ease migration to Python 3.

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7840?focusedWorklogId=294289=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294289
 ]

ASF GitHub Bot logged work on BEAM-7840:


Author: ASF GitHub Bot
Created on: 14/Aug/19 00:00
Start Date: 14/Aug/19 00:00
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #9168: [BEAM-7840] 
Provide MapTuple and FlatMapTuple for Python 3 users.
URL: https://github.com/apache/beam/pull/9168#discussion_r313659791
 
 

 ##
 File path: sdks/python/apache_beam/transforms/core.py
 ##
 @@ -1316,6 +1322,144 @@ def Map(fn, *args, **kwargs):  # pylint: 
disable=invalid-name
   return pardo
 
 
+def MapTuple(fn, *args, **kwargs):  # pylint: disable=invalid-name
+  r""":func:`MapTuple` is like :func:`Map` but expects tuple inputs and
+  flattens them into multiple input arguments.
+
+  beam.MapTuple(lambda a, b, ...: ...)
+
+  is equivalent to Python 2
+
+  beam.Map(lambda (a, b, ...), ...: ...)
+
+  In other words
+
+  beam.MapTuple(fn)
+
+  is equivalent to
+
+  beam.Map(lambda element, ...: fn(\*element, ...))
+
+  This can be useful when processing a PCollection of tuples
+  (e.g. key-value pairs).
+
+  Args:
+fn (callable): a callable object.
+*args: positional arguments passed to the transform callable.
+**kwargs: keyword arguments passed to the transform callable.
+
+  Returns:
+~apache_beam.pvalue.PCollection:
+A :class:`~apache_beam.pvalue.PCollection` containing the
+:func:`MapTuple` outputs.
+
+  Raises:
+~exceptions.TypeError: If the **fn** passed as argument is not a callable.
+  Typical error is to pass a :class:`DoFn` instance which is supported only
+  for :class:`ParDo`.
+  """
+  if not callable(fn):
+raise TypeError(
+'MapTuple can be used only with callable objects. '
+'Received %r instead.' % (fn))
+
+  label = 'MapTuple(%s)' % ptransform.label_from_callable(fn)
+
+  argspec = getfullargspec(fn)
+  num_defaults = len(argspec.defaults or ())
+  if num_defaults < len(args) + len(kwargs):
+raise TypeError('Side inputs must have defaults for MapTuple.')
+
+  if argspec.defaults or args or kwargs:
+wrapper = lambda x, *args, **kwargs: [fn(*(tuple(x) + args), **kwargs)]
 
 Review comment:
   I was looking at this code and wondered why there is a `tuple(x)` rather than 
`x`, since `x` should always be a tuple according to the docstring. The tests pass 
with a plain `x`.
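
   (Editor's aside: one possible reason, assuming an element can ever arrive as a
list rather than a tuple, is that `tuple(x)` makes the concatenation with the
positional-args tuple well-defined:)

{code:python}
args = ('side_input',)   # hypothetical extra positional arguments
x = ['key', 'value']     # an element arriving as a list instead of a tuple

# x + args  would raise: TypeError: can only concatenate list (not "tuple") to list
print(tuple(x) + args)   # ('key', 'value', 'side_input')
{code}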
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294289)
Time Spent: 1h 40m  (was: 1.5h)

> Create MapTuple and FlatMapTuple to ease migration to Python 3.
> ---
>
> Key: BEAM-7840
> URL: https://issues.apache.org/jira/browse/BEAM-7840
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core
>Reporter: Robert Bradshaw
>Assignee: Robert Bradshaw
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> These are like Map and FlatMap but expand out tuple input elements across
> several arguments. This will be useful as tuple argument unpacking has been
> removed in Python 3. Instead of having to convert
> Map(lambda (k, v): expression(k, v))
> into
> Map(lambda k_v: expression(k_v[0], k_v[1]))
> one can now write
> MapTuple(lambda k, v: expression(k, v))
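
A minimal end-to-end sketch (editor's addition; assumes a Beam release where
MapTuple is available, i.e. 2.15.0+ per this issue):

{code:python}
import apache_beam as beam

with beam.Pipeline() as p:
    (p
     | beam.Create([('a', 1), ('b', 2), ('a', 3)])
     # Each (key, value) tuple is unpacked into the lambda's two arguments.
     | beam.MapTuple(lambda k, v: '%s:%d' % (k, v))
     | beam.Map(print))
{code}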



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=294287=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294287
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 13/Aug/19 23:54
Start Date: 13/Aug/19 23:54
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on pull request #9335: 
[WIP][BEAM-7909] customized containers for python
URL: https://github.com/apache/beam/pull/9335
 
 
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Commented] (BEAM-7049) Merge multiple input to one BeamUnionRel

2019-08-13 Thread sridhar Reddy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906734#comment-16906734
 ] 

sridhar Reddy commented on BEAM-7049:
-

Made a couple of simple cases work:

1) select 1 from order_details union select 2 from order_details union select 3 
from order_details

2) select c2 from FirstOne union select c2 from FirstOne union select c2 from 
FirstOne

The implementation involved manipulating and hardcoding the "inputs" in

[https://github.com/apache/beam/blob/946596b32c06419209ef157dcc742d6189b1c4f4/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamUnionRel.java#L75]

and changes to accept three inputs in

[https://github.com/apache/beam/blob/946596b32c06419209ef157dcc742d6189b1c4f4/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/rel/BeamSetOperatorRelBase.java#L61]

and

[https://github.com/apache/beam/blob/946596b32c06419209ef157dcc742d6189b1c4f4/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/transform/BeamSetOperatorsTransforms.java#L60]

Thank you for the additional information, [~amaliujia]. I will work on 
understanding it and also look to improve my implementation to accept both two 
and three inputs in the proper places.

 

 

> Merge multiple input to one BeamUnionRel
> 
>
> Key: BEAM-7049
> URL: https://issues.apache.org/jira/browse/BEAM-7049
> Project: Beam
>  Issue Type: Improvement
>  Components: dsl-sql
>Reporter: Rui Wang
>Assignee: sridhar Reddy
>Priority: Major
>
> BeamUnionRel assumes there are exactly two inputs and rejects more. So `a UNION b UNION c` 
> has to be constructed as UNION(a, UNION(b, c)), which requires two shuffles. If 
> BeamUnionRel could handle multiple inputs, we would have only one shuffle.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7474) Add SDK harness containers for Py 3.6, Py 3.7

2019-08-13 Thread Hannah Jiang (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906730#comment-16906730
 ] 

Hannah Jiang commented on BEAM-7474:


I am working on customized containers for Python and introducing minor-versioned 
containers.

[~frederik], do you mind if I take over this task?

> Add SDK harness containers for Py 3.6, Py 3.7
> -
>
> Key: BEAM-7474
> URL: https://issues.apache.org/jira/browse/BEAM-7474
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-harness
>Reporter: Valentyn Tymofieiev
>Assignee: Frederik Bode
>Priority: Major
>
> Currently we can build a Py3-compatible container image with gradle by 
> running:
> ./gradlew  :sdks:python:container:py3:docker 
> This builds a docker container image like: 
> valentyn-docker-apache.bintray.io/beam/python3 
> The code for this is defined in: 
> https://github.com/apache/beam/blob/ae60a72b03f3a2b6b2a06667ec1868a7acc8e38f/sdks/python/container/py3/build.gradle#L48
> To support portable runners that use a container (e.g. Flink) on multiple 
> versions of Python 3,  we should make it possible to build Python 
> 3-compatible SDK harness containers bundled with any desired python version. 
> We could have several gradle projects:
>   :sdks:python:container:py35:docker
>   :sdks:python:container:py36:docker
>   :sdks:python:container:py37:docker
> and several Dockerfiles to support this:
>  
>   sdks/python/container/py35/Dockerfile
>   sdks/python/container/py36/Dockerfile
>   sdks/python/container/py37/Dockerfile
> The only difference right now would be the base image used in FROM field in 
> Dockerfile. 
> Alternatively, we could have one parameterized Dockerfile that starts with :
> {code}
> ARG BASE_IMAGE
> FROM $BASE_IMAGE
> ...
> {code}
> I think the latter approach may result in complications later if these 
> containers need to diverge down the road.
> cc'ing a few folks who may have some feedback on this: [~angoenka] [~mxm] 
> [~robertwb] [~Juta] [~frederik].



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed PCollection

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=294284=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294284
 ]

ASF GitHub Bot logged work on BEAM-7972:


Author: ASF GitHub Bot
Created on: 13/Aug/19 23:40
Start Date: 13/Aug/19 23:40
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #9334: [BEAM-7972] 
Always use Global window in reshuffle and then apply wind…
URL: https://github.com/apache/beam/pull/9334#discussion_r313655400
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -634,12 +630,13 @@ def restore_timestamps(element, window=DoFn.WindowParam):
 
 ungrouped = pcoll | Map(reify_timestamps)
 ungrouped._windowing = Windowing(
-window_fn,
+window.GlobalWindows(),
 triggerfn=AfterCount(1),
 accumulation_mode=AccumulationMode.DISCARDING,
 timestamp_combiner=TimestampCombiner.OUTPUT_AT_EARLIEST)
 result = (ungrouped
   | GroupByKey()
+  | WindowInto(window_fn)
 
 Review comment:
   seems line 641 is restoring the window, does applying window_fn here make a 
difference?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294284)
Time Spent: 50m  (was: 40m)

> Portable Python Reshuffle does not work with windowed PCollection
> --
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The streaming pipeline gets stuck when using Reshuffle with a windowed PCollection.
> The issue happens because the window function gets deserialized on the Java side, 
> which is not possible, so it falls back to the global window function, resulting 
> in a window function mismatch later in the code.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed PCollection

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=294282=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294282
 ]

ASF GitHub Bot logged work on BEAM-7972:


Author: ASF GitHub Bot
Created on: 13/Aug/19 23:39
Start Date: 13/Aug/19 23:39
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #9334: [BEAM-7972] 
Always use Global window in reshuffle and then apply wind…
URL: https://github.com/apache/beam/pull/9334#discussion_r313655400
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -634,12 +630,13 @@ def restore_timestamps(element, window=DoFn.WindowParam):
 
 ungrouped = pcoll | Map(reify_timestamps)
 ungrouped._windowing = Windowing(
-window_fn,
+window.GlobalWindows(),
 triggerfn=AfterCount(1),
 accumulation_mode=AccumulationMode.DISCARDING,
 timestamp_combiner=TimestampCombiner.OUTPUT_AT_EARLIEST)
 result = (ungrouped
   | GroupByKey()
+  | WindowInto(window_fn)
 
 Review comment:
   seems line 641 is restoring the window, does applying window_fn here make a 
difference?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294282)
Time Spent: 40m  (was: 0.5h)

> Portable Python Reshuffle does not work with windowed PCollection
> --
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> The streaming pipeline gets stuck when using Reshuffle with a windowed PCollection.
> The issue happens because the window function gets deserialized on the Java side, 
> which is not possible, so it falls back to the global window function, resulting 
> in a window function mismatch later in the code.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed PCollection

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=294280=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294280
 ]

ASF GitHub Bot logged work on BEAM-7972:


Author: ASF GitHub Bot
Created on: 13/Aug/19 23:38
Start Date: 13/Aug/19 23:38
Worklog Time Spent: 10m 
  Work Description: y1chi commented on pull request #9334: [BEAM-7972] 
Always use Global window in reshuffle and then apply wind…
URL: https://github.com/apache/beam/pull/9334#discussion_r313655400
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -634,12 +630,13 @@ def restore_timestamps(element, window=DoFn.WindowParam):
 
 ungrouped = pcoll | Map(reify_timestamps)
 ungrouped._windowing = Windowing(
-window_fn,
+window.GlobalWindows(),
 triggerfn=AfterCount(1),
 accumulation_mode=AccumulationMode.DISCARDING,
 timestamp_combiner=TimestampCombiner.OUTPUT_AT_EARLIEST)
 result = (ungrouped
   | GroupByKey()
+  | WindowInto(window_fn)
 
 Review comment:
   seems line 641 is restoring the window, does applying window_fn here make a 
difference?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294280)
Time Spent: 0.5h  (was: 20m)

> Portable Python Reshuffle does not work with windowed PCollection
> --
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The streaming pipeline gets stuck when using Reshuffle with a windowed PCollection.
> The issue happens because the window function gets deserialized on the Java side, 
> which is not possible, so it falls back to the global window function, resulting 
> in a window function mismatch later in the code.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7907) Support customized container for python

2019-08-13 Thread Hannah Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hannah Jiang updated BEAM-7907:
---
Summary: Support customized container for python  (was: Support customized 
container for python streaming)

> Support customized container for python
> ---
>
> Key: BEAM-7907
> URL: https://issues.apache.org/jira/browse/BEAM-7907
> Project: Beam
>  Issue Type: New Feature
>  Components: build-system, sdk-py-harness
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Labels: portability
>
> Support customized container.
> Scope of this ticket is *python*.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=294276=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294276
 ]

ASF GitHub Bot logged work on BEAM-7495:


Author: ASF GitHub Bot
Created on: 13/Aug/19 23:04
Start Date: 13/Aug/19 23:04
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on pull request #9156: 
[BEAM-7495] Add fine-grained progress reporting
URL: https://github.com/apache/beam/pull/9156
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294276)
Time Spent: 12h 50m  (was: 12h 40m)
Remaining Estimate: 491h 10m  (was: 491h 20m)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> ---
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Aryan Naraghi
>Assignee: Aryan Naraghi
>Priority: Major
>   Original Estimate: 504h
>  Time Spent: 12h 50m
>  Remaining Estimate: 491h 10m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not support any of the source facilities that Dataflow needs to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed PCollection

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=294272=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294272
 ]

ASF GitHub Bot logged work on BEAM-7972:


Author: ASF GitHub Bot
Created on: 13/Aug/19 23:01
Start Date: 13/Aug/19 23:01
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #9334: [BEAM-7972] Always 
use Global window in reshuffle and then apply wind…
URL: https://github.com/apache/beam/pull/9334#issuecomment-521040904
 
 
   R: @robertwb @y1chi  @tweise 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294272)
Time Spent: 20m  (was: 10m)

> Portable Python Reshuffle does not work with windowed PCollection
> --
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The streaming pipeline gets stuck when using Reshuffle with a windowed PCollection.
> The issue happens because the window function gets deserialized on the Java side, 
> which is not possible, so it falls back to the global window function, resulting 
> in a window function mismatch later in the code.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed PCollection

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=294271=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294271
 ]

ASF GitHub Bot logged work on BEAM-7972:


Author: ASF GitHub Bot
Created on: 13/Aug/19 23:00
Start Date: 13/Aug/19 23:00
Worklog Time Spent: 10m 
  Work Description: angoenka commented on pull request #9334: [BEAM-7972] 
Always use Global window in reshuffle and then apply wind…
URL: https://github.com/apache/beam/pull/9334
 
 
   …ow again.
   
   **Please** add a meaningful description for your change here
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Created] (BEAM-7972) Portable Python Reshuffle does not work with windowed PCollection

2019-08-13 Thread Ankur Goenka (JIRA)
Ankur Goenka created BEAM-7972:
--

 Summary: Portable Python Reshuffle does not work with windowed 
PCollection
 Key: BEAM-7972
 URL: https://issues.apache.org/jira/browse/BEAM-7972
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core
Reporter: Ankur Goenka
Assignee: Ankur Goenka


The streaming pipeline gets stuck when using Reshuffle with a windowed PCollection.

The issue happens because the window function gets deserialized on the Java side, 
which is not possible, so it falls back to the global window function, resulting 
in a window function mismatch later in the code.
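
A minimal sketch of the failing pattern (editor's addition, not from the report),
with a non-global window applied before the Reshuffle:

{code:python}
import apache_beam as beam
from apache_beam.transforms import window

# On a portable runner, this pipeline gets stuck per this issue.
with beam.Pipeline() as p:
    (p
     | beam.Create([1, 2, 3])
     | beam.Map(lambda t: window.TimestampedValue(t, t))
     | beam.WindowInto(window.FixedWindows(1))  # non-global window fn
     | beam.Reshuffle()  # the Java side cannot deserialize the Python window fn
     | beam.Map(print))
{code}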



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=294265=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294265
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:53
Start Date: 13/Aug/19 22:53
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor 
calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-521034816
 
 
   > > > We'll need to break this PR up into two pieces eventually:
   > > > 
   > > > 1. A piece which just vendors calcite (so that it can go through the 
release process)
   > > > 2. The change which consumes vendored calcite.
   > > 
   > > 
   > > Yes. If `vendor/calcite-1_20_0/build.gradle` looks good, I'll open a 
self-contained PR to build the vendored calcite package, and we can go through the 
release process.
   > 
   > Yes, the build.gradle looks fine, but I didn't validate whether the vendored 
calcite artifact actually works. I'm relying on your proof of concept to 
validate that the contents are good and meet the Beam SQL needs. If it doesn't work, 
you can always produce another updated version and release 0.2.
   
   Thanks, Luke. I opened PR #9333 for releasing the vendored calcite. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294265)
Time Spent: 4h  (was: 3h 50m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-7971) PyCharm debugger for apache_beam/*_test.py broken

2019-08-13 Thread Udi Meiri (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906696#comment-16906696
 ] 

Udi Meiri commented on BEAM-7971:
-

"Run" for these tests still works from within Pycharm, "Debug" is broken.

> PyCharm debugger for apache_beam/*_test.py broken
> -
>
> Key: BEAM-7971
> URL: https://issues.apache.org/jira/browse/BEAM-7971
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core, testing
>Reporter: Udi Meiri
>Priority: Minor
>
> This currently affects pipeline_test.py and pvalue_test.py.
> It seems that "import io" is interpreted as importing apache_beam.io, which 
> fails.
> In Python 2.7 the stacktrace shows:
> {code}
> Testing started at 3:48 PM ...
> /usr/local/google/home/ehudm/virtualenvs/beamenv/bin/python 
> /usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/pydevd.py
>  --multiproc --qt-support=auto --client 127.0.0.1 --port 41493 --file 
> /usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pycharm/_jb_nosetest_runner.py
>  --path 
> /usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pvalue_test.py
> Traceback (most recent call last):
>   File 
> "/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/pydevd.py",
>  line 15, in <module>
> from _pydevd_bundle.pydevd_constants import IS_JYTH_LESS25, 
> IS_PY34_OR_GREATER, IS_PY36_OR_GREATER, IS_PYCHARM, get_thread_id, \
>   File 
> "/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/_pydevd_bundle/pydevd_constants.py",
>  line 169, in <module>
> from _pydev_imps._pydev_saved_modules import thread
>   File 
> "/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/_pydev_imps/_pydev_saved_modules.py",
>  line 15, in <module>
> import xmlrpclib
>   File "/usr/lib/python2.7/xmlrpclib.py", line 145, in 
> import httplib
>   File "/usr/lib/python2.7/httplib.py", line 80, in 
> import mimetools
>   File "/usr/lib/python2.7/mimetools.py", line 6, in 
> import tempfile
>   File "/usr/lib/python2.7/tempfile.py", line 32, in 
> import io as _io
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/io/__init__.py",
>  line 22, in <module>
> from apache_beam.io.avroio import *
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/__init__.py", 
> line 97, in <module>
> from apache_beam import coders
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/coders/__init__.py",
>  line 19, in <module>
> from apache_beam.coders.coders import *
>   File 
> "/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/coders/coders.py",
>  line 27, in <module>
> from builtins import object
>   File 
> "/usr/local/google/home/ehudm/virtualenvs/beamenv/local/lib/python2.7/site-packages/builtins/__init__.py",
>  line 8, in <module>
> from future.builtins import *
>   File 
> "/usr/local/google/home/ehudm/virtualenvs/beamenv/local/lib/python2.7/site-packages/future/builtins/__init__.py",
>  line 13, in <module>
> from future.builtins.misc import (ascii, chr, hex, input, isinstance, 
> next,
>   File 
> "/usr/local/google/home/ehudm/virtualenvs/beamenv/local/lib/python2.7/site-packages/future/builtins/misc.py",
>  line 43, in <module>
> from io import open
> ImportError: cannot import name open
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7971) PyCharm debugger for apache_beam/*_test.py broken

2019-08-13 Thread Udi Meiri (JIRA)
Udi Meiri created BEAM-7971:
---

 Summary: PyCharm debugger for apache_beam/*_test.py broken
 Key: BEAM-7971
 URL: https://issues.apache.org/jira/browse/BEAM-7971
 Project: Beam
  Issue Type: Bug
  Components: sdk-py-core, testing
Reporter: Udi Meiri


This currently affects pipeline_test.py and pvalue_test.py.
It seems that "import io" is interpreted as importing apache_beam.io, which 
fails.

In Python 2.7 the stacktrace shows:
{code}
Testing started at 3:48 PM ...
/usr/local/google/home/ehudm/virtualenvs/beamenv/bin/python 
/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/pydevd.py
 --multiproc --qt-support=auto --client 127.0.0.1 --port 41493 --file 
/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pycharm/_jb_nosetest_runner.py
 --path 
/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/pvalue_test.py
Traceback (most recent call last):
  File 
"/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/pydevd.py",
 line 15, in <module>
from _pydevd_bundle.pydevd_constants import IS_JYTH_LESS25, 
IS_PY34_OR_GREATER, IS_PY36_OR_GREATER, IS_PYCHARM, get_thread_id, \
  File 
"/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/_pydevd_bundle/pydevd_constants.py",
 line 169, in <module>
from _pydev_imps._pydev_saved_modules import thread
  File 
"/usr/local/google/home/ehudm/.local/share/JetBrains/Toolbox/apps/PyCharm-C/ch-0/191.7479.30/helpers/pydev/_pydev_imps/_pydev_saved_modules.py",
 line 15, in <module>
import xmlrpclib
  File "/usr/lib/python2.7/xmlrpclib.py", line 145, in 
import httplib
  File "/usr/lib/python2.7/httplib.py", line 80, in 
import mimetools
  File "/usr/lib/python2.7/mimetools.py", line 6, in 
import tempfile
  File "/usr/lib/python2.7/tempfile.py", line 32, in 
import io as _io
  File 
"/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/io/__init__.py", 
line 22, in <module>
from apache_beam.io.avroio import *
  File 
"/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/__init__.py", 
line 97, in <module>
from apache_beam import coders
  File 
"/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/coders/__init__.py",
 line 19, in <module>
from apache_beam.coders.coders import *
  File 
"/usr/local/google/home/ehudm/src/beam/sdks/python/apache_beam/coders/coders.py",
 line 27, in <module>
from builtins import object
  File 
"/usr/local/google/home/ehudm/virtualenvs/beamenv/local/lib/python2.7/site-packages/builtins/__init__.py",
 line 8, in <module>
from future.builtins import *
  File 
"/usr/local/google/home/ehudm/virtualenvs/beamenv/local/lib/python2.7/site-packages/future/builtins/__init__.py",
 line 13, in <module>
from future.builtins.misc import (ascii, chr, hex, input, isinstance, next,
  File 
"/usr/local/google/home/ehudm/virtualenvs/beamenv/local/lib/python2.7/site-packages/future/builtins/misc.py",
 line 43, in <module>
from io import open
ImportError: cannot import name open
{code}
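
For illustration, a minimal Python 3 sketch (editor's addition) of the underlying
shadowing mechanism: when a directory containing an `io/` package sits ahead of the
stdlib on the module search path, the name `io` resolves to the local package, as
happens here with the apache_beam source directory:

{code:python}
import os
import sys
import tempfile
from importlib.machinery import PathFinder

# Create a directory containing a package named `io`, mimicking apache_beam/io.
workdir = tempfile.mkdtemp()
os.makedirs(os.path.join(workdir, 'io'))
open(os.path.join(workdir, 'io', '__init__.py'), 'w').close()

# Resolve `io` with that directory first on the search path, as the debugger
# effectively does when the test file's directory is prepended to sys.path.
spec = PathFinder().find_spec('io', [workdir] + sys.path)
print(spec.origin)  # .../io/__init__.py -- the local package shadows stdlib io
{code}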



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=294252=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294252
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:37
Start Date: 13/Aug/19 22:37
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9333: [BEAM-5820] release 
vendor calcite
URL: https://github.com/apache/beam/pull/9333#issuecomment-521034975
 
 
   R: @lukecwik 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294252)
Time Spent: 3h 50m  (was: 3h 40m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=294251=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294251
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:36
Start Date: 13/Aug/19 22:36
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor 
calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-521034816
 
 
   > > > We'll need to break this PR up into two pieces eventually:
   > > > 
   > > > 1. A piece which just vendors calcite (so that it can go through the 
release process)
   > > > 2. The change which consumes vendored calcite.
   > > 
   > > 
   > > Yes. If `vendor/calcite-1_20_0/build.gradle` looks good, I'll open a 
self-contained PR to build the vendored calcite package, and we can go through the 
release process.
   > 
   > Yes, the build.gradle looks fine, but I didn't validate whether the vendored 
calcite artifact actually works. I'm relying on your proof of concept to 
validate that the contents are good and meet the Beam SQL needs. If it doesn't work, 
you can always produce another updated version and release 0.2.
   
   Sure. I opened PR #9333 for releasing the vendored calcite.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294251)
Time Spent: 3h 40m  (was: 3.5h)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=294249=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294249
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:34
Start Date: 13/Aug/19 22:34
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on pull request #9333: [BEAM-5820] 
release vendor calcite
URL: https://github.com/apache/beam/pull/9333
 
 
   Post-Commit Tests Status (on master branch)
   

   

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294228
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313628088
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
 
 Review comment:
   "while any other argument will be passed to `split_dataset`."  ->
   "while `plant` and `ratio` are passed to `split_dataset`."
   
   This might be repetitive, but makes the sentence significantly clearer. 
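
   For context, here is a minimal runnable sketch of the mechanism this comment
   is about: `beam.Partition` forwards extra positional and keyword arguments to
   the partition function. The element values and the two-partition body are
   illustrative assumptions, not the snippet the `github_sample` tags render.

```py
import apache_beam as beam

def split_dataset(plant, num_partitions, ratio):
    # Partition supplies `plant` (the element) and `num_partitions`;
    # `ratio` arrives as the extra keyword argument passed below.
    assert num_partitions == len(ratio)
    # Deterministic bucket in [0, sum(ratio)); ratio=[8, 2] gives buckets 0..9.
    bucket = sum(map(ord, plant)) % sum(ratio)
    return 0 if bucket < ratio[0] else 1

with beam.Pipeline() as pipeline:
    train, test = (
        pipeline
        | beam.Create(['Strawberry', 'Carrot', 'Eggplant', 'Tomato'])
        | beam.Partition(split_dataset, 2, ratio=[8, 2])
    )
```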
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294228)
Time Spent: 45.5h  (was: 45h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 45.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294238
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313635098
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
+The `ratio` is a list of numbers which represents the ratio how many items 
will go into each partition.
+If we want an 80%/20% split, we can specify a ratio of `[8, 2]` which means 
that for every 10 elements, 8 will go into the first partition and 2 will go 
into the second.
+
+To decide which partition to send each element, we'll have different buckets.
+For our case `[8, 2]` will have *8+2*=**10** buckets, where the first 8 
buckets represent the first partition and the last 2 buckets represent the 
second partition.
+
+First, we need to make sure that the ratio's length corresponds to the 
`num_partitions` we pass.
 
 Review comment:
   "need to make sure"-"check"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294239
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313635697
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
+The `ratio` is a list of numbers which represents the ratio how many items 
will go into each partition.
+If we want an 80%/20% split, we can specify a ratio of `[8, 2]` which means 
that for every 10 elements, 8 will go into the first partition and 2 will go 
into the second.
+
+To decide which partition to send each element, we'll have different buckets.
+For our case `[8, 2]` will have *8+2*=**10** buckets, where the first 8 
buckets represent the first partition and the last 2 buckets represent the 
second partition.
+
+First, we need to make sure that the ratio's length corresponds to the 
`num_partitions` we pass.
+We then get a bucket index for each element, in the range from 0 to 10 
(`num_buckets-1`).
+We could do `hash(element) % len(ratio)`, but instead we'll sum all the ASCII 
characters of the JSON representation to make it deterministic.
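
A sketch of that deterministic bucketing, assuming a dict-shaped element and a
hypothetical helper name:

```py
import json

def bucket_for(element, num_buckets):
    # Sum the code points of the element's JSON representation; unlike
    # hash(), the result is identical on every run and on every worker.
    return sum(map(ord, json.dumps(element))) % num_buckets

print(bucket_for({'name': 'Strawberry', 'duration': 'perennial'}, 10))
```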
 
 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294233&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294233
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313632550
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
+The `ratio` is a list of numbers which represents the ratio how many items 
will go into each partition.
 
 Review comment:
   Delete this sentence if you choose the rewrite above
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294233)
Time Spent: 45h 50m  (was: 45h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294231&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294231
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313632402
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
 
 Review comment:
   "and an additional argument `ratio` that describes the ratio of the 
split."->"and an additional argument `ratio`, which is a list of numbers that 
represents that ratio of items in the partitions."
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294231)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294236&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294236
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313635985
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
+The `ratio` is a list of numbers which represents the ratio how many items 
will go into each partition.
+If we want an 80%/20% split, we can specify a ratio of `[8, 2]` which means 
that for every 10 elements, 8 will go into the first partition and 2 will go 
into the second.
+
+To decide which partition to send each element, we'll have different buckets.
+For our case `[8, 2]` will have *8+2*=**10** buckets, where the first 8 
buckets represent the first partition and the last 2 buckets represent the 
second partition.
+
+First, we need to make sure that the ratio's length corresponds to the 
`num_partitions` we pass.
+We then get a bucket index for each element, in the range from 0 to 10 
(`num_buckets-1`).
+We could do `hash(element) % len(ratio)`, but instead we'll sum all the ASCII 
characters of the JSON representation to make it deterministic.

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294240&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294240
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313634516
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
+The `ratio` is a list of numbers which represents the ratio how many items 
will go into each partition.
+If we want an 80%/20% split, we can specify a ratio of `[8, 2]` which means 
that for every 10 elements, 8 will go into the first partition and 2 will go 
into the second.
+
+To decide which partition to send each element, we'll have different buckets.
+For our case `[8, 2]` will have *8+2*=**10** buckets, where the first 8 
buckets represent the first partition and the last 2 buckets represent the 
second partition.
 
 Review comment:
   "For our case `[8, 2]` will have *8+2*=**10** buckets,"->"For our case, `[8, 
2]`  has **10** buckets,"
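
   A quick sketch of how those buckets map onto partitions (the helper name is
   hypothetical; the cumulative weights follow the quoted description):

```py
def partition_for_bucket(bucket, ratio):
    # With ratio=[8, 2]: buckets 0..7 -> partition 0, buckets 8..9 -> partition 1.
    upper = 0
    for partition, weight in enumerate(ratio):
        upper += weight
        if bucket < upper:
            return partition
    raise ValueError('bucket %d outside %d buckets' % (bucket, sum(ratio)))

assert [partition_for_bucket(b, [8, 2]) for b in range(10)] == [0] * 8 + [1] * 2
```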
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294241&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294241
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313631755
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
 
 Review comment:
   add comma 
   
   "`split_dataset`, which"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294241)
Time Spent: 46h 20m  (was: 46h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294234&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294234
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313632984
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
+The `ratio` is a list of numbers which represents the ratio how many items 
will go into each partition.
+If we want an 80%/20% split, we can specify a ratio of `[8, 2]` which means 
that for every 10 elements, 8 will go into the first partition and 2 will go 
into the second.
 
 Review comment:
   Add comma  "`[8, 2]`, which"
   
   "8 will go into the first partition and 2 will go into the second."->"8 go 
into the first partition and 2 go into the second."
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294234)

> Colab 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294230&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294230
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313628514
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
 
 Review comment:
   "it is common to split it into"->"it is a common task to split data into"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294230)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 45h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294237
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313634347
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_function %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py
 tag:partition_lambda %}```
+
+Output `PCollection`s:
+
+```
+{% github_sample 
/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py
 tag:partitions %}```
+
+[View on GitHub](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py)
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partitions` as a positional argument,
+while any other argument will be passed to `split_dataset`.
+
+In machine learning, it is common to split it into
+[training and a testing 
datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
+The `ratio` is a list of numbers which represents the ratio how many items 
will go into each partition.
+If we want an 80%/20% split, we can specify a ratio of `[8, 2]` which means 
that for every 10 elements, 8 will go into the first partition and 2 will go 
into the second.
+
+To decide which partition to send each element, we'll have different buckets.
 
 Review comment:
   "In order to determine which partition to send each element, we have 
different buckets."
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294237)

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294232&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294232
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313628632
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives the number of partitions,
+and returns the index of the desired partition for the element.
+The number of partitions passed must be a positive integer,
+and it must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py tag:partition_function %}
+```
+
+Output `PCollection`s:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py tag:partitions %}
+```
+
+<!-- "View on GitHub" button linking to
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py -->
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py tag:partition_lambda %}
+```
+
+Output `PCollection`s:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py tag:partitions %}
+```
+
+<!-- "View on GitHub" button linking to
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py -->
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partition` as a positional argument,
+while any other arguments are passed to `split_dataset`.
+
+In machine learning, it is common to split a dataset into
+[training and testing datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
 
 Review comment:
   remove ","
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294232)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 45h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
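
As context for the contract quoted in the excerpt above (the partition
function receives the element and the number of partitions, and must return
an index in the range 0 to num_partitions - 1), here is a minimal,
self-contained usage sketch. The produce records follow the excerpt's
description; the concrete values and step names are illustrative.

```py
import apache_beam as beam

durations = ['annual', 'biennial', 'perennial']

def by_duration(plant, num_partitions):
    # Must return an integer in the range 0 to num_partitions - 1.
    return durations.index(plant['duration'])

with beam.Pipeline() as pipeline:
    annuals, biennials, perennials = (
        pipeline
        | 'Gardening plants' >> beam.Create([
            {'icon': '🍓', 'name': 'Strawberry', 'duration': 'perennial'},
            {'icon': '🥕', 'name': 'Carrot', 'duration': 'biennial'},
            {'icon': '🍅', 'name': 'Tomato', 'duration': 'annual'},
        ])
        | 'Partition by duration' >> beam.Partition(by_duration, len(durations)))
```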


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294235&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294235
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313635349
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives each element along with the
+number of partitions, and returns the index of the desired partition for
+that element.
+The number of partitions passed must be a positive integer,
+and the function must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py tag:partition_function %}
+```
+
+Output `PCollection`s:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py tag:partitions %}
+```
+
+<!-- "View on GitHub" button linking to
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py -->
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py tag:partition_lambda %}
+```
+
+Output `PCollection`s:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py tag:partitions %}
+```
+
+<!-- "View on GitHub" button linking to
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py -->
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partition` as a positional argument,
+while any other arguments are passed to `split_dataset`.
+
+In machine learning, it is common to split a dataset into
+[training and testing datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
+We define `split_dataset` which receives the element, the number of 
partitions, and an additional argument `ratio` that describes the ratio of the 
split.
+The `ratio` is a list of numbers that represents the ratio of how many items
+will go into each partition.
+If we want an 80%/20% split, we can specify a ratio of `[8, 2]` which means 
that for every 10 elements, 8 will go into the first partition and 2 will go 
into the second.
+
+To decide which partition to send each element, we'll have different buckets.
+For our case `[8, 2]` will have *8+2*=**10** buckets, where the first 8 
buckets represent the first partition and the last 2 buckets represent the 
second partition.
+
+First, we need to make sure that the ratio's length corresponds to the 
`num_partitions` we pass.
 
 Review comment:
   "ratio's length"->"ratio list's length"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294229&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294229
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:17
Start Date: 13/Aug/19 22:17
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9261: [BEAM-7389] Add 
code examples for Partition page
URL: https://github.com/apache/beam/pull/9261#discussion_r313628836
 
 

 ##
 File path: 
website/src/documentation/transforms/python/element-wise/partition.md
 ##
 @@ -39,12 +46,130 @@ You cannot determine the number of partitions in 
mid-pipeline
 See more information in the [Beam Programming Guide]({{ site.baseurl 
}}/documentation/programming-guide/#partition).
 
 ## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
 
-## Related transforms 
-* [Filter]({{ site.baseurl 
}}/documentation/transforms/python/elementwise/filter) is useful if the 
function is just 
+In the following examples, we create a pipeline with a `PCollection` of 
produce with their icon, name, and duration.
+Then, we apply `Partition` in multiple ways to split the `PCollection` into 
multiple `PCollections`.
+
+`Partition` accepts a function that receives each element along with the
+number of partitions, and returns the index of the desired partition for
+that element.
+The number of partitions passed must be a positive integer,
+and the function must return an integer in the range `0` to `num_partitions-1`.
+
+### Example 1: Partition with a function
+
+In the following example, we have a known list of durations.
+We partition the `PCollection` into one `PCollection` for every duration type.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py tag:partition_function %}
+```
+
+Output `PCollection`s:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py tag:partitions %}
+```
+
+<!-- "View on GitHub" button linking to
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py -->
+
+### Example 2: Partition with a lambda function
+
+We can also use lambda functions to simplify **Example 1**.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py tag:partition_lambda %}
+```
+
+Output `PCollection`s:
+
+```
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition_test.py tag:partitions %}
+```
+
+<!-- "View on GitHub" button linking to
+https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/partition.py -->
+
+### Example 3: Partition with multiple arguments
+
+You can pass functions with multiple arguments to `Partition`.
+They are passed as additional positional arguments or keyword arguments to the 
function.
+
+In this example, `split_dataset` takes `plant`, `num_partitions`, and `ratio` 
as arguments.
+`num_partitions` is used by `Partition` as a positional argument,
+while any other arguments are passed to `split_dataset`.
+
+In machine learning, it is common to split a dataset into
+[training and testing datasets](https://en.wikipedia.org/wiki/Training,_validation,_and_test_sets).
+Typically, 80% of the data is used for training a model, and 20% is used for 
testing.
+
+We will split a `PCollection` dataset into training and testing datasets.
 
 Review comment:
   "We will"->"In this example, we"
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294229)
Time Spent: 45h 40m  (was: 45.5h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 45h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
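
Tying the excerpt above together: any arguments given to `Partition` beyond
the partition function and the number of partitions are forwarded to the
function. A short sketch under the same assumptions as the earlier
`split_dataset` example (the input records are illustrative):

```py
import json

import apache_beam as beam

def split_dataset(element, num_partitions, ratio):
    # Same bucket logic as in the earlier sketch.
    assert num_partitions == len(ratio)
    bucket = sum(ord(ch) for ch in json.dumps(element)) % sum(ratio)
    total = 0
    for i, part in enumerate(ratio):
        total += part
        if bucket < total:
            return i

with beam.Pipeline() as pipeline:
    # `ratio=[8, 2]` is forwarded to split_dataset as a keyword argument,
    # after the element and the number of partitions.
    train, test = (
        pipeline
        | beam.Create([{'name': 'Strawberry'}, {'name': 'Carrot'}])
        | beam.Partition(split_dataset, 2, ratio=[8, 2]))
```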


[jira] [Work logged] (BEAM-7965) Add retracting mode to model proto

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7965?focusedWorklogId=294225&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294225
 ]

ASF GitHub Bot logged work on BEAM-7965:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:13
Start Date: 13/Aug/19 22:13
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9329: [BEAM-7965] add 
retracting mode to model proto
URL: https://github.com/apache/beam/pull/9329#issuecomment-521029244
 
 
   Thanks Luke.  I also realized that the proto files in `model` are `proto3` 
already.
   
   I have undone the re-generated protos for Go. 
   
   Thanks Daniel for creating a JIRA to track re-generating the Go protos in 
the correct version: https://jira.apache.org/jira/browse/BEAM-7970.
   
   I think we can separate this PR's change from the re-generation of the Go 
protos, given the JIRA created to track that effort. I will also wait for 
@lostluck to see if he is also OK with this separation.  
   
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294225)
Time Spent: 1h 10m  (was: 1h)

> Add retracting mode to model proto
> --
>
> Key: BEAM-7965
> URL: https://issues.apache.org/jira/browse/BEAM-7965
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7965) Add retracting mode to model proto

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7965?focusedWorklogId=294222&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294222
 ]

ASF GitHub Bot logged work on BEAM-7965:


Author: ASF GitHub Bot
Created on: 13/Aug/19 22:04
Start Date: 13/Aug/19 22:04
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9329: [BEAM-7965] add 
retracting mode to model proto
URL: https://github.com/apache/beam/pull/9329#issuecomment-521026973
 
 
   I'm pretty sure that Go is using proto 3 and not proto 2
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294222)
Time Spent: 1h  (was: 50m)

> Add retracting mode to model proto
> --
>
> Key: BEAM-7965
> URL: https://issues.apache.org/jira/browse/BEAM-7965
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (BEAM-7970) Regenerate Go SDK proto files in correct version

2019-08-13 Thread Daniel Oliveira (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Oliveira updated BEAM-7970:
--
Status: Open  (was: Triage Needed)

> Regenerate Go SDK proto files in correct version
> 
>
> Key: BEAM-7970
> URL: https://issues.apache.org/jira/browse/BEAM-7970
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-go
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>
> Generated proto files in the Go SDK currently include this bit:
> {{// This is a compile-time assertion to ensure that this generated file}}
> {{// is compatible with the proto package it is being compiled against.}}
> {{// A compilation error at this line likely means your copy of the}}
> {{// proto package needs to be updated.}}
> {{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}}
>  
> This indicates that the protos are being generated as proto v2 for whatever 
> reason. Most likely, as mentioned in this post by someone with a similar 
> issue, it is because the proto generation binary needs to be rebuilt before 
> generating the files again: 
> [https://github.com/golang/protobuf/issues/449#issuecomment-340884839]
> This hasn't caused any errors so far, but might eventually cause errors if we 
> hit version differences between the v2 and v3 protos.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7970) Regenerate Go SDK proto files in correct version

2019-08-13 Thread Daniel Oliveira (JIRA)
Daniel Oliveira created BEAM-7970:
-

 Summary: Regenerate Go SDK proto files in correct version
 Key: BEAM-7970
 URL: https://issues.apache.org/jira/browse/BEAM-7970
 Project: Beam
  Issue Type: Bug
  Components: sdk-go
Reporter: Daniel Oliveira
Assignee: Daniel Oliveira


Generated proto files in the Go SDK currently include this bit:

{{// This is a compile-time assertion to ensure that this generated file}}
{{// is compatible with the proto package it is being compiled against.}}
{{// A compilation error at this line likely means your copy of the}}
{{// proto package needs to be updated.}}
{{const _ = proto.ProtoPackageIsVersion2 // please upgrade the proto package}}

 

This indicates that the protos are being generated as proto v2 for whatever 
reason. Most likely, as mentioned in this post by someone with a similar 
issue, it is because the proto generation binary needs to be rebuilt before 
generating the files again: 
[https://github.com/golang/protobuf/issues/449#issuecomment-340884839]

This hasn't caused any errors so far, but might eventually cause errors if we 
hit version differences between the v2 and v3 protos.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7124) adding kafkaio test in Python validateCrossLanguageRunner

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7124?focusedWorklogId=294211&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294211
 ]

ASF GitHub Bot logged work on BEAM-7124:


Author: ASF GitHub Bot
Created on: 13/Aug/19 21:40
Start Date: 13/Aug/19 21:40
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #8397: [WIP:BEAM-7124] 
adding kafkaio test in python validateCrossLanguageRunner
URL: https://github.com/apache/beam/pull/8397#issuecomment-521019932
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294211)
Time Spent: 50m  (was: 40m)

> adding kafkaio test in Python validateCrossLanguageRunner
> -
>
> Key: BEAM-7124
> URL: https://issues.apache.org/jira/browse/BEAM-7124
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> adding low-level kafkaio test in Python validateCrossLanguageRunner



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-3342) Create a Cloud Bigtable IO connector for Python

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3342?focusedWorklogId=294203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294203
 ]

ASF GitHub Bot logged work on BEAM-3342:


Author: ASF GitHub Bot
Created on: 13/Aug/19 21:29
Start Date: 13/Aug/19 21:29
Worklog Time Spent: 10m 
  Work Description: pabloem commented on issue #8457: [BEAM-3342] Create a 
Cloud Bigtable IO connector for Python
URL: https://github.com/apache/beam/pull/8457#issuecomment-521016687
 
 
   I was out for a few days. Looking again today.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294203)
Time Spent: 35h 10m  (was: 35h)

> Create a Cloud Bigtable IO connector for Python
> ---
>
> Key: BEAM-3342
> URL: https://issues.apache.org/jira/browse/BEAM-3342
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Solomon Duskis
>Assignee: Solomon Duskis
>Priority: Major
>  Time Spent: 35h 10m
>  Remaining Estimate: 0h
>
> I would like to create a Cloud Bigtable python connector.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294192&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294192
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 21:14
Start Date: 13/Aug/19 21:14
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9289: [BEAM-7389] Add 
code examples for ToString page
URL: https://github.com/apache/beam/pull/9289#discussion_r313615763
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/tostring.md
 ##
 @@ -19,9 +19,38 @@ limitations under the License.
 -->
 
 # ToString
+
+<script type="text/javascript">
+localStorage.setItem('language', 'language-py')
+</script>
 Transforms every element in an input collection to a string.
 
-## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
+## Example
+
+Any non-string element can be converted to a string using standard Python
+functions and methods.
+Many I/O transforms such as `TextIO` expect their input elements to be strings.
+
+```py
+{% github_sample /apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/transforms/element_wise/to_string.py tag:to_string %}
+```
+
+Output `PCollection` after *to string*:
 
 Review comment:
   That makes more sense, thanks!
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294192)
Time Spent: 45h 20m  (was: 45h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 45h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
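
Since the excerpt above is about converting elements to strings before I/O,
here is a minimal sketch of the standard-Python approach (illustrative only;
no dedicated Python ToString transform is assumed to exist yet, per the
discussion in this thread):

```py
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | beam.Create([('🍓', 'Strawberry'), ('🥕', 'Carrot')])
        # str() (or json.dumps, ','.join, etc.) produces the strings that
        # text sinks such as WriteToText expect as input elements.
        | 'To string' >> beam.Map(str)
        | beam.io.WriteToText('/tmp/plants', file_name_suffix='.txt'))
```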


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=294188&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294188
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 13/Aug/19 21:13
Start Date: 13/Aug/19 21:13
Worklog Time Spent: 10m 
  Work Description: rosetn commented on pull request #9289: [BEAM-7389] Add 
code examples for ToString page
URL: https://github.com/apache/beam/pull/9289#discussion_r313615158
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/tostring.md
 ##
 @@ -19,9 +19,38 @@ limitations under the License.
 -->
 
 # ToString
+
+<script type="text/javascript">
+localStorage.setItem('language', 'language-py')
+</script>
 Transforms every element in an input collection to a string.
 
-## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
+## Example
+
+Any non-string element can be converted to a string using standard Python
+functions and methods.
 
 Review comment:
   SG for now, we'll also need to think about how this class of colabs would 
change when their corresponding python transforms are added. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294188)
Time Spent: 45h 10m  (was: 45h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 45h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7495) Add support for dynamic worker re-balancing when reading BigQuery data using Cloud Dataflow

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7495?focusedWorklogId=294187&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294187
 ]

ASF GitHub Bot logged work on BEAM-7495:


Author: ASF GitHub Bot
Created on: 13/Aug/19 21:11
Start Date: 13/Aug/19 21:11
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9156: [BEAM-7495] Add 
fine-grained progress reporting
URL: https://github.com/apache/beam/pull/9156#issuecomment-521010817
 
 
   LGTM. Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294187)
Time Spent: 12h 40m  (was: 12.5h)
Remaining Estimate: 491h 20m  (was: 491.5h)

> Add support for dynamic worker re-balancing when reading BigQuery data using 
> Cloud Dataflow
> ---
>
> Key: BEAM-7495
> URL: https://issues.apache.org/jira/browse/BEAM-7495
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-gcp
>Reporter: Aryan Naraghi
>Assignee: Aryan Naraghi
>Priority: Major
>   Original Estimate: 504h
>  Time Spent: 12h 40m
>  Remaining Estimate: 491h 20m
>
> Currently, the BigQuery connector for reading data using the BigQuery Storage 
> API does not support any of the facilities on the source for Dataflow to 
> split streams.
>  
> On the server side, the BigQuery Storage API supports splitting streams at a 
> fraction. By adding support to the connector, we enable Dataflow to split 
> streams, which unlocks dynamic worker re-balancing.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
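
To make "splitting streams at a fraction" concrete, a conceptual sketch (in
Python for brevity; the connector work described above is in the Java SDK,
and the names here are illustrative, not the connector's API):

```py
def try_split_at_fraction(start, stop, current, fraction):
    """Try to split the range [start, stop) at `fraction` of its length.

    Returns (primary, residual) ranges, or None when the proposed split
    point falls inside work that has already been completed.
    """
    split = start + int(fraction * (stop - start))
    if split <= current or split >= stop:
        return None
    primary = (start, split)   # this worker keeps processing this part
    residual = (split, stop)   # handed back to the runner to reschedule
    return primary, residual
```

Dynamic worker re-balancing then amounts to the runner asking busy workers to
give up the residual part of their range so it can be scheduled elsewhere.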


[jira] [Resolved] (BEAM-7820) Add hot key detection to Dataflow Runner

2019-08-13 Thread Sam Rohde (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Rohde resolved BEAM-7820.
-
   Resolution: Fixed
Fix Version/s: 2.16.0
   2.15.0

> Add hot key detection to Dataflow Runner
> 
>
> Key: BEAM-7820
> URL: https://issues.apache.org/jira/browse/BEAM-7820
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Sam Rohde
>Priority: Minor
> Fix For: 2.15.0, 2.16.0
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This tracks adding hot key detection in the Dataflow Runner. 
> There are times when a user's pipeline spuriously slows down due to hot keys. 
> During these times, users are unable to see what the pipeline is doing under 
> the hood. This adds hot key detection to show the user when their 
> pipeline has a hot key.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7667) report GCS throttling time to Dataflow autoscaler

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7667?focusedWorklogId=294184&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294184
 ]

ASF GitHub Bot logged work on BEAM-7667:


Author: ASF GitHub Bot
Created on: 13/Aug/19 21:05
Start Date: 13/Aug/19 21:05
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8973: [BEAM-7667] 
report GCS throttling time to Dataflow autoscaler
URL: https://github.com/apache/beam/pull/8973#issuecomment-521008896
 
 
   LGTM. Thanks.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294184)
Time Spent: 1h 50m  (was: 1h 40m)

> report GCS throttling time to Dataflow autoscaler
> -
>
> Key: BEAM-7667
> URL: https://issues.apache.org/jira/browse/BEAM-7667
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> report GCS throttling time to Dataflow autoscaler.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7667) report GCS throttling time to Dataflow autoscaler

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7667?focusedWorklogId=294185&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294185
 ]

ASF GitHub Bot logged work on BEAM-7667:


Author: ASF GitHub Bot
Created on: 13/Aug/19 21:05
Start Date: 13/Aug/19 21:05
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #8973: [BEAM-7667] 
report GCS throttling time to Dataflow autoscaler
URL: https://github.com/apache/beam/pull/8973#issuecomment-521008937
 
 
   Retest this please
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294185)
Time Spent: 2h  (was: 1h 50m)

> report GCS throttling time to Dataflow autoscaler
> -
>
> Key: BEAM-7667
> URL: https://issues.apache.org/jira/browse/BEAM-7667
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Heejong Lee
>Assignee: Heejong Lee
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> report GCS throttling time to Dataflow autoscaler.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=294179&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294179
 ]

ASF GitHub Bot logged work on BEAM-7969:


Author: ASF GitHub Bot
Created on: 13/Aug/19 21:00
Start Date: 13/Aug/19 21:00
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on pull request #9330: [BEAM-7969] 
Report FnAPI counters as deltas in streaming jobs.
URL: https://github.com/apache/beam/pull/9330#discussion_r313610309
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##
 @@ -400,9 +400,13 @@ public int getSize() {
   private final StreamingDataflowWorkerOptions options;
   private final boolean windmillServiceEnabled;
   private final long clientId;
+
   private final MetricTrackingWindmillServerStub metricTrackingWindmillServer;
   private final CounterSet pendingDeltaCounters = new CounterSet();
   private final CounterSet pendingCumulativeCounters = new CounterSet();
+  private final java.util.concurrent.ConcurrentLinkedQueue 
pendingMonitoringInfos =
 
 Review comment:
   done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294179)
Time Spent: 1h 10m  (was: 1h)

> Streaming Dataflow worker doesn't report FnAPI metrics.
> ---
>
> Key: BEAM-7969
> URL: https://issues.apache.org/jira/browse/BEAM-7969
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=294175&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294175
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:40
Start Date: 13/Aug/19 20:40
Worklog Time Spent: 10m 
  Work Description: stale[bot] commented on issue #7949: [BEAM-3713] Add 
pytest testing infrastructure
URL: https://github.com/apache/beam/pull/7949#issuecomment-521000367
 
 
   This pull request has been marked as stale due to 60 days of inactivity. It 
will be closed in 1 week if no further activity occurs. If you think that’s 
incorrect or this pull request requires a review, please simply write any 
comment. If closed, you can revive the PR at any time and @mention a reviewer 
or discuss it on the d...@beam.apache.org list. Thank you for your 
contributions.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294175)
Time Spent: 3h 40m  (was: 3.5h)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Per 
> [https://nose.readthedocs.io/en/latest/|https://nose.readthedocs.io/en/latest/,]
>  , nose is in maintenance mode.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7965) Add retracting mode to model proto

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7965?focusedWorklogId=294173&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294173
 ]

ASF GitHub Bot logged work on BEAM-7965:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:36
Start Date: 13/Aug/19 20:36
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9329: [BEAM-7965] add 
retracting mode to model proto
URL: https://github.com/apache/beam/pull/9329#issuecomment-520999183
 
 
   I think the failed Go check is because the newly generated protos for Go use 
3.x protobuf while the current Go SDK uses 2.x protobuf. 
   
   I will wait for Robert's feedback on the best way to go (upgrade or stay 
with 2.x).  
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294173)
Time Spent: 50m  (was: 40m)

> Add retracting mode to model proto
> --
>
> Key: BEAM-7965
> URL: https://issues.apache.org/jira/browse/BEAM-7965
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=294170&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294170
 ]

ASF GitHub Bot logged work on BEAM-7969:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:32
Start Date: 13/Aug/19 20:32
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on pull request #9330: [BEAM-7969] 
Report FnAPI counters as deltas in streaming jobs.
URL: https://github.com/apache/beam/pull/9330#discussion_r313599006
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##
 @@ -1867,6 +1873,28 @@ private void sendWorkerUpdatesToDataflowService(
   cumulativeCounters.extractUpdates(false, 
DataflowCounterUpdateExtractor.INSTANCE));
   counterUpdates.addAll(
   
deltaCounters.extractModifiedDeltaUpdates(DataflowCounterUpdateExtractor.INSTANCE));
+  if (hasExperiment(options, "beam_fn_api")) {
+while (!this.pendingMonitoringInfos.isEmpty()) {
+  final CounterUpdate item = this.pendingMonitoringInfos.poll();
 
 Review comment:
   Non-cumulative means delta.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294170)
Time Spent: 1h  (was: 50m)

> Streaming Dataflow worker doesn't report FnAPI metrics.
> ---
>
> Key: BEAM-7969
> URL: https://issues.apache.org/jira/browse/BEAM-7969
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=294167&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294167
 ]

ASF GitHub Bot logged work on BEAM-7969:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:25
Start Date: 13/Aug/19 20:25
Worklog Time Spent: 10m 
  Work Description: edre commented on pull request #9330: [BEAM-7969] 
Report FnAPI counters as deltas in streaming jobs.
URL: https://github.com/apache/beam/pull/9330#discussion_r313595896
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##
 @@ -1867,6 +1873,28 @@ private void sendWorkerUpdatesToDataflowService(
   cumulativeCounters.extractUpdates(false, 
DataflowCounterUpdateExtractor.INSTANCE));
   counterUpdates.addAll(
   
deltaCounters.extractModifiedDeltaUpdates(DataflowCounterUpdateExtractor.INSTANCE));
+  if (hasExperiment(options, "beam_fn_api")) {
+while (!this.pendingMonitoringInfos.isEmpty()) {
+  final CounterUpdate item = this.pendingMonitoringInfos.poll();
 
 Review comment:
   It's ok to just report the full value as the delta, but only as long as we 
only report counters when finishing the work item.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294167)
Time Spent: 50m  (was: 40m)

> Streaming Dataflow worker doesn't report FnAPI metrics.
> ---
>
> Key: BEAM-7969
> URL: https://issues.apache.org/jira/browse/BEAM-7969
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
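
For readers skimming this thread: the cumulative-versus-delta bookkeeping
being discussed can be sketched as follows (hypothetical names; this is not
the Dataflow worker code):

```py
class DeltaConverter:
    """Turns cumulative counter readings into deltas since the last report."""

    def __init__(self):
        self._last_reported = {}  # counter name -> last cumulative value

    def to_delta(self, name, cumulative_value):
        delta = cumulative_value - self._last_reported.get(name, 0)
        self._last_reported[name] = cumulative_value
        return delta
```

As the review comment above notes, when counters are reported exactly once,
at the point a work item finishes, the cumulative value and the delta
coincide and no per-counter state like this is needed.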


[jira] [Work logged] (BEAM-7856) BigQuery table creation race condition error when executing pipeline on multiple workers

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7856?focusedWorklogId=294164&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294164
 ]

ASF GitHub Bot logged work on BEAM-7856:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:24
Start Date: 13/Aug/19 20:24
Worklog Time Spent: 10m 
  Work Description: chamikaramj commented on issue #9204: [BEAM-7856] 
Suppress error on table bigquery table already exists
URL: https://github.com/apache/beam/pull/9204#issuecomment-520994785
 
 
   LGTM for getting this in as a short term fix. But prob. file a JIRA for a 
proper fix.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294164)
Time Spent: 1h  (was: 50m)

> BigQuery table creation race condition error when executing pipeline on 
> multiple workers
> 
>
> Key: BEAM-7856
> URL: https://issues.apache.org/jira/browse/BEAM-7856
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> This is a non-fatal issue and just prints errors in the logs, as far as I can 
> tell.
> The issue occurs when we check for and create a BigQuery table on multiple 
> workers at the same time, which causes the race condition.
>  
> {noformat}
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 157, in _execute response = task() File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 190, in  self._execute(lambda: worker.do_instruction(work), 
> work) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction request.instruction_id) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle bundle_processor.process_bundle(instruction_id)) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 593, in process_bundle data.ptransform_id].process_encoded(data.data) 
> File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 143, in process_encoded self.output(decoded_value) File 
> "apache_beam/runners/worker/operations.py", line 255, in 
> apache_beam.runners.worker.operations.Operation.output def output(self, 
> windowed_value, output_index=0): File 
> "apache_beam/runners/worker/operations.py", line 256, in 
> apache_beam.runners.worker.operations.Operation.output cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 143, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive 
> self.consumer.process(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 593, in 
> apache_beam.runners.worker.operations.DoOperation.process with 
> self.scoped_process_state: File "apache_beam/runners/worker/operations.py", 
> line 594, in apache_beam.runners.worker.operations.DoOperation.process 
> delayed_application = self.dofn_receiver.receive(o) File 
> "apache_beam/runners/common.py", line 799, in 
> apache_beam.runners.common.DoFnRunner.receive self.process(windowed_value) 
> File "apache_beam/runners/common.py", line 805, in 
> apache_beam.runners.common.DoFnRunner.process self._reraise_augmented(exn) 
> File "apache_beam/runners/common.py", line 857, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented raise File 
> "apache_beam/runners/common.py", line 803, in 
> apache_beam.runners.common.DoFnRunner.process return 
> self.do_fn_invoker.invoke_process(windowed_value) File 
> "apache_beam/runners/common.py", line 610, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process 
> self._invoke_process_per_window( File "apache_beam/runners/common.py", line 
> 682, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window 
> output_processor.process_outputs( File "apache_beam/runners/common.py", line 
> 903, in apache_beam.runners.common._OutputProcessor.process_outputs def 
> process_outputs(self, windowed_input_element, results): File 
> "apache_beam/runners/common.py", line 942, in 
> apache_beam.runners.common._OutputProcessor.process_outputs 
> self.main_receivers.receive(windowed_value) File 
> "apache_beam/runners/worker/operations.py", line 

[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=294165&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294165
 ]

ASF GitHub Bot logged work on BEAM-7969:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:24
Start Date: 13/Aug/19 20:24
Worklog Time Spent: 10m 
  Work Description: ajamato commented on pull request #9330: [BEAM-7969] 
Report FnAPI counters as deltas in streaming jobs.
URL: https://github.com/apache/beam/pull/9330#discussion_r313595441
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##
 @@ -400,9 +400,13 @@ public int getSize() {
   private final StreamingDataflowWorkerOptions options;
   private final boolean windmillServiceEnabled;
   private final long clientId;
+
   private final MetricTrackingWindmillServerStub metricTrackingWindmillServer;
   private final CounterSet pendingDeltaCounters = new CounterSet();
   private final CounterSet pendingCumulativeCounters = new CounterSet();
+  private final java.util.concurrent.ConcurrentLinkedQueue 
pendingMonitoringInfos =
 
 Review comment:
   Can you please add some comments about how pendingMonitoringInfos is 
concurrently read/written to? I.e., which threads are running concurrently and 
using it?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294165)
Time Spent: 40m  (was: 0.5h)

> Streaming Dataflow worker doesn't report FnAPI metrics.
> ---
>
> Key: BEAM-7969
> URL: https://issues.apache.org/jira/browse/BEAM-7969
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=294163&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294163
 ]

ASF GitHub Bot logged work on BEAM-7969:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:23
Start Date: 13/Aug/19 20:23
Worklog Time Spent: 10m 
  Work Description: ajamato commented on pull request #9330: [BEAM-7969] 
Report FnAPI counters as deltas in streaming jobs.
URL: https://github.com/apache/beam/pull/9330#discussion_r313595026
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##
 @@ -1867,6 +1873,28 @@ private void sendWorkerUpdatesToDataflowService(
   cumulativeCounters.extractUpdates(false, 
DataflowCounterUpdateExtractor.INSTANCE));
   counterUpdates.addAll(
   
deltaCounters.extractModifiedDeltaUpdates(DataflowCounterUpdateExtractor.INSTANCE));
+  if (hasExperiment(options, "beam_fn_api")) {
+while (!this.pendingMonitoringInfos.isEmpty()) {
+  final CounterUpdate item = this.pendingMonitoringInfos.poll();
+
+  // This change will treat counter as delta.
+  // This is required because we receive cumulative results from FnAPI 
harness,
+  // while streaming job is expected to receive delta updates to 
counters on same
+  // WorkItem.
+  if (item.getCumulative()) {
+item.setCumulative(false);
+  } else {
+// In current world all counters coming from FnAPI are cumulative.
+// This is a safety check in case new counter type appears in 
FnAPI.
+throw new UnsupportedOperationException(
+"FnApi counters are expected to provide cumulative values."
++ " Please, update convertion to delta logic"
++ " if non-cumulative counter type is required.");
+  }
+
 
 Review comment:
   This method essentially has 3 inputs correct? pendingMonitoringInfos, 
deltaCounters and cumulativeCounters.
   And one output counterUpdates
   
   Can you write some unit tests which verifies that counterUpdates is 
populated properly?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294163)
Time Spent: 0.5h  (was: 20m)

> Streaming Dataflow worker doesn't report FnAPI metrics.
> ---
>
> Key: BEAM-7969
> URL: https://issues.apache.org/jira/browse/BEAM-7969
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294162&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294162
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:22
Start Date: 13/Aug/19 20:22
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r313594719
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroUtils.java
 ##
 @@ -1,40 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.io;
-
-import java.io.Serializable;
-import org.apache.avro.Schema;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Suppliers;
-
-/** Helpers for working with Avro. */
-class AvroUtils {
 
 Review comment:
   I asked on dev@, but we have probably already broken it a couple of times.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294162)
Time Spent: 3h 50m  (was: 3h 40m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection of Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=294161&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294161
 ]

ASF GitHub Bot logged work on BEAM-7969:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:21
Start Date: 13/Aug/19 20:21
Worklog Time Spent: 10m 
  Work Description: ajamato commented on pull request #9330: [BEAM-7969] 
Report FnAPI counters as deltas in streaming jobs.
URL: https://github.com/apache/beam/pull/9330#discussion_r313594324
 
 

 ##
 File path: 
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorker.java
 ##
 @@ -1867,6 +1873,28 @@ private void sendWorkerUpdatesToDataflowService(
   cumulativeCounters.extractUpdates(false, 
DataflowCounterUpdateExtractor.INSTANCE));
   counterUpdates.addAll(
   
deltaCounters.extractModifiedDeltaUpdates(DataflowCounterUpdateExtractor.INSTANCE));
+  if (hasExperiment(options, "beam_fn_api")) {
+while (!this.pendingMonitoringInfos.isEmpty()) {
+  final CounterUpdate item = this.pendingMonitoringInfos.poll();
 
 Review comment:
   The SDK harness will give you the full value of the counter on each 
MonitoringInfo. Don't you need to convert it to a delta relative to the last 
reported value? Or does setting the cumulative bit make the DFE solve this 
internally?
   
   You might want to confirm with @edre.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294161)
Time Spent: 20m  (was: 10m)

> Streaming Dataflow worker doesn't report FnAPI metrics.
> ---
>
> Key: BEAM-7969
> URL: https://issues.apache.org/jira/browse/BEAM-7969
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7816) Support Avro dates in Schemas

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7816?focusedWorklogId=294160&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294160
 ]

ASF GitHub Bot logged work on BEAM-7816:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:21
Start Date: 13/Aug/19 20:21
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #9152: [BEAM-7816] 
[BEAM-7817] Support Avro dates and enums in Schemas
URL: https://github.com/apache/beam/pull/9152
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294160)
Time Spent: 1h  (was: 50m)

> Support Avro dates in Schemas
> -
>
> Key: BEAM-7816
> URL: https://issues.apache.org/jira/browse/BEAM-7816
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Schema coders for "logicalType=date" don't work. Avro dates should become 
> Schema DATETIME until we have a logical type for Date in java-sdks-core.
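
For reference, a small sketch of the kind of Avro schema in question, built 
with the standard Avro Java API (record and field names are illustrative):

{code:java}
import org.apache.avro.LogicalTypes;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;

public class DateSchemaExample {
  public static void main(String[] args) {
    // An Avro "date" logical type annotates an int counting days since
    // the epoch. Per this issue, converting such a schema to a Beam
    // Schema should yield a DATETIME field until java-sdks-core gains a
    // dedicated Date logical type.
    Schema avroSchema =
        SchemaBuilder.record("User").fields()
            .name("birthDate")
            .type(LogicalTypes.date().addToSchema(Schema.create(Schema.Type.INT)))
            .noDefault()
            .endRecord();
    System.out.println(avroSchema.toString(true));
  }
}
{code}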



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294159&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294159
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:20
Start Date: 13/Aug/19 20:20
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r313593893
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -127,7 +127,7 @@ public static FixedBytesField withSize(int size) {
 
 /** Create a {@link FixedBytesField} from a Beam {@link FieldType}. */
 @Nullable
-public static FixedBytesField fromBeamFieldType(FieldType fieldType) {
+static FixedBytesField fromBeamFieldType(FieldType fieldType) {
 
 Review comment:
   Yes, you are right, but that class does a lot of other magic too. We should 
probably decide on that later on, but I think it is important because these are 
user-friendly fixes and the current uber class hides multiple things.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294159)
Time Spent: 3h 40m  (was: 3.5h)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7965) Add retracting mode to model proto

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7965?focusedWorklogId=294153&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294153
 ]

ASF GitHub Bot logged work on BEAM-7965:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:15
Start Date: 13/Aug/19 20:15
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9329: [BEAM-7965] add 
retracting mode to model proto
URL: https://github.com/apache/beam/pull/9329#issuecomment-520991543
 
 
   Thanks Luke for your pointer! 
   
   also R: @lostluck as this change affects Go SDK
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294153)
Time Spent: 40m  (was: 0.5h)

> Add retracting mode to model proto
> --
>
> Key: BEAM-7965
> URL: https://issues.apache.org/jira/browse/BEAM-7965
> Project: Beam
>  Issue Type: Sub-task
>  Components: beam-model
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7969?focusedWorklogId=294149&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294149
 ]

ASF GitHub Bot logged work on BEAM-7969:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:08
Start Date: 13/Aug/19 20:08
Worklog Time Spent: 10m 
  Work Description: Ardagan commented on issue #9330: [BEAM-7969] Report 
FnAPI counters as deltas in streaming jobs.
URL: https://github.com/apache/beam/pull/9330#issuecomment-520989316
 
 
   @rohdesamuel 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294149)
Time Spent: 10m
Remaining Estimate: 0h

> Streaming Dataflow worker doesn't report FnAPI metrics.
> ---
>
> Key: BEAM-7969
> URL: https://issues.apache.org/jira/browse/BEAM-7969
> Project: Beam
>  Issue Type: Bug
>  Components: java-fn-execution, runner-dataflow
>Reporter: Mikhail Gryzykhin
>Assignee: Mikhail Gryzykhin
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294144&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294144
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:04
Start Date: 13/Aug/19 20:04
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r313587439
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroUtils.java
 ##
 @@ -1,40 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.io;
-
-import java.io.Serializable;
-import org.apache.avro.Schema;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Suppliers;
-
-/** Helpers for working with Avro. */
-class AvroUtils {
 
 Review comment:
   Actually, I don't know how it works now, but it uses classes from vendored 
guava, whose namespace changes each time we bump the guava version. I'm 
wondering if we already broke it without noticing.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294144)
Time Spent: 3.5h  (was: 3h 20m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294142&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294142
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:03
Start Date: 13/Aug/19 20:03
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r313587102
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroUtils.java
 ##
 @@ -1,40 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.io;
-
-import java.io.Serializable;
-import org.apache.avro.Schema;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Suppliers;
-
-/** Helpers for working with Avro. */
-class AvroUtils {
 
 Review comment:
   But we can't change it, or it will break Java serialization.
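
A minimal illustration of the hazard being discussed, assuming a simplified 
stand-in for `AvroCoder.SerializableSchemaSupplier` (not the actual Beam code):

{code:java}
import java.io.Serializable;
import org.apache.avro.Schema;
import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier;

// Because this class implements the *vendored* guava Supplier, its
// default serialVersionUID is computed from, among other things, the
// vendored interface name (org.apache.beam.vendor.guava.v26_0_jre...).
// Re-vendoring guava under a new version prefix changes that name, so
// Java deserialization of previously written instances fails.
class SerializableSchemaSupplier implements Supplier<Schema>, Serializable {
  private final String schemaJson;

  SerializableSchemaSupplier(String schemaJson) {
    this.schemaJson = schemaJson;
  }

  @Override
  public Schema get() {
    return new Schema.Parser().parse(schemaJson);
  }
}
{code}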
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294142)
Time Spent: 3h 20m  (was: 3h 10m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294141&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294141
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:03
Start Date: 13/Aug/19 20:03
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r313586988
 
 

 ##
 File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroUtils.java
 ##
 @@ -1,40 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.beam.sdk.io;
-
-import java.io.Serializable;
-import org.apache.avro.Schema;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Function;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Supplier;
-import org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Suppliers;
-
-/** Helpers for working with Avro. */
-class AvroUtils {
 
 Review comment:
   `AvroCoder.SerializableSchemaSupplier`
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294141)
Time Spent: 3h 10m  (was: 3h)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294140&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294140
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 13/Aug/19 20:01
Start Date: 13/Aug/19 20:01
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r313586430
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -127,7 +127,7 @@ public static FixedBytesField withSize(int size) {
 
 /** Create a {@link FixedBytesField} from a Beam {@link FieldType}. */
 @Nullable
-public static FixedBytesField fromBeamFieldType(FieldType fieldType) {
+static FixedBytesField fromBeamFieldType(FieldType fieldType) {
 
 Review comment:
   If you ask me, I would rather make the whole class protected (or delete it), 
hiding everything, and move all user-facing functionality to AvroCoder.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294140)
Time Spent: 3h  (was: 2h 50m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (BEAM-7969) Streaming Dataflow worker doesn't report FnAPI metrics.

2019-08-13 Thread Mikhail Gryzykhin (JIRA)
Mikhail Gryzykhin created BEAM-7969:
---

 Summary: Streaming Dataflow worker doesn't report FnAPI metrics.
 Key: BEAM-7969
 URL: https://issues.apache.org/jira/browse/BEAM-7969
 Project: Beam
  Issue Type: Bug
  Components: java-fn-execution, runner-dataflow
Reporter: Mikhail Gryzykhin
Assignee: Mikhail Gryzykhin


EOM



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294137
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 13/Aug/19 19:59
Start Date: 13/Aug/19 19:59
Worklog Time Spent: 10m 
  Work Description: iemejia commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r313585482
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -127,7 +127,7 @@ public static FixedBytesField withSize(int size) {
 
 /** Create a {@link FixedBytesField} from a Beam {@link FieldType}. */
 @Nullable
-public static FixedBytesField fromBeamFieldType(FieldType fieldType) {
+static FixedBytesField fromBeamFieldType(FieldType fieldType) {
 
 Review comment:
   I am going to revert the access commit then, but I still don't see the 
point. The complete `AvroUtils` class is `@Experimental`, which means we are in 
the perfect moment to refine those aspects. We will have a harder time doing 
so in the future, but I suppose that's not a big deal.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294137)
Time Spent: 2h 50m  (was: 2h 40m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=294136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294136
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 13/Aug/19 19:58
Start Date: 13/Aug/19 19:58
Worklog Time Spent: 10m 
  Work Description: lukecwik commented on issue #9189: [BEAM-5820] vendor 
calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-520985656
 
 
   > > We'll need to break this PR up into two pieces eventually:
   > > 
   > > 1. A piece which just vendors calcite (so that it can go through the 
release process)
   > > 2. The change which consumes vendored calcite.
   > 
   > Yes. If `vendor/calcite-1_20_0/build.gradle` looks good, I'll open a 
self-contained PR to build the vendored calcite package, and we can go through 
the release process.
   
   Yes, the build.gradle looks fine, but I didn't validate whether the vendored 
calcite artifact actually works. I'm relying on your proof of concept to 
validate that the contents are good and meet the Beam SQL needs. If it doesn't 
work, you can always produce another updated version and release 0.2.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294136)
Time Spent: 3h 20m  (was: 3h 10m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=294130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294130
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 13/Aug/19 19:52
Start Date: 13/Aug/19 19:52
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on pull request #9189: [BEAM-5820] 
vendor calcite
URL: https://github.com/apache/beam/pull/9189#discussion_r313582544
 
 

 ##
 File path: sdks/java/extensions/sql/jdbc/build.gradle
 ##
 @@ -53,32 +50,13 @@ processResources {
   ]
 }
 
-shadowJar {
-  manifest {
-attributes "Main-Class": 
"org.apache.beam.sdk.extensions.sql.jdbc.BeamSqlLine"
 
 Review comment:
   cc: @akedin 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294130)
Time Spent: 3h 10m  (was: 3h)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-5820) Vendor Calcite

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-5820?focusedWorklogId=294129=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294129
 ]

ASF GitHub Bot logged work on BEAM-5820:


Author: ASF GitHub Bot
Created on: 13/Aug/19 19:51
Start Date: 13/Aug/19 19:51
Worklog Time Spent: 10m 
  Work Description: vectorijk commented on issue #9189: [BEAM-5820] vendor 
calcite
URL: https://github.com/apache/beam/pull/9189#issuecomment-520983060
 
 
   > We'll need to break this PR up into two pieces eventually:
   > 
   > 1. A piece which just vendors calcite (so that it can go through the 
release process)
   > 2. The change which consumes vendored calcite.
   
   Yes. If `vendor/calcite-1_20_0/build.gradle` looks good, I'll open a 
self-contained PR to build the vendored calcite package, and we can go through 
the release process.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294129)
Time Spent: 3h  (was: 2h 50m)

> Vendor Calcite
> --
>
> Key: BEAM-5820
> URL: https://issues.apache.org/jira/browse/BEAM-5820
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql
>Reporter: Kenneth Knowles
>Assignee: Kai Jiang
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-08-13 Thread Lukasz Gajowy (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906565#comment-16906565
 ] 

Lukasz Gajowy commented on BEAM-6923:
-

It was a local setup - I've only run ./start-cluster.sh from a freshly 
downloaded Flink (no custom setup). FWIW, I've reproduced the error on Flink 
1.5.2 and 1.7.0.

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, heapdump size-sorted.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294121&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294121
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 13/Aug/19 19:40
Start Date: 13/Aug/19 19:40
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #9130: [BEAM-7802] Expose a 
method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#issuecomment-520978853
 
 
   Changing AvroCoder will definitely break compatibility, especially for 
streaming pipelines reading from PubSub or Kafka. In addition, SchemaCoder for 
Avro isn't as good (yet) as AvroCoder. As an example, it would serialize enums 
as strings, which is very inefficient when shuffling data. Another source of 
problems is that it doesn't support all Avro features. I believe once it 
matures it could be the default, but we aren't there yet. In any case, I think 
it's a good exercise to think about where we want to put SchemaCoder and how we 
are going to evolve AvroCoder, so we should probably start a thread on dev@.
   
   The code looks good. I agree with and support your motivation for making 
fewer things public, but I don't find it practical to break this now, given 
that we know for sure there are codebases relying on it being public to avoid 
limitations of existing APIs, so I propose to postpone this until things 
stabilize.
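
As a rough illustration of the enum-encoding point above (size arithmetic 
only, not Beam coder code; the enum is a made-up example):

{code:java}
import java.nio.charset.StandardCharsets;

public class EnumEncodingCost {
  enum Color { MAGENTA, TURQUOISE }

  public static void main(String[] args) {
    // Name-based encoding: a length prefix plus the UTF-8 bytes of the
    // constant name, paid per element.
    int nameBytes = 1 + Color.MAGENTA.name().getBytes(StandardCharsets.UTF_8).length;
    // Index-based encoding: a single varint byte for small enums.
    int indexBytes = 1;
    // Shuffling billions of elements multiplies this per-element overhead.
    System.out.printf("per-element: %d bytes vs %d byte%n", nameBytes, indexBytes);
  }
}
{code}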
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294121)
Time Spent: 2h 40m  (was: 2.5h)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-6923) OOM errors in jobServer when using GCS artifactDir

2019-08-13 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906557#comment-16906557
 ] 

Ankur Goenka commented on BEAM-6923:


Hi [~ŁukaszG], I have not looked into this yet. I will try to take a look at it 
this week.

I am most likely going to run the pipeline.

Just to check, do you use a local machine or a cluster to run the test?

> OOM errors in jobServer when using GCS artifactDir
> --
>
> Key: BEAM-6923
> URL: https://issues.apache.org/jira/browse/BEAM-6923
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-java-harness
>Reporter: Lukasz Gajowy
>Priority: Major
> Attachments: Instance counts.png, Paths to GC root.png, 
> Telemetries.png, heapdump size-sorted.png
>
>
> When starting jobServer with artifactDir pointing to a GCS bucket: 
> {code:java}
> ./gradlew :beam-runners-flink_2.11-job-server:runShadow 
> -PflinkMasterUrl=localhost:8081 -PartifactsDir=gs://the-bucket{code}
> and running a Java portable pipeline with the following, portability related 
> pipeline options: 
> {code:java}
> --runner=PortableRunner --jobEndpoint=localhost:8099 
> --defaultEnvironmentType=DOCKER 
> --defaultEnvironmentConfig=gcr.io//java:latest'{code}
>  
> I'm facing a series of OOM errors, like this: 
> {code:java}
> Exception in thread "grpc-default-executor-3" java.lang.OutOfMemoryError: 
> Java heap space
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.buildContentChunk(MediaHttpUploader.java:606)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.resumableUpload(MediaHttpUploader.java:408)
> at 
> com.google.api.client.googleapis.media.MediaHttpUploader.upload(MediaHttpUploader.java:336)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:508)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:432)
> at 
> com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:549)
> at 
> com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel$UploadOperation.call(AbstractGoogleAsyncWriteChannel.java:301)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745){code}
>  
> This does not happen when I'm using a local filesystem for the artifact 
> staging location. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (BEAM-6777) SDK Harness Resilience

2019-08-13 Thread Oded Valtzer (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906554#comment-16906554
 ] 

Oded Valtzer commented on BEAM-6777:


I am referring to a Python streaming pipeline using the latest SDK (this 
happened in previous versions as well) on Google Dataflow.
I can provide links to pipelines in our project that reached this state, 
eventually stopped doing any work, and were cancelled by us. Is that what you 
are looking for?

> SDK Harness Resilience
> --
>
> Key: BEAM-6777
> URL: https://issues.apache.org/jira/browse/BEAM-6777
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-dataflow
>Reporter: Sam Rohde
>Assignee: Yueyang Qiu
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> If the Python SDK Harness crashes in any way (user code exception, OOM, etc) 
> the job will hang and waste resources. The fix is to add a daemon in the SDK 
> Harness and Runner Harness to communicate with Dataflow to restart the VM 
> when stuckness is detected.
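
A minimal sketch of the kind of watchdog described above; purely illustrative, 
with an assumed ten-minute silence threshold and a placeholder restart hook:

{code:java}
import java.time.Duration;
import java.time.Instant;

class HarnessWatchdog implements Runnable {
  // Bumped by the harness whenever a work item makes progress.
  private volatile Instant lastProgress = Instant.now();
  private final Duration maxSilence = Duration.ofMinutes(10);

  void recordProgress() {
    lastProgress = Instant.now();
  }

  // Intended to run periodically, e.g. on a ScheduledExecutorService.
  @Override
  public void run() {
    if (Duration.between(lastProgress, Instant.now()).compareTo(maxSilence) > 0) {
      // Placeholder: the real fix would signal Dataflow to restart the VM.
      System.err.println("SDK harness appears stuck; requesting restart");
    }
  }
}
{code}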



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7802) Expose a method to make an Avro-based PCollection into an Schema-based one

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7802?focusedWorklogId=294114&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294114
 ]

ASF GitHub Bot logged work on BEAM-7802:


Author: ASF GitHub Bot
Created on: 13/Aug/19 19:24
Start Date: 13/Aug/19 19:24
Worklog Time Spent: 10m 
  Work Description: kanterov commented on pull request #9130: [BEAM-7802] 
Expose a method to make an Avro-based PCollection into an Schema-based one
URL: https://github.com/apache/beam/pull/9130#discussion_r313571429
 
 

 ##
 File path: 
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
 ##
 @@ -127,7 +127,7 @@ public static FixedBytesField withSize(int size) {
 
 /** Create a {@link FixedBytesField} from a Beam {@link FieldType}. */
 @Nullable
-public static FixedBytesField fromBeamFieldType(FieldType fieldType) {
+static FixedBytesField fromBeamFieldType(FieldType fieldType) {
 
 Review comment:
   I agree, but at this stage, it doesn't seem practical for me, because by 
reducing access we are going to break existing code and make it problematic to 
upgrade it to the next Beam version. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294114)
Time Spent: 2.5h  (was: 2h 20m)

> Expose a method to make an Avro-based PCollection into an Schema-based one
> --
>
> Key: BEAM-7802
> URL: https://issues.apache.org/jira/browse/BEAM-7802
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Minor
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Avro can infer the Schema for an Avro-based PCollection by using the 
> `withBeamSchemas` method; however, if the user created a PCollection with Avro 
> objects or IndexedRecord/GenericRecord, they need to manually set the schema 
> (or coder). The idea is to expose a method in schema.AvroUtils to ease this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Work logged] (BEAM-7816) Support Avro dates in Schemas

2019-08-13 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7816?focusedWorklogId=294109&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-294109
 ]

ASF GitHub Bot logged work on BEAM-7816:


Author: ASF GitHub Bot
Created on: 13/Aug/19 19:10
Start Date: 13/Aug/19 19:10
Worklog Time Spent: 10m 
  Work Description: kanterov commented on issue #9152: [BEAM-7816] 
[BEAM-7817] Support Avro dates and enums in Schemas
URL: https://github.com/apache/beam/pull/9152#issuecomment-520968270
 
 
   @RyanSkraba Thanks! `time-millis` isn't supported because I didn't find any 
use case for it in our codebase (or any other) and was too lazy to code it, but 
it could be added in a separate PR if needed. You are right regarding 
`timestamp-micros`: it isn't possible to map it onto `DATETIME` without losing 
precision, but that should become possible once we change the representation of 
`DATETIME`.
   
   @reuvenlax I'm going to continue and merge this PR given that it got a 
review. Let me know if you have questions or suggestions.
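
A small sketch of the `timestamp-micros` precision point above, with a 
hypothetical value (Beam's DATETIME is currently millisecond-based):

{code:java}
public class MicrosPrecision {
  public static void main(String[] args) {
    long micros = 1_565_728_801_123_456L; // hypothetical timestamp-micros
    long millis = micros / 1_000L;        // DATETIME granularity: 1_565_728_801_123
    long back = millis * 1_000L;          // 1_565_728_801_123_000 != micros
    System.out.println(micros - back);    // prints 456: microseconds lost
  }
}
{code}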
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 294109)
Time Spent: 50m  (was: 40m)

> Support Avro dates in Schemas
> -
>
> Key: BEAM-7816
> URL: https://issues.apache.org/jira/browse/BEAM-7816
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Schema coders for "logicalType=date" don't work. Avro dates should become 
> Schema DATETIME until we have a logical type for Date in java-sdks-core.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (BEAM-7546) Portable WordCount-on-Flink Precommit is flaky - temporary folder not found.

2019-08-13 Thread Ankur Goenka (JIRA)


 [ 
https://issues.apache.org/jira/browse/BEAM-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur Goenka resolved BEAM-7546.

   Resolution: Fixed
Fix Version/s: 2.15.0

> Portable WordCount-on-Flink Precommit is flaky - temporary folder not found.
> 
>
> Key: BEAM-7546
> URL: https://issues.apache.org/jira/browse/BEAM-7546
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
> Fix For: 2.15.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> On a few occasions I see this test fail due to a temp directory being missing.
> Sample scan from 
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/745/: 
> https://scans.gradle.com/s/3ra4xw4hqvlyw/console-log?task=:sdks:python:portableWordCountBatch
> {noformat}
> [grpc-default-executor-0] ERROR sdk_worker._execute - Error processing 
> instruction 8. Original traceback is
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 157, in _execute
> response = task()
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 190, in 
> self._execute(lambda: worker.do_instruction(work), work)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction
> request.instruction_id)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle
> bundle_processor.process_bundle(instruction_id))
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 589, in process_bundle
> ].process_encoded(data.data)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 143, in process_encoded
> self.output(decoded_value)
>   File "apache_beam/runners/worker/operations.py", line 255, in 
> apache_beam.runners.worker.operations.Operation.output
> def output(self, windowed_value, output_index=0):
>   File "apache_beam/runners/worker/operations.py", line 256, in 
> apache_beam.runners.worker.operations.Operation.output
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 143, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive
> self.consumer.process(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 593, in 
> apache_beam.runners.worker.operations.DoOperation.process
> with self.scoped_process_state:
>   File "apache_beam/runners/worker/operations.py", line 594, in 
> apache_beam.runners.worker.operations.DoOperation.process
> delayed_application = self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 778, in 
> apache_beam.runners.common.DoFnRun
> ner.receive
> self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
> self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
> raise_with_traceback(new_exn)
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
> return self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 594, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
> self._invoke_process_per_window(
>   File "apache_beam/runners/common.py", line 666, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
> windowed_value, self.process_method(*args_for_process))
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/io/iobase.py", 
> line 1041, in process
> self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/value_provider.py",
>  line 137, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 185, in open_writer
> return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 389, in __init__
> self.temp_handle = self.sink.open(temp_shard_path)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/io/textio.py", 
> line 391, in open
> file_handle = super(_TextSink, self).open(temp_path)
>   File 
> 

[jira] [Commented] (BEAM-7546) Portable WordCount-on-Flink Precommit is flaky - temporary folder not found.

2019-08-13 Thread Ankur Goenka (JIRA)


[ 
https://issues.apache.org/jira/browse/BEAM-7546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906531#comment-16906531
 ] 

Ankur Goenka commented on BEAM-7546:


Shall we close this issue, as it's not happening anymore?

> Portable WordCount-on-Flink Precommit is flaky - temporary folder not found.
> 
>
> Key: BEAM-7546
> URL: https://issues.apache.org/jira/browse/BEAM-7546
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink
>Reporter: Valentyn Tymofieiev
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> On a few occasions I see this test fail due to a temp directory being missing.
> Sample scan from 
> https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/745/: 
> https://scans.gradle.com/s/3ra4xw4hqvlyw/console-log?task=:sdks:python:portableWordCountBatch
> {noformat}
> [grpc-default-executor-0] ERROR sdk_worker._execute - Error processing 
> instruction 8. Original traceback is
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 157, in _execute
> response = task()
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 190, in 
> self._execute(lambda: worker.do_instruction(work), work)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 342, in do_instruction
> request.instruction_id)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 368, in process_bundle
> bundle_processor.process_bundle(instruction_id))
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 589, in process_bundle
> ].process_encoded(data.data)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 143, in process_encoded
> self.output(decoded_value)
>   File "apache_beam/runners/worker/operations.py", line 255, in 
> apache_beam.runners.worker.operations.Operation.output
> def output(self, windowed_value, output_index=0):
>   File "apache_beam/runners/worker/operations.py", line 256, in 
> apache_beam.runners.worker.operations.Operation.output
> cython.cast(Receiver, 
> self.receivers[output_index]).receive(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 143, in 
> apache_beam.runners.worker.operations.SingletonConsumerSet.receive
> self.consumer.process(windowed_value)
>   File "apache_beam/runners/worker/operations.py", line 593, in 
> apache_beam.runners.worker.operations.DoOperation.process
> with self.scoped_process_state:
>   File "apache_beam/runners/worker/operations.py", line 594, in 
> apache_beam.runners.worker.operations.DoOperation.process
> delayed_application = self.dofn_receiver.receive(o)
>   File "apache_beam/runners/common.py", line 778, in 
> apache_beam.runners.common.DoFnRun
> ner.receive
> self.process(windowed_value)
>   File "apache_beam/runners/common.py", line 784, in 
> apache_beam.runners.common.DoFnRunner.process
> self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 851, in 
> apache_beam.runners.common.DoFnRunner._reraise_augmented
> raise_with_traceback(new_exn)
>   File "apache_beam/runners/common.py", line 782, in 
> apache_beam.runners.common.DoFnRunner.process
> return self.do_fn_invoker.invoke_process(windowed_value)
>   File "apache_beam/runners/common.py", line 594, in 
> apache_beam.runners.common.PerWindowInvoker.invoke_process
> self._invoke_process_per_window(
>   File "apache_beam/runners/common.py", line 666, in 
> apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
> windowed_value, self.process_method(*args_for_process))
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/io/iobase.py", 
> line 1041, in process
> self.writer = self.sink.open_writer(init_result, str(uuid.uuid4()))
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/options/value_provider.py",
>  line 137, in _f
> return fnc(self, *args, **kwargs)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 185, in open_writer
> return FileBasedSinkWriter(self, os.path.join(init_result, uid) + suffix)
>   File 
> "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", 
> line 389, in __init__
> self.temp_handle = self.sink.open(temp_shard_path)
>   File "/usr/local/lib/python2.7/site-packages/apache_beam/io/textio.py", 
> line 391, in open
> file_handle = super(_TextSink, self).open(temp_path)
>   File 
> 
