[jira] [Created] (BEAM-8098) BigQueryIO needs documentation on how data types in BigQuery and in Beam SDK correspond

2019-08-26 Thread Yueyang Qiu (Jira)
Yueyang Qiu created BEAM-8098:
-

 Summary: BigQueryIO needs documentation on how data types in 
BigQuery and in Beam SDK correspond
 Key: BEAM-8098
 URL: https://issues.apache.org/jira/browse/BEAM-8098
 Project: Beam
  Issue Type: Improvement
  Components: io-java-gcp
Reporter: Yueyang Qiu
Assignee: Yueyang Qiu


While working on [https://github.com/apache/beam/pull/9144], I realized there 
is a gap in BigQueryIO documentation on mapping between data types defined in 
BigQuery and in Beam SDK.

 

For example, if a user reads a BYTES field from BigQuery into Beam, it will be 
represented as java.nio.ByteBuffer type in Beam Java SDK. The user will need to 
do an explicit type cast to ByteBuffer in order to use the data, but there is 
no easy way the user can know which type they should cast to, unless digging 
into BigQueryIO's implementation (Java - Avro - BigQuery).



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8097) Update the release guide

2019-08-26 Thread yifan zou (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yifan zou updated BEAM-8097:

Description: The release guide was modified based on the 2.14 release 
experience ([https://github.com/apache/beam/pull/9319]). But, it is reverted 
since we don't want separate the guide in multiple sections 
([https://github.com/apache/beam/pull/9436]). Please review the reverted guide 
and update the current guide with the up-to-date information.  (was: The 
release guide was modified based on the 2.14 release experience. But, it is 
reverted since we don't want separate the guide in multiple sections. Please 
review the reverted guide and update the current guide with the up-to-date 
information.)

> Update the release guide
> 
>
> Key: BEAM-8097
> URL: https://issues.apache.org/jira/browse/BEAM-8097
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: yifan zou
>Assignee: yifan zou
>Priority: Major
>
> The release guide was modified based on the 2.14 release experience 
> ([https://github.com/apache/beam/pull/9319]). But, it is reverted since we 
> don't want separate the guide in multiple sections 
> ([https://github.com/apache/beam/pull/9436]). Please review the reverted 
> guide and update the current guide with the up-to-date information.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8097) Update the release guide

2019-08-26 Thread yifan zou (Jira)
yifan zou created BEAM-8097:
---

 Summary: Update the release guide
 Key: BEAM-8097
 URL: https://issues.apache.org/jira/browse/BEAM-8097
 Project: Beam
  Issue Type: Improvement
  Components: website
Reporter: yifan zou
Assignee: yifan zou


The release guide was modified based on the 2.14 release experience. But, it is 
reverted since we don't want separate the guide in multiple sections. Please 
review the reverted guide and update the current guide with the up-to-date 
information.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7966) Write portable Flink application jar

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7966?focusedWorklogId=301702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301702
 ]

ASF GitHub Bot logged work on BEAM-7966:


Author: ASF GitHub Bot
Created on: 27/Aug/19 04:50
Start Date: 27/Aug/19 04:50
Worklog Time Spent: 10m 
  Work Description: tweise commented on issue #9331: [BEAM-7966] Write 
portable Flink application jar
URL: https://github.com/apache/beam/pull/9331#issuecomment-525135984
 
 
   Thanks for working on this! I think this may work very well with the k8s 
deployment plans.
   
   Related to prior discussion on the list: The jar file in its current form 
materializes the pipeline configuration, i.e. any user option that influences 
how transforms are configured or even the changes the shape of the pipeline. 
While the latter is probably a smaller percentage of use cases, the former 
isn't uncommon to adopt to different environments (dev/staging/production). For 
example, the database connection URL may be a parameter sourced from an 
environment variable or provided from the command line.
   
   As it stands, to have an environment specific configuration, the user would 
have to build the jar file for each target environment, which isn't intuitive.
   
   The entry point to Flink jobs is always configured with optional arguments. 
Would it be possible to support a token substitution mechanism so that such 
arguments provided to the Flink CLI or the k8s operator or whatever the tool 
may be can be injected into the proto?
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301702)
Time Spent: 0.5h  (was: 20m)

> Write portable Flink application jar
> 
>
> Key: BEAM-7966
> URL: https://issues.apache.org/jira/browse/BEAM-7966
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> *[https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]*



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8096) Allow runner to configure "subnetwork"

2019-08-26 Thread Jack Whelpton (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916307#comment-16916307
 ] 

Jack Whelpton commented on BEAM-8096:
-

I suspect the change may be as simple as this:

[https://github.com/jackwhelpton/beam/commit/d1711f43e685c0f41b366437eb28cf5a25c436f0]

would this suffice for a PR? There don't appear to be any obvious tests around 
this area that I could build on, but if anybody can point me in the right 
direction that'd be great.

> Allow runner to configure "subnetwork"
> --
>
> Key: BEAM-8096
> URL: https://issues.apache.org/jira/browse/BEAM-8096
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-go
>Affects Versions: 2.15.0
>Reporter: Jack Whelpton
>Priority: Major
>
> When running a Dataflow job, the network can be specified using the --network 
> flag; however, there is no support for doing the same for the subnetwork. 
> This would be the go equivalent of the following Java code:
> [https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/options/DataflowPipelineWorkerPoolOptions.html#getSubnetwork--|https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L151]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8096) Allow runner to configure "subnetwork"

2019-08-26 Thread Jack Whelpton (Jira)
Jack Whelpton created BEAM-8096:
---

 Summary: Allow runner to configure "subnetwork"
 Key: BEAM-8096
 URL: https://issues.apache.org/jira/browse/BEAM-8096
 Project: Beam
  Issue Type: Improvement
  Components: sdk-go
Affects Versions: 2.15.0
Reporter: Jack Whelpton


When running a Dataflow job, the network can be specified using the --network 
flag; however, there is no support for doing the same for the subnetwork. This 
would be the go equivalent of the following Java code:

[https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/options/DataflowPipelineWorkerPoolOptions.html#getSubnetwork--|https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L151]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7021) ToString transform for Python SDK

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7021?focusedWorklogId=301654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301654
 ]

ASF GitHub Bot logged work on BEAM-7021:


Author: ASF GitHub Bot
Created on: 27/Aug/19 02:15
Start Date: 27/Aug/19 02:15
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9437: [BEAM-7021] 
Removed unused arg from ToString.Element
URL: https://github.com/apache/beam/pull/9437
 
 
   Removed unused arg from ToString.Element
   
   cc: @mszb let me know if there is a reason to keep this argument.
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301653
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 27/Aug/19 02:07
Start Date: 27/Aug/19 02:07
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9262: [BEAM-7389] Add code 
examples for Regex page
URL: https://github.com/apache/beam/pull/9262#issuecomment-525104612
 
 
   Is there a way to update the staged page?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301653)
Time Spent: 50h 20m  (was: 50h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 50h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301652
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 27/Aug/19 02:05
Start Date: 27/Aug/19 02:05
Worklog Time Spent: 10m 
  Work Description: aaltay commented on pull request #9435: [BEAM-7389] 
Update to use util.Regex transform
URL: https://github.com/apache/beam/pull/9435
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301652)
Time Spent: 50h 10m  (was: 50h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 50h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301651=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301651
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 27/Aug/19 02:04
Start Date: 27/Aug/19 02:04
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9433: [BEAM-7389] Update to 
use util.ToString transform
URL: https://github.com/apache/beam/pull/9433#issuecomment-525103876
 
 
   LGTM, could you fix the test issues?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301651)
Time Spent: 50h  (was: 49h 50m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 50h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8094) Keep python library version synced between setup.py and base_image_requirements.txt

2019-08-26 Thread Ahmet Altay (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916271#comment-16916271
 ] 

Ahmet Altay commented on BEAM-8094:
---

/cc [~tvalentyn]

> Keep python library version synced between setup.py and 
> base_image_requirements.txt
> ---
>
> Key: BEAM-8094
> URL: https://issues.apache.org/jira/browse/BEAM-8094
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Hannah Jiang
>Priority: Major
>
> There are some shared libraries in setup.py and base_image_requirements.txt.
> Find a way to keep versions are syncs in both files and new libraries added 
> to setup.py are added to base_image_requirements.txt if it is commonly used.
> Now it is manually synced.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301636
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 27/Aug/19 01:54
Start Date: 27/Aug/19 01:54
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move 
release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#issuecomment-525101802
 
 
   Thank you. I'll go ahead and merge.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301636)
Time Spent: 2h 10m  (was: 2h)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after release branch is cut. 
> Basically it does two things: 1. verify Gradle build with -PisRelease turned 
> on. 2. create a PR and run all PostCommit jobs against release branch. 
> However, release manager got many painpoints when running this script:
> 1. A lot of environment setup and some of tooling install easily broke the 
> script.
> 2. Running Gradle build locally too extremely long time.
> 3. Auto-pr-creation (use hub) doesn't work.
> We can move Gradle build to Jenkins in order to get rid of environment setup 
> work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301637
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 27/Aug/19 01:54
Start Date: 27/Aug/19 01:54
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9411: 
[BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301637)
Time Spent: 2h 20m  (was: 2h 10m)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after release branch is cut. 
> Basically it does two things: 1. verify Gradle build with -PisRelease turned 
> on. 2. create a PR and run all PostCommit jobs against release branch. 
> However, release manager got many painpoints when running this script:
> 1. A lot of environment setup and some of tooling install easily broke the 
> script.
> 2. Running Gradle build locally too extremely long time.
> 3. Auto-pr-creation (use hub) doesn't work.
> We can move Gradle build to Jenkins in order to get rid of environment setup 
> work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=301620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301620
 ]

ASF GitHub Bot logged work on BEAM-7984:


Author: ASF GitHub Bot
Created on: 27/Aug/19 01:36
Start Date: 27/Aug/19 01:36
Worklog Time Spent: 10m 
  Work Description: chadrik commented on issue #9344: [BEAM-7984] The coder 
returned for typehints.List should be IterableCoder
URL: https://github.com/apache/beam/pull/9344#issuecomment-525098325
 
 
   > If the user specifies the type as List, should we ensure that we always 
return a List? (I.e should we explicitly state our assumption here that 
IterableCoder returns a list and also adda note in the implementation of 
IterableCoderImpl that if this changes we need to create a separate ListCoder? 
Or should we go ahead and create a separate ListCoder now?
   
   I dug into this a bit more and realized that I was mistaken about my 
assertion that `IterableCoderImpl.decode` always return a list.  Here's the 
relevant code from the base class:
   
   ```python
   class SequenceCoderImpl(StreamCoderImpl):
   
 def __init__(self, elem_coder,
  read_state=None, write_state=None, write_state_threshold=0):
   self._elem_coder = elem_coder
   self._read_state = read_state
   self._write_state = write_state
   self._write_state_threshold = write_state_threshold
   
 ...
   
 def decode_from_stream(self, in_stream, nested):
   size = in_stream.read_bigendian_int32()
   
   if size >= 0:
 elements = [self._elem_coder.decode_from_stream(in_stream, True)
 for _ in range(size)]
   else:
 elements = []
 count = in_stream.read_var_int64()
 while count > 0:
   for _ in range(count):
 elements.append(self._elem_coder.decode_from_stream(in_stream, 
True))
   count = in_stream.read_var_int64()
   
 if count == -1:
   if self._read_state is None:
 raise ValueError(
 'Cannot read state-written iterable without state reader.')
   
   state_token = in_stream.read_all(True)
   elements = _ConcatSequence(
   elements, self._read_state(state_token, self._elem_coder))
   
   return self._construct_from_sequence(elements)
   ```
   
   `_ConcatSequence` and `self._read_state` are both iterators. 
   
   Some options:
   
   - create a `ListCoder`:   the problem is this won't be portable, which 
defeats the original motivation here, for external transforms
   - make `IterableCoder` always return a `list`:   It currently returns an 
iterator in the (rare?) case that `read_state` is provided, in which case it 
_seems_ like it progressively yields elements as they are decoded, but I'm not 
100% clear on that.  If that _is_ the case, converting to a list on decode 
would undermine the usefulness of this feature. 
   - leave it as is:  `list` is iterable after all, and in the case that the 
user provides a list to encode, they will get a list back, since there 
shouldn't be a `read_state`. 
   
   So as it is, `IterableCoder` is true to its name in that it really only 
guarantees that the decoded value is iterable.  However, as the tests that I 
added imply, in the typical case, the result is actually a list.  So I think 
the best thing to do is to for me to update the test that I added to assert 
that the returned type is a list, so that if that ever changes, someone is 
forced to consider it.
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301620)
Time Spent: 0.5h  (was: 20m)

> [python] The coder returned for typehints.List should be IterableCoder
> --
>
> Key: BEAM-7984
> URL: https://issues.apache.org/jira/browse/BEAM-7984
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Chad Dombrova
>Assignee: Chad Dombrova
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> IterableCoder encodes a list and decodes to list, but 
> typecoders.registry.get_coder(typehints.List[bytes]) returns a 
> FastPrimitiveCoder.  I don't see any reason why this would be advantageous. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed pcollection

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=301611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301611
 ]

ASF GitHub Bot logged work on BEAM-7972:


Author: ASF GitHub Bot
Created on: 27/Aug/19 00:34
Start Date: 27/Aug/19 00:34
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9334: [BEAM-7972] 
Always use Global window in reshuffle and then apply wind…
URL: https://github.com/apache/beam/pull/9334#discussion_r317851116
 
 

 ##
 File path: sdks/python/apache_beam/transforms/util.py
 ##
 @@ -612,29 +611,23 @@ def restore_timestamps(element):
 for (value, timestamp) in values]
 
 else:
-  # The linter is confused.
-  # hash(1) is used to force "runtime" selection of _IdentityWindowFn
-  # pylint: disable=abstract-class-instantiated
-  cls = hash(1) and _IdentityWindowFn
-  window_fn = cls(
-  windowing_saved.windowfn.get_window_coder())
-
-  def reify_timestamps(element, timestamp=DoFn.TimestampParam):
+  def reify_timestamps(element,
+   timestamp=DoFn.TimestampParam,
+   window=DoFn.WindowParam):
 key, value = element
-return key, TimestampedValue(value, timestamp)
+# Transport the window as part of the value and restore it later.
+return key, TimestampedValue((value, window), timestamp)
 
 Review comment:
   > > The defaulting to global window should be deleted since the Python SDK 
now does send a proper windowing strategy (same as Go SDK). The code was added 
as a migration path to allow for differences in where the Python/Go/Java SDKs 
were when submitting jobs to Dataflow.
   > 
   > So we should update the reshuffle code to not pass the non standard window 
from python.
   
   We shouldn't have to, but if the alternative is significant JRH refactoring, 
then this code should be OK and we can add a comment that we're working around 
bugs in the Dataflow JRH. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301611)
Time Spent: 3h  (was: 2h 50m)

> Portable Python Reshuffle does not work with windowed pcollection
> -
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Streaming pipeline gets stuck when using Reshuffle with windowed pcollection.
> The issue happen because of window function gets deserialized on java side 
> which is not possible and hence default to global window function and result 
> into window function mismatch later down the code.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8093) py36-gcp is missing from the current tox.ini

2019-08-26 Thread Udi Meiri (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916229#comment-16916229
 ] 

Udi Meiri commented on BEAM-8093:
-

https://github.com/apache/beam/pull/7949 has a fix for this, but might be 
delayed for a while since it's low-pri.

> py36-gcp is missing from the current tox.ini
> 
>
> Key: BEAM-8093
> URL: https://issues.apache.org/jira/browse/BEAM-8093
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Affects Versions: 2.16.0
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> cc: [~udim]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301592
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 26/Aug/19 23:41
Start Date: 26/Aug/19 23:41
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9262: [BEAM-7389] Add 
code examples for Regex page
URL: https://github.com/apache/beam/pull/9262#issuecomment-525074981
 
 
   > Should we cover the recently added RegEx transform here? (
   > 
   > 
https://github.com/apache/beam/blob/ab37b0fd6ce9a26bc6fa36f775df5ddeb067dd2a/sdks/python/apache_beam/transforms/util.py#L873
   > 
   > )
   
   Good point, updating the code samples on #9435, will update the docs 
afterwards.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301592)
Time Spent: 49h 50m  (was: 49h 40m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 49h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301591
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 26/Aug/19 23:40
Start Date: 26/Aug/19 23:40
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9435: 
[BEAM-7389] Update to use util.Regex transform
URL: https://github.com/apache/beam/pull/9435
 
 
   Update the code sample for `Regex` to use the `util.Regex` transform.
   
   R: @aaltay Can you take a look whenever you have a chance? Thanks!
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=301584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301584
 ]

ASF GitHub Bot logged work on BEAM-5878:


Author: ASF GitHub Bot
Created on: 26/Aug/19 23:11
Start Date: 26/Aug/19 23:11
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on pull request #9237: [BEAM-5878] 
support DoFns with Keyword-only arguments
URL: https://github.com/apache/beam/pull/9237#discussion_r317834907
 
 

 ##
 File path: sdks/python/apache_beam/internal/pickler.py
 ##
 @@ -136,6 +136,32 @@ def _reject_generators(unused_pickler, unused_obj):
 
 dill.dill.Pickler.dispatch[types.GeneratorType] = _reject_generators
 
+# TODO: Remove this once uqfoundation/dill#313 is fixed
+if sys.version_info[0] > 2:
+  # Monkey patch for dill._dill.Pickler to pickle functions
+  # with keyword-only args
+  _create_function = dill.dill._create_function
+
+  def _create_function_has_kwdefaults(fcode, fglobals, fname=None,
+  fdefaults=None, fclosure=None, 
fdict=None,
+  fkwdefaults=None):
+func = _create_function(fcode, fglobals, fname, fdefaults, fclosure, fdict)
+func.__kwdefaults__ = fkwdefaults
+return func
+
+  def new_save_reduce(self, func, args, state=None, listitems=None,
 
 Review comment:
   Can we use a more generic signature of new_save_reduce, for example `def 
new_save_reduce(self, func, args, **kwargs)`? The problem is that we assume a 
particular version of the API for pickle.save_reduce here, and we can see that 
it will change in Python 3.8, see 
https://github.com/python/cpython/blob/c75f0e5bdee3cfaba9fd5b3a8549dec0aba01ebe/Lib/pickle.py#L619.
 
   
   I think with a generic definition of `new_save_reduce`  we can still update 
`args` list , and pass `**kwargs` to `picker.save_reduce`. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301584)
Time Spent: 9h 10m  (was: 9h)

> Support DoFns with Keyword-only arguments in Python 3.
> --
>
> Key: BEAM-5878
> URL: https://issues.apache.org/jira/browse/BEAM-5878
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: yoshiki obata
>Priority: Minor
> Fix For: 2.16.0
>
>  Time Spent: 9h 10m
>  Remaining Estimate: 0h
>
> Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to 
> define functions with keyword-only arguments. 
> Currently Beam does not handle them correctly. [~ruoyu] pointed out [one 
> place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118]
>  in our codebase that we should fix: in Python in 3.0 inspect.getargspec() 
> will fail on functions with keyword-only arguments, but a new method 
> [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec]
>  supports them.
> There may be implications for our (best-effort) type-hints machinery.
> We should also add a Py3-only unit tests that covers DoFn's with keyword-only 
> arguments once Beam Python 3 tests are in a good shape.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8045) Publish custom windows pattern

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8045?focusedWorklogId=301575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301575
 ]

ASF GitHub Bot logged work on BEAM-8045:


Author: ASF GitHub Bot
Created on: 26/Aug/19 23:01
Start Date: 26/Aug/19 23:01
Worklog Time Spent: 10m 
  Work Description: aaltay commented on issue #9406: [BEAM-8045] Custom 
windows patterns
URL: https://github.com/apache/beam/pull/9406#issuecomment-525065807
 
 
   As a reference for everyone linking the related PR: 
https://github.com/apache/beam/pull/9399
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301575)
Time Spent: 0.5h  (was: 20m)

> Publish custom windows pattern
> --
>
> Key: BEAM-8045
> URL: https://issues.apache.org/jira/browse/BEAM-8045
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Cyrus Maden
>Assignee: Cyrus Maden
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301569=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301569
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 26/Aug/19 22:59
Start Date: 26/Aug/19 22:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9283: [BEAM-7060] 
Type hints from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#discussion_r317828430
 
 

 ##
 File path: sdks/python/apache_beam/transforms/ptransform.py
 ##
 @@ -821,7 +823,9 @@ def expand(self, pcoll):
 
 # TODO(BEAM-5878) Support keyword-only arguments.
 try:
-  if 'type_hints' in getfullargspec(self._fn).args:
+  # TODO(udim): This looks like unused code. When is 'type_hints' used as 
an
 
 Review comment:
   IIRC, this was used to support something like
   
   ```
   @ptransformfn
   def MyTransform(pcoll, type_hints, arg):
  ...
   
   pcoll | MyPTransform(arg)
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301569)
Time Spent: 12h  (was: 11h 50m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> Existing [Typehints implementaiton in 
> Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/
> ] heavily relies on internal details of CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support in-house implementation, which at this point is a stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT to this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301571=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301571
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 26/Aug/19 22:59
Start Date: 26/Aug/19 22:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9283: [BEAM-7060] 
Type hints from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#discussion_r317829250
 
 

 ##
 File path: sdks/python/apache_beam/typehints/decorators.py
 ##
 @@ -171,22 +269,55 @@ def simple_output_type(self, context):
 if self.output_types:
   args, kwargs = self.output_types
   if len(args) != 1 or kwargs:
-raise TypeError('Expected simple output type hint for %s' % context)
+raise TypeError(
+'Expected single output type hint for %s but got: %s' % (
+context, self.output_types))
   return args[0]
 
+  def has_simple_output_type(self):
+"""Whether there's a single positional output type."""
+return (self.output_types and len(self.output_types[0]) == 1 and
+not self.output_types[1])
+
+  def strip_iterable(self):
+"""Removes outer Iterable (or equivalent) from output type.
+
+Only affects instances with simple output types, otherwise is a no-op.
+
+Example: Generator[Tuple(int, int)] becomes Tuple(int, int)
+
+Raises:
+  ValueError if output type is simple and not iterable.
+"""
+if not self.has_simple_output_type():
+  return
+yielded_type = typehints.get_yielded_type(self.output_types[0][0])
+self.output_types = ((yielded_type,), {})
+
   def copy(self):
 return IOTypeHints(self.input_types, self.output_types)
 
   def with_defaults(self, hints):
 if not hints:
   return self
-elif not self:
-  return hints
-return IOTypeHints(self.input_types or hints.input_types,
-   self.output_types or hints.output_types)
+if self._has_input_types():
+  input_types = self.input_types
+else:
+  input_types = hints.input_types
+if self._has_output_types():
+  output_types = self.output_types
+else:
+  output_types = hints.output_types
+return IOTypeHints(input_types, output_types)
+
+  def _has_input_types(self):
+return self.input_types is not None and any(self.input_types)
 
 Review comment:
   Here (and elsewhere), when is any(self.input_types) False but 
self.input_types not False?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301571)
Time Spent: 12h 20m  (was: 12h 10m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> Existing [Typehints implementaiton in 
> Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/
> ] heavily relies on internal details of CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support in-house implementation, which at this point is a stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT to this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301570
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 26/Aug/19 22:59
Start Date: 26/Aug/19 22:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9283: [BEAM-7060] 
Type hints from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#discussion_r317830592
 
 

 ##
 File path: sdks/python/apache_beam/typehints/typed_pipeline_test.py
 ##
 @@ -290,5 +306,26 @@ def test_flat_type_hint(self):
   self.test_input | self.CustomTransform().with_output_types(int)
 
 
+class AnnotationsTest(unittest.TestCase):
+
+  def test_pardo_wrapper_builtin(self):
+th = beam.ParDo(str.strip).get_type_hints()
+if sys.version_info < (3, 7):
+  self.assertEqual(th.input_types, ((str,), {}))
+else:
+  # Python 3.7+ has annotations for CPython builtins
+  # (_MethodDescriptorType).
+  self.assertEqual(th.input_types, ((str, typehints.Any), {}))
+self.assertEqual(th.output_types, ((typehints.Any,), {}))
+
+th = beam.ParDo(list).get_type_hints()
+self.assertIsNone(th.input_types)
+self.assertIsNone(th.output_types)
 
 Review comment:
   Can we not infer the output type is list here? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301570)
Time Spent: 12h 10m  (was: 12h)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Existing [Typehints implementaiton in 
> Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/
> ] heavily relies on internal details of CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support in-house implementation, which at this point is a stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT to this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301568
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 26/Aug/19 22:59
Start Date: 26/Aug/19 22:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9283: [BEAM-7060] 
Type hints from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#discussion_r317829417
 
 

 ##
 File path: sdks/python/apache_beam/typehints/decorators.py
 ##
 @@ -265,23 +368,34 @@ def _unpack_positional_arg_hints(arg, hint):
   return hint
 
 
-def getcallargs_forhints(func, *typeargs, **typekwargs):
-  """Like inspect.getcallargs, but understands that Tuple[] and an Any unpack.
+def getcallargs_forhints(using_var_hints, func, *typeargs, **typekwargs):
+  """Like inspect.getcallargs, with support for declaring default args as Any.
+
+  In Python 2, understands that Tuple[] and an Any unpack.
+
+  Args:
+using_var_hints: For variable length arguments, whether to expect the bound
 
 Review comment:
   Is it possible to push this complication into the single caller? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301568)
Time Spent: 12h  (was: 11h 50m)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> Existing [Typehints implementaiton in 
> Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/
> ] heavily relies on internal details of CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support in-house implementation, which at this point is a stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT to this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301572
 ]

ASF GitHub Bot logged work on BEAM-7060:


Author: ASF GitHub Bot
Created on: 26/Aug/19 22:59
Start Date: 26/Aug/19 22:59
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9283: [BEAM-7060] 
Type hints from Python 3 annotations
URL: https://github.com/apache/beam/pull/9283#discussion_r317832098
 
 

 ##
 File path: sdks/python/apache_beam/typehints/typehints_test.py
 ##
 @@ -1062,57 +797,92 @@ def test_hint_helper(self):
 self.assertTrue(is_consistent_with(str, Union[str, int]))
 self.assertFalse(is_consistent_with(Union[str, int], str))
 
-  def test_positional_arg_hints(self):
-self.assertEqual(typehints.Any, _positional_arg_hints('x', {}))
-self.assertEqual(int, _positional_arg_hints('x', {'x': int}))
-self.assertEqual(typehints.Tuple[int, typehints.Any],
- _positional_arg_hints(['x', 'y'], {'x': int}))
-
   def test_getcallargs_forhints(self):
 def func(a, b_c, *d):
   b, c = b_c # pylint: disable=unused-variable
   return None
 self.assertEqual(
 {'a': Any, 'b_c': Any, 'd': Tuple[Any, ...]},
-getcallargs_forhints(func, *[Any, Any]))
+getcallargs_forhints(False, func, *[Any, Any]))
+if sys.version_info >= (3,):
+  self.assertEqual(
+  {'a': Any, 'b_c': Any, 'd': Tuple[Union[int, str], ...]},
+  getcallargs_forhints(False, func, *[Any, Any, str, int]))
+else:
+  self.assertEqual(
+  {'a': Any, 'b_c': Any, 'd': Tuple[Any, ...]},
+  getcallargs_forhints(False, func, *[Any, Any, Any, int]))
 
 Review comment:
   OK, at least reference a JIRA? 
   
   It looks like the common pattern is that you have a Tuple[more specific 
type] -> Tuple[Any, ...]. Perhaps you could make a relax_on_py2 method that 
does this substitution and use that everywhere rather than duplicating the 
logic in each test? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301572)

> Design Py3-compatible typehints annotation support in Beam 3.
> -
>
> Key: BEAM-7060
> URL: https://issues.apache.org/jira/browse/BEAM-7060
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
>  Time Spent: 12h 20m
>  Remaining Estimate: 0h
>
> Existing [Typehints implementaiton in 
> Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/
> ] heavily relies on internal details of CPython implementation, and some of 
> the assumptions of this implementation broke as of Python 3.6, see for 
> example: https://issues.apache.org/jira/browse/BEAM-6877, which makes  
> typehints support unusable on Python 3.6 as of now. [Python 3 Kanban 
> Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail]
>  lists several specific typehints-related breakages, prefixed with "TypeHints 
> Py3 Error".
> We need to decide whether to:
> - Deprecate in-house typehints implementation.
> - Continue to support in-house implementation, which at this point is a stale 
> code and has other known issues.
> - Attempt to use some off-the-shelf libraries for supporting 
> type-annotations, like  Pytype, Mypy, PyAnnotate.
> WRT to this decision we also need to plan on immediate next steps to unblock 
> adoption of Beam for  Python 3.6+ users. One potential option may be to have 
> Beam SDK ignore any typehint annotations on Py 3.6+.
> cc: [~udim], [~altay], [~robertwb].



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-7266) Pipeline run does not terminate because of Dataflow runner can not close file system writer

2019-08-26 Thread Ankur Goenka (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916198#comment-16916198
 ] 

Ankur Goenka commented on BEAM-7266:


Another user reported a similar issue 
https://stackoverflow.com/questions/57507122/pipeline-keeps-running-because-of-dataflow-runner-not-closing-file-system-writer

> Pipeline run does not terminate because of Dataflow runner can not close file 
> system writer
> ---
>
> Key: BEAM-7266
> URL: https://issues.apache.org/jira/browse/BEAM-7266
> Project: Beam
>  Issue Type: Bug
>  Components: io-py-gcp, runner-dataflow
>Affects Versions: 2.11.0, 2.12.0
>Reporter: Fabian
>Priority: Critical
> Fix For: Not applicable
>
>
> We are using Apache Beam in version 2.11.0 (Python SDK) with the Dataflow 
> runner running on the Google Cloud Platform. Two pipeline runs did not 
> terminate, i.e. after multiple days (instead of some minutes) they where 
> still running. The only error that was logged is:
> If fails to close a writer:
> {code:java}
> Traceback (most recent call last):
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 
> 649, in do_work
> work_executor.execute()
>   File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", 
> line 178, in execute
> op.finish()
>   File "dataflow_worker/native_operations.py", line 93, in 
> dataflow_worker.native_operations.NativeWriteOperation.finish
> def finish(self):
>   File "dataflow_worker/native_operations.py", line 94, in 
> dataflow_worker.native_operations.NativeWriteOperation.finish
> with self.scoped_finish_state:
>   File "dataflow_worker/native_operations.py", line 95, in 
> dataflow_worker.native_operations.NativeWriteOperation.finish
> self.writer.__exit__(None, None, None)
>   File 
> "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativeavroio.py", 
> line 277, in __exit__
> self._data_file_writer.close()
>   File "/usr/local/lib/python2.7/dist-packages/avro/datafile.py", line 220, 
> in close
> self.writer.close()
>   File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filesystemio.py", line 
> 202, in close
> self._uploader.finish()
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/gcsio.py", 
> line 606, in finish
> raise self._upload_thread.last_error  # pylint: disable=raising-bad-type
> NotImplementedError{code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301562
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 26/Aug/19 22:37
Start Date: 26/Aug/19 22:37
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#discussion_r317826798
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -49,26 +57,74 @@
 @RunWith(JUnit4.class)
 public class BigQueryHllSketchCompatibilityIT {
 
-  private static final String DATASET_NAME = "zetasketch_compatibility_test";
+  private static final String APP_NAME;
+  private static final String PROJECT_ID;
+  private static final String DATASET_ID;
 
-  // Table for testReadSketchFromBigQuery()
+  private static final List TEST_DATA =
+  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+
+  // Data Table: used by testReadSketchFromBigQuery())
   // Schema: only one STRING field named "data".
   // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_NAME = "hll_data";
+  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
+  private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
   private static final Long EXPECTED_COUNT = 3L;
 
-  // Table for testWriteSketchToBigQuery()
+  // Sketch Table: used by testWriteSketchToBigQuery()
   // Schema: only one BYTES field named "sketch".
   // Content: will be overridden by the sketch computed by the test pipeline 
each time the test runs
-  private static final String SKETCH_TABLE_NAME = "hll_sketch";
+  private static final String SKETCH_TABLE_ID = "hll_sketch";
   private static final String SKETCH_FIELD_NAME = "sketch";
-  private static final List TEST_DATA =
-  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+  private static final String SKETCH_FIELD_TYPE = "BYTES";
   // SHA-1 hash of string "[3]", the string representation of a row that has 
only one field 3 in it
   private static final String EXPECTED_CHECKSUM = 
"f1e31df9806ce94c5bdbbfff9608324930f4d3f1";
 
+  static {
+ApplicationNameOptions options =
+TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class);
+APP_NAME = options.getAppName();
+PROJECT_ID = options.as(GcpOptions.class).getProject();
+DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 28h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7966) Write portable Flink application jar

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7966?focusedWorklogId=301553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301553
 ]

ASF GitHub Bot logged work on BEAM-7966:


Author: ASF GitHub Bot
Created on: 26/Aug/19 22:10
Start Date: 26/Aug/19 22:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9331: [BEAM-7966] 
Write portable Flink application jar
URL: https://github.com/apache/beam/pull/9331#discussion_r317819152
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java
 ##
 @@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.jobsubmission;
+
+/** Contains common code for writing and reading portable pipeline jars. */
+public abstract class PortablePipelineJarUtils {
 
 Review comment:
   It'd be good here, or elsewhere, to document the spec of what the jar should 
contain (i.e. what the contents of each of these files is). Also, should they 
all be in some common subdirectory? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301553)
Remaining Estimate: 0h
Time Spent: 10m

> Write portable Flink application jar
> 
>
> Key: BEAM-7966
> URL: https://issues.apache.org/jira/browse/BEAM-7966
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *[https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]*



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7966) Write portable Flink application jar

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7966?focusedWorklogId=301554=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301554
 ]

ASF GitHub Bot logged work on BEAM-7966:


Author: ASF GitHub Bot
Created on: 26/Aug/19 22:10
Start Date: 26/Aug/19 22:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9331: [BEAM-7966] 
Write portable Flink application jar
URL: https://github.com/apache/beam/pull/9331#discussion_r317818691
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java
 ##
 @@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.jobsubmission;
+
+/** Contains common code for writing and reading portable pipeline jars. */
+public abstract class PortablePipelineJarUtils {
+  static final String ARTIFACT_FOLDER_NAME = "beam-artifact-staging";
+  static final String ARTIFACT_MANIFEST_NAME = "beam-artifact-manifest.json";
+  static final String PIPELINE_FILE_NAME = "beam-pipeline.textproto";
 
 Review comment:
   Nit: are these textprotos? I think they're binary. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301554)
Remaining Estimate: 0h
Time Spent: 10m

> Write portable Flink application jar
> 
>
> Key: BEAM-7966
> URL: https://issues.apache.org/jira/browse/BEAM-7966
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *[https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]*



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7966) Write portable Flink application jar

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7966?focusedWorklogId=301555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301555
 ]

ASF GitHub Bot logged work on BEAM-7966:


Author: ASF GitHub Bot
Created on: 26/Aug/19 22:10
Start Date: 26/Aug/19 22:10
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9331: [BEAM-7966] 
Write portable Flink application jar
URL: https://github.com/apache/beam/pull/9331#discussion_r317819889
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java
 ##
 @@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.fnexecution.jobsubmission;
+
+/** Contains common code for writing and reading portable pipeline jars. */
+public abstract class PortablePipelineJarUtils {
+  static final String ARTIFACT_FOLDER_NAME = "beam-artifact-staging";
+  static final String ARTIFACT_MANIFEST_NAME = "beam-artifact-manifest.json";
 
 Review comment:
   Why is this json and the others raw proto (that ProxyManifest is also a 
proto message)? Or should (some of?) the others be json as well? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301555)
Time Spent: 20m  (was: 10m)

> Write portable Flink application jar
> 
>
> Key: BEAM-7966
> URL: https://issues.apache.org/jira/browse/BEAM-7966
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-flink
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *[https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]*



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301543
 ]

ASF GitHub Bot logged work on BEAM-7864:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:50
Start Date: 26/Aug/19 21:50
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9410: [BEAM-7864] 
Simplify/generalize Spark reshuffle translation
URL: https://github.com/apache/beam/pull/9410#issuecomment-525047135
 
 
   Run Python Spark ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301543)
Time Spent: 1h 40m  (was: 1.5h)

> Portable Spark Reshuffle coder cast exception
> -
>
> Key: BEAM-7864
> URL: https://issues.apache.org/jira/browse/BEAM-7864
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> running :sdks:python:test-suites:portable:py35:portableWordCountBatch in 
> either loopback or docker mode on master fails with exception:
>  
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder 
> cannot be cast to org.apache.beam.sdk.coders.KvCoder
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>  at 
> org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301542
 ]

ASF GitHub Bot logged work on BEAM-7864:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:50
Start Date: 26/Aug/19 21:50
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9410: [BEAM-7864] 
Simplify/generalize Spark reshuffle translation
URL: https://github.com/apache/beam/pull/9410#issuecomment-525047102
 
 
   Run Java Spark PortableValidatesRunner Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301542)
Time Spent: 1.5h  (was: 1h 20m)

> Portable Spark Reshuffle coder cast exception
> -
>
> Key: BEAM-7864
> URL: https://issues.apache.org/jira/browse/BEAM-7864
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> running :sdks:python:test-suites:portable:py35:portableWordCountBatch in 
> either loopback or docker mode on master fails with exception:
>  
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder 
> cannot be cast to org.apache.beam.sdk.coders.KvCoder
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>  at 
> org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301540
 ]

ASF GitHub Bot logged work on BEAM-8015:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:44
Start Date: 26/Aug/19 21:44
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get 
logs from Docker containers
URL: https://github.com/apache/beam/pull/9389#discussion_r317810021
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java
 ##
 @@ -134,10 +148,41 @@ public void killContainer(String containerId)
 runShortCommand(Arrays.asList(dockerExecutable, "kill", containerId));
   }
 
-  /** Run the given command invocation and return stdout as a String. */
+  /**
+   * Removes docker container with container id.
+   *
+   * @throws IOException if an IOException occurs, or if the given container 
id either does not
+   * exist or is still running
+   */
+  public void removeContainer(String containerId)
+  throws IOException, TimeoutException, InterruptedException {
+checkArgument(containerId != null);
+checkArgument(
+CONTAINER_ID_PATTERN.matcher(containerId).matches(),
+"Container ID must be a 64-character hexadecimal string");
+runShortCommand(Arrays.asList(dockerExecutable, "rm", containerId));
+  }
+
   private String runShortCommand(List invocation)
   throws IOException, TimeoutException, InterruptedException {
+return runShortCommand(invocation, false, "");
+  }
+
+  /**
+   * Runs a command, blocks until {@link DockerCommand#commandTimeout} has 
elapsed, then returns the
+   * command's output.
+   *
+   * @param invocation command and arguments to be run
+   * @param redirectErrorStream if true, redirect stderr of the process to its 
stdout
+   * @param delimiter used for separating output lines
+   * @return stdout of the command, including stderr if {@code 
redirectErrorStream} is true
+   * @throws TimeoutException if command has not finished by {@link 
DockerCommand#commandTimeout}
+   */
+  private String runShortCommand(
+  List invocation, boolean redirectErrorStream, CharSequence 
delimiter)
+  throws IOException, TimeoutException, InterruptedException {
 ProcessBuilder pb = new ProcessBuilder(invocation);
+pb.redirectErrorStream(redirectErrorStream);
 
 Review comment:
   > I suppose my question is, how will we capture the output of the 
runShortCommand when redirectErrorStream is set to true?
   
   Here are all the cases:
   
   - `!redirectErrorStream && exitCode == 0` Return only the stdout of the 
command.
   - `!redirectErrorStream && exitCode != 0` Throw an exception that includes 
only the stderr of the command.
   - `redirectErrorStream && exitCode == 0` Return the stdout and stderr of the 
command.
   - `redirectErrorStream && exitCode != 0` Throw an exception that includes 
both the stdout and stderr of the command.
   
   It's not as simple as I would like it to be, but I think this is the 
behavior we want.
   
   > If this simply redirects stderr to stdout, why couldn't we capture stderr 
before?
   
   Before, we were only using stderr if the process exited with a nonzero code. 
I guess we wouldn't want to include it in the return value of `runShortCommand` 
normally because irrelevant warnings, etc. could potentially break the parsing 
of the result for some commands.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301540)
Time Spent: 2h 50m  (was: 2h 40m)

> Get logs for SDK worker Docker containers
> -
>
> Key: BEAM-8015
> URL: https://issues.apache.org/jira/browse/BEAM-8015
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently, when SDK worker containers fail to start up properly, an exception 
> is thrown that provides no information about what happened. We can improve 
> debugging by keeping containers around long enough to log their logs before 
> removing them.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301539
 ]

ASF GitHub Bot logged work on BEAM-8015:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:44
Start Date: 26/Aug/19 21:44
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get 
logs from Docker containers
URL: https://github.com/apache/beam/pull/9389#discussion_r317810021
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java
 ##
 @@ -134,10 +148,41 @@ public void killContainer(String containerId)
 runShortCommand(Arrays.asList(dockerExecutable, "kill", containerId));
   }
 
-  /** Run the given command invocation and return stdout as a String. */
+  /**
+   * Removes docker container with container id.
+   *
+   * @throws IOException if an IOException occurs, or if the given container 
id either does not
+   * exist or is still running
+   */
+  public void removeContainer(String containerId)
+  throws IOException, TimeoutException, InterruptedException {
+checkArgument(containerId != null);
+checkArgument(
+CONTAINER_ID_PATTERN.matcher(containerId).matches(),
+"Container ID must be a 64-character hexadecimal string");
+runShortCommand(Arrays.asList(dockerExecutable, "rm", containerId));
+  }
+
   private String runShortCommand(List invocation)
   throws IOException, TimeoutException, InterruptedException {
+return runShortCommand(invocation, false, "");
+  }
+
+  /**
+   * Runs a command, blocks until {@link DockerCommand#commandTimeout} has 
elapsed, then returns the
+   * command's output.
+   *
+   * @param invocation command and arguments to be run
+   * @param redirectErrorStream if true, redirect stderr of the process to its 
stdout
+   * @param delimiter used for separating output lines
+   * @return stdout of the command, including stderr if {@code 
redirectErrorStream} is true
+   * @throws TimeoutException if command has not finished by {@link 
DockerCommand#commandTimeout}
+   */
+  private String runShortCommand(
+  List invocation, boolean redirectErrorStream, CharSequence 
delimiter)
+  throws IOException, TimeoutException, InterruptedException {
 ProcessBuilder pb = new ProcessBuilder(invocation);
+pb.redirectErrorStream(redirectErrorStream);
 
 Review comment:
   > I suppose my question is, how will we capture the output of the 
runShortCommand when redirectErrorStream is set to true?
   
   Here are all the cases:
   
   - `!redirectErrorStream && exitCode == 0` Return only the stdout of the 
command.
   - `!redirectErrorStream && exitCode != 0` Throw an exception that includes 
only the stderr of the command.
   - `redirectErrorStream && exitCode == 0` Return the stdout and stderr of the 
command.
   - `redirectErrorStream && exitCode != 0` Throw an exception that includes 
both the stdout and stderr of the command.
   
   It's not as simple as I would like it to be, but I think this is the 
behavior we want.
   
   > If this simply redirects stderr to stdout, why couldn't we capture stderr 
before?
   
   Before, we were only using stderr if the process exited with a nonzero code. 
I guess we wouldn't want to include it in the return value of `runShortCommand` 
because irrelevant warnings, etc. could potentially break the parsing of the 
result for some commands.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301539)
Time Spent: 2h 40m  (was: 2.5h)

> Get logs for SDK worker Docker containers
> -
>
> Key: BEAM-8015
> URL: https://issues.apache.org/jira/browse/BEAM-8015
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently, when SDK worker containers fail to start up properly, an exception 
> is thrown that provides no information about what happened. We can improve 
> debugging by keeping containers around long enough to log their logs before 
> removing them.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301534
 ]

ASF GitHub Bot logged work on BEAM-8015:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:40
Start Date: 26/Aug/19 21:40
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get 
logs from Docker containers
URL: https://github.com/apache/beam/pull/9389#discussion_r317810869
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -152,34 +148,43 @@ public RemoteEnvironment createEnvironment(Environment 
environment) throws Excep
   containerId = docker.runImage(containerImage, dockerArgsBuilder.build(), 
args);
   LOG.debug("Created Docker Container with Container ID {}", containerId);
   // Wait on a client from the gRPC server.
-  while (instructionHandler == null) {
+  try {
+instructionHandler = clientSource.take(workerId, 
Duration.ofMinutes(1));
+  } catch (TimeoutException timeoutEx) {
+RuntimeException runtimeException =
+new RuntimeException(
+String.format(
+"Docker container %s failed to start up successfully 
within 1 minute.",
+containerImage),
+timeoutEx);
 try {
-  instructionHandler = clientSource.take(workerId, 
Duration.ofMinutes(1));
-} catch (TimeoutException timeoutEx) {
-  Preconditions.checkArgument(
-  docker.isContainerRunning(containerId), "No container running 
for id " + containerId);
-  LOG.info(
-  "Still waiting for startup of environment {} for worker id {}",
-  dockerPayload.getContainerImage(),
-  workerId);
-} catch (InterruptedException interruptEx) {
-  Thread.currentThread().interrupt();
-  throw new RuntimeException(interruptEx);
+  String containerLogs = docker.getContainerLogs(containerId);
+  LOG.error("Docker container {} logs:\n{}", containerId, 
containerLogs);
 
 Review comment:
   (Also, I changed the code a bit, it was incorrect before)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301534)
Time Spent: 2.5h  (was: 2h 20m)

> Get logs for SDK worker Docker containers
> -
>
> Key: BEAM-8015
> URL: https://issues.apache.org/jira/browse/BEAM-8015
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Currently, when SDK worker containers fail to start up properly, an exception 
> is thrown that provides no information about what happened. We can improve 
> debugging by keeping containers around long enough to log their logs before 
> removing them.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301530
 ]

ASF GitHub Bot logged work on BEAM-8015:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:37
Start Date: 26/Aug/19 21:37
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get 
logs from Docker containers
URL: https://github.com/apache/beam/pull/9389#discussion_r317810021
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java
 ##
 @@ -134,10 +148,41 @@ public void killContainer(String containerId)
 runShortCommand(Arrays.asList(dockerExecutable, "kill", containerId));
   }
 
-  /** Run the given command invocation and return stdout as a String. */
+  /**
+   * Removes docker container with container id.
+   *
+   * @throws IOException if an IOException occurs, or if the given container 
id either does not
+   * exist or is still running
+   */
+  public void removeContainer(String containerId)
+  throws IOException, TimeoutException, InterruptedException {
+checkArgument(containerId != null);
+checkArgument(
+CONTAINER_ID_PATTERN.matcher(containerId).matches(),
+"Container ID must be a 64-character hexadecimal string");
+runShortCommand(Arrays.asList(dockerExecutable, "rm", containerId));
+  }
+
   private String runShortCommand(List invocation)
   throws IOException, TimeoutException, InterruptedException {
+return runShortCommand(invocation, false, "");
+  }
+
+  /**
+   * Runs a command, blocks until {@link DockerCommand#commandTimeout} has 
elapsed, then returns the
+   * command's output.
+   *
+   * @param invocation command and arguments to be run
+   * @param redirectErrorStream if true, redirect stderr of the process to its 
stdout
+   * @param delimiter used for separating output lines
+   * @return stdout of the command, including stderr if {@code 
redirectErrorStream} is true
+   * @throws TimeoutException if command has not finished by {@link 
DockerCommand#commandTimeout}
+   */
+  private String runShortCommand(
+  List invocation, boolean redirectErrorStream, CharSequence 
delimiter)
+  throws IOException, TimeoutException, InterruptedException {
 ProcessBuilder pb = new ProcessBuilder(invocation);
+pb.redirectErrorStream(redirectErrorStream);
 
 Review comment:
   > I suppose my question is, how will we capture the output of the 
runShortCommand when redirectErrorStream is set to true?
   
   Here are all the cases:
   
   - `!redirectErrorStream && exitCode == 0` Return only the stdout of the 
command.
   - `!redirectErrorStream && exitCode != 0` Throw an exception that includes 
only the stderr of the command.
   - `redirectErrorStream && exitCode == 0` Return the stdout and stderr of the 
command.
   - `redirectErrorStream && exitCode != 0` Throw an exception that includes 
both the stdout and stderr of the command.
   
   It's not as simple as I would like it to be, but I think this is the 
behavior we want.
   
   > If this simply redirects stderr to stdout, why couldn't we capture stderr 
before?
   
   Before, we were only using stderr if there was an error. I guess we wouldn't 
want to include it in the return value of `runShortCommand` because irrelevant 
warnings, etc. could potentially break the parsing of the result for some 
commands.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301530)
Time Spent: 2h 10m  (was: 2h)

> Get logs for SDK worker Docker containers
> -
>
> Key: BEAM-8015
> URL: https://issues.apache.org/jira/browse/BEAM-8015
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently, when SDK worker containers fail to start up properly, an exception 
> is thrown that provides no information about what happened. We can improve 
> debugging by keeping containers around long enough to log their logs before 
> removing them.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301531
 ]

ASF GitHub Bot logged work on BEAM-8015:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:37
Start Date: 26/Aug/19 21:37
Worklog Time Spent: 10m 
  Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get 
logs from Docker containers
URL: https://github.com/apache/beam/pull/9389#discussion_r317810029
 
 

 ##
 File path: 
runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java
 ##
 @@ -152,34 +148,43 @@ public RemoteEnvironment createEnvironment(Environment 
environment) throws Excep
   containerId = docker.runImage(containerImage, dockerArgsBuilder.build(), 
args);
   LOG.debug("Created Docker Container with Container ID {}", containerId);
   // Wait on a client from the gRPC server.
-  while (instructionHandler == null) {
+  try {
+instructionHandler = clientSource.take(workerId, 
Duration.ofMinutes(1));
+  } catch (TimeoutException timeoutEx) {
+RuntimeException runtimeException =
+new RuntimeException(
+String.format(
+"Docker container %s failed to start up successfully 
within 1 minute.",
+containerImage),
+timeoutEx);
 try {
-  instructionHandler = clientSource.take(workerId, 
Duration.ofMinutes(1));
-} catch (TimeoutException timeoutEx) {
-  Preconditions.checkArgument(
-  docker.isContainerRunning(containerId), "No container running 
for id " + containerId);
-  LOG.info(
-  "Still waiting for startup of environment {} for worker id {}",
-  dockerPayload.getContainerImage(),
-  workerId);
-} catch (InterruptedException interruptEx) {
-  Thread.currentThread().interrupt();
-  throw new RuntimeException(interruptEx);
+  String containerLogs = docker.getContainerLogs(containerId);
+  LOG.error("Docker container {} logs:\n{}", containerId, 
containerLogs);
 
 Review comment:
   No, see my other comment
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301531)
Time Spent: 2h 20m  (was: 2h 10m)

> Get logs for SDK worker Docker containers
> -
>
> Key: BEAM-8015
> URL: https://issues.apache.org/jira/browse/BEAM-8015
> Project: Beam
>  Issue Type: Improvement
>  Components: java-fn-execution
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Currently, when SDK worker containers fail to start up properly, an exception 
> is thrown that provides no information about what happened. We can improve 
> debugging by keeping containers around long enough to log their logs before 
> removing them.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301528
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:35
Start Date: 26/Aug/19 21:35
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#discussion_r317809353
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -49,26 +57,74 @@
 @RunWith(JUnit4.class)
 public class BigQueryHllSketchCompatibilityIT {
 
-  private static final String DATASET_NAME = "zetasketch_compatibility_test";
+  private static final String APP_NAME;
+  private static final String PROJECT_ID;
+  private static final String DATASET_ID;
 
-  // Table for testReadSketchFromBigQuery()
+  private static final List TEST_DATA =
+  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+
+  // Data Table: used by testReadSketchFromBigQuery())
   // Schema: only one STRING field named "data".
   // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_NAME = "hll_data";
+  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
+  private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
   private static final Long EXPECTED_COUNT = 3L;
 
-  // Table for testWriteSketchToBigQuery()
+  // Sketch Table: used by testWriteSketchToBigQuery()
   // Schema: only one BYTES field named "sketch".
   // Content: will be overridden by the sketch computed by the test pipeline 
each time the test runs
-  private static final String SKETCH_TABLE_NAME = "hll_sketch";
+  private static final String SKETCH_TABLE_ID = "hll_sketch";
   private static final String SKETCH_FIELD_NAME = "sketch";
-  private static final List TEST_DATA =
-  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+  private static final String SKETCH_FIELD_TYPE = "BYTES";
   // SHA-1 hash of string "[3]", the string representation of a row that has 
only one field 3 in it
   private static final String EXPECTED_CHECKSUM = 
"f1e31df9806ce94c5bdbbfff9608324930f4d3f1";
 
+  static {
+ApplicationNameOptions options =
+TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class);
+APP_NAME = options.getAppName();
+PROJECT_ID = options.as(GcpOptions.class).getProject();
+DATASET_ID = String.format("zetasketch_%tY_% bq can't be final since this method is not a constructor.
   
   minor: I mean you can make `bq` as a static attr of your test class.
   
   >  It is fine though because BigqueryClient.getClient() does caching for us 
so it won't create another new client the second time we call it.
   
   minor: I don't think there is any caching logic in `BigqueryClient`.  ` 
BigqueryClient.getClient()` just returns `new BigqueryClient()` simply
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301528)
Time Spent: 28.5h  (was: 28h 20m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 28.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301527
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:30
Start Date: 26/Aug/19 21:30
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#discussion_r317807506
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -49,26 +57,74 @@
 @RunWith(JUnit4.class)
 public class BigQueryHllSketchCompatibilityIT {
 
-  private static final String DATASET_NAME = "zetasketch_compatibility_test";
+  private static final String APP_NAME;
+  private static final String PROJECT_ID;
+  private static final String DATASET_ID;
 
-  // Table for testReadSketchFromBigQuery()
+  private static final List TEST_DATA =
+  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+
+  // Data Table: used by testReadSketchFromBigQuery())
   // Schema: only one STRING field named "data".
   // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_NAME = "hll_data";
+  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
+  private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
   private static final Long EXPECTED_COUNT = 3L;
 
-  // Table for testWriteSketchToBigQuery()
+  // Sketch Table: used by testWriteSketchToBigQuery()
   // Schema: only one BYTES field named "sketch".
   // Content: will be overridden by the sketch computed by the test pipeline 
each time the test runs
-  private static final String SKETCH_TABLE_NAME = "hll_sketch";
+  private static final String SKETCH_TABLE_ID = "hll_sketch";
   private static final String SKETCH_FIELD_NAME = "sketch";
-  private static final List TEST_DATA =
-  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+  private static final String SKETCH_FIELD_TYPE = "BYTES";
   // SHA-1 hash of string "[3]", the string representation of a row that has 
only one field 3 in it
   private static final String EXPECTED_CHECKSUM = 
"f1e31df9806ce94c5bdbbfff9608324930f4d3f1";
 
+  static {
+ApplicationNameOptions options =
+TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class);
+APP_NAME = options.getAppName();
+PROJECT_ID = options.as(GcpOptions.class).getProject();
+DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 28h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301524
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:25
Start Date: 26/Aug/19 21:25
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#issuecomment-525039196
 
 
   Thank you all @boyuanzz @zfraa @amaliujia @reuvenlax for the review! I have 
squashed all the commits into one and it think this PR is ready to be merged 
once the precommit tests pass.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301524)
Time Spent: 28h 10m  (was: 28h)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 28h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301523
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:22
Start Date: 26/Aug/19 21:22
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#discussion_r317804729
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -49,26 +57,74 @@
 @RunWith(JUnit4.class)
 public class BigQueryHllSketchCompatibilityIT {
 
-  private static final String DATASET_NAME = "zetasketch_compatibility_test";
+  private static final String APP_NAME;
+  private static final String PROJECT_ID;
+  private static final String DATASET_ID;
 
-  // Table for testReadSketchFromBigQuery()
+  private static final List TEST_DATA =
+  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+
+  // Data Table: used by testReadSketchFromBigQuery())
   // Schema: only one STRING field named "data".
   // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_NAME = "hll_data";
+  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
+  private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
   private static final Long EXPECTED_COUNT = 3L;
 
-  // Table for testWriteSketchToBigQuery()
+  // Sketch Table: used by testWriteSketchToBigQuery()
   // Schema: only one BYTES field named "sketch".
   // Content: will be overridden by the sketch computed by the test pipeline 
each time the test runs
-  private static final String SKETCH_TABLE_NAME = "hll_sketch";
+  private static final String SKETCH_TABLE_ID = "hll_sketch";
   private static final String SKETCH_FIELD_NAME = "sketch";
-  private static final List TEST_DATA =
-  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+  private static final String SKETCH_FIELD_TYPE = "BYTES";
   // SHA-1 hash of string "[3]", the string representation of a row that has 
only one field 3 in it
   private static final String EXPECTED_CHECKSUM = 
"f1e31df9806ce94c5bdbbfff9608324930f4d3f1";
 
+  static {
+ApplicationNameOptions options =
+TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class);
+APP_NAME = options.getAppName();
+PROJECT_ID = options.as(GcpOptions.class).getProject();
+DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 28h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301522
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:22
Start Date: 26/Aug/19 21:22
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#discussion_r317804549
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -49,26 +57,74 @@
 @RunWith(JUnit4.class)
 public class BigQueryHllSketchCompatibilityIT {
 
-  private static final String DATASET_NAME = "zetasketch_compatibility_test";
+  private static final String APP_NAME;
+  private static final String PROJECT_ID;
+  private static final String DATASET_ID;
 
-  // Table for testReadSketchFromBigQuery()
+  private static final List TEST_DATA =
+  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+
+  // Data Table: used by testReadSketchFromBigQuery())
   // Schema: only one STRING field named "data".
   // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_NAME = "hll_data";
+  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
+  private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
   private static final Long EXPECTED_COUNT = 3L;
 
-  // Table for testWriteSketchToBigQuery()
+  // Sketch Table: used by testWriteSketchToBigQuery()
   // Schema: only one BYTES field named "sketch".
   // Content: will be overridden by the sketch computed by the test pipeline 
each time the test runs
-  private static final String SKETCH_TABLE_NAME = "hll_sketch";
+  private static final String SKETCH_TABLE_ID = "hll_sketch";
   private static final String SKETCH_FIELD_NAME = "sketch";
-  private static final List TEST_DATA =
-  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+  private static final String SKETCH_FIELD_TYPE = "BYTES";
   // SHA-1 hash of string "[3]", the string representation of a row that has 
only one field 3 in it
   private static final String EXPECTED_CHECKSUM = 
"f1e31df9806ce94c5bdbbfff9608324930f4d3f1";
 
+  static {
+ApplicationNameOptions options =
+TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class);
+APP_NAME = options.getAppName();
+PROJECT_ID = options.as(GcpOptions.class).getProject();
+DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 27h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=301520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301520
 ]

ASF GitHub Bot logged work on BEAM-3713:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:22
Start Date: 26/Aug/19 21:22
Worklog Time Spent: 10m 
  Work Description: udim commented on pull request #7949: [BEAM-3713] Add 
pytest testing infrastructure
URL: https://github.com/apache/beam/pull/7949#discussion_r317804459
 
 

 ##
 File path: sdks/python/setup.py
 ##
 @@ -201,6 +202,7 @@ def run(self):
 install_requires=REQUIRED_PACKAGES,
 python_requires=python_requires,
 test_suite='nose.collector',
 
 Review comment:
   I don't know what to put here for pytest. The instructions said to add the 
`setup_requires` line below.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301520)
Time Spent: 4h 10m  (was: 4h)

> Consider moving away from nose to nose2 or pytest.
> --
>
> Key: BEAM-3713
> URL: https://issues.apache.org/jira/browse/BEAM-3713
> Project: Beam
>  Issue Type: Test
>  Components: sdk-py-core, testing
>Reporter: Robert Bradshaw
>Assignee: Udi Meiri
>Priority: Minor
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Per 
> [https://nose.readthedocs.io/en/latest/|https://nose.readthedocs.io/en/latest/,]
>  , nose is in maintenance mode.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301521
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:22
Start Date: 26/Aug/19 21:22
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9434: 
[BEAM-8079] Move release Gradle build to a Jenkins job (Part - 2)
URL: https://github.com/apache/beam/pull/9434
 
 
   Update `verify_release_build.sh` so that it reuses `script.config` for 
configuration setup. Also relevant part of release doc is updated:
   
   1. `verify_release_build.sh` will use configurations in `script.config` 
instead of asking people input from terminal.
   2. Github access token is required for `hub` and git push.
   3. Move all Jenkins phrases to a list `JOB_TRIGGER_PHRASES` and have release 
guide point to it.
   4. Updated release guide about script usage and how to run build locally.
   
   Tested the script in a fresh Ubuntu 18 VM.
   
   +R: @yifanzou 
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [ ] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 

[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301516
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:08
Start Date: 26/Aug/19 21:08
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move 
release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#issuecomment-525033262
 
 
   Java PreCommit failed due to irrelevant test failure.
   Release_Build failed with known issue: 
https://issues.apache.org/jira/browse/BEAM-7789. They are not release blocker.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301516)
Time Spent: 1h 50m  (was: 1h 40m)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after release branch is cut. 
> Basically it does two things: 1. verify Gradle build with -PisRelease turned 
> on. 2. create a PR and run all PostCommit jobs against release branch. 
> However, release manager got many painpoints when running this script:
> 1. A lot of environment setup and some of tooling install easily broke the 
> script.
> 2. Running Gradle build locally too extremely long time.
> 3. Auto-pr-creation (use hub) doesn't work.
> We can move Gradle build to Jenkins in order to get rid of environment setup 
> work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed pcollection

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=301515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301515
 ]

ASF GitHub Bot logged work on BEAM-7972:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:05
Start Date: 26/Aug/19 21:05
Worklog Time Spent: 10m 
  Work Description: angoenka commented on issue #9334: [BEAM-7972] Always 
use Global window in reshuffle and then apply wind…
URL: https://github.com/apache/beam/pull/9334#issuecomment-525032257
 
 
   > The defaulting to global window should be deleted since the Python SDK now 
does send a proper windowing strategy (same as Go SDK). The code was added as a 
migration path to allow for differences in where the Python/Go/Java SDKs were 
when submitting jobs to Dataflow.
   
   So we should update the reshuffle code to not pass the non standard window 
from python.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301515)
Time Spent: 2h 50m  (was: 2h 40m)

> Portable Python Reshuffle does not work with windowed pcollection
> -
>
> Key: BEAM-7972
> URL: https://issues.apache.org/jira/browse/BEAM-7972
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Ankur Goenka
>Assignee: Ankur Goenka
>Priority: Major
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Streaming pipeline gets stuck when using Reshuffle with windowed pcollection.
> The issue happen because of window function gets deserialized on java side 
> which is not possible and hence default to global window function and result 
> into window function mismatch later down the code.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301513
 ]

ASF GitHub Bot logged work on BEAM-7864:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:01
Start Date: 26/Aug/19 21:01
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9410: [BEAM-7864] 
Simplify/generalize Spark reshuffle translation
URL: https://github.com/apache/beam/pull/9410#issuecomment-525031072
 
 
   Run Java Spark PortableValidatesRunner Batch
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301513)
Time Spent: 1h 10m  (was: 1h)

> Portable Spark Reshuffle coder cast exception
> -
>
> Key: BEAM-7864
> URL: https://issues.apache.org/jira/browse/BEAM-7864
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> running :sdks:python:test-suites:portable:py35:portableWordCountBatch in 
> either loopback or docker mode on master fails with exception:
>  
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder 
> cannot be cast to org.apache.beam.sdk.coders.KvCoder
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>  at 
> org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301514
 ]

ASF GitHub Bot logged work on BEAM-7864:


Author: ASF GitHub Bot
Created on: 26/Aug/19 21:01
Start Date: 26/Aug/19 21:01
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9410: [BEAM-7864] 
Simplify/generalize Spark reshuffle translation
URL: https://github.com/apache/beam/pull/9410#issuecomment-525031112
 
 
   Run Python Spark ValidatesRunner
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301514)
Time Spent: 1h 20m  (was: 1h 10m)

> Portable Spark Reshuffle coder cast exception
> -
>
> Key: BEAM-7864
> URL: https://issues.apache.org/jira/browse/BEAM-7864
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> running :sdks:python:test-suites:portable:py35:portableWordCountBatch in 
> either loopback or docker mode on master fails with exception:
>  
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder 
> cannot be cast to org.apache.beam.sdk.coders.KvCoder
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>  at 
> org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8095) pytest Py3.7 crashes on test_remote_runner_display_data

2019-08-26 Thread Udi Meiri (Jira)
Udi Meiri created BEAM-8095:
---

 Summary: pytest Py3.7 crashes on test_remote_runner_display_data
 Key: BEAM-8095
 URL: https://issues.apache.org/jira/browse/BEAM-8095
 Project: Beam
  Issue Type: Improvement
  Components: test-failures, testing
Reporter: Udi Meiri


Adding certain flags such as "-n 1" or "--debug" causes Python to abort.
The --debug flag logs some information in a pytestdebug.log, but I'm not 
familiar with it and what it means if anything.

{code}
$ pytest  
apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data
 
===
 test session starts 

platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0
rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: pytest.ini
plugins: forked-1.0.2, xdist-1.29.0
collected 1 item



   

apache_beam/runners/dataflow/dataflow_runner_test.py .  



 [100%]


 1 passed in 0.11s 
=
$ pytest  
apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data
  --debug
writing pytestdebug information to 
/usr/local/google/home/ehudm/src/beam/sdks/python/pytestdebug.log
===
 test session starts 

platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0 -- 
/usr/local/google/home/ehudm/virtualenvs/beam-py37/bin/python3.7
using: pytest-5.1.1 pylib-1.8.0
setuptools registered plugins:
  pytest-forked-1.0.2 at 
/usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/pytest_forked/__init__.py
  pytest-xdist-1.29.0 at 
/usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/plugin.py
  pytest-xdist-1.29.0 at 
/usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/looponfail.py
rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: pytest.ini
plugins: forked-1.0.2, xdist-1.29.0
collected 1 item



   

apache_beam/runners/dataflow/dataflow_runner_test.py Aborted (core dumped)
{code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=301482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301482
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 26/Aug/19 20:43
Start Date: 26/Aug/19 20:43
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #9388: [BEAM-7909] 
upgrade python lib versions to match to dataflow worker
URL: https://github.com/apache/beam/pull/9388#issuecomment-525024490
 
 
   > Thank you!
   > 
   > @Hannah-Jiang could you also file another JIRA issue for finding a process 
to keep this list in sync with setup.py.
   
   Thank you Ahmet for reviewing and merging it.
   I created a ticket: https://issues.apache.org/jira/browse/BEAM-8094
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301482)
Time Spent: 8h 10m  (was: 8h)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8094) Keep python library version synced between setup.py and base_image_requirements.txt

2019-08-26 Thread Hannah Jiang (Jira)
Hannah Jiang created BEAM-8094:
--

 Summary: Keep python library version synced between setup.py and 
base_image_requirements.txt
 Key: BEAM-8094
 URL: https://issues.apache.org/jira/browse/BEAM-8094
 Project: Beam
  Issue Type: Improvement
  Components: sdk-py-core
Reporter: Hannah Jiang


There are some shared libraries in setup.py and base_image_requirements.txt.

Find a way to keep versions are syncs in both files and new libraries added to 
setup.py are added to base_image_requirements.txt if it is commonly used.

Now it is manually synced.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301462
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 26/Aug/19 20:30
Start Date: 26/Aug/19 20:30
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on issue #9289: [BEAM-7389] Add 
code examples for ToString page
URL: https://github.com/apache/beam/pull/9289#issuecomment-525019745
 
 
   Updating to use 
[`util.ToString`](https://beam.apache.org/releases/pydoc/current/_modules/apache_beam/transforms/util.html#ToString).
 I opened #9433 to update the code samples to use it. I also updated this page 
to use the new snippets, but they won't render correctly until the new code 
samples are merged.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301462)
Time Spent: 49.5h  (was: 49h 20m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 49.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301461
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 26/Aug/19 20:28
Start Date: 26/Aug/19 20:28
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9289: 
[BEAM-7389] Add code examples for ToString page
URL: https://github.com/apache/beam/pull/9289#discussion_r317783805
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/tostring.md
 ##
 @@ -19,9 +19,38 @@ limitations under the License.
 -->
 
 # ToString
+
+
+localStorage.setItem('language', 'language-py')
+
+
 Transforms every element in an input collection a string.
 
-## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
+## Example
+
+Any non-string element can be converted to a string using sandard Python 
functions and methods.
 
 Review comment:
   Apparently, there's 
[`util.ToString`](https://beam.apache.org/releases/pydoc/current/_modules/apache_beam/transforms/util.html#ToString).
 I opened #9433 to update the code samples to use it. I also updated this page 
to use the new snippets, but they won't render correctly until the new code 
samples are merged.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301461)
Time Spent: 49h 20m  (was: 49h 10m)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 49h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301460
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 26/Aug/19 20:26
Start Date: 26/Aug/19 20:26
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9289: 
[BEAM-7389] Add code examples for ToString page
URL: https://github.com/apache/beam/pull/9289#discussion_r317782957
 
 

 ##
 File path: website/src/documentation/transforms/python/element-wise/tostring.md
 ##
 @@ -19,9 +19,38 @@ limitations under the License.
 -->
 
 # ToString
-Transforms every element in an input collection a string.
 
-## Examples
-See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. 
+
+localStorage.setItem('language', 'language-py')
+
 
-## Related transforms 
\ No newline at end of file
+Transforms every element in an input collection to a string.
+
+## Example
+
+Any non-string element can be converted to a string using standard Python 
functions and methods.
+Many I/O transforms, such as `TextIO`, expect their input elements to be 
strings.
 
 Review comment:
   Thanks, changed
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301460)
Time Spent: 49h 10m  (was: 49h)

> Colab examples for element-wise transforms (Python)
> ---
>
> Key: BEAM-7389
> URL: https://issues.apache.org/jira/browse/BEAM-7389
> Project: Beam
>  Issue Type: Improvement
>  Components: website
>Reporter: Rose Nguyen
>Assignee: David Cavazos
>Priority: Minor
>  Time Spent: 49h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow

2019-08-26 Thread Chamikara Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916117#comment-16916117
 ] 

Chamikara Jayalath commented on BEAM-8089:
--

Have you tried running Dataflow in the same region as where your bucket located 
using option [1] ? Networks charges should not apply in this case according to 
[2].

 

[1] 
[https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java#L133]

[2] [https://cloud.google.com/storage/pricing]

> Error while using customGcsTempLocation() with Dataflow
> ---
>
> Key: BEAM-8089
> URL: https://issues.apache.org/jira/browse/BEAM-8089
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Harshit Dwivedi
>Assignee: Chamikara Jayalath
>Priority: Major
>
> I have the following code snippet which writes content to BigQuery via File 
> Loads.
> Currently the files are being written to a GCS Bucket, but I want to write 
> them to the local file storage of Dataflow instead and want BigQuery to load 
> data from there.
>  
>  
>  
> {code:java}
> BigQueryIO
>  .writeTableRows()
>  .withNumFileShards(100)
>  .withTriggeringFrequency(Duration.standardSeconds(90))
>  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
>  .withSchema(getSchema())
>  .withoutValidation()
>  .withCustomGcsTempLocation(new ValueProvider() {
>     @Override
>     public String get(){
>          return "/home/harshit/testFiles";     
> }
>     @Override
>     public boolean isAccessible(){
>          return true;     
> }})
>  .withTimePartitioning(new TimePartitioning().setType("DAY"))
>  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
>  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>  .to(tableName));
> {code}
>  
>  
> On running this, I don't see any files being written to the provided path and 
> the BQ load jobs fail with an IOException.
>  
> I looked at the docs, but I was unable to find any working example for this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301453
 ]

ASF GitHub Bot logged work on BEAM-7389:


Author: ASF GitHub Bot
Created on: 26/Aug/19 20:12
Start Date: 26/Aug/19 20:12
Worklog Time Spent: 10m 
  Work Description: davidcavazos commented on pull request #9433: 
[BEAM-7389] Update to use util.ToString transform
URL: https://github.com/apache/beam/pull/9433
 
 
   Update the code sample for `ToString` to use the 
[util.ToString](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.ToString)
 transform.
   
   R: @aaltay Can you take a look whenever you have a chance? Thanks!
   
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [x] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 

[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301452
 ]

ASF GitHub Bot logged work on BEAM-7864:


Author: ASF GitHub Bot
Created on: 26/Aug/19 20:08
Start Date: 26/Aug/19 20:08
Worklog Time Spent: 10m 
  Work Description: ibzib commented on issue #9410: [BEAM-7864] Fix Spark 
reshuffle translation with Python SDK
URL: https://github.com/apache/beam/pull/9410#issuecomment-525011659
 
 
   Thanks for the review @RyanSkraba. In that case, I will change the shared 
translation instead of just the portable runner's.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301452)
Time Spent: 1h  (was: 50m)

> Portable Spark Reshuffle coder cast exception
> -
>
> Key: BEAM-7864
> URL: https://issues.apache.org/jira/browse/BEAM-7864
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Kyle Weaver
>Assignee: Kyle Weaver
>Priority: Major
>  Labels: portability-spark
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> running :sdks:python:test-suites:portable:py35:portableWordCountBatch in 
> either loopback or docker mode on master fails with exception:
>  
> java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder 
> cannot be cast to org.apache.beam.sdk.coders.KvCoder
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400)
>  at 
> org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147)
>  at 
> org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8014) Use OffsetRange as restriction for OffsetRestrictionTracker

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8014?focusedWorklogId=301445=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301445
 ]

ASF GitHub Bot logged work on BEAM-8014:


Author: ASF GitHub Bot
Created on: 26/Aug/19 19:41
Start Date: 26/Aug/19 19:41
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on issue #9376: [BEAM-8014] Using 
OffsetRange as restriction for OffsetRestrictionTracker
URL: https://github.com/apache/beam/pull/9376#issuecomment-525001614
 
 
   PTAL @robertwb Thanks for your help : D
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301445)
Time Spent: 1h 10m  (was: 1h)

> Use OffsetRange as restriction for OffsetRestrictionTracker
> ---
>
> Key: BEAM-8014
> URL: https://issues.apache.org/jira/browse/BEAM-8014
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-core
>Reporter: Boyuan Zhang
>Assignee: Boyuan Zhang
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow

2019-08-26 Thread Harshit Dwivedi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916053#comment-16916053
 ] 

Harshit Dwivedi commented on BEAM-8089:
---

The data ingested into GCS is around 250Gb for us per day, so we are incurring 
a lot of network charges.

I wanted to avoid this charge by storing everything in Dataflow PD instead of 
GCS.

> Error while using customGcsTempLocation() with Dataflow
> ---
>
> Key: BEAM-8089
> URL: https://issues.apache.org/jira/browse/BEAM-8089
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Harshit Dwivedi
>Assignee: Chamikara Jayalath
>Priority: Major
>
> I have the following code snippet which writes content to BigQuery via File 
> Loads.
> Currently the files are being written to a GCS Bucket, but I want to write 
> them to the local file storage of Dataflow instead and want BigQuery to load 
> data from there.
>  
>  
>  
> {code:java}
> BigQueryIO
>  .writeTableRows()
>  .withNumFileShards(100)
>  .withTriggeringFrequency(Duration.standardSeconds(90))
>  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
>  .withSchema(getSchema())
>  .withoutValidation()
>  .withCustomGcsTempLocation(new ValueProvider() {
>     @Override
>     public String get(){
>          return "/home/harshit/testFiles";     
> }
>     @Override
>     public boolean isAccessible(){
>          return true;     
> }})
>  .withTimePartitioning(new TimePartitioning().setType("DAY"))
>  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
>  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>  .to(tableName));
> {code}
>  
>  
> On running this, I don't see any files being written to the provided path and 
> the BQ load jobs fail with an IOException.
>  
> I looked at the docs, but I was unable to find any working example for this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow

2019-08-26 Thread Chamikara Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916052#comment-16916052
 ] 

Chamikara Jayalath commented on BEAM-8089:
--

BTW may I ask why you cannot use GCS in this case ? Dataflow already needs GCS 
to run and storage costs should be minimum.

> Error while using customGcsTempLocation() with Dataflow
> ---
>
> Key: BEAM-8089
> URL: https://issues.apache.org/jira/browse/BEAM-8089
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Harshit Dwivedi
>Assignee: Chamikara Jayalath
>Priority: Major
>
> I have the following code snippet which writes content to BigQuery via File 
> Loads.
> Currently the files are being written to a GCS Bucket, but I want to write 
> them to the local file storage of Dataflow instead and want BigQuery to load 
> data from there.
>  
>  
>  
> {code:java}
> BigQueryIO
>  .writeTableRows()
>  .withNumFileShards(100)
>  .withTriggeringFrequency(Duration.standardSeconds(90))
>  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
>  .withSchema(getSchema())
>  .withoutValidation()
>  .withCustomGcsTempLocation(new ValueProvider() {
>     @Override
>     public String get(){
>          return "/home/harshit/testFiles";     
> }
>     @Override
>     public boolean isAccessible(){
>          return true;     
> }})
>  .withTimePartitioning(new TimePartitioning().setType("DAY"))
>  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
>  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>  .to(tableName));
> {code}
>  
>  
> On running this, I don't see any files being written to the provided path and 
> the BQ load jobs fail with an IOException.
>  
> I looked at the docs, but I was unable to find any working example for this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow

2019-08-26 Thread Chamikara Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916051#comment-16916051
 ] 

Chamikara Jayalath commented on BEAM-8089:
--

I don't think we can fork Beam code for a very specific scenario of the 
Dataflow runner (single worker with autoscaling disabled). In general, Dataflow 
does not fuse the step that write files and the step that execute the BQ job so 
these two steps may not execute in the same worker.

> Error while using customGcsTempLocation() with Dataflow
> ---
>
> Key: BEAM-8089
> URL: https://issues.apache.org/jira/browse/BEAM-8089
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Harshit Dwivedi
>Assignee: Chamikara Jayalath
>Priority: Major
>
> I have the following code snippet which writes content to BigQuery via File 
> Loads.
> Currently the files are being written to a GCS Bucket, but I want to write 
> them to the local file storage of Dataflow instead and want BigQuery to load 
> data from there.
>  
>  
>  
> {code:java}
> BigQueryIO
>  .writeTableRows()
>  .withNumFileShards(100)
>  .withTriggeringFrequency(Duration.standardSeconds(90))
>  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
>  .withSchema(getSchema())
>  .withoutValidation()
>  .withCustomGcsTempLocation(new ValueProvider() {
>     @Override
>     public String get(){
>          return "/home/harshit/testFiles";     
> }
>     @Override
>     public boolean isAccessible(){
>          return true;     
> }})
>  .withTimePartitioning(new TimePartitioning().setType("DAY"))
>  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
>  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>  .to(tableName));
> {code}
>  
>  
> On running this, I don't see any files being written to the provided path and 
> the BQ load jobs fail with an IOException.
>  
> I looked at the docs, but I was unable to find any working example for this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301418
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 26/Aug/19 18:36
Start Date: 26/Aug/19 18:36
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9411: 
[BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#discussion_r317739727
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_Build.groovy
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import CommonJobProperties as commonJobProperties
+
+job('beam_Release_Gradle_Build') {
 
 Review comment:
   comments updated.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301418)
Time Spent: 1h 40m  (was: 1.5h)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after release branch is cut. 
> Basically it does two things: 1. verify Gradle build with -PisRelease turned 
> on. 2. create a PR and run all PostCommit jobs against release branch. 
> However, release manager got many painpoints when running this script:
> 1. A lot of environment setup and some of tooling install easily broke the 
> script.
> 2. Running Gradle build locally too extremely long time.
> 3. Auto-pr-creation (use hub) doesn't work.
> We can move Gradle build to Jenkins in order to get rid of environment setup 
> work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301417
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 26/Aug/19 18:35
Start Date: 26/Aug/19 18:35
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move 
release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#issuecomment-524976528
 
 
   Run Release Gradle Build
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301417)
Time Spent: 1.5h  (was: 1h 20m)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after release branch is cut. 
> Basically it does two things: 1. verify Gradle build with -PisRelease turned 
> on. 2. create a PR and run all PostCommit jobs against release branch. 
> However, release manager got many painpoints when running this script:
> 1. A lot of environment setup and some of tooling install easily broke the 
> script.
> 2. Running Gradle build locally too extremely long time.
> 3. Auto-pr-creation (use hub) doesn't work.
> We can move Gradle build to Jenkins in order to get rid of environment setup 
> work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=301414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301414
 ]

ASF GitHub Bot logged work on BEAM-5428:


Author: ASF GitHub Bot
Created on: 26/Aug/19 18:31
Start Date: 26/Aug/19 18:31
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9418: [BEAM-5428][WIP] State 
caching in the Python SDK
URL: https://github.com/apache/beam/pull/9418#issuecomment-524975017
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301414)
Time Spent: 5h 10m  (was: 5h)

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Tech spec: 
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document: 
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link: 
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=301415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301415
 ]

ASF GitHub Bot logged work on BEAM-5428:


Author: ASF GitHub Bot
Created on: 26/Aug/19 18:31
Start Date: 26/Aug/19 18:31
Worklog Time Spent: 10m 
  Work Description: mxm commented on issue #9418: [BEAM-5428][WIP] State 
caching in the Python SDK
URL: https://github.com/apache/beam/pull/9418#issuecomment-524975017
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301415)
Time Spent: 5h 20m  (was: 5h 10m)

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> Tech spec: 
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document: 
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link: 
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (BEAM-8093) py36-gcp is missing from the current tox.ini

2019-08-26 Thread Valentyn Tymofieiev (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Valentyn Tymofieiev updated BEAM-8093:
--
Status: Open  (was: Triage Needed)

> py36-gcp is missing from the current tox.ini
> 
>
> Key: BEAM-8093
> URL: https://issues.apache.org/jira/browse/BEAM-8093
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Affects Versions: 2.16.0
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> cc: [~udim]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5663) Add tox suites for various Python 3 versions (3.5, 3.6, 3.7)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5663?focusedWorklogId=301410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301410
 ]

ASF GitHub Bot logged work on BEAM-5663:


Author: ASF GitHub Bot
Created on: 26/Aug/19 18:29
Start Date: 26/Aug/19 18:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #7988: [BEAM-5663] Add 
Python 3.6 tox environment
URL: https://github.com/apache/beam/pull/7988#issuecomment-524974148
 
 
   It's an oversight, thanks for calling out. Opened 
https://issues.apache.org/jira/browse/BEAM-8093
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301410)
Time Spent: 5h 40m  (was: 5.5h)

> Add tox suites for various Python 3 versions (3.5, 3.6, 3.7)
> 
>
> Key: BEAM-5663
> URL: https://issues.apache.org/jira/browse/BEAM-5663
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Manu Zhang
>Assignee: Robbe
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test 
> failings across various Python 3 versions. It will be valuable to add tox 
> suites for Python 3.4, 3.5, 3.6 and 3.7



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5663) Add tox suites for various Python 3 versions (3.5, 3.6, 3.7)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5663?focusedWorklogId=301411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301411
 ]

ASF GitHub Bot logged work on BEAM-5663:


Author: ASF GitHub Bot
Created on: 26/Aug/19 18:29
Start Date: 26/Aug/19 18:29
Worklog Time Spent: 10m 
  Work Description: tvalentyn commented on issue #7988: [BEAM-5663] Add 
Python 3.6 tox environment
URL: https://github.com/apache/beam/pull/7988#issuecomment-524974148
 
 
   It's an oversight, thanks for calling out, @udim. Opened 
https://issues.apache.org/jira/browse/BEAM-8093
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301411)
Time Spent: 5h 50m  (was: 5h 40m)

> Add tox suites for various Python 3 versions (3.5, 3.6, 3.7)
> 
>
> Key: BEAM-5663
> URL: https://issues.apache.org/jira/browse/BEAM-5663
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Manu Zhang
>Assignee: Robbe
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test 
> failings across various Python 3 versions. It will be valuable to add tox 
> suites for Python 3.4, 3.5, 3.6 and 3.7



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (BEAM-8093) py36-gcp is missing from the current tox.ini

2019-08-26 Thread Valentyn Tymofieiev (Jira)
Valentyn Tymofieiev created BEAM-8093:
-

 Summary: py36-gcp is missing from the current tox.ini
 Key: BEAM-8093
 URL: https://issues.apache.org/jira/browse/BEAM-8093
 Project: Beam
  Issue Type: Sub-task
  Components: testing
Affects Versions: 2.16.0
Reporter: Valentyn Tymofieiev


cc: [~udim]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow

2019-08-26 Thread Harshit Dwivedi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916037#comment-16916037
 ] 

Harshit Dwivedi commented on BEAM-8089:
---

For my use-case, I have a single worker and since that runs on a single VM, 
would it be possible to implement this?

I have disable autoscaling for my use case and hence the Dataflow job will 
always run on a single VM.

> Error while using customGcsTempLocation() with Dataflow
> ---
>
> Key: BEAM-8089
> URL: https://issues.apache.org/jira/browse/BEAM-8089
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Harshit Dwivedi
>Assignee: Chamikara Jayalath
>Priority: Major
>
> I have the following code snippet which writes content to BigQuery via File 
> Loads.
> Currently the files are being written to a GCS Bucket, but I want to write 
> them to the local file storage of Dataflow instead and want BigQuery to load 
> data from there.
>  
>  
>  
> {code:java}
> BigQueryIO
>  .writeTableRows()
>  .withNumFileShards(100)
>  .withTriggeringFrequency(Duration.standardSeconds(90))
>  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
>  .withSchema(getSchema())
>  .withoutValidation()
>  .withCustomGcsTempLocation(new ValueProvider() {
>     @Override
>     public String get(){
>          return "/home/harshit/testFiles";     
> }
>     @Override
>     public boolean isAccessible(){
>          return true;     
> }})
>  .withTimePartitioning(new TimePartitioning().setType("DAY"))
>  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
>  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>  .to(tableName));
> {code}
>  
>  
> On running this, I don't see any files being written to the provided path and 
> the BQ load jobs fail with an IOException.
>  
> I looked at the docs, but I was unable to find any working example for this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301406=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301406
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 26/Aug/19 18:24
Start Date: 26/Aug/19 18:24
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#discussion_r317731169
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -49,26 +57,74 @@
 @RunWith(JUnit4.class)
 public class BigQueryHllSketchCompatibilityIT {
 
-  private static final String DATASET_NAME = "zetasketch_compatibility_test";
+  private static final String APP_NAME;
+  private static final String PROJECT_ID;
+  private static final String DATASET_ID;
 
-  // Table for testReadSketchFromBigQuery()
+  private static final List TEST_DATA =
+  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+
+  // Data Table: used by testReadSketchFromBigQuery())
   // Schema: only one STRING field named "data".
   // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_NAME = "hll_data";
+  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
+  private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
   private static final Long EXPECTED_COUNT = 3L;
 
-  // Table for testWriteSketchToBigQuery()
+  // Sketch Table: used by testWriteSketchToBigQuery()
   // Schema: only one BYTES field named "sketch".
   // Content: will be overridden by the sketch computed by the test pipeline 
each time the test runs
-  private static final String SKETCH_TABLE_NAME = "hll_sketch";
+  private static final String SKETCH_TABLE_ID = "hll_sketch";
   private static final String SKETCH_FIELD_NAME = "sketch";
-  private static final List TEST_DATA =
-  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+  private static final String SKETCH_FIELD_TYPE = "BYTES";
   // SHA-1 hash of string "[3]", the string representation of a row that has 
only one field 3 in it
   private static final String EXPECTED_CHECKSUM = 
"f1e31df9806ce94c5bdbbfff9608324930f4d3f1";
 
+  static {
+ApplicationNameOptions options =
+TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class);
+APP_NAME = options.getAppName();
+PROJECT_ID = options.as(GcpOptions.class).getProject();
+DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 27h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301405
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 26/Aug/19 18:24
Start Date: 26/Aug/19 18:24
Worklog Time Spent: 10m 
  Work Description: boyuanzz commented on pull request #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#discussion_r317730080
 
 

 ##
 File path: 
sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java
 ##
 @@ -49,26 +57,74 @@
 @RunWith(JUnit4.class)
 public class BigQueryHllSketchCompatibilityIT {
 
-  private static final String DATASET_NAME = "zetasketch_compatibility_test";
+  private static final String APP_NAME;
+  private static final String PROJECT_ID;
+  private static final String DATASET_ID;
 
-  // Table for testReadSketchFromBigQuery()
+  private static final List TEST_DATA =
+  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+
+  // Data Table: used by testReadSketchFromBigQuery())
   // Schema: only one STRING field named "data".
   // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange"
-  private static final String DATA_TABLE_NAME = "hll_data";
+  private static final String DATA_TABLE_ID = "hll_data";
   private static final String DATA_FIELD_NAME = "data";
+  private static final String DATA_FIELD_TYPE = "STRING";
   private static final String QUERY_RESULT_FIELD_NAME = "sketch";
   private static final Long EXPECTED_COUNT = 3L;
 
-  // Table for testWriteSketchToBigQuery()
+  // Sketch Table: used by testWriteSketchToBigQuery()
   // Schema: only one BYTES field named "sketch".
   // Content: will be overridden by the sketch computed by the test pipeline 
each time the test runs
-  private static final String SKETCH_TABLE_NAME = "hll_sketch";
+  private static final String SKETCH_TABLE_ID = "hll_sketch";
   private static final String SKETCH_FIELD_NAME = "sketch";
-  private static final List TEST_DATA =
-  Arrays.asList("Apple", "Orange", "Banana", "Orange");
+  private static final String SKETCH_FIELD_TYPE = "BYTES";
   // SHA-1 hash of string "[3]", the string representation of a row that has 
only one field 3 in it
   private static final String EXPECTED_CHECKSUM = 
"f1e31df9806ce94c5bdbbfff9608324930f4d3f1";
 
+  static {
+ApplicationNameOptions options =
+TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class);
+APP_NAME = options.getAppName();
+PROJECT_ID = options.as(GcpOptions.class).getProject();
+DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 27.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301403
 ]

ASF GitHub Bot logged work on BEAM-7013:


Author: ASF GitHub Bot
Created on: 26/Aug/19 18:21
Start Date: 26/Aug/19 18:21
Worklog Time Spent: 10m 
  Work Description: robinyqiu commented on issue #9144: [BEAM-7013] 
Integrating ZetaSketch's HLL++ algorithm with Beam
URL: https://github.com/apache/beam/pull/9144#issuecomment-524971095
 
 
   Run Java PostCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301403)
Time Spent: 27h 20m  (was: 27h 10m)

> A new count distinct transform based on BigQuery compatible HyperLogLog++ 
> implementation
> 
>
> Key: BEAM-7013
> URL: https://issues.apache.org/jira/browse/BEAM-7013
> Project: Beam
>  Issue Type: New Feature
>  Components: extensions-java-sketching, sdk-java-core
>Reporter: Yueyang Qiu
>Assignee: Yueyang Qiu
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 27h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow

2019-08-26 Thread Chamikara Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916033#comment-16916033
 ] 

Chamikara Jayalath commented on BEAM-8089:
--

True, seems like this is supported in a limited way (wildcards not supported 
for example).

 

I think Beam will have a hard time supporting this since most Beam runners are 
distributed and use multiple nodes to write data (to files) in parallel. So 
there's no "single" local disk. This is why we use a distributed storage 
location to which all workers have access to write individual files (a 
directory in GCS in this case) and execute a single BQ load job for all files 
from there.

> Error while using customGcsTempLocation() with Dataflow
> ---
>
> Key: BEAM-8089
> URL: https://issues.apache.org/jira/browse/BEAM-8089
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Harshit Dwivedi
>Assignee: Chamikara Jayalath
>Priority: Major
>
> I have the following code snippet which writes content to BigQuery via File 
> Loads.
> Currently the files are being written to a GCS Bucket, but I want to write 
> them to the local file storage of Dataflow instead and want BigQuery to load 
> data from there.
>  
>  
>  
> {code:java}
> BigQueryIO
>  .writeTableRows()
>  .withNumFileShards(100)
>  .withTriggeringFrequency(Duration.standardSeconds(90))
>  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
>  .withSchema(getSchema())
>  .withoutValidation()
>  .withCustomGcsTempLocation(new ValueProvider() {
>     @Override
>     public String get(){
>          return "/home/harshit/testFiles";     
> }
>     @Override
>     public boolean isAccessible(){
>          return true;     
> }})
>  .withTimePartitioning(new TimePartitioning().setType("DAY"))
>  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
>  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>  .to(tableName));
> {code}
>  
>  
> On running this, I don't see any files being written to the provided path and 
> the BQ load jobs fail with an IOException.
>  
> I looked at the docs, but I was unable to find any working example for this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow

2019-08-26 Thread Harshit Dwivedi (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916022#comment-16916022
 ] 

Harshit Dwivedi commented on BEAM-8089:
---

But the BQ documentation says that this can be done, 

[https://cloud.google.com/bigquery/docs/loading-data-local]

> Error while using customGcsTempLocation() with Dataflow
> ---
>
> Key: BEAM-8089
> URL: https://issues.apache.org/jira/browse/BEAM-8089
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Harshit Dwivedi
>Assignee: Chamikara Jayalath
>Priority: Major
>
> I have the following code snippet which writes content to BigQuery via File 
> Loads.
> Currently the files are being written to a GCS Bucket, but I want to write 
> them to the local file storage of Dataflow instead and want BigQuery to load 
> data from there.
>  
>  
>  
> {code:java}
> BigQueryIO
>  .writeTableRows()
>  .withNumFileShards(100)
>  .withTriggeringFrequency(Duration.standardSeconds(90))
>  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
>  .withSchema(getSchema())
>  .withoutValidation()
>  .withCustomGcsTempLocation(new ValueProvider() {
>     @Override
>     public String get(){
>          return "/home/harshit/testFiles";     
> }
>     @Override
>     public boolean isAccessible(){
>          return true;     
> }})
>  .withTimePartitioning(new TimePartitioning().setType("DAY"))
>  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
>  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>  .to(tableName));
> {code}
>  
>  
> On running this, I don't see any files being written to the provided path and 
> the BQ load jobs fail with an IOException.
>  
> I looked at the docs, but I was unable to find any working example for this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=301395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301395
 ]

ASF GitHub Bot logged work on BEAM-5428:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:59
Start Date: 26/Aug/19 17:59
Worklog Time Spent: 10m 
  Work Description: rakeshcusat commented on pull request #9418: 
[BEAM-5428][WIP] State caching in the Python SDK
URL: https://github.com/apache/beam/pull/9418#discussion_r317723117
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/statecache.py
 ##
 @@ -0,0 +1,103 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""TODO mxm"""
+import collections
+from threading import Lock
+
+# (cache_token, value)
+
+
+class StateCache(object):
+
+  def __init__(self, max_entries):
+self._cache = LRUCache(max_entries)
+self._lock = Lock()
+
+  def get(self, state_key, cache_tokens):
+#assert isinstance(state_key, StateKey.__class__)
+with self._lock:
+  cache_entry = self._cache.get(state_key)
+if cache_entry:
+  token, value = cache_entry
+  return value if token in cache_tokens else None
+else:
+  return None
+
+  def put(self, state_key, cache_token, value):
+#assert isinstance(state_key, StateKey.__class__)
+with self._lock:
+  return self._cache.put(state_key, (cache_token, value))
+
+  def clear(self, state_key):
+#assert isinstance(state_key, StateKey.__class__)
+with self._lock:
+  self._cache.clear(state_key)
+
+  def __len__(self):
+return len(self._cache)
+
+
+class LRUCache(object):
+
+  def __init__(self, max_entries):
+self._maxEntries = max_entries
+self._cache = collections.OrderedDict()
+
+  def get(self, key):
+value = self._cache.pop(key, None)
+if value:
+  self._cache[key] = value
+return value
+
+  def put(self, key, value):
+while len(self._cache) >= self._maxEntries:
+  self._cache.popitem(last=False)
+self._cache[key] = value
+
+  def clear(self, key):
+self._cache.pop(key, None)
+
+  def __len__(self):
+return len(self._cache)
+
+
+class CacheStateKey(object):
+
+  def __init__(self, transform_id, state_id, window, key):
+self.transform_id = transform_id
+self.state_id = state_id
+self.window = window
+self.key = key
+
+  def __eq__(self, other):
+return (isinstance(other, self.__class__) and
 
 Review comment:
   `state_key.SerializeToString()` can give you an inconsistent result on 
Python3. So I would avoid this method.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301395)
Time Spent: 5h  (was: 4h 50m)

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Tech spec: 
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document: 
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link: 
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=301394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301394
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:58
Start Date: 26/Aug/19 17:58
Worklog Time Spent: 10m 
  Work Description: robertwb commented on issue #9351: [BEAM-7909] support 
customized container for Python
URL: https://github.com/apache/beam/pull/9351#issuecomment-524962193
 
 
   Would this PR better be named "Python 3 containers" or similar? I don't see 
customized container specific code here. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301394)
Time Spent: 8h  (was: 7h 50m)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 8h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=301393=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301393
 ]

ASF GitHub Bot logged work on BEAM-5428:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:57
Start Date: 26/Aug/19 17:57
Worklog Time Spent: 10m 
  Work Description: rakeshcusat commented on pull request #9418: 
[BEAM-5428][WIP] State caching in the Python SDK
URL: https://github.com/apache/beam/pull/9418#discussion_r317722482
 
 

 ##
 File path: sdks/python/apache_beam/runners/worker/sdk_worker.py
 ##
 @@ -464,6 +469,7 @@ def __init__(self, credentials=None):
 self._lock = threading.Lock()
 self._throwing_state_handler = ThrowingStateHandler()
 self._credentials = credentials
+self.state_cache = StateCache(100)
 
 Review comment:
   sounds good.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301393)
Time Spent: 4h 50m  (was: 4h 40m)

> Implement cross-bundle state caching.
> -
>
> Key: BEAM-5428
> URL: https://issues.apache.org/jira/browse/BEAM-5428
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-py-harness
>Reporter: Robert Bradshaw
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Tech spec: 
> [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m]
> Relevant document: 
> [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit]
> Mailing list link: 
> [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-26 Thread Gleb Kanterov (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gleb Kanterov closed BEAM-8080.
---
Resolution: Fixed

> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
> ---
>
> Key: BEAM-8080
> URL: https://issues.apache.org/jira/browse/BEAM-8080
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.16.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380)
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301380=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301380
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:42
Start Date: 26/Aug/19 17:42
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move 
release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#issuecomment-524956129
 
 
   Run Seed Job
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301380)
Time Spent: 1h 20m  (was: 1h 10m)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after release branch is cut. 
> Basically it does two things: 1. verify Gradle build with -PisRelease turned 
> on. 2. create a PR and run all PostCommit jobs against release branch. 
> However, release manager got many painpoints when running this script:
> 1. A lot of environment setup and some of tooling install easily broke the 
> script.
> 2. Running Gradle build locally too extremely long time.
> 3. Auto-pr-creation (use hub) doesn't work.
> We can move Gradle build to Jenkins in order to get rid of environment setup 
> work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301379=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301379
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:42
Start Date: 26/Aug/19 17:42
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move 
release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#issuecomment-524955951
 
 
   release_build job took longer than Jenkins timeout. So we may want to remove 
`--no-parallel` to make the build faster. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301379)
Time Spent: 1h 10m  (was: 1h)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after release branch is cut. 
> Basically it does two things: 1. verify Gradle build with -PisRelease turned 
> on. 2. create a PR and run all PostCommit jobs against release branch. 
> However, release manager got many painpoints when running this script:
> 1. A lot of environment setup and some of tooling install easily broke the 
> script.
> 2. Running Gradle build locally too extremely long time.
> 3. Auto-pr-creation (use hub) doesn't work.
> We can move Gradle build to Jenkins in order to get rid of environment setup 
> work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow

2019-08-26 Thread Chamikara Jayalath (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915992#comment-16915992
 ] 

Chamikara Jayalath commented on BEAM-8089:
--

BQ cannot execute load jobs from local files. Files have to be in GCS.

 

So I think this is working as intended.

> Error while using customGcsTempLocation() with Dataflow
> ---
>
> Key: BEAM-8089
> URL: https://issues.apache.org/jira/browse/BEAM-8089
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-gcp
>Affects Versions: 2.13.0
>Reporter: Harshit Dwivedi
>Assignee: Chamikara Jayalath
>Priority: Major
>
> I have the following code snippet which writes content to BigQuery via File 
> Loads.
> Currently the files are being written to a GCS Bucket, but I want to write 
> them to the local file storage of Dataflow instead and want BigQuery to load 
> data from there.
>  
>  
>  
> {code:java}
> BigQueryIO
>  .writeTableRows()
>  .withNumFileShards(100)
>  .withTriggeringFrequency(Duration.standardSeconds(90))
>  .withMethod(BigQueryIO.Write.Method.FILE_LOADS)
>  .withSchema(getSchema())
>  .withoutValidation()
>  .withCustomGcsTempLocation(new ValueProvider() {
>     @Override
>     public String get(){
>          return "/home/harshit/testFiles";     
> }
>     @Override
>     public boolean isAccessible(){
>          return true;     
> }})
>  .withTimePartitioning(new TimePartitioning().setType("DAY"))
>  .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
>  .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
>  .to(tableName));
> {code}
>  
>  
> On running this, I don't see any files being written to the provided path and 
> the BQ load jobs fail with an IOException.
>  
> I looked at the docs, but I was unable to find any working example for this.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-7713) Migrate to "typing" module typing types in Beam typehints (on Py2 and Py3).

2019-08-26 Thread Valentyn Tymofieiev (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915991#comment-16915991
 ] 

Valentyn Tymofieiev commented on BEAM-7713:
---

Hey [~udim], could you please check what's the status of this issue? Thank you!

> Migrate to "typing" module typing types in Beam typehints (on Py2 and Py3).
> ---
>
> Key: BEAM-7713
> URL: https://issues.apache.org/jira/browse/BEAM-7713
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Valentyn Tymofieiev
>Assignee: Udi Meiri
>Priority: Major
> Fix For: 2.16.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails

2019-08-26 Thread Rui Wang (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915989#comment-16915989
 ] 

Rui Wang commented on BEAM-8042:


I see. It's triggered on the empty input table.

> Parsing of aggregate query fails
> 
>
> Key: BEAM-8042
> URL: https://issues.apache.org/jira/browse/BEAM-8042
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Critical
>
> {code}
>   @Rule
>   public TestPipeline pipeline = 
> TestPipeline.fromOptions(createPipelineOptions());
>   private static PipelineOptions createPipelineOptions() {
> BeamSqlPipelineOptions opts = 
> PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class);
> opts.setPlannerName(ZetaSQLQueryPlanner.class.getName());
> return opts;
>   }
>   @Test
>   public void testAggregate() {
> Schema inputSchema = Schema.builder()
> .addByteArrayField("id")
> .addInt64Field("has_f1")
> .addInt64Field("has_f2")
> .addInt64Field("has_f3")
> .addInt64Field("has_f4")
> .addInt64Field("has_f5")
> .addInt64Field("has_f6")
> .build();
> String sql = "SELECT \n" +
> "  id, \n" +
> "  COUNT(*) as count, \n" +
> "  SUM(has_f1) as f1_count, \n" +
> "  SUM(has_f2) as f2_count, \n" +
> "  SUM(has_f3) as f3_count, \n" +
> "  SUM(has_f4) as f4_count, \n" +
> "  SUM(has_f5) as f5_count, \n" +
> "  SUM(has_f6) as f6_count  \n" +
> "FROM PCOLLECTION \n" +
> "GROUP BY id";
> pipeline
> .apply(Create.empty(inputSchema))
> .apply(SqlTransform.query(sql));
> pipeline.run();
>   }
> {code}
> {code}
> Caused by: java.lang.RuntimeException: Error while applying rule 
> AggregateProjectMergeRule, args 
> [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)),
>  
> rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)]
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104)
>   at 
>   ... 39 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
>   at 
> org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73)
>   at 
> org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205)
>   ... 48 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301372
 ]

ASF GitHub Bot logged work on BEAM-8079:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:29
Start Date: 26/Aug/19 17:29
Worklog Time Spent: 10m 
  Work Description: markflyhigh commented on pull request #9411: 
[BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1)
URL: https://github.com/apache/beam/pull/9411#discussion_r317710049
 
 

 ##
 File path: .test-infra/jenkins/job_Release_Gradle_Build.groovy
 ##
 @@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import CommonJobProperties as commonJobProperties
+
+job('beam_Release_Gradle_Build') {
 
 Review comment:
   Would add a comment here. As for release guide, I'd love to update it with 
some changes to `verify_release_branch.sh`. That will be the second part right 
following this PR.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301372)
Time Spent: 1h  (was: 50m)

> Move verify_release_build.sh to Jenkins job
> ---
>
> Key: BEAM-8079
> URL: https://issues.apache.org/jira/browse/BEAM-8079
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Mark Liu
>Assignee: Mark Liu
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> verify_release_build.sh is used for validation after release branch is cut. 
> Basically it does two things: 1. verify Gradle build with -PisRelease turned 
> on. 2. create a PR and run all PostCommit jobs against release branch. 
> However, release manager got many painpoints when running this script:
> 1. A lot of environment setup and some of tooling install easily broke the 
> script.
> 2. Running Gradle build locally too extremely long time.
> 3. Auto-pr-creation (use hub) doesn't work.
> We can move Gradle build to Jenkins in order to get rid of environment setup 
> work.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-5663) Add tox suites for various Python 3 versions (3.5, 3.6, 3.7)

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5663?focusedWorklogId=301366=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301366
 ]

ASF GitHub Bot logged work on BEAM-5663:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:16
Start Date: 26/Aug/19 17:16
Worklog Time Spent: 10m 
  Work Description: udim commented on issue #7988: [BEAM-5663] Add Python 
3.6 tox environment
URL: https://github.com/apache/beam/pull/7988#issuecomment-524946573
 
 
   I'm going over tox.ini for a nose->pytest migration. I see that `py36-gcp` 
is missing from the current tox.ini. Is that an oversight or is there an open 
issue?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301366)
Time Spent: 5.5h  (was: 5h 20m)

> Add tox suites for various Python 3 versions (3.5, 3.6, 3.7)
> 
>
> Key: BEAM-5663
> URL: https://issues.apache.org/jira/browse/BEAM-5663
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-py-core
>Reporter: Manu Zhang
>Assignee: Robbe
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test 
> failings across various Python 3 versions. It will be valuable to add tox 
> suites for Python 3.4, 3.5, 3.6 and 3.7



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=301355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301355
 ]

ASF GitHub Bot logged work on BEAM-7739:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:08
Start Date: 26/Aug/19 17:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9067: [BEAM-7739] 
Implement ReadModifyWriteState in Python SDK
URL: https://github.com/apache/beam/pull/9067#discussion_r317697628
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java
 ##
 @@ -166,9 +172,62 @@ public WatermarkHoldState bindWatermark(
 }
   }
 
-  /** An {@link InMemoryState} implementation of {@link ValueState}. */
+
+  /** An {@link InMemoryState} implementation of {@link ReadModifyWriteState}. 
*/
+  public static final class InMemoryReadModifyWrite
+  implements ReadModifyWriteState, 
InMemoryState> {
+private final Coder coder;
+
+private boolean isCleared = true;
+private @Nullable T value = null;
+
+public InMemoryReadModifyWrite(Coder coder) {
+  this.coder = coder;
+}
+
+@Override
+public void clear() {
+  // Even though we're clearing we can't remove this from the in-memory 
state map, since
+  // other users may already have a handle on this Value.
+  value = null;
+  isCleared = true;
+}
+
+@Override
+public InMemoryReadModifyWrite readLater() {
+  return this;
+}
+
+@Override
+public T read() {
+  return value;
+}
+
+@Override
+public void write(T input) {
+  isCleared = false;
+  this.value = input;
+}
+
+@Override
+public InMemoryReadModifyWrite copy() {
+  InMemoryReadModifyWrite that = new InMemoryReadModifyWrite<>(coder);
+  if (!this.isCleared) {
+that.isCleared = this.isCleared;
+that.value = uncheckedClone(coder, this.value);
+  }
+  return that;
+}
+
+@Override
+public boolean isCleared() {
+  return isCleared;
+}
+  }
+
+  /** An {@link InMemoryState} implementation of {@link ReadModifyWriteState}. 
*/
   public static final class InMemoryValue
-  implements ValueState, InMemoryState> {
+  implements ValueState, InMemoryState> {
 
 Review comment:
   Seems like this could be mostly merged with InMemoryReadModifyWrite 
(possibly requiring a common baseclass). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301355)
Time Spent: 3h 10m  (was: 3h)

> Add ValueState in Python sdk
> 
>
> Key: BEAM-7739
> URL: https://issues.apache.org/jira/browse/BEAM-7739
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently ValueState is missing from Python Sdks but it is existing in Java 
> sdks. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=301357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301357
 ]

ASF GitHub Bot logged work on BEAM-7739:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:08
Start Date: 26/Aug/19 17:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9067: [BEAM-7739] 
Implement ReadModifyWriteState in Python SDK
URL: https://github.com/apache/beam/pull/9067#discussion_r317698856
 
 

 ##
 File path: 
runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java
 ##
 @@ -166,9 +172,62 @@ public WatermarkHoldState bindWatermark(
 }
   }
 
-  /** An {@link InMemoryState} implementation of {@link ValueState}. */
+
+  /** An {@link InMemoryState} implementation of {@link ReadModifyWriteState}. 
*/
+  public static final class InMemoryReadModifyWrite
+  implements ReadModifyWriteState, 
InMemoryState> {
+private final Coder coder;
+
+private boolean isCleared = true;
+private @Nullable T value = null;
+
+public InMemoryReadModifyWrite(Coder coder) {
+  this.coder = coder;
+}
+
+@Override
+public void clear() {
+  // Even though we're clearing we can't remove this from the in-memory 
state map, since
+  // other users may already have a handle on this Value.
+  value = null;
+  isCleared = true;
+}
+
+@Override
+public InMemoryReadModifyWrite readLater() {
+  return this;
+}
+
+@Override
+public T read() {
+  return value;
+}
+
+@Override
+public void write(T input) {
+  isCleared = false;
+  this.value = input;
+}
+
+@Override
+public InMemoryReadModifyWrite copy() {
+  InMemoryReadModifyWrite that = new InMemoryReadModifyWrite<>(coder);
+  if (!this.isCleared) {
+that.isCleared = this.isCleared;
+that.value = uncheckedClone(coder, this.value);
+  }
+  return that;
+}
+
+@Override
+public boolean isCleared() {
+  return isCleared;
+}
+  }
+
+  /** An {@link InMemoryState} implementation of {@link ReadModifyWriteState}. 
*/
   public static final class InMemoryValue
-  implements ValueState, InMemoryState> {
+  implements ValueState, InMemoryState> {
 
 Review comment:
   Actually, I wonder if InMemoryReadModifyWrite was a subclass of 
InMemoryValue one could get rid of some of the parallel code here (e.g. 
bindValue vs. bindReadModifyWrite). Haven't pursued this to its conclusion 
though. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301357)
Time Spent: 3h 20m  (was: 3h 10m)

> Add ValueState in Python sdk
> 
>
> Key: BEAM-7739
> URL: https://issues.apache.org/jira/browse/BEAM-7739
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Currently ValueState is missing from Python Sdks but it is existing in Java 
> sdks. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=301356=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301356
 ]

ASF GitHub Bot logged work on BEAM-7739:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:08
Start Date: 26/Aug/19 17:08
Worklog Time Spent: 10m 
  Work Description: robertwb commented on pull request #9067: [BEAM-7739] 
Implement ReadModifyWriteState in Python SDK
URL: https://github.com/apache/beam/pull/9067#discussion_r317699641
 
 

 ##
 File path: 
runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkBroadcastStateInternals.java
 ##
 @@ -273,14 +280,73 @@ void clearInternal() {
   }
 
   private class FlinkBroadcastValueState extends AbstractBroadcastState
-  implements ValueState {
+  implements ValueState {
 
 Review comment:
   Similarly, perhaps ReadModifyWriteState could extend ValueState?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301356)
Time Spent: 3h 10m  (was: 3h)

> Add ValueState in Python sdk
> 
>
> Key: BEAM-7739
> URL: https://issues.apache.org/jira/browse/BEAM-7739
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-py-core
>Reporter: Rakesh Kumar
>Assignee: Rakesh Kumar
>Priority: Major
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> Currently ValueState is missing from Python Sdks but it is existing in Java 
> sdks. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?focusedWorklogId=301350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301350
 ]

ASF GitHub Bot logged work on BEAM-8080:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:01
Start Date: 26/Aug/19 17:01
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on pull request #9414: [BEAM-8080] 
[SQL] Fix relocation of com.google.types
URL: https://github.com/apache/beam/pull/9414
 
 
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301350)
Time Spent: 1.5h  (was: 1h 20m)

> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
> ---
>
> Key: BEAM-8080
> URL: https://issues.apache.org/jira/browse/BEAM-8080
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.16.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380)
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8080?focusedWorklogId=301349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301349
 ]

ASF GitHub Bot logged work on BEAM-8080:


Author: ASF GitHub Bot
Created on: 26/Aug/19 17:01
Start Date: 26/Aug/19 17:01
Worklog Time Spent: 10m 
  Work Description: amaliujia commented on issue #9414: [BEAM-8080] [SQL] 
Fix relocation of com.google.types
URL: https://github.com/apache/beam/pull/9414#issuecomment-524941037
 
 
   After we have vendor calcite in place, Beam ZetaSQL should be extracted into 
its own module.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301349)
Time Spent: 1h 20m  (was: 1h 10m)

> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
> ---
>
> Key: BEAM-8080
> URL: https://issues.apache.org/jira/browse/BEAM-8080
> Project: Beam
>  Issue Type: Sub-task
>  Components: dsl-sql-zetasql
>Reporter: Gleb Kanterov
>Assignee: Gleb Kanterov
>Priority: Critical
> Fix For: 2.16.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> {code}
> java.lang.NoClassDefFoundError: 
> org/apache/beam/repackaged/sql/com/google/type/Date
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380)
>   at 
> org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87)
>   at 
> org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66)
>   at 
> org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=301348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301348
 ]

ASF GitHub Bot logged work on BEAM-7909:


Author: ASF GitHub Bot
Created on: 26/Aug/19 16:58
Start Date: 26/Aug/19 16:58
Worklog Time Spent: 10m 
  Work Description: Hannah-Jiang commented on issue #9351: [BEAM-7909] 
support customized container for Python
URL: https://github.com/apache/beam/pull/9351#issuecomment-524939709
 
 
   Run Portable_Python PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301348)
Time Spent: 7h 50m  (was: 7h 40m)

> Write integration tests to test customized containers
> -
>
> Key: BEAM-7909
> URL: https://issues.apache.org/jira/browse/BEAM-7909
> Project: Beam
>  Issue Type: Sub-task
>  Components: build-system
>Reporter: Hannah Jiang
>Assignee: Hannah Jiang
>Priority: Major
>  Time Spent: 7h 50m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-08-26 Thread Pablo Estrada (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915944#comment-16915944
 ] 

Pablo Estrada commented on BEAM-5495:
-

This has not been addressed I guess, right? 

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Priority: Major
>
> Issue are:
> 1. it assumes the classloader is an URLClassLoader (not always true and java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs() which leads to including the JRE itself in the 
> staged file
> Looks like this detect resource algorithm can't work and should be replaced 
> by a SPI rather than a built-in and not extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7730) Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7730?focusedWorklogId=301323=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301323
 ]

ASF GitHub Bot logged work on BEAM-7730:


Author: ASF GitHub Bot
Created on: 26/Aug/19 16:11
Start Date: 26/Aug/19 16:11
Worklog Time Spent: 10m 
  Work Description: dmvk commented on issue #9296: WIP: [BEAM-7730] 
Introduce Flink 1.9 Runner
URL: https://github.com/apache/beam/pull/9296#issuecomment-524922978
 
 
   Run Java PreCommit
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301323)
Time Spent: 0.5h  (was: 20m)

> Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
> -
>
> Key: BEAM-7730
> URL: https://issues.apache.org/jira/browse/BEAM-7730
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink
>Reporter: sunjincheng
>Assignee: David Moravek
>Priority: Major
> Fix For: 2.16.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Apache Flink 1.9 will coming and it's better to add Flink 1.9 build target 
> and make Flink Runner compatible with Flink 1.9.
> I will add the brief changes after the Flink 1.9.0 released. 
> And I appreciate it if you can leave your suggestions or comments!



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-6730) Enable configuration of Java transforms (specifically IO) through other SDKs

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6730?focusedWorklogId=301318=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301318
 ]

ASF GitHub Bot logged work on BEAM-6730:


Author: ASF GitHub Bot
Created on: 26/Aug/19 16:00
Start Date: 26/Aug/19 16:00
Worklog Time Spent: 10m 
  Work Description: manuelaguilar commented on issue #7875: [BEAM-6730] 
Support configuring transforms externally in Java SDK / Expose Java's 
GenerateSequence in Python
URL: https://github.com/apache/beam/pull/7875#issuecomment-524919097
 
 
   @mxm I'm looking to add a windowed write transform from a Python pipeline. 
Would registering TextIO as an external transform and wrap the specific 
withWindowedWrites option in Python be recommendable?
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301318)
Time Spent: 12h  (was: 11h 50m)

> Enable configuration of Java transforms (specifically IO) through other SDKs
> 
>
> Key: BEAM-6730
> URL: https://issues.apache.org/jira/browse/BEAM-6730
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-flink, sdk-java-core, sdk-py-core
>Reporter: Maximilian Michels
>Assignee: Maximilian Michels
>Priority: Major
> Fix For: 2.12.0
>
>  Time Spent: 12h
>  Remaining Estimate: 0h
>
> Since https://github.com/apache/beam/pull/7316 we can reference external 
> transforms which are transforms only available in a "foreign" SDKs. This 
> allows us to fill the gap in terms of missing transforms in the Python and Go 
> SDK, specifically IO transforms.
> We can start collecting/exposing transforms that Beam users need. The 
> following transforms could be interesting:
> - KafkaIO / KinesisIO
> - CassandraIO / ElasticserchIO / Hbase / Redis
> - JDBC
> - S3 file system
> - GenerateSequence
> See also https://s.apache.org/beam-cross-language-io and BEAM-6485.
> CC [~robertwb] [~chamikara] [~thw]



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7978) ArithmeticExceptions on getting backlog bytes

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7978?focusedWorklogId=301299=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301299
 ]

ASF GitHub Bot logged work on BEAM-7978:


Author: ASF GitHub Bot
Created on: 26/Aug/19 15:52
Start Date: 26/Aug/19 15:52
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on issue #9432: [BEAM-7978] 
Return BACKLOG_UNKNOWN in case of unknown watermark
URL: https://github.com/apache/beam/pull/9432#issuecomment-524915707
 
 
   @chamikaramj Could you take a look, pls? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301299)
Time Spent: 20m  (was: 10m)

> ArithmeticExceptions on getting backlog bytes 
> --
>
> Key: BEAM-7978
> URL: https://issues.apache.org/jira/browse/BEAM-7978
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.14.0
>Reporter: Mateusz
>Assignee: Alexey Romanenko
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hello,
> Beam 2.14.0
>  (and to be more precise 
> [commit|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ec])
>  introduced a change in watermark calculation in Kinesis IO causing below 
> error:
> {code:java}
> exception:  "java.lang.RuntimeException: Unknown kinesis failure, when trying 
> to reach kinesis
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:227)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:167)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:155)
>   at 
> org.apache.beam.sdk.io.kinesis.KinesisReader.getTotalBacklogBytes(KinesisReader.java:158)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:433)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1289)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArithmeticException: Value cannot fit in an int: 
> 153748963401
>   at org.joda.time.field.FieldUtils.safeToInt(FieldUtils.java:229)
>   at 
> org.joda.time.field.BaseDurationField.getDifference(BaseDurationField.java:141)
>   at 
> org.joda.time.base.BaseSingleFieldPeriod.between(BaseSingleFieldPeriod.java:72)
>   at org.joda.time.Minutes.minutesBetween(Minutes.java:101)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getBacklogBytes$3(SimplifiedKinesisClient.java:169)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210)
>   ... 10 more
> {code}
> We spotted this issue on Dataflow runner. It's problematic as inability to 
> get backlog bytes seems to result in constant recreation of KinesisReader.
> The issue happens if the backlog bytes are retrieved before watermark value 
> is updated from initial default value. Easy way to reproduce it is to create 
> a pipeline with Kinesis source for a stream where no records are being put. 
> While debugging it locally, you can observe that the watermark is set to the 
> value on the past (like: "-290308-12-21T19:59:05.225Z"). After two minutes 
> (default watermark idle duration threshold is set to 2 minutes) , the 
> watermark is set to value of 
> [watermarkIdleThreshold|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkPolicyFactory.java#L110]),
>  so the next backlog bytes retrieval should be correct. However, as described 
> before, running the pipeline on Dataflow runner results in KinesisReader 
> being closed just after creation, so the watermark won't be fixed.
> The reason of the issue is following: The introduced watermark policies are 
> 

[jira] [Commented] (BEAM-7978) ArithmeticExceptions on getting backlog bytes

2019-08-26 Thread Alexey Romanenko (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915893#comment-16915893
 ] 

Alexey Romanenko commented on BEAM-7978:


[~Juraszek] Sure, I created a PR for that

> ArithmeticExceptions on getting backlog bytes 
> --
>
> Key: BEAM-7978
> URL: https://issues.apache.org/jira/browse/BEAM-7978
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kinesis
>Affects Versions: 2.14.0
>Reporter: Mateusz
>Assignee: Alexey Romanenko
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello,
> Beam 2.14.0
>  (and to be more precise 
> [commit|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ec])
>  introduced a change in watermark calculation in Kinesis IO causing below 
> error:
> {code:java}
> exception:  "java.lang.RuntimeException: Unknown kinesis failure, when trying 
> to reach kinesis
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:227)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:167)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:155)
>   at 
> org.apache.beam.sdk.io.kinesis.KinesisReader.getTotalBacklogBytes(KinesisReader.java:158)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:433)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1289)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149)
>   at 
> org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArithmeticException: Value cannot fit in an int: 
> 153748963401
>   at org.joda.time.field.FieldUtils.safeToInt(FieldUtils.java:229)
>   at 
> org.joda.time.field.BaseDurationField.getDifference(BaseDurationField.java:141)
>   at 
> org.joda.time.base.BaseSingleFieldPeriod.between(BaseSingleFieldPeriod.java:72)
>   at org.joda.time.Minutes.minutesBetween(Minutes.java:101)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getBacklogBytes$3(SimplifiedKinesisClient.java:169)
>   at 
> org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210)
>   ... 10 more
> {code}
> We spotted this issue on Dataflow runner. It's problematic as inability to 
> get backlog bytes seems to result in constant recreation of KinesisReader.
> The issue happens if the backlog bytes are retrieved before watermark value 
> is updated from initial default value. Easy way to reproduce it is to create 
> a pipeline with Kinesis source for a stream where no records are being put. 
> While debugging it locally, you can observe that the watermark is set to the 
> value on the past (like: "-290308-12-21T19:59:05.225Z"). After two minutes 
> (default watermark idle duration threshold is set to 2 minutes) , the 
> watermark is set to value of 
> [watermarkIdleThreshold|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkPolicyFactory.java#L110]),
>  so the next backlog bytes retrieval should be correct. However, as described 
> before, running the pipeline on Dataflow runner results in KinesisReader 
> being closed just after creation, so the watermark won't be fixed.
> The reason of the issue is following: The introduced watermark policies are 
> relying on 
> [WatermarkParameters|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkParameters.java]
>  which initialises currentWatermark and eventTime to 
> [BoundedWindow.TIMESTAMP_MIN_VALUE|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ecR52].
>  This result in watermark being set to new Instant(-9223372036854775L) at the 
> KinesisReader creation. Calculated [period between the watermark and the 
> current 
> timestamp|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java#L169]
>  is bigger than 

[jira] [Work logged] (BEAM-7978) ArithmeticExceptions on getting backlog bytes

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7978?focusedWorklogId=301293=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301293
 ]

ASF GitHub Bot logged work on BEAM-7978:


Author: ASF GitHub Bot
Created on: 26/Aug/19 15:35
Start Date: 26/Aug/19 15:35
Worklog Time Spent: 10m 
  Work Description: aromanenko-dev commented on pull request #9432: 
[BEAM-7978] Return BACKLOG_UNKNOWN in case of unknown watermark
URL: https://github.com/apache/beam/pull/9432
 
 
   
   
   Thank you for your contribution! Follow this checklist to help us 
incorporate your contribution quickly and easily:
   
- [x] [**Choose 
reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and 
mention them in a comment (`R: @username`).
- [x] Format the pull request title like `[BEAM-XXX] Fixes bug in 
ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA 
issue, if applicable. This will automatically link the pull request to the 
issue.
- [ ] If this contribution is large, please file an Apache [Individual 
Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   Post-Commit Tests Status (on master branch)
   

   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/)
 | --- | --- | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/)
 | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)
   Python | [![Build 
Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
 
Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/)
 | --- | 

[jira] [Work logged] (BEAM-7872) Simpler Flink cluster set up in load tests

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7872?focusedWorklogId=301289=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301289
 ]

ASF GitHub Bot logged work on BEAM-7872:


Author: ASF GitHub Bot
Created on: 26/Aug/19 15:22
Start Date: 26/Aug/19 15:22
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #9213: [BEAM-7872] 
Simpler Flink cluster set up in load tests
URL: https://github.com/apache/beam/pull/9213#discussion_r317655439
 
 

 ##
 File path: .test-infra/jenkins/Portability.groovy
 ##
 @@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Provides properties for running job under the portability framework.
+ */
+class Portability {
+  static String beamRepository = 'gcr.io/apache-beam-testing/beam_portability'
 
 Review comment:
   Can we move this to `loadTestBuilder`? instead? Other than that, please use 
UPPERCASE and add final keyword (constant). 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301289)
Time Spent: 13h 50m  (was: 13h 40m)

> Simpler Flink cluster set up in load tests
> --
>
> Key: BEAM-7872
> URL: https://issues.apache.org/jira/browse/BEAM-7872
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Creating a new load test running on Flink runner could be easier by providing 
> a single `setUp` function which would encapsulate the process of creating 
> Flink cluster and registering teardown steps



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Work logged] (BEAM-7872) Simpler Flink cluster set up in load tests

2019-08-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7872?focusedWorklogId=301290=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301290
 ]

ASF GitHub Bot logged work on BEAM-7872:


Author: ASF GitHub Bot
Created on: 26/Aug/19 15:22
Start Date: 26/Aug/19 15:22
Worklog Time Spent: 10m 
  Work Description: lgajowy commented on pull request #9213: [BEAM-7872] 
Simpler Flink cluster set up in load tests
URL: https://github.com/apache/beam/pull/9213#discussion_r317653246
 
 

 ##
 File path: .test-infra/jenkins/Portability.groovy
 ##
 @@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Provides properties for running job under the portability framework.
+ */
+class Portability {
+  static String beamRepository = 'gcr.io/apache-beam-testing/beam_portability'
+  static String flinkVersion = '1.7'
 
 Review comment:
   This constant is used only in :docker task name - I think it should be 
inlined in the task name rather than exported to a class with cosntants only
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 301290)
Time Spent: 13h 50m  (was: 13h 40m)

> Simpler Flink cluster set up in load tests
> --
>
> Key: BEAM-7872
> URL: https://issues.apache.org/jira/browse/BEAM-7872
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> Creating a new load test running on Flink runner could be easier by providing 
> a single `setUp` function which would encapsulate the process of creating 
> Flink cluster and registering teardown steps



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


  1   2   >