[jira] [Created] (BEAM-8098) BigQueryIO needs documentation on how data types in BigQuery and in Beam SDK correspond
Yueyang Qiu created BEAM-8098: - Summary: BigQueryIO needs documentation on how data types in BigQuery and in Beam SDK correspond Key: BEAM-8098 URL: https://issues.apache.org/jira/browse/BEAM-8098 Project: Beam Issue Type: Improvement Components: io-java-gcp Reporter: Yueyang Qiu Assignee: Yueyang Qiu While working on [https://github.com/apache/beam/pull/9144], I realized there is a gap in BigQueryIO documentation on mapping between data types defined in BigQuery and in Beam SDK. For example, if a user reads a BYTES field from BigQuery into Beam, it will be represented as java.nio.ByteBuffer type in Beam Java SDK. The user will need to do an explicit type cast to ByteBuffer in order to use the data, but there is no easy way the user can know which type they should cast to, unless digging into BigQueryIO's implementation (Java - Avro - BigQuery). -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8097) Update the release guide
[ https://issues.apache.org/jira/browse/BEAM-8097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yifan zou updated BEAM-8097: Description: The release guide was modified based on the 2.14 release experience ([https://github.com/apache/beam/pull/9319]). But, it is reverted since we don't want separate the guide in multiple sections ([https://github.com/apache/beam/pull/9436]). Please review the reverted guide and update the current guide with the up-to-date information. (was: The release guide was modified based on the 2.14 release experience. But, it is reverted since we don't want separate the guide in multiple sections. Please review the reverted guide and update the current guide with the up-to-date information.) > Update the release guide > > > Key: BEAM-8097 > URL: https://issues.apache.org/jira/browse/BEAM-8097 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: yifan zou >Assignee: yifan zou >Priority: Major > > The release guide was modified based on the 2.14 release experience > ([https://github.com/apache/beam/pull/9319]). But, it is reverted since we > don't want separate the guide in multiple sections > ([https://github.com/apache/beam/pull/9436]). Please review the reverted > guide and update the current guide with the up-to-date information. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8097) Update the release guide
yifan zou created BEAM-8097: --- Summary: Update the release guide Key: BEAM-8097 URL: https://issues.apache.org/jira/browse/BEAM-8097 Project: Beam Issue Type: Improvement Components: website Reporter: yifan zou Assignee: yifan zou The release guide was modified based on the 2.14 release experience. But, it is reverted since we don't want separate the guide in multiple sections. Please review the reverted guide and update the current guide with the up-to-date information. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7966) Write portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7966?focusedWorklogId=301702=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301702 ] ASF GitHub Bot logged work on BEAM-7966: Author: ASF GitHub Bot Created on: 27/Aug/19 04:50 Start Date: 27/Aug/19 04:50 Worklog Time Spent: 10m Work Description: tweise commented on issue #9331: [BEAM-7966] Write portable Flink application jar URL: https://github.com/apache/beam/pull/9331#issuecomment-525135984 Thanks for working on this! I think this may work very well with the k8s deployment plans. Related to prior discussion on the list: The jar file in its current form materializes the pipeline configuration, i.e. any user option that influences how transforms are configured or even the changes the shape of the pipeline. While the latter is probably a smaller percentage of use cases, the former isn't uncommon to adopt to different environments (dev/staging/production). For example, the database connection URL may be a parameter sourced from an environment variable or provided from the command line. As it stands, to have an environment specific configuration, the user would have to build the jar file for each target environment, which isn't intuitive. The entry point to Flink jobs is always configured with optional arguments. Would it be possible to support a token substitution mechanism so that such arguments provided to the Flink CLI or the k8s operator or whatever the tool may be can be injected into the proto? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301702) Time Spent: 0.5h (was: 20m) > Write portable Flink application jar > > > Key: BEAM-7966 > URL: https://issues.apache.org/jira/browse/BEAM-7966 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 0.5h > Remaining Estimate: 0h > > *[https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]* -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8096) Allow runner to configure "subnetwork"
[ https://issues.apache.org/jira/browse/BEAM-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916307#comment-16916307 ] Jack Whelpton commented on BEAM-8096: - I suspect the change may be as simple as this: [https://github.com/jackwhelpton/beam/commit/d1711f43e685c0f41b366437eb28cf5a25c436f0] would this suffice for a PR? There don't appear to be any obvious tests around this area that I could build on, but if anybody can point me in the right direction that'd be great. > Allow runner to configure "subnetwork" > -- > > Key: BEAM-8096 > URL: https://issues.apache.org/jira/browse/BEAM-8096 > Project: Beam > Issue Type: Improvement > Components: sdk-go >Affects Versions: 2.15.0 >Reporter: Jack Whelpton >Priority: Major > > When running a Dataflow job, the network can be specified using the --network > flag; however, there is no support for doing the same for the subnetwork. > This would be the go equivalent of the following Java code: > [https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/options/DataflowPipelineWorkerPoolOptions.html#getSubnetwork--|https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L151] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8096) Allow runner to configure "subnetwork"
Jack Whelpton created BEAM-8096: --- Summary: Allow runner to configure "subnetwork" Key: BEAM-8096 URL: https://issues.apache.org/jira/browse/BEAM-8096 Project: Beam Issue Type: Improvement Components: sdk-go Affects Versions: 2.15.0 Reporter: Jack Whelpton When running a Dataflow job, the network can be specified using the --network flag; however, there is no support for doing the same for the subnetwork. This would be the go equivalent of the following Java code: [https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/options/DataflowPipelineWorkerPoolOptions.html#getSubnetwork--|https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java#L151] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7021) ToString transform for Python SDK
[ https://issues.apache.org/jira/browse/BEAM-7021?focusedWorklogId=301654=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301654 ] ASF GitHub Bot logged work on BEAM-7021: Author: ASF GitHub Bot Created on: 27/Aug/19 02:15 Start Date: 27/Aug/19 02:15 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9437: [BEAM-7021] Removed unused arg from ToString.Element URL: https://github.com/apache/beam/pull/9437 Removed unused arg from ToString.Element cc: @mszb let me know if there is a reason to keep this argument. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301653=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301653 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 27/Aug/19 02:07 Start Date: 27/Aug/19 02:07 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9262: [BEAM-7389] Add code examples for Regex page URL: https://github.com/apache/beam/pull/9262#issuecomment-525104612 Is there a way to update the staged page? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301653) Time Spent: 50h 20m (was: 50h 10m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 50h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301652=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301652 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 27/Aug/19 02:05 Start Date: 27/Aug/19 02:05 Worklog Time Spent: 10m Work Description: aaltay commented on pull request #9435: [BEAM-7389] Update to use util.Regex transform URL: https://github.com/apache/beam/pull/9435 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301652) Time Spent: 50h 10m (was: 50h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 50h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301651=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301651 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 27/Aug/19 02:04 Start Date: 27/Aug/19 02:04 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9433: [BEAM-7389] Update to use util.ToString transform URL: https://github.com/apache/beam/pull/9433#issuecomment-525103876 LGTM, could you fix the test issues? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301651) Time Spent: 50h (was: 49h 50m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 50h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8094) Keep python library version synced between setup.py and base_image_requirements.txt
[ https://issues.apache.org/jira/browse/BEAM-8094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916271#comment-16916271 ] Ahmet Altay commented on BEAM-8094: --- /cc [~tvalentyn] > Keep python library version synced between setup.py and > base_image_requirements.txt > --- > > Key: BEAM-8094 > URL: https://issues.apache.org/jira/browse/BEAM-8094 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Hannah Jiang >Priority: Major > > There are some shared libraries in setup.py and base_image_requirements.txt. > Find a way to keep versions are syncs in both files and new libraries added > to setup.py are added to base_image_requirements.txt if it is commonly used. > Now it is manually synced. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301636=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301636 ] ASF GitHub Bot logged work on BEAM-8079: Author: ASF GitHub Bot Created on: 27/Aug/19 01:54 Start Date: 27/Aug/19 01:54 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1) URL: https://github.com/apache/beam/pull/9411#issuecomment-525101802 Thank you. I'll go ahead and merge. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301636) Time Spent: 2h 10m (was: 2h) > Move verify_release_build.sh to Jenkins job > --- > > Key: BEAM-8079 > URL: https://issues.apache.org/jira/browse/BEAM-8079 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > verify_release_build.sh is used for validation after release branch is cut. > Basically it does two things: 1. verify Gradle build with -PisRelease turned > on. 2. create a PR and run all PostCommit jobs against release branch. > However, release manager got many painpoints when running this script: > 1. A lot of environment setup and some of tooling install easily broke the > script. > 2. Running Gradle build locally too extremely long time. > 3. Auto-pr-creation (use hub) doesn't work. > We can move Gradle build to Jenkins in order to get rid of environment setup > work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301637=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301637 ] ASF GitHub Bot logged work on BEAM-8079: Author: ASF GitHub Bot Created on: 27/Aug/19 01:54 Start Date: 27/Aug/19 01:54 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1) URL: https://github.com/apache/beam/pull/9411 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301637) Time Spent: 2h 20m (was: 2h 10m) > Move verify_release_build.sh to Jenkins job > --- > > Key: BEAM-8079 > URL: https://issues.apache.org/jira/browse/BEAM-8079 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > verify_release_build.sh is used for validation after release branch is cut. > Basically it does two things: 1. verify Gradle build with -PisRelease turned > on. 2. create a PR and run all PostCommit jobs against release branch. > However, release manager got many painpoints when running this script: > 1. A lot of environment setup and some of tooling install easily broke the > script. > 2. Running Gradle build locally too extremely long time. > 3. Auto-pr-creation (use hub) doesn't work. > We can move Gradle build to Jenkins in order to get rid of environment setup > work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7984) [python] The coder returned for typehints.List should be IterableCoder
[ https://issues.apache.org/jira/browse/BEAM-7984?focusedWorklogId=301620=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301620 ] ASF GitHub Bot logged work on BEAM-7984: Author: ASF GitHub Bot Created on: 27/Aug/19 01:36 Start Date: 27/Aug/19 01:36 Worklog Time Spent: 10m Work Description: chadrik commented on issue #9344: [BEAM-7984] The coder returned for typehints.List should be IterableCoder URL: https://github.com/apache/beam/pull/9344#issuecomment-525098325 > If the user specifies the type as List, should we ensure that we always return a List? (I.e should we explicitly state our assumption here that IterableCoder returns a list and also adda note in the implementation of IterableCoderImpl that if this changes we need to create a separate ListCoder? Or should we go ahead and create a separate ListCoder now? I dug into this a bit more and realized that I was mistaken about my assertion that `IterableCoderImpl.decode` always return a list. Here's the relevant code from the base class: ```python class SequenceCoderImpl(StreamCoderImpl): def __init__(self, elem_coder, read_state=None, write_state=None, write_state_threshold=0): self._elem_coder = elem_coder self._read_state = read_state self._write_state = write_state self._write_state_threshold = write_state_threshold ... def decode_from_stream(self, in_stream, nested): size = in_stream.read_bigendian_int32() if size >= 0: elements = [self._elem_coder.decode_from_stream(in_stream, True) for _ in range(size)] else: elements = [] count = in_stream.read_var_int64() while count > 0: for _ in range(count): elements.append(self._elem_coder.decode_from_stream(in_stream, True)) count = in_stream.read_var_int64() if count == -1: if self._read_state is None: raise ValueError( 'Cannot read state-written iterable without state reader.') state_token = in_stream.read_all(True) elements = _ConcatSequence( elements, self._read_state(state_token, self._elem_coder)) return self._construct_from_sequence(elements) ``` `_ConcatSequence` and `self._read_state` are both iterators. Some options: - create a `ListCoder`: the problem is this won't be portable, which defeats the original motivation here, for external transforms - make `IterableCoder` always return a `list`: It currently returns an iterator in the (rare?) case that `read_state` is provided, in which case it _seems_ like it progressively yields elements as they are decoded, but I'm not 100% clear on that. If that _is_ the case, converting to a list on decode would undermine the usefulness of this feature. - leave it as is: `list` is iterable after all, and in the case that the user provides a list to encode, they will get a list back, since there shouldn't be a `read_state`. So as it is, `IterableCoder` is true to its name in that it really only guarantees that the decoded value is iterable. However, as the tests that I added imply, in the typical case, the result is actually a list. So I think the best thing to do is to for me to update the test that I added to assert that the returned type is a list, so that if that ever changes, someone is forced to consider it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301620) Time Spent: 0.5h (was: 20m) > [python] The coder returned for typehints.List should be IterableCoder > -- > > Key: BEAM-7984 > URL: https://issues.apache.org/jira/browse/BEAM-7984 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Chad Dombrova >Assignee: Chad Dombrova >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > IterableCoder encodes a list and decodes to list, but > typecoders.registry.get_coder(typehints.List[bytes]) returns a > FastPrimitiveCoder. I don't see any reason why this would be advantageous. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed pcollection
[ https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=301611=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301611 ] ASF GitHub Bot logged work on BEAM-7972: Author: ASF GitHub Bot Created on: 27/Aug/19 00:34 Start Date: 27/Aug/19 00:34 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9334: [BEAM-7972] Always use Global window in reshuffle and then apply wind… URL: https://github.com/apache/beam/pull/9334#discussion_r317851116 ## File path: sdks/python/apache_beam/transforms/util.py ## @@ -612,29 +611,23 @@ def restore_timestamps(element): for (value, timestamp) in values] else: - # The linter is confused. - # hash(1) is used to force "runtime" selection of _IdentityWindowFn - # pylint: disable=abstract-class-instantiated - cls = hash(1) and _IdentityWindowFn - window_fn = cls( - windowing_saved.windowfn.get_window_coder()) - - def reify_timestamps(element, timestamp=DoFn.TimestampParam): + def reify_timestamps(element, + timestamp=DoFn.TimestampParam, + window=DoFn.WindowParam): key, value = element -return key, TimestampedValue(value, timestamp) +# Transport the window as part of the value and restore it later. +return key, TimestampedValue((value, window), timestamp) Review comment: > > The defaulting to global window should be deleted since the Python SDK now does send a proper windowing strategy (same as Go SDK). The code was added as a migration path to allow for differences in where the Python/Go/Java SDKs were when submitting jobs to Dataflow. > > So we should update the reshuffle code to not pass the non standard window from python. We shouldn't have to, but if the alternative is significant JRH refactoring, then this code should be OK and we can add a comment that we're working around bugs in the Dataflow JRH. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301611) Time Spent: 3h (was: 2h 50m) > Portable Python Reshuffle does not work with windowed pcollection > - > > Key: BEAM-7972 > URL: https://issues.apache.org/jira/browse/BEAM-7972 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 3h > Remaining Estimate: 0h > > Streaming pipeline gets stuck when using Reshuffle with windowed pcollection. > The issue happen because of window function gets deserialized on java side > which is not possible and hence default to global window function and result > into window function mismatch later down the code. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8093) py36-gcp is missing from the current tox.ini
[ https://issues.apache.org/jira/browse/BEAM-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916229#comment-16916229 ] Udi Meiri commented on BEAM-8093: - https://github.com/apache/beam/pull/7949 has a fix for this, but might be delayed for a while since it's low-pri. > py36-gcp is missing from the current tox.ini > > > Key: BEAM-8093 > URL: https://issues.apache.org/jira/browse/BEAM-8093 > Project: Beam > Issue Type: Sub-task > Components: testing >Affects Versions: 2.16.0 >Reporter: Valentyn Tymofieiev >Priority: Major > > cc: [~udim] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301592=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301592 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 26/Aug/19 23:41 Start Date: 26/Aug/19 23:41 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #9262: [BEAM-7389] Add code examples for Regex page URL: https://github.com/apache/beam/pull/9262#issuecomment-525074981 > Should we cover the recently added RegEx transform here? ( > > https://github.com/apache/beam/blob/ab37b0fd6ce9a26bc6fa36f775df5ddeb067dd2a/sdks/python/apache_beam/transforms/util.py#L873 > > ) Good point, updating the code samples on #9435, will update the docs afterwards. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301592) Time Spent: 49h 50m (was: 49h 40m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 49h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301591=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301591 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 26/Aug/19 23:40 Start Date: 26/Aug/19 23:40 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #9435: [BEAM-7389] Update to use util.Regex transform URL: https://github.com/apache/beam/pull/9435 Update the code sample for `Regex` to use the `util.Regex` transform. R: @aaltay Can you take a look whenever you have a chance? Thanks! Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-5878) Support DoFns with Keyword-only arguments in Python 3.
[ https://issues.apache.org/jira/browse/BEAM-5878?focusedWorklogId=301584=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301584 ] ASF GitHub Bot logged work on BEAM-5878: Author: ASF GitHub Bot Created on: 26/Aug/19 23:11 Start Date: 26/Aug/19 23:11 Worklog Time Spent: 10m Work Description: tvalentyn commented on pull request #9237: [BEAM-5878] support DoFns with Keyword-only arguments URL: https://github.com/apache/beam/pull/9237#discussion_r317834907 ## File path: sdks/python/apache_beam/internal/pickler.py ## @@ -136,6 +136,32 @@ def _reject_generators(unused_pickler, unused_obj): dill.dill.Pickler.dispatch[types.GeneratorType] = _reject_generators +# TODO: Remove this once uqfoundation/dill#313 is fixed +if sys.version_info[0] > 2: + # Monkey patch for dill._dill.Pickler to pickle functions + # with keyword-only args + _create_function = dill.dill._create_function + + def _create_function_has_kwdefaults(fcode, fglobals, fname=None, + fdefaults=None, fclosure=None, fdict=None, + fkwdefaults=None): +func = _create_function(fcode, fglobals, fname, fdefaults, fclosure, fdict) +func.__kwdefaults__ = fkwdefaults +return func + + def new_save_reduce(self, func, args, state=None, listitems=None, Review comment: Can we use a more generic signature of new_save_reduce, for example `def new_save_reduce(self, func, args, **kwargs)`? The problem is that we assume a particular version of the API for pickle.save_reduce here, and we can see that it will change in Python 3.8, see https://github.com/python/cpython/blob/c75f0e5bdee3cfaba9fd5b3a8549dec0aba01ebe/Lib/pickle.py#L619. I think with a generic definition of `new_save_reduce` we can still update `args` list , and pass `**kwargs` to `picker.save_reduce`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301584) Time Spent: 9h 10m (was: 9h) > Support DoFns with Keyword-only arguments in Python 3. > -- > > Key: BEAM-5878 > URL: https://issues.apache.org/jira/browse/BEAM-5878 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: yoshiki obata >Priority: Minor > Fix For: 2.16.0 > > Time Spent: 9h 10m > Remaining Estimate: 0h > > Python 3.0 [adds a possibility|https://www.python.org/dev/peps/pep-3102/] to > define functions with keyword-only arguments. > Currently Beam does not handle them correctly. [~ruoyu] pointed out [one > place|https://github.com/apache/beam/blob/a56ce43109c97c739fa08adca45528c41e3c925c/sdks/python/apache_beam/typehints/decorators.py#L118] > in our codebase that we should fix: in Python in 3.0 inspect.getargspec() > will fail on functions with keyword-only arguments, but a new method > [inspect.getfullargspec()|https://docs.python.org/3/library/inspect.html#inspect.getfullargspec] > supports them. > There may be implications for our (best-effort) type-hints machinery. > We should also add a Py3-only unit tests that covers DoFn's with keyword-only > arguments once Beam Python 3 tests are in a good shape. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8045) Publish custom windows pattern
[ https://issues.apache.org/jira/browse/BEAM-8045?focusedWorklogId=301575=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301575 ] ASF GitHub Bot logged work on BEAM-8045: Author: ASF GitHub Bot Created on: 26/Aug/19 23:01 Start Date: 26/Aug/19 23:01 Worklog Time Spent: 10m Work Description: aaltay commented on issue #9406: [BEAM-8045] Custom windows patterns URL: https://github.com/apache/beam/pull/9406#issuecomment-525065807 As a reference for everyone linking the related PR: https://github.com/apache/beam/pull/9399 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301575) Time Spent: 0.5h (was: 20m) > Publish custom windows pattern > -- > > Key: BEAM-8045 > URL: https://issues.apache.org/jira/browse/BEAM-8045 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Cyrus Maden >Assignee: Cyrus Maden >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301569=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301569 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 26/Aug/19 22:59 Start Date: 26/Aug/19 22:59 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#discussion_r317828430 ## File path: sdks/python/apache_beam/transforms/ptransform.py ## @@ -821,7 +823,9 @@ def expand(self, pcoll): # TODO(BEAM-5878) Support keyword-only arguments. try: - if 'type_hints' in getfullargspec(self._fn).args: + # TODO(udim): This looks like unused code. When is 'type_hints' used as an Review comment: IIRC, this was used to support something like ``` @ptransformfn def MyTransform(pcoll, type_hints, arg): ... pcoll | MyPTransform(arg) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301569) Time Spent: 12h (was: 11h 50m) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 12h > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301571=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301571 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 26/Aug/19 22:59 Start Date: 26/Aug/19 22:59 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#discussion_r317829250 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -171,22 +269,55 @@ def simple_output_type(self, context): if self.output_types: args, kwargs = self.output_types if len(args) != 1 or kwargs: -raise TypeError('Expected simple output type hint for %s' % context) +raise TypeError( +'Expected single output type hint for %s but got: %s' % ( +context, self.output_types)) return args[0] + def has_simple_output_type(self): +"""Whether there's a single positional output type.""" +return (self.output_types and len(self.output_types[0]) == 1 and +not self.output_types[1]) + + def strip_iterable(self): +"""Removes outer Iterable (or equivalent) from output type. + +Only affects instances with simple output types, otherwise is a no-op. + +Example: Generator[Tuple(int, int)] becomes Tuple(int, int) + +Raises: + ValueError if output type is simple and not iterable. +""" +if not self.has_simple_output_type(): + return +yielded_type = typehints.get_yielded_type(self.output_types[0][0]) +self.output_types = ((yielded_type,), {}) + def copy(self): return IOTypeHints(self.input_types, self.output_types) def with_defaults(self, hints): if not hints: return self -elif not self: - return hints -return IOTypeHints(self.input_types or hints.input_types, - self.output_types or hints.output_types) +if self._has_input_types(): + input_types = self.input_types +else: + input_types = hints.input_types +if self._has_output_types(): + output_types = self.output_types +else: + output_types = hints.output_types +return IOTypeHints(input_types, output_types) + + def _has_input_types(self): +return self.input_types is not None and any(self.input_types) Review comment: Here (and elsewhere), when is any(self.input_types) False but self.input_types not False? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301571) Time Spent: 12h 20m (was: 12h 10m) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 12h 20m > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301570=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301570 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 26/Aug/19 22:59 Start Date: 26/Aug/19 22:59 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#discussion_r317830592 ## File path: sdks/python/apache_beam/typehints/typed_pipeline_test.py ## @@ -290,5 +306,26 @@ def test_flat_type_hint(self): self.test_input | self.CustomTransform().with_output_types(int) +class AnnotationsTest(unittest.TestCase): + + def test_pardo_wrapper_builtin(self): +th = beam.ParDo(str.strip).get_type_hints() +if sys.version_info < (3, 7): + self.assertEqual(th.input_types, ((str,), {})) +else: + # Python 3.7+ has annotations for CPython builtins + # (_MethodDescriptorType). + self.assertEqual(th.input_types, ((str, typehints.Any), {})) +self.assertEqual(th.output_types, ((typehints.Any,), {})) + +th = beam.ParDo(list).get_type_hints() +self.assertIsNone(th.input_types) +self.assertIsNone(th.output_types) Review comment: Can we not infer the output type is list here? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301570) Time Spent: 12h 10m (was: 12h) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 12h 10m > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301568=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301568 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 26/Aug/19 22:59 Start Date: 26/Aug/19 22:59 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#discussion_r317829417 ## File path: sdks/python/apache_beam/typehints/decorators.py ## @@ -265,23 +368,34 @@ def _unpack_positional_arg_hints(arg, hint): return hint -def getcallargs_forhints(func, *typeargs, **typekwargs): - """Like inspect.getcallargs, but understands that Tuple[] and an Any unpack. +def getcallargs_forhints(using_var_hints, func, *typeargs, **typekwargs): + """Like inspect.getcallargs, with support for declaring default args as Any. + + In Python 2, understands that Tuple[] and an Any unpack. + + Args: +using_var_hints: For variable length arguments, whether to expect the bound Review comment: Is it possible to push this complication into the single caller? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301568) Time Spent: 12h (was: 11h 50m) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 12h > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7060) Design Py3-compatible typehints annotation support in Beam 3.
[ https://issues.apache.org/jira/browse/BEAM-7060?focusedWorklogId=301572=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301572 ] ASF GitHub Bot logged work on BEAM-7060: Author: ASF GitHub Bot Created on: 26/Aug/19 22:59 Start Date: 26/Aug/19 22:59 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9283: [BEAM-7060] Type hints from Python 3 annotations URL: https://github.com/apache/beam/pull/9283#discussion_r317832098 ## File path: sdks/python/apache_beam/typehints/typehints_test.py ## @@ -1062,57 +797,92 @@ def test_hint_helper(self): self.assertTrue(is_consistent_with(str, Union[str, int])) self.assertFalse(is_consistent_with(Union[str, int], str)) - def test_positional_arg_hints(self): -self.assertEqual(typehints.Any, _positional_arg_hints('x', {})) -self.assertEqual(int, _positional_arg_hints('x', {'x': int})) -self.assertEqual(typehints.Tuple[int, typehints.Any], - _positional_arg_hints(['x', 'y'], {'x': int})) - def test_getcallargs_forhints(self): def func(a, b_c, *d): b, c = b_c # pylint: disable=unused-variable return None self.assertEqual( {'a': Any, 'b_c': Any, 'd': Tuple[Any, ...]}, -getcallargs_forhints(func, *[Any, Any])) +getcallargs_forhints(False, func, *[Any, Any])) +if sys.version_info >= (3,): + self.assertEqual( + {'a': Any, 'b_c': Any, 'd': Tuple[Union[int, str], ...]}, + getcallargs_forhints(False, func, *[Any, Any, str, int])) +else: + self.assertEqual( + {'a': Any, 'b_c': Any, 'd': Tuple[Any, ...]}, + getcallargs_forhints(False, func, *[Any, Any, Any, int])) Review comment: OK, at least reference a JIRA? It looks like the common pattern is that you have a Tuple[more specific type] -> Tuple[Any, ...]. Perhaps you could make a relax_on_py2 method that does this substitution and use that everywhere rather than duplicating the logic in each test? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301572) > Design Py3-compatible typehints annotation support in Beam 3. > - > > Key: BEAM-7060 > URL: https://issues.apache.org/jira/browse/BEAM-7060 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Time Spent: 12h 20m > Remaining Estimate: 0h > > Existing [Typehints implementaiton in > Beam|[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/typehints/ > ] heavily relies on internal details of CPython implementation, and some of > the assumptions of this implementation broke as of Python 3.6, see for > example: https://issues.apache.org/jira/browse/BEAM-6877, which makes > typehints support unusable on Python 3.6 as of now. [Python 3 Kanban > Board|https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245=detail] > lists several specific typehints-related breakages, prefixed with "TypeHints > Py3 Error". > We need to decide whether to: > - Deprecate in-house typehints implementation. > - Continue to support in-house implementation, which at this point is a stale > code and has other known issues. > - Attempt to use some off-the-shelf libraries for supporting > type-annotations, like Pytype, Mypy, PyAnnotate. > WRT to this decision we also need to plan on immediate next steps to unblock > adoption of Beam for Python 3.6+ users. One potential option may be to have > Beam SDK ignore any typehint annotations on Py 3.6+. > cc: [~udim], [~altay], [~robertwb]. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7266) Pipeline run does not terminate because of Dataflow runner can not close file system writer
[ https://issues.apache.org/jira/browse/BEAM-7266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916198#comment-16916198 ] Ankur Goenka commented on BEAM-7266: Another user reported a similar issue https://stackoverflow.com/questions/57507122/pipeline-keeps-running-because-of-dataflow-runner-not-closing-file-system-writer > Pipeline run does not terminate because of Dataflow runner can not close file > system writer > --- > > Key: BEAM-7266 > URL: https://issues.apache.org/jira/browse/BEAM-7266 > Project: Beam > Issue Type: Bug > Components: io-py-gcp, runner-dataflow >Affects Versions: 2.11.0, 2.12.0 >Reporter: Fabian >Priority: Critical > Fix For: Not applicable > > > We are using Apache Beam in version 2.11.0 (Python SDK) with the Dataflow > runner running on the Google Cloud Platform. Two pipeline runs did not > terminate, i.e. after multiple days (instead of some minutes) they where > still running. The only error that was logged is: > If fails to close a writer: > {code:java} > Traceback (most recent call last): > File > "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line > 649, in do_work > work_executor.execute() > File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", > line 178, in execute > op.finish() > File "dataflow_worker/native_operations.py", line 93, in > dataflow_worker.native_operations.NativeWriteOperation.finish > def finish(self): > File "dataflow_worker/native_operations.py", line 94, in > dataflow_worker.native_operations.NativeWriteOperation.finish > with self.scoped_finish_state: > File "dataflow_worker/native_operations.py", line 95, in > dataflow_worker.native_operations.NativeWriteOperation.finish > self.writer.__exit__(None, None, None) > File > "/usr/local/lib/python2.7/dist-packages/dataflow_worker/nativeavroio.py", > line 277, in __exit__ > self._data_file_writer.close() > File "/usr/local/lib/python2.7/dist-packages/avro/datafile.py", line 220, > in close > self.writer.close() > File > "/usr/local/lib/python2.7/dist-packages/apache_beam/io/filesystemio.py", line > 202, in close > self._uploader.finish() > File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/gcsio.py", > line 606, in finish > raise self._upload_thread.last_error # pylint: disable=raising-bad-type > NotImplementedError{code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301562=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301562 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 26/Aug/19 22:37 Start Date: 26/Aug/19 22:37 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r317826798 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -49,26 +57,74 @@ @RunWith(JUnit4.class) public class BigQueryHllSketchCompatibilityIT { - private static final String DATASET_NAME = "zetasketch_compatibility_test"; + private static final String APP_NAME; + private static final String PROJECT_ID; + private static final String DATASET_ID; - // Table for testReadSketchFromBigQuery() + private static final List TEST_DATA = + Arrays.asList("Apple", "Orange", "Banana", "Orange"); + + // Data Table: used by testReadSketchFromBigQuery()) // Schema: only one STRING field named "data". // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange" - private static final String DATA_TABLE_NAME = "hll_data"; + private static final String DATA_TABLE_ID = "hll_data"; private static final String DATA_FIELD_NAME = "data"; + private static final String DATA_FIELD_TYPE = "STRING"; private static final String QUERY_RESULT_FIELD_NAME = "sketch"; private static final Long EXPECTED_COUNT = 3L; - // Table for testWriteSketchToBigQuery() + // Sketch Table: used by testWriteSketchToBigQuery() // Schema: only one BYTES field named "sketch". // Content: will be overridden by the sketch computed by the test pipeline each time the test runs - private static final String SKETCH_TABLE_NAME = "hll_sketch"; + private static final String SKETCH_TABLE_ID = "hll_sketch"; private static final String SKETCH_FIELD_NAME = "sketch"; - private static final List TEST_DATA = - Arrays.asList("Apple", "Orange", "Banana", "Orange"); + private static final String SKETCH_FIELD_TYPE = "BYTES"; // SHA-1 hash of string "[3]", the string representation of a row that has only one field 3 in it private static final String EXPECTED_CHECKSUM = "f1e31df9806ce94c5bdbbfff9608324930f4d3f1"; + static { +ApplicationNameOptions options = +TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class); +APP_NAME = options.getAppName(); +PROJECT_ID = options.as(GcpOptions.class).getProject(); +DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 28h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7966) Write portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7966?focusedWorklogId=301553=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301553 ] ASF GitHub Bot logged work on BEAM-7966: Author: ASF GitHub Bot Created on: 26/Aug/19 22:10 Start Date: 26/Aug/19 22:10 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9331: [BEAM-7966] Write portable Flink application jar URL: https://github.com/apache/beam/pull/9331#discussion_r317819152 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.fnexecution.jobsubmission; + +/** Contains common code for writing and reading portable pipeline jars. */ +public abstract class PortablePipelineJarUtils { Review comment: It'd be good here, or elsewhere, to document the spec of what the jar should contain (i.e. what the contents of each of these files is). Also, should they all be in some common subdirectory? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301553) Remaining Estimate: 0h Time Spent: 10m > Write portable Flink application jar > > > Key: BEAM-7966 > URL: https://issues.apache.org/jira/browse/BEAM-7966 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 10m > Remaining Estimate: 0h > > *[https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]* -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7966) Write portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7966?focusedWorklogId=301554=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301554 ] ASF GitHub Bot logged work on BEAM-7966: Author: ASF GitHub Bot Created on: 26/Aug/19 22:10 Start Date: 26/Aug/19 22:10 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9331: [BEAM-7966] Write portable Flink application jar URL: https://github.com/apache/beam/pull/9331#discussion_r317818691 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.fnexecution.jobsubmission; + +/** Contains common code for writing and reading portable pipeline jars. */ +public abstract class PortablePipelineJarUtils { + static final String ARTIFACT_FOLDER_NAME = "beam-artifact-staging"; + static final String ARTIFACT_MANIFEST_NAME = "beam-artifact-manifest.json"; + static final String PIPELINE_FILE_NAME = "beam-pipeline.textproto"; Review comment: Nit: are these textprotos? I think they're binary. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301554) Remaining Estimate: 0h Time Spent: 10m > Write portable Flink application jar > > > Key: BEAM-7966 > URL: https://issues.apache.org/jira/browse/BEAM-7966 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 10m > Remaining Estimate: 0h > > *[https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]* -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7966) Write portable Flink application jar
[ https://issues.apache.org/jira/browse/BEAM-7966?focusedWorklogId=301555=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301555 ] ASF GitHub Bot logged work on BEAM-7966: Author: ASF GitHub Bot Created on: 26/Aug/19 22:10 Start Date: 26/Aug/19 22:10 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9331: [BEAM-7966] Write portable Flink application jar URL: https://github.com/apache/beam/pull/9331#discussion_r317819889 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/jobsubmission/PortablePipelineJarUtils.java ## @@ -0,0 +1,26 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.beam.runners.fnexecution.jobsubmission; + +/** Contains common code for writing and reading portable pipeline jars. */ +public abstract class PortablePipelineJarUtils { + static final String ARTIFACT_FOLDER_NAME = "beam-artifact-staging"; + static final String ARTIFACT_MANIFEST_NAME = "beam-artifact-manifest.json"; Review comment: Why is this json and the others raw proto (that ProxyManifest is also a proto message)? Or should (some of?) the others be json as well? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301555) Time Spent: 20m (was: 10m) > Write portable Flink application jar > > > Key: BEAM-7966 > URL: https://issues.apache.org/jira/browse/BEAM-7966 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-flink > Time Spent: 20m > Remaining Estimate: 0h > > *[https://docs.google.com/document/d/1kj_9JWxGWOmSGeZ5hbLVDXSTv-zBrx4kQRqOq85RYD4/edit#heading=h.oes73844vmhl]* -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301543=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301543 ] ASF GitHub Bot logged work on BEAM-7864: Author: ASF GitHub Bot Created on: 26/Aug/19 21:50 Start Date: 26/Aug/19 21:50 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9410: [BEAM-7864] Simplify/generalize Spark reshuffle translation URL: https://github.com/apache/beam/pull/9410#issuecomment-525047135 Run Python Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301543) Time Spent: 1h 40m (was: 1.5h) > Portable Spark Reshuffle coder cast exception > - > > Key: BEAM-7864 > URL: https://issues.apache.org/jira/browse/BEAM-7864 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h 40m > Remaining Estimate: 0h > > running :sdks:python:test-suites:portable:py35:portableWordCountBatch in > either loopback or docker mode on master fails with exception: > > java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder > cannot be cast to org.apache.beam.sdk.coders.KvCoder > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400) > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147) > at > org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301542=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301542 ] ASF GitHub Bot logged work on BEAM-7864: Author: ASF GitHub Bot Created on: 26/Aug/19 21:50 Start Date: 26/Aug/19 21:50 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9410: [BEAM-7864] Simplify/generalize Spark reshuffle translation URL: https://github.com/apache/beam/pull/9410#issuecomment-525047102 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301542) Time Spent: 1.5h (was: 1h 20m) > Portable Spark Reshuffle coder cast exception > - > > Key: BEAM-7864 > URL: https://issues.apache.org/jira/browse/BEAM-7864 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1.5h > Remaining Estimate: 0h > > running :sdks:python:test-suites:portable:py35:portableWordCountBatch in > either loopback or docker mode on master fails with exception: > > java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder > cannot be cast to org.apache.beam.sdk.coders.KvCoder > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400) > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147) > at > org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers
[ https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301540=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301540 ] ASF GitHub Bot logged work on BEAM-8015: Author: ASF GitHub Bot Created on: 26/Aug/19 21:44 Start Date: 26/Aug/19 21:44 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get logs from Docker containers URL: https://github.com/apache/beam/pull/9389#discussion_r317810021 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java ## @@ -134,10 +148,41 @@ public void killContainer(String containerId) runShortCommand(Arrays.asList(dockerExecutable, "kill", containerId)); } - /** Run the given command invocation and return stdout as a String. */ + /** + * Removes docker container with container id. + * + * @throws IOException if an IOException occurs, or if the given container id either does not + * exist or is still running + */ + public void removeContainer(String containerId) + throws IOException, TimeoutException, InterruptedException { +checkArgument(containerId != null); +checkArgument( +CONTAINER_ID_PATTERN.matcher(containerId).matches(), +"Container ID must be a 64-character hexadecimal string"); +runShortCommand(Arrays.asList(dockerExecutable, "rm", containerId)); + } + private String runShortCommand(List invocation) throws IOException, TimeoutException, InterruptedException { +return runShortCommand(invocation, false, ""); + } + + /** + * Runs a command, blocks until {@link DockerCommand#commandTimeout} has elapsed, then returns the + * command's output. + * + * @param invocation command and arguments to be run + * @param redirectErrorStream if true, redirect stderr of the process to its stdout + * @param delimiter used for separating output lines + * @return stdout of the command, including stderr if {@code redirectErrorStream} is true + * @throws TimeoutException if command has not finished by {@link DockerCommand#commandTimeout} + */ + private String runShortCommand( + List invocation, boolean redirectErrorStream, CharSequence delimiter) + throws IOException, TimeoutException, InterruptedException { ProcessBuilder pb = new ProcessBuilder(invocation); +pb.redirectErrorStream(redirectErrorStream); Review comment: > I suppose my question is, how will we capture the output of the runShortCommand when redirectErrorStream is set to true? Here are all the cases: - `!redirectErrorStream && exitCode == 0` Return only the stdout of the command. - `!redirectErrorStream && exitCode != 0` Throw an exception that includes only the stderr of the command. - `redirectErrorStream && exitCode == 0` Return the stdout and stderr of the command. - `redirectErrorStream && exitCode != 0` Throw an exception that includes both the stdout and stderr of the command. It's not as simple as I would like it to be, but I think this is the behavior we want. > If this simply redirects stderr to stdout, why couldn't we capture stderr before? Before, we were only using stderr if the process exited with a nonzero code. I guess we wouldn't want to include it in the return value of `runShortCommand` normally because irrelevant warnings, etc. could potentially break the parsing of the result for some commands. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301540) Time Spent: 2h 50m (was: 2h 40m) > Get logs for SDK worker Docker containers > - > > Key: BEAM-8015 > URL: https://issues.apache.org/jira/browse/BEAM-8015 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently, when SDK worker containers fail to start up properly, an exception > is thrown that provides no information about what happened. We can improve > debugging by keeping containers around long enough to log their logs before > removing them. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers
[ https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301539 ] ASF GitHub Bot logged work on BEAM-8015: Author: ASF GitHub Bot Created on: 26/Aug/19 21:44 Start Date: 26/Aug/19 21:44 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get logs from Docker containers URL: https://github.com/apache/beam/pull/9389#discussion_r317810021 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java ## @@ -134,10 +148,41 @@ public void killContainer(String containerId) runShortCommand(Arrays.asList(dockerExecutable, "kill", containerId)); } - /** Run the given command invocation and return stdout as a String. */ + /** + * Removes docker container with container id. + * + * @throws IOException if an IOException occurs, or if the given container id either does not + * exist or is still running + */ + public void removeContainer(String containerId) + throws IOException, TimeoutException, InterruptedException { +checkArgument(containerId != null); +checkArgument( +CONTAINER_ID_PATTERN.matcher(containerId).matches(), +"Container ID must be a 64-character hexadecimal string"); +runShortCommand(Arrays.asList(dockerExecutable, "rm", containerId)); + } + private String runShortCommand(List invocation) throws IOException, TimeoutException, InterruptedException { +return runShortCommand(invocation, false, ""); + } + + /** + * Runs a command, blocks until {@link DockerCommand#commandTimeout} has elapsed, then returns the + * command's output. + * + * @param invocation command and arguments to be run + * @param redirectErrorStream if true, redirect stderr of the process to its stdout + * @param delimiter used for separating output lines + * @return stdout of the command, including stderr if {@code redirectErrorStream} is true + * @throws TimeoutException if command has not finished by {@link DockerCommand#commandTimeout} + */ + private String runShortCommand( + List invocation, boolean redirectErrorStream, CharSequence delimiter) + throws IOException, TimeoutException, InterruptedException { ProcessBuilder pb = new ProcessBuilder(invocation); +pb.redirectErrorStream(redirectErrorStream); Review comment: > I suppose my question is, how will we capture the output of the runShortCommand when redirectErrorStream is set to true? Here are all the cases: - `!redirectErrorStream && exitCode == 0` Return only the stdout of the command. - `!redirectErrorStream && exitCode != 0` Throw an exception that includes only the stderr of the command. - `redirectErrorStream && exitCode == 0` Return the stdout and stderr of the command. - `redirectErrorStream && exitCode != 0` Throw an exception that includes both the stdout and stderr of the command. It's not as simple as I would like it to be, but I think this is the behavior we want. > If this simply redirects stderr to stdout, why couldn't we capture stderr before? Before, we were only using stderr if the process exited with a nonzero code. I guess we wouldn't want to include it in the return value of `runShortCommand` because irrelevant warnings, etc. could potentially break the parsing of the result for some commands. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301539) Time Spent: 2h 40m (was: 2.5h) > Get logs for SDK worker Docker containers > - > > Key: BEAM-8015 > URL: https://issues.apache.org/jira/browse/BEAM-8015 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2h 40m > Remaining Estimate: 0h > > Currently, when SDK worker containers fail to start up properly, an exception > is thrown that provides no information about what happened. We can improve > debugging by keeping containers around long enough to log their logs before > removing them. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers
[ https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301534=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301534 ] ASF GitHub Bot logged work on BEAM-8015: Author: ASF GitHub Bot Created on: 26/Aug/19 21:40 Start Date: 26/Aug/19 21:40 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get logs from Docker containers URL: https://github.com/apache/beam/pull/9389#discussion_r317810869 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java ## @@ -152,34 +148,43 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep containerId = docker.runImage(containerImage, dockerArgsBuilder.build(), args); LOG.debug("Created Docker Container with Container ID {}", containerId); // Wait on a client from the gRPC server. - while (instructionHandler == null) { + try { +instructionHandler = clientSource.take(workerId, Duration.ofMinutes(1)); + } catch (TimeoutException timeoutEx) { +RuntimeException runtimeException = +new RuntimeException( +String.format( +"Docker container %s failed to start up successfully within 1 minute.", +containerImage), +timeoutEx); try { - instructionHandler = clientSource.take(workerId, Duration.ofMinutes(1)); -} catch (TimeoutException timeoutEx) { - Preconditions.checkArgument( - docker.isContainerRunning(containerId), "No container running for id " + containerId); - LOG.info( - "Still waiting for startup of environment {} for worker id {}", - dockerPayload.getContainerImage(), - workerId); -} catch (InterruptedException interruptEx) { - Thread.currentThread().interrupt(); - throw new RuntimeException(interruptEx); + String containerLogs = docker.getContainerLogs(containerId); + LOG.error("Docker container {} logs:\n{}", containerId, containerLogs); Review comment: (Also, I changed the code a bit, it was incorrect before) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301534) Time Spent: 2.5h (was: 2h 20m) > Get logs for SDK worker Docker containers > - > > Key: BEAM-8015 > URL: https://issues.apache.org/jira/browse/BEAM-8015 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2.5h > Remaining Estimate: 0h > > Currently, when SDK worker containers fail to start up properly, an exception > is thrown that provides no information about what happened. We can improve > debugging by keeping containers around long enough to log their logs before > removing them. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers
[ https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301530=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301530 ] ASF GitHub Bot logged work on BEAM-8015: Author: ASF GitHub Bot Created on: 26/Aug/19 21:37 Start Date: 26/Aug/19 21:37 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get logs from Docker containers URL: https://github.com/apache/beam/pull/9389#discussion_r317810021 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerCommand.java ## @@ -134,10 +148,41 @@ public void killContainer(String containerId) runShortCommand(Arrays.asList(dockerExecutable, "kill", containerId)); } - /** Run the given command invocation and return stdout as a String. */ + /** + * Removes docker container with container id. + * + * @throws IOException if an IOException occurs, or if the given container id either does not + * exist or is still running + */ + public void removeContainer(String containerId) + throws IOException, TimeoutException, InterruptedException { +checkArgument(containerId != null); +checkArgument( +CONTAINER_ID_PATTERN.matcher(containerId).matches(), +"Container ID must be a 64-character hexadecimal string"); +runShortCommand(Arrays.asList(dockerExecutable, "rm", containerId)); + } + private String runShortCommand(List invocation) throws IOException, TimeoutException, InterruptedException { +return runShortCommand(invocation, false, ""); + } + + /** + * Runs a command, blocks until {@link DockerCommand#commandTimeout} has elapsed, then returns the + * command's output. + * + * @param invocation command and arguments to be run + * @param redirectErrorStream if true, redirect stderr of the process to its stdout + * @param delimiter used for separating output lines + * @return stdout of the command, including stderr if {@code redirectErrorStream} is true + * @throws TimeoutException if command has not finished by {@link DockerCommand#commandTimeout} + */ + private String runShortCommand( + List invocation, boolean redirectErrorStream, CharSequence delimiter) + throws IOException, TimeoutException, InterruptedException { ProcessBuilder pb = new ProcessBuilder(invocation); +pb.redirectErrorStream(redirectErrorStream); Review comment: > I suppose my question is, how will we capture the output of the runShortCommand when redirectErrorStream is set to true? Here are all the cases: - `!redirectErrorStream && exitCode == 0` Return only the stdout of the command. - `!redirectErrorStream && exitCode != 0` Throw an exception that includes only the stderr of the command. - `redirectErrorStream && exitCode == 0` Return the stdout and stderr of the command. - `redirectErrorStream && exitCode != 0` Throw an exception that includes both the stdout and stderr of the command. It's not as simple as I would like it to be, but I think this is the behavior we want. > If this simply redirects stderr to stdout, why couldn't we capture stderr before? Before, we were only using stderr if there was an error. I guess we wouldn't want to include it in the return value of `runShortCommand` because irrelevant warnings, etc. could potentially break the parsing of the result for some commands. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301530) Time Spent: 2h 10m (was: 2h) > Get logs for SDK worker Docker containers > - > > Key: BEAM-8015 > URL: https://issues.apache.org/jira/browse/BEAM-8015 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently, when SDK worker containers fail to start up properly, an exception > is thrown that provides no information about what happened. We can improve > debugging by keeping containers around long enough to log their logs before > removing them. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8015) Get logs for SDK worker Docker containers
[ https://issues.apache.org/jira/browse/BEAM-8015?focusedWorklogId=301531=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301531 ] ASF GitHub Bot logged work on BEAM-8015: Author: ASF GitHub Bot Created on: 26/Aug/19 21:37 Start Date: 26/Aug/19 21:37 Worklog Time Spent: 10m Work Description: ibzib commented on pull request #9389: [BEAM-8015] Get logs from Docker containers URL: https://github.com/apache/beam/pull/9389#discussion_r317810029 ## File path: runners/java-fn-execution/src/main/java/org/apache/beam/runners/fnexecution/environment/DockerEnvironmentFactory.java ## @@ -152,34 +148,43 @@ public RemoteEnvironment createEnvironment(Environment environment) throws Excep containerId = docker.runImage(containerImage, dockerArgsBuilder.build(), args); LOG.debug("Created Docker Container with Container ID {}", containerId); // Wait on a client from the gRPC server. - while (instructionHandler == null) { + try { +instructionHandler = clientSource.take(workerId, Duration.ofMinutes(1)); + } catch (TimeoutException timeoutEx) { +RuntimeException runtimeException = +new RuntimeException( +String.format( +"Docker container %s failed to start up successfully within 1 minute.", +containerImage), +timeoutEx); try { - instructionHandler = clientSource.take(workerId, Duration.ofMinutes(1)); -} catch (TimeoutException timeoutEx) { - Preconditions.checkArgument( - docker.isContainerRunning(containerId), "No container running for id " + containerId); - LOG.info( - "Still waiting for startup of environment {} for worker id {}", - dockerPayload.getContainerImage(), - workerId); -} catch (InterruptedException interruptEx) { - Thread.currentThread().interrupt(); - throw new RuntimeException(interruptEx); + String containerLogs = docker.getContainerLogs(containerId); + LOG.error("Docker container {} logs:\n{}", containerId, containerLogs); Review comment: No, see my other comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301531) Time Spent: 2h 20m (was: 2h 10m) > Get logs for SDK worker Docker containers > - > > Key: BEAM-8015 > URL: https://issues.apache.org/jira/browse/BEAM-8015 > Project: Beam > Issue Type: Improvement > Components: java-fn-execution >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Time Spent: 2h 20m > Remaining Estimate: 0h > > Currently, when SDK worker containers fail to start up properly, an exception > is thrown that provides no information about what happened. We can improve > debugging by keeping containers around long enough to log their logs before > removing them. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301528=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301528 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 26/Aug/19 21:35 Start Date: 26/Aug/19 21:35 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r317809353 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -49,26 +57,74 @@ @RunWith(JUnit4.class) public class BigQueryHllSketchCompatibilityIT { - private static final String DATASET_NAME = "zetasketch_compatibility_test"; + private static final String APP_NAME; + private static final String PROJECT_ID; + private static final String DATASET_ID; - // Table for testReadSketchFromBigQuery() + private static final List TEST_DATA = + Arrays.asList("Apple", "Orange", "Banana", "Orange"); + + // Data Table: used by testReadSketchFromBigQuery()) // Schema: only one STRING field named "data". // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange" - private static final String DATA_TABLE_NAME = "hll_data"; + private static final String DATA_TABLE_ID = "hll_data"; private static final String DATA_FIELD_NAME = "data"; + private static final String DATA_FIELD_TYPE = "STRING"; private static final String QUERY_RESULT_FIELD_NAME = "sketch"; private static final Long EXPECTED_COUNT = 3L; - // Table for testWriteSketchToBigQuery() + // Sketch Table: used by testWriteSketchToBigQuery() // Schema: only one BYTES field named "sketch". // Content: will be overridden by the sketch computed by the test pipeline each time the test runs - private static final String SKETCH_TABLE_NAME = "hll_sketch"; + private static final String SKETCH_TABLE_ID = "hll_sketch"; private static final String SKETCH_FIELD_NAME = "sketch"; - private static final List TEST_DATA = - Arrays.asList("Apple", "Orange", "Banana", "Orange"); + private static final String SKETCH_FIELD_TYPE = "BYTES"; // SHA-1 hash of string "[3]", the string representation of a row that has only one field 3 in it private static final String EXPECTED_CHECKSUM = "f1e31df9806ce94c5bdbbfff9608324930f4d3f1"; + static { +ApplicationNameOptions options = +TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class); +APP_NAME = options.getAppName(); +PROJECT_ID = options.as(GcpOptions.class).getProject(); +DATASET_ID = String.format("zetasketch_%tY_% bq can't be final since this method is not a constructor. minor: I mean you can make `bq` as a static attr of your test class. > It is fine though because BigqueryClient.getClient() does caching for us so it won't create another new client the second time we call it. minor: I don't think there is any caching logic in `BigqueryClient`. ` BigqueryClient.getClient()` just returns `new BigqueryClient()` simply This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301528) Time Spent: 28.5h (was: 28h 20m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 28.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301527=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301527 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 26/Aug/19 21:30 Start Date: 26/Aug/19 21:30 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r317807506 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -49,26 +57,74 @@ @RunWith(JUnit4.class) public class BigQueryHllSketchCompatibilityIT { - private static final String DATASET_NAME = "zetasketch_compatibility_test"; + private static final String APP_NAME; + private static final String PROJECT_ID; + private static final String DATASET_ID; - // Table for testReadSketchFromBigQuery() + private static final List TEST_DATA = + Arrays.asList("Apple", "Orange", "Banana", "Orange"); + + // Data Table: used by testReadSketchFromBigQuery()) // Schema: only one STRING field named "data". // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange" - private static final String DATA_TABLE_NAME = "hll_data"; + private static final String DATA_TABLE_ID = "hll_data"; private static final String DATA_FIELD_NAME = "data"; + private static final String DATA_FIELD_TYPE = "STRING"; private static final String QUERY_RESULT_FIELD_NAME = "sketch"; private static final Long EXPECTED_COUNT = 3L; - // Table for testWriteSketchToBigQuery() + // Sketch Table: used by testWriteSketchToBigQuery() // Schema: only one BYTES field named "sketch". // Content: will be overridden by the sketch computed by the test pipeline each time the test runs - private static final String SKETCH_TABLE_NAME = "hll_sketch"; + private static final String SKETCH_TABLE_ID = "hll_sketch"; private static final String SKETCH_FIELD_NAME = "sketch"; - private static final List TEST_DATA = - Arrays.asList("Apple", "Orange", "Banana", "Orange"); + private static final String SKETCH_FIELD_TYPE = "BYTES"; // SHA-1 hash of string "[3]", the string representation of a row that has only one field 3 in it private static final String EXPECTED_CHECKSUM = "f1e31df9806ce94c5bdbbfff9608324930f4d3f1"; + static { +ApplicationNameOptions options = +TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class); +APP_NAME = options.getAppName(); +PROJECT_ID = options.as(GcpOptions.class).getProject(); +DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 28h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301524=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301524 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 26/Aug/19 21:25 Start Date: 26/Aug/19 21:25 Worklog Time Spent: 10m Work Description: robinyqiu commented on issue #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#issuecomment-525039196 Thank you all @boyuanzz @zfraa @amaliujia @reuvenlax for the review! I have squashed all the commits into one and it think this PR is ready to be merged once the precommit tests pass. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301524) Time Spent: 28h 10m (was: 28h) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 28h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301523=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301523 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 26/Aug/19 21:22 Start Date: 26/Aug/19 21:22 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r317804729 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -49,26 +57,74 @@ @RunWith(JUnit4.class) public class BigQueryHllSketchCompatibilityIT { - private static final String DATASET_NAME = "zetasketch_compatibility_test"; + private static final String APP_NAME; + private static final String PROJECT_ID; + private static final String DATASET_ID; - // Table for testReadSketchFromBigQuery() + private static final List TEST_DATA = + Arrays.asList("Apple", "Orange", "Banana", "Orange"); + + // Data Table: used by testReadSketchFromBigQuery()) // Schema: only one STRING field named "data". // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange" - private static final String DATA_TABLE_NAME = "hll_data"; + private static final String DATA_TABLE_ID = "hll_data"; private static final String DATA_FIELD_NAME = "data"; + private static final String DATA_FIELD_TYPE = "STRING"; private static final String QUERY_RESULT_FIELD_NAME = "sketch"; private static final Long EXPECTED_COUNT = 3L; - // Table for testWriteSketchToBigQuery() + // Sketch Table: used by testWriteSketchToBigQuery() // Schema: only one BYTES field named "sketch". // Content: will be overridden by the sketch computed by the test pipeline each time the test runs - private static final String SKETCH_TABLE_NAME = "hll_sketch"; + private static final String SKETCH_TABLE_ID = "hll_sketch"; private static final String SKETCH_FIELD_NAME = "sketch"; - private static final List TEST_DATA = - Arrays.asList("Apple", "Orange", "Banana", "Orange"); + private static final String SKETCH_FIELD_TYPE = "BYTES"; // SHA-1 hash of string "[3]", the string representation of a row that has only one field 3 in it private static final String EXPECTED_CHECKSUM = "f1e31df9806ce94c5bdbbfff9608324930f4d3f1"; + static { +ApplicationNameOptions options = +TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class); +APP_NAME = options.getAppName(); +PROJECT_ID = options.as(GcpOptions.class).getProject(); +DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 28h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301522=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301522 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 26/Aug/19 21:22 Start Date: 26/Aug/19 21:22 Worklog Time Spent: 10m Work Description: robinyqiu commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r317804549 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -49,26 +57,74 @@ @RunWith(JUnit4.class) public class BigQueryHllSketchCompatibilityIT { - private static final String DATASET_NAME = "zetasketch_compatibility_test"; + private static final String APP_NAME; + private static final String PROJECT_ID; + private static final String DATASET_ID; - // Table for testReadSketchFromBigQuery() + private static final List TEST_DATA = + Arrays.asList("Apple", "Orange", "Banana", "Orange"); + + // Data Table: used by testReadSketchFromBigQuery()) // Schema: only one STRING field named "data". // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange" - private static final String DATA_TABLE_NAME = "hll_data"; + private static final String DATA_TABLE_ID = "hll_data"; private static final String DATA_FIELD_NAME = "data"; + private static final String DATA_FIELD_TYPE = "STRING"; private static final String QUERY_RESULT_FIELD_NAME = "sketch"; private static final Long EXPECTED_COUNT = 3L; - // Table for testWriteSketchToBigQuery() + // Sketch Table: used by testWriteSketchToBigQuery() // Schema: only one BYTES field named "sketch". // Content: will be overridden by the sketch computed by the test pipeline each time the test runs - private static final String SKETCH_TABLE_NAME = "hll_sketch"; + private static final String SKETCH_TABLE_ID = "hll_sketch"; private static final String SKETCH_FIELD_NAME = "sketch"; - private static final List TEST_DATA = - Arrays.asList("Apple", "Orange", "Banana", "Orange"); + private static final String SKETCH_FIELD_TYPE = "BYTES"; // SHA-1 hash of string "[3]", the string representation of a row that has only one field 3 in it private static final String EXPECTED_CHECKSUM = "f1e31df9806ce94c5bdbbfff9608324930f4d3f1"; + static { +ApplicationNameOptions options = +TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class); +APP_NAME = options.getAppName(); +PROJECT_ID = options.as(GcpOptions.class).getProject(); +DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 27h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-3713) Consider moving away from nose to nose2 or pytest.
[ https://issues.apache.org/jira/browse/BEAM-3713?focusedWorklogId=301520=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301520 ] ASF GitHub Bot logged work on BEAM-3713: Author: ASF GitHub Bot Created on: 26/Aug/19 21:22 Start Date: 26/Aug/19 21:22 Worklog Time Spent: 10m Work Description: udim commented on pull request #7949: [BEAM-3713] Add pytest testing infrastructure URL: https://github.com/apache/beam/pull/7949#discussion_r317804459 ## File path: sdks/python/setup.py ## @@ -201,6 +202,7 @@ def run(self): install_requires=REQUIRED_PACKAGES, python_requires=python_requires, test_suite='nose.collector', Review comment: I don't know what to put here for pytest. The instructions said to add the `setup_requires` line below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301520) Time Spent: 4h 10m (was: 4h) > Consider moving away from nose to nose2 or pytest. > -- > > Key: BEAM-3713 > URL: https://issues.apache.org/jira/browse/BEAM-3713 > Project: Beam > Issue Type: Test > Components: sdk-py-core, testing >Reporter: Robert Bradshaw >Assignee: Udi Meiri >Priority: Minor > Time Spent: 4h 10m > Remaining Estimate: 0h > > Per > [https://nose.readthedocs.io/en/latest/|https://nose.readthedocs.io/en/latest/,] > , nose is in maintenance mode. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301521=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301521 ] ASF GitHub Bot logged work on BEAM-8079: Author: ASF GitHub Bot Created on: 26/Aug/19 21:22 Start Date: 26/Aug/19 21:22 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #9434: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 2) URL: https://github.com/apache/beam/pull/9434 Update `verify_release_build.sh` so that it reuses `script.config` for configuration setup. Also relevant part of release doc is updated: 1. `verify_release_build.sh` will use configurations in `script.config` instead of asking people input from terminal. 2. Github access token is required for `hub` and git push. 3. Move all Jenkins phrases to a list `JOB_TRIGGER_PHRASES` and have release guide point to it. 4. Updated release guide about script usage and how to run build locally. Tested the script in a fresh Ubuntu 18 VM. +R: @yifanzou Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301516=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301516 ] ASF GitHub Bot logged work on BEAM-8079: Author: ASF GitHub Bot Created on: 26/Aug/19 21:08 Start Date: 26/Aug/19 21:08 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1) URL: https://github.com/apache/beam/pull/9411#issuecomment-525033262 Java PreCommit failed due to irrelevant test failure. Release_Build failed with known issue: https://issues.apache.org/jira/browse/BEAM-7789. They are not release blocker. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301516) Time Spent: 1h 50m (was: 1h 40m) > Move verify_release_build.sh to Jenkins job > --- > > Key: BEAM-8079 > URL: https://issues.apache.org/jira/browse/BEAM-8079 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > verify_release_build.sh is used for validation after release branch is cut. > Basically it does two things: 1. verify Gradle build with -PisRelease turned > on. 2. create a PR and run all PostCommit jobs against release branch. > However, release manager got many painpoints when running this script: > 1. A lot of environment setup and some of tooling install easily broke the > script. > 2. Running Gradle build locally too extremely long time. > 3. Auto-pr-creation (use hub) doesn't work. > We can move Gradle build to Jenkins in order to get rid of environment setup > work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7972) Portable Python Reshuffle does not work with windowed pcollection
[ https://issues.apache.org/jira/browse/BEAM-7972?focusedWorklogId=301515=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301515 ] ASF GitHub Bot logged work on BEAM-7972: Author: ASF GitHub Bot Created on: 26/Aug/19 21:05 Start Date: 26/Aug/19 21:05 Worklog Time Spent: 10m Work Description: angoenka commented on issue #9334: [BEAM-7972] Always use Global window in reshuffle and then apply wind… URL: https://github.com/apache/beam/pull/9334#issuecomment-525032257 > The defaulting to global window should be deleted since the Python SDK now does send a proper windowing strategy (same as Go SDK). The code was added as a migration path to allow for differences in where the Python/Go/Java SDKs were when submitting jobs to Dataflow. So we should update the reshuffle code to not pass the non standard window from python. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301515) Time Spent: 2h 50m (was: 2h 40m) > Portable Python Reshuffle does not work with windowed pcollection > - > > Key: BEAM-7972 > URL: https://issues.apache.org/jira/browse/BEAM-7972 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Ankur Goenka >Assignee: Ankur Goenka >Priority: Major > Time Spent: 2h 50m > Remaining Estimate: 0h > > Streaming pipeline gets stuck when using Reshuffle with windowed pcollection. > The issue happen because of window function gets deserialized on java side > which is not possible and hence default to global window function and result > into window function mismatch later down the code. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301513=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301513 ] ASF GitHub Bot logged work on BEAM-7864: Author: ASF GitHub Bot Created on: 26/Aug/19 21:01 Start Date: 26/Aug/19 21:01 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9410: [BEAM-7864] Simplify/generalize Spark reshuffle translation URL: https://github.com/apache/beam/pull/9410#issuecomment-525031072 Run Java Spark PortableValidatesRunner Batch This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301513) Time Spent: 1h 10m (was: 1h) > Portable Spark Reshuffle coder cast exception > - > > Key: BEAM-7864 > URL: https://issues.apache.org/jira/browse/BEAM-7864 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h 10m > Remaining Estimate: 0h > > running :sdks:python:test-suites:portable:py35:portableWordCountBatch in > either loopback or docker mode on master fails with exception: > > java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder > cannot be cast to org.apache.beam.sdk.coders.KvCoder > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400) > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147) > at > org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301514=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301514 ] ASF GitHub Bot logged work on BEAM-7864: Author: ASF GitHub Bot Created on: 26/Aug/19 21:01 Start Date: 26/Aug/19 21:01 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9410: [BEAM-7864] Simplify/generalize Spark reshuffle translation URL: https://github.com/apache/beam/pull/9410#issuecomment-525031112 Run Python Spark ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301514) Time Spent: 1h 20m (was: 1h 10m) > Portable Spark Reshuffle coder cast exception > - > > Key: BEAM-7864 > URL: https://issues.apache.org/jira/browse/BEAM-7864 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h 20m > Remaining Estimate: 0h > > running :sdks:python:test-suites:portable:py35:portableWordCountBatch in > either loopback or docker mode on master fails with exception: > > java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder > cannot be cast to org.apache.beam.sdk.coders.KvCoder > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400) > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147) > at > org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8095) pytest Py3.7 crashes on test_remote_runner_display_data
Udi Meiri created BEAM-8095: --- Summary: pytest Py3.7 crashes on test_remote_runner_display_data Key: BEAM-8095 URL: https://issues.apache.org/jira/browse/BEAM-8095 Project: Beam Issue Type: Improvement Components: test-failures, testing Reporter: Udi Meiri Adding certain flags such as "-n 1" or "--debug" causes Python to abort. The --debug flag logs some information in a pytestdebug.log, but I'm not familiar with it and what it means if anything. {code} $ pytest apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data === test session starts platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0 rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: pytest.ini plugins: forked-1.0.2, xdist-1.29.0 collected 1 item apache_beam/runners/dataflow/dataflow_runner_test.py . [100%] 1 passed in 0.11s = $ pytest apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_remote_runner_display_data --debug writing pytestdebug information to /usr/local/google/home/ehudm/src/beam/sdks/python/pytestdebug.log === test session starts platform linux -- Python 3.7.3rc1, pytest-5.1.1, py-1.8.0, pluggy-0.12.0 -- /usr/local/google/home/ehudm/virtualenvs/beam-py37/bin/python3.7 using: pytest-5.1.1 pylib-1.8.0 setuptools registered plugins: pytest-forked-1.0.2 at /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/pytest_forked/__init__.py pytest-xdist-1.29.0 at /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/plugin.py pytest-xdist-1.29.0 at /usr/local/google/home/ehudm/virtualenvs/beam-py37/lib/python3.7/site-packages/xdist/looponfail.py rootdir: /usr/local/google/home/ehudm/src/beam/sdks/python, inifile: pytest.ini plugins: forked-1.0.2, xdist-1.29.0 collected 1 item apache_beam/runners/dataflow/dataflow_runner_test.py Aborted (core dumped) {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers
[ https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=301482=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301482 ] ASF GitHub Bot logged work on BEAM-7909: Author: ASF GitHub Bot Created on: 26/Aug/19 20:43 Start Date: 26/Aug/19 20:43 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #9388: [BEAM-7909] upgrade python lib versions to match to dataflow worker URL: https://github.com/apache/beam/pull/9388#issuecomment-525024490 > Thank you! > > @Hannah-Jiang could you also file another JIRA issue for finding a process to keep this list in sync with setup.py. Thank you Ahmet for reviewing and merging it. I created a ticket: https://issues.apache.org/jira/browse/BEAM-8094 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301482) Time Spent: 8h 10m (was: 8h) > Write integration tests to test customized containers > - > > Key: BEAM-7909 > URL: https://issues.apache.org/jira/browse/BEAM-7909 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 8h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8094) Keep python library version synced between setup.py and base_image_requirements.txt
Hannah Jiang created BEAM-8094: -- Summary: Keep python library version synced between setup.py and base_image_requirements.txt Key: BEAM-8094 URL: https://issues.apache.org/jira/browse/BEAM-8094 Project: Beam Issue Type: Improvement Components: sdk-py-core Reporter: Hannah Jiang There are some shared libraries in setup.py and base_image_requirements.txt. Find a way to keep versions are syncs in both files and new libraries added to setup.py are added to base_image_requirements.txt if it is commonly used. Now it is manually synced. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301462=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301462 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 26/Aug/19 20:30 Start Date: 26/Aug/19 20:30 Worklog Time Spent: 10m Work Description: davidcavazos commented on issue #9289: [BEAM-7389] Add code examples for ToString page URL: https://github.com/apache/beam/pull/9289#issuecomment-525019745 Updating to use [`util.ToString`](https://beam.apache.org/releases/pydoc/current/_modules/apache_beam/transforms/util.html#ToString). I opened #9433 to update the code samples to use it. I also updated this page to use the new snippets, but they won't render correctly until the new code samples are merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301462) Time Spent: 49.5h (was: 49h 20m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 49.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301461=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301461 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 26/Aug/19 20:28 Start Date: 26/Aug/19 20:28 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #9289: [BEAM-7389] Add code examples for ToString page URL: https://github.com/apache/beam/pull/9289#discussion_r317783805 ## File path: website/src/documentation/transforms/python/element-wise/tostring.md ## @@ -19,9 +19,38 @@ limitations under the License. --> # ToString + + +localStorage.setItem('language', 'language-py') + + Transforms every element in an input collection a string. -## Examples -See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. +## Example + +Any non-string element can be converted to a string using sandard Python functions and methods. Review comment: Apparently, there's [`util.ToString`](https://beam.apache.org/releases/pydoc/current/_modules/apache_beam/transforms/util.html#ToString). I opened #9433 to update the code samples to use it. I also updated this page to use the new snippets, but they won't render correctly until the new code samples are merged. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301461) Time Spent: 49h 20m (was: 49h 10m) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 49h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301460=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301460 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 26/Aug/19 20:26 Start Date: 26/Aug/19 20:26 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #9289: [BEAM-7389] Add code examples for ToString page URL: https://github.com/apache/beam/pull/9289#discussion_r317782957 ## File path: website/src/documentation/transforms/python/element-wise/tostring.md ## @@ -19,9 +19,38 @@ limitations under the License. --> # ToString -Transforms every element in an input collection a string. -## Examples -See [BEAM-7389](https://issues.apache.org/jira/browse/BEAM-7389) for updates. + +localStorage.setItem('language', 'language-py') + -## Related transforms \ No newline at end of file +Transforms every element in an input collection to a string. + +## Example + +Any non-string element can be converted to a string using standard Python functions and methods. +Many I/O transforms, such as `TextIO`, expect their input elements to be strings. Review comment: Thanks, changed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301460) Time Spent: 49h 10m (was: 49h) > Colab examples for element-wise transforms (Python) > --- > > Key: BEAM-7389 > URL: https://issues.apache.org/jira/browse/BEAM-7389 > Project: Beam > Issue Type: Improvement > Components: website >Reporter: Rose Nguyen >Assignee: David Cavazos >Priority: Minor > Time Spent: 49h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow
[ https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916117#comment-16916117 ] Chamikara Jayalath commented on BEAM-8089: -- Have you tried running Dataflow in the same region as where your bucket located using option [1] ? Networks charges should not apply in this case according to [2]. [1] [https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java#L133] [2] [https://cloud.google.com/storage/pricing] > Error while using customGcsTempLocation() with Dataflow > --- > > Key: BEAM-8089 > URL: https://issues.apache.org/jira/browse/BEAM-8089 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.13.0 >Reporter: Harshit Dwivedi >Assignee: Chamikara Jayalath >Priority: Major > > I have the following code snippet which writes content to BigQuery via File > Loads. > Currently the files are being written to a GCS Bucket, but I want to write > them to the local file storage of Dataflow instead and want BigQuery to load > data from there. > > > > {code:java} > BigQueryIO > .writeTableRows() > .withNumFileShards(100) > .withTriggeringFrequency(Duration.standardSeconds(90)) > .withMethod(BigQueryIO.Write.Method.FILE_LOADS) > .withSchema(getSchema()) > .withoutValidation() > .withCustomGcsTempLocation(new ValueProvider() { > @Override > public String get(){ > return "/home/harshit/testFiles"; > } > @Override > public boolean isAccessible(){ > return true; > }}) > .withTimePartitioning(new TimePartitioning().setType("DAY")) > .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) > .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) > .to(tableName)); > {code} > > > On running this, I don't see any files being written to the provided path and > the BQ load jobs fail with an IOException. > > I looked at the docs, but I was unable to find any working example for this. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7389) Colab examples for element-wise transforms (Python)
[ https://issues.apache.org/jira/browse/BEAM-7389?focusedWorklogId=301453=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301453 ] ASF GitHub Bot logged work on BEAM-7389: Author: ASF GitHub Bot Created on: 26/Aug/19 20:12 Start Date: 26/Aug/19 20:12 Worklog Time Spent: 10m Work Description: davidcavazos commented on pull request #9433: [BEAM-7389] Update to use util.ToString transform URL: https://github.com/apache/beam/pull/9433 Update the code sample for `ToString` to use the [util.ToString](https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.util.html#apache_beam.transforms.util.ToString) transform. R: @aaltay Can you take a look whenever you have a chance? Thanks! Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
[jira] [Work logged] (BEAM-7864) Portable Spark Reshuffle coder cast exception
[ https://issues.apache.org/jira/browse/BEAM-7864?focusedWorklogId=301452=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301452 ] ASF GitHub Bot logged work on BEAM-7864: Author: ASF GitHub Bot Created on: 26/Aug/19 20:08 Start Date: 26/Aug/19 20:08 Worklog Time Spent: 10m Work Description: ibzib commented on issue #9410: [BEAM-7864] Fix Spark reshuffle translation with Python SDK URL: https://github.com/apache/beam/pull/9410#issuecomment-525011659 Thanks for the review @RyanSkraba. In that case, I will change the shared translation instead of just the portable runner's. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301452) Time Spent: 1h (was: 50m) > Portable Spark Reshuffle coder cast exception > - > > Key: BEAM-7864 > URL: https://issues.apache.org/jira/browse/BEAM-7864 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Kyle Weaver >Assignee: Kyle Weaver >Priority: Major > Labels: portability-spark > Time Spent: 1h > Remaining Estimate: 0h > > running :sdks:python:test-suites:portable:py35:portableWordCountBatch in > either loopback or docker mode on master fails with exception: > > java.lang.ClassCastException: org.apache.beam.sdk.coders.LengthPrefixCoder > cannot be cast to org.apache.beam.sdk.coders.KvCoder > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translateReshuffle(SparkBatchPortablePipelineTranslator.java:400) > at > org.apache.beam.runners.spark.translation.SparkBatchPortablePipelineTranslator.translate(SparkBatchPortablePipelineTranslator.java:147) > at > org.apache.beam.runners.spark.SparkPipelineRunner.lambda$run$1(SparkPipelineRunner.java:96) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8014) Use OffsetRange as restriction for OffsetRestrictionTracker
[ https://issues.apache.org/jira/browse/BEAM-8014?focusedWorklogId=301445=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301445 ] ASF GitHub Bot logged work on BEAM-8014: Author: ASF GitHub Bot Created on: 26/Aug/19 19:41 Start Date: 26/Aug/19 19:41 Worklog Time Spent: 10m Work Description: boyuanzz commented on issue #9376: [BEAM-8014] Using OffsetRange as restriction for OffsetRestrictionTracker URL: https://github.com/apache/beam/pull/9376#issuecomment-525001614 PTAL @robertwb Thanks for your help : D This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301445) Time Spent: 1h 10m (was: 1h) > Use OffsetRange as restriction for OffsetRestrictionTracker > --- > > Key: BEAM-8014 > URL: https://issues.apache.org/jira/browse/BEAM-8014 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Reporter: Boyuan Zhang >Assignee: Boyuan Zhang >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow
[ https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916053#comment-16916053 ] Harshit Dwivedi commented on BEAM-8089: --- The data ingested into GCS is around 250Gb for us per day, so we are incurring a lot of network charges. I wanted to avoid this charge by storing everything in Dataflow PD instead of GCS. > Error while using customGcsTempLocation() with Dataflow > --- > > Key: BEAM-8089 > URL: https://issues.apache.org/jira/browse/BEAM-8089 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.13.0 >Reporter: Harshit Dwivedi >Assignee: Chamikara Jayalath >Priority: Major > > I have the following code snippet which writes content to BigQuery via File > Loads. > Currently the files are being written to a GCS Bucket, but I want to write > them to the local file storage of Dataflow instead and want BigQuery to load > data from there. > > > > {code:java} > BigQueryIO > .writeTableRows() > .withNumFileShards(100) > .withTriggeringFrequency(Duration.standardSeconds(90)) > .withMethod(BigQueryIO.Write.Method.FILE_LOADS) > .withSchema(getSchema()) > .withoutValidation() > .withCustomGcsTempLocation(new ValueProvider() { > @Override > public String get(){ > return "/home/harshit/testFiles"; > } > @Override > public boolean isAccessible(){ > return true; > }}) > .withTimePartitioning(new TimePartitioning().setType("DAY")) > .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) > .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) > .to(tableName)); > {code} > > > On running this, I don't see any files being written to the provided path and > the BQ load jobs fail with an IOException. > > I looked at the docs, but I was unable to find any working example for this. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow
[ https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916052#comment-16916052 ] Chamikara Jayalath commented on BEAM-8089: -- BTW may I ask why you cannot use GCS in this case ? Dataflow already needs GCS to run and storage costs should be minimum. > Error while using customGcsTempLocation() with Dataflow > --- > > Key: BEAM-8089 > URL: https://issues.apache.org/jira/browse/BEAM-8089 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.13.0 >Reporter: Harshit Dwivedi >Assignee: Chamikara Jayalath >Priority: Major > > I have the following code snippet which writes content to BigQuery via File > Loads. > Currently the files are being written to a GCS Bucket, but I want to write > them to the local file storage of Dataflow instead and want BigQuery to load > data from there. > > > > {code:java} > BigQueryIO > .writeTableRows() > .withNumFileShards(100) > .withTriggeringFrequency(Duration.standardSeconds(90)) > .withMethod(BigQueryIO.Write.Method.FILE_LOADS) > .withSchema(getSchema()) > .withoutValidation() > .withCustomGcsTempLocation(new ValueProvider() { > @Override > public String get(){ > return "/home/harshit/testFiles"; > } > @Override > public boolean isAccessible(){ > return true; > }}) > .withTimePartitioning(new TimePartitioning().setType("DAY")) > .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) > .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) > .to(tableName)); > {code} > > > On running this, I don't see any files being written to the provided path and > the BQ load jobs fail with an IOException. > > I looked at the docs, but I was unable to find any working example for this. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow
[ https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916051#comment-16916051 ] Chamikara Jayalath commented on BEAM-8089: -- I don't think we can fork Beam code for a very specific scenario of the Dataflow runner (single worker with autoscaling disabled). In general, Dataflow does not fuse the step that write files and the step that execute the BQ job so these two steps may not execute in the same worker. > Error while using customGcsTempLocation() with Dataflow > --- > > Key: BEAM-8089 > URL: https://issues.apache.org/jira/browse/BEAM-8089 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.13.0 >Reporter: Harshit Dwivedi >Assignee: Chamikara Jayalath >Priority: Major > > I have the following code snippet which writes content to BigQuery via File > Loads. > Currently the files are being written to a GCS Bucket, but I want to write > them to the local file storage of Dataflow instead and want BigQuery to load > data from there. > > > > {code:java} > BigQueryIO > .writeTableRows() > .withNumFileShards(100) > .withTriggeringFrequency(Duration.standardSeconds(90)) > .withMethod(BigQueryIO.Write.Method.FILE_LOADS) > .withSchema(getSchema()) > .withoutValidation() > .withCustomGcsTempLocation(new ValueProvider() { > @Override > public String get(){ > return "/home/harshit/testFiles"; > } > @Override > public boolean isAccessible(){ > return true; > }}) > .withTimePartitioning(new TimePartitioning().setType("DAY")) > .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) > .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) > .to(tableName)); > {code} > > > On running this, I don't see any files being written to the provided path and > the BQ load jobs fail with an IOException. > > I looked at the docs, but I was unable to find any working example for this. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301418=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301418 ] ASF GitHub Bot logged work on BEAM-8079: Author: ASF GitHub Bot Created on: 26/Aug/19 18:36 Start Date: 26/Aug/19 18:36 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1) URL: https://github.com/apache/beam/pull/9411#discussion_r317739727 ## File path: .test-infra/jenkins/job_Release_Gradle_Build.groovy ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties + +job('beam_Release_Gradle_Build') { Review comment: comments updated. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301418) Time Spent: 1h 40m (was: 1.5h) > Move verify_release_build.sh to Jenkins job > --- > > Key: BEAM-8079 > URL: https://issues.apache.org/jira/browse/BEAM-8079 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 40m > Remaining Estimate: 0h > > verify_release_build.sh is used for validation after release branch is cut. > Basically it does two things: 1. verify Gradle build with -PisRelease turned > on. 2. create a PR and run all PostCommit jobs against release branch. > However, release manager got many painpoints when running this script: > 1. A lot of environment setup and some of tooling install easily broke the > script. > 2. Running Gradle build locally too extremely long time. > 3. Auto-pr-creation (use hub) doesn't work. > We can move Gradle build to Jenkins in order to get rid of environment setup > work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301417=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301417 ] ASF GitHub Bot logged work on BEAM-8079: Author: ASF GitHub Bot Created on: 26/Aug/19 18:35 Start Date: 26/Aug/19 18:35 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1) URL: https://github.com/apache/beam/pull/9411#issuecomment-524976528 Run Release Gradle Build This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301417) Time Spent: 1.5h (was: 1h 20m) > Move verify_release_build.sh to Jenkins job > --- > > Key: BEAM-8079 > URL: https://issues.apache.org/jira/browse/BEAM-8079 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1.5h > Remaining Estimate: 0h > > verify_release_build.sh is used for validation after release branch is cut. > Basically it does two things: 1. verify Gradle build with -PisRelease turned > on. 2. create a PR and run all PostCommit jobs against release branch. > However, release manager got many painpoints when running this script: > 1. A lot of environment setup and some of tooling install easily broke the > script. > 2. Running Gradle build locally too extremely long time. > 3. Auto-pr-creation (use hub) doesn't work. > We can move Gradle build to Jenkins in order to get rid of environment setup > work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.
[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=301414=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301414 ] ASF GitHub Bot logged work on BEAM-5428: Author: ASF GitHub Bot Created on: 26/Aug/19 18:31 Start Date: 26/Aug/19 18:31 Worklog Time Spent: 10m Work Description: mxm commented on issue #9418: [BEAM-5428][WIP] State caching in the Python SDK URL: https://github.com/apache/beam/pull/9418#issuecomment-524975017 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301414) Time Spent: 5h 10m (was: 5h) > Implement cross-bundle state caching. > - > > Key: BEAM-5428 > URL: https://issues.apache.org/jira/browse/BEAM-5428 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 5h 10m > Remaining Estimate: 0h > > Tech spec: > [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m] > Relevant document: > [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit] > Mailing list link: > [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.
[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=301415=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301415 ] ASF GitHub Bot logged work on BEAM-5428: Author: ASF GitHub Bot Created on: 26/Aug/19 18:31 Start Date: 26/Aug/19 18:31 Worklog Time Spent: 10m Work Description: mxm commented on issue #9418: [BEAM-5428][WIP] State caching in the Python SDK URL: https://github.com/apache/beam/pull/9418#issuecomment-524975017 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301415) Time Spent: 5h 20m (was: 5h 10m) > Implement cross-bundle state caching. > - > > Key: BEAM-5428 > URL: https://issues.apache.org/jira/browse/BEAM-5428 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 5h 20m > Remaining Estimate: 0h > > Tech spec: > [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m] > Relevant document: > [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit] > Mailing list link: > [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8093) py36-gcp is missing from the current tox.ini
[ https://issues.apache.org/jira/browse/BEAM-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valentyn Tymofieiev updated BEAM-8093: -- Status: Open (was: Triage Needed) > py36-gcp is missing from the current tox.ini > > > Key: BEAM-8093 > URL: https://issues.apache.org/jira/browse/BEAM-8093 > Project: Beam > Issue Type: Sub-task > Components: testing >Affects Versions: 2.16.0 >Reporter: Valentyn Tymofieiev >Priority: Major > > cc: [~udim] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5663) Add tox suites for various Python 3 versions (3.5, 3.6, 3.7)
[ https://issues.apache.org/jira/browse/BEAM-5663?focusedWorklogId=301410=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301410 ] ASF GitHub Bot logged work on BEAM-5663: Author: ASF GitHub Bot Created on: 26/Aug/19 18:29 Start Date: 26/Aug/19 18:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7988: [BEAM-5663] Add Python 3.6 tox environment URL: https://github.com/apache/beam/pull/7988#issuecomment-524974148 It's an oversight, thanks for calling out. Opened https://issues.apache.org/jira/browse/BEAM-8093 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301410) Time Spent: 5h 40m (was: 5.5h) > Add tox suites for various Python 3 versions (3.5, 3.6, 3.7) > > > Key: BEAM-5663 > URL: https://issues.apache.org/jira/browse/BEAM-5663 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Manu Zhang >Assignee: Robbe >Priority: Major > Fix For: Not applicable > > Time Spent: 5h 40m > Remaining Estimate: 0h > > Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test > failings across various Python 3 versions. It will be valuable to add tox > suites for Python 3.4, 3.5, 3.6 and 3.7 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5663) Add tox suites for various Python 3 versions (3.5, 3.6, 3.7)
[ https://issues.apache.org/jira/browse/BEAM-5663?focusedWorklogId=301411=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301411 ] ASF GitHub Bot logged work on BEAM-5663: Author: ASF GitHub Bot Created on: 26/Aug/19 18:29 Start Date: 26/Aug/19 18:29 Worklog Time Spent: 10m Work Description: tvalentyn commented on issue #7988: [BEAM-5663] Add Python 3.6 tox environment URL: https://github.com/apache/beam/pull/7988#issuecomment-524974148 It's an oversight, thanks for calling out, @udim. Opened https://issues.apache.org/jira/browse/BEAM-8093 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301411) Time Spent: 5h 50m (was: 5h 40m) > Add tox suites for various Python 3 versions (3.5, 3.6, 3.7) > > > Key: BEAM-5663 > URL: https://issues.apache.org/jira/browse/BEAM-5663 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Manu Zhang >Assignee: Robbe >Priority: Major > Fix For: Not applicable > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test > failings across various Python 3 versions. It will be valuable to add tox > suites for Python 3.4, 3.5, 3.6 and 3.7 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (BEAM-8093) py36-gcp is missing from the current tox.ini
Valentyn Tymofieiev created BEAM-8093: - Summary: py36-gcp is missing from the current tox.ini Key: BEAM-8093 URL: https://issues.apache.org/jira/browse/BEAM-8093 Project: Beam Issue Type: Sub-task Components: testing Affects Versions: 2.16.0 Reporter: Valentyn Tymofieiev cc: [~udim] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow
[ https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916037#comment-16916037 ] Harshit Dwivedi commented on BEAM-8089: --- For my use-case, I have a single worker and since that runs on a single VM, would it be possible to implement this? I have disable autoscaling for my use case and hence the Dataflow job will always run on a single VM. > Error while using customGcsTempLocation() with Dataflow > --- > > Key: BEAM-8089 > URL: https://issues.apache.org/jira/browse/BEAM-8089 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.13.0 >Reporter: Harshit Dwivedi >Assignee: Chamikara Jayalath >Priority: Major > > I have the following code snippet which writes content to BigQuery via File > Loads. > Currently the files are being written to a GCS Bucket, but I want to write > them to the local file storage of Dataflow instead and want BigQuery to load > data from there. > > > > {code:java} > BigQueryIO > .writeTableRows() > .withNumFileShards(100) > .withTriggeringFrequency(Duration.standardSeconds(90)) > .withMethod(BigQueryIO.Write.Method.FILE_LOADS) > .withSchema(getSchema()) > .withoutValidation() > .withCustomGcsTempLocation(new ValueProvider() { > @Override > public String get(){ > return "/home/harshit/testFiles"; > } > @Override > public boolean isAccessible(){ > return true; > }}) > .withTimePartitioning(new TimePartitioning().setType("DAY")) > .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) > .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) > .to(tableName)); > {code} > > > On running this, I don't see any files being written to the provided path and > the BQ load jobs fail with an IOException. > > I looked at the docs, but I was unable to find any working example for this. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301406=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301406 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 26/Aug/19 18:24 Start Date: 26/Aug/19 18:24 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r317731169 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -49,26 +57,74 @@ @RunWith(JUnit4.class) public class BigQueryHllSketchCompatibilityIT { - private static final String DATASET_NAME = "zetasketch_compatibility_test"; + private static final String APP_NAME; + private static final String PROJECT_ID; + private static final String DATASET_ID; - // Table for testReadSketchFromBigQuery() + private static final List TEST_DATA = + Arrays.asList("Apple", "Orange", "Banana", "Orange"); + + // Data Table: used by testReadSketchFromBigQuery()) // Schema: only one STRING field named "data". // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange" - private static final String DATA_TABLE_NAME = "hll_data"; + private static final String DATA_TABLE_ID = "hll_data"; private static final String DATA_FIELD_NAME = "data"; + private static final String DATA_FIELD_TYPE = "STRING"; private static final String QUERY_RESULT_FIELD_NAME = "sketch"; private static final Long EXPECTED_COUNT = 3L; - // Table for testWriteSketchToBigQuery() + // Sketch Table: used by testWriteSketchToBigQuery() // Schema: only one BYTES field named "sketch". // Content: will be overridden by the sketch computed by the test pipeline each time the test runs - private static final String SKETCH_TABLE_NAME = "hll_sketch"; + private static final String SKETCH_TABLE_ID = "hll_sketch"; private static final String SKETCH_FIELD_NAME = "sketch"; - private static final List TEST_DATA = - Arrays.asList("Apple", "Orange", "Banana", "Orange"); + private static final String SKETCH_FIELD_TYPE = "BYTES"; // SHA-1 hash of string "[3]", the string representation of a row that has only one field 3 in it private static final String EXPECTED_CHECKSUM = "f1e31df9806ce94c5bdbbfff9608324930f4d3f1"; + static { +ApplicationNameOptions options = +TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class); +APP_NAME = options.getAppName(); +PROJECT_ID = options.as(GcpOptions.class).getProject(); +DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 27h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301405=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301405 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 26/Aug/19 18:24 Start Date: 26/Aug/19 18:24 Worklog Time Spent: 10m Work Description: boyuanzz commented on pull request #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#discussion_r317730080 ## File path: sdks/java/extensions/zetasketch/src/test/java/org/apache/beam/sdk/extensions/zetasketch/BigQueryHllSketchCompatibilityIT.java ## @@ -49,26 +57,74 @@ @RunWith(JUnit4.class) public class BigQueryHllSketchCompatibilityIT { - private static final String DATASET_NAME = "zetasketch_compatibility_test"; + private static final String APP_NAME; + private static final String PROJECT_ID; + private static final String DATASET_ID; - // Table for testReadSketchFromBigQuery() + private static final List TEST_DATA = + Arrays.asList("Apple", "Orange", "Banana", "Orange"); + + // Data Table: used by testReadSketchFromBigQuery()) // Schema: only one STRING field named "data". // Content: prepopulated with 4 rows: "Apple", "Orange", "Banana", "Orange" - private static final String DATA_TABLE_NAME = "hll_data"; + private static final String DATA_TABLE_ID = "hll_data"; private static final String DATA_FIELD_NAME = "data"; + private static final String DATA_FIELD_TYPE = "STRING"; private static final String QUERY_RESULT_FIELD_NAME = "sketch"; private static final Long EXPECTED_COUNT = 3L; - // Table for testWriteSketchToBigQuery() + // Sketch Table: used by testWriteSketchToBigQuery() // Schema: only one BYTES field named "sketch". // Content: will be overridden by the sketch computed by the test pipeline each time the test runs - private static final String SKETCH_TABLE_NAME = "hll_sketch"; + private static final String SKETCH_TABLE_ID = "hll_sketch"; private static final String SKETCH_FIELD_NAME = "sketch"; - private static final List TEST_DATA = - Arrays.asList("Apple", "Orange", "Banana", "Orange"); + private static final String SKETCH_FIELD_TYPE = "BYTES"; // SHA-1 hash of string "[3]", the string representation of a row that has only one field 3 in it private static final String EXPECTED_CHECKSUM = "f1e31df9806ce94c5bdbbfff9608324930f4d3f1"; + static { +ApplicationNameOptions options = +TestPipeline.testingPipelineOptions().as(ApplicationNameOptions.class); +APP_NAME = options.getAppName(); +PROJECT_ID = options.as(GcpOptions.class).getProject(); +DATASET_ID = String.format("zetasketch_%tY_% A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 27.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7013) A new count distinct transform based on BigQuery compatible HyperLogLog++ implementation
[ https://issues.apache.org/jira/browse/BEAM-7013?focusedWorklogId=301403=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301403 ] ASF GitHub Bot logged work on BEAM-7013: Author: ASF GitHub Bot Created on: 26/Aug/19 18:21 Start Date: 26/Aug/19 18:21 Worklog Time Spent: 10m Work Description: robinyqiu commented on issue #9144: [BEAM-7013] Integrating ZetaSketch's HLL++ algorithm with Beam URL: https://github.com/apache/beam/pull/9144#issuecomment-524971095 Run Java PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301403) Time Spent: 27h 20m (was: 27h 10m) > A new count distinct transform based on BigQuery compatible HyperLogLog++ > implementation > > > Key: BEAM-7013 > URL: https://issues.apache.org/jira/browse/BEAM-7013 > Project: Beam > Issue Type: New Feature > Components: extensions-java-sketching, sdk-java-core >Reporter: Yueyang Qiu >Assignee: Yueyang Qiu >Priority: Major > Fix For: 2.16.0 > > Time Spent: 27h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow
[ https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916033#comment-16916033 ] Chamikara Jayalath commented on BEAM-8089: -- True, seems like this is supported in a limited way (wildcards not supported for example). I think Beam will have a hard time supporting this since most Beam runners are distributed and use multiple nodes to write data (to files) in parallel. So there's no "single" local disk. This is why we use a distributed storage location to which all workers have access to write individual files (a directory in GCS in this case) and execute a single BQ load job for all files from there. > Error while using customGcsTempLocation() with Dataflow > --- > > Key: BEAM-8089 > URL: https://issues.apache.org/jira/browse/BEAM-8089 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.13.0 >Reporter: Harshit Dwivedi >Assignee: Chamikara Jayalath >Priority: Major > > I have the following code snippet which writes content to BigQuery via File > Loads. > Currently the files are being written to a GCS Bucket, but I want to write > them to the local file storage of Dataflow instead and want BigQuery to load > data from there. > > > > {code:java} > BigQueryIO > .writeTableRows() > .withNumFileShards(100) > .withTriggeringFrequency(Duration.standardSeconds(90)) > .withMethod(BigQueryIO.Write.Method.FILE_LOADS) > .withSchema(getSchema()) > .withoutValidation() > .withCustomGcsTempLocation(new ValueProvider() { > @Override > public String get(){ > return "/home/harshit/testFiles"; > } > @Override > public boolean isAccessible(){ > return true; > }}) > .withTimePartitioning(new TimePartitioning().setType("DAY")) > .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) > .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) > .to(tableName)); > {code} > > > On running this, I don't see any files being written to the provided path and > the BQ load jobs fail with an IOException. > > I looked at the docs, but I was unable to find any working example for this. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow
[ https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916022#comment-16916022 ] Harshit Dwivedi commented on BEAM-8089: --- But the BQ documentation says that this can be done, [https://cloud.google.com/bigquery/docs/loading-data-local] > Error while using customGcsTempLocation() with Dataflow > --- > > Key: BEAM-8089 > URL: https://issues.apache.org/jira/browse/BEAM-8089 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.13.0 >Reporter: Harshit Dwivedi >Assignee: Chamikara Jayalath >Priority: Major > > I have the following code snippet which writes content to BigQuery via File > Loads. > Currently the files are being written to a GCS Bucket, but I want to write > them to the local file storage of Dataflow instead and want BigQuery to load > data from there. > > > > {code:java} > BigQueryIO > .writeTableRows() > .withNumFileShards(100) > .withTriggeringFrequency(Duration.standardSeconds(90)) > .withMethod(BigQueryIO.Write.Method.FILE_LOADS) > .withSchema(getSchema()) > .withoutValidation() > .withCustomGcsTempLocation(new ValueProvider() { > @Override > public String get(){ > return "/home/harshit/testFiles"; > } > @Override > public boolean isAccessible(){ > return true; > }}) > .withTimePartitioning(new TimePartitioning().setType("DAY")) > .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) > .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) > .to(tableName)); > {code} > > > On running this, I don't see any files being written to the provided path and > the BQ load jobs fail with an IOException. > > I looked at the docs, but I was unable to find any working example for this. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.
[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=301395=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301395 ] ASF GitHub Bot logged work on BEAM-5428: Author: ASF GitHub Bot Created on: 26/Aug/19 17:59 Start Date: 26/Aug/19 17:59 Worklog Time Spent: 10m Work Description: rakeshcusat commented on pull request #9418: [BEAM-5428][WIP] State caching in the Python SDK URL: https://github.com/apache/beam/pull/9418#discussion_r317723117 ## File path: sdks/python/apache_beam/runners/worker/statecache.py ## @@ -0,0 +1,103 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +"""TODO mxm""" +import collections +from threading import Lock + +# (cache_token, value) + + +class StateCache(object): + + def __init__(self, max_entries): +self._cache = LRUCache(max_entries) +self._lock = Lock() + + def get(self, state_key, cache_tokens): +#assert isinstance(state_key, StateKey.__class__) +with self._lock: + cache_entry = self._cache.get(state_key) +if cache_entry: + token, value = cache_entry + return value if token in cache_tokens else None +else: + return None + + def put(self, state_key, cache_token, value): +#assert isinstance(state_key, StateKey.__class__) +with self._lock: + return self._cache.put(state_key, (cache_token, value)) + + def clear(self, state_key): +#assert isinstance(state_key, StateKey.__class__) +with self._lock: + self._cache.clear(state_key) + + def __len__(self): +return len(self._cache) + + +class LRUCache(object): + + def __init__(self, max_entries): +self._maxEntries = max_entries +self._cache = collections.OrderedDict() + + def get(self, key): +value = self._cache.pop(key, None) +if value: + self._cache[key] = value +return value + + def put(self, key, value): +while len(self._cache) >= self._maxEntries: + self._cache.popitem(last=False) +self._cache[key] = value + + def clear(self, key): +self._cache.pop(key, None) + + def __len__(self): +return len(self._cache) + + +class CacheStateKey(object): + + def __init__(self, transform_id, state_id, window, key): +self.transform_id = transform_id +self.state_id = state_id +self.window = window +self.key = key + + def __eq__(self, other): +return (isinstance(other, self.__class__) and Review comment: `state_key.SerializeToString()` can give you an inconsistent result on Python3. So I would avoid this method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301395) Time Spent: 5h (was: 4h 50m) > Implement cross-bundle state caching. > - > > Key: BEAM-5428 > URL: https://issues.apache.org/jira/browse/BEAM-5428 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 5h > Remaining Estimate: 0h > > Tech spec: > [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m] > Relevant document: > [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit] > Mailing list link: > [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers
[ https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=301394=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301394 ] ASF GitHub Bot logged work on BEAM-7909: Author: ASF GitHub Bot Created on: 26/Aug/19 17:58 Start Date: 26/Aug/19 17:58 Worklog Time Spent: 10m Work Description: robertwb commented on issue #9351: [BEAM-7909] support customized container for Python URL: https://github.com/apache/beam/pull/9351#issuecomment-524962193 Would this PR better be named "Python 3 containers" or similar? I don't see customized container specific code here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301394) Time Spent: 8h (was: 7h 50m) > Write integration tests to test customized containers > - > > Key: BEAM-7909 > URL: https://issues.apache.org/jira/browse/BEAM-7909 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5428) Implement cross-bundle state caching.
[ https://issues.apache.org/jira/browse/BEAM-5428?focusedWorklogId=301393=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301393 ] ASF GitHub Bot logged work on BEAM-5428: Author: ASF GitHub Bot Created on: 26/Aug/19 17:57 Start Date: 26/Aug/19 17:57 Worklog Time Spent: 10m Work Description: rakeshcusat commented on pull request #9418: [BEAM-5428][WIP] State caching in the Python SDK URL: https://github.com/apache/beam/pull/9418#discussion_r317722482 ## File path: sdks/python/apache_beam/runners/worker/sdk_worker.py ## @@ -464,6 +469,7 @@ def __init__(self, credentials=None): self._lock = threading.Lock() self._throwing_state_handler = ThrowingStateHandler() self._credentials = credentials +self.state_cache = StateCache(100) Review comment: sounds good. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301393) Time Spent: 4h 50m (was: 4h 40m) > Implement cross-bundle state caching. > - > > Key: BEAM-5428 > URL: https://issues.apache.org/jira/browse/BEAM-5428 > Project: Beam > Issue Type: Improvement > Components: sdk-py-harness >Reporter: Robert Bradshaw >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 4h 50m > Remaining Estimate: 0h > > Tech spec: > [https://docs.google.com/document/d/1BOozW0bzBuz4oHJEuZNDOHdzaV5Y56ix58Ozrqm2jFg/edit#heading=h.7ghoih5aig5m] > Relevant document: > [https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit#|https://docs.google.com/document/d/1ltVqIW0XxUXI6grp17TgeyIybk3-nDF8a0-Nqw-s9mY/edit] > Mailing list link: > [https://lists.apache.org/thread.html/caa8d9bc6ca871d13de2c5e6ba07fdc76f85d26497d95d90893aa1f6@%3Cdev.beam.apache.org%3E] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Closed] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date
[ https://issues.apache.org/jira/browse/BEAM-8080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gleb Kanterov closed BEAM-8080. --- Resolution: Fixed > java.lang.NoClassDefFoundError: > org/apache/beam/repackaged/sql/com/google/type/Date > --- > > Key: BEAM-8080 > URL: https://issues.apache.org/jira/browse/BEAM-8080 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Critical > Fix For: 2.16.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > {code} > java.lang.NoClassDefFoundError: > org/apache/beam/repackaged/sql/com/google/type/Date > at > org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380) > at > org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301380=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301380 ] ASF GitHub Bot logged work on BEAM-8079: Author: ASF GitHub Bot Created on: 26/Aug/19 17:42 Start Date: 26/Aug/19 17:42 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1) URL: https://github.com/apache/beam/pull/9411#issuecomment-524956129 Run Seed Job This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301380) Time Spent: 1h 20m (was: 1h 10m) > Move verify_release_build.sh to Jenkins job > --- > > Key: BEAM-8079 > URL: https://issues.apache.org/jira/browse/BEAM-8079 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > verify_release_build.sh is used for validation after release branch is cut. > Basically it does two things: 1. verify Gradle build with -PisRelease turned > on. 2. create a PR and run all PostCommit jobs against release branch. > However, release manager got many painpoints when running this script: > 1. A lot of environment setup and some of tooling install easily broke the > script. > 2. Running Gradle build locally too extremely long time. > 3. Auto-pr-creation (use hub) doesn't work. > We can move Gradle build to Jenkins in order to get rid of environment setup > work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301379=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301379 ] ASF GitHub Bot logged work on BEAM-8079: Author: ASF GitHub Bot Created on: 26/Aug/19 17:42 Start Date: 26/Aug/19 17:42 Worklog Time Spent: 10m Work Description: markflyhigh commented on issue #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1) URL: https://github.com/apache/beam/pull/9411#issuecomment-524955951 release_build job took longer than Jenkins timeout. So we may want to remove `--no-parallel` to make the build faster. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301379) Time Spent: 1h 10m (was: 1h) > Move verify_release_build.sh to Jenkins job > --- > > Key: BEAM-8079 > URL: https://issues.apache.org/jira/browse/BEAM-8079 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h 10m > Remaining Estimate: 0h > > verify_release_build.sh is used for validation after release branch is cut. > Basically it does two things: 1. verify Gradle build with -PisRelease turned > on. 2. create a PR and run all PostCommit jobs against release branch. > However, release manager got many painpoints when running this script: > 1. A lot of environment setup and some of tooling install easily broke the > script. > 2. Running Gradle build locally too extremely long time. > 3. Auto-pr-creation (use hub) doesn't work. > We can move Gradle build to Jenkins in order to get rid of environment setup > work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8089) Error while using customGcsTempLocation() with Dataflow
[ https://issues.apache.org/jira/browse/BEAM-8089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915992#comment-16915992 ] Chamikara Jayalath commented on BEAM-8089: -- BQ cannot execute load jobs from local files. Files have to be in GCS. So I think this is working as intended. > Error while using customGcsTempLocation() with Dataflow > --- > > Key: BEAM-8089 > URL: https://issues.apache.org/jira/browse/BEAM-8089 > Project: Beam > Issue Type: Bug > Components: io-java-gcp >Affects Versions: 2.13.0 >Reporter: Harshit Dwivedi >Assignee: Chamikara Jayalath >Priority: Major > > I have the following code snippet which writes content to BigQuery via File > Loads. > Currently the files are being written to a GCS Bucket, but I want to write > them to the local file storage of Dataflow instead and want BigQuery to load > data from there. > > > > {code:java} > BigQueryIO > .writeTableRows() > .withNumFileShards(100) > .withTriggeringFrequency(Duration.standardSeconds(90)) > .withMethod(BigQueryIO.Write.Method.FILE_LOADS) > .withSchema(getSchema()) > .withoutValidation() > .withCustomGcsTempLocation(new ValueProvider() { > @Override > public String get(){ > return "/home/harshit/testFiles"; > } > @Override > public boolean isAccessible(){ > return true; > }}) > .withTimePartitioning(new TimePartitioning().setType("DAY")) > .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) > .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) > .to(tableName)); > {code} > > > On running this, I don't see any files being written to the provided path and > the BQ load jobs fail with an IOException. > > I looked at the docs, but I was unable to find any working example for this. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-7713) Migrate to "typing" module typing types in Beam typehints (on Py2 and Py3).
[ https://issues.apache.org/jira/browse/BEAM-7713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915991#comment-16915991 ] Valentyn Tymofieiev commented on BEAM-7713: --- Hey [~udim], could you please check what's the status of this issue? Thank you! > Migrate to "typing" module typing types in Beam typehints (on Py2 and Py3). > --- > > Key: BEAM-7713 > URL: https://issues.apache.org/jira/browse/BEAM-7713 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Valentyn Tymofieiev >Assignee: Udi Meiri >Priority: Major > Fix For: 2.16.0 > > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-8042) Parsing of aggregate query fails
[ https://issues.apache.org/jira/browse/BEAM-8042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915989#comment-16915989 ] Rui Wang commented on BEAM-8042: I see. It's triggered on the empty input table. > Parsing of aggregate query fails > > > Key: BEAM-8042 > URL: https://issues.apache.org/jira/browse/BEAM-8042 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Critical > > {code} > @Rule > public TestPipeline pipeline = > TestPipeline.fromOptions(createPipelineOptions()); > private static PipelineOptions createPipelineOptions() { > BeamSqlPipelineOptions opts = > PipelineOptionsFactory.create().as(BeamSqlPipelineOptions.class); > opts.setPlannerName(ZetaSQLQueryPlanner.class.getName()); > return opts; > } > @Test > public void testAggregate() { > Schema inputSchema = Schema.builder() > .addByteArrayField("id") > .addInt64Field("has_f1") > .addInt64Field("has_f2") > .addInt64Field("has_f3") > .addInt64Field("has_f4") > .addInt64Field("has_f5") > .addInt64Field("has_f6") > .build(); > String sql = "SELECT \n" + > " id, \n" + > " COUNT(*) as count, \n" + > " SUM(has_f1) as f1_count, \n" + > " SUM(has_f2) as f2_count, \n" + > " SUM(has_f3) as f3_count, \n" + > " SUM(has_f4) as f4_count, \n" + > " SUM(has_f5) as f5_count, \n" + > " SUM(has_f6) as f6_count \n" + > "FROM PCOLLECTION \n" + > "GROUP BY id"; > pipeline > .apply(Create.empty(inputSchema)) > .apply(SqlTransform.query(sql)); > pipeline.run(); > } > {code} > {code} > Caused by: java.lang.RuntimeException: Error while applying rule > AggregateProjectMergeRule, args > [rel#553:LogicalAggregate.NONE(input=RelSubset#552,group={0},f1=COUNT(),f2=SUM($2),f3=SUM($3),f4=SUM($4),f5=SUM($5),f6=SUM($6),f7=SUM($7)), > > rel#551:LogicalProject.NONE(input=RelSubset#550,key=$0,f1=$1,f2=$2,f3=$3,f4=$4,f5=$5,f6=$6)] > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:232) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:637) > at > org.apache.beam.repackaged.sql.org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:340) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.transform(ZetaSQLPlannerImpl.java:168) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:99) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:104) > at > ... 39 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.beam.repackaged.sql.com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:58) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.apply(AggregateProjectMergeRule.java:96) > at > org.apache.beam.repackaged.sql.org.apache.calcite.rel.rules.AggregateProjectMergeRule.onMatch(AggregateProjectMergeRule.java:73) > at > org.apache.beam.repackaged.sql.org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:205) > ... 48 more > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8079) Move verify_release_build.sh to Jenkins job
[ https://issues.apache.org/jira/browse/BEAM-8079?focusedWorklogId=301372=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301372 ] ASF GitHub Bot logged work on BEAM-8079: Author: ASF GitHub Bot Created on: 26/Aug/19 17:29 Start Date: 26/Aug/19 17:29 Worklog Time Spent: 10m Work Description: markflyhigh commented on pull request #9411: [BEAM-8079] Move release Gradle build to a Jenkins job (Part - 1) URL: https://github.com/apache/beam/pull/9411#discussion_r317710049 ## File path: .test-infra/jenkins/job_Release_Gradle_Build.groovy ## @@ -0,0 +1,43 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +import CommonJobProperties as commonJobProperties + +job('beam_Release_Gradle_Build') { Review comment: Would add a comment here. As for release guide, I'd love to update it with some changes to `verify_release_branch.sh`. That will be the second part right following this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301372) Time Spent: 1h (was: 50m) > Move verify_release_build.sh to Jenkins job > --- > > Key: BEAM-8079 > URL: https://issues.apache.org/jira/browse/BEAM-8079 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > verify_release_build.sh is used for validation after release branch is cut. > Basically it does two things: 1. verify Gradle build with -PisRelease turned > on. 2. create a PR and run all PostCommit jobs against release branch. > However, release manager got many painpoints when running this script: > 1. A lot of environment setup and some of tooling install easily broke the > script. > 2. Running Gradle build locally too extremely long time. > 3. Auto-pr-creation (use hub) doesn't work. > We can move Gradle build to Jenkins in order to get rid of environment setup > work. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-5663) Add tox suites for various Python 3 versions (3.5, 3.6, 3.7)
[ https://issues.apache.org/jira/browse/BEAM-5663?focusedWorklogId=301366=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301366 ] ASF GitHub Bot logged work on BEAM-5663: Author: ASF GitHub Bot Created on: 26/Aug/19 17:16 Start Date: 26/Aug/19 17:16 Worklog Time Spent: 10m Work Description: udim commented on issue #7988: [BEAM-5663] Add Python 3.6 tox environment URL: https://github.com/apache/beam/pull/7988#issuecomment-524946573 I'm going over tox.ini for a nose->pytest migration. I see that `py36-gcp` is missing from the current tox.ini. Is that an oversight or is there an open issue? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301366) Time Spent: 5.5h (was: 5h 20m) > Add tox suites for various Python 3 versions (3.5, 3.6, 3.7) > > > Key: BEAM-5663 > URL: https://issues.apache.org/jira/browse/BEAM-5663 > Project: Beam > Issue Type: Sub-task > Components: sdk-py-core >Reporter: Manu Zhang >Assignee: Robbe >Priority: Major > Fix For: Not applicable > > Time Spent: 5.5h > Remaining Estimate: 0h > > Currently, Python 3.5.2 is set up for Jenkins tests but we've seen test > failings across various Python 3 versions. It will be valuable to add tox > suites for Python 3.4, 3.5, 3.6 and 3.7 -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk
[ https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=301355=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301355 ] ASF GitHub Bot logged work on BEAM-7739: Author: ASF GitHub Bot Created on: 26/Aug/19 17:08 Start Date: 26/Aug/19 17:08 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9067: [BEAM-7739] Implement ReadModifyWriteState in Python SDK URL: https://github.com/apache/beam/pull/9067#discussion_r317697628 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java ## @@ -166,9 +172,62 @@ public WatermarkHoldState bindWatermark( } } - /** An {@link InMemoryState} implementation of {@link ValueState}. */ + + /** An {@link InMemoryState} implementation of {@link ReadModifyWriteState}. */ + public static final class InMemoryReadModifyWrite + implements ReadModifyWriteState, InMemoryState> { +private final Coder coder; + +private boolean isCleared = true; +private @Nullable T value = null; + +public InMemoryReadModifyWrite(Coder coder) { + this.coder = coder; +} + +@Override +public void clear() { + // Even though we're clearing we can't remove this from the in-memory state map, since + // other users may already have a handle on this Value. + value = null; + isCleared = true; +} + +@Override +public InMemoryReadModifyWrite readLater() { + return this; +} + +@Override +public T read() { + return value; +} + +@Override +public void write(T input) { + isCleared = false; + this.value = input; +} + +@Override +public InMemoryReadModifyWrite copy() { + InMemoryReadModifyWrite that = new InMemoryReadModifyWrite<>(coder); + if (!this.isCleared) { +that.isCleared = this.isCleared; +that.value = uncheckedClone(coder, this.value); + } + return that; +} + +@Override +public boolean isCleared() { + return isCleared; +} + } + + /** An {@link InMemoryState} implementation of {@link ReadModifyWriteState}. */ public static final class InMemoryValue - implements ValueState, InMemoryState> { + implements ValueState, InMemoryState> { Review comment: Seems like this could be mostly merged with InMemoryReadModifyWrite (possibly requiring a common baseclass). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301355) Time Spent: 3h 10m (was: 3h) > Add ValueState in Python sdk > > > Key: BEAM-7739 > URL: https://issues.apache.org/jira/browse/BEAM-7739 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Rakesh Kumar >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently ValueState is missing from Python Sdks but it is existing in Java > sdks. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk
[ https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=301357=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301357 ] ASF GitHub Bot logged work on BEAM-7739: Author: ASF GitHub Bot Created on: 26/Aug/19 17:08 Start Date: 26/Aug/19 17:08 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9067: [BEAM-7739] Implement ReadModifyWriteState in Python SDK URL: https://github.com/apache/beam/pull/9067#discussion_r317698856 ## File path: runners/core-java/src/main/java/org/apache/beam/runners/core/InMemoryStateInternals.java ## @@ -166,9 +172,62 @@ public WatermarkHoldState bindWatermark( } } - /** An {@link InMemoryState} implementation of {@link ValueState}. */ + + /** An {@link InMemoryState} implementation of {@link ReadModifyWriteState}. */ + public static final class InMemoryReadModifyWrite + implements ReadModifyWriteState, InMemoryState> { +private final Coder coder; + +private boolean isCleared = true; +private @Nullable T value = null; + +public InMemoryReadModifyWrite(Coder coder) { + this.coder = coder; +} + +@Override +public void clear() { + // Even though we're clearing we can't remove this from the in-memory state map, since + // other users may already have a handle on this Value. + value = null; + isCleared = true; +} + +@Override +public InMemoryReadModifyWrite readLater() { + return this; +} + +@Override +public T read() { + return value; +} + +@Override +public void write(T input) { + isCleared = false; + this.value = input; +} + +@Override +public InMemoryReadModifyWrite copy() { + InMemoryReadModifyWrite that = new InMemoryReadModifyWrite<>(coder); + if (!this.isCleared) { +that.isCleared = this.isCleared; +that.value = uncheckedClone(coder, this.value); + } + return that; +} + +@Override +public boolean isCleared() { + return isCleared; +} + } + + /** An {@link InMemoryState} implementation of {@link ReadModifyWriteState}. */ public static final class InMemoryValue - implements ValueState, InMemoryState> { + implements ValueState, InMemoryState> { Review comment: Actually, I wonder if InMemoryReadModifyWrite was a subclass of InMemoryValue one could get rid of some of the parallel code here (e.g. bindValue vs. bindReadModifyWrite). Haven't pursued this to its conclusion though. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301357) Time Spent: 3h 20m (was: 3h 10m) > Add ValueState in Python sdk > > > Key: BEAM-7739 > URL: https://issues.apache.org/jira/browse/BEAM-7739 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Rakesh Kumar >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Currently ValueState is missing from Python Sdks but it is existing in Java > sdks. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7739) Add ValueState in Python sdk
[ https://issues.apache.org/jira/browse/BEAM-7739?focusedWorklogId=301356=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301356 ] ASF GitHub Bot logged work on BEAM-7739: Author: ASF GitHub Bot Created on: 26/Aug/19 17:08 Start Date: 26/Aug/19 17:08 Worklog Time Spent: 10m Work Description: robertwb commented on pull request #9067: [BEAM-7739] Implement ReadModifyWriteState in Python SDK URL: https://github.com/apache/beam/pull/9067#discussion_r317699641 ## File path: runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/state/FlinkBroadcastStateInternals.java ## @@ -273,14 +280,73 @@ void clearInternal() { } private class FlinkBroadcastValueState extends AbstractBroadcastState - implements ValueState { + implements ValueState { Review comment: Similarly, perhaps ReadModifyWriteState could extend ValueState? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301356) Time Spent: 3h 10m (was: 3h) > Add ValueState in Python sdk > > > Key: BEAM-7739 > URL: https://issues.apache.org/jira/browse/BEAM-7739 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Rakesh Kumar >Assignee: Rakesh Kumar >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > Currently ValueState is missing from Python Sdks but it is existing in Java > sdks. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date
[ https://issues.apache.org/jira/browse/BEAM-8080?focusedWorklogId=301350=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301350 ] ASF GitHub Bot logged work on BEAM-8080: Author: ASF GitHub Bot Created on: 26/Aug/19 17:01 Start Date: 26/Aug/19 17:01 Worklog Time Spent: 10m Work Description: amaliujia commented on pull request #9414: [BEAM-8080] [SQL] Fix relocation of com.google.types URL: https://github.com/apache/beam/pull/9414 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301350) Time Spent: 1.5h (was: 1h 20m) > java.lang.NoClassDefFoundError: > org/apache/beam/repackaged/sql/com/google/type/Date > --- > > Key: BEAM-8080 > URL: https://issues.apache.org/jira/browse/BEAM-8080 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Critical > Fix For: 2.16.0 > > Time Spent: 1.5h > Remaining Estimate: 0h > > {code} > java.lang.NoClassDefFoundError: > org/apache/beam/repackaged/sql/com/google/type/Date > at > org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380) > at > org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-8080) java.lang.NoClassDefFoundError: org/apache/beam/repackaged/sql/com/google/type/Date
[ https://issues.apache.org/jira/browse/BEAM-8080?focusedWorklogId=301349=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301349 ] ASF GitHub Bot logged work on BEAM-8080: Author: ASF GitHub Bot Created on: 26/Aug/19 17:01 Start Date: 26/Aug/19 17:01 Worklog Time Spent: 10m Work Description: amaliujia commented on issue #9414: [BEAM-8080] [SQL] Fix relocation of com.google.types URL: https://github.com/apache/beam/pull/9414#issuecomment-524941037 After we have vendor calcite in place, Beam ZetaSQL should be extracted into its own module. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301349) Time Spent: 1h 20m (was: 1h 10m) > java.lang.NoClassDefFoundError: > org/apache/beam/repackaged/sql/com/google/type/Date > --- > > Key: BEAM-8080 > URL: https://issues.apache.org/jira/browse/BEAM-8080 > Project: Beam > Issue Type: Sub-task > Components: dsl-sql-zetasql >Reporter: Gleb Kanterov >Assignee: Gleb Kanterov >Priority: Critical > Fix For: 2.16.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > {code} > java.lang.NoClassDefFoundError: > org/apache/beam/repackaged/sql/com/google/type/Date > at > org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.processGetBuiltinFunctionsResponse(SimpleCatalog.java:380) > at > org.apache.beam.repackaged.sql.com.google.zetasql.SimpleCatalog.addZetaSQLFunctions(SimpleCatalog.java:365) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.addBuiltinFunctionsToCatalog(SqlAnalyzer.java:146) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.createPopulatedCatalog(SqlAnalyzer.java:130) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer.analyze(SqlAnalyzer.java:90) > at > org.apache.beam.sdk.extensions.sql.zetasql.SqlAnalyzer$Builder.analyze(SqlAnalyzer.java:275) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLPlannerImpl.rel(ZetaSQLPlannerImpl.java:135) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:92) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.parseQuery(ZetaSQLQueryPlanner.java:87) > at > org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner.convertToBeamRel(ZetaSQLQueryPlanner.java:66) > at > org.apache.beam.sdk.extensions.sql.impl.BeamSqlEnv.parseQuery(BeamSqlEnv.java:103) > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7909) Write integration tests to test customized containers
[ https://issues.apache.org/jira/browse/BEAM-7909?focusedWorklogId=301348=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301348 ] ASF GitHub Bot logged work on BEAM-7909: Author: ASF GitHub Bot Created on: 26/Aug/19 16:58 Start Date: 26/Aug/19 16:58 Worklog Time Spent: 10m Work Description: Hannah-Jiang commented on issue #9351: [BEAM-7909] support customized container for Python URL: https://github.com/apache/beam/pull/9351#issuecomment-524939709 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301348) Time Spent: 7h 50m (was: 7h 40m) > Write integration tests to test customized containers > - > > Key: BEAM-7909 > URL: https://issues.apache.org/jira/browse/BEAM-7909 > Project: Beam > Issue Type: Sub-task > Components: build-system >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > Time Spent: 7h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-5495) PipelineResources algorithm is not working in most environments
[ https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915944#comment-16915944 ] Pablo Estrada commented on BEAM-5495: - This has not been addressed I guess, right? > PipelineResources algorithm is not working in most environments > --- > > Key: BEAM-5495 > URL: https://issues.apache.org/jira/browse/BEAM-5495 > Project: Beam > Issue Type: Bug > Components: runner-flink, runner-spark, sdk-java-core >Reporter: Romain Manni-Bucau >Priority: Major > > Issue are: > 1. it assumes the classloader is an URLClassLoader (not always true and java > >= 9 breaks that as well for the app loader) > 2. it uses loader.getURLs() which leads to including the JRE itself in the > staged file > Looks like this detect resource algorithm can't work and should be replaced > by a SPI rather than a built-in and not extensible algorithm. Another valid > alternative is to just drop that "guess" logic and force the user to set > staged files. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7730) Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9
[ https://issues.apache.org/jira/browse/BEAM-7730?focusedWorklogId=301323=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301323 ] ASF GitHub Bot logged work on BEAM-7730: Author: ASF GitHub Bot Created on: 26/Aug/19 16:11 Start Date: 26/Aug/19 16:11 Worklog Time Spent: 10m Work Description: dmvk commented on issue #9296: WIP: [BEAM-7730] Introduce Flink 1.9 Runner URL: https://github.com/apache/beam/pull/9296#issuecomment-524922978 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301323) Time Spent: 0.5h (was: 20m) > Add Flink 1.9 build target and Make FlinkRunner compatible with Flink 1.9 > - > > Key: BEAM-7730 > URL: https://issues.apache.org/jira/browse/BEAM-7730 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: sunjincheng >Assignee: David Moravek >Priority: Major > Fix For: 2.16.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Apache Flink 1.9 will coming and it's better to add Flink 1.9 build target > and make Flink Runner compatible with Flink 1.9. > I will add the brief changes after the Flink 1.9.0 released. > And I appreciate it if you can leave your suggestions or comments! -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-6730) Enable configuration of Java transforms (specifically IO) through other SDKs
[ https://issues.apache.org/jira/browse/BEAM-6730?focusedWorklogId=301318=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301318 ] ASF GitHub Bot logged work on BEAM-6730: Author: ASF GitHub Bot Created on: 26/Aug/19 16:00 Start Date: 26/Aug/19 16:00 Worklog Time Spent: 10m Work Description: manuelaguilar commented on issue #7875: [BEAM-6730] Support configuring transforms externally in Java SDK / Expose Java's GenerateSequence in Python URL: https://github.com/apache/beam/pull/7875#issuecomment-524919097 @mxm I'm looking to add a windowed write transform from a Python pipeline. Would registering TextIO as an external transform and wrap the specific withWindowedWrites option in Python be recommendable? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301318) Time Spent: 12h (was: 11h 50m) > Enable configuration of Java transforms (specifically IO) through other SDKs > > > Key: BEAM-6730 > URL: https://issues.apache.org/jira/browse/BEAM-6730 > Project: Beam > Issue Type: New Feature > Components: runner-flink, sdk-java-core, sdk-py-core >Reporter: Maximilian Michels >Assignee: Maximilian Michels >Priority: Major > Fix For: 2.12.0 > > Time Spent: 12h > Remaining Estimate: 0h > > Since https://github.com/apache/beam/pull/7316 we can reference external > transforms which are transforms only available in a "foreign" SDKs. This > allows us to fill the gap in terms of missing transforms in the Python and Go > SDK, specifically IO transforms. > We can start collecting/exposing transforms that Beam users need. The > following transforms could be interesting: > - KafkaIO / KinesisIO > - CassandraIO / ElasticserchIO / Hbase / Redis > - JDBC > - S3 file system > - GenerateSequence > See also https://s.apache.org/beam-cross-language-io and BEAM-6485. > CC [~robertwb] [~chamikara] [~thw] -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7978) ArithmeticExceptions on getting backlog bytes
[ https://issues.apache.org/jira/browse/BEAM-7978?focusedWorklogId=301299=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301299 ] ASF GitHub Bot logged work on BEAM-7978: Author: ASF GitHub Bot Created on: 26/Aug/19 15:52 Start Date: 26/Aug/19 15:52 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on issue #9432: [BEAM-7978] Return BACKLOG_UNKNOWN in case of unknown watermark URL: https://github.com/apache/beam/pull/9432#issuecomment-524915707 @chamikaramj Could you take a look, pls? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301299) Time Spent: 20m (was: 10m) > ArithmeticExceptions on getting backlog bytes > -- > > Key: BEAM-7978 > URL: https://issues.apache.org/jira/browse/BEAM-7978 > Project: Beam > Issue Type: Bug > Components: io-java-kinesis >Affects Versions: 2.14.0 >Reporter: Mateusz >Assignee: Alexey Romanenko >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > Hello, > Beam 2.14.0 > (and to be more precise > [commit|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ec]) > introduced a change in watermark calculation in Kinesis IO causing below > error: > {code:java} > exception: "java.lang.RuntimeException: Unknown kinesis failure, when trying > to reach kinesis > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:227) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:167) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:155) > at > org.apache.beam.sdk.io.kinesis.KinesisReader.getTotalBacklogBytes(KinesisReader.java:158) > at > org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:433) > at > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1289) > at > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149) > at > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ArithmeticException: Value cannot fit in an int: > 153748963401 > at org.joda.time.field.FieldUtils.safeToInt(FieldUtils.java:229) > at > org.joda.time.field.BaseDurationField.getDifference(BaseDurationField.java:141) > at > org.joda.time.base.BaseSingleFieldPeriod.between(BaseSingleFieldPeriod.java:72) > at org.joda.time.Minutes.minutesBetween(Minutes.java:101) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getBacklogBytes$3(SimplifiedKinesisClient.java:169) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210) > ... 10 more > {code} > We spotted this issue on Dataflow runner. It's problematic as inability to > get backlog bytes seems to result in constant recreation of KinesisReader. > The issue happens if the backlog bytes are retrieved before watermark value > is updated from initial default value. Easy way to reproduce it is to create > a pipeline with Kinesis source for a stream where no records are being put. > While debugging it locally, you can observe that the watermark is set to the > value on the past (like: "-290308-12-21T19:59:05.225Z"). After two minutes > (default watermark idle duration threshold is set to 2 minutes) , the > watermark is set to value of > [watermarkIdleThreshold|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkPolicyFactory.java#L110]), > so the next backlog bytes retrieval should be correct. However, as described > before, running the pipeline on Dataflow runner results in KinesisReader > being closed just after creation, so the watermark won't be fixed. > The reason of the issue is following: The introduced watermark policies are >
[jira] [Commented] (BEAM-7978) ArithmeticExceptions on getting backlog bytes
[ https://issues.apache.org/jira/browse/BEAM-7978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915893#comment-16915893 ] Alexey Romanenko commented on BEAM-7978: [~Juraszek] Sure, I created a PR for that > ArithmeticExceptions on getting backlog bytes > -- > > Key: BEAM-7978 > URL: https://issues.apache.org/jira/browse/BEAM-7978 > Project: Beam > Issue Type: Bug > Components: io-java-kinesis >Affects Versions: 2.14.0 >Reporter: Mateusz >Assignee: Alexey Romanenko >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Hello, > Beam 2.14.0 > (and to be more precise > [commit|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ec]) > introduced a change in watermark calculation in Kinesis IO causing below > error: > {code:java} > exception: "java.lang.RuntimeException: Unknown kinesis failure, when trying > to reach kinesis > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:227) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:167) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.getBacklogBytes(SimplifiedKinesisClient.java:155) > at > org.apache.beam.sdk.io.kinesis.KinesisReader.getTotalBacklogBytes(KinesisReader.java:158) > at > org.apache.beam.runners.dataflow.worker.StreamingModeExecutionContext.flushState(StreamingModeExecutionContext.java:433) > at > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:1289) > at > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker.access$1000(StreamingDataflowWorker.java:149) > at > org.apache.beam.runners.dataflow.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:1024) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.ArithmeticException: Value cannot fit in an int: > 153748963401 > at org.joda.time.field.FieldUtils.safeToInt(FieldUtils.java:229) > at > org.joda.time.field.BaseDurationField.getDifference(BaseDurationField.java:141) > at > org.joda.time.base.BaseSingleFieldPeriod.between(BaseSingleFieldPeriod.java:72) > at org.joda.time.Minutes.minutesBetween(Minutes.java:101) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.lambda$getBacklogBytes$3(SimplifiedKinesisClient.java:169) > at > org.apache.beam.sdk.io.kinesis.SimplifiedKinesisClient.wrapExceptions(SimplifiedKinesisClient.java:210) > ... 10 more > {code} > We spotted this issue on Dataflow runner. It's problematic as inability to > get backlog bytes seems to result in constant recreation of KinesisReader. > The issue happens if the backlog bytes are retrieved before watermark value > is updated from initial default value. Easy way to reproduce it is to create > a pipeline with Kinesis source for a stream where no records are being put. > While debugging it locally, you can observe that the watermark is set to the > value on the past (like: "-290308-12-21T19:59:05.225Z"). After two minutes > (default watermark idle duration threshold is set to 2 minutes) , the > watermark is set to value of > [watermarkIdleThreshold|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkPolicyFactory.java#L110]), > so the next backlog bytes retrieval should be correct. However, as described > before, running the pipeline on Dataflow runner results in KinesisReader > being closed just after creation, so the watermark won't be fixed. > The reason of the issue is following: The introduced watermark policies are > relying on > [WatermarkParameters|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/WatermarkParameters.java] > which initialises currentWatermark and eventTime to > [BoundedWindow.TIMESTAMP_MIN_VALUE|https://github.com/apache/beam/commit/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad#diff-b4964a457006b1555c7042c739b405ecR52]. > This result in watermark being set to new Instant(-9223372036854775L) at the > KinesisReader creation. Calculated [period between the watermark and the > current > timestamp|https://github.com/apache/beam/blob/9fe0a0387ad1f894ef02acf32edbc9623a3b13ad/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/SimplifiedKinesisClient.java#L169] > is bigger than
[jira] [Work logged] (BEAM-7978) ArithmeticExceptions on getting backlog bytes
[ https://issues.apache.org/jira/browse/BEAM-7978?focusedWorklogId=301293=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301293 ] ASF GitHub Bot logged work on BEAM-7978: Author: ASF GitHub Bot Created on: 26/Aug/19 15:35 Start Date: 26/Aug/19 15:35 Worklog Time Spent: 10m Work Description: aromanenko-dev commented on pull request #9432: [BEAM-7978] Return BACKLOG_UNKNOWN in case of unknown watermark URL: https://github.com/apache/beam/pull/9432 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- |
[jira] [Work logged] (BEAM-7872) Simpler Flink cluster set up in load tests
[ https://issues.apache.org/jira/browse/BEAM-7872?focusedWorklogId=301289=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301289 ] ASF GitHub Bot logged work on BEAM-7872: Author: ASF GitHub Bot Created on: 26/Aug/19 15:22 Start Date: 26/Aug/19 15:22 Worklog Time Spent: 10m Work Description: lgajowy commented on pull request #9213: [BEAM-7872] Simpler Flink cluster set up in load tests URL: https://github.com/apache/beam/pull/9213#discussion_r317655439 ## File path: .test-infra/jenkins/Portability.groovy ## @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Provides properties for running job under the portability framework. + */ +class Portability { + static String beamRepository = 'gcr.io/apache-beam-testing/beam_portability' Review comment: Can we move this to `loadTestBuilder`? instead? Other than that, please use UPPERCASE and add final keyword (constant). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301289) Time Spent: 13h 50m (was: 13h 40m) > Simpler Flink cluster set up in load tests > -- > > Key: BEAM-7872 > URL: https://issues.apache.org/jira/browse/BEAM-7872 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 13h 50m > Remaining Estimate: 0h > > Creating a new load test running on Flink runner could be easier by providing > a single `setUp` function which would encapsulate the process of creating > Flink cluster and registering teardown steps -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Work logged] (BEAM-7872) Simpler Flink cluster set up in load tests
[ https://issues.apache.org/jira/browse/BEAM-7872?focusedWorklogId=301290=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-301290 ] ASF GitHub Bot logged work on BEAM-7872: Author: ASF GitHub Bot Created on: 26/Aug/19 15:22 Start Date: 26/Aug/19 15:22 Worklog Time Spent: 10m Work Description: lgajowy commented on pull request #9213: [BEAM-7872] Simpler Flink cluster set up in load tests URL: https://github.com/apache/beam/pull/9213#discussion_r317653246 ## File path: .test-infra/jenkins/Portability.groovy ## @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/** + * Provides properties for running job under the portability framework. + */ +class Portability { + static String beamRepository = 'gcr.io/apache-beam-testing/beam_portability' + static String flinkVersion = '1.7' Review comment: This constant is used only in :docker task name - I think it should be inlined in the task name rather than exported to a class with cosntants only This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org Issue Time Tracking --- Worklog Id: (was: 301290) Time Spent: 13h 50m (was: 13h 40m) > Simpler Flink cluster set up in load tests > -- > > Key: BEAM-7872 > URL: https://issues.apache.org/jira/browse/BEAM-7872 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Kamil Wasilewski >Assignee: Kamil Wasilewski >Priority: Major > Time Spent: 13h 50m > Remaining Estimate: 0h > > Creating a new load test running on Flink runner could be easier by providing > a single `setUp` function which would encapsulate the process of creating > Flink cluster and registering teardown steps -- This message was sent by Atlassian Jira (v8.3.2#803003)