[GitHub] [beam-wheels] aaltay merged pull request #18: Deprecate repository, not used in release process anymore
aaltay merged pull request #18: URL: https://github.com/apache/beam-wheels/pull/18 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam-wheels] aaltay commented on pull request #18: Deprecate repository, not used in release process anymore
aaltay commented on pull request #18: URL: https://github.com/apache/beam-wheels/pull/18#issuecomment-665798992 Nice. LGTM. @tvalentyn - Could you verify that 2.23 release did not use this repo? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam-wheels] TobKed commented on pull request #18: Deprecate repository (depends on https://github.com/apache/beam/pull/12150)
TobKed commented on pull request #18: URL: https://github.com/apache/beam-wheels/pull/18#issuecomment-665693872 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam-wheels] TobKed opened a new pull request #18: Deprecate repository (depends on https://github.com/apache/beam/pull/12150)
TobKed opened a new pull request #18: URL: https://github.com/apache/beam-wheels/pull/18 When GitHub Actions used to build Python Source Distribution and Wheels will be used in release process this repository may be deprecated. https://github.com/apache/beam/pull/12150 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam-wheels] brucearctor opened a new pull request #17: Update README.md
brucearctor opened a new pull request #17: URL: https://github.com/apache/beam-wheels/pull/17 @aaltay @pabloem ? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam-wheels] ibzib commented on a change in pull request #17: Update README.md
ibzib commented on a change in pull request #17: URL: https://github.com/apache/beam-wheels/pull/17#discussion_r430645550 ## File path: README.md ## @@ -32,9 +32,11 @@ There are 2 major parts in this repository. * [travis](https://travis-ci.com/) configuration files, setups environment variables and deployment strategy. * `.travis.yml` contains a set of [environment variables](https://docs.travis-ci.com/user/environment-variables/) and steps of build process. - * `deploy_travis.sh` defines that final python wheels will be deployed to [dist.apache.org/dev](https://dist.apache.org/repos/dist/dev/beam/) -using svn with your apache credential. * `config.sh` defines custom build steps. + + +Additionally, for the release process to be complete: +* Also use [sign_hash_python_wheel.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/sign_hash_python_wheels.sh) from the main beam repo in order to deploy the final python wheels to [dist.apache.org/dev](https://dist.apache.org/repos/dist/dev/beam/) using svn with your apache credential. Review comment: ```suggestion * Also use [sign_hash_python_wheels.sh](https://github.com/apache/beam/blob/master/release/src/main/scripts/sign_hash_python_wheels.sh) from the main beam repo in order to deploy the final python wheels to [dist.apache.org/dev](https://dist.apache.org/repos/dist/dev/beam/) using svn with your apache credential. ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam-wheels] ibzib merged pull request #17: Update README.md
ibzib merged pull request #17: URL: https://github.com/apache/beam-wheels/pull/17 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [beam-wheels] aaltay commented on issue #15: update travis config based on https://config.travis-ci.com/explore
aaltay commented on issue #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15#issuecomment-608175122 > This change has fixed the "does not trigger build" problem. Thank you! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia merged pull request #16: Fix travis config
amaliujia merged pull request #16: Fix travis config URL: https://github.com/apache/beam-wheels/pull/16 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia opened a new pull request #16: Fix travis config
amaliujia opened a new pull request #16: Fix travis config URL: https://github.com/apache/beam-wheels/pull/16 R: @boyuanzz This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia commented on issue #15: update travis config based on https://config.travis-ci.com/explore
amaliujia commented on issue #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15#issuecomment-608120384 This change has fixed the "does not trigger build" problem. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia merged pull request #15: update travis config based on https://config.travis-ci.com/explore
amaliujia merged pull request #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore
amaliujia commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15#discussion_r402624271 ## File path: .travis.yml ## @@ -96,9 +95,9 @@ deploy: access_key_id: ${ACCESS_KEY_ID} secret_access_key: ${SECRET_ACCESS_KEY} bucket: "beam-wheels-staging" - skip_cleanup: true Review comment: `skip_cleanup` is deprecated and it's replacement is `cleanup` I should have provided a list of warning messages from https://config.travis-ci.com/explore This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] aaltay commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore
aaltay commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15#discussion_r402622997 ## File path: .travis.yml ## @@ -96,9 +95,9 @@ deploy: access_key_id: ${ACCESS_KEY_ID} secret_access_key: ${SECRET_ACCESS_KEY} bucket: "beam-wheels-staging" - skip_cleanup: true Review comment: why was this true? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] aaltay commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore
aaltay commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15#discussion_r402623417 ## File path: .travis.yml ## @@ -16,7 +15,7 @@ env: - BUILD_DEPENDS="Cython" - TEST_DEPENDS= -matrix: +jobs: exclude: # Exclude the default Python 3.5 build - python: 3.5 Review comment: I am not sure what this does. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore
amaliujia commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15#discussion_r402622210 ## File path: .travis.yml ## @@ -16,7 +15,7 @@ env: - BUILD_DEPENDS="Cython" - TEST_DEPENDS= -matrix: +jobs: exclude: # Exclude the default Python 3.5 build - python: 3.5 Review comment: Actually this link might gives the real reason: https://docs.travis-ci.com/user/build-matrix/#explicitly-included-jobs-with-only-one-element-in-the-build-matrix This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia edited a comment on issue #15: update travis config based on https://config.travis-ci.com/explore
amaliujia edited a comment on issue #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15#issuecomment-608111440 @aaltay @boyuanzz This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore
amaliujia commented on a change in pull request #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15#discussion_r402620104 ## File path: .travis.yml ## @@ -16,7 +15,7 @@ env: - BUILD_DEPENDS="Cython" - TEST_DEPENDS= -matrix: +jobs: exclude: # Exclude the default Python 3.5 build - python: 3.5 Review comment: The python version is 3.5 above but 3.5 is specified as "exclude" here. Is it the reason that making the build skipped? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia commented on issue #15: update travis config based on https://config.travis-ci.com/explore
amaliujia commented on issue #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15#issuecomment-608111440 @aaltay This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam-wheels] amaliujia opened a new pull request #15: update travis config based on https://config.travis-ci.com/explore
amaliujia opened a new pull request #15: update travis config based on https://config.travis-ci.com/explore URL: https://github.com/apache/beam-wheels/pull/15 Travis-ci now has an online tool to validate travis config: https://config.travis-ci.com/explore. I updated travis config based on the suggestions from the tool. E.g. fix warnings. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] kamilwu commented on a change in pull request #11194: [BEAM-9507] Fix python dependency check task
kamilwu commented on a change in pull request #11194: [BEAM-9507] Fix python dependency check task URL: https://github.com/apache/beam/pull/11194#discussion_r397019242 ## File path: sdks/python/build.gradle ## @@ -89,6 +89,7 @@ task depSnapshot { task dependencyUpdates { dependsOn ':dependencyUpdates' + dependsOn buildPython Review comment: I think this tasks does not install all dependencies we need. We'd like to install all extras (there are currently five of them: docs, test, gcp, interactive, aws. You can see them in `setup.py` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] kamilwu commented on a change in pull request #11194: [BEAM-9507] Fix python dependency check task
kamilwu commented on a change in pull request #11194: [BEAM-9507] Fix python dependency check task URL: https://github.com/apache/beam/pull/11194#discussion_r397019242 ## File path: sdks/python/build.gradle ## @@ -89,6 +89,7 @@ task depSnapshot { task dependencyUpdates { dependsOn ':dependencyUpdates' + dependsOn buildPython Review comment: I think this tasks does not install all dependencies we need. We'd like to install all extras (there are currently five of them: docs, test, gcp, interactive, aws. You can see them in `setup.py`) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] ihji opened a new pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java
ihji opened a new pull request #11205: [BEAM-9578] Enumerating artifacts is too expensive in Java URL: https://github.com/apache/beam/pull/11205 Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
[GitHub] [beam] kamilwu commented on issue #11194: [BEAM-9507] Fix python dependency check task
kamilwu commented on issue #11194: [BEAM-9507] Fix python dependency check task URL: https://github.com/apache/beam/pull/11194#issuecomment-603115281 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] alexvanboxel commented on a change in pull request #10529: [BEAM-9044] Protobuf options to Schema options
alexvanboxel commented on a change in pull request #10529: [BEAM-9044] Protobuf options to Schema options URL: https://github.com/apache/beam/pull/10529#discussion_r396977813 ## File path: sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoSchemaTranslator.java ## @@ -205,10 +205,12 @@ static Schema getSchema(Descriptors.Descriptor descriptor) { // Store proto field number in metadata. FieldType fieldType = withMetaData(beamFieldTypeFromProtoField(fieldDescriptor), fieldDescriptor); Review comment: Is it, I don't think so... because I told Reuven I would do metadata after this. Still I think it's a good idea... and with your comment I want to revise the URI: - `beam:option:proto:field:vptech.data.v1.description` for a proto field option - `beam:option:proto:message:vptech.data.v1.description` for a proto message option - `beam:option:proto:meta:number` for a proto field number - `beam:option:proto:meta:type_name` for a proto type name WDYT? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] stale[bot] commented on issue #10038: Simplify Python test process
stale[bot] commented on issue #10038: Simplify Python test process URL: https://github.com/apache/beam/pull/10038#issuecomment-603041838 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] stale[bot] closed pull request #10038: Simplify Python test process
stale[bot] closed pull request #10038: Simplify Python test process URL: https://github.com/apache/beam/pull/10038 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem commented on issue #11202: asdletmedah
pabloem commented on issue #11202: asdletmedah URL: https://github.com/apache/beam/pull/11202#issuecomment-603028147 Run Portable_Python PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK
youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#discussion_r396853348 ## File path: sdks/go/pkg/beam/core/runtime/exec/reshuffle.go ## @@ -0,0 +1,170 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "bytes" + "context" + "fmt" + "io" + "math/rand" + + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" +) + +// ReshuffleInput is a Node. +type ReshuffleInput struct { + UID UnitID + SID StreamID + Coder *coder.Coder // Coder for the input PCollection. + Seed int64 + Out Node + + r*rand.Rand + enc ElementEncoder + wEnc WindowEncoder + bbytes.Buffer + // ret is a cached allocations for passing to the next Unit. Units never modify the passed in FullValue. + ret FullValue +} + +// ID returns the unit debug id. +func (n *ReshuffleInput) ID() UnitID { + return n.UID +} + +// Up initializes the value and window encoders, and the random source. +func (n *ReshuffleInput) Up(ctx context.Context) error { + n.enc = MakeElementEncoder(coder.SkipW(n.Coder)) + n.wEnc = MakeWindowEncoder(n.Coder.Window) + n.r = rand.New(rand.NewSource(n.Seed)) + return nil +} + +// StartBundle is a no-op. +func (n *ReshuffleInput) StartBundle(ctx context.Context, id string, data DataContext) error { + return MultiStartBundle(ctx, id, data, n.Out) +} + +func (n *ReshuffleInput) ProcessElement(ctx context.Context, value *FullValue, values ...ReStream) error { + n.b.Reset() + if err := EncodeWindowedValueHeader(n.wEnc, value.Windows, value.Timestamp, ); err != nil { + return err + } + if err := n.enc.Encode(value, ); err != nil { + return errors.WithContextf(err, "encoding element %v with coder %v", value, n.Coder) + } + n.ret = FullValue{Elm: n.r.Int(), Elm2: n.b.Bytes(), Timestamp: value.Timestamp} + if err := n.Out.ProcessElement(ctx, ); err != nil { + return err + } + return nil Review comment: ```suggestion return n.Out.ProcessElement(ctx, ) ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK
youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#discussion_r396887810 ## File path: sdks/go/pkg/beam/gbk.go ## @@ -95,3 +95,52 @@ func TryCoGroupByKey(s Scope, cols ...PCollection) (PCollection, error) { ret.SetCoder(NewCoder(ret.Type())) return ret, nil } + +// Reshuffle copies a PCollection of the same kind and using the same element +// coder, and maintains the same windowing information. Importantly, it allows +// the result PCollection to be processed with a different sharding, in a +// different stage than the input PCollection. +// +// For example, if a computation needs a lot of parallelism but +// produces only a small amount of output data, then the computation +// producing the data can run with as much parallelism as needed, +// while the output file is written with a smaller amount of +// parallelism, using the following pattern: +// +// pc := bigHairyComputationNeedingParallelism(scope) // PCollection +// resharded := beam.Reshard(scope, pc)// PCollection Review comment: Here and elsewhere in this comment, Reshuffle is referred to as "Reshard". I think it's fine to refer to it as a reshard informally, since that's what it functionally is, but the places where it's used as a proper noun should be switched to Reshuffle. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK
youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#discussion_r396823199 ## File path: sdks/go/pkg/beam/core/runtime/exec/coder.go ## @@ -259,59 +298,82 @@ type customDecoder struct { dec Decoder } -func (c *customDecoder) Decode(r io.Reader) (*FullValue, error) { +func (c *customDecoder) DecodeTo(r io.Reader, fv *FullValue) error { // (1) Read length-prefixed encoded data size, err := coder.DecodeVarInt(r) if err != nil { - return nil, err + return err } data, err := ioutilx.ReadN(r, (int)(size)) if err != nil { - return nil, err + return err } // (2) Call decode val, err := c.dec.Decode(c.t, data) if err != nil { + return err + } + *fv = FullValue{Elm: val} + return err Review comment: I know this is just preserving the existing behavior, but it seems weird to `return err` here instead of `return nil`, even if it is guaranteed to be `nil` at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK
youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#discussion_r396886566 ## File path: sdks/go/pkg/beam/core/runtime/graphx/translate.go ## @@ -473,6 +479,164 @@ func (m *marshaller) addNode(n *graph.Node) string { return m.makeNode(id, m.coders.Add(n.Coder), n) } +// expandReshuffle translates resharding to a composite reshuffle +// transform. +// +// With proper runner support, the SDK doesn't need to do anything. +// However, we still need to provide a backup plan in terms of other +// PTransforms in the event the runner doesn't have a native implementation. +// +// In particular, the "backup plan" needs to: +// +// * Encode the windowed element, preserving timestamps. +// * Add random keys to the encoded windowed element []bytes +// * GroupByKey (in the global window). +// * Explode the resulting elements list. +// * Decode the windowed element []bytes. +// +// While a simple reshard can be written in user terms, (timestamps and windows +// are accessible to user functions) there are some framework internal +// optimizations that can be done if the framework is aware of the reshard, though +// ideally this is handled on the runner side. +// +// User code is able to write reshards, but it's easier to access +// the window coders framework side, which is critical for the reshard +// to function with unbounded inputs. +func (m *marshaller) expandReshuffle(edge NamedEdge) string { + id := edgeID(edge.Edge) + var kvCoderID, gbkCoderID string + { + kv := makeUnionCoder() + kvCoderID = m.coders.Add(kv) + gbkCoderID = m.coders.Add(coder.NewCoGBK(kv.Components)) + } + + var subtransforms []string + + in := edge.Edge.Input[0] + + origInput := m.addNode(in.From) + // We need to preserve the old windowing/triggering here + // for re-instatement after the GBK. + preservedWSId := m.pcollections[origInput].GetWindowingStrategyId() + + // Get the windowing strategy from before: + postReify := fmt.Sprintf("%v_%v_reifyts", nodeID(in.From), id) + m.makeNode(postReify, kvCoderID, in.From) + + // We need to replace postReify's windowing strategy with one appropriate + // for reshuffles. + { + wfn := window.NewGlobalWindows() + m.pcollections[postReify].WindowingStrategyId = + m.internWindowingStrategy({ + // Not segregated by time... + WindowFn: makeWindowFn(wfn), + // ...output after every element is received... + Trigger: { + // Should this be an Always trigger instead? + Trigger: _ElementCount_{ + ElementCount: _ElementCount{ + ElementCount: 1, + }, + }, + }, + // ...and after outputing, discard the output elements... + AccumulationMode: pb.AccumulationMode_DISCARDING, + // ...and since every pane should have 1 element, + // try to preserve the timestamp. + OutputTime: pb.OutputTime_EARLIEST_IN_PANE, + // Defaults copied from marshalWindowingStrategy. + // TODO(BEAM-3304): migrate to user side operations once trigger support is in. + EnvironmentId: m.addDefaultEnv(), + MergeStatus: pb.MergeStatus_NON_MERGING, + WindowCoderId: m.coders.AddWindowCoder(makeWindowCoder(wfn)), + ClosingBehavior: pb.ClosingBehavior_EMIT_IF_NONEMPTY, + AllowedLateness: 0, + OnTimeBehavior: pb.OnTimeBehavior_FIRE_ALWAYS, + }) + } + + // Inputs (i) + + inputID := fmt.Sprintf("%v_reifyts", id) + payload := { + DoFn: { + Urn: URNReshuffleInput, + Payload: []byte(protox.MustEncodeBase64({ + Urn: URNReshuffleInput, + })), + }, + } + input := { + UniqueName: inputID, + Spec: { + Urn: URNParDo, + Payload: protox.MustEncode(payload), + }, + Inputs:map[string]string{"i0": nodeID(in.From)}, + Outputs:
[GitHub] [beam] youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK
youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#discussion_r396887966 ## File path: sdks/go/pkg/beam/gbk.go ## @@ -95,3 +95,52 @@ func TryCoGroupByKey(s Scope, cols ...PCollection) (PCollection, error) { ret.SetCoder(NewCoder(ret.Type())) return ret, nil } + +// Reshuffle copies a PCollection of the same kind and using the same element +// coder, and maintains the same windowing information. Importantly, it allows +// the result PCollection to be processed with a different sharding, in a +// different stage than the input PCollection. +// +// For example, if a computation needs a lot of parallelism but +// produces only a small amount of output data, then the computation +// producing the data can run with as much parallelism as needed, +// while the output file is written with a smaller amount of +// parallelism, using the following pattern: +// +// pc := bigHairyComputationNeedingParallelism(scope) // PCollection +// resharded := beam.Reshard(scope, pc)// PCollection +// +// Another use case is when one has a non-deterministic DoFn followed by one +// that performs externally-visible side effects. Inserting a Reshard +// between these DoFns ensures that retries of the second DoFn will always be +// the same, which is necessary to make side effects idempotent. +// +// A Reshuffle will force a break in the optimized pipeline. Consequently, +// this operation should be used sparingly, only after determining that the +// pipeline without reshard is broken in some way and performing an extra +// operation is worth the cost. +func Reshuffle(s Scope, col PCollection) PCollection { + return Must(TryReshuffle(s, col)) +} + +// TryReshuffle inserts a Reshard into the pipeline, and returns an error if Review comment: Same as previous comment, using Reshard instead of Reshuffle. The error message a few lines below also does that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK
youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#discussion_r396855047 ## File path: sdks/go/pkg/beam/core/runtime/exec/reshuffle.go ## @@ -0,0 +1,170 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "bytes" + "context" + "fmt" + "io" + "math/rand" + + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" +) + +// ReshuffleInput is a Node. +type ReshuffleInput struct { + UID UnitID + SID StreamID + Coder *coder.Coder // Coder for the input PCollection. + Seed int64 + Out Node + + r*rand.Rand + enc ElementEncoder + wEnc WindowEncoder + bbytes.Buffer + // ret is a cached allocations for passing to the next Unit. Units never modify the passed in FullValue. + ret FullValue +} + +// ID returns the unit debug id. +func (n *ReshuffleInput) ID() UnitID { + return n.UID +} + +// Up initializes the value and window encoders, and the random source. +func (n *ReshuffleInput) Up(ctx context.Context) error { + n.enc = MakeElementEncoder(coder.SkipW(n.Coder)) + n.wEnc = MakeWindowEncoder(n.Coder.Window) + n.r = rand.New(rand.NewSource(n.Seed)) + return nil +} + +// StartBundle is a no-op. +func (n *ReshuffleInput) StartBundle(ctx context.Context, id string, data DataContext) error { + return MultiStartBundle(ctx, id, data, n.Out) +} + +func (n *ReshuffleInput) ProcessElement(ctx context.Context, value *FullValue, values ...ReStream) error { + n.b.Reset() + if err := EncodeWindowedValueHeader(n.wEnc, value.Windows, value.Timestamp, ); err != nil { + return err + } + if err := n.enc.Encode(value, ); err != nil { + return errors.WithContextf(err, "encoding element %v with coder %v", value, n.Coder) + } + n.ret = FullValue{Elm: n.r.Int(), Elm2: n.b.Bytes(), Timestamp: value.Timestamp} + if err := n.Out.ProcessElement(ctx, ); err != nil { + return err + } + return nil +} + +// FinishBundle propagates finish bundle, and clears cached state. +func (n *ReshuffleInput) FinishBundle(ctx context.Context) error { + n.b = bytes.Buffer{} + n.ret = FullValue{} + return MultiFinishBundle(ctx, n.Out) +} + +// Down is a no-op. +func (n *ReshuffleInput) Down(ctx context.Context) error { + return nil +} + +func (n *ReshuffleInput) String() string { + return fmt.Sprintf("ReshuffleInput[%v] Coder:%v", n.SID, n.Coder) +} + +// ReshuffleOutput is a Node. +type ReshuffleOutput struct { + UID UnitID + SID StreamID + Coder *coder.Coder // Coder for the receiving PCollection. + Out Node + + bbytes.Buffer + dec ElementDecoder + wDec WindowDecoder + ret FullValue +} + +// ID returns the unit debug id. +func (n *ReshuffleOutput) ID() UnitID { + return n.UID +} + +// Up initializes the value and window encoders, and the random source. +func (n *ReshuffleOutput) Up(ctx context.Context) error { + n.dec = MakeElementDecoder(coder.SkipW(n.Coder)) + n.wDec = MakeWindowDecoder(n.Coder.Window) + return nil +} + +// StartBundle is a no-op. +func (n *ReshuffleOutput) StartBundle(ctx context.Context, id string, data DataContext) error { + return MultiStartBundle(ctx, id, data, n.Out) +} + +func (n *ReshuffleOutput) ProcessElement(ctx context.Context, value *FullValue, values ...ReStream) error { + // Marshal the pieces into a temporary buffer since they must be transmitted on FnAPI as a single + // unit. + vs, err := values[0].Open() + if err != nil { + return errors.WithContextf(err, "decoding values for %v with coder %v", value, n.Coder) + } + defer vs.Close() + for { + v, err := vs.Read() + if err != nil { + if err == io.EOF { + return nil + } +
[GitHub] [beam] youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK
youngoli commented on a change in pull request #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#discussion_r396856077 ## File path: sdks/go/pkg/beam/core/runtime/exec/reshuffle.go ## @@ -0,0 +1,170 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +package exec + +import ( + "bytes" + "context" + "fmt" + "io" + "math/rand" + + "github.com/apache/beam/sdks/go/pkg/beam/core/graph/coder" + "github.com/apache/beam/sdks/go/pkg/beam/internal/errors" +) + +// ReshuffleInput is a Node. +type ReshuffleInput struct { + UID UnitID + SID StreamID + Coder *coder.Coder // Coder for the input PCollection. + Seed int64 + Out Node + + r*rand.Rand + enc ElementEncoder + wEnc WindowEncoder + bbytes.Buffer + // ret is a cached allocations for passing to the next Unit. Units never modify the passed in FullValue. + ret FullValue +} + +// ID returns the unit debug id. +func (n *ReshuffleInput) ID() UnitID { + return n.UID +} + +// Up initializes the value and window encoders, and the random source. +func (n *ReshuffleInput) Up(ctx context.Context) error { + n.enc = MakeElementEncoder(coder.SkipW(n.Coder)) + n.wEnc = MakeWindowEncoder(n.Coder.Window) + n.r = rand.New(rand.NewSource(n.Seed)) + return nil +} + +// StartBundle is a no-op. +func (n *ReshuffleInput) StartBundle(ctx context.Context, id string, data DataContext) error { + return MultiStartBundle(ctx, id, data, n.Out) +} + +func (n *ReshuffleInput) ProcessElement(ctx context.Context, value *FullValue, values ...ReStream) error { + n.b.Reset() + if err := EncodeWindowedValueHeader(n.wEnc, value.Windows, value.Timestamp, ); err != nil { + return err + } + if err := n.enc.Encode(value, ); err != nil { + return errors.WithContextf(err, "encoding element %v with coder %v", value, n.Coder) + } + n.ret = FullValue{Elm: n.r.Int(), Elm2: n.b.Bytes(), Timestamp: value.Timestamp} + if err := n.Out.ProcessElement(ctx, ); err != nil { + return err + } + return nil +} + +// FinishBundle propagates finish bundle, and clears cached state. +func (n *ReshuffleInput) FinishBundle(ctx context.Context) error { + n.b = bytes.Buffer{} + n.ret = FullValue{} + return MultiFinishBundle(ctx, n.Out) +} + +// Down is a no-op. +func (n *ReshuffleInput) Down(ctx context.Context) error { + return nil +} + +func (n *ReshuffleInput) String() string { + return fmt.Sprintf("ReshuffleInput[%v] Coder:%v", n.SID, n.Coder) +} + +// ReshuffleOutput is a Node. +type ReshuffleOutput struct { + UID UnitID + SID StreamID + Coder *coder.Coder // Coder for the receiving PCollection. + Out Node + + bbytes.Buffer + dec ElementDecoder + wDec WindowDecoder + ret FullValue +} + +// ID returns the unit debug id. +func (n *ReshuffleOutput) ID() UnitID { + return n.UID +} + +// Up initializes the value and window encoders, and the random source. +func (n *ReshuffleOutput) Up(ctx context.Context) error { + n.dec = MakeElementDecoder(coder.SkipW(n.Coder)) + n.wDec = MakeWindowDecoder(n.Coder.Window) + return nil +} + +// StartBundle is a no-op. +func (n *ReshuffleOutput) StartBundle(ctx context.Context, id string, data DataContext) error { + return MultiStartBundle(ctx, id, data, n.Out) +} + +func (n *ReshuffleOutput) ProcessElement(ctx context.Context, value *FullValue, values ...ReStream) error { + // Marshal the pieces into a temporary buffer since they must be transmitted on FnAPI as a single + // unit. + vs, err := values[0].Open() Review comment: It's strange to me that `values[]` can have multiple elements, but this method ends up actually reading all the values from the first element of it. Could you explain why that happens? Are there sometimes multiple ReStreams representing different things? This is an automated message from
[GitHub] [beam] chrisgorgo commented on issue #11204: [BEAM-9579] Fix numpy logic operators
chrisgorgo commented on issue #11204: [BEAM-9579] Fix numpy logic operators URL: https://github.com/apache/beam/pull/11204#issuecomment-603002452 R: @aaltay @charlesccychen This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] chrisgorgo opened a new pull request #11204: [BEAM-9579] Fix numpy logic operators
chrisgorgo opened a new pull request #11204: [BEAM-9579] Fix numpy logic operators URL: https://github.com/apache/beam/pull/11204 Replacing deprecated '-' logical operators. Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
[GitHub] [beam] robertwb opened a new pull request #11203: [BEAM-9577] Define and implement dependency-aware artifact staging service.
robertwb opened a new pull request #11203: [BEAM-9577] Define and implement dependency-aware artifact staging service. URL: https://github.com/apache/beam/pull/11203 This is not yet used anywhere (and even the legacy service is not yet used in Dataflow) but it is good to define what we want this service to look like. R: @lukecwik CC: @ihji Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[GitHub] [beam] reuvenlax commented on issue #11074: Store logical type values in Row instead of base values
reuvenlax commented on issue #11074: Store logical type values in Row instead of base values URL: https://github.com/apache/beam/pull/11074#issuecomment-602935452 After 8 runs, the only Java Precommit failures have been random flakes (e.g. in Flink tests). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] reuvenlax merged pull request #11074: Store logical type values in Row instead of base values
reuvenlax merged pull request #11074: Store logical type values in Row instead of base values URL: https://github.com/apache/beam/pull/11074 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform example script
TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform example script URL: https://github.com/apache/beam/pull/10055#issuecomment-602931763 Run XVR_Flink PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform example script
TheNeuralBit commented on issue #10055: [BEAM-8603] Add Python SqlTransform example script URL: https://github.com/apache/beam/pull/10055#issuecomment-602931816 Run XVR_Spark PostCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] stale[bot] closed pull request #8801: Make temp_location an attribute of the StandardOptions class
stale[bot] closed pull request #8801: Make temp_location an attribute of the StandardOptions class URL: https://github.com/apache/beam/pull/8801 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] stale[bot] commented on issue #8884: Add a PCollectionCache ABC and several file-based implementations
stale[bot] commented on issue #8884: Add a PCollectionCache ABC and several file-based implementations URL: https://github.com/apache/beam/pull/8884#issuecomment-602929552 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] stale[bot] closed pull request #8884: Add a PCollectionCache ABC and several file-based implementations
stale[bot] closed pull request #8884: Add a PCollectionCache ABC and several file-based implementations URL: https://github.com/apache/beam/pull/8884 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] stale[bot] commented on issue #8801: Make temp_location an attribute of the StandardOptions class
stale[bot] commented on issue #8801: Make temp_location an attribute of the StandardOptions class URL: https://github.com/apache/beam/pull/8801#issuecomment-602929558 This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem opened a new pull request #11202: asdletmedah
pabloem opened a new pull request #11202: asdletmedah URL: https://github.com/apache/beam/pull/11202 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
[GitHub] [beam] pabloem commented on a change in pull request #11163: [BEAM-9548] Add better error handling to the TestStreamServiceController
pabloem commented on a change in pull request #11163: [BEAM-9548] Add better error handling to the TestStreamServiceController URL: https://github.com/apache/beam/pull/11163#discussion_r396767924 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -202,14 +207,24 @@ def _emit_from_file(self, fh, tail): # The first line at pos = 0 is always the header. Read the line without # the new line. to_decode = line[:-1] -if pos == 0: - header = TestStreamFileHeader() - header.ParseFromString(self._coder.decode(to_decode)) - yield header +proto_cls = TestStreamFileHeader if pos == 0 else TestStreamFileRecord +msg = self._try_parse_as(proto_cls, to_decode) +if msg: + yield msg else: - record = TestStreamFileRecord() - record.ParseFromString(self._coder.decode(to_decode)) - yield record + break + + def _try_parse_as(self, proto_cls, to_decode): +try: + msg = proto_cls() + msg.ParseFromString(self._coder.decode(to_decode)) +except DecodeError: + _LOGGER.error( + 'Could not parse as %s. This can indicate that the cache is ' + 'corruputed. Please restart the kernel. ' + '\nfile: %s \nmessage: %s', proto_cls, self._path, to_decode) + msg = None Review comment: Do we just skip? This may mean that the file is corrupted? Should we stop consuming (i.e. rethrow the exception)? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem commented on a change in pull request #11163: [BEAM-9548] Add better error handling to the TestStreamServiceController
pabloem commented on a change in pull request #11163: [BEAM-9548] Add better error handling to the TestStreamServiceController URL: https://github.com/apache/beam/pull/11163#discussion_r396810340 ## File path: sdks/python/apache_beam/runners/interactive/interactive_runner.py ## @@ -170,8 +170,13 @@ def run_pipeline(self, pipeline, options): user_pipeline)): streaming_cache_manager = ie.current_env().cache_manager() if streaming_cache_manager: + + def exception_handler(e): +_LOGGER.error(str(e)) Review comment: Same as above. Do we just log and not stop processing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem commented on a change in pull request #11163: [BEAM-9548] Add better error handling to the TestStreamServiceController
pabloem commented on a change in pull request #11163: [BEAM-9548] Add better error handling to the TestStreamServiceController URL: https://github.com/apache/beam/pull/11163#discussion_r396766986 ## File path: sdks/python/apache_beam/runners/interactive/caching/streaming_cache.py ## @@ -166,13 +169,15 @@ def _wait_until_file_exists(self, timeout_secs=30): # Wait for up to `timeout_secs` for the file to be available. start = time.time() -path = os.path.join(self._cache_dir, *self._labels) -while not os.path.exists(path): +while not os.path.exists(self._path): time.sleep(1) if time.time() - start > timeout_timestamp_secs: +from apache_beam.runners.interactive.pipeline_instrument import CacheKey +pcollection_var = CacheKey.from_str(self._labels[-1]).var Review comment: I hadn't stopeed to think that labels are a file name too, huh? I guess the final file name is the PCollection variable name? If so, users may name their PCollections something that is not supported by the OS? (or maybe not since they have to be Python variable names?) Anyway this is not for this PR. But just to think about. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] lostluck commented on a change in pull request #11188: [BEAM-3301] Adding restriction trackers and validation.
lostluck commented on a change in pull request #11188: [BEAM-3301] Adding restriction trackers and validation. URL: https://github.com/apache/beam/pull/11188#discussion_r396803249 ## File path: sdks/go/pkg/beam/core/graph/fn_test.go ## @@ -562,7 +595,13 @@ func (fn *GoodSdf) RestrictionSize(int, RestT) float64 { return 0 } -// TODO(BEAM-3301): Add ProcessElement impl. when restriction trackers are in. +func (fn *GoodSdf) CreateTracker(RestT) *RTrackerT { + return {} +} + +func (fn *GoodSdf) ProcessElement(*RTrackerT, int) int { Review comment: What do you think of having ProcessElement actually just have an sdf.RTracker value? Having it as the interface simplifies our wrapping approach for dynamic splitting, and means the framework can do it all the time, for safety etc. CreateTracker would still need the actual implementation type, and check that it implements sdf.RTracker of course. We can always extend things to allow a user to "unwrap" the interface if they need direct access to their RTracker implementation for whatever reason. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] lostluck commented on a change in pull request #11188: [BEAM-3301] Adding restriction trackers and validation.
lostluck commented on a change in pull request #11188: [BEAM-3301] Adding restriction trackers and validation. URL: https://github.com/apache/beam/pull/11188#discussion_r396795769 ## File path: sdks/go/pkg/beam/core/graph/fn_test.go ## @@ -676,39 +737,77 @@ func (fn *BadSdfElementTRestSize) RestrictionSize(float32, RestT) float64 { type BadRestT struct{} type BadSdfRestTSplitRestParam struct { - *GoodDoFn + *GoodSdf } func (fn *BadSdfRestTSplitRestParam) SplitRestriction(int, BadRestT) []RestT { return []RestT{} } type BadSdfRestTSplitRestReturn struct { - *GoodDoFn + *GoodSdf } func (fn *BadSdfRestTSplitRestReturn) SplitRestriction(int, RestT) []BadRestT { return []BadRestT{} } type BadSdfRestTRestSize struct { - *GoodDoFn + *GoodSdf } func (fn *BadSdfRestTRestSize) RestrictionSize(int, BadRestT) float64 { return 0 } +type BadSdfRestTCreateTracker struct { + *GoodSdf +} + +func (fn *BadSdfRestTCreateTracker) CreateTracker(BadRestT) *RTrackerT { + return {} +} + // Examples of other type validation that needs to be done. type BadSdfRestSizeReturn struct { - *GoodDoFn + *GoodSdf } func (fn *BadSdfRestSizeReturn) BadSdfRestSizeReturn(int, RestT) int { return 0 } +type BadRTrackerT struct{} Review comment: Consider commenting that this "RTracker" isn't implementing the RTracker interface. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] lostluck commented on a change in pull request #11188: [BEAM-3301] Adding restriction trackers and validation.
lostluck commented on a change in pull request #11188: [BEAM-3301] Adding restriction trackers and validation. URL: https://github.com/apache/beam/pull/11188#discussion_r396804143 ## File path: sdks/go/pkg/beam/core/sdf/sdf.go ## @@ -0,0 +1,74 @@ +// Licensed to the Apache Software Foundation (ASF) under one or more +// contributor license agreements. See the NOTICE file distributed with +// this work for additional information regarding copyright ownership. +// The ASF licenses this file to You under the Apache License, Version 2.0 +// (the "License"); you may not use this file except in compliance with +// the License. You may obtain a copy of the License at +// +//http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, software +// distributed under the License is distributed on an "AS IS" BASIS, +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +// See the License for the specific language governing permissions and +// limitations under the License. + +// Package sdf is experimental, incomplete, and not yet meant for general usage. +package sdf + +// RTracker is an interface used to interact with restrictions while processing elements in +// SplittableDoFns. Each implementation of RTracker is expected to be used for tracking a single +// restriction type, which is the type that should be used to create the RTracker, and output by +// TrySplit. +type RTracker interface { + // TryClaim attempts to claim the block of work in the current restriction located at a given + // position. This method must be used in the ProcessElement method of Splittable DoFns to claim + // work before performing it. If no work is claimed, the ProcessElement is not allowed to perform + // work or emit outputs. If the claim is successful, the DoFn must process the entire block. If + // the claim is unsuccessful the ProcessElement method of the DoFn must return without performing + // any additional work or emitting any outputs. + // + // TryClaim accepts an arbitrary value that can be interpreted as the position of a block, and + // returns a boolean indicating whether the claim succeeded. + // + // If the claim fails due to an error, that error can be retrieved with GetError. + // + // For SDFs to work properly, claims must always be monotonically increasing in reference to the + // restriction's start and end points, and every block of work in a restriction must be claimed. + // + // This pseudocode example illustrates the typical usage of TryClaim: + // + // pos = position of first block after restriction.start + // for TryClaim(pos) == true { + // // Do all work in the claimed block and emit outputs. + // pos = position of next block + // } + // return + TryClaim(pos interface{}) (ok bool) + + // GetError returns the error that made this RTracker stop executing, and it returns null if no Review comment: returns nil* This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp
mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp URL: https://github.com/apache/beam/pull/11200#issuecomment-602890165 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] TheNeuralBit commented on a change in pull request #10529: [BEAM-9044] Protobuf options to Schema options
TheNeuralBit commented on a change in pull request #10529: [BEAM-9044] Protobuf options to Schema options URL: https://github.com/apache/beam/pull/10529#discussion_r396791822 ## File path: sdks/java/extensions/protobuf/src/main/java/org/apache/beam/sdk/extensions/protobuf/ProtoSchemaTranslator.java ## @@ -205,10 +205,12 @@ static Schema getSchema(Descriptors.Descriptor descriptor) { // Store proto field number in metadata. FieldType fieldType = withMetaData(beamFieldTypeFromProtoField(fieldDescriptor), fieldDescriptor); Review comment: I think Reuven was referring to the metadata that's added in `withMetaData`, since that's the "special" proto metadata This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] reuvenlax commented on issue #11074: Store logical type values in Row instead of base values
reuvenlax commented on issue #11074: Store logical type values in Row instead of base values URL: https://github.com/apache/beam/pull/11074#issuecomment-602887176 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] rohdesamuel opened a new pull request #11201: test
rohdesamuel opened a new pull request #11201: test URL: https://github.com/apache/beam/pull/11201 Change-Id: I18a1a3c4589e1fdc90d919a223ef67465f495374 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build
[GitHub] [beam] boyuanzz commented on issue #11199: [BEAM-9562] Update Timer encoding to V2
boyuanzz commented on issue #11199: [BEAM-9562] Update Timer encoding to V2 URL: https://github.com/apache/beam/pull/11199#issuecomment-602876259 I'll update the coder implementation once the proto looks good. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem merged pull request #11198: [BEAM-7923] Obfuscates display ids
pabloem merged pull request #11198: [BEAM-7923] Obfuscates display ids URL: https://github.com/apache/beam/pull/11198 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] mxm removed a comment on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp
mxm removed a comment on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp URL: https://github.com/apache/beam/pull/11200#issuecomment-602869001 Flink Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp
mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp URL: https://github.com/apache/beam/pull/11200#issuecomment-602869187 Run Flink Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp
mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp URL: https://github.com/apache/beam/pull/11200#issuecomment-602869001 Flink Runner Nexmark Tests This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp
mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp URL: https://github.com/apache/beam/pull/11200#issuecomment-602866857 Run Java Flink PortableValidatesRunner Streaming This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp
mxm commented on issue #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp URL: https://github.com/apache/beam/pull/11200#issuecomment-602866787 Run Flink ValidatesRunner This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] mxm opened a new pull request #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp
mxm opened a new pull request #11200: [BEAM-9573] Correct computing of watermark hold for timer output timestamp URL: https://github.com/apache/beam/pull/11200 This PR contains two related changes which would be hard to review in separate PRs: ### BEAM-9573 Correct computing of watermark hold for timer output timestamp With the introduction of timer output timestamps, a new watermark hold had been added to the Flink Runner. The watermark computation works on the keyed state backend which computes a key-scoped watermark hold and not the desired operator-wide watermark hold. Computation: https://github.com/apache/beam/blob/b564239081e9351c56fb0e7d263495b95dd3f8f3/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L1140 Key-scoped state: https://github.com/apache/beam/blob/b564239081e9351c56fb0e7d263495b95dd3f8f3/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/DoFnOperator.java#L1130 The solution is to iterate over all available state backend keys. ### BEAM-9566 Mitigate performance issue with watermark hold computation Benchmarks have shown that the watermark computation over all keys is very expensive. This introduces a cache which stores and updates the lowest timestamps for watermark holds due to timer output timestamps. In most cases only the cache should be hit, only when a large number of timers are added and then removed one-after-another, the cache might have to be reloaded with data from the state backend. Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[GitHub] [beam] reuvenlax commented on issue #11074: Store logical type values in Row instead of base values
reuvenlax commented on issue #11074: Store logical type values in Row instead of base values URL: https://github.com/apache/beam/pull/11074#issuecomment-602864720 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] lostluck commented on issue #11197: [BEAM-8292] Portable Reshuffle for Go SDK
lostluck commented on issue #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#issuecomment-602862121 Post commits run and pass which is a good sign! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] chamikaramj commented on issue #11039: [BEAM-9383] Staging Dataflow artifacts from environment
chamikaramj commented on issue #11039: [BEAM-9383] Staging Dataflow artifacts from environment URL: https://github.com/apache/beam/pull/11039#issuecomment-602855911 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] chamikaramj commented on issue #11185: [BEAM-8019] Some generalizations to support cross-language transforms.
chamikaramj commented on issue #11185: [BEAM-8019] Some generalizations to support cross-language transforms. URL: https://github.com/apache/beam/pull/11185#issuecomment-602855159 All tests pass now. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] aaltay commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised
aaltay commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised URL: https://github.com/apache/beam/pull/11174#issuecomment-602854440 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] robertwb commented on a change in pull request #11148: [BEAM-8335] Adds a streaming wordcount integration test
robertwb commented on a change in pull request #11148: [BEAM-8335] Adds a streaming wordcount integration test URL: https://github.com/apache/beam/pull/11148#discussion_r396752081 ## File path: sdks/python/apache_beam/runners/interactive/interactive_runner_test.py ## @@ -147,6 +150,97 @@ def process(self, element): ] self.assertEqual(actual_reified, expected_reified) + def test_streaming_wordcount(self): +class WordExtractingDoFn(beam.DoFn): + def process(self, element): +text_line = element.strip() +words = text_line.split() +return words + +# Add the TestStream so that it can be cached. +ib.options.capturable_sources.add(TestStream) +ib.options.capture_duration = timedelta(seconds=1) Review comment: Why is this need? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] robertwb commented on a change in pull request #11148: [BEAM-8335] Adds a streaming wordcount integration test
robertwb commented on a change in pull request #11148: [BEAM-8335] Adds a streaming wordcount integration test URL: https://github.com/apache/beam/pull/11148#discussion_r396751796 ## File path: sdks/python/apache_beam/runners/interactive/interactive_runner_test.py ## @@ -147,6 +150,97 @@ def process(self, element): ] self.assertEqual(actual_reified, expected_reified) + def test_streaming_wordcount(self): +class WordExtractingDoFn(beam.DoFn): + def process(self, element): +text_line = element.strip() +words = text_line.split() +return words + +# Add the TestStream so that it can be cached. +ib.options.capturable_sources.add(TestStream) +ib.options.capture_duration = timedelta(seconds=1) + +p = beam.Pipeline( +runner=interactive_runner.InteractiveRunner(), +options=StandardOptions(streaming=True)) + +data = ( +p +| TestStream() +.advance_watermark_to(0) +.advance_processing_time(1) +.add_elements(['to', 'be', 'or', 'not', 'to', 'be']) +.advance_watermark_to(20) +.advance_processing_time(1) +.add_elements(['to', 'be', 'or', 'not', 'to', 'be']) +.advance_watermark_to(40) +.advance_processing_time(1) Review comment: Does this not trigger the capture duration? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] robertwb commented on a change in pull request #11148: [BEAM-8335] Adds a streaming wordcount integration test
robertwb commented on a change in pull request #11148: [BEAM-8335] Adds a streaming wordcount integration test URL: https://github.com/apache/beam/pull/11148#discussion_r396752506 ## File path: sdks/python/apache_beam/runners/interactive/interactive_runner_test.py ## @@ -147,6 +150,97 @@ def process(self, element): ] self.assertEqual(actual_reified, expected_reified) + def test_streaming_wordcount(self): +class WordExtractingDoFn(beam.DoFn): + def process(self, element): +text_line = element.strip() +words = text_line.split() +return words + +# Add the TestStream so that it can be cached. +ib.options.capturable_sources.add(TestStream) +ib.options.capture_duration = timedelta(seconds=1) + +p = beam.Pipeline( +runner=interactive_runner.InteractiveRunner(), +options=StandardOptions(streaming=True)) + +data = ( +p +| TestStream() +.advance_watermark_to(0) +.advance_processing_time(1) +.add_elements(['to', 'be', 'or', 'not', 'to', 'be']) +.advance_watermark_to(20) +.advance_processing_time(1) +.add_elements(['to', 'be', 'or', 'not', 'to', 'be']) +.advance_watermark_to(40) +.advance_processing_time(1) +.add_elements(['to', 'be', 'or', 'not', 'to', 'be']) +| beam.WindowInto(beam.window.FixedWindows(10))) # yapf: disable + +counts = ( +data +| 'split' >> beam.ParDo(WordExtractingDoFn()) +| 'pair_with_one' >> beam.Map(lambda x: (x, 1)) +| 'group' >> beam.GroupByKey() +| 'count' >> beam.Map(lambda wordones: (wordones[0], sum(wordones[1] + +# Watch the local scope for Interactive Beam so that referenced PCollections +# will be cached. +ib.watch(locals()) + +# This is normally done in the interactive_utils when a transform is +# applied but needs an IPython environment. So we manually run this here. +ie.current_env().track_user_pipelines() + +# This tests that the data was correctly cached. +pane_info = PaneInfo(True, True, PaneInfoTiming.UNKNOWN, 0, 0) +expected_data_df = pd.DataFrame( +[('to', 0, [beam.window.IntervalWindow(0, 10)], pane_info), Review comment: It'd be easier to understand the test if there were less data. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] KevinGG commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised
KevinGG commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised URL: https://github.com/apache/beam/pull/11174#issuecomment-602852406 Fixed a failed test. TL;DR, the test was wrong and passed in the past because it got lucky. The test used to pass and started failing with the try-append-finally-pop change, because: 1. The test tried to append two no name / label transforms into a same pipeline, both of them will generate the same label and raise errors while the tests asserted for errors, thus allowing pipeline construction to continue even if it ran into fatal errors; 2. In the past, the pipeline was in a broken state where the current_transform was not popped when the 1st error was raised. The full label generated was `WriteToBigQuery`; 3. In the past, the second transform appending wrongly added parts into the broken current_transform `WriteToBigQuery`, causing the transform node to be appended after a wrong failed parent node and luckily got a unique transform full label `WriteToBigQuery/WriteToBigQuery`, thus the test didn't fail due to duplicated labels. But the pipeline was constructed as `WriteToBigQuery`->`WriteToBigQuery` which was completely messed up; 4. Still the test didn't fail because it was testing for errors even though it built a broken pipeline with broken usages. Once the try-append-finally-pop change is applied, the test functions as: 1. First transform still fails to be appended and has side-effects including an applied transform label, but it would not leave the pipeline in a broken state where current transform is still the right node; 2. Second transform still fails but now appended to the correct node; 3. As long as the 2 transforms have different labels (or executed in different cells if in an interactive environment), the test still passes, pipeline is not broken and can be used for future development, side-effects are ruled out when data-centric APIs such as `show` and `collect` are invoked. The reason we keep the applied label is that we never know what side effects are when the error is raised. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] lostluck commented on issue #11197: [BEAM-8292] Portable Reshuffle for Go SDK
lostluck commented on issue #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#issuecomment-602851450 Run Go Postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] aaltay commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised
aaltay commented on issue #11174: [BEAM-7923] Pop failed transform when error is raised URL: https://github.com/apache/beam/pull/11174#issuecomment-602851060 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] boyuanzz commented on a change in pull request #11060: [BEAM-9454] Create Deduplication transform based on user timer/state
boyuanzz commented on a change in pull request #11060: [BEAM-9454] Create Deduplication transform based on user timer/state URL: https://github.com/apache/beam/pull/11060#discussion_r396747769 ## File path: sdks/python/apache_beam/transforms/deduplicate.py ## @@ -0,0 +1,133 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# pytype: skip-file + +"""a collection of ptransforms for deduplicating elements.""" + +from __future__ import absolute_import +from __future__ import division + +import typing + +from apache_beam import typehints +from apache_beam.coders.coders import BooleanCoder +from apache_beam.transforms import core +from apache_beam.transforms import ptransform +from apache_beam.transforms import userstate +from apache_beam.transforms.timeutil import TimeDomain +from apache_beam.utils import timestamp + +__all__ = [ +'Deduplicate', +'DeduplicatePerKey', +] + +K = typing.TypeVar('K') +V = typing.TypeVar('V') + + +@typehints.with_input_types(typing.Tuple[K, V]) +@typehints.with_output_types(typing.Tuple[K, V]) +class DeduplicatePerKey(ptransform.PTransform): + """ A PTransform which deduplicates pair over a time domain and + threshold. Values in different windows will NOT be considered duplicates of + each other. Deduplication is best effort. + + The durations specified may impose memory and/or storage requirements within + a runner and care might need to be used to ensure that the deduplication time + limit is long enough to remove duplicates but short enough to not cause + performance problems within a runner. Each runner may provide an optimized + implementation of their choice using the deduplication time domain and + threshold specified. + + Does not preserve any order the input PCollection might have had. + """ + def __init__(self, processing_time_duration=None, event_time_duration=None): +if processing_time_duration is None and event_time_duration is None: + raise ValueError( + 'DeduplicatePerKey requires at lease provide either' + 'processing_time_duration or event_time_duration.') +self.processing_time_duration = processing_time_duration +self.event_time_duration = event_time_duration + + def _create_deduplicate_fn(self): +processing_timer_spec = userstate.TimerSpec( +'processing_timer', TimeDomain.REAL_TIME) +event_timer_spec = userstate.TimerSpec('event_timer', TimeDomain.WATERMARK) +state_spec = userstate.BagStateSpec('seen', BooleanCoder()) Review comment: The `seen_state` is only set once per key during that duration. I'm not sure whether combining state is more suitable here. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem commented on issue #11040: [BEAM-9305] Allow value provider query strings in _CustomBigQuerySource
pabloem commented on issue #11040: [BEAM-9305] Allow value provider query strings in _CustomBigQuerySource URL: https://github.com/apache/beam/pull/11040#issuecomment-602847544 Thanks @EDjur for the contribution! Thanks @kamilwu for reviewing This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem merged pull request #11040: [BEAM-9305] Allow value provider query strings in _CustomBigQuerySource
pabloem merged pull request #11040: [BEAM-9305] Allow value provider query strings in _CustomBigQuerySource URL: https://github.com/apache/beam/pull/11040 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] Hannah-Jiang merged pull request #11187: optionally import grpc
Hannah-Jiang merged pull request #11187: optionally import grpc URL: https://github.com/apache/beam/pull/11187 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] KevinGG commented on a change in pull request #11174: [BEAM-7923] Pop failed transform when error is raised
KevinGG commented on a change in pull request #11174: [BEAM-7923] Pop failed transform when error is raised URL: https://github.com/apache/beam/pull/11174#discussion_r396742548 ## File path: sdks/python/apache_beam/pipeline.py ## @@ -307,58 +307,61 @@ def _replace_if_needed(self, original_transform_node): elif len(inputs) == 0: input_node = pvalue.PBegin(self.pipeline) - # We have to add the new AppliedTransform to the stack before expand() - # and pop it out later to make sure that parts get added correctly. - self.pipeline.transforms_stack.append(replacement_transform_node) - - # Keeping the same label for the replaced node but recursively - # removing labels of child transforms of original transform since they - # will be replaced during the expand below. This is needed in case - # the replacement contains children that have labels that conflicts - # with labels of the children of the original. - self.pipeline._remove_labels_recursively(original_transform_node) - - new_output = replacement_transform.expand(input_node) - assert isinstance( - new_output, (dict, pvalue.PValue, pvalue.DoOutputsTuple)) - - if isinstance(new_output, pvalue.PValue): -new_output.element_type = None -self.pipeline._infer_result_type( -replacement_transform, inputs, new_output) - - if isinstance(new_output, dict): -for new_tag, new_pcoll in new_output.items(): - replacement_transform_node.add_output(new_pcoll, new_tag) - elif isinstance(new_output, pvalue.DoOutputsTuple): -replacement_transform_node.add_output( -new_output, new_output._main_tag) - else: -replacement_transform_node.add_output(new_output, new_output.tag) - - # Recording updated outputs. This cannot be done in the same visitor - # since if we dynamically update output type here, we'll run into - # errors when visiting child nodes. - # - # NOTE: When replacing multiple outputs, the replacement PCollection - # tags must have a matching tag in the original transform. - if isinstance(new_output, pvalue.PValue): -if not new_output.producer: - new_output.producer = replacement_transform_node -output_map[original_transform_node.outputs[new_output.tag]] = \ -new_output - elif isinstance(new_output, (pvalue.DoOutputsTuple, tuple)): -for pcoll in new_output: - if not pcoll.producer: -pcoll.producer = replacement_transform_node - output_map[original_transform_node.outputs[pcoll.tag]] = pcoll - elif isinstance(new_output, dict): -for tag, pcoll in new_output.items(): - if not pcoll.producer: -pcoll.producer = replacement_transform_node - output_map[original_transform_node.outputs[tag]] = pcoll - - self.pipeline.transforms_stack.pop() + try: +# We have to add the new AppliedTransform to the stack before +# expand() and pop it out later to make sure that parts get added +# correctly. +self.pipeline.transforms_stack.append(replacement_transform_node) Review comment: It wouldn't raise an error if it's still a list on the line, and if it's not a list (becomes a None or some other object) at the moment it causes an error, the finally block would error out too. So it's not necessary to exclude it from the try block. Putting it inside the try makes it a little bit more self-explained. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] boyuanzz commented on a change in pull request #11060: [BEAM-9454] Create Deduplication transform based on user timer/state
boyuanzz commented on a change in pull request #11060: [BEAM-9454] Create Deduplication transform based on user timer/state URL: https://github.com/apache/beam/pull/11060#discussion_r396742097 ## File path: sdks/python/apache_beam/transforms/deduplicate_test.py ## @@ -0,0 +1,168 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# pytype: skip-file + +"""Unit tests for deduplicate transform by using TestStream.""" + +from __future__ import absolute_import + +import unittest + +from nose.plugins.attrib import attr + +import apache_beam as beam +from apache_beam.coders import coders +from apache_beam.testing.test_pipeline import TestPipeline +from apache_beam.testing.test_stream import TestStream +from apache_beam.testing.util import assert_that +from apache_beam.testing.util import equal_to +from apache_beam.testing.util import equal_to_per_window +from apache_beam.transforms import deduplicate +from apache_beam.transforms import window +from apache_beam.utils.timestamp import Duration +from apache_beam.utils.timestamp import Timestamp + + +# TestStream is only supported in streaming pipeline. The Deduplicate transform +# also requires Timer support. Sickbaying this testsuite until dataflow runner +# supports both TestStream and user timer. +@attr('ValidatesRunner', 'sickbay-batch', 'sickbay-streaming') Review comment: The 'sickbay-batch' and 'sickbay-streaming' is only used by dataflow suite now. And unfortunately, I don't we have runners supporting these python test now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] youngoli commented on issue #11188: [BEAM-3301] Adding restriction trackers and validation.
youngoli commented on issue #11188: [BEAM-3301] Adding restriction trackers and validation. URL: https://github.com/apache/beam/pull/11188#issuecomment-602838126 Whoops, forgot reviewers. R: @lostluck This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem commented on issue #11198: [BEAM-7923] Obfuscates display ids
pabloem commented on issue #11198: [BEAM-7923] Obfuscates display ids URL: https://github.com/apache/beam/pull/11198#issuecomment-602837778 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] lostluck commented on issue #11197: [BEAM-8292] Portable Reshuffle for Go SDK
lostluck commented on issue #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#issuecomment-602835398 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem commented on issue #10291: [BEAM-7516][BEAM-8823] FnApiRunner works with work queues, and a primitive watermark manager
pabloem commented on issue #10291: [BEAM-7516][BEAM-8823] FnApiRunner works with work queues, and a primitive watermark manager URL: https://github.com/apache/beam/pull/10291#issuecomment-602830331 I can't reproduce any of the failures locally via `tox -e py35-cloud,py35-cython` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] boyuanzz opened a new pull request #11199: [BEAM-9562] Update Timer encoding to V2
boyuanzz opened a new pull request #11199: [BEAM-9562] Update Timer encoding to V2 URL: https://github.com/apache/beam/pull/11199 **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/) Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)[![Build
[GitHub] [beam] rohdesamuel commented on issue #11148: [BEAM-8335] Adds a streaming wordcount integration test
rohdesamuel commented on issue #11148: [BEAM-8335] Adds a streaming wordcount integration test URL: https://github.com/apache/beam/pull/11148#issuecomment-602825559 R: @robertwb This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] reuvenlax commented on issue #11074: Store logical type values in Row instead of base values
reuvenlax commented on issue #11074: Store logical type values in Row instead of base values URL: https://github.com/apache/beam/pull/11074#issuecomment-602824816 Run Java PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] udim commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is missing a test
udim commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is missing a test URL: https://github.com/apache/beam/pull/10914#issuecomment-602823166 Trying again This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] udim commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is missing a test
udim commented on issue #10914: [BEAM-8078] streaming_wordcount_debugging.py is missing a test URL: https://github.com/apache/beam/pull/10914#issuecomment-602823472 run python 3.7 postcommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] chamikaramj commented on issue #11039: [BEAM-9383] Staging Dataflow artifacts from environment
chamikaramj commented on issue #11039: [BEAM-9383] Staging Dataflow artifacts from environment URL: https://github.com/apache/beam/pull/11039#issuecomment-602822540 Retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] chamikaramj commented on issue #11185: [BEAM-8019] Some generalizations to support cross-language transforms.
chamikaramj commented on issue #11185: [BEAM-8019] Some generalizations to support cross-language transforms. URL: https://github.com/apache/beam/pull/11185#issuecomment-602821407 Run Python2_PVR_Flink PreCommit This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] davidyan74 commented on a change in pull request #11198: [BEAM-7923] Obfuscates display ids
davidyan74 commented on a change in pull request #11198: [BEAM-7923] Obfuscates display ids URL: https://github.com/apache/beam/pull/11198#discussion_r396717186 ## File path: sdks/python/apache_beam/runners/interactive/display/pcoll_visualization.py ## @@ -246,11 +247,11 @@ def __init__(self, pcoll, include_window_info=False, display_facets=False): if not self._pcoll_var: self._pcoll_var = 'Value' self._cache_key = self._pin.cache_key(self._pcoll) -self._dive_display_id = 'facets_dive_{}_{}'.format( -self._cache_key, id(self)) -self._overview_display_id = 'facets_overview_{}_{}'.format( -self._cache_key, id(self)) -self._df_display_id = 'df_{}_{}'.format(self._cache_key, id(self)) +self._dive_display_id = 'facets_dive_{}'.format( +obfuscate(self._cache_key, id(self))) Review comment: nit: assign the obfuscate return value to a variable so it can be reused. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] pabloem commented on issue #11198: [BEAM-7923] Obfuscates display ids
pabloem commented on issue #11198: [BEAM-7923] Obfuscates display ids URL: https://github.com/apache/beam/pull/11198#issuecomment-602819811 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] lostluck commented on issue #11197: [BEAM-8292] Portable Reshuffle for Go SDK
lostluck commented on issue #11197: [BEAM-8292] Portable Reshuffle for Go SDK URL: https://github.com/apache/beam/pull/11197#issuecomment-602817069 I'm definitely not merging this until both the PostCommit runs, and someone more familiar with windowing/trigger semantics looks over the configuration I copied over from python: https://github.com/apache/beam/pull/11197/files#diff-ef420fdb9afbce0674282b4ed4481042R530 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] KevinGG opened a new pull request #11198: [BEAM-7923] Obfuscates display ids
KevinGG opened a new pull request #11198: [BEAM-7923] Obfuscates display ids URL: https://github.com/apache/beam/pull/11198 1. Use md5 to hash and digest any inputs into a hexadecimal string. 2. The obfuscation is applied to all display ids in notebooks. Note the ids will start with alphabets such as `facets_dive_` or `table_df_` to be compatible with `document.querySelector()`. Otherwise, hexadecimal strings are compatible with `jQuery()` already. 3. The performance and hash clashing research can be found here: https://www.peterbe.com/plog/best-hashing-function-in-python. **Please** add a meaningful description for your change here Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily: - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`). - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue. - [ ] Update `CHANGES.md` with noteworthy changes. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf). See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier). Post-Commit Tests Status (on master branch) Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark --- | --- | --- | --- | --- | --- | --- | --- Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/) Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)[![Build
[GitHub] [beam] KevinGG commented on issue #11198: [BEAM-7923] Obfuscates display ids
KevinGG commented on issue #11198: [BEAM-7923] Obfuscates display ids URL: https://github.com/apache/beam/pull/11198#issuecomment-602817070 yapf formatted. Lint passed locally. R: @pabloem R: @davidyan74 R: @rohdesamuel PTAL, thx! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] lukecwik merged pull request #11192: [BEAM-9430] Fix coder sent to Dataflow service for non-portable pipelines due to WatermarkEstimators migration change
lukecwik merged pull request #11192: [BEAM-9430] Fix coder sent to Dataflow service for non-portable pipelines due to WatermarkEstimators migration change URL: https://github.com/apache/beam/pull/11192 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [beam] reuvenlax merged pull request #10990: [BEAM-9569] disable coder inference for rows
reuvenlax merged pull request #10990: [BEAM-9569] disable coder inference for rows URL: https://github.com/apache/beam/pull/10990 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services