[beam] tag nightly-master updated (bee495f -> bcced0c)
This is an automated email from the ASF dual-hosted git repository. github-bot pushed a change to tag nightly-master in repository https://gitbox.apache.org/repos/asf/beam.git. *** WARNING: tag nightly-master was modified! *** from bee495f (commit) to bcced0c (commit) from bee495f Merge pull request #14407: [BEAM-12060] Fix failing Go Postcommits, jenkins support for Gradle tasks. add 4477b41 [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0 add c4d1ab3 Merge pull request #14424: [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0 add 5ffb3ee [BEAM-9615] Misc final schema cleanups. (#14285) add abbe14f [BEAM-12083] Nexmark Query 13. (#14404) add 37db190 [BEAM-4106] Add FileStagingOptions and merge staging file options between runners add 7134cfd Merge pull request #14423: [BEAM-4106] Add FileStagingOptions and merge staging file options between runners add 99a3998 [BEAM-12060] Remove overwriting jenkins property. (#14432) add 3420488 [BEAM-10925] Roundtrip tests for literals through Java UDF. add 7845c58 [BEAM-10925] Simplify test setup. add a01bc08 Merge pull request #14418 from ibzib/java-udf-types add 3e2e630 [BEAM-12095] Fix Spark job server path. add 312c6eb Merge pull request #14429 from ibzib/BEAM-12095 add 09bc53a remove typo in encoding.go add 6988b94 Merge pull request #14430: remove typo in encoding.go add 91e35b5 Updates Dataflow worker pool config to include all environments used by transforms. Sets environment Ids in the Dataflow worker pool config which will be used to map bundles to SDK Harnesses. add fe9a406 Merge pull request #14408: [BEAM-11935] Updates Dataflow config to include all environments used by transforms add 6ddf0c2 Optimize reservoir sampling calculation add bcced0c Merge pull request #14406 from [BEAM-11836] Optimize reservoir sampling calculation in PCollectionConsumerRegistry No new revisions were added by this update. Summary of changes: .test-infra/jenkins/job_PostCommit_Go.groovy | 1 - .../job_PostCommit_Go_ValidatesRunner_Flink.groovy | 1 - .../job_PostCommit_Go_ValidatesRunner_Spark.groovy | 1 - .../jenkins/job_PreCommit_Go_Portable.groovy | 1 - .../beam/runners/flink/FlinkPipelineOptions.java | 18 +- .../beam/runners/dataflow/DataflowRunner.java | 80 +++--- .../options/DataflowPipelineWorkerPoolOptions.java | 19 +- .../beam/runners/dataflow/DataflowRunnerTest.java | 54 ++-- .../runners/spark/SparkCommonPipelineOptions.java | 18 +- .../runners/twister2/Twister2PipelineOptions.java | 11 +- sdks/go/pkg/beam/coder.go | 14 +- sdks/go/pkg/beam/core/graph/coder/row.go | 3 +- sdks/go/pkg/beam/core/graph/coder/row_decoder.go | 2 +- sdks/go/pkg/beam/core/graph/coder/row_encoder.go | 2 +- sdks/go/pkg/beam/encoding.go | 50 +++- sdks/go/pkg/beam/util/shimx/generate.go| 15 ++ sdks/go/pkg/beam/util/shimx/generate_test.go | 2 +- sdks/go/test/build.gradle | 8 +- .../beam/sdk/options/FileStagingOptions.java} | 30 +-- .../beam/sdk/options/PortablePipelineOptions.java | 18 +- .../org/apache/beam/sdk/transforms/Reshuffle.java | 52 ++-- .../sql/zetasql/ZetaSqlJavaUdfTypeTest.java| 276 + .../harness/data/PCollectionConsumerRegistry.java | 49 +++- sdks/java/io/hadoop-format/build.gradle| 2 +- .../beam/sdk/nexmark/NexmarkConfiguration.java | 13 + .../apache/beam/sdk/nexmark/NexmarkLauncher.java | 4 + .../apache/beam/sdk/nexmark/NexmarkQueryName.java | 1 + .../apache/beam/sdk/nexmark/queries/Query13.java | 107 .../apache_beam/runners/portability/job_server.py | 5 +- .../runners/portability/spark_runner.py| 4 +- sdks/python/apache_beam/utils/subprocess_server.py | 10 +- 31 files changed, 671 insertions(+), 200 deletions(-) copy sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java => core/src/main/java/org/apache/beam/sdk/options/FileStagingOptions.java} (58%) create mode 100644 sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTypeTest.java create mode 100644 sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query13.java
[beam] branch master updated: Optimize reservoir sampling calculation
This is an automated email from the ASF dual-hosted git repository. boyuanz pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/master by this push: new 6ddf0c2 Optimize reservoir sampling calculation new bcced0c Merge pull request #14406 from [BEAM-11836] Optimize reservoir sampling calculation in PCollectionConsumerRegistry 6ddf0c2 is described below commit 6ddf0c2ff6706883771ec9cb13309101b34b80c4 Author: kileys AuthorDate: Fri Apr 2 00:57:38 2021 + Optimize reservoir sampling calculation --- .../harness/data/PCollectionConsumerRegistry.java | 49 -- 1 file changed, 36 insertions(+), 13 deletions(-) diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java index 14d245dc..457cbe8 100644 --- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java +++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java @@ -377,23 +377,46 @@ public class PCollectionConsumerRegistry { } } -// Lowest sampling probability: 0.001%. -private static final int SAMPLING_TOKEN_UPPER_BOUND = 100; -private static final int SAMPLING_CUTOFF = 10; -private int samplingToken = 0; +private static final int RESERVOIR_SIZE = 10; +private static final int SAMPLING_THRESHOLD = 30; +private long samplingToken = 0; +private long nextSamplingToken = 0; private Random randomGenerator = new Random(); -// TODO(BEAM-11836): Implement fast approximation for reservoir sampling. private boolean shouldSampleElement() { // Sampling probability decreases as the element count is increasing. - // We unconditionally sample the first samplingCutoff elements. For the - // next samplingCutoff elements, the sampling probability drops from 100% - // to 50%. The probability of sampling the Nth element is: - // min(1, samplingCutoff / N), with an additional lower bound of - // samplingCutoff / samplingTokenUpperBound. This algorithm may be refined - // later. - samplingToken = Math.min(samplingToken + 1, SAMPLING_TOKEN_UPPER_BOUND); - return randomGenerator.nextInt(samplingToken) < SAMPLING_CUTOFF; + // We unconditionally sample the first samplingCutoff elements. Calculating + // nextInt(samplingToken) for each element is expensive, so after a threshold, calculate the + // gap to next sample. + // https://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/ + + // Reset samplingToken if it's going to exceed the max value. + if (samplingToken + 1 == Long.MAX_VALUE) { +samplingToken = 0; +nextSamplingToken = getNextSamplingToken(samplingToken); + } + + samplingToken++; + // Use traditional sampling until the threshold of 30 + if (nextSamplingToken == 0) { +if (randomGenerator.nextInt((int) samplingToken) <= RESERVOIR_SIZE) { + if (samplingToken > SAMPLING_THRESHOLD) { +nextSamplingToken = getNextSamplingToken(samplingToken); + } + return true; +} + } else if (samplingToken >= nextSamplingToken) { +nextSamplingToken = getNextSamplingToken(samplingToken); +return true; + } + return false; +} + +private long getNextSamplingToken(long samplingToken) { + double gap = + Math.log(1.0 - randomGenerator.nextDouble()) + / Math.log(1.0 - RESERVOIR_SIZE / (double) samplingToken); + return samplingToken + (int) gap; } } }
[beam] branch master updated (6988b94 -> fe9a406)
This is an automated email from the ASF dual-hosted git repository. chamikara pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 6988b94 Merge pull request #14430: remove typo in encoding.go add 91e35b5 Updates Dataflow worker pool config to include all environments used by transforms. Sets environment Ids in the Dataflow worker pool config which will be used to map bundles to SDK Harnesses. add fe9a406 Merge pull request #14408: [BEAM-11935] Updates Dataflow config to include all environments used by transforms No new revisions were added by this update. Summary of changes: .../beam/runners/dataflow/DataflowRunner.java | 80 ++ .../beam/runners/dataflow/DataflowRunnerTest.java | 54 --- 2 files changed, 82 insertions(+), 52 deletions(-)
[beam] branch master updated (312c6eb -> 6988b94)
This is an automated email from the ASF dual-hosted git repository. danoliveira pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 312c6eb Merge pull request #14429 from ibzib/BEAM-12095 add 09bc53a remove typo in encoding.go add 6988b94 Merge pull request #14430: remove typo in encoding.go No new revisions were added by this update. Summary of changes: sdks/go/pkg/beam/encoding.go | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[beam] branch master updated: [BEAM-12095] Fix Spark job server path.
This is an automated email from the ASF dual-hosted git repository. ibzib pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/master by this push: new 3e2e630 [BEAM-12095] Fix Spark job server path. new 312c6eb Merge pull request #14429 from ibzib/BEAM-12095 3e2e630 is described below commit 3e2e630fa1f93e182d324c43ba49c4b4e1862031 Author: Kyle Weaver AuthorDate: Mon Apr 5 09:53:38 2021 -0700 [BEAM-12095] Fix Spark job server path. --- sdks/python/apache_beam/runners/portability/job_server.py | 5 +++-- sdks/python/apache_beam/runners/portability/spark_runner.py | 4 +++- sdks/python/apache_beam/utils/subprocess_server.py | 10 -- 3 files changed, 14 insertions(+), 5 deletions(-) diff --git a/sdks/python/apache_beam/runners/portability/job_server.py b/sdks/python/apache_beam/runners/portability/job_server.py index dfc8802..2d581c4 100644 --- a/sdks/python/apache_beam/runners/portability/job_server.py +++ b/sdks/python/apache_beam/runners/portability/job_server.py @@ -140,8 +140,9 @@ class JavaJarJobServer(SubprocessJobServer): raise NotImplementedError(type(self)) @staticmethod - def path_to_beam_jar(gradle_target): -return subprocess_server.JavaJarServer.path_to_beam_jar(gradle_target) + def path_to_beam_jar(gradle_target, artifact_id=None): +return subprocess_server.JavaJarServer.path_to_beam_jar( +gradle_target, artifact_id=artifact_id) @staticmethod def local_jar(url): diff --git a/sdks/python/apache_beam/runners/portability/spark_runner.py b/sdks/python/apache_beam/runners/portability/spark_runner.py index ec7046c..99337b0 100644 --- a/sdks/python/apache_beam/runners/portability/spark_runner.py +++ b/sdks/python/apache_beam/runners/portability/spark_runner.py @@ -82,7 +82,9 @@ class SparkJarJobServer(job_server.JavaJarJobServer): self._jar) return self._jar else: - return self.path_to_beam_jar(':runners:spark:2:job-server:shadowJar') + return self.path_to_beam_jar( + ':runners:spark:2:job-server:shadowJar', + artifact_id='beam-runners-spark-job-server') def java_arguments( self, job_port, artifact_port, expansion_port, artifacts_dir): diff --git a/sdks/python/apache_beam/utils/subprocess_server.py b/sdks/python/apache_beam/utils/subprocess_server.py index 7ff4261..d7af0b4 100644 --- a/sdks/python/apache_beam/utils/subprocess_server.py +++ b/sdks/python/apache_beam/utils/subprocess_server.py @@ -205,12 +205,18 @@ class JavaJarServer(SubprocessServer): ]) @classmethod - def path_to_beam_jar(cls, gradle_target, appendix=None, version=beam_version): + def path_to_beam_jar( + cls, + gradle_target, + appendix=None, + version=beam_version, + artifact_id=None): if gradle_target in cls._BEAM_SERVICES.replacements: return cls._BEAM_SERVICES.replacements[gradle_target] gradle_package = gradle_target.strip(':').rsplit(':', 1)[0] -artifact_id = 'beam-' + gradle_package.replace(':', '-') +if not artifact_id: + artifact_id = 'beam-' + gradle_package.replace(':', '-') project_root = os.path.sep.join( os.path.abspath(__file__).split(os.path.sep)[:-5]) local_path = os.path.join(
[beam] branch master updated (99a3998 -> a01bc08)
This is an automated email from the ASF dual-hosted git repository. ibzib pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 99a3998 [BEAM-12060] Remove overwriting jenkins property. (#14432) new 3420488 [BEAM-10925] Roundtrip tests for literals through Java UDF. new 7845c58 [BEAM-10925] Simplify test setup. new a01bc08 Merge pull request #14418 from ibzib/java-udf-types The 31370 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../sql/zetasql/ZetaSqlJavaUdfTypeTest.java| 276 + 1 file changed, 276 insertions(+) create mode 100644 sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTypeTest.java
[beam] branch master updated (7134cfd -> 99a3998)
This is an automated email from the ASF dual-hosted git repository. lostluck pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 7134cfd Merge pull request #14423: [BEAM-4106] Add FileStagingOptions and merge staging file options between runners add 99a3998 [BEAM-12060] Remove overwriting jenkins property. (#14432) No new revisions were added by this update. Summary of changes: .test-infra/jenkins/job_PostCommit_Go.groovy | 1 - .../jenkins/job_PostCommit_Go_ValidatesRunner_Flink.groovy| 1 - .../jenkins/job_PostCommit_Go_ValidatesRunner_Spark.groovy| 1 - .test-infra/jenkins/job_PreCommit_Go_Portable.groovy | 1 - sdks/go/test/build.gradle | 8 5 files changed, 4 insertions(+), 8 deletions(-)
[beam] branch master updated (abbe14f -> 7134cfd)
This is an automated email from the ASF dual-hosted git repository. iemejia pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from abbe14f [BEAM-12083] Nexmark Query 13. (#14404) add 37db190 [BEAM-4106] Add FileStagingOptions and merge staging file options between runners add 7134cfd Merge pull request #14423: [BEAM-4106] Add FileStagingOptions and merge staging file options between runners No new revisions were added by this update. Summary of changes: .../beam/runners/flink/FlinkPipelineOptions.java | 18 ++--- .../options/DataflowPipelineWorkerPoolOptions.java | 19 ++ .../runners/spark/SparkCommonPipelineOptions.java | 18 +++-- .../runners/twister2/Twister2PipelineOptions.java | 11 +++- .../beam/sdk/options/FileStagingOptions.java} | 30 +++--- .../beam/sdk/options/PortablePipelineOptions.java | 18 + 6 files changed, 26 insertions(+), 88 deletions(-) copy sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java => core/src/main/java/org/apache/beam/sdk/options/FileStagingOptions.java} (58%)
[beam] branch lostluck-patch-5 created (now 09bc53a)
This is an automated email from the ASF dual-hosted git repository. lostluck pushed a change to branch lostluck-patch-5 in repository https://gitbox.apache.org/repos/asf/beam.git. at 09bc53a remove typo in encoding.go No new revisions were added by this update.
[beam] branch master updated (5ffb3ee -> abbe14f)
This is an automated email from the ASF dual-hosted git repository. amaliujia pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 5ffb3ee [BEAM-9615] Misc final schema cleanups. (#14285) add abbe14f [BEAM-12083] Nexmark Query 13. (#14404) No new revisions were added by this update. Summary of changes: .../org/apache/beam/sdk/transforms/Reshuffle.java | 52 +- .../beam/sdk/nexmark/NexmarkConfiguration.java | 13 +++ .../apache/beam/sdk/nexmark/NexmarkLauncher.java | 4 + .../apache/beam/sdk/nexmark/NexmarkQueryName.java | 1 + .../apache/beam/sdk/nexmark/queries/Query13.java | 107 + 5 files changed, 151 insertions(+), 26 deletions(-) create mode 100644 sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query13.java
[beam] branch master updated (c4d1ab3 -> 5ffb3ee)
This is an automated email from the ASF dual-hosted git repository. lostluck pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from c4d1ab3 Merge pull request #14424: [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0 add 5ffb3ee [BEAM-9615] Misc final schema cleanups. (#14285) No new revisions were added by this update. Summary of changes: sdks/go/pkg/beam/coder.go| 14 ++- sdks/go/pkg/beam/core/graph/coder/row.go | 3 +- sdks/go/pkg/beam/core/graph/coder/row_decoder.go | 2 +- sdks/go/pkg/beam/core/graph/coder/row_encoder.go | 2 +- sdks/go/pkg/beam/encoding.go | 50 +++- sdks/go/pkg/beam/util/shimx/generate.go | 15 +++ sdks/go/pkg/beam/util/shimx/generate_test.go | 2 +- 7 files changed, 81 insertions(+), 7 deletions(-)
[beam] branch master updated (bee495f -> c4d1ab3)
This is an automated email from the ASF dual-hosted git repository. aromanenko pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from bee495f Merge pull request #14407: [BEAM-12060] Fix failing Go Postcommits, jenkins support for Gradle tasks. add 4477b41 [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0 add c4d1ab3 Merge pull request #14424: [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0 No new revisions were added by this update. Summary of changes: sdks/java/io/hadoop-format/build.gradle | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)