[beam] tag nightly-master updated (bee495f -> bcced0c)

2021-04-05 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to tag nightly-master
in repository https://gitbox.apache.org/repos/asf/beam.git.


*** WARNING: tag nightly-master was modified! ***

from bee495f  (commit)
  to bcced0c  (commit)
from bee495f  Merge pull request #14407: [BEAM-12060] Fix failing Go Postcommits, jenkins support for Gradle tasks.
 add 4477b41  [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0
 add c4d1ab3  Merge pull request #14424: [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0
 add 5ffb3ee  [BEAM-9615] Misc final schema cleanups. (#14285)
 add abbe14f  [BEAM-12083] Nexmark Query 13. (#14404)
 add 37db190  [BEAM-4106] Add FileStagingOptions and merge staging file options between runners
 add 7134cfd  Merge pull request #14423: [BEAM-4106] Add FileStagingOptions and merge staging file options between runners
 add 99a3998  [BEAM-12060] Remove overwriting jenkins property. (#14432)
 add 3420488  [BEAM-10925] Roundtrip tests for literals through Java UDF.
 add 7845c58  [BEAM-10925] Simplify test setup.
 add a01bc08  Merge pull request #14418 from ibzib/java-udf-types
 add 3e2e630  [BEAM-12095] Fix Spark job server path.
 add 312c6eb  Merge pull request #14429 from ibzib/BEAM-12095
 add 09bc53a  remove typo in encoding.go
 add 6988b94  Merge pull request #14430: remove typo in encoding.go
 add 91e35b5  Updates Dataflow worker pool config to include all environments used by transforms. Sets environment Ids in the Dataflow worker pool config which will be used to map bundles to SDK Harnesses.
 add fe9a406  Merge pull request #14408: [BEAM-11935] Updates Dataflow config to include all environments used by transforms
 add 6ddf0c2  Optimize reservoir sampling calculation
 add bcced0c  Merge pull request #14406 from [BEAM-11836] Optimize reservoir sampling calculation in PCollectionConsumerRegistry

No new revisions were added by this update.

Summary of changes:
 .test-infra/jenkins/job_PostCommit_Go.groovy   |   1 -
 .../job_PostCommit_Go_ValidatesRunner_Flink.groovy |   1 -
 .../job_PostCommit_Go_ValidatesRunner_Spark.groovy |   1 -
 .../jenkins/job_PreCommit_Go_Portable.groovy   |   1 -
 .../beam/runners/flink/FlinkPipelineOptions.java   |  18 +-
 .../beam/runners/dataflow/DataflowRunner.java  |  80 +++---
 .../options/DataflowPipelineWorkerPoolOptions.java |  19 +-
 .../beam/runners/dataflow/DataflowRunnerTest.java  |  54 ++--
 .../runners/spark/SparkCommonPipelineOptions.java  |  18 +-
 .../runners/twister2/Twister2PipelineOptions.java  |  11 +-
 sdks/go/pkg/beam/coder.go  |  14 +-
 sdks/go/pkg/beam/core/graph/coder/row.go   |   3 +-
 sdks/go/pkg/beam/core/graph/coder/row_decoder.go   |   2 +-
 sdks/go/pkg/beam/core/graph/coder/row_encoder.go   |   2 +-
 sdks/go/pkg/beam/encoding.go   |  50 +++-
 sdks/go/pkg/beam/util/shimx/generate.go|  15 ++
 sdks/go/pkg/beam/util/shimx/generate_test.go   |   2 +-
 sdks/go/test/build.gradle  |   8 +-
 .../beam/sdk/options/FileStagingOptions.java}  |  30 +--
 .../beam/sdk/options/PortablePipelineOptions.java  |  18 +-
 .../org/apache/beam/sdk/transforms/Reshuffle.java  |  52 ++--
 .../sql/zetasql/ZetaSqlJavaUdfTypeTest.java| 276 +
 .../harness/data/PCollectionConsumerRegistry.java  |  49 +++-
 sdks/java/io/hadoop-format/build.gradle|   2 +-
 .../beam/sdk/nexmark/NexmarkConfiguration.java |  13 +
 .../apache/beam/sdk/nexmark/NexmarkLauncher.java   |   4 +
 .../apache/beam/sdk/nexmark/NexmarkQueryName.java  |   1 +
 .../apache/beam/sdk/nexmark/queries/Query13.java   | 107 
 .../apache_beam/runners/portability/job_server.py  |   5 +-
 .../runners/portability/spark_runner.py|   4 +-
 sdks/python/apache_beam/utils/subprocess_server.py |  10 +-
 31 files changed, 671 insertions(+), 200 deletions(-)
 copy sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java => core/src/main/java/org/apache/beam/sdk/options/FileStagingOptions.java} (58%)
 create mode 100644 sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTypeTest.java
 create mode 100644 sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query13.java


[beam] branch master updated: Optimize reservoir sampling calculation

2021-04-05 Thread boyuanz

boyuanz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 6ddf0c2  Optimize reservoir sampling calculation
 new bcced0c  Merge pull request #14406 from [BEAM-11836] Optimize reservoir sampling calculation in PCollectionConsumerRegistry
6ddf0c2 is described below

commit 6ddf0c2ff6706883771ec9cb13309101b34b80c4
Author: kileys 
AuthorDate: Fri Apr 2 00:57:38 2021 +

Optimize reservoir sampling calculation
---
 .../harness/data/PCollectionConsumerRegistry.java  | 49 --
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java
index 14d245dc..457cbe8 100644
--- a/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java
+++ b/sdks/java/harness/src/main/java/org/apache/beam/fn/harness/data/PCollectionConsumerRegistry.java
@@ -377,23 +377,46 @@ public class PCollectionConsumerRegistry {
       }
     }
 
-    // Lowest sampling probability: 0.001%.
-    private static final int SAMPLING_TOKEN_UPPER_BOUND = 1000000;
-    private static final int SAMPLING_CUTOFF = 10;
-    private int samplingToken = 0;
+    private static final int RESERVOIR_SIZE = 10;
+    private static final int SAMPLING_THRESHOLD = 30;
+    private long samplingToken = 0;
+    private long nextSamplingToken = 0;
     private Random randomGenerator = new Random();
 
-    // TODO(BEAM-11836): Implement fast approximation for reservoir sampling.
     private boolean shouldSampleElement() {
       // Sampling probability decreases as the element count is increasing.
-      // We unconditionally sample the first samplingCutoff elements. For the
-      // next samplingCutoff elements, the sampling probability drops from 100%
-      // to 50%. The probability of sampling the Nth element is:
-      // min(1, samplingCutoff / N), with an additional lower bound of
-      // samplingCutoff / samplingTokenUpperBound. This algorithm may be refined
-      // later.
-      samplingToken = Math.min(samplingToken + 1, SAMPLING_TOKEN_UPPER_BOUND);
-      return randomGenerator.nextInt(samplingToken) < SAMPLING_CUTOFF;
+      // We unconditionally sample the first samplingCutoff elements. Calculating
+      // nextInt(samplingToken) for each element is expensive, so after a threshold, calculate the
+      // gap to next sample.
+      // https://erikerlandson.github.io/blog/2015/11/20/very-fast-reservoir-sampling/
+
+      // Reset samplingToken if it's going to exceed the max value.
+      if (samplingToken + 1 == Long.MAX_VALUE) {
+        samplingToken = 0;
+        nextSamplingToken = getNextSamplingToken(samplingToken);
+      }
+
+      samplingToken++;
+      // Use traditional sampling until the threshold of 30
+      if (nextSamplingToken == 0) {
+        if (randomGenerator.nextInt((int) samplingToken) <= RESERVOIR_SIZE) {
+          if (samplingToken > SAMPLING_THRESHOLD) {
+            nextSamplingToken = getNextSamplingToken(samplingToken);
+          }
+          return true;
+        }
+      } else if (samplingToken >= nextSamplingToken) {
+        nextSamplingToken = getNextSamplingToken(samplingToken);
+        return true;
+      }
+      return false;
+    }
+
+    private long getNextSamplingToken(long samplingToken) {
+      double gap =
+          Math.log(1.0 - randomGenerator.nextDouble())
+              / Math.log(1.0 - RESERVOIR_SIZE / (double) samplingToken);
+      return samplingToken + (int) gap;
     }
   }
 }
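
The gap-based sampling in the diff above can be tried outside the harness. What follows is a minimal Python sketch under stated assumptions, not the Beam implementation: `GapSampler`, `next_sampling_token`, and the `seed` parameter are hypothetical names introduced for illustration. It also assumes the per-element test is meant to sample the Nth element with probability min(1, RESERVOIR_SIZE / N), so it uses a strict `<` where the Java code uses `<=`, and it omits the overflow reset.

```python
import math
import random

RESERVOIR_SIZE = 10       # same value as in the diff
SAMPLING_THRESHOLD = 30   # same value as in the diff


def next_sampling_token(count, rng):
    """Return the element count at which the next sample is due.

    The gap is drawn from a geometric distribution with success
    probability RESERVOIR_SIZE / count, using one uniform draw.
    Only valid once count exceeds RESERVOIR_SIZE.
    """
    assert count > RESERVOIR_SIZE
    p = RESERVOIR_SIZE / count
    gap = math.log(1.0 - rng.random()) / math.log(1.0 - p)
    return count + int(gap)


class GapSampler:
    """Hypothetical stand-in for the sampling state kept in the diff."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._count = 0
        self._next = 0  # 0 means "still drawing once per element"

    def should_sample(self):
        self._count += 1
        if self._next == 0:
            # Per-element draw; always true for the first RESERVOIR_SIZE elements.
            if self._rng.randrange(self._count) < RESERVOIR_SIZE:
                if self._count > SAMPLING_THRESHOLD:
                    # Switch to gap mode: one random draw per sample from now on.
                    self._next = next_sampling_token(self._count, self._rng)
                return True
            return False
        if self._count >= self._next:
            self._next = next_sampling_token(self._count, self._rng)
            return True
        return False
```

In gap mode the sampler performs one comparison per element and draws a random number only when a sample is taken, which is the cost reduction the commit comment describes.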


[beam] branch master updated (6988b94 -> fe9a406)

2021-04-05 Thread chamikara

chamikara pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 6988b94  Merge pull request #14430: remove typo in encoding.go
 add 91e35b5  Updates Dataflow worker pool config to include all environments used by transforms. Sets environment Ids in the Dataflow worker pool config which will be used to map bundles to SDK Harnesses.
 add fe9a406  Merge pull request #14408: [BEAM-11935] Updates Dataflow config to include all environments used by transforms

No new revisions were added by this update.

Summary of changes:
 .../beam/runners/dataflow/DataflowRunner.java  | 80 ++
 .../beam/runners/dataflow/DataflowRunnerTest.java  | 54 ---
 2 files changed, 82 insertions(+), 52 deletions(-)


[beam] branch master updated (312c6eb -> 6988b94)

2021-04-05 Thread danoliveira

danoliveira pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 312c6eb  Merge pull request #14429 from ibzib/BEAM-12095
 add 09bc53a  remove typo in encoding.go
 add 6988b94  Merge pull request #14430: remove typo in encoding.go

No new revisions were added by this update.

Summary of changes:
 sdks/go/pkg/beam/encoding.go | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


[beam] branch master updated: [BEAM-12095] Fix Spark job server path.

2021-04-05 Thread ibzib

ibzib pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 3e2e630  [BEAM-12095] Fix Spark job server path.
 new 312c6eb  Merge pull request #14429 from ibzib/BEAM-12095
3e2e630 is described below

commit 3e2e630fa1f93e182d324c43ba49c4b4e1862031
Author: Kyle Weaver 
AuthorDate: Mon Apr 5 09:53:38 2021 -0700

[BEAM-12095] Fix Spark job server path.
---
 sdks/python/apache_beam/runners/portability/job_server.py   |  5 +++--
 sdks/python/apache_beam/runners/portability/spark_runner.py |  4 +++-
 sdks/python/apache_beam/utils/subprocess_server.py  | 10 --
 3 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/sdks/python/apache_beam/runners/portability/job_server.py b/sdks/python/apache_beam/runners/portability/job_server.py
index dfc8802..2d581c4 100644
--- a/sdks/python/apache_beam/runners/portability/job_server.py
+++ b/sdks/python/apache_beam/runners/portability/job_server.py
@@ -140,8 +140,9 @@ class JavaJarJobServer(SubprocessJobServer):
     raise NotImplementedError(type(self))
 
   @staticmethod
-  def path_to_beam_jar(gradle_target):
-    return subprocess_server.JavaJarServer.path_to_beam_jar(gradle_target)
+  def path_to_beam_jar(gradle_target, artifact_id=None):
+    return subprocess_server.JavaJarServer.path_to_beam_jar(
+        gradle_target, artifact_id=artifact_id)
 
   @staticmethod
   def local_jar(url):
diff --git a/sdks/python/apache_beam/runners/portability/spark_runner.py b/sdks/python/apache_beam/runners/portability/spark_runner.py
index ec7046c..99337b0 100644
--- a/sdks/python/apache_beam/runners/portability/spark_runner.py
+++ b/sdks/python/apache_beam/runners/portability/spark_runner.py
@@ -82,7 +82,9 @@ class SparkJarJobServer(job_server.JavaJarJobServer):
   self._jar)
       return self._jar
     else:
-      return self.path_to_beam_jar(':runners:spark:2:job-server:shadowJar')
+      return self.path_to_beam_jar(
+          ':runners:spark:2:job-server:shadowJar',
+          artifact_id='beam-runners-spark-job-server')
 
   def java_arguments(
       self, job_port, artifact_port, expansion_port, artifacts_dir):
diff --git a/sdks/python/apache_beam/utils/subprocess_server.py b/sdks/python/apache_beam/utils/subprocess_server.py
index 7ff4261..d7af0b4 100644
--- a/sdks/python/apache_beam/utils/subprocess_server.py
+++ b/sdks/python/apache_beam/utils/subprocess_server.py
@@ -205,12 +205,18 @@ class JavaJarServer(SubprocessServer):
 ])
 
   @classmethod
-  def path_to_beam_jar(cls, gradle_target, appendix=None, version=beam_version):
+  def path_to_beam_jar(
+      cls,
+      gradle_target,
+      appendix=None,
+      version=beam_version,
+      artifact_id=None):
     if gradle_target in cls._BEAM_SERVICES.replacements:
       return cls._BEAM_SERVICES.replacements[gradle_target]
 
     gradle_package = gradle_target.strip(':').rsplit(':', 1)[0]
-    artifact_id = 'beam-' + gradle_package.replace(':', '-')
+    if not artifact_id:
+      artifact_id = 'beam-' + gradle_package.replace(':', '-')
     project_root = os.path.sep.join(
         os.path.abspath(__file__).split(os.path.sep)[:-5])
     local_path = os.path.join(
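
The effect of the new `artifact_id` argument can be seen by isolating the two lines that derive the default. This is a small sketch, not part of the commit: `default_artifact_id` and `resolve_artifact_id` are hypothetical helper names, and only the string handling is copied from the diff.

```python
def default_artifact_id(gradle_target):
    """Derive an artifact id from a Gradle target, as the diff does by default."""
    # Drop the leading ':' and the trailing task name (e.g. ':shadowJar').
    gradle_package = gradle_target.strip(':').rsplit(':', 1)[0]
    return 'beam-' + gradle_package.replace(':', '-')


def resolve_artifact_id(gradle_target, artifact_id=None):
    """Prefer an explicit artifact id; fall back to the derived one."""
    if not artifact_id:
        artifact_id = default_artifact_id(gradle_target)
    return artifact_id


target = ':runners:spark:2:job-server:shadowJar'
print(default_artifact_id(target))
print(resolve_artifact_id(target, artifact_id='beam-runners-spark-job-server'))
```

For the Spark target, the derived default is `beam-runners-spark-2-job-server`, which keeps the `2` path segment, whereas the commit passes `beam-runners-spark-job-server` explicitly.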


[beam] branch master updated (99a3998 -> a01bc08)

2021-04-05 Thread ibzib

ibzib pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 99a3998  [BEAM-12060] Remove overwriting jenkins property. (#14432)
 new 3420488  [BEAM-10925] Roundtrip tests for literals through Java UDF.
 new 7845c58  [BEAM-10925] Simplify test setup.
 new a01bc08  Merge pull request #14418 from ibzib/java-udf-types

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../sql/zetasql/ZetaSqlJavaUdfTypeTest.java| 276 +
 1 file changed, 276 insertions(+)
 create mode 100644 sdks/java/extensions/sql/zetasql/src/test/java/org/apache/beam/sdk/extensions/sql/zetasql/ZetaSqlJavaUdfTypeTest.java


[beam] branch master updated (7134cfd -> 99a3998)

2021-04-05 Thread lostluck

lostluck pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 7134cfd  Merge pull request #14423: [BEAM-4106] Add FileStagingOptions and merge staging file options between runners
 add 99a3998  [BEAM-12060] Remove overwriting jenkins property. (#14432)

No new revisions were added by this update.

Summary of changes:
 .test-infra/jenkins/job_PostCommit_Go.groovy  | 1 -
 .../jenkins/job_PostCommit_Go_ValidatesRunner_Flink.groovy| 1 -
 .../jenkins/job_PostCommit_Go_ValidatesRunner_Spark.groovy| 1 -
 .test-infra/jenkins/job_PreCommit_Go_Portable.groovy  | 1 -
 sdks/go/test/build.gradle | 8 ++++----
 5 files changed, 4 insertions(+), 8 deletions(-)


[beam] branch master updated (abbe14f -> 7134cfd)

2021-04-05 Thread iemejia

iemejia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from abbe14f  [BEAM-12083] Nexmark Query 13. (#14404)
 add 37db190  [BEAM-4106] Add FileStagingOptions and merge staging file options between runners
 add 7134cfd  Merge pull request #14423: [BEAM-4106] Add FileStagingOptions and merge staging file options between runners

No new revisions were added by this update.

Summary of changes:
 .../beam/runners/flink/FlinkPipelineOptions.java   | 18 ++---
 .../options/DataflowPipelineWorkerPoolOptions.java | 19 ++
 .../runners/spark/SparkCommonPipelineOptions.java  | 18 +++--
 .../runners/twister2/Twister2PipelineOptions.java  | 11 +++-
 .../beam/sdk/options/FileStagingOptions.java}  | 30 +++---
 .../beam/sdk/options/PortablePipelineOptions.java  | 18 +
 6 files changed, 26 insertions(+), 88 deletions(-)
 copy sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/BeamSqlSeekableTable.java => core/src/main/java/org/apache/beam/sdk/options/FileStagingOptions.java} (58%)


[beam] branch lostluck-patch-5 created (now 09bc53a)

2021-04-05 Thread lostluck

lostluck pushed a change to branch lostluck-patch-5
in repository https://gitbox.apache.org/repos/asf/beam.git.


  at 09bc53a  remove typo in encoding.go

No new revisions were added by this update.


[beam] branch master updated (5ffb3ee -> abbe14f)

2021-04-05 Thread amaliujia

amaliujia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 5ffb3ee  [BEAM-9615] Misc final schema cleanups. (#14285)
 add abbe14f  [BEAM-12083] Nexmark Query 13. (#14404)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/beam/sdk/transforms/Reshuffle.java  |  52 +-
 .../beam/sdk/nexmark/NexmarkConfiguration.java |  13 +++
 .../apache/beam/sdk/nexmark/NexmarkLauncher.java   |   4 +
 .../apache/beam/sdk/nexmark/NexmarkQueryName.java  |   1 +
 .../apache/beam/sdk/nexmark/queries/Query13.java   | 107 +
 5 files changed, 151 insertions(+), 26 deletions(-)
 create mode 100644 sdks/java/testing/nexmark/src/main/java/org/apache/beam/sdk/nexmark/queries/Query13.java


[beam] branch master updated (c4d1ab3 -> 5ffb3ee)

2021-04-05 Thread lostluck

lostluck pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from c4d1ab3  Merge pull request #14424: [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0
 add 5ffb3ee  [BEAM-9615] Misc final schema cleanups. (#14285)

No new revisions were added by this update.

Summary of changes:
 sdks/go/pkg/beam/coder.go| 14 ++-
 sdks/go/pkg/beam/core/graph/coder/row.go |  3 +-
 sdks/go/pkg/beam/core/graph/coder/row_decoder.go |  2 +-
 sdks/go/pkg/beam/core/graph/coder/row_encoder.go |  2 +-
 sdks/go/pkg/beam/encoding.go | 50 +++-
 sdks/go/pkg/beam/util/shimx/generate.go  | 15 +++
 sdks/go/pkg/beam/util/shimx/generate_test.go |  2 +-
 7 files changed, 81 insertions(+), 7 deletions(-)


[beam] branch master updated (bee495f -> c4d1ab3)

2021-04-05 Thread aromanenko

aromanenko pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from bee495f  Merge pull request #14407: [BEAM-12060] Fix failing Go Postcommits, jenkins support for Gradle tasks.
 add 4477b41  [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0
 add c4d1ab3  Merge pull request #14424: [BEAM-12067] Bump elasticsearch-rest-high-level-client to 7.12.0

No new revisions were added by this update.

Summary of changes:
 sdks/java/io/hadoop-format/build.gradle | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)