[beam] branch master updated: [BEAM-8550] @RequiresTimeSortedInput: working with legacy flink and spark

2020-02-05 Thread janl
This is an automated email from the ASF dual-hosted git repository.

janl pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new 377f1ac  [BEAM-8550] @RequiresTimeSortedInput: working with legacy 
flink and spark
 new 041f7af  Merge pull request #8774 from 
je-ik/requires-time-sorted-input-draft: [BEAM-8550] Requires time sorted input
377f1ac is described below

commit 377f1ac7ebbc4253299e7efbdb3ad58d0c9e14c5
Author: Jan Lukavsky 
AuthorDate: Thu Jan 30 13:10:31 2020 +0100

[BEAM-8550] @RequiresTimeSortedInput: working with legacy flink and spark
---
 .gitignore |   1 +
 .../pipeline/src/main/proto/beam_runner_api.proto  |   4 +-
 .../translation/operators/ApexParDoOperator.java   |   8 +-
 .../core/construction/ParDoTranslation.java|   8 +
 .../runners/core/construction/SplittableParDo.java |   5 +
 .../org/apache/beam/runners/core/DoFnRunners.java  |  60 -
 .../apache/beam/runners/core/SimpleDoFnRunner.java |   4 +-
 .../beam/runners/core/StatefulDoFnRunner.java  | 172 ++---
 .../SimplePushbackSideInputDoFnRunnerTest.java |  26 +-
 .../beam/runners/core/StatefulDoFnRunnerTest.java  | 285 ++---
 .../apache/beam/runners/direct/ParDoEvaluator.java |  32 ++-
 .../runners/direct/ParDoMultiOverrideFactory.java  |  73 +++---
 .../beam/runners/direct/QuiescenceDriver.java  |   2 +-
 .../direct/StatefulParDoEvaluatorFactoryTest.java  | 149 +--
 .../FlinkBatchPortablePipelineTranslator.java  |  11 +
 .../flink/FlinkBatchTransformTranslators.java  |  28 +-
 .../FlinkStreamingPortablePipelineTranslator.java  |   1 -
 .../flink/FlinkStreamingTransformTranslators.java  |   8 -
 .../utils/FlinkPortableRunnerUtils.java|  58 +
 .../wrappers/streaming/DoFnOperator.java   |  59 +++--
 .../streaming/ExecutableStageDoFnOperator.java | 114 +
 .../wrappers/streaming/SplittableDoFnOperator.java |   6 +-
 .../wrappers/streaming/WindowDoFnOperator.java |   4 +-
 .../runners/flink/FlinkPipelineOptionsTest.java|   2 -
 .../wrappers/streaming/DoFnOperatorTest.java   |  21 --
 .../streaming/ExecutableStageDoFnOperatorTest.java |   6 +-
 .../dataflow/PrimitiveParDoSingleFactory.java  |   5 +
 .../runners/samza/runtime/SamzaDoFnRunners.java|   5 +-
 .../beam/runners/spark/coders/CoderHelpers.java|  47 
 .../spark/translation/TransformTranslator.java | 175 +++--
 .../spark/translation/TransformTranslatorTest.java | 106 
 .../apache/beam/sdk/runners/AppliedPTransform.java |   9 +
 .../sdk/testing/UsesRequiresTimeSortedInput.java   |  27 ++
 .../java/org/apache/beam/sdk/transforms/DoFn.java  |  27 ++
 .../beam/sdk/transforms/reflect/DoFnSignature.java |   8 +
 .../sdk/transforms/reflect/DoFnSignatures.java |   2 +
 .../org/apache/beam/sdk/transforms/ParDoTest.java  | 191 +-
 37 files changed, 1492 insertions(+), 257 deletions(-)

diff --git a/.gitignore b/.gitignore
index 5732b9c..f030006 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,6 +8,7 @@
 # Ignore files generated by the Gradle build process.
 **/.gradle/**/*
 **/.gogradle/**/*
+**/.nb-gradle/**/*
 **/gogradle.lock
 **/build/**/*
 .test-infra/**/vendor/**/*
diff --git a/model/pipeline/src/main/proto/beam_runner_api.proto 
b/model/pipeline/src/main/proto/beam_runner_api.proto
index 57c5295..81e4d2d 100644
--- a/model/pipeline/src/main/proto/beam_runner_api.proto
+++ b/model/pipeline/src/main/proto/beam_runner_api.proto
@@ -175,7 +175,6 @@ message StandardPTransforms {
   enum Primitives {
 // Represents Beam's parallel do operation.
 // Payload: ParDoPayload.
-// TODO(BEAM-3595): Change this to beam:transform:pardo:v1.
 PAR_DO = 0 [(beam_urn) = "beam:transform:pardo:v1"];
 
 // Represents Beam's flatten operation.
@@ -398,6 +397,9 @@ message ParDoPayload {
 
   // (Optional) A mapping of local timer family names to timer specifications.
   map timer_family_specs = 9;
+  
+  // Whether this stage requires time sorted input
+  bool requires_time_sorted_input = 10;
 }
 
 // Parameters that a UDF might require.
diff --git 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/operators/ApexParDoOperator.java
 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/operators/ApexParDoOperator.java
index 4841c6a..8df7997 100644
--- 
a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/operators/ApexParDoOperator.java
+++ 
b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/operators/ApexParDoOperator.java
@@ -511,7 +511,13 @@ public class ApexParDoOperator extends 
BaseOperator
 
   doFnRunner =
   DoFnRunners.defaultStatefulDoFnRunner(
-  doFn, doFnRunner, windowingStrategy, cleanupTimer, stateCleaner);
+  doFn,
+  inputCoder,
+ 

[beam] branch master updated (5f5efc7 -> 00b49f2)

2020-02-05 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 5f5efc7  [BEAM-9037] Instant and duration as logical type (#10486)
 add 00b49f2  [BEAM-2645] Define the display data model type

No new revisions were added by this update.

Summary of changes:
 .../pipeline/src/main/proto/beam_runner_api.proto  | 112 +++--
 .../core/construction/DisplayDataTranslation.java  |  61 ---
 .../core/construction/PCollectionTranslation.java  |   1 -
 .../core/construction/PTransformTranslation.java   |   2 +-
 .../construction/DisplayDataTranslationTest.java   |  67 
 5 files changed, 174 insertions(+), 69 deletions(-)
 create mode 100644 
runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DisplayDataTranslationTest.java



[beam] branch master updated (a8af0e1 -> 5f5efc7)

2020-02-05 Thread bhulette
This is an automated email from the ASF dual-hosted git repository.

bhulette pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from a8af0e1  Merge pull request #10761 from y1chi/fix_doc
 add 5f5efc7  [BEAM-9037] Instant and duration as logical type (#10486)

No new revisions were added by this update.

Summary of changes:
 .../sdk/schemas/logicaltypes/NanosDuration.java}   | 29 
 .../sdk/schemas/logicaltypes/NanosInstant.java}| 29 
 .../beam/sdk/schemas/logicaltypes/NanosType.java}  | 39 +++
 .../sdk/schemas/logicaltypes/LogicalTypesTest.java | 26 
 .../extensions/protobuf/ProtoByteBuddyUtils.java   | 10 ++-
 .../protobuf/ProtoSchemaLogicalTypes.java  | 77 --
 .../extensions/protobuf/ProtoSchemaTranslator.java |  8 +--
 .../sdk/extensions/protobuf/TestProtoSchemas.java  | 12 ++--
 8 files changed, 105 insertions(+), 125 deletions(-)
 copy 
sdks/java/{extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/CheckSize.java
 => 
core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/NanosDuration.java} 
(60%)
 copy 
sdks/java/{extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/CheckSize.java
 => 
core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/NanosInstant.java} 
(59%)
 copy 
sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTable.java
 => core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/NanosType.java} 
(55%)



[beam] branch asf-site updated: Publishing website 2020/02/06 01:18:56 at commit a8af0e1

2020-02-05 Thread git-site-role
This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/asf-site by this push:
 new 6ad0c23  Publishing website 2020/02/06 01:18:56 at commit a8af0e1
6ad0c23 is described below

commit 6ad0c23c530bae840823f1266cf8c15163fa4ea3
Author: jenkins 
AuthorDate: Thu Feb 6 01:18:56 2020 +

Publishing website 2020/02/06 01:18:56 at commit a8af0e1
---
 .../documentation/programming-guide/index.html | 24 ++
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git 
a/website/generated-content/documentation/programming-guide/index.html 
b/website/generated-content/documentation/programming-guide/index.html
index 178854b..7d1dfae 100644
--- a/website/generated-content/documentation/programming-guide/index.html
+++ b/website/generated-content/documentation/programming-guide/index.html
@@ -3263,10 +3263,6 @@ elements.
 
 7.4.1. Managing late data
 
-
-  Note: Managing late data is not supported in the Beam 
SDK for Python.
-
-
 You can allow late data by invoking the .withAllowedLateness operation when
 you set your PCollection’s windowing 
strategy. The following code example
 demonstrates a windowing strategy that will allow late data up to two days 
after
@@ -3278,6 +3274,14 @@ the end of a window.
   .withAllowedLateness(Duration.standardDays(2)));
 
 
+   pc = 
[Initial PCollection]
+   pc | beam.WindowInto(
+  FixedWindows(60),
+  trigger=trigger_fn,
+  accumulation_mode=accumulation_mode,
+  timestamp_combiner=timestamp_combiner,
+  allowed_lateness=Duration(seconds=2*24*60*60)) # 2 days
+
 When you set .withAllowedLateness on 
a PCollection, that allowed lateness
 propagates forward to any subsequent PCollection derived from the first
 PCollection you applied allowed 
lateness to. If you want to change the allowed
@@ -3555,10 +3559,6 @@ on each firing:
 
 8.4.2. Handling late data
 
-
-  The Beam SDK for Python does not currently support allowed lateness.
-
-
 If you want your pipeline to process data that arrives after the watermark
 passes the end of the window, you can apply an allowed lateness when 
you set
 your windowing configuration. This gives your trigger the opportunity to react
@@ -3574,7 +3574,13 @@ windowing function:
  .plusDelayOf(Duration.standardMinutes(1)))
   .withAllowedLateness(Duration.standardMinutes(30));
 
-  # The Beam SDK for Python does not 
currently support allowed lateness.
+  pc = 
[Initial PCollection]
+  pc | beam.WindowInto(
+FixedWindows(60),
+trigger=AfterProcessingTime(60),
+allowed_lateness=1800) # 30 minutes
+ | ...
+  
 
 
 This allowed lateness propagates to all PCollections derived as a result of



[beam] branch master updated: Remove managing late data not supported by python sdk note

2020-02-05 Thread goenka
This is an automated email from the ASF dual-hosted git repository.

goenka pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
 new d852114  Remove managing late data not supported by python sdk note
 new a8af0e1  Merge pull request #10761 from y1chi/fix_doc
d852114 is described below

commit d85211428f5e39ba59be72ec11510455e89e329e
Author: Yichi Zhang 
AuthorDate: Mon Feb 3 18:09:45 2020 -0800

Remove managing late data not supported by python sdk note
---
 website/src/documentation/programming-guide.md | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/website/src/documentation/programming-guide.md 
b/website/src/documentation/programming-guide.md
index 510d7cd..ca12d67 100644
--- a/website/src/documentation/programming-guide.md
+++ b/website/src/documentation/programming-guide.md
@@ -2577,7 +2577,6 @@ elements.
 
  7.4.1. Managing late data {#managing-late-data}
 
-> **Note:** Managing late data is not supported in the Beam SDK for Python.
 
 You can allow late data by invoking the `.withAllowedLateness` operation when
 you set your `PCollection`'s windowing strategy. The following code example
@@ -2591,6 +2590,15 @@ the end of a window.
   .withAllowedLateness(Duration.standardDays(2)));
 ```
 
+```py
+   pc = [Initial PCollection]
+   pc | beam.WindowInto(
+  FixedWindows(60),
+  trigger=trigger_fn,
+  accumulation_mode=accumulation_mode,
+  timestamp_combiner=timestamp_combiner,
+  allowed_lateness=Duration(seconds=2*24*60*60)) # 2 days
+```
 When you set `.withAllowedLateness` on a `PCollection`, that allowed lateness
 propagates forward to any subsequent `PCollection` derived from the first
 `PCollection` you applied allowed lateness to. If you want to change the 
allowed
@@ -2858,7 +2866,6 @@ on each firing:
 
  8.4.2. Handling late data {#handling-late-data}
 
-> The Beam SDK for Python does not currently support allowed lateness.
 
 If you want your pipeline to process data that arrives after the watermark
 passes the end of the window, you can apply an *allowed lateness* when you set
@@ -2877,7 +2884,13 @@ windowing function:
   
.withAllowedLateness(Duration.standardMinutes(30));
 ```
 ```py
-  # The Beam SDK for Python does not currently support allowed lateness.
+  pc = [Initial PCollection]
+  pc | beam.WindowInto(
+FixedWindows(60),
+trigger=AfterProcessingTime(60),
+allowed_lateness=1800) # 30 minutes
+ | ...
+  
 ```
 
 This allowed lateness propagates to all `PCollection`s derived as a result of
@@ -3107,4 +3120,4 @@ public class MyMetricsDoFn extends DoFn 
{
 context.output(context.element());
   }
 }
-```  
\ No newline at end of file
+```  



[beam] branch master updated (0b6b8f2 -> b800652)

2020-02-05 Thread robertwb
This is an automated email from the ASF dual-hosted git repository.

robertwb pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 0b6b8f2  Merge pull request #10775: [BEAM-9163] update 
sphinx_rtd_theme to newest
 add e15d33f  [BEAM-8271] Properly encode/decode StateGetRequest/Response 
continuation_token
 add b800652  [BEAM-8271] Properly type StateGetRequest/Response 
continuation token.

No new revisions were added by this update.

Summary of changes:
 sdks/python/apache_beam/runners/portability/fn_api_runner.py | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)



[beam] branch master updated (1f94c99 -> 0b6b8f2)

2020-02-05 Thread udim
This is an automated email from the ASF dual-hosted git repository.

udim pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 1f94c99  [BEAM-6703] Make Dataflow ValidatesRunner test use Java 11 in 
test execution (#10689)
 add d2594d8  [BEAM-9163] update sphinx_rtd_theme to newest
 add 0b6b8f2  Merge pull request #10775: [BEAM-9163] update 
sphinx_rtd_theme to newest

No new revisions were added by this update.

Summary of changes:
 sdks/python/tox.ini | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[beam] branch master updated (c0bec87 -> 1f94c99)

2020-02-05 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from c0bec87  [BEAM-5605] Migrate splittable DoFn methods to use "new" DoFn 
style argument providing. (#10702)
 add 1f94c99  [BEAM-6703] Make Dataflow ValidatesRunner test use Java 11 in 
test execution (#10689)

No new revisions were added by this update.

Summary of changes:
 ...mit_Java_ValidatesRunner_Dataflow_Java11.groovy | 28 +-
 runners/google-cloud-dataflow-java/build.gradle| 33 --
 2 files changed, 21 insertions(+), 40 deletions(-)



[beam-site] branch revert-597-updates_release_2.19.0 created (now cf22724)

2020-02-05 Thread boyuanz
This is an automated email from the ASF dual-hosted git repository.

boyuanz pushed a change to branch revert-597-updates_release_2.19.0
in repository https://gitbox.apache.org/repos/asf/beam-site.git.


  at cf22724  Revert "Publish 2.19.0 release"

This branch includes the following new commits:

 new cf22724  Revert "Publish 2.19.0 release"

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[beam] branch master updated (4c92739 -> c0bec87)

2020-02-05 Thread lcwik
This is an automated email from the ASF dual-hosted git repository.

lcwik pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from 4c92739  Merge pull request #10774: Fix AvroIO javadoc for deprecated 
methods
 add c0bec87  [BEAM-5605] Migrate splittable DoFn methods to use "new" DoFn 
style argument providing. (#10702)

No new revisions were added by this update.

Summary of changes:
 .../runners/core/construction/SplittableParDo.java |  65 ++-
 .../construction/SplittableParDoNaiveBounded.java  |  25 +-
 .../core/construction/PTransformMatchersTest.java  |   4 +-
 .../core/construction/ParDoTranslationTest.java|   4 +-
 .../core/construction/SplittableParDoTest.java |   4 +-
 .../graph/SplittableParDoExpanderTest.java |   6 +-
 ...TimeBoundedSplittableProcessElementInvoker.java |   5 +
 .../apache/beam/runners/core/SimpleDoFnRunner.java |  21 +
 .../core/SplittableParDoViaKeyedWorkItems.java |  14 +-
 ...BoundedSplittableProcessElementInvokerTest.java |   6 +-
 .../runners/core/SplittableParDoProcessFnTest.java |  12 +-
 .../dataflow/DataflowPipelineTranslatorTest.java   |   2 +-
 .../java/org/apache/beam/sdk/transforms/DoFn.java  | 169 +--
 .../org/apache/beam/sdk/transforms/DoFnTester.java |   6 +
 .../java/org/apache/beam/sdk/transforms/Watch.java |  10 +-
 .../reflect/ByteBuddyDoFnInvokerFactory.java   | 103 -
 .../beam/sdk/transforms/reflect/DoFnInvoker.java   | 273 ---
 .../beam/sdk/transforms/reflect/DoFnSignature.java |  71 ++-
 .../sdk/transforms/reflect/DoFnSignatures.java | 510 -
 .../splittabledofn/RestrictionTracker.java |   4 +-
 .../beam/sdk/transforms/SplittableDoFnTest.java|  26 +-
 .../sdk/transforms/reflect/DoFnInvokersTest.java   | 130 --
 .../reflect/DoFnSignaturesSplittableDoFnTest.java  | 262 ---
 .../apache/beam/fn/harness/FnApiDoFnRunner.java| 245 +-
 .../beam/fn/harness/FnApiDoFnRunnerTest.java   |   7 +-
 .../beam/sdk/io/hbase/HBaseReadSplittableDoFn.java |   8 +-
 26 files changed, 1362 insertions(+), 630 deletions(-)



[beam] branch master updated (f314eb6 -> 4c92739)

2020-02-05 Thread iemejia
This is an automated email from the ASF dual-hosted git repository.

iemejia pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git.


from f314eb6  Merge pull request #10773: [BEAM-9251] Fix 
:sdks:java:io:kafka:updateOfflineRepository
 add 03d3ec9  Fix AvroIO javadoc for deprecated methods
 add 4c92739  Merge pull request #10774: Fix AvroIO javadoc for deprecated 
methods

No new revisions were added by this update.

Summary of changes:
 .../core/src/main/java/org/apache/beam/sdk/io/AvroIO.java| 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)