[beam] branch master updated: [BEAM-8550] @RequiresTimeSortedInput: working with legacy flink and spark
This is an automated email from the ASF dual-hosted git repository. janl pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/master by this push: new 377f1ac [BEAM-8550] @RequiresTimeSortedInput: working with legacy flink and spark new 041f7af Merge pull request #8774 from je-ik/requires-time-sorted-input-draft: [BEAM-8550] Requires time sorted input 377f1ac is described below commit 377f1ac7ebbc4253299e7efbdb3ad58d0c9e14c5 Author: Jan Lukavsky AuthorDate: Thu Jan 30 13:10:31 2020 +0100 [BEAM-8550] @RequiresTimeSortedInput: working with legacy flink and spark --- .gitignore | 1 + .../pipeline/src/main/proto/beam_runner_api.proto | 4 +- .../translation/operators/ApexParDoOperator.java | 8 +- .../core/construction/ParDoTranslation.java| 8 + .../runners/core/construction/SplittableParDo.java | 5 + .../org/apache/beam/runners/core/DoFnRunners.java | 60 - .../apache/beam/runners/core/SimpleDoFnRunner.java | 4 +- .../beam/runners/core/StatefulDoFnRunner.java | 172 ++--- .../SimplePushbackSideInputDoFnRunnerTest.java | 26 +- .../beam/runners/core/StatefulDoFnRunnerTest.java | 285 ++--- .../apache/beam/runners/direct/ParDoEvaluator.java | 32 ++- .../runners/direct/ParDoMultiOverrideFactory.java | 73 +++--- .../beam/runners/direct/QuiescenceDriver.java | 2 +- .../direct/StatefulParDoEvaluatorFactoryTest.java | 149 +-- .../FlinkBatchPortablePipelineTranslator.java | 11 + .../flink/FlinkBatchTransformTranslators.java | 28 +- .../FlinkStreamingPortablePipelineTranslator.java | 1 - .../flink/FlinkStreamingTransformTranslators.java | 8 - .../utils/FlinkPortableRunnerUtils.java| 58 + .../wrappers/streaming/DoFnOperator.java | 59 +++-- .../streaming/ExecutableStageDoFnOperator.java | 114 + .../wrappers/streaming/SplittableDoFnOperator.java | 6 +- .../wrappers/streaming/WindowDoFnOperator.java | 4 +- .../runners/flink/FlinkPipelineOptionsTest.java| 2 - .../wrappers/streaming/DoFnOperatorTest.java | 21 -- .../streaming/ExecutableStageDoFnOperatorTest.java | 6 +- .../dataflow/PrimitiveParDoSingleFactory.java | 5 + .../runners/samza/runtime/SamzaDoFnRunners.java| 5 +- .../beam/runners/spark/coders/CoderHelpers.java| 47 .../spark/translation/TransformTranslator.java | 175 +++-- .../spark/translation/TransformTranslatorTest.java | 106 .../apache/beam/sdk/runners/AppliedPTransform.java | 9 + .../sdk/testing/UsesRequiresTimeSortedInput.java | 27 ++ .../java/org/apache/beam/sdk/transforms/DoFn.java | 27 ++ .../beam/sdk/transforms/reflect/DoFnSignature.java | 8 + .../sdk/transforms/reflect/DoFnSignatures.java | 2 + .../org/apache/beam/sdk/transforms/ParDoTest.java | 191 +- 37 files changed, 1492 insertions(+), 257 deletions(-) diff --git a/.gitignore b/.gitignore index 5732b9c..f030006 100644 --- a/.gitignore +++ b/.gitignore @@ -8,6 +8,7 @@ # Ignore files generated by the Gradle build process. **/.gradle/**/* **/.gogradle/**/* +**/.nb-gradle/**/* **/gogradle.lock **/build/**/* .test-infra/**/vendor/**/* diff --git a/model/pipeline/src/main/proto/beam_runner_api.proto b/model/pipeline/src/main/proto/beam_runner_api.proto index 57c5295..81e4d2d 100644 --- a/model/pipeline/src/main/proto/beam_runner_api.proto +++ b/model/pipeline/src/main/proto/beam_runner_api.proto @@ -175,7 +175,6 @@ message StandardPTransforms { enum Primitives { // Represents Beam's parallel do operation. // Payload: ParDoPayload. -// TODO(BEAM-3595): Change this to beam:transform:pardo:v1. PAR_DO = 0 [(beam_urn) = "beam:transform:pardo:v1"]; // Represents Beam's flatten operation. @@ -398,6 +397,9 @@ message ParDoPayload { // (Optional) A mapping of local timer family names to timer specifications. map timer_family_specs = 9; + + // Whether this stage requires time sorted input + bool requires_time_sorted_input = 10; } // Parameters that a UDF might require. diff --git a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/operators/ApexParDoOperator.java b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/operators/ApexParDoOperator.java index 4841c6a..8df7997 100644 --- a/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/operators/ApexParDoOperator.java +++ b/runners/apex/src/main/java/org/apache/beam/runners/apex/translation/operators/ApexParDoOperator.java @@ -511,7 +511,13 @@ public class ApexParDoOperator extends BaseOperator doFnRunner = DoFnRunners.defaultStatefulDoFnRunner( - doFn, doFnRunner, windowingStrategy, cleanupTimer, stateCleaner); + doFn, + inputCoder, +
[beam] branch master updated (5f5efc7 -> 00b49f2)
This is an automated email from the ASF dual-hosted git repository. lcwik pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 5f5efc7 [BEAM-9037] Instant and duration as logical type (#10486) add 00b49f2 [BEAM-2645] Define the display data model type No new revisions were added by this update. Summary of changes: .../pipeline/src/main/proto/beam_runner_api.proto | 112 +++-- .../core/construction/DisplayDataTranslation.java | 61 --- .../core/construction/PCollectionTranslation.java | 1 - .../core/construction/PTransformTranslation.java | 2 +- .../construction/DisplayDataTranslationTest.java | 67 5 files changed, 174 insertions(+), 69 deletions(-) create mode 100644 runners/core-construction-java/src/test/java/org/apache/beam/runners/core/construction/DisplayDataTranslationTest.java
[beam] branch master updated (a8af0e1 -> 5f5efc7)
This is an automated email from the ASF dual-hosted git repository. bhulette pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from a8af0e1 Merge pull request #10761 from y1chi/fix_doc add 5f5efc7 [BEAM-9037] Instant and duration as logical type (#10486) No new revisions were added by this update. Summary of changes: .../sdk/schemas/logicaltypes/NanosDuration.java} | 29 .../sdk/schemas/logicaltypes/NanosInstant.java}| 29 .../beam/sdk/schemas/logicaltypes/NanosType.java} | 39 +++ .../sdk/schemas/logicaltypes/LogicalTypesTest.java | 26 .../extensions/protobuf/ProtoByteBuddyUtils.java | 10 ++- .../protobuf/ProtoSchemaLogicalTypes.java | 77 -- .../extensions/protobuf/ProtoSchemaTranslator.java | 8 +-- .../sdk/extensions/protobuf/TestProtoSchemas.java | 12 ++-- 8 files changed, 105 insertions(+), 125 deletions(-) copy sdks/java/{extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/CheckSize.java => core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/NanosDuration.java} (60%) copy sdks/java/{extensions/sql/src/test/java/org/apache/beam/sdk/extensions/sql/impl/rel/CheckSize.java => core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/NanosInstant.java} (59%) copy sdks/java/{extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/meta/provider/test/TestTable.java => core/src/main/java/org/apache/beam/sdk/schemas/logicaltypes/NanosType.java} (55%)
[beam] branch asf-site updated: Publishing website 2020/02/06 01:18:56 at commit a8af0e1
This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/asf-site by this push: new 6ad0c23 Publishing website 2020/02/06 01:18:56 at commit a8af0e1 6ad0c23 is described below commit 6ad0c23c530bae840823f1266cf8c15163fa4ea3 Author: jenkins AuthorDate: Thu Feb 6 01:18:56 2020 + Publishing website 2020/02/06 01:18:56 at commit a8af0e1 --- .../documentation/programming-guide/index.html | 24 ++ 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/website/generated-content/documentation/programming-guide/index.html b/website/generated-content/documentation/programming-guide/index.html index 178854b..7d1dfae 100644 --- a/website/generated-content/documentation/programming-guide/index.html +++ b/website/generated-content/documentation/programming-guide/index.html @@ -3263,10 +3263,6 @@ elements. 7.4.1. Managing late data - - Note: Managing late data is not supported in the Beam SDK for Python. - - You can allow late data by invoking the .withAllowedLateness operation when you set your PCollection’s windowing strategy. The following code example demonstrates a windowing strategy that will allow late data up to two days after @@ -3278,6 +3274,14 @@ the end of a window. .withAllowedLateness(Duration.standardDays(2))); + pc = [Initial PCollection] + pc | beam.WindowInto( + FixedWindows(60), + trigger=trigger_fn, + accumulation_mode=accumulation_mode, + timestamp_combiner=timestamp_combiner, + allowed_lateness=Duration(seconds=2*24*60*60)) # 2 days + When you set .withAllowedLateness on a PCollection, that allowed lateness propagates forward to any subsequent PCollection derived from the first PCollection you applied allowed lateness to. If you want to change the allowed @@ -3555,10 +3559,6 @@ on each firing: 8.4.2. Handling late data - - The Beam SDK for Python does not currently support allowed lateness. - - If you want your pipeline to process data that arrives after the watermark passes the end of the window, you can apply an allowed lateness when you set your windowing configuration. This gives your trigger the opportunity to react @@ -3574,7 +3574,13 @@ windowing function: .plusDelayOf(Duration.standardMinutes(1))) .withAllowedLateness(Duration.standardMinutes(30)); - # The Beam SDK for Python does not currently support allowed lateness. + pc = [Initial PCollection] + pc | beam.WindowInto( +FixedWindows(60), +trigger=AfterProcessingTime(60), +allowed_lateness=1800) # 30 minutes + | ... + This allowed lateness propagates to all PCollections derived as a result of
[beam] branch master updated: Remove managing late data not supported by python sdk note
This is an automated email from the ASF dual-hosted git repository. goenka pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/beam.git The following commit(s) were added to refs/heads/master by this push: new d852114 Remove managing late data not supported by python sdk note new a8af0e1 Merge pull request #10761 from y1chi/fix_doc d852114 is described below commit d85211428f5e39ba59be72ec11510455e89e329e Author: Yichi Zhang AuthorDate: Mon Feb 3 18:09:45 2020 -0800 Remove managing late data not supported by python sdk note --- website/src/documentation/programming-guide.md | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/website/src/documentation/programming-guide.md b/website/src/documentation/programming-guide.md index 510d7cd..ca12d67 100644 --- a/website/src/documentation/programming-guide.md +++ b/website/src/documentation/programming-guide.md @@ -2577,7 +2577,6 @@ elements. 7.4.1. Managing late data {#managing-late-data} -> **Note:** Managing late data is not supported in the Beam SDK for Python. You can allow late data by invoking the `.withAllowedLateness` operation when you set your `PCollection`'s windowing strategy. The following code example @@ -2591,6 +2590,15 @@ the end of a window. .withAllowedLateness(Duration.standardDays(2))); ``` +```py + pc = [Initial PCollection] + pc | beam.WindowInto( + FixedWindows(60), + trigger=trigger_fn, + accumulation_mode=accumulation_mode, + timestamp_combiner=timestamp_combiner, + allowed_lateness=Duration(seconds=2*24*60*60)) # 2 days +``` When you set `.withAllowedLateness` on a `PCollection`, that allowed lateness propagates forward to any subsequent `PCollection` derived from the first `PCollection` you applied allowed lateness to. If you want to change the allowed @@ -2858,7 +2866,6 @@ on each firing: 8.4.2. Handling late data {#handling-late-data} -> The Beam SDK for Python does not currently support allowed lateness. If you want your pipeline to process data that arrives after the watermark passes the end of the window, you can apply an *allowed lateness* when you set @@ -2877,7 +2884,13 @@ windowing function: .withAllowedLateness(Duration.standardMinutes(30)); ``` ```py - # The Beam SDK for Python does not currently support allowed lateness. + pc = [Initial PCollection] + pc | beam.WindowInto( +FixedWindows(60), +trigger=AfterProcessingTime(60), +allowed_lateness=1800) # 30 minutes + | ... + ``` This allowed lateness propagates to all `PCollection`s derived as a result of @@ -3107,4 +3120,4 @@ public class MyMetricsDoFn extends DoFn { context.output(context.element()); } } -``` \ No newline at end of file +```
[beam] branch master updated (0b6b8f2 -> b800652)
This is an automated email from the ASF dual-hosted git repository. robertwb pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 0b6b8f2 Merge pull request #10775: [BEAM-9163] update sphinx_rtd_theme to newest add e15d33f [BEAM-8271] Properly encode/decode StateGetRequest/Response continuation_token add b800652 [BEAM-8271] Properly type StateGetRequest/Response continuation token. No new revisions were added by this update. Summary of changes: sdks/python/apache_beam/runners/portability/fn_api_runner.py | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-)
[beam] branch master updated (1f94c99 -> 0b6b8f2)
This is an automated email from the ASF dual-hosted git repository. udim pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 1f94c99 [BEAM-6703] Make Dataflow ValidatesRunner test use Java 11 in test execution (#10689) add d2594d8 [BEAM-9163] update sphinx_rtd_theme to newest add 0b6b8f2 Merge pull request #10775: [BEAM-9163] update sphinx_rtd_theme to newest No new revisions were added by this update. Summary of changes: sdks/python/tox.ini | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[beam] branch master updated (c0bec87 -> 1f94c99)
This is an automated email from the ASF dual-hosted git repository. lcwik pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from c0bec87 [BEAM-5605] Migrate splittable DoFn methods to use "new" DoFn style argument providing. (#10702) add 1f94c99 [BEAM-6703] Make Dataflow ValidatesRunner test use Java 11 in test execution (#10689) No new revisions were added by this update. Summary of changes: ...mit_Java_ValidatesRunner_Dataflow_Java11.groovy | 28 +- runners/google-cloud-dataflow-java/build.gradle| 33 -- 2 files changed, 21 insertions(+), 40 deletions(-)
[beam-site] branch revert-597-updates_release_2.19.0 created (now cf22724)
This is an automated email from the ASF dual-hosted git repository. boyuanz pushed a change to branch revert-597-updates_release_2.19.0 in repository https://gitbox.apache.org/repos/asf/beam-site.git. at cf22724 Revert "Publish 2.19.0 release" This branch includes the following new commits: new cf22724 Revert "Publish 2.19.0 release" The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[beam] branch master updated (4c92739 -> c0bec87)
This is an automated email from the ASF dual-hosted git repository. lcwik pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from 4c92739 Merge pull request #10774: Fix AvroIO javadoc for deprecated methods add c0bec87 [BEAM-5605] Migrate splittable DoFn methods to use "new" DoFn style argument providing. (#10702) No new revisions were added by this update. Summary of changes: .../runners/core/construction/SplittableParDo.java | 65 ++- .../construction/SplittableParDoNaiveBounded.java | 25 +- .../core/construction/PTransformMatchersTest.java | 4 +- .../core/construction/ParDoTranslationTest.java| 4 +- .../core/construction/SplittableParDoTest.java | 4 +- .../graph/SplittableParDoExpanderTest.java | 6 +- ...TimeBoundedSplittableProcessElementInvoker.java | 5 + .../apache/beam/runners/core/SimpleDoFnRunner.java | 21 + .../core/SplittableParDoViaKeyedWorkItems.java | 14 +- ...BoundedSplittableProcessElementInvokerTest.java | 6 +- .../runners/core/SplittableParDoProcessFnTest.java | 12 +- .../dataflow/DataflowPipelineTranslatorTest.java | 2 +- .../java/org/apache/beam/sdk/transforms/DoFn.java | 169 +-- .../org/apache/beam/sdk/transforms/DoFnTester.java | 6 + .../java/org/apache/beam/sdk/transforms/Watch.java | 10 +- .../reflect/ByteBuddyDoFnInvokerFactory.java | 103 - .../beam/sdk/transforms/reflect/DoFnInvoker.java | 273 --- .../beam/sdk/transforms/reflect/DoFnSignature.java | 71 ++- .../sdk/transforms/reflect/DoFnSignatures.java | 510 - .../splittabledofn/RestrictionTracker.java | 4 +- .../beam/sdk/transforms/SplittableDoFnTest.java| 26 +- .../sdk/transforms/reflect/DoFnInvokersTest.java | 130 -- .../reflect/DoFnSignaturesSplittableDoFnTest.java | 262 --- .../apache/beam/fn/harness/FnApiDoFnRunner.java| 245 +- .../beam/fn/harness/FnApiDoFnRunnerTest.java | 7 +- .../beam/sdk/io/hbase/HBaseReadSplittableDoFn.java | 8 +- 26 files changed, 1362 insertions(+), 630 deletions(-)
[beam] branch master updated (f314eb6 -> 4c92739)
This is an automated email from the ASF dual-hosted git repository. iemejia pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/beam.git. from f314eb6 Merge pull request #10773: [BEAM-9251] Fix :sdks:java:io:kafka:updateOfflineRepository add 03d3ec9 Fix AvroIO javadoc for deprecated methods add 4c92739 Merge pull request #10774: Fix AvroIO javadoc for deprecated methods No new revisions were added by this update. Summary of changes: .../core/src/main/java/org/apache/beam/sdk/io/AvroIO.java| 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-)