[3/3] beam-site git commit: This closes #146
This closes #146 Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/221f388d Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/221f388d Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/221f388d Branch: refs/heads/asf-site Commit: 221f388d3fd75af1e2a4ae2e59815bb96d292fbb Parents: e1dd10a 6ebcb08 Author: Frances Perry <f...@google.com> Authored: Wed Feb 8 10:54:44 2017 -0800 Committer: Frances Perry <f...@google.com> Committed: Wed Feb 8 10:54:44 2017 -0800 -- .../documentation/programming-guide/index.html | 327 +++--- src/documentation/programming-guide.md | 341 +++ 2 files changed, 541 insertions(+), 127 deletions(-) --
[2/3] beam-site git commit: Regenerate website
Regenerate website Project: http://git-wip-us.apache.org/repos/asf/beam-site/repo Commit: http://git-wip-us.apache.org/repos/asf/beam-site/commit/6ebcb08c Tree: http://git-wip-us.apache.org/repos/asf/beam-site/tree/6ebcb08c Diff: http://git-wip-us.apache.org/repos/asf/beam-site/diff/6ebcb08c Branch: refs/heads/asf-site Commit: 6ebcb08cb503a3e58101fc73be11649116111c65 Parents: f277339 Author: Frances Perry <f...@google.com> Authored: Wed Feb 8 10:53:36 2017 -0800 Committer: Frances Perry <f...@google.com> Committed: Wed Feb 8 10:53:36 2017 -0800 -- .../documentation/programming-guide/index.html | 327 --- 1 file changed, 274 insertions(+), 53 deletions(-) -- http://git-wip-us.apache.org/repos/asf/beam-site/blob/6ebcb08c/content/documentation/programming-guide/index.html -- diff --git a/content/documentation/programming-guide/index.html b/content/documentation/programming-guide/index.html index dee4869..9830735 100644 --- a/content/documentation/programming-guide/index.html +++ b/content/documentation/programming-guide/index.html @@ -208,7 +208,7 @@ PCollection: A PCollection represents a distributed data set that your Beam pipeline operates on. The data set can be bounded, meaning it comes from a fixed source like a file, or unbounded, meaning it comes from a continuously updating source via a subscription or other mechanism. Your pipeline typically creates an initial PCollection by reading data from an external data source, but you can also create a PCollection from in-memory data within your driver program. From there, PCollections are the inputs and outputs for each step in your pipeline. -Transform: A Transform represents a data processing operation, or a step, in your pipeline. Every Transform takes one or more PCollection objects as input, performs a processing function that you provide on the elements of that PCollection, and produces one or more output PCollection objects. 
+Transform: A Transform represents a data processing operation, or a step, in your pipeline. Every Transform takes one or more PCollection objects as input, performs a processing function that you provide on the elements of that PCollection, and produces one or more output PCollection objects. I/O Source and Sink: Beam provides Source and Sink APIs to represent reading and writing data, respectively. Source encapsulates the code necessary to read data into your Beam pipeline from some external source, such as cloud file storage or a subscription to a streaming data source. Sink likewise encapsulates the code necessary to write the elements of a PCollection to an external data sink. @@ -248,11 +248,13 @@ -from apache_beam.utils.pipeline_options import PipelineOptions - -# Will parse the arguments passed into the application and construct a PipelineOptions +# Will parse the arguments passed into the application and construct a PipelineOptions object. # Note that --help will print registered options. + +from apache_beam.utils.pipeline_options import PipelineOptions + p = beam.Pipeline(options=PipelineOptions()) + @@ -286,13 +288,8 @@ -import apache_beam as beam - -# Create the pipeline. -p = beam.Pipeline() +lines = p | 'ReadMyFile' >> beam.io.ReadFromText('gs://some/inputData.txt') -# Read the text file into a PCollection. 
-lines = p | 'ReadMyFile' >> beam.io.Read(beam.io.TextFileSource("protocol://path/to/some/inputData.txt")) @@ -327,20 +324,18 @@ -import apache_beam as beam +p = beam.Pipeline(options=pipeline_options) -# python list -lines = [ - "To be, or not to be: that is the question: ", - "Whether 'tis nobler in the mind to suffer ", - "The slings and arrows of outrageous fortune, ", - "Or to take arms against a sea of troubles, " -] +(p + | beam.Create([ + 'To be, or not to be: that is the question: ', + 'Whether \'tis nobler in the mind to suffer ', + 'The slings and arrows of outrageous fortune, ', + 'Or to take arms against a sea of troubles, ']) + | beam.io.WriteToText(my_options.output)) -# Create the pipeline. -p = beam.Pipeline() +result = p.run() -collection = p | 'ReadMyLines' >> beam.Create(lines) @@ -401,8 +396,8 @@ How you apply your pipeline's transforms determines the structure of your pipeline. The best way to think of your pipeline is as a directed acyclic graph, where the nodes are PCollections and the edges are transforms. For example, you can chain transforms to create a sequential pipeline, like this one: [Final Output PCollection] = [Initial Input PCollection].apply([First Transform]) - .apply([Second Transform]) -
[jira] [Created] (BEAM-1556) Spark executors need to register IO factories
Frances Perry created BEAM-1556: --- Summary: Spark executors need to register IO factories Key: BEAM-1556 URL: https://issues.apache.org/jira/browse/BEAM-1556 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Frances Perry Assignee: Amit Sela The Spark executors need to call IOChannelUtils.registerIOFactories(options) in order to support GCS files and make the default WordCount example work. Context in this thread: https://lists.apache.org/thread.html/469a139c9eb07e64e514cdea42ab8000678ab743794a090c365205d7@%3Cuser.beam.apache.org%3E -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (BEAM-370) Remove the .named() methods from PTransforms and sub-classes
[ https://issues.apache.org/jira/browse/BEAM-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15771289#comment-15771289 ] Frances Perry commented on BEAM-370: Can this issue be closed? > Remove the .named() methods from PTransforms and sub-classes > > > Key: BEAM-370 > URL: https://issues.apache.org/jira/browse/BEAM-370 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: Ben Chambers >Assignee: Ben Chambers >Priority: Minor > Labels: backward-incompatible > > 1. Update examples/tests/etc. to use named application instead of `.named()` > 2. Remove the `.named()` methods from composite PTransforms > 3. Where appropriate, use the PTransform constructor which takes a string > to use as the default name. > See further discussion in the related thread > (http://mail-archives.apache.org/mod_mbox/incubator-beam-dev/201606.mbox/%3ccan-7fgzuz1f_szzd2orfyd2pk2_prymhgwjepjpefp01h7s...@mail.gmail.com%3E). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-501) Update website skin
[ https://issues.apache.org/jira/browse/BEAM-501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958315#comment-15958315 ] Frances Perry commented on BEAM-501: Checked with JB that he's not actively working on this right now. Reassigning to Jeremy, who has some great thoughts ;-) > Update website skin > --- > > Key: BEAM-501 > URL: https://issues.apache.org/jira/browse/BEAM-501 > Project: Beam > Issue Type: Improvement > Components: website > Reporter: Frances Perry >Assignee: Jeremy Weinstein > > Update the main landing page and website skin as discussed here > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (BEAM-501) Update website skin
[ https://issues.apache.org/jira/browse/BEAM-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry reassigned BEAM-501: -- Assignee: Jeremy Weinstein (was: Jean-Baptiste Onofré) > Update website skin > --- > > Key: BEAM-501 > URL: https://issues.apache.org/jira/browse/BEAM-501 > Project: Beam > Issue Type: Improvement > Components: website > Reporter: Frances Perry >Assignee: Jeremy Weinstein > > Update the main landing page and website skin as discussed here > https://docs.google.com/document/d/1-0jMv7NnYp0Ttt4voulUMwVe_qjBYeNMLm2LusYF3gQ/edit > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (BEAM-1247) Session state should not be lost when discardingFiredPanes
[ https://issues.apache.org/jira/browse/BEAM-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977091#comment-15977091 ] Frances Perry commented on BEAM-1247: - Any updates on this one? > Session state should not be lost when discardingFiredPanes > -- > > Key: BEAM-1247 > URL: https://issues.apache.org/jira/browse/BEAM-1247 > Project: Beam > Issue Type: Bug > Components: beam-model, runner-core >Reporter: Kenneth Knowles >Assignee: Kenneth Knowles >Priority: Critical > Labels: backward-incompatible > Fix For: First stable release > > > Today when {{discardingFiredPanes}} the entirety of state is cleared, > including the state of evolving sessions. This means that with multiple > triggerings a single session shows up as multiple. This also stymies > downstream stateful computations. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (BEAM-170) Session windows should not be identified by their bounds
[ https://issues.apache.org/jira/browse/BEAM-170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977095#comment-15977095 ] Frances Perry commented on BEAM-170: Ilya, should we find a new owner for this? > Session windows should not be identified by their bounds > > > Key: BEAM-170 > URL: https://issues.apache.org/jira/browse/BEAM-170 > Project: Beam > Issue Type: Bug > Components: sdk-java-core >Reporter: Kenneth Knowles >Assignee: Ilya Ganelin > Labels: backward-incompatible > Fix For: First stable release > > > Today, if two session windows for the same key have the same bounds, they are > considered the same window. This is an accident. It is not intended that any > session windows are considered equal except via the operation of merging them > into the same session. > A risk associated with this behavior is that two windows that happen to > coincide will share per-window-and-key state rather than evolving separately > and having their separate state reconciled by state merging logic. These code > paths are not required to be coherent, and in practice they are not. > In particular, if the trigger for a session window ever finishes, then > subsequent data in a window with the same bounds will be dropped, whereas if > it had differed by a millisecond it would have created a new session, > ignoring the previously closed session. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
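The state-sharing accident described in BEAM-170 above is easy to sketch without Beam: if per-window state is keyed by a window's bounds, two independent sessions that happen to coincide collapse into one state cell. This is a stdlib-only illustration; the dict and function names are ours, not Beam internals.

```python
# Sketch of the BEAM-170 hazard: keying session state by (start, end)
# bounds merges windows that are only accidentally identical.

state_by_bounds = {}

def add_to_session(bounds, value):
    # Per-window-and-key state, (wrongly) identified by bounds alone.
    state_by_bounds.setdefault(bounds, []).append(value)

# Two logically distinct sessions that happen to share identical bounds:
add_to_session((0, 10), "session-A-event")
add_to_session((0, 10), "session-B-event")

# Their state is accidentally shared: one entry instead of two sessions.
print(len(state_by_bounds))  # -> 1
```

Identifying sessions by a unique id instead of their bounds, as the issue proposes, would keep the two state cells separate until an explicit merge reconciles them.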
[jira] [Commented] (BEAM-622) Add checkpointing tests for DoFnOperator and WindowDoFnOperator
[ https://issues.apache.org/jira/browse/BEAM-622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15977100#comment-15977100 ] Frances Perry commented on BEAM-622: Is this a blocker for the first stable release? Is there a good owner if so? > Add checkpointing tests for DoFnOperator and WindowDoFnOperator > > > Key: BEAM-622 > URL: https://issues.apache.org/jira/browse/BEAM-622 > Project: Beam > Issue Type: Test > Components: runner-flink >Affects Versions: 0.3.0-incubating >Reporter: Maximilian Michels > Fix For: First stable release > > > Tests which test the correct snapshotting of these two operators are missing. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (BEAM-1624) Unable to deserialize Coder in DataflowRunner
Frances Perry created BEAM-1624: --- Summary: Unable to deserialize Coder in DataflowRunner Key: BEAM-1624 URL: https://issues.apache.org/jira/browse/BEAM-1624 Project: Beam Issue Type: Bug Components: runner-dataflow Reporter: Frances Perry Assignee: Davor Bonaci Priority: Blocker Fix For: 0.6.0 To repro, sync to head and run the LeaderBoard example with the Dataflow runner Does not repro in 0.5. Caused by: java.lang.RuntimeException: Unable to deserialize Coder: WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder). Check that a suitable constructor is defined. See Coder for details. at org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413) at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486) at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) at 
org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231) at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206) at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363) at org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153) at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505) at org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150) at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210) at org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340) ... 6 more Caused by: java.lang.RuntimeException: Unable to deserialize class interface org.apache.beam.sdk.coders.Coder at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102) at org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112) ... 29 more -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (BEAM-1624) Unable to deserialize Coder in DataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-1624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896112#comment-15896112 ] Frances Perry commented on BEAM-1624: - [~altay] FYI -- considering this 0.6 release blocking until triaged. > Unable to deserialize Coder in DataflowRunner > - > > Key: BEAM-1624 > URL: https://issues.apache.org/jira/browse/BEAM-1624 > Project: Beam > Issue Type: Bug > Components: runner-dataflow > Reporter: Frances Perry >Assignee: Davor Bonaci >Priority: Blocker > Fix For: 0.6.0 > > > To repro, sync to head and run the LeaderBoard example with the Dataflow > runner > Does not repro in 0.5. > Caused by: java.lang.RuntimeException: Unable to deserialize Coder: > WindowedValue$FullWindowedValueCoder(KvCoder(BigQueryIO$ShardedKeyCoder(StringUtf8Coder),BigQueryIO$TableRowInfoCoder),IntervalWindow$IntervalWindowCoder). > Check that a suitable constructor is defined. See Coder for details. > at > org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:115) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:655) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator$StepTranslator.addOutput(DataflowPipelineTranslator.java:602) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translateOutputs(DataflowPipelineTranslator.java:945) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator.access$1200(DataflowPipelineTranslator.java:111) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translateMultiHelper(DataflowPipelineTranslator.java:836) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:826) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator$6.translate(DataflowPipelineTranslator.java:823) > at > 
org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.visitPrimitiveTransform(DataflowPipelineTranslator.java:413) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:486) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:481) > at > org.apache.beam.sdk.runners.TransformHierarchy$Node.access$400(TransformHierarchy.java:231) > at > org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:206) > at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:321) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator$Translator.translate(DataflowPipelineTranslator.java:363) > at > org.apache.beam.runners.dataflow.DataflowPipelineTranslator.translate(DataflowPipelineTranslator.java:153) > at > org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:505) > at > org.apache.beam.runners.dataflow.DataflowRunner.run(DataflowRunner.java:150) > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:210) > at > org.apache.beam.examples.complete.game.GameStats.main(GameStats.java:340) > ... 6 more > Caused by: java.lang.RuntimeException: Unable to deserialize class interface > org.apache.beam.sdk.coders.Coder > at org.apache.beam.sdk.util.Serializer.deserialize(Serializer.java:102) > at > org.apache.beam.sdk.util.SerializableUtils.ensureSerializable(SerializableUtils.java:112) > ... 29 more -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (BEAM-1627) Composite/DisplayData structure changed
Frances Perry created BEAM-1627: --- Summary: Composite/DisplayData structure changed Key: BEAM-1627 URL: https://issues.apache.org/jira/browse/BEAM-1627 Project: Beam Issue Type: Bug Components: runner-dataflow Reporter: Frances Perry Assignee: Thomas Groh Priority: Blocker Fix For: 0.6.0 When running at head, pipeline composite structure has changed. My guess is this is related to pull/2145. (1) Steps that used to be leaf nodes are now expandable composites with a ParMultiDo inside them. (2) For some (but not all), display data appears to be lost. This can be seen pretty clearly in the Dataflow monitoring UI. Attached screenshots showing -- ParseGameEvent transform leaks an extra level of composite. -- FixedWindows transform leaks an extra composite and loses display data. [~tgroh] can you triage? [~altay] FYI potential 0.6 release blocker -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (BEAM-1627) Composite/DisplayData structure changed
[ https://issues.apache.org/jira/browse/BEAM-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry updated BEAM-1627: Attachment: ParseGame-0.5.png ParseGame-snapshot-extraComposite.png FixedWindows-0.5.png FixedWindows-snapshot-extraComposite-noDisplayData.png > Composite/DisplayData structure changed > --- > > Key: BEAM-1627 > URL: https://issues.apache.org/jira/browse/BEAM-1627 > Project: Beam > Issue Type: Bug > Components: runner-dataflow > Reporter: Frances Perry >Assignee: Thomas Groh >Priority: Blocker > Fix For: 0.6.0 > > Attachments: FixedWindows-0.5.png, > FixedWindows-snapshot-extraComposite-noDisplayData.png, ParseGame-0.5.png, > ParseGame-snapshot-extraComposite.png > > > When running at head, pipeline composite structure has changed. My guess is > this is related to pull/2145. > (1) Steps that used to be leaf nodes are now expandable composites with a > ParMultiDo inside them. > (2) For some (but not all), display data appears to be lost. > This can be seen pretty clearly in the Dataflow monitoring UI. Attached > screenshots showing > -- ParseGameEvent transform leaks an extra level of composite. > -- FixedWindows transform leaks an extra composite and loses display data. > [~tgroh] can you triage? > [~altay] FYI potential 0.6 release blocker -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (BEAM-1069) Add CountingInput Transform to python sdk
[ https://issues.apache.org/jira/browse/BEAM-1069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry reassigned BEAM-1069: --- Assignee: Tibor Kiss (was: Frances Perry) > Add CountingInput Transform to python sdk > - > > Key: BEAM-1069 > URL: https://issues.apache.org/jira/browse/BEAM-1069 > Project: Beam > Issue Type: Improvement > Components: sdk-py >Reporter: Vikas Kedigehalli >Assignee: Tibor Kiss >Priority: Minor > Labels: starter > > Similar to java sdk, > https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/CountingInput.java -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (BEAM-1976) Allow only one runner profile active at once in examples and archetypes
[ https://issues.apache.org/jira/browse/BEAM-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry reassigned BEAM-1976: --- Assignee: (was: Frances Perry) > Allow only one runner profile active at once in examples and archetypes > --- > > Key: BEAM-1976 > URL: https://issues.apache.org/jira/browse/BEAM-1976 > Project: Beam > Issue Type: Sub-task > Components: examples-java >Reporter: Aviem Zur > > Since only one SLF4J logger binding is allowed in the classpath, we shouldn't > allow more than one runner profile to be active at once in our > examples/archetype modules since different runners use different bindings. > Also, remove slf4j-jdk14 dependency from root and place it instead in > direct-runner and dataflow-runner profiles, for the same reason. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (BEAM-91) Retractions
[ https://issues.apache.org/jira/browse/BEAM-91?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry reassigned BEAM-91: - Assignee: (was: Frances Perry) > Retractions > --- > > Key: BEAM-91 > URL: https://issues.apache.org/jira/browse/BEAM-91 > Project: Beam > Issue Type: New Feature > Components: beam-model >Reporter: Tyler Akidau > Original Estimate: 672h > Remaining Estimate: 672h > > We still haven't added retractions to Beam, even though they're a core part > of the model. We should document all the necessary aspects (uncombine, > reverting DoFn output with DoOvers, sink integration, source-level > retractions, etc), and then implement them. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (BEAM-2450) Transform names and named applications should not be null or empty
[ https://issues.apache.org/jira/browse/BEAM-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry reassigned BEAM-2450: --- Assignee: (was: Frances Perry) > Transform names and named applications should not be null or empty > -- > > Key: BEAM-2450 > URL: https://issues.apache.org/jira/browse/BEAM-2450 > Project: Beam > Issue Type: Bug > Components: beam-model, sdk-java-core, sdk-py >Reporter: Scott Wegner >Priority: Minor > > The Beam SDK allows setting the name of a transform [1] and also naming the > transform application [2]. If no name is specified on application, the name > of the transform is used. If no name is specified for the transform, the > class name is used. > The application name serves as metadata for the applied PTransforms in the > constructed graph. They are effectively extra display data (historically, > PTransform names predate display data). The names are used by runners for UI > and monitoring applications, such as the displayed pipeline graph in the > Dataflow Monitoring UI [3]. > Currently there is no explicit validation on the specified application name. > The current behavior seems to be: > * null application names cause a NullPointerException at construction time. > * Specifying the empty string compiles and succeeds in the DirectRunner, but > causes strange behavior in Dataflow when rendering the graph in the UI. I > have not tested the behavior of other runners. > We should add explicit validation in the model on the specified transform > name and application name. I propose that we disallow null and empty names. > This is technically a breaking change as the SDK currently allows the empty > string, but only because it is under-specified. The upgrade path for any > pipelines broken by this change is simple: specify a non-empty name or > fall back to the default class name. 
> [1] > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L236 > [2] > https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/PCollection.java#L295 > [3] > https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf#viewing-a-pipeline -- This message was sent by Atlassian JIRA (v6.4.14#64029)
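The validation proposed in BEAM-2450 above could look roughly like this. This is a hedged Python sketch of the idea only: the helper name and its fallback behavior are ours, not part of any Beam SDK.

```python
# Hypothetical sketch of BEAM-2450's proposal: reject null (None) and
# empty transform/application names at construction time, instead of an
# NPE later or a broken monitoring-UI graph. Not a real Beam SDK API.

def validate_transform_name(name, default=None):
    """Return a usable name, falling back to `default` (e.g. class name)."""
    if name is None or name == "":
        if default:
            # Upgrade path from the issue: fall back to the class name.
            return default
        raise ValueError("transform name must be non-null and non-empty")
    return name

print(validate_transform_name("", default="TextIO.Read"))  # -> TextIO.Read
```

Under this scheme, `validate_transform_name(None)` with no default raises immediately, matching the issue's goal of failing fast at graph-construction time.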
[jira] [Closed] (BEAM-193) Port existing Dataflow SDK documentation to Beam Programming Guide
[ https://issues.apache.org/jira/browse/BEAM-193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frances Perry closed BEAM-193. -- Resolution: Fixed Fix Version/s: Not applicable > Port existing Dataflow SDK documentation to Beam Programming Guide > -- > > Key: BEAM-193 > URL: https://issues.apache.org/jira/browse/BEAM-193 > Project: Beam > Issue Type: Task > Components: website >Reporter: Devin Donnelly >Assignee: Melissa Pashniak > Fix For: Not applicable > > > There is an extensive amount of documentation on the Dataflow SDK programming > model and classes. Port this documentation over as a new Beam Programming > Guide covering the following major topics: > - Programming model overview > - Pipeline structure > - PCollections > - Transforms > - I/O -- This message was sent by Atlassian JIRA (v6.3.15#6346)