[jira] [Commented] (BEAM-14) Add data integration DSL
[ https://issues.apache.org/jira/browse/BEAM-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148117#comment-15148117 ] Frances Perry commented on BEAM-14: --- I think there's a few rough concepts in here that may need model extensions, but general this seems to be about supporting a different DSL on top of the existing model. > Add data integration DSL > > > Key: BEAM-14 > URL: https://issues.apache.org/jira/browse/BEAM-14 > Project: Beam > Issue Type: New Feature > Components: sdk-ideas >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > > Even if users would still be able to use directly the API, it would be great > to provide a DSL on top of the API covering batch and streaming data > processing but also data integration. > Instead of designing a pipeline as a chain of apply() wrapping function > (DoFn), we can provide a fluent DSL allowing users to directly leverage > keyturn functions. > For instance, an user would be able to design a pipeline like: > {code} > .from(“kafka:localhost:9092?topic=foo”).reduce(...).split(...).wiretap(...).map(...).to(“jms:queue:foo….”); > {code} > The DSL will allow to use existing pipelines, for instance: > {code} > .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo=all") > {code} > So it means that we will have to create a IO Sink that can trigger the > execution of a target pipeline: (from("trigger:other") triggering the > pipeline execution when another pipeline design starts with > pipeline("other")). We can also imagine to mix the runners: the pipeline() > can be on one runner, the from("trigger:other") can be on another runner). > It's not trivial, but it will give strong flexibility and key value for Beam. > In a second step, we can provide DSLs in different languages (the first one > would be Java, but why not providing XML, akka, scala DSLs). > We can note in previous examples that the DSL would also provide data > integration support to bean in addition of data processing. Data Integration > is an extension of Beam API to support some Enterprise Integration Patterns > (EIPs). As we would need metadata for data integration (even if metadata can > also be interesting in stream/batch data processing pipeline), we can provide > a DataxMessage built on top of PCollection. A DataxMessage would contain: > structured headers > binary payload > For instance, the headers can contains an Avro schema to describe the payload. > The headers can also contains useful information coming from the IO Source > (for instance the partition/path where the data comes from, …). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-18) Add support for new Dataflow Sink API
Amit Sela created BEAM-18: - Summary: Add support for new Dataflow Sink API Key: BEAM-18 URL: https://issues.apache.org/jira/browse/BEAM-18 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Amit Sela Assignee: Amit Sela This is the write side counterpart to BEAM-17 Upstream docs are at https://cloud.google.com/dataflow/model/sources-and-sinks#creating-sinks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-17) Add support for new Dataflow Source API
Amit Sela created BEAM-17: - Summary: Add support for new Dataflow Source API Key: BEAM-17 URL: https://issues.apache.org/jira/browse/BEAM-17 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Amit Sela Assignee: Amit Sela The API is discussed in https://cloud.google.com/dataflow/model/sources-and-sinks#creating-sources To implement this, we need to add support for com.google.cloud.dataflow.sdk.io.Read in TransformTranslator. This can be done by creating a new SourceInputFormat class that translates from a DF Source to a Hadoop InputFormat. The two concepts are pretty-well aligned since they both have the concept of splits and readers. Note that when there's a native HadoopSource in DF, it will need special-casing in the code for Read since we'll be able to use the underlying InputFormat directly. This could be tested using XmlSource from the SDK. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-16) Make Spark RDDs readable as PCollections
Amit Sela created BEAM-16: - Summary: Make Spark RDDs readable as PCollections Key: BEAM-16 URL: https://issues.apache.org/jira/browse/BEAM-16 Project: Beam Issue Type: New Feature Reporter: Amit Sela Priority: Minor This could be done by implementing a SparkSource. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-15) Applying windowing to cached RDDs fails
Amit Sela created BEAM-15: - Summary: Applying windowing to cached RDDs fails Key: BEAM-15 URL: https://issues.apache.org/jira/browse/BEAM-15 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Amit Sela Assignee: Amit Sela The Spark runner caches RDDs that are accessed more than once. If applying window operations to a cached RDD, it will fail because windowed RDDs will try to cache with a different cache level - windowing cache level is StorageLevel.MEMORY_ONLY_SER and RDD cache level is StorageLevel.MEMORY_ONLY. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (BEAM-13) Create JMS IO
[ https://issues.apache.org/jira/browse/BEAM-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147586#comment-15147586 ] Jean-Baptiste Onofré edited comment on BEAM-13 at 2/15/16 5:19 PM: --- Added Write and generic connection factory support (allowing to use any JMS broker). Now, I'm fixing issues, add support of queues and topics, and add tests. was (Author: jbonofre): Added Write and generic connection factory support (allowing to use any JMS broker). Now, I'm fixing issue, add support of queues and topics, and add tests. > Create JMS IO > - > > Key: BEAM-13 > URL: https://issues.apache.org/jira/browse/BEAM-13 > Project: Beam > Issue Type: New Feature > Components: sdk-java-extensions >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > > Work in progress: https://github.com/jbonofre/DataflowJavaSDK/tree/IO-JMS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-13) Create JMS IO
[ https://issues.apache.org/jira/browse/BEAM-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147586#comment-15147586 ] Jean-Baptiste Onofré commented on BEAM-13: -- Added Write and generic connection factory support (allowing to use any JMS broker). Now, I'm fixing issue, add support of queues and topics, and add tests. > Create JMS IO > - > > Key: BEAM-13 > URL: https://issues.apache.org/jira/browse/BEAM-13 > Project: Beam > Issue Type: New Feature > Components: sdk-java-extensions >Reporter: Jean-Baptiste Onofré >Assignee: Jean-Baptiste Onofré > > Work in progress: https://github.com/jbonofre/DataflowJavaSDK/tree/IO-JMS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-12) Apply GroupByKey transforms on PCollection of normal type other than KV
[ https://issues.apache.org/jira/browse/BEAM-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147522#comment-15147522 ] Kenneth Knowles commented on BEAM-12: - Note also that the SDK includes a very simple composite {{WithKeys}} transform for just this use. > Apply GroupByKey transforms on PCollection of normal type other than KV > --- > > Key: BEAM-12 > URL: https://issues.apache.org/jira/browse/BEAM-12 > Project: Beam > Issue Type: Improvement > Components: sdk-java-core >Reporter: bakeypan >Assignee: Frances Perry >Priority: Trivial > > Now the GroupByKey transforms can only apply on PCollection>.So I > have to transform PCollection to PCollection > before I want to > apply GroupByKey. > I think we can do better by apply GroupByKey on normal type of PCollection > other than KV.And user can offer one custome extract key function or we can > offer default extract key function.Just like this: > PCollection input = ... > PCollection > result = input.apply(GroupByKey. V>create(new ExtractFn())); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
svn commit: r1730528 - /incubator/beam/website/
Author: jbonofre Date: Mon Feb 15 13:45:15 2016 New Revision: 1730528 URL: http://svn.apache.org/viewvc?rev=1730528=rev Log: [scm-publish] Updating Beam website Modified: incubator/beam/website/building-and-deploying.html incubator/beam/website/clustering.html incubator/beam/website/configuration.html incubator/beam/website/dependency-convergence.html incubator/beam/website/dependency-info.html incubator/beam/website/distribution-management.html incubator/beam/website/getting-started.html incubator/beam/website/index.html incubator/beam/website/integration.html incubator/beam/website/issue-tracking.html incubator/beam/website/license.html incubator/beam/website/mail-lists.html incubator/beam/website/plugin-management.html incubator/beam/website/plugins.html incubator/beam/website/privacy-policy.html incubator/beam/website/project-info.html incubator/beam/website/project-summary.html incubator/beam/website/source-repository.html incubator/beam/website/team-list.html Modified: incubator/beam/website/building-and-deploying.html URL: http://svn.apache.org/viewvc/incubator/beam/website/building-and-deploying.html?rev=1730528=1730527=1730528=diff == --- incubator/beam/website/building-and-deploying.html (original) +++ incubator/beam/website/building-and-deploying.html Mon Feb 15 13:45:15 2016 @@ -1,7 +1,7 @@ @@ -357,7 +357,7 @@ Java HotSpot(TM) 64-Bit Server VM (build Back to top Copyright 2016 http://www.apache.org;>Apache Software Foundation. All Rights Reserved. - Version: 1.0.0-incubating-SNAPSHOT. Last Published: 2016-02-12. + Version: 1.0.0-incubating-SNAPSHOT. Last Published: 2016-02-15. http://github.com/andriusvelykis/reflow-maven-skin; title="Reflow Maven skin">Reflow Maven skin by http://andrius.velykis.lt; target="_blank" title="Andrius Velykis">Andrius Velykis. Modified: incubator/beam/website/clustering.html URL: http://svn.apache.org/viewvc/incubator/beam/website/clustering.html?rev=1730528=1730527=1730528=diff == --- incubator/beam/website/clustering.html (original) +++ incubator/beam/website/clustering.html Mon Feb 15 13:45:15 2016 @@ -1,7 +1,7 @@ @@ -291,7 +291,7 @@ monthlyIndex.numberOfReplicas=1 Back to top Copyright 2016 http://www.apache.org;>Apache Software Foundation. All Rights Reserved. - Version: 1.0.0-incubating-SNAPSHOT. Last Published: 2016-02-12. + Version: 1.0.0-incubating-SNAPSHOT. Last Published: 2016-02-15. http://github.com/andriusvelykis/reflow-maven-skin; title="Reflow Maven skin">Reflow Maven skin by http://andrius.velykis.lt; target="_blank" title="Andrius Velykis">Andrius Velykis. Modified: incubator/beam/website/configuration.html URL: http://svn.apache.org/viewvc/incubator/beam/website/configuration.html?rev=1730528=1730527=1730528=diff == --- incubator/beam/website/configuration.html (original) +++ incubator/beam/website/configuration.html Mon Feb 15 13:45:15 2016 @@ -1,7 +1,7 @@ @@ -416,7 +416,7 @@ ProxyPassReverse / http://localhost:8181 Back to top Copyright 2016 http://www.apache.org;>Apache Software Foundation. All Rights Reserved. - Version: 1.0.0-incubating-SNAPSHOT. Last Published: 2016-02-12. + Version: 1.0.0-incubating-SNAPSHOT. Last Published: 2016-02-15. http://github.com/andriusvelykis/reflow-maven-skin; title="Reflow Maven skin">Reflow Maven skin by http://andrius.velykis.lt; target="_blank" title="Andrius Velykis">Andrius Velykis. Modified: incubator/beam/website/dependency-convergence.html URL: http://svn.apache.org/viewvc/incubator/beam/website/dependency-convergence.html?rev=1730528=1730527=1730528=diff == --- incubator/beam/website/dependency-convergence.html (original) +++ incubator/beam/website/dependency-convergence.html Mon Feb 15 13:45:15 2016 @@ -1,7 +1,7 @@ @@ -257,7 +257,7 @@ Back to top Copyright 2016 http://www.apache.org;>Apache Software Foundation. All Rights
[jira] [Created] (BEAM-14) Add data integration DSL
Jean-Baptiste Onofré created BEAM-14: Summary: Add data integration DSL Key: BEAM-14 URL: https://issues.apache.org/jira/browse/BEAM-14 Project: Beam Issue Type: New Feature Reporter: Jean-Baptiste Onofré Assignee: Jean-Baptiste Onofré Even if users would still be able to use directly the API, it would be great to provide a DSL on top of the API covering batch and streaming data processing but also data integration. Instead of designing a pipeline as a chain of apply() wrapping function (DoFn), we can provide a fluent DSL allowing users to directly leverage keyturn functions. For instance, an user would be able to design a pipeline like: {code} .from(“kafka:localhost:9092?topic=foo”).reduce(...).split(...).wiretap(...).map(...).to(“jms:queue:foo….”); {code} The DSL will allow to use existing pipelines, for instance: {code} .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo=all") {code} So it means that we will have to create a IO Sink that can trigger the execution of a target pipeline: (from("trigger:other") triggering the pipeline execution when another pipeline design starts with pipeline("other")). We can also imagine to mix the runners: the pipeline() can be on one runner, the from("trigger:other") can be on another runner). It's not trivial, but it will give strong flexibility and key value for Beam. In a second step, we can provide DSLs in different languages (the first one would be Java, but why not providing XML, akka, scala DSLs). We can note in previous examples that the DSL would also provide data integration support to bean in addition of data processing. Data Integration is an extension of Beam API to support some Enterprise Integration Patterns (EIPs). As we would need metadata for data integration (even if metadata can also be interesting in stream/batch data processing pipeline), we can provide a DataxMessage built on top of PCollection. A DataxMessage would contain: structured headers binary payload For instance, the headers can contains an Avro schema to describe the payload. The headers can also contains useful information coming from the IO Source (for instance the partition/path where the data comes from, …). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (BEAM-6) Import Spark Runner code
[ https://issues.apache.org/jira/browse/BEAM-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Baptiste Onofré reassigned BEAM-6: --- Assignee: Jean-Baptiste Onofré > Import Spark Runner code > > > Key: BEAM-6 > URL: https://issues.apache.org/jira/browse/BEAM-6 > Project: Beam > Issue Type: Sub-task > Components: runner-spark >Reporter: Frances Perry >Assignee: Jean-Baptiste Onofré > -- This message was sent by Atlassian JIRA (v6.3.4#6332)