[jira] [Commented] (BEAM-14) Add data integration DSL

2016-02-15 Thread Frances Perry (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148117#comment-15148117
 ] 

Frances Perry commented on BEAM-14:
---

I think there are a few rough concepts in here that may need model extensions, 
but in general this seems to be about supporting a different DSL on top of the 
existing model.

> Add data integration DSL
> 
>
> Key: BEAM-14
> URL: https://issues.apache.org/jira/browse/BEAM-14
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-ideas
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Even if users would still be able to use the API directly, it would be great 
> to provide a DSL on top of the API covering batch and streaming data 
> processing, but also data integration.
> Instead of designing a pipeline as a chain of apply() calls wrapping functions 
> (DoFn), we can provide a fluent DSL allowing users to directly leverage 
> turnkey functions.
> For instance, a user would be able to design a pipeline like:
> {code}
> .from(“kafka:localhost:9092?topic=foo”).reduce(...).split(...).wiretap(...).map(...).to(“jms:queue:foo….”);
> {code}
> The DSL will also allow reuse of existing pipelines, for instance:
> {code}
> .from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo=all")
> {code}
> This means we will have to create an IO sink that can trigger the 
> execution of a target pipeline: from("trigger:other") triggers the 
> pipeline execution when another pipeline design starts with 
> pipeline("other"). We can also imagine mixing runners: the pipeline() 
> can be on one runner, and the from("trigger:other") on another. 
> It's not trivial, but it will give strong flexibility and key value for Beam.
> In a second step, we can provide DSLs in different languages (the first one 
> would be Java, but why not XML, Akka, or Scala DSLs).
> We can note in the previous examples that the DSL would also provide data 
> integration support to Beam in addition to data processing. Data integration 
> is an extension of the Beam API to support some Enterprise Integration Patterns 
> (EIPs). As we would need metadata for data integration (even if metadata can 
> also be interesting in a stream/batch data processing pipeline), we can provide 
> a DataxMessage built on top of PCollection. A DataxMessage would contain:
> * structured headers
> * binary payload
> For instance, the headers can contain an Avro schema describing the payload.
> The headers can also contain useful information coming from the IO source 
> (for instance the partition/path where the data comes from, …).
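The quoted examples use Camel-style endpoint URIs. As a rough, stdlib-only illustration (hypothetical names, not an actual Beam or proposed API), a fluent facade like this could simply record endpoints and transform steps, then hand the plan to an apply()-based core:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch (names invented): a fluent facade that records
// endpoint URIs and transform steps the way a .from(...)...to(...) DSL
// could wrap a chain of apply() calls under the hood.
public class FluentDslSketch {
    static class Flow {
        private final List<String> plan = new ArrayList<>();

        static Flow from(String uri) {
            Flow f = new Flow();
            f.plan.add("from(" + uri + ")");
            return f;
        }
        Flow map(Function<String, String> fn) { plan.add("map"); return this; }
        Flow reduce()                         { plan.add("reduce"); return this; }
        List<String> to(String uri) {
            plan.add("to(" + uri + ")");
            return plan;
        }
    }

    public static void main(String[] args) {
        List<String> plan = Flow.from("kafka:localhost:9092?topic=foo")
                .reduce()
                .map(s -> s)
                .to("jms:queue:foo");
        System.out.println(plan); // prints the recorded plan, in order
    }
}
```

A real implementation would translate each recorded step into the corresponding PTransform at build time; the sketch only shows the chaining shape.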



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BEAM-18) Add support for new Dataflow Sink API

2016-02-15 Thread Amit Sela (JIRA)
Amit Sela created BEAM-18:
-

 Summary: Add support for new Dataflow Sink API
 Key: BEAM-18
 URL: https://issues.apache.org/jira/browse/BEAM-18
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela


This is the write-side counterpart to BEAM-17.
Upstream docs are at 
https://cloud.google.com/dataflow/model/sources-and-sinks#creating-sinks





[jira] [Created] (BEAM-17) Add support for new Dataflow Source API

2016-02-15 Thread Amit Sela (JIRA)
Amit Sela created BEAM-17:
-

 Summary: Add support for new Dataflow Source API
 Key: BEAM-17
 URL: https://issues.apache.org/jira/browse/BEAM-17
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela


The API is discussed in 
https://cloud.google.com/dataflow/model/sources-and-sinks#creating-sources

To implement this, we need to add support for 
com.google.cloud.dataflow.sdk.io.Read in TransformTranslator. This can be done 
by creating a new SourceInputFormat class that translates from a DF Source to a 
Hadoop InputFormat. The two concepts are pretty well aligned, since both 
have the notion of splits and readers.

Note that when there's a native HadoopSource in DF, it will need special-casing 
in the code for Read since we'll be able to use the underlying InputFormat 
directly.

This could be tested using XmlSource from the SDK.
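Since both models revolve around splits and readers, the adapter is mostly mechanical. A stdlib-only sketch with invented minimal interfaces (not the actual Dataflow Source or Hadoop InputFormat types):

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: adapting a Dataflow-style Source, which knows how to
// split itself and create readers, into the split/record-reader shape a
// Hadoop InputFormat expects. All names here are invented for illustration.
public class SourceAdapterSketch {
    interface Source<T> {
        List<Source<T>> split(int desiredShards); // like splitIntoBundles()
        Iterator<T> createReader();               // like createReader()
    }

    static class RangeSource implements Source<Integer> {
        final int start, end;
        RangeSource(int start, int end) { this.start = start; this.end = end; }
        public List<Source<Integer>> split(int shards) {
            int mid = (start + end) / 2;
            return Arrays.asList(new RangeSource(start, mid), new RangeSource(mid, end));
        }
        public Iterator<Integer> createReader() {
            return java.util.stream.IntStream.range(start, end).iterator();
        }
    }

    // The adapter's core loop: each Source split becomes one "input split";
    // each reader becomes the record reader for that split.
    static <T> int countRecords(Source<T> source, int shards) {
        int n = 0;
        for (Source<T> split : source.split(shards)) {
            Iterator<T> reader = split.createReader();
            while (reader.hasNext()) { reader.next(); n++; }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countRecords(new RangeSource(0, 10), 2)); // prints 10
    }
}
```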





[jira] [Created] (BEAM-16) Make Spark RDDs readable as PCollections

2016-02-15 Thread Amit Sela (JIRA)
Amit Sela created BEAM-16:
-

 Summary: Make Spark RDDs readable as PCollections
 Key: BEAM-16
 URL: https://issues.apache.org/jira/browse/BEAM-16
 Project: Beam
  Issue Type: New Feature
Reporter: Amit Sela
Priority: Minor


This could be done by implementing a SparkSource.





[jira] [Created] (BEAM-15) Applying windowing to cached RDDs fails

2016-02-15 Thread Amit Sela (JIRA)
Amit Sela created BEAM-15:
-

 Summary: Applying windowing to cached RDDs fails
 Key: BEAM-15
 URL: https://issues.apache.org/jira/browse/BEAM-15
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela


The Spark runner caches RDDs that are accessed more than once. If window 
operations are applied to a cached RDD, the job fails because the windowed RDD 
tries to cache at a different storage level: windowing caches at 
StorageLevel.MEMORY_ONLY_SER, while the RDD cache uses StorageLevel.MEMORY_ONLY.
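The failure mode can be illustrated with a stdlib-only mock of Spark's rule that an RDD's storage level cannot change once assigned (not actual Spark code; MockRDD is invented for illustration):

```java
// Stdlib-only mock of Spark's rule that an RDD's storage level cannot be
// changed after it has been assigned; not actual Spark code.
public class StorageLevelSketch {
    enum StorageLevel { MEMORY_ONLY, MEMORY_ONLY_SER }

    static class MockRDD {
        private StorageLevel level; // null until first persist()
        MockRDD persist(StorageLevel requested) {
            if (level != null && level != requested) {
                throw new UnsupportedOperationException(
                    "Cannot change storage level of an RDD after it was already assigned a level");
            }
            level = requested;
            return this;
        }
    }

    public static void main(String[] args) {
        MockRDD rdd = new MockRDD();
        rdd.persist(StorageLevel.MEMORY_ONLY);         // runner caches the RDD
        try {
            rdd.persist(StorageLevel.MEMORY_ONLY_SER); // windowing re-caches
        } catch (UnsupportedOperationException e) {
            System.out.println("conflict: " + e.getMessage());
        }
    }
}
```

A fix would presumably have the runner pick one storage level for both paths, or skip re-persisting an already-cached RDD.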





[jira] [Comment Edited] (BEAM-13) Create JMS IO

2016-02-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147586#comment-15147586
 ] 

Jean-Baptiste Onofré edited comment on BEAM-13 at 2/15/16 5:19 PM:
---

Added Write and generic connection factory support (allowing use of any JMS 
broker). Now I'm fixing issues, adding support for queues and topics, and 
adding tests.


was (Author: jbonofre):
Added Write and generic connection factory support (allowing to use any JMS 
broker). Now, I'm fixing issue, add support of queues and topics, and add tests.

> Create JMS IO
> -
>
> Key: BEAM-13
> URL: https://issues.apache.org/jira/browse/BEAM-13
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Work in progress: https://github.com/jbonofre/DataflowJavaSDK/tree/IO-JMS





[jira] [Commented] (BEAM-13) Create JMS IO

2016-02-15 Thread JIRA

[ 
https://issues.apache.org/jira/browse/BEAM-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147586#comment-15147586
 ] 

Jean-Baptiste Onofré commented on BEAM-13:
--

Added Write and generic connection factory support (allowing use of any JMS 
broker). Now I'm fixing issues, adding support for queues and topics, and adding tests.

> Create JMS IO
> -
>
> Key: BEAM-13
> URL: https://issues.apache.org/jira/browse/BEAM-13
> Project: Beam
>  Issue Type: New Feature
>  Components: sdk-java-extensions
>Reporter: Jean-Baptiste Onofré
>Assignee: Jean-Baptiste Onofré
>
> Work in progress: https://github.com/jbonofre/DataflowJavaSDK/tree/IO-JMS





[jira] [Commented] (BEAM-12) Apply GroupByKey transforms on PCollection of normal type other than KV

2016-02-15 Thread Kenneth Knowles (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147522#comment-15147522
 ] 

Kenneth Knowles commented on BEAM-12:
-

Note also that the SDK includes a very simple composite {{WithKeys}} transform 
for just this use.

> Apply GroupByKey transforms on PCollection of normal type other than KV
> ---
>
> Key: BEAM-12
> URL: https://issues.apache.org/jira/browse/BEAM-12
> Project: Beam
>  Issue Type: Improvement
>  Components: sdk-java-core
>Reporter: bakeypan
>Assignee: Frances Perry
>Priority: Trivial
>
> Currently the GroupByKey transform can only be applied to a 
> PCollection<KV<K, V>>. So I have to transform a PCollection<V> into a 
> PCollection<KV<K, V>> before I can apply GroupByKey.
> I think we can do better by applying GroupByKey to a PCollection of a normal 
> type other than KV. The user could supply a custom key-extraction function, or 
> we could offer a default one. Just like this:
> {code}
> PCollection<V> input = ...
> PCollection<KV<K, V>> result = input.apply(GroupByKey.<K, V>create(new ExtractFn()));
> {code}
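The semantics being asked for, grouping by a key extracted from each element, can be sketched with plain Java streams (illustrative only; ExtractFn and the proposed GroupByKey overload are hypothetical, and this ignores windowing entirely):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Stdlib-only sketch of "group by extracted key": derive a key from each
// element, then group, mirroring a WithKeys-style step followed by GroupByKey.
public class GroupByExtractedKey {
    static <K, V> Map<K, List<V>> groupBy(List<V> input, Function<V, K> extractFn) {
        return input.stream().collect(Collectors.groupingBy(extractFn));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "avocado", "banana");
        // Key each word by its first letter, then group.
        Map<Character, List<String>> grouped = groupBy(words, w -> w.charAt(0));
        System.out.println(grouped.get('a')); // [apple, avocado]
    }
}
```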





svn commit: r1730528 - /incubator/beam/website/

2016-02-15 Thread jbonofre
Author: jbonofre
Date: Mon Feb 15 13:45:15 2016
New Revision: 1730528

URL: http://svn.apache.org/viewvc?rev=1730528&view=rev
Log:
[scm-publish] Updating Beam website

Modified:
incubator/beam/website/building-and-deploying.html
incubator/beam/website/clustering.html
incubator/beam/website/configuration.html
incubator/beam/website/dependency-convergence.html
incubator/beam/website/dependency-info.html
incubator/beam/website/distribution-management.html
incubator/beam/website/getting-started.html
incubator/beam/website/index.html
incubator/beam/website/integration.html
incubator/beam/website/issue-tracking.html
incubator/beam/website/license.html
incubator/beam/website/mail-lists.html
incubator/beam/website/plugin-management.html
incubator/beam/website/plugins.html
incubator/beam/website/privacy-policy.html
incubator/beam/website/project-info.html
incubator/beam/website/project-summary.html
incubator/beam/website/source-repository.html
incubator/beam/website/team-list.html

Modified: incubator/beam/website/building-and-deploying.html
URL: 
http://svn.apache.org/viewvc/incubator/beam/website/building-and-deploying.html?rev=1730528&r1=1730527&r2=1730528&view=diff
==============================================================================
--- incubator/beam/website/building-and-deploying.html (original)
+++ incubator/beam/website/building-and-deploying.html Mon Feb 15 13:45:15 2016
@@ -1,7 +1,7 @@
 
 
 
 
@@ -357,7 +357,7 @@ Java HotSpot(TM) 64-Bit Server VM (build

Back to 
top
Copyright 2016 http://www.apache.org;>Apache Software Foundation. All Rights 
Reserved.
-   Version: 1.0.0-incubating-SNAPSHOT. Last Published: 2016-02-12. 
+   Version: 1.0.0-incubating-SNAPSHOT. Last Published: 2016-02-15. 
http://github.com/andriusvelykis/reflow-maven-skin; title="Reflow Maven 
skin">Reflow Maven skin by http://andrius.velykis.lt; 
target="_blank" title="Andrius Velykis">Andrius Velykis.



[jira] [Created] (BEAM-14) Add data integration DSL

2016-02-15 Thread JIRA
Jean-Baptiste Onofré created BEAM-14:


 Summary: Add data integration DSL
 Key: BEAM-14
 URL: https://issues.apache.org/jira/browse/BEAM-14
 Project: Beam
  Issue Type: New Feature
Reporter: Jean-Baptiste Onofré
Assignee: Jean-Baptiste Onofré


Even if users would still be able to use the API directly, it would be great to 
provide a DSL on top of the API covering batch and streaming data processing, 
but also data integration.
Instead of designing a pipeline as a chain of apply() calls wrapping functions 
(DoFn), we can provide a fluent DSL allowing users to directly leverage turnkey 
functions.

For instance, a user would be able to design a pipeline like:

{code}
.from(“kafka:localhost:9092?topic=foo”).reduce(...).split(...).wiretap(...).map(...).to(“jms:queue:foo….”);
{code}

The DSL will also allow reuse of existing pipelines, for instance:

{code}
.from("cxf:...").reduce().pipeline("other").map().to("kafka:localhost:9092?topic=foo=all")
{code}

This means we will have to create an IO sink that can trigger the execution of 
a target pipeline: from("trigger:other") triggers the pipeline execution when 
another pipeline design starts with pipeline("other"). We can also imagine 
mixing runners: the pipeline() can be on one runner, and the 
from("trigger:other") on another. It's not trivial, but it will give strong 
flexibility and key value for Beam.

In a second step, we can provide DSLs in different languages (the first one 
would be Java, but why not XML, Akka, or Scala DSLs).

We can note in the previous examples that the DSL would also provide data 
integration support to Beam in addition to data processing. Data integration is 
an extension of the Beam API to support some Enterprise Integration Patterns 
(EIPs). As we would need metadata for data integration (even if metadata can 
also be interesting in a stream/batch data processing pipeline), we can provide 
a DataxMessage built on top of PCollection. A DataxMessage would contain:
* structured headers
* binary payload
For instance, the headers can contain an Avro schema describing the payload.
The headers can also contain useful information coming from the IO source (for 
instance the partition/path where the data comes from, …).






[jira] [Assigned] (BEAM-6) Import Spark Runner code

2016-02-15 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/BEAM-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Baptiste Onofré reassigned BEAM-6:
---

Assignee: Jean-Baptiste Onofré

> Import Spark Runner code
> 
>
> Key: BEAM-6
> URL: https://issues.apache.org/jira/browse/BEAM-6
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Frances Perry
>Assignee: Jean-Baptiste Onofré
>



