[jira] [Commented] (GOBBLIN-385) Add Spark execution mode for Gobblin

2018-04-03 Thread Hung Tran (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424428#comment-16424428
 ] 

Hung Tran commented on GOBBLIN-385:
---

 [^samza.diff] 

[~vinothchandar], I have attached the Samza work. Gobblin is plugged into 
Samza through the {{SystemProducer}} interface. The {{GobblinSystemProducer}} 
creates a {{SamzaTaskRunner}} for each source that Samza registers. Each 
{{SamzaTaskRunner}} instantiates a {{JobContext}} that is configured to execute 
a {{SamzaSource}}. The {{SamzaSource}} consumes from a queue that is written 
to whenever Samza calls {{SystemProducer.send()}}.
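
For readers without the diff at hand, the queue hand-off described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: {{RecordProducer}}, {{QueueBridgeProducer}} and {{QueueBackedSource}} are hypothetical stand-ins for {{SystemProducer}}, {{GobblinSystemProducer}} and {{SamzaSource}}, records are plain strings, and no Samza or Gobblin classes are used.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for Samza's SystemProducer (the real interface has more methods).
interface RecordProducer {
    void send(String source, String record);
}

// Hypothetical stand-in for SamzaSource: reads what the producer side enqueued.
final class QueueBackedSource {
    private final Queue<String> queue;

    QueueBackedSource(Queue<String> queue) {
        this.queue = queue;
    }

    // Returns the next record in arrival order, or null if the queue is empty.
    String readRecord() {
        return queue.poll();
    }
}

// Hypothetical stand-in for GobblinSystemProducer: one queue per registered source.
final class QueueBridgeProducer implements RecordProducer {
    private final Map<String, Queue<String>> queues = new ConcurrentHashMap<>();

    // Called once per source; returns the reader side of that source's queue.
    QueueBackedSource register(String source) {
        Queue<String> queue = queues.computeIfAbsent(source, s -> new ConcurrentLinkedQueue<>());
        return new QueueBackedSource(queue);
    }

    // Called by the streaming framework; hands the record to the reader side.
    @Override
    public void send(String source, String record) {
        Queue<String> queue = queues.get(source);
        if (queue == null) {
            throw new IllegalStateException("Source not registered: " + source);
        }
        queue.add(record);
    }
}

final class QueueBridgeDemo {
    public static void main(String[] args) {
        QueueBridgeProducer producer = new QueueBridgeProducer();
        QueueBackedSource source = producer.register("events");
        producer.send("events", "a");
        producer.send("events", "b");
        System.out.println(source.readRecord());
        System.out.println(source.readRecord());
    }
}
```

The sketch uses an unbounded, non-blocking queue to stay short; a real integration would presumably need back-pressure and a task runner draining each queue on its own thread.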

Please see the unit tests for example configuration.

> Add Spark execution mode for Gobblin
> 
>
> Key: GOBBLIN-385
> URL: https://issues.apache.org/jira/browse/GOBBLIN-385
> Project: Apache Gobblin
>  Issue Type: New Feature
>  Components: gobblin-cluster
>Reporter: Vinoth Chandar
>Assignee: Hung Tran
>Priority: Major
> Attachments: samza.diff
>
>
> If there is interest, happy to contribute a Spark execution mode and eventually 
> add support for ingesting data into the [https://github.com/uber/hudi] format.
> Please provide some guidance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GOBBLIN-385) Add Spark execution mode for Gobblin

2018-03-29 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16419203#comment-16419203
 ] 

Vinoth Chandar commented on GOBBLIN-385:


At a high level, it seems we need two new classes:

 - *CliSparkJobLauncher*: a thin wrapper around ServiceBasedApplicationLauncher 
and SparkJobLauncher.

 - *SparkJobLauncher*: I think we should be able to extend MrJobLauncher and 
override only `runWorkUnits` (which I believe is where the parallel work 
happens). This class can create the SparkContext (there can be only one per 
JVM) and simply reuse the existing input/output formats via 
SparkContext.newAPIHadoopRDD and PairRDD.saveAsNewAPIHadoopFile.

 

[http://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaPairRDD.html#saveAsNewAPIHadoopFile-java.lang.String-java.lang.Class-java.lang.Class-java.lang.Class-]

[https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/api/java/JavaSparkContext.html#newAPIHadoopRDD(org.apache.hadoop.conf.Configuration,%20java.lang.Class,%20java.lang.Class,%20java.lang.Class)]

 

The above seems very simple, but I am sure that once we actually try it we 
will run into standard Spark issues such as NotSerializable exceptions, or 
into some MR-specific code paths.
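
To make the "override `runWorkUnits` alone" idea concrete, here is a minimal sketch under loud assumptions. None of these classes are Gobblin or Spark classes: {{SketchJobLauncher}}, {{SketchMrLauncher}} and {{SketchSparkLauncher}} are hypothetical, work units are plain strings, and a parallel stream stands in for the Spark job. It only shows the shape of the proposal, where the launch flow is inherited and only the parallel step is swapped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, heavily simplified launcher; not a Gobblin class.
abstract class SketchJobLauncher {
    // Shared flow: run the work units, then "commit" each result.
    final List<String> launchJob(List<String> workUnits) {
        List<String> results = runWorkUnits(workUnits);
        return results.stream()
                .map(r -> "committed:" + r)
                .collect(Collectors.toList());
    }

    // The only step that differs between execution modes.
    protected abstract List<String> runWorkUnits(List<String> workUnits);
}

// Stand-in for the MR launcher: sequential processing represents the MR job.
class SketchMrLauncher extends SketchJobLauncher {
    @Override
    protected List<String> runWorkUnits(List<String> workUnits) {
        return workUnits.stream()
                .map(w -> "mr(" + w + ")")
                .collect(Collectors.toList());
    }
}

// Stand-in for the proposed Spark launcher: overrides only runWorkUnits.
// A parallel stream represents the distributed Spark job here.
class SketchSparkLauncher extends SketchMrLauncher {
    @Override
    protected List<String> runWorkUnits(List<String> workUnits) {
        return workUnits.parallelStream()
                .map(w -> "spark(" + w + ")")
                .collect(Collectors.toList());
    }
}

final class SketchLauncherDemo {
    public static void main(String[] args) {
        List<String> workUnits = Arrays.asList("wu1", "wu2");
        System.out.println(new SketchMrLauncher().launchJob(workUnits));
        System.out.println(new SketchSparkLauncher().launchJob(workUnits));
    }
}
```

In a real Spark job the function applied to each work unit is shipped to executors, so everything it captures has to be serializable, which is where the NotSerializable problems mentioned above would likely show up.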

 

Do you have the Samza patch sitting in a diff somewhere? It could be useful to 
check out before I embark on trying this out for real.



[jira] [Commented] (GOBBLIN-385) Add Spark execution mode for Gobblin

2018-03-26 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413933#comment-16413933
 ] 

Vinoth Chandar commented on GOBBLIN-385:


gtk :)  

I have started scoping. Will get something to you by EoW



[jira] [Commented] (GOBBLIN-385) Add Spark execution mode for Gobblin

2018-03-23 Thread Abhishek Tiwari (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411898#comment-16411898
 ] 

Abhishek Tiwari commented on GOBBLIN-385:
-

This popped up twice again in conversations last week :) 



[jira] [Commented] (GOBBLIN-385) Add Spark execution mode for Gobblin

2018-02-23 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16374524#comment-16374524
 ] 

Vinoth Chandar commented on GOBBLIN-385:


Apologies, I have been busy with other things lately. I will get to this in a 
week or so.



[jira] [Commented] (GOBBLIN-385) Add Spark execution mode for Gobblin

2018-01-23 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336638#comment-16336638
 ] 

Vinoth Chandar commented on GOBBLIN-385:


Sg. Let me scope the change and respond back here with a plan, to see if that 
makes sense. 



[jira] [Commented] (GOBBLIN-385) Add Spark execution mode for Gobblin

2018-01-23 Thread Abhishek Tiwari (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336619#comment-16336619
 ] 

Abhishek Tiwari commented on GOBBLIN-385:
-

[~vinothchandar] we would love that! I remember users asking about it in our 
monthly video meetups a few times, so there is interest. 

Let us know if you have any questions in getting started. 



[jira] [Commented] (GOBBLIN-385) Add Spark execution mode for Gobblin

2018-01-23 Thread Vinoth Chandar (JIRA)

[ 
https://issues.apache.org/jira/browse/GOBBLIN-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336538#comment-16336538
 ] 

Vinoth Chandar commented on GOBBLIN-385:


ah [~hutran] :) we meet again, looks like 
