Re: Gobblin on Yarn ?

2019-04-03 Thread Jay Sen
Hi Sudarshan

MR mode would add a dependency on a Hadoop cluster; I am thinking of having an
independent Gobblin cluster for all the data movement jobs.
I have also tried Hive-Distcp in cluster mode and managed to run it (a lot of
configs are missing, and I was only able to figure them out from the code
base).

Is there any difference for MR vs Cluster mode in terms of performance or
feature set?

BTW, regarding GOBBLIN-714, I have lost the log, and this could be very much
an edge case, but for GOBBLIN-711 I have captured all the logs.

Thanks
Jay




On Tue, Apr 2, 2019 at 9:20 PM Sudarshan Vasudevan wrote:

> Hi Jay,
> For your immediate use case, will the MR mode work? If that is the case,
> you can take a look at Hive Distcp:
> https://gobblin.readthedocs.io/en/latest/case-studies/Hive-Distcp/
>
> For GOBBLIN-714, can you attach any relevant stacktraces that you see in
> the cluster logs that indicate the failure of the jobs? It is interesting
> that the Job execution state for most of the jobs is shown as COMMITTED as
> opposed to SUCCESSFUL.
>
> Thanks,
> Sudarshan
>
>
> --
> *From:* Jay Sen 
> *Sent:* Tuesday, April 2, 2019 8:02 PM
> *To:* Sudarshan Vasudevan; dev@gobblin.incubator.apache.org
> *Subject:* Re: Gobblin on Yarn ?
>
> Thanks Sudarshan for sharing the info.
>
> I started playing around with Gobblin cluster (master/worker) mode and came
> across some weird issues (GOBBLIN-714 & GOBBLIN-711).
>
> I assume standalone mode is limited to a single node (maybe multi-process),
> so I really need a cluster environment capable of tolerating node
> failures, etc.
>
> The immediate use case I am looking at is Hive to Hive, with 10 TB a
> day overall.
>
> Please let me know your thoughts.
>
> Thanks
> Jay
>
> On Sun, Mar 31, 2019 at 8:29 PM Sudarshan Vasudevan <
> suvasude...@linkedin.com> wrote:
>
> Hi Jay,
> We run both Gobblin Cluster and Gobblin Standalone in production, which
> are both fairly stable. We also run Gobblin pipelines in Mapreduce mode in
> production.
>
> There is some recent interest to revive Gobblin-on-Yarn for a few internal
> use cases. We will hopefully have something to share on that front. So stay
> tuned!
>
> If you share more details about your use case (e.g. details about the
> source/sink, volume of data to be moved), that will help us point you in
> the right direction.
>
> Best,
> Sudarshan
> --
> *From:* Jay Sen 
> *Sent:* Sunday, March 31, 2019 7:07 PM
> *To:* dev@gobblin.incubator.apache.org
> *Subject:* Re: Gobblin on Yarn ?
>
> Hi All,
>
> What would be the most stable mode in Gobblin to run in production:
> cluster (master + worker), standalone, or something else?
>
> Which mode are you running in prod? Could you please share?
>
> Thanks
> Jay
>
>
> On Wed, Feb 27, 2019 at 6:16 PM Jay Sen  wrote:
>
> > Hi,
> >
> > Is anybody running Gobblin in YARN mode in production, or even in a dev
> > environment? Could you please share your experience?
> >
> > I am looking for some data points on how it would be beneficial over
> > standalone.
> >
> > Thanks
> > Jay
> >
>
>


Job Management

2019-04-03 Thread Jay Sen
Hi,

I see Gobblin creates jobs and tasks and puts them into ZooKeeper for worker
nodes to pick up and process (not sure if this applies only to cluster
mode or to all modes).

In case of job failures, how is one supposed to restart a job, ignore or skip
the rest of it, or even disable it?
Is there an API for such management (exposable to an admin UI)? And does
it essentially update entries in ZooKeeper?

I would appreciate it if someone could shed some light on this area.

Thanks
Jay


writer and publisher

2019-04-03 Thread Jay Sen
Hi Sudarshan,

I had a couple of questions and thought you might help me figure them out.

1. How can Gobblin integrate with a schema registry, and how does it support
schema evolution?
2. Based on class info, is there a way to remove or mask a column?
3. How about encryption: is that something the connector has to take care of,
or is there any underlying functionality that can be leveraged?

Thanks
Jay


Gobblin at ApacheCon ?

2019-04-03 Thread Jay Sen
Hi Guys,

Let's present Apache Gobblin at ApacheCon.

I would be interested in presenting/co-presenting PayPal's use-case.

@PMCs, Please share your thoughts.

Thanks
Jay


[jira] [Work logged] (GOBBLIN-712) Add version strategy for configbased dataset copy

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-712?focusedWorklogId=222431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222431
 ]

ASF GitHub Bot logged work on GOBBLIN-712:
--

Author: ASF GitHub Bot
Created on: 03/Apr/19 17:23
Start Date: 03/Apr/19 17:23
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on issue #2579: [GOBBLIN-712] Add 
version strategy pickup for ConfigBasedDataset distcp workflow
URL: 
https://github.com/apache/incubator-gobblin/pull/2579#issuecomment-479582467
 
 
   +1
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 222431)
Time Spent: 2.5h  (was: 2h 20m)

> Add version strategy for configbased dataset copy
> -
>
> Key: GOBBLIN-712
> URL: https://issues.apache.org/jira/browse/GOBBLIN-712
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [incubator-gobblin] asfgit closed pull request #2584: Filter Out Empty MultiWorkUnits

2019-04-03 Thread GitBox
asfgit closed pull request #2584: Filter Out Empty MultiWorkUnits
URL: https://github.com/apache/incubator-gobblin/pull/2584
 
 
   




With regards,
Apache Git Services


[jira] [Resolved] (GOBBLIN-717) Filter Out Empty MultiWorkUnits

2019-04-03 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-717.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2584
[https://github.com/apache/incubator-gobblin/pull/2584]

> Filter Out Empty MultiWorkUnits
> ---
>
> Key: GOBBLIN-717
> URL: https://issues.apache.org/jira/browse/GOBBLIN-717
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Zihan Li
>Priority: Major
> Fix For: 0.15.0
>
>
> Currently, when we run a job, Gobblin uses the value of max mappers or the target 
> size of a mapper to determine the number of mappers. But since one partition 
> cannot be divided into several WorkUnits, work cannot be evenly distributed, and 
> many mappers (MultiWorkUnits) have no work to do. This wastes a 
> lot of resources. So we need to filter out MultiWorkUnits that contain no 
> WorkUnit when we determine the number of mappers.
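
The idea in the description above can be pictured with a small, self-contained sketch. It is only an illustration under stated assumptions, not Gobblin's actual packing code: `WorkUnit`, `MultiWorkUnit`, `pack`, and `filter_empty` are hypothetical stand-ins, and the greedy packing strategy is assumed purely to produce empty bins when there are fewer partitions than mappers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkUnit:
    """Hypothetical stand-in: one indivisible partition of work."""
    name: str
    size: int


@dataclass
class MultiWorkUnit:
    """Hypothetical stand-in: the group of WorkUnits handed to one mapper."""
    work_units: List[WorkUnit] = field(default_factory=list)

    def total_size(self) -> int:
        return sum(wu.size for wu in self.work_units)


def pack(work_units: List[WorkUnit], max_mappers: int) -> List[MultiWorkUnit]:
    """Greedy packing (an assumption): largest unit first, into the least-loaded bin.

    Always returns max_mappers bins, so some may stay empty.
    """
    bins = [MultiWorkUnit() for _ in range(max_mappers)]
    for wu in sorted(work_units, key=lambda w: w.size, reverse=True):
        min(bins, key=MultiWorkUnit.total_size).work_units.append(wu)
    return bins


def filter_empty(bins: List[MultiWorkUnit]) -> List[MultiWorkUnit]:
    """Keep only the bins that actually contain work."""
    return [b for b in bins if b.work_units]


units = [WorkUnit("p0", 500), WorkUnit("p1", 300), WorkUnit("p2", 200)]
packed = pack(units, max_mappers=8)
non_empty = filter_empty(packed)
print(len(packed), len(non_empty))  # prints "8 3"
```

With three indivisible partitions and eight requested mappers, five bins stay empty; sizing the job from `non_empty` rather than `packed` is the saving the issue describes.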





[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=222609&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222609
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 03/Apr/19 20:46
Start Date: 03/Apr/19 20:46
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2578: 
[GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r271922811
 
 

 ##
 File path: conf/yarn/reference.conf
 ##
 @@ -38,6 +38,6 @@ gobblin.yarn.work.dir=/gobblin
 gobblin.cluster.helix.cluster.name=GobblinYarn
 gobblin.cluster.zk.connection.string="localhost:2181"
 
-fs.uri="hdfs://localhost:9000"
 
 Review comment:
   Just curious, why does the port number need to be changed here? It seems both 
8020 and 9000 can be the default port for NameNode IPC.
 



Issue Time Tracking
---

Worklog Id: (was: 222609)
Time Spent: 1.5h  (was: 1h 20m)

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Gobblin supports multiple modes of execution (CLI, standalone, 
> cluster-master, cluster-worker, AWS, YARN, MR), and there is an individual 
> script for each of them.
> 1. There can be one gobblin.sh script
> {{gobblin.sh   }}
> {{gobblin.sh   }}
> {{commands values: admin, cli, statestore-check, statestore-clean, 
> historystore-manager}}
> {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, 
> service}}
> 2. Also, configs need to be structured and deduplicated accordingly.
>  
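
As a rough illustration of the single-entry-point idea in the description above, the sketch below classifies the first argument as either a command or a service, using only the value lists quoted in the issue. It is not the actual `gobblin.sh` logic: `classify` is a hypothetical helper, and the `start` action in the usage lines is an assumption, since the argument placeholders are elided in the description.

```python
from typing import List, Tuple

# Values copied from the issue description above.
COMMANDS = {"admin", "cli", "statestore-check", "statestore-clean",
            "historystore-manager"}
SERVICES = {"standalone", "cluster-master", "cluster-worker", "aws", "yarn",
            "mr", "service"}


def classify(argv: List[str]) -> Tuple[str, str, List[str]]:
    """Classify the first argument as a command or a service; pass the rest through."""
    if not argv:
        raise ValueError("expected a command or service name")
    name, rest = argv[0], argv[1:]
    if name in COMMANDS:
        return ("command", name, rest)
    if name in SERVICES:
        return ("service", name, rest)
    raise ValueError(f"unknown command or service: {name}")


print(classify(["cli", "run"]))
print(classify(["yarn", "start"]))  # "start" is a hypothetical action
```

Keeping the two value sets disjoint lets one script route every mode without a separate launcher per mode.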





[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=222610&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222610
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 03/Apr/19 20:46
Start Date: 03/Apr/19 20:46
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2578: 
[GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r271925108
 
 

 ##
 File path: gobblin-docs/user-guide/Gobblin-CLI.md
 ##
 @@ -28,29 +28,29 @@ Gobblin ingestion applications
 
 Gobblin ingestion applications can be accessed through the command `run`:
 ```bash
-bin/gobblin run [listQuickApps] [] -jobName  [OPTIONS]
+bin/gobblin cli run [listQuickApps] [] -jobName  [OPTIONS]
 
 Review comment:
   Sorry for getting back to this late. 
   Yes that makes sense to me. I see them in `gobblin.sh` printUsage() method. 
Can you add/edit documentation to mention the existence of `gobblin.sh` so that 
newer users will be aware of it ? 
 



Issue Time Tracking
---

Worklog Id: (was: 222610)
Time Spent: 1h 40m  (was: 1.5h)

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Gobblin supports multiple modes of execution (CLI, standalone, 
> cluster-master, cluster-worker, AWS, YARN, MR), and there is an individual 
> script for each of them.
> 1. There can be one gobblin.sh script
> {{gobblin.sh   }}
> {{gobblin.sh   }}
> {{commands values: admin, cli, statestore-check, statestore-clean, 
> historystore-manager}}
> {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, 
> service}}
> 2. Also, configs need to be structured and deduplicated accordingly.
>  





[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=222611&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222611
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 03/Apr/19 20:46
Start Date: 03/Apr/19 20:46
Worklog Time Spent: 10m 
  Work Description: autumnust commented on pull request #2578: 
[GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r271923038
 
 

 ##
 File path: conf/yarn/application.conf
 ##
 @@ -22,15 +22,18 @@ gobblin.yarn.app.name=GobblinYarn
 gobblin.yarn.app.master.memory.mbs=256
 gobblin.yarn.initial.containers=2
 gobblin.yarn.container.memory.mbs=512
-gobblin.yarn.conf.dir=
-gobblin.yarn.lib.jars.dir=
-gobblin.yarn.app.master.files.local=${gobblin.yarn.conf.dir}"/log4j-yarn.properties,"${gobblin.yarn.conf.dir}"/application.conf,"${gobblin.yarn.conf.dir}"/reference.conf"
+gobblin.yarn.conf.dir=/tools/gobblin-dist/conf/yarn/
 
 Review comment:
   Why is this being hard-coded?
 



Issue Time Tracking
---

Worklog Id: (was: 222611)
Time Spent: 1h 40m  (was: 1.5h)

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Gobblin supports multiple modes of execution (CLI, standalone, 
> cluster-master, cluster-worker, AWS, YARN, MR), and there is an individual 
> script for each of them.
> 1. There can be one gobblin.sh script
> {{gobblin.sh   }}
> {{gobblin.sh   }}
> {{commands values: admin, cli, statestore-check, statestore-clean, 
> historystore-manager}}
> {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, 
> service}}
> 2. Also, configs need to be structured and deduplicated accordingly.
>  





[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command

2019-04-03 Thread GitBox
autumnust commented on a change in pull request #2578: [GOBBLIN-707] rewrite 
gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r271922811
 
 

 ##
 File path: conf/yarn/reference.conf
 ##
 @@ -38,6 +38,6 @@ gobblin.yarn.work.dir=/gobblin
 gobblin.cluster.helix.cluster.name=GobblinYarn
 gobblin.cluster.zk.connection.string="localhost:2181"
 
-fs.uri="hdfs://localhost:9000"
 
 Review comment:
   Just curious, why does the port number need to be changed here? It seems both 
8020 and 9000 can be the default port for NameNode IPC.




[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command

2019-04-03 Thread GitBox
autumnust commented on a change in pull request #2578: [GOBBLIN-707] rewrite 
gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r271925108
 
 

 ##
 File path: gobblin-docs/user-guide/Gobblin-CLI.md
 ##
 @@ -28,29 +28,29 @@ Gobblin ingestion applications
 
 Gobblin ingestion applications can be accessed through the command `run`:
 ```bash
-bin/gobblin run [listQuickApps] [] -jobName  [OPTIONS]
+bin/gobblin cli run [listQuickApps] [] -jobName  [OPTIONS]
 
 Review comment:
   Sorry for getting back to this late. 
   Yes that makes sense to me. I see them in `gobblin.sh` printUsage() method. 
Can you add/edit documentation to mention the existence of `gobblin.sh` so that 
newer users will be aware of it ? 




[GitHub] [incubator-gobblin] autumnust commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command

2019-04-03 Thread GitBox
autumnust commented on a change in pull request #2578: [GOBBLIN-707] rewrite 
gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r271923038
 
 

 ##
 File path: conf/yarn/application.conf
 ##
 @@ -22,15 +22,18 @@ gobblin.yarn.app.name=GobblinYarn
 gobblin.yarn.app.master.memory.mbs=256
 gobblin.yarn.initial.containers=2
 gobblin.yarn.container.memory.mbs=512
-gobblin.yarn.conf.dir=
-gobblin.yarn.lib.jars.dir=
-gobblin.yarn.app.master.files.local=${gobblin.yarn.conf.dir}"/log4j-yarn.properties,"${gobblin.yarn.conf.dir}"/application.conf,"${gobblin.yarn.conf.dir}"/reference.conf"
+gobblin.yarn.conf.dir=/tools/gobblin-dist/conf/yarn/
 
 Review comment:
   Why is this being hard-coded?




Re: writer and publisher

2019-04-03 Thread Lei Sun
Hi Jay,


  1.  We have a KafkaSchemaRegistry interface built in, with several 
implementations. The schema is fetched where the Kafka bytes are decoded, either 
in the Extractor or the Converter. There's no handling of schema evolution in the 
Gobblin runtime ( @Sudarshan Vasudevan correct me if there's any )
  2.  We do drop some columns for a couple of reasons, and there's an 
AvroProjectionConverter to achieve this.
  3.  Encryption can also be achieved by a converter, e.g. 
StringFieldEncryptorConverter. Not sure if it is the specific usage you need.
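
To make the converter idea in points 2 and 3 concrete, here is a minimal sketch of a record-level conversion step that drops some fields and masks others. It does not use Gobblin's converter API: `convert_record` and its `drop`/`mask` parameters are hypothetical, records are plain dictionaries rather than Avro, and masking is done with a one-way SHA-256 digest as an assumed stand-in for real encryption.

```python
import hashlib
from typing import Any, Dict, Iterable


def convert_record(record: Dict[str, Any],
                   drop: Iterable[str] = (),
                   mask: Iterable[str] = ()) -> Dict[str, Any]:
    """Return a copy of `record` without the `drop` fields and with `mask` fields hashed."""
    drop_set, mask_set = set(drop), set(mask)
    out: Dict[str, Any] = {}
    for name, value in record.items():
        if name in drop_set:
            continue  # projection: the column is removed entirely
        if name in mask_set and value is not None:
            # masking: one-way SHA-256 digest, not reversible encryption
            value = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        out[name] = value
    return out


record = {"id": 42, "email": "user@example.com", "ssn": "000-00-0000"}
converted = convert_record(record, drop=["ssn"], mask=["email"])
print(sorted(converted))  # prints "['email', 'id']"
```

The input record is left untouched, so the same step can sit anywhere in a chain of converters.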


Lei

From: Jay Sen 
Sent: Wednesday, April 3, 2019 1:22 AM
To: Sudarshan Vasudevan; dev@gobblin.incubator.apache.org
Subject: writer and publisher

Hi Sudarshan,

I had a couple of questions and thought you might help me figure them out.

1. How can Gobblin integrate with a schema registry, and how does it support
schema evolution?
2. Based on class info, is there a way to remove or mask a column?
3. How about encryption: is that something the connector has to take care of,
or is there any underlying functionality that can be leveraged?

Thanks
Jay


Re: writer and publisher

2019-04-03 Thread Zhixiong Chen
Hi Jay,

1) Check out `org.apache.gobblin.metrics.kafka.KafkaSchemaRegistry`. We're 
following the Kafka way to support schema evolution. You may find more details 
here: https://docs.confluent.io/current/schema-registry/index.html

2) Were you asking for a converter that removes fields from a record? If so, 
check out `AvroProjectionConverter`. Otherwise, you can implement a specific 
converter that meets your use cases.

3) Typically, encryption is done by a `PasswordManager` instance in Gobblin.

Regards,
Zhixiong

From: Jay Sen 
Sent: Wednesday, April 3, 2019 1:22 AM
To: Sudarshan Vasudevan; dev@gobblin.incubator.apache.org
Subject: writer and publisher

Hi Sudarshan,

I had a couple of questions and thought you might help me figure them out.

1. How can Gobblin integrate with a schema registry, and how does it support
schema evolution?
2. Based on class info, is there a way to remove or mask a column?
3. How about encryption: is that something the connector has to take care of,
or is there any underlying functionality that can be leveraged?

Thanks
Jay


[GitHub] [incubator-gobblin] ibuenros commented on issue #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow

2019-04-03 Thread GitBox
ibuenros commented on issue #2579: [GOBBLIN-712] Add version strategy pickup 
for ConfigBasedDataset distcp workflow
URL: 
https://github.com/apache/incubator-gobblin/pull/2579#issuecomment-479582467
 
 
   +1




[jira] [Work logged] (GOBBLIN-719) gobblin-docs has invalid git links

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-719?focusedWorklogId=222440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222440
 ]

ASF GitHub Bot logged work on GOBBLIN-719:
--

Author: ASF GitHub Bot
Created on: 03/Apr/19 17:33
Start Date: 03/Apr/19 17:33
Worklog Time Spent: 10m 
  Work Description: yukuai518 commented on issue #2586: [GOBBLIN-719] fix 
invalid git links for classes in docs
URL: 
https://github.com/apache/incubator-gobblin/pull/2586#issuecomment-479586004
 
 
   +1 
   Thanks for fixing all the link issues!
 



Issue Time Tracking
---

Worklog Id: (was: 222440)
Time Spent: 20m  (was: 10m)

> gobblin-docs has invalid git links
> --
>
> Key: GOBBLIN-719
> URL: https://issues.apache.org/jira/browse/GOBBLIN-719
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Jay Sen
>Priority: Trivial
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Gobblin docs had some invalid links, pointing not only to the LinkedIn repo but also 
> to old locations of classes that have changed since then.





[GitHub] [incubator-gobblin] yukuai518 commented on issue #2586: [GOBBLIN-719] fix invalid git links for classes in docs

2019-04-03 Thread GitBox
yukuai518 commented on issue #2586: [GOBBLIN-719] fix invalid git links for 
classes in docs
URL: 
https://github.com/apache/incubator-gobblin/pull/2586#issuecomment-479586004
 
 
   +1 
   Thanks for fixing all the link issues!




[GitHub] [incubator-gobblin] arjun4084346 commented on a change in pull request #2589: [GOBBLIN-722] Unschedule gaas flow

2019-04-03 Thread GitBox
arjun4084346 commented on a change in pull request #2589: [GOBBLIN-722] 
Unschedule gaas flow
URL: https://github.com/apache/incubator-gobblin/pull/2589#discussion_r272010852
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/scheduler/JobScheduler.java
 ##
 @@ -351,8 +351,8 @@ public void scheduleJob(Properties jobProps, JobListener 
jobListener, Map

[GitHub] [incubator-gobblin] arjun4084346 opened a new pull request #2589: [GOBBLIN-722] Unschedule gaas flow

2019-04-03 Thread GitBox
arjun4084346 opened a new pull request #2589: [GOBBLIN-722] Unschedule gaas flow
URL: https://github.com/apache/incubator-gobblin/pull/2589
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below! @sv2000  please review
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-XXX
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   It adds an option to let a user unschedule a GaaS flow.
   It also makes it possible to schedule a job even if it is already scheduled.
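
The behaviour described above can be sketched as a tiny in-memory scheduler. This is a hedged illustration only, not the code in this PR or Gobblin's `JobScheduler`: `FlowScheduler` and its methods are hypothetical, and it assumes that scheduling an already-scheduled flow simply replaces the existing schedule.

```python
from typing import Dict


class FlowScheduler:
    """Hypothetical in-memory scheduler keyed by flow name."""

    def __init__(self) -> None:
        self._schedules: Dict[str, str] = {}

    def schedule(self, flow_name: str, cron: str) -> None:
        # Assumption: re-scheduling replaces the existing schedule instead of raising.
        self._schedules[flow_name] = cron

    def unschedule(self, flow_name: str) -> bool:
        # Returns True if a schedule was removed, False if the flow was not scheduled.
        return self._schedules.pop(flow_name, None) is not None

    def is_scheduled(self, flow_name: str) -> bool:
        return flow_name in self._schedules


scheduler = FlowScheduler()
scheduler.schedule("flowA", "0 0 * * *")
scheduler.schedule("flowA", "0 6 * * *")  # already scheduled: replaced, no error
removed = scheduler.unschedule("flowA")
removed_again = scheduler.unschedule("flowA")
print(removed, removed_again, scheduler.is_scheduled("flowA"))  # prints "True False False"
```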
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   added a unit test in FlowConfigTest
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   




[jira] [Work logged] (GOBBLIN-722) add option to unschedule a gaas flow

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-722?focusedWorklogId=222746&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222746
 ]

ASF GitHub Bot logged work on GOBBLIN-722:
--

Author: ASF GitHub Bot
Created on: 04/Apr/19 01:34
Start Date: 04/Apr/19 01:34
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on pull request #2589: 
[GOBBLIN-722] Unschedule gaas flow
URL: https://github.com/apache/incubator-gobblin/pull/2589
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below! @sv2000  please review
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-XXX
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   It adds an option to let a user unschedule a GaaS flow.
   It also makes it possible to schedule a job even if it is already scheduled.
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   added a unit test in FlowConfigTest
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 222746)
Time Spent: 10m
Remaining Estimate: 0h

> add option to unschedule a gaas flow
> 
>
> Key: GOBBLIN-722
> URL: https://issues.apache.org/jira/browse/GOBBLIN-722
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (GOBBLIN-712) Add version strategy for configbased dataset copy

2019-04-03 Thread Hung Tran (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hung Tran resolved GOBBLIN-712.
---
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2579
[https://github.com/apache/incubator-gobblin/pull/2579]

> Add version strategy for configbased dataset copy
> -
>
> Key: GOBBLIN-712
> URL: https://issues.apache.org/jira/browse/GOBBLIN-712
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Created] (GOBBLIN-722) add option to unschedule a gaas flow

2019-04-03 Thread Arjun Singh Bora (JIRA)
Arjun Singh Bora created GOBBLIN-722:


 Summary: add option to unschedule a gaas flow
 Key: GOBBLIN-722
 URL: https://issues.apache.org/jira/browse/GOBBLIN-722
 Project: Apache Gobblin
  Issue Type: Improvement
Reporter: Arjun Singh Bora








[jira] [Work logged] (GOBBLIN-712) Add version strategy for configbased dataset copy

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-712?focusedWorklogId=222737&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222737
 ]

ASF GitHub Bot logged work on GOBBLIN-712:
--

Author: ASF GitHub Bot
Created on: 04/Apr/19 00:58
Start Date: 04/Apr/19 00:58
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2579: [GOBBLIN-712] 
Add version strategy pickup for ConfigBasedDataset distcp workflow
URL: https://github.com/apache/incubator-gobblin/pull/2579
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 222737)
Time Spent: 2h 40m  (was: 2.5h)

> Add version strategy for configbased dataset copy
> -
>
> Key: GOBBLIN-712
> URL: https://issues.apache.org/jira/browse/GOBBLIN-712
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Kuai Yu
>Priority: Major
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] asfgit closed pull request #2579: [GOBBLIN-712] Add version strategy pickup for ConfigBasedDataset distcp workflow

2019-04-03 Thread GitBox
asfgit closed pull request #2579: [GOBBLIN-712] Add version strategy pickup for 
ConfigBasedDataset distcp workflow
URL: https://github.com/apache/incubator-gobblin/pull/2579
 
 
   




[jira] [Work logged] (GOBBLIN-722) add option to unschedule a gaas flow

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-722?focusedWorklogId=222796&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222796
 ]

ASF GitHub Bot logged work on GOBBLIN-722:
--

Author: ASF GitHub Bot
Created on: 04/Apr/19 04:06
Start Date: 04/Apr/19 04:06
Worklog Time Spent: 10m 
  Work Description: arjun4084346 commented on pull request #2589: 
[GOBBLIN-722] Unschedule gaas flow
URL: https://github.com/apache/incubator-gobblin/pull/2589#discussion_r272010852
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/scheduler/JobScheduler.java
 ##
 @@ -351,8 +351,8 @@ public void scheduleJob(Properties jobProps, JobListener 
jobListener, Map
> add option to unschedule a gaas flow
> 
>
> Key: GOBBLIN-722
> URL: https://issues.apache.org/jira/browse/GOBBLIN-722
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] sv2000 commented on a change in pull request #2589: [GOBBLIN-722] Unschedule gaas flow

2019-04-03 Thread GitBox
sv2000 commented on a change in pull request #2589: [GOBBLIN-722] Unschedule 
gaas flow
URL: https://github.com/apache/incubator-gobblin/pull/2589#discussion_r272006054
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/scheduler/JobScheduler.java
 ##
 @@ -351,8 +351,8 @@ public void scheduleJob(Properties jobProps, JobListener 
jobListener, Map

[jira] [Work logged] (GOBBLIN-722) add option to unschedule a gaas flow

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-722?focusedWorklogId=222782&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222782
 ]

ASF GitHub Bot logged work on GOBBLIN-722:
--

Author: ASF GitHub Bot
Created on: 04/Apr/19 03:27
Start Date: 04/Apr/19 03:27
Worklog Time Spent: 10m 
  Work Description: sv2000 commented on pull request #2589: [GOBBLIN-722] 
Unschedule gaas flow
URL: https://github.com/apache/incubator-gobblin/pull/2589#discussion_r272006054
 
 

 ##
 File path: 
gobblin-runtime/src/main/java/org/apache/gobblin/scheduler/JobScheduler.java
 ##
 @@ -351,8 +351,8 @@ public void scheduleJob(Properties jobProps, JobListener 
jobListener, Map
> add option to unschedule a gaas flow
> 
>
> Key: GOBBLIN-722
> URL: https://issues.apache.org/jira/browse/GOBBLIN-722
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] jhsenjaliya commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command

2019-04-03 Thread GitBox
jhsenjaliya commented on a change in pull request #2578: [GOBBLIN-707] rewrite 
gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r272026110
 
 

 ##
 File path: conf/yarn/application.conf
 ##
 @@ -22,15 +22,18 @@ gobblin.yarn.app.name=GobblinYarn
 gobblin.yarn.app.master.memory.mbs=256
 gobblin.yarn.initial.containers=2
 gobblin.yarn.container.memory.mbs=512
-gobblin.yarn.conf.dir=
-gobblin.yarn.lib.jars.dir=
-gobblin.yarn.app.master.files.local=${gobblin.yarn.conf.dir}"/log4j-yarn.properties,"${gobblin.yarn.conf.dir}"/application.conf,"${gobblin.yarn.conf.dir}"/reference.conf"
+gobblin.yarn.conf.dir=/tools/gobblin-dist/conf/yarn/
 
 Review comment:
   I missed this; let me change it to 
`gobblin.yarn.conf.dir=${GOBBLIN_HOME}/conf/yarn/`. Thanks for catching this; 
this was my local config.
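
A note on the `${GOBBLIN_HOME}` form: HOCON loaders such as Typesafe Config fall back to environment variables when a substitution is not defined in the config itself, so the value would depend on GOBBLIN_HOME being exported by whatever launches the job. A minimal sketch of how a launcher script could derive the directory, assuming a hypothetical helper named `gobblin_yarn_conf_dir` and a fallback to the current directory (neither is taken from the PR):

```shell
# Hypothetical helper, not part of the PR: build the YARN conf dir from
# GOBBLIN_HOME instead of hard-coding a machine-specific path.
gobblin_yarn_conf_dir() {
  # Assumption: fall back to the current directory when GOBBLIN_HOME is unset.
  home="${GOBBLIN_HOME:-$PWD}"
  # Strip one trailing slash so the result never contains "//".
  home="${home%/}"
  echo "${home}/conf/yarn/"
}
```

A launcher would typically export GOBBLIN_HOME before starting the JVM so that the same value is visible to the config loader.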




[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=222821&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222821
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 04/Apr/19 05:56
Start Date: 04/Apr/19 05:56
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on pull request #2578: 
[GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r272026110
 
 

 ##
 File path: conf/yarn/application.conf
 ##
 @@ -22,15 +22,18 @@ gobblin.yarn.app.name=GobblinYarn
 gobblin.yarn.app.master.memory.mbs=256
 gobblin.yarn.initial.containers=2
 gobblin.yarn.container.memory.mbs=512
-gobblin.yarn.conf.dir=
-gobblin.yarn.lib.jars.dir=
-gobblin.yarn.app.master.files.local=${gobblin.yarn.conf.dir}"/log4j-yarn.properties,"${gobblin.yarn.conf.dir}"/application.conf,"${gobblin.yarn.conf.dir}"/reference.conf"
+gobblin.yarn.conf.dir=/tools/gobblin-dist/conf/yarn/
 
 Review comment:
   I missed this; let me change it to 
`gobblin.yarn.conf.dir=${GOBBLIN_HOME}/conf/yarn/`. Thanks for catching this; 
this was my local config.
 


Issue Time Tracking
---

Worklog Id: (was: 222821)
Time Spent: 2h  (was: 1h 50m)

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Gobblin supports multiple modes of execution (CLI, Standalone, 
> cluster-master, cluster-worker, AWS, YARN, MR), and there is an individual 
> script for each of them.
> 1. There can be one gobblin.sh script:
> {{gobblin.sh   }}
> {{gobblin.sh   }}
> {{command values: admin, cli, statestore-check, statestore-clean, 
> historystore-manager}}
> {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, 
> service}}
> 2. Also, the configs need to be structured and deduplicated accordingly.
>  
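
The single-entry-point idea in the issue description can be sketched as follows. The command and service names come from the description; the function names, the argument order, and the default action are assumptions made for illustration, not the interface of the actual script in PR #2578.

```shell
# Hypothetical sketch of one dispatcher for all Gobblin modes.
# It only prints what it would do; nothing is launched.

# Classify the first argument as a one-shot command or a long-running service.
gobblin_kind() {
  case "$1" in
    admin|cli|statestore-check|statestore-clean|historystore-manager)
      echo "command" ;;
    standalone|cluster-master|cluster-worker|aws|yarn|mr|service)
      echo "service" ;;
    *)
      echo "unknown" ;;
  esac
}

# Route to the matching handler based on the classification above.
gobblin_dispatch() {
  kind=$(gobblin_kind "$1")
  case "$kind" in
    command)
      echo "run command: $*" ;;
    service)
      # Assumption: the second argument is an action, defaulting to "start".
      echo "service $1: action ${2:-start}" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}
```

For example, `gobblin_dispatch yarn stop` would be routed to the service branch, while `gobblin_dispatch admin --help` would be routed to the command branch.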





[GitHub] [incubator-gobblin] jhsenjaliya commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command

2019-04-03 Thread GitBox
jhsenjaliya commented on a change in pull request #2578: [GOBBLIN-707] rewrite 
gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r272025514
 
 

 ##
 File path: conf/yarn/reference.conf
 ##
 @@ -38,6 +38,6 @@ gobblin.yarn.work.dir=/gobblin
 gobblin.cluster.helix.cluster.name=GobblinYarn
 gobblin.cluster.zk.connection.string="localhost:2181"
 
-fs.uri="hdfs://localhost:9000"
 
 Review comment:
   Yes, both are acceptable. I am just changing the default to 8020, which I 
believe most people use; I can change it to 9000 if that is not the case, no problem.




[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-04-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=222820&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-222820
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 04/Apr/19 05:52
Start Date: 04/Apr/19 05:52
Worklog Time Spent: 10m 
  Work Description: jhsenjaliya commented on pull request #2578: 
[GOBBLIN-707] rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r272025514
 
 

 ##
 File path: conf/yarn/reference.conf
 ##
 @@ -38,6 +38,6 @@ gobblin.yarn.work.dir=/gobblin
 gobblin.cluster.helix.cluster.name=GobblinYarn
 gobblin.cluster.zk.connection.string="localhost:2181"
 
-fs.uri="hdfs://localhost:9000"
 
 Review comment:
   Yes, both are acceptable. I am just changing the default to 8020, which I 
believe most people use; I can change it to 9000 if that is not the case, no problem.
 


Issue Time Tracking
---

Worklog Id: (was: 222820)
Time Spent: 1h 50m  (was: 1h 40m)

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Gobblin supports multiple modes of execution (CLI, Standalone, 
> cluster-master, cluster-worker, AWS, YARN, MR), and there is an individual 
> script for each of them.
> 1. There can be one gobblin.sh script:
> {{gobblin.sh   }}
> {{gobblin.sh   }}
> {{command values: admin, cli, statestore-check, statestore-clean, 
> historystore-manager}}
> {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, 
> service}}
> 2. Also, the configs need to be structured and deduplicated accordingly.
>  





[GitHub] [incubator-gobblin] jhsenjaliya commented on a change in pull request #2578: [GOBBLIN-707] rewrite gobblin script to combine all modes and command

2019-04-03 Thread GitBox
jhsenjaliya commented on a change in pull request #2578: [GOBBLIN-707] rewrite 
gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r272026110
 
 

 ##
 File path: conf/yarn/application.conf
 ##
 @@ -22,15 +22,18 @@ gobblin.yarn.app.name=GobblinYarn
 gobblin.yarn.app.master.memory.mbs=256
 gobblin.yarn.initial.containers=2
 gobblin.yarn.container.memory.mbs=512
-gobblin.yarn.conf.dir=
-gobblin.yarn.lib.jars.dir=
-gobblin.yarn.app.master.files.local=${gobblin.yarn.conf.dir}"/log4j-yarn.properties,"${gobblin.yarn.conf.dir}"/application.conf,"${gobblin.yarn.conf.dir}"/reference.conf"
+gobblin.yarn.conf.dir=/tools/gobblin-dist/conf/yarn/
 
 Review comment:
   this is missed, let me change this to 
`gobblin.yarn.conf.dir=${GOBBLIN_HOME}/conf/yarn/` will be better than having 
 btw, thanks for catching this, this was my local config.

