[jira] [Work logged] (GOBBLIN-775) Add job level retry for gobblin service

2019-05-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-775?focusedWorklogId=247268&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247268
 ]

ASF GitHub Bot logged work on GOBBLIN-775:
--

Author: ASF GitHub Bot
Created on: 23/May/19 03:50
Start Date: 23/May/19 03:50
Worklog Time Spent: 10m 
  Work Description: jack-moseley commented on issue #2640: [GOBBLIN-775] 
Add job level retries for gobblin service
URL: 
https://github.com/apache/incubator-gobblin/pull/2640#issuecomment-495057221
 
 
   - Changed `JobExecutionPlan` equals and hashCode instead of changing the key 
of the `dagNode` maps
   - Changed orchestration events to include attempt counter, so that when 
there's a failure event we can avoid updating the `JobStatus` to failed if 
there will be a retry
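
A minimal sketch of the two changes listed above, under stated assumptions. None of the names here (`PlanKey`, `flowName`, `jobName`, `RetryPolicy`, `shouldMarkFailed`, `maxAttempts`) come from the PR; they are hypothetical stand-ins. The sketch only illustrates value-based `equals`/`hashCode` on an object used as a map key, and using an attempt counter carried on a failure event to decide whether the failure is final.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a plan object that is used as a map key.
final class PlanKey {
    private final String flowName;
    private final String jobName;

    PlanKey(String flowName, String jobName) {
        this.flowName = flowName;
        this.jobName = jobName;
    }

    // Value-based equality: two instances describing the same job are equal,
    // so a re-created plan still finds its entry in existing maps.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PlanKey)) {
            return false;
        }
        PlanKey other = (PlanKey) o;
        return Objects.equals(flowName, other.flowName)
            && Objects.equals(jobName, other.jobName);
    }

    // hashCode must agree with equals, so it uses the same fields.
    @Override
    public int hashCode() {
        return Objects.hash(flowName, jobName);
    }
}

// Hypothetical helper: decides whether a failure event is final.
final class RetryPolicy {
    // attempt is 1-based; a failure is final only on the last allowed attempt.
    static boolean shouldMarkFailed(int attempt, int maxAttempts) {
        return attempt >= maxAttempts;
    }
}

final class RetrySketch {
    public static void main(String[] args) {
        Map<PlanKey, String> status = new HashMap<>();
        status.put(new PlanKey("flowA", "job1"), "RUNNING");

        // A new, equal instance resolves to the same map entry.
        System.out.println(status.get(new PlanKey("flowA", "job1")));

        // First of three attempts failed: a retry follows, so do not mark failed.
        System.out.println(RetryPolicy.shouldMarkFailed(1, 3));
        // Last attempt failed: the failure is final.
        System.out.println(RetryPolicy.shouldMarkFailed(3, 3));
    }
}
```

Keeping the map keys unchanged and adjusting equality instead means existing lookups keep working; whether the real PR compares these particular fields is not stated in the comment.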
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 247268)
Time Spent: 0.5h  (was: 20m)

> Add job level retry for gobblin service
> ---
>
> Key: GOBBLIN-775
> URL: https://issues.apache.org/jira/browse/GOBBLIN-775
> Project: Apache Gobblin
>  Issue Type: New Feature
>  Components: gobblin-service
>Reporter: Jack Moseley
>Assignee: Abhishek Tiwari
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [incubator-gobblin] jack-moseley commented on issue #2640: [GOBBLIN-775] Add job level retries for gobblin service

2019-05-22 Thread GitBox
jack-moseley commented on issue #2640: [GOBBLIN-775] Add job level retries for 
gobblin service
URL: 
https://github.com/apache/incubator-gobblin/pull/2640#issuecomment-495057221
 
 
   - Changed `JobExecutionPlan` equals and hashCode instead of changing the key 
of the `dagNode` maps
   - Changed orchestration events to include attempt counter, so that when 
there's a failure event we can avoid updating the `JobStatus` to failed if 
there will be a retry




With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-779) make job status retriever configurable

2019-05-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-779?focusedWorklogId=247267&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247267
 ]

ASF GitHub Bot logged work on GOBBLIN-779:
--

Author: ASF GitHub Bot
Created on: 23/May/19 03:38
Start Date: 23/May/19 03:38
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2643: [GOBBLIN-779] 
make job status retriever configurable
URL: https://github.com/apache/incubator-gobblin/pull/2643
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 247267)
Time Spent: 10m
Remaining Estimate: 0h

> make job status retriever configurable
> --
>
> Key: GOBBLIN-779
> URL: https://issues.apache.org/jira/browse/GOBBLIN-779
> Project: Apache Gobblin
>  Issue Type: Bug
>Reporter: Arjun Singh Bora
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[GitHub] [incubator-gobblin] asfgit closed pull request #2643: [GOBBLIN-779] make job status retriever configurable

2019-05-22 Thread GitBox
asfgit closed pull request #2643: [GOBBLIN-779] make job status retriever 
configurable
URL: https://github.com/apache/incubator-gobblin/pull/2643
 
 
   




With regards,
Apache Git Services


[jira] [Created] (GOBBLIN-779) make job status retriever configurable

2019-05-22 Thread Arjun Singh Bora (JIRA)
Arjun Singh Bora created GOBBLIN-779:


 Summary: make job status retriever configurable
 Key: GOBBLIN-779
 URL: https://issues.apache.org/jira/browse/GOBBLIN-779
 Project: Apache Gobblin
  Issue Type: Bug
Reporter: Arjun Singh Bora








[GitHub] [incubator-gobblin] arjun4084346 opened a new pull request #2643: make job status retriever configurable

2019-05-22 Thread GitBox
arjun4084346 opened a new pull request #2643: make job status retriever 
configurable
URL: https://github.com/apache/incubator-gobblin/pull/2643
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below! @sv2000  please review
   
   
   ### JIRA
   - [x] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
   - https://issues.apache.org/jira/browse/GOBBLIN-XXX
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   trivial changes
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   




With regards,
Apache Git Services


[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=247138&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247138
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 22/May/19 23:08
Start Date: 22/May/19 23:08
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on pull request #2578: [GOBBLIN-707] 
rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r286705253
 
 

 ##
 File path: conf/cluster-master/application.conf
 ##
 @@ -69,3 +70,20 @@ task.status.reportintervalinms=1000
 # Enable metrics / events
 metrics.enabled=true
 
+# UI
+admin.server.enabled=true
+admin.server.port=9000
+
+#  is this required/redundent ?
+rest.server.host=localhost
+rest.server.port=9090
+
+# job history store
+job.execinfo.server.enabled=true
 
 Review comment:
   Ditto
 



Issue Time Tracking
---

Worklog Id: (was: 247138)
Time Spent: 6h 20m  (was: 6h 10m)

> combine & standardize all gobblin scripts into one master script & 
> restructure configs accordingly
> --
>
> Key: GOBBLIN-707
> URL: https://issues.apache.org/jira/browse/GOBBLIN-707
> Project: Apache Gobblin
>  Issue Type: Improvement
>Reporter: Jay Sen
>Priority: Major
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> Gobblin supports multiple modes of execution (CLI, standalone, 
> cluster-master, cluster-worker, AWS, YARN, MR) and various command-line 
> utilities to run CLI and admin commands. There is an individual script for 
> each of them.
> Having individual scripts introduces a lot of issues:
>  # The scripts handle Gobblin variables and user parameters differently, and 
> the handling is highly inconsistent across scripts.
>  # Functionality around start, stop, status checking and PID handling, among 
> many other things, varies widely with the implementation of each script.
>  # Features like GC and JVM params, log4j file selection, classpath 
> calculation, etc. exist in some Gobblin scripts but not all, adding to an 
> inconsistent user experience.
>  # Maintaining a total of 13 scripts would be too much effort.
> Also, all the Gobblin scripts share a lot of common code to handle params, 
> starting and stopping services, status checks, PID handling, etc. Combining 
> all the scripts into one not only makes maintenance easier but also brings 
> clarity and consistency.
>  
> Solution:
> 1. There can be one gobblin.sh script to handle all Gobblin commands and 
> deployment options as per the following signature. NOTE: This
> {{gobblin.sh   }}
>  {{gobblin.sh   }}
> {{commands values: admin, cli, statestore-check, statestore-clean, 
> historystore-manager, classpath}}
>  {{service values: standalone, cluster-master, cluster-worker, aws, yarn, mr, 
> service}}
> With the above change, the following become valid commands.
> {code:java}
> # all under GobblinCli class
> gobblin run listQuickApps     -> gobblin cli run listQuickApps
> gobblin run listQuickApps     -> gobblin cli run listQuickApps
> gobblin run                   -> gobblin cli run 
> # class: JobStateToJsonConverter
> statestore-checker.sh         -> gobblin statestore-checker 
> # class: StateStoreCleaner
> statestore-clean.sh           -> gobblin statestore-clean 
> # class: DatabaseJobHistoryStoreSchemaManager
> historystore-manager.sh       -> gobblin historystore-manager 
> # class: Cli
> gobblin-admin.sh              -> gobblin admin 
> # all gobblin deployment modes
> gobblin-cluster-master.sh     -> gobblin cluster-master start|stop|status
> gobblin-cluster-worker.sh     -> gobblin cluster-master start|stop|status
> gobblin-compaction.sh         -> gobblin cluster-master start|stop|status
> gobblin-env.sh                -> gobblin cluster-master start|stop|status
> gobblin-mapreduce.sh          -> gobblin cluster-master start|stop|status
> gobblin-service.sh            -> gobblin cluster-master start|stop|status
> gobblin-standalone.sh         -> gobblin cluster-master start|stop|status
> gobblin-yarn.sh               -> gobblin cluster-master start|stop|status
> {code}
>  
> 2. Configs also need to be structured and deduplicated accordingly, to make 
> it clear which config will be picked up for which execution mode.
>  {color:#ff}
>  NOTE: This refactoring adds all CLI and service commands to gobblin.sh and 
> hence changes the syntax for all commands and services.{color}




[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=247135&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247135
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 22/May/19 23:08
Start Date: 22/May/19 23:08
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on pull request #2578: [GOBBLIN-707] 
rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r286705377
 
 

 ##
 File path: conf/cluster-worker/application.conf
 ##
 @@ -69,3 +69,20 @@ task.status.reportintervalinms=1000
 # Enable metrics / events
 metrics.enabled=true
 
+failure.log.dir=${gobblin.cluster.work.dir}/failure-logs
+
+# UI
+admin.server.enabled=false
+# admin.server.port=9000
+
+rest.server.host=localhost
+rest.server.port=9090
+
+# job history store ( WARN [GobblinYarnAppLauncher] NOT starting the admin UI because the job execution info server is NOT enabled )
+job.execinfo.server.enabled=true
 
 Review comment:
   Why enabled by default?
 



Issue Time Tracking
---

Worklog Id: (was: 247135)
Time Spent: 6h 10m  (was: 6h)


[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=247136&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247136
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 22/May/19 23:08
Start Date: 22/May/19 23:08
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on pull request #2578: [GOBBLIN-707] 
rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r286703827
 
 

 ##
 File path: bin/gobblin.sh
 ##
 @@ -17,50 +17,488 @@
 # limitations under the License.
 #
 
-calling_dir() {
-  echo "$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+# JAVA_HOME is required.
+if [[ -z "$JAVA_HOME" ]]; then
+    echo -e "\nError: Environment variable JAVA_HOME not set!\n"
+    exit 1
+fi
+
+# global vars
+
+GOBBLIN_VERSION=@project.version@
+GOBBLIN_HOME="$(cd `dirname $0`/..; pwd)"
+GOBBLIN_LIB=${GOBBLIN_HOME}/lib
+GOBBLIN_BIN=${GOBBLIN_HOME}/bin
+GOBBLIN_LOGS=${GOBBLIN_HOME}/logs
+GOBBLIN_CONF=''
+
+#sourcing basic gobblin env vars like GOBBLIN_HOME and GOBBLIN_LIB
+. ${GOBBLIN_BIN}/gobblin-env.sh
+
+CLUSTER_NAME="gobblin_cluster"
+JVM_OPTS="-Xmx1g -Xms512m"
+LOG4J_FILE_PATH=''
+LOG4J_OPTS=''
+GOBBLIN_MODE=''
+ACTION=''
+JVM_FLAGS=''
+EXTRA_JARS=''
+VERBOSE=0
+ENABLE_GC_LOGS=0
+CMD_PARAMS=''
+
+
+# Gobblin Commands, Modes & respective Classes
+GOBBLIN_MODE_TYPE=''
+CLI='cli'
+SERVICE='service'
+
+# Commands
+JOB_STATE_TO_JSON_CMD='job-state-to-json'
+JOB_STORE_SCHEMA_MANAGER_CMD='job-store-schema-manager'
+CLASSPATH_CMD='classpath'
+
+# Execution Modes
+STANDALONE_MODE='standalone'
+CLUSTER_MASTER_MODE='cluster-master'
+CLUSTER_WORKER_MODE='cluster-worker'
+AWS_MODE='aws'
+YARN_MODE='yarn'
+MAPREDUCE_MODE='mapreduce'
+SERVICE_MANAGER_MODE='service-manager'
+
+GOBBLIN_EXEC_MODE_LIST="$STANDALONE_MODE $CLUSTER_MASTER_MODE $CLUSTER_WORKER_MODE $AWS_MODE $YARN_MODE $MAPREDUCE_MODE $SERVICE_MANAGER_MODE"
+
+# CLI Command class
+CLI_CLASS='org.apache.gobblin.runtime.cli.GobblinCli'
+
+# Service Class
+STANDALONE_CLASS='org.apache.gobblin.scheduler.SchedulerDaemon'
+CLUSTER_MASTER_CLASS='org.apache.gobblin.cluster.GobblinClusterManager'
+CLUSTER_WORKER_CLASS='org.apache.gobblin.cluster.GobblinTaskRunner'
+AWS_CLASS='org.apache.gobblin.aws.GobblinAWSClusterLauncher'
+YARN_CLASS='org.apache.gobblin.yarn.GobblinYarnAppLauncher'
+MAPREDUCE_CLASS='org.apache.gobblin.runtime.mapreduce.CliMRJobLauncher'
+SERVICE_MANAGER_CLASS='org.apache.gobblin.service.modules.core.GobblinServiceManager'
+
+
+function print_gobblin_usage() {
+    echo "Usage:"
+    echo "gobblin.sh  cli "
+    echo "gobblin.sh  service  "
+    echo ""
+    echo "Use \"gobblin  --help\" for more information. (Gobblin Version: $GOBBLIN_VERSION)"
+}
+
+function print_gobblin_cli_usage() {
 
 Review comment:
   Why is this needed? `GobblinCli` should be able to automatically generate 
this usage info.
 



Issue Time Tracking
---

Worklog Id: (was: 247136)
Time Spent: 6h 10m  (was: 6h)

[jira] [Work logged] (GOBBLIN-707) combine & standardize all gobblin scripts into one master script & restructure configs accordingly

2019-05-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-707?focusedWorklogId=247137&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-247137
 ]

ASF GitHub Bot logged work on GOBBLIN-707:
--

Author: ASF GitHub Bot
Created on: 22/May/19 23:08
Start Date: 22/May/19 23:08
Worklog Time Spent: 10m 
  Work Description: ibuenros commented on pull request #2578: [GOBBLIN-707] 
rewrite gobblin script to combine all modes and command
URL: https://github.com/apache/incubator-gobblin/pull/2578#discussion_r286705209
 
 

 ##
 File path: conf/cluster-master/application.conf
 ##
 @@ -69,3 +70,20 @@ task.status.reportintervalinms=1000
 # Enable metrics / events
 metrics.enabled=true
 
+# UI
+admin.server.enabled=true
 
 Review comment:
   Why do we want admin server enabled by default?
 



Issue Time Tracking
---

Worklog Id: (was: 247137)






[jira] [Work logged] (GOBBLIN-778) Enhance SalesforceExtractor bulkConnection config for setting transport factory

2019-05-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-778?focusedWorklogId=246936&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246936
 ]

ASF GitHub Bot logged work on GOBBLIN-778:
--

Author: ASF GitHub Bot
Created on: 22/May/19 19:23
Start Date: 22/May/19 19:23
Worklog Time Spent: 10m 
  Work Description: asfgit commented on pull request #2642: GOBBLIN-778 - 
Moving config creation to a separate method
URL: https://github.com/apache/incubator-gobblin/pull/2642
 
 
   
 



Issue Time Tracking
---

Worklog Id: (was: 246936)
Time Spent: 20m  (was: 10m)

> Enhance SalesforceExtractor bulkConnection config for setting transport 
> factory
> ---
>
> Key: GOBBLIN-778
> URL: https://issues.apache.org/jira/browse/GOBBLIN-778
> Project: Apache Gobblin
>  Issue Type: Task
>  Components: gobblin-salesforce
>Reporter: Monish Vachhani
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> SalesforceExtractor uses a bulk connection to connect to Salesforce through 
> the bulk API. Since bulkConnection is a private variable, it cannot be 
> modified to pass a custom transportFactory via config.
> This task separates the config creation from the bulkApiLogin method so that 
> it can be overridden to pass custom params such as setTransport.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] [incubator-gobblin] asfgit closed pull request #2642: GOBBLIN-778 - Moving config creation to a separate method

2019-05-22 Thread GitBox
asfgit closed pull request #2642: GOBBLIN-778 - Moving config creation to a 
separate method
URL: https://github.com/apache/incubator-gobblin/pull/2642
 
 
   




[jira] [Resolved] (GOBBLIN-778) Enhance SalesforceExtractor bulkConnection config for setting transport factory

2019-05-22 Thread Issac Buenrostro (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Issac Buenrostro resolved GOBBLIN-778.
--
   Resolution: Fixed
Fix Version/s: 0.15.0

Issue resolved by pull request #2642
[https://github.com/apache/incubator-gobblin/pull/2642]

> Enhance SalesforceExtractor bulkConnection config for setting transport 
> factory
> ---
>
> Key: GOBBLIN-778
> URL: https://issues.apache.org/jira/browse/GOBBLIN-778
> Project: Apache Gobblin
>  Issue Type: Task
>  Components: gobblin-salesforce
>Reporter: Monish Vachhani
>Assignee: Hung Tran
>Priority: Major
> Fix For: 0.15.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SalesforceExtractor uses a bulk connection to connect to Salesforce through 
> the bulk API. Since bulkConnection is a private variable, it cannot be 
> modified to pass a custom transportFactory via config.
> This task separates the config creation from the bulkApiLogin method so that 
> it can be overridden to pass custom params such as setTransport.





[jira] [Work logged] (GOBBLIN-778) Enhance SalesforceExtractor bulkConnection config for setting transport factory

2019-05-22 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/GOBBLIN-778?focusedWorklogId=246613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-246613
 ]

ASF GitHub Bot logged work on GOBBLIN-778:
--

Author: ASF GitHub Bot
Created on: 22/May/19 08:23
Start Date: 22/May/19 08:23
Worklog Time Spent: 10m 
  Work Description: mvachhani commented on pull request #2642: GOBBLIN-778 
- Moving config creation to a separate method
URL: https://github.com/apache/incubator-gobblin/pull/2642
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title.
   - https://issues.apache.org/jira/browse/GOBBLIN-778
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   SalesforceExtractor uses a bulk connection to connect to Salesforce through 
the bulk API. Since bulkConnection is a private variable, it cannot be modified 
to pass a custom transportFactory via config.
   This task separates the config creation from the bulkApiLogin method so that 
it can be overridden to pass custom params such as setTransport.
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   No new code is added; this change only refactors existing code into a 
separate method.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   
 



Issue Time Tracking
---

Worklog Id: (was: 246613)
Time Spent: 10m
Remaining Estimate: 0h

> Enhance SalesforceExtractor bulkConnection config for setting transport 
> factory
> ---
>
> Key: GOBBLIN-778
> URL: https://issues.apache.org/jira/browse/GOBBLIN-778
> Project: Apache Gobblin
>  Issue Type: Task
>  Components: gobblin-salesforce
>Reporter: Monish Vachhani
>Assignee: Hung Tran
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> SalesforceExtractor uses a bulk connection to connect to Salesforce through 
> the bulk API. Since bulkConnection is a private variable, it cannot be 
> modified to pass a custom transportFactory via config.
> This task separates the config creation from the bulkApiLogin method so that 
> it can be overridden to pass custom params such as setTransport.





[jira] [Created] (GOBBLIN-778) Enhance SalesforceExtractor bulkConnection config for setting transport factory

2019-05-22 Thread Monish Vachhani (JIRA)
Monish Vachhani created GOBBLIN-778:
---

 Summary: Enhance SalesforceExtractor bulkConnection config for 
setting transport factory
 Key: GOBBLIN-778
 URL: https://issues.apache.org/jira/browse/GOBBLIN-778
 Project: Apache Gobblin
  Issue Type: Task
  Components: gobblin-salesforce
Reporter: Monish Vachhani
Assignee: Hung Tran


SalesforceExtractor uses a bulk connection to connect to Salesforce through 
the bulk API. Since bulkConnection is a private variable, it cannot be modified 
to pass a custom transportFactory via config.
This task separates the config creation from the bulkApiLogin method so that it 
can be overridden to pass custom params such as setTransport.
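
The refactoring described above could look roughly like the following sketch. 
It is only an illustration of the pattern (an overridable factory method called 
from the login path), not the code from the pull request: SketchConfig, 
SketchExtractor, CustomTransportExtractor, createConfig and login are all 
hypothetical names, and a plain string map stands in for the real connector 
config and transport factory.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a connector config; not the Salesforce client API.
class SketchConfig {
    final Map<String, String> settings = new LinkedHashMap<>();
}

// Hypothetical extractor: login() no longer builds the config inline,
// it delegates to a protected method that subclasses may override.
class SketchExtractor {
    protected SketchConfig createConfig() {
        SketchConfig config = new SketchConfig();
        config.settings.put("transport", "default");
        return config;
    }

    public String login() {
        SketchConfig config = createConfig();
        return "connected via " + config.settings.get("transport");
    }
}

// A subclass customizes only the config; the login logic stays untouched.
class CustomTransportExtractor extends SketchExtractor {
    @Override
    protected SketchConfig createConfig() {
        SketchConfig config = super.createConfig();
        config.settings.put("transport", "custom");
        return config;
    }
}

class ConfigHookDemo {
    public static void main(String[] args) {
        System.out.println(new SketchExtractor().login());          // connected via default
        System.out.println(new CustomTransportExtractor().login()); // connected via custom
    }
}
```

With the config creation isolated this way, a subclass does not need access to 
the private connection field; it only changes what the connection is built 
from.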





[GitHub] [incubator-gobblin] mvachhani opened a new pull request #2642: GOBBLIN-778 - Moving config creation to a separate method

2019-05-22 Thread GitBox
mvachhani opened a new pull request #2642: GOBBLIN-778 - Moving config creation 
to a separate method
URL: https://github.com/apache/incubator-gobblin/pull/2642
 
 
   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I 
have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin 
JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references 
them in the PR title.
   - https://issues.apache.org/jira/browse/GOBBLIN-778
   
   
   ### Description
   - [x] Here are some details about my PR, including screenshots (if 
applicable):
   SalesforceExtractor uses a bulk connection to connect to Salesforce through 
the bulk API. Since bulkConnection is a private variable, it cannot be modified 
to pass a custom transportFactory via config.
   This task separates the config creation from the bulkApiLogin method so that 
it can be overridden to pass custom params such as setTransport.
   
   ### Tests
   - [x] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   No new code is added; this change only refactors existing code into a 
separate method.
   
   ### Commits
   - [x] My commits all reference JIRA issues in their subject lines, and I 
have squashed multiple commits if they address the same issue. In addition, my 
commits follow the guidelines from "[How to write a good git commit 
message](http://chris.beams.io/posts/git-commit/)":
   1. Subject is separated from body by a blank line
   2. Subject is limited to 50 characters
   3. Subject does not end with a period
   4. Subject uses the imperative mood ("add", not "adding")
   5. Body wraps at 72 characters
   6. Body explains "what" and "why", not "how"
   
   

