Hello,

We don't have a conf file per se, since we build the configuration on the fly (we are using embedded mode). Here is the final configuration that is passed to the driver.
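[Editor's note: the configuration below uses `${...}` placeholders (`${GOBBLIN_WORK_DIR}`, `${from}`, `${to}`, `${USER}`) that are resolved against other config entries and environment variables when the config is built. The following is only a toy illustration of that substitution idea, not Gobblin's actual Typesafe Config resolution; the resolver and example keys are illustrative.]

```python
import re

def resolve(config, env):
    """Repeatedly substitute ${KEY} placeholders from the config map
    itself (falling back to environment variables) until a fixed point."""
    pattern = re.compile(r"\$\{([^}]+)\}")

    def lookup(match):
        key = match.group(1)
        # Prefer other config entries, then environment variables,
        # else leave the placeholder untouched.
        return str(config.get(key, env.get(key, match.group(0))))

    resolved = dict(config)
    for _ in range(10):  # bounded: placeholders may nest a few levels deep
        nxt = {k: pattern.sub(lookup, str(v)) for k, v in resolved.items()}
        if nxt == resolved:
            break
        config, resolved = nxt, nxt
    return resolved

cfg = {
    "GOBBLIN_WORK_DIR": "/tmp/${USER}/gobblin/work_dir",
    "from": "hdfs://nodenameha/tmp/distcptest",
    "gobblin.dataset.pattern": "${from}",
    "writer.staging.dir": "${GOBBLIN_WORK_DIR}/task-staging",
}
print(resolve(cfg, {"USER": "rohit"})["writer.staging.dir"])
# -> /tmp/rohit/gobblin/work_dir/task-staging
```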
{
  "GOBBLIN_WORK_DIR": "/tmp/${USER}/gobblin/work_dir",
  "cleanup.staging.data.per.task": false,
  "converter.classes": "org.apache.gobblin.converter.IdentityConverter",
  "data.publisher.appendExtractToFinalDir": false,
  "data.publisher.final.dir": "${to}",
  "data.publisher.metadata.output.dir": "hdfs://nodenameha/tmp/",
  "data.publisher.type": "org.apache.gobblin.data.management.copy.publisher.CopyDataPublisher",
  "distcp.persist.dir": "/tmp/distcp-persist-dir",
  "extract.namespace": "org.apache.gobblin.copy",
  "from": "hdfs://nodenameha/tmp/distcptest",
  "fs.uri": "hdfs://nodenameha",
  "gobblin.copy.recursive.delete": "true",
  "gobblin.copy.recursive.deleteEmptyDirectories": "true",
  "gobblin.copy.recursive.update": "true",
  "gobblin.dataset.pattern": "${from}",
  "gobblin.dataset.profile.class": "org.apache.gobblin.data.management.copy.CopyableGlobDatasetFinder",
  "gobblin.runtime.commit.sequence.store.dir": "${GOBBLIN_WORK_DIR}/commit-sequence-store",
  "gobblin.template.required_attributes": "from,to",
  "gobblin.trash.skip.trash": true,
  "gobblin.workDir": "${GOBBLIN_WORK_DIR}",
  "job.commit.parallelize": true,
  "job.description": "Some description",
  "job.history.store.enabled": "true",
  "job.history.store.jdbc.driver": "com.mysql.jdbc.Driver",
  "job.history.store.password": "appuser",
  "job.history.store.url": "jdbc:mysql://mysqlserver:3306/gobblindb?zeroDateTimeBehavior=convertToNull",
  "job.history.store.user": "appuser",
  "job.lock.enabled": false,
  "job.name": "distcp20",
  "metrics.log.dir": "${GOBBLIN_WORK_DIR}/metrics",
  "mr.jars.dir": "/tmp/${USER}/gobblin/_jars",
  "mr.job.root.dir": "/tmp/_distcp20_1518422272543",
  "qualitychecker.row.err.file": "${GOBBLIN_WORK_DIR}/err",
  "source.class": "org.apache.gobblin.data.management.copy.CopySource",
  "source.filebased.fs.uri": "hdfs://nodenameha",
  "state.store.dir": "${GOBBLIN_WORK_DIR}/state-store",
  "state.store.enabled": false,
  "state.store.fs.uri": "${fs.uri}",
  "task.maxretries": 0,
  "task.status.reportintervalinms": 5000,
  "taskexecutor.threadpool.size": 2,
  "taskretry.threadpool.coresize": 1,
  "taskretry.threadpool.maxsize": 2,
  "to": "hdfs://nodenameha/tmp/rk_bak",
  "workunit.retry.enabled": false,
  "writer.builder.class": "org.apache.gobblin.data.management.copy.writer.FileAwareInputStreamDataWriterBuilder",
  "writer.destination.type": "HDFS",
  "writer.fs.uri": "hdfs://nodenameha",
  "writer.output.dir": "${GOBBLIN_WORK_DIR}/task-output",
  "writer.output.format": "AVRO",
  "writer.staging.dir": "${GOBBLIN_WORK_DIR}/task-staging"
}

Best regards,
Rohit.

On Mon, Feb 12, 2018 at 1:09 AM, Sudarshan Vasudevan <suvasude...@linkedin.com> wrote:

> Hi Rohit,
>
> Can you share the job config file for your distcp job?
>
> Thanks,
> Sudarshan
>
> *From: *Rohit Kalhans <rohit.kalh...@gmail.com>
> *Reply-To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.apache.org>
> *Date: *Sunday, February 11, 2018 at 4:13 AM
> *To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.apache.org>
> *Subject: *Re: PriviledgedActionException while submitting a gobblin job to mapreduce.
>
> Hello Sudarshan, et al.,
>
> Thanks for the help. Based on your response we were able to figure out the problem and moved past it after adding lib to the classpath.
>
> Now the yarn job succeeds, as per the counters/log below:
>
> INFO [2018-02-11 11:50:57,267] org.apache.gobblin.runtime.TaskStateCollectorService: Starting the TaskStateCollectorService
> INFO [2018-02-11 11:50:57,268] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Launching Hadoop MR job Gobblin-distcp20
> WARN [2018-02-11 11:50:57,607] org.apache.hadoop.mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
> INFO [2018-02-11 11:50:57,734] org.apache.gobblin.runtime.mapreduce.GobblinWorkUnitsInputFormat: Found 1 input files at hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input: [FileStatus{path=hdfs://namenodeha/tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763/input/task_distcp20_1518349854763_0.wu; isDirectory=false; length=9201; replication=3; blocksize=134217728; modification_time=1518349857234; access_time=1518349857214; owner=applicationetl; group=supergroup; permission=rw-r--r--; isSymlink=false}]
> INFO [2018-02-11 11:50:57,799] org.apache.hadoop.mapreduce.JobSubmitter: number of splits:1
> INFO [2018-02-11 11:50:57,891] org.apache.hadoop.mapreduce.JobSubmitter: Submitting tokens for job: job_1518179003398_40028
> INFO [2018-02-11 11:50:58,130] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1518179003398_40028
> INFO [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: The url to track the job: http://jobtracker.application.example.com:8088/proxy/application_1518179003398_40028/
> INFO [2018-02-11 11:50:58,158] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Waiting for Hadoop MR job job_1518179003398_40028 to complete
> INFO [2018-02-11 11:50:58,158] org.apache.hadoop.mapreduce.Job: Running job: job_1518179003398_40028
> INFO [2018-02-11 11:51:04,362] org.apache.hadoop.mapreduce.Job: Job job_1518179003398_40028 running in uber mode : false
> INFO [2018-02-11 11:51:04,363] org.apache.hadoop.mapreduce.Job: map 0% reduce 0%
> INFO [2018-02-11 11:51:11,421] org.apache.hadoop.mapreduce.Job: map 100% reduce 0%
> INFO [2018-02-11 11:51:12,433] org.apache.hadoop.mapreduce.Job: Job job_1518179003398_40028 completed successfully
> INFO [2018-02-11 11:51:12,563] org.apache.hadoop.mapreduce.Job: Counters: 30
>   File System Counters
>     FILE: Number of bytes read=0
>     FILE: Number of bytes written=152940
>     FILE: Number of read operations=0
>     FILE: Number of large read operations=0
>     FILE: Number of write operations=0
>     HDFS: Number of bytes read=504209
>     HDFS: Number of bytes written=498190
>     HDFS: Number of read operations=15
>     HDFS: Number of large read operations=0
>     HDFS: Number of write operations=9
>   Job Counters
>     Launched map tasks=1
>     Other local map tasks=1
>     Total time spent by all maps in occupied slots (ms)=9704
>     Total time spent by all reduces in occupied slots (ms)=0
>     Total time spent by all map tasks (ms)=4852
>     Total vcore-seconds taken by all map tasks=4852
>     Total megabyte-seconds taken by all map tasks=19873792
>   Map-Reduce Framework
>     Map input records=1
>     Map output records=0
>     Input split bytes=206
>     Spilled Records=0
>     Failed Shuffles=0
>     Merged Map outputs=0
>     GC time elapsed (ms)=61
>     CPU time spent (ms)=6290
>     Physical memory (bytes) snapshot=515375104
>     Virtual memory (bytes) snapshot=5540597760
>     Total committed heap usage (bytes)=1500512256
>   File Input Format Counters
>     Bytes Read=0
>   File Output Format Counters
>     Bytes Written=0
>
> However, it seems that the publisher does not produce any output. I am not able to see any data in the sink folder, although the job has completed successfully.
>
> WARN [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher: Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data
>
> I can also see a warning that points to an issue during merging of metadata:
>
> WARN [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata merger for branch 0 returned null - bug in merger?
> INFO [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata output path not set for branch 0, not publishing.
>
> But this seems to be harmless.
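[Editor's note: a "produced no data" warning together with a COMMITTED job state, as above, typically means the copy source found nothing that needed copying, for example because the destination already holds an up-to-date copy of each file. The following is a simplified, hypothetical sketch of that kind of skip decision; it is not Gobblin's actual `CopySource` logic, and the field names are illustrative.]

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FileMeta:
    length: int    # file size in bytes
    mod_time: int  # modification time, epoch millis

def needs_copy(src: FileMeta, dst: Optional[FileMeta]) -> bool:
    """Copy only if the destination is missing, differs in size,
    or is older than the source (update-style semantics)."""
    if dst is None:
        return True
    if dst.length != src.length:
        return True
    return dst.mod_time < src.mod_time

# A source file whose destination copy is identical is skipped,
# so the work unit "produces no data" for the publisher.
src = FileMeta(length=9201, mod_time=1518349857234)
dst = FileMeta(length=9201, mod_time=1518349857234)
print(needs_copy(src, dst))  # False
```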
> INFO [2018-02-11 11:51:12,659] org.apache.gobblin.runtime.TaskStateCollectorService: Collected task state of 1 completed tasks
> INFO [2018-02-11 11:51:12,660] org.apache.gobblin.runtime.JobContext: 1 more tasks of job job_distcp20_1518349854763 have completed
> INFO [2018-02-11 11:51:12,665] org.apache.gobblin.runtime.mapreduce.MRJobLauncher: Deleted working directory /tmp/_distcp20_1518349854235/distcp20/job_distcp20_1518349854763
> INFO [2018-02-11 11:51:12,670] org.apache.gobblin.runtime.AbstractJobLauncher: Persisting dataset urns.
> INFO [2018-02-11 11:51:12,680] org.apache.gobblin.runtime.SafeDatasetCommit: Committing dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest) of job job_distcp20_1518349854763 with commit policy COMMIT_ON_FULL_SUCCESS and state SUCCESSFUL
> INFO [2018-02-11 11:51:12,701] org.apache.gobblin.publisher.BaseDataPublisher: Retry disabled for publish.
> WARN [2018-02-11 11:51:12,701] org.apache.gobblin.runtime.SafeDatasetCommit: Gobblin is set up to parallelize publishing, however the publisher org.apache.gobblin.publisher.BaseDataPublisher is not thread-safe. Falling back to serial publishing.
> WARN [2018-02-11 11:51:12,703] org.apache.gobblin.publisher.BaseDataPublisher: Branch 0 of WorkUnit task_distcp20_1518349854763_0 produced no data
> INFO [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner: Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@a195448
> INFO [2018-02-11 11:51:12,703] org.apache.gobblin.util.ParallelRunner: Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@a195448
> WARN [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata merger for branch 0 returned null - bug in merger?
> INFO [2018-02-11 11:51:12,708] org.apache.gobblin.publisher.BaseDataPublisher: Metadata output path not set for branch 0, not publishing.
> INFO [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit: Submitted 1 lineage events for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
> INFO [2018-02-11 11:51:12,711] org.apache.gobblin.runtime.SafeDatasetCommit: Persisting dataset state for dataset CopyEntity.DatasetAndPartition(dataset=CopyableDatasetMetadata(datasetURN=/tmp/distcptest), partition=/tmp/distcptest)
> INFO [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor: Attempting to shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@b4864a4
> INFO [2018-02-11 11:51:12,711] org.apache.gobblin.util.executors.IteratorExecutor: Successfully shutdown ExecutorService: com.google.common.util.concurrent.MoreExecutors$ListeningDecorator@b4864a4
> INFO [2018-02-11 11:51:12,738] org.apache.gobblin.runtime.AbstractJobLauncher: Cleaning up staging directory /gobblin/task-staging/distcp20/job_distcp20_1518349854763
> INFO [2018-02-11 11:51:12,743] org.apache.gobblin.runtime.AbstractJobLauncher: Deleting directory /gobblin/task-staging/distcp20
> INFO [2018-02-11 11:51:12,746] org.apache.gobblin.runtime.AbstractJobLauncher: Cleaning up output directory /gobblin/task-output/distcp20/job_distcp20_1518349854763
> INFO [2018-02-11 11:51:12,751] org.apache.gobblin.runtime.AbstractJobLauncher: Deleting directory /gobblin/task-output/distcp20
> INFO [2018-02-11 11:51:12,757] com.example.applications.test.
> executor.jobs.testGobblinRunner.distcp20/1: jobCompletion: JobContext{jobName=distcp20, jobId=job_distcp20_1518349854763, jobState={
>   "job name": "distcp20",
>   "job id": "job_distcp20_1518349854763",
>   "job state": "COMMITTED",
>   "start time": 1518349855793,
>   "end time": 1518349872716,
>   "duration": 16923,
>   "tasks": 1,
>   "completed tasks": 1,
>   "task states": [
>     {
>       "task id": "task_distcp20_1518349854763_0",
>       "task state": "COMMITTED",
>       "start time": 1518349869446,
>       "end time": 1518349869981,
>       "duration": 535,
>       "retry count": 0
>     }
>   ]
> }}
>
> Thanks for all the help.
>
> Best regards,
> Rohit.
>
> ---------- Forwarded message ----------
> From: *Sudarshan Vasudevan* <suvasude...@linkedin.com>
> Date: Thu, Feb 8, 2018 at 3:08 AM
> Subject: Re: PriviledgedActionException while submitting a gobblin job to mapreduce.
> To: "user@gobblin.incubator.apache.org" <user@gobblin.incubator.apache.org>
>
> Hi Rohit,
>
> Your yarn.application.classpath is missing the following:
>
> $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*
>
> This is just a hunch, but the JobClient inside the yarn application is not finding the hadoop-mapreduce-client-jobclient-2.3.0.jar, which has the YarnClientProtocolProvider class, and is defaulting to LocalClientProtocolProvider, hence unable to initiate a connection to your YARN cluster. The above jar is typically located under $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*.
>
> Can you add the above to your yarn-site.xml, restart yarn and give it a go?
>
> Thanks,
> Sudarshan
>
> *From: *Rohit Kalhans <rohit.kalh...@gmail.com>
> *Reply-To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator.apache.org>
> *Date: *Wednesday, February 7, 2018 at 1:02 PM
> *To: *"user@gobblin.incubator.
> apache.org> > *Subject: *Re: PriviledgedActionException while submitting a gobblin job > to mapreduce. > > > > > > hello all, > > > > First of all, thanks for the quick rtt. really appreciate the help. > > > > The environment variables have been set correctly(atleast that's what i > think. ). i am running this on a feeder box (gateway) of a cdh 5.7 cluster > managed by cloudera manager. > > > > the yarn-site.xml contains the following > > > > <property> > > <name>yarn.application.classpath</name> > > <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_ > COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/* > ,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/* > </value> > > </property> > > > > Before the execution of my application I call the following. > > > > export HADOOP_PREFIX="/opt/cloudera/parcels/CDH/" > > export HADOOP_HOME=$HADOOP_PREFIX > > export HADOOP_COMMON_HOME=$HADOOP_PREFIX > > export HADOOP_CONF_DIR=HADOOP_PREFIX/etc/hadoop/ > > export HADOOP_HDFS_HOME=$HADOOP_PREFIX > > export HADOOP_CLIENT_CONF_DIR="/etc/hadoop/conf" > > export HADOOP_MAPRED_HOME=$HADOOP_PREFIX > > export HADOOP_YARN_HOME=$HADOOP_PREFIX > > export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin > > source /etc/hadoop/conf/hadoop-env.sh > > > > The hadoop-env.sh sets a few variables as well. > > > > > > $>_ cat /etc/hadoop/conf/hadoop-env.sh > > > > # Prepend/Append plugin parcel classpaths > > > > if [ "$HADOOP_USER_CLASSPATH_FIRST" = 'true' ]; then > > # HADOOP_CLASSPATH={{HADOOP_CLASSPATH_APPEND}} > > : > > else > > # HADOOP_CLASSPATH={{HADOOP_CLASSPATH}} > > : > > fi > > # JAVA_LIBRARY_PATH={{JAVA_LIBRARY_PATH}} > > > > export HADOOP_MAPRED_HOME=$( ([[ ! 
> '/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce' > =~ CDH_MR2_HOME ]] && echo /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce > ) || echo ${CDH_MR2_HOME:-/usr/lib/hadoop-mapreduce/} ) > > export HADOOP_CLIENT_OPTS="-Xmx268435456 $HADOOP_CLIENT_OPTS" > > export HADOOP_CLIENT_OPTS="-Djava.net.preferIPv4Stack=true > $HADOOP_CLIENT_OPTS" > > export YARN_OPTS="-Xmx825955249 -Djava.net.preferIPv4Stack=true > $YARN_OPTS" > > > > > > On Thu, Feb 8, 2018 at 12:48 AM, Sudarshan Vasudevan < > suvasude...@linkedin.com> wrote: > > Hi Rohit, > > Can you share the properties in your yarn-site.xml file? > > > > The following is an example config that worked for me: > > I set the yarn.application.classpath in yarn-site.xml to the following: > > <property> > > <description>Classpath for typical applications.</description> > > <name>yarn.application.classpath</name> > > <value> > > $HADOOP_CONF_DIR, > > $HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_ > COMMON_HOME/share/hadoop/common/lib/*, > > $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_ > HOME/share/hadoop/hdfs/lib/*, > > $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_ > MAPRED_HOME/share/hadoop/mapreduce/lib/*, > > $HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_ > HOME/share/hadoop/yarn/lib/* > > </value> > > </property> > > > > In my local Hadoop installation, I set the HADOOP_* environment variables > as follows: > > export HADOOP_PREFIX="/usr/local/hadoop-2.3.0" > > export HADOOP_HOME=$HADOOP_PREFIX > > export HADOOP_COMMON_HOME=$HADOOP_PREFIX > > export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop > > export HADOOP_HDFS_HOME=$HADOOP_PREFIX > > export HADOOP_MAPRED_HOME=$HADOOP_PREFIX > > export HADOOP_YARN_HOME=$HADOOP_PREFIX > > export HADOOP_BIN_DIR=$HADOOP_PREFIX/bin > > > > > > Hope this helps, > > Sudarshan > > > > *From: *Rohit Kalhans <rohit.kalh...@gmail.com> > *Reply-To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator. 
> apache.org> > *Date: *Wednesday, February 7, 2018 at 10:57 AM > *To: *"user@gobblin.incubator.apache.org" <user@gobblin.incubator. > apache.org> > *Subject: *PriviledgedActionException while submitting a gobblin job to > mapreduce. > > > > Hello > > I am integrating gobblin in embedded mode with an existing application. > While submitting the job it seems like there is a unresolved > dependency/requirement to mapreduce launcher. > > > > I have checked that mapreduce.framework.name is set to yarn and the > other yarn application are running fine. Somehow I keep hitting the issue > with the gobblin mr job launcher. > > I was hoping that you guys can help me setting up Gobblin in embedded mode > for my application. > > > > Here is the stack. Do let me know if some other info is needed. > > > > > > Launching Hadoop MR job Gobblin-test9 > WARN [2018-02-07 11:43:22,990] > org.apache.hadoop.security.UserGroupInformation: > PriviledgedActionException as:<userName> (auth:SIMPLE) > cause:java.io.IOException: Cannot initialize Cluster. Please check your > configuration for mapreduce.framework.name and the correspond server > addresses. > INFO [2018-02-07 11:43:22,991] > org.apache.gobblin.runtime.TaskStateCollectorService: > Stopping the TaskStateCollectorService > INFO [2018-02-07 11:43:23,033] > org.apache.gobblin.runtime.mapreduce.MRJobLauncher: > Deleted working directory /tmp/_test9_1518003781707/ > test9/job_test9_1518003782322 > ERROR [2018-02-07 11:43:23,033] > org.apache.gobblin.runtime.AbstractJobLauncher: > Failed to launch and run job job_test9_1518003782322: java.io.IOException: > Cannot initialize Cluster. Please check your conf > iguration for mapreduce.framework.name and the correspond server > addresses. > ! java.io.IOException: Cannot initialize Cluster. Please check your > configuration for mapreduce.framework.name and the correspond server > addresses. > ! at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120) > ! 
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82) > ! at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75) > ! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1277) > ! at org.apache.hadoop.mapreduce.Job$9.run(Job.java:1273) > ! at java.security.AccessController.doPrivileged(Native Method) > ! at javax.security.auth.Subject.doAs(Subject.java:422) > ! at org.apache.hadoop.security.UserGroupInformation.doAs( > UserGroupInformation.java:1693) > ! at org.apache.hadoop.mapreduce.Job.connect(Job.java:1272) > ! at org.apache.hadoop.mapreduce.Job.submit(Job.java:1301) > ! at org.apache.gobblin.runtime.mapreduce.MRJobLauncher. > runWorkUnits(MRJobLauncher.java:244) > ! at org.apache.gobblin.runtime.AbstractJobLauncher.runWorkUnitStream( > AbstractJobLauncher.java:596) > ! at org.apache.gobblin.runtime.AbstractJobLauncher.launchJob( > AbstractJobLauncher.java:443) > ! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$ > DriverRunnable.call(JobLauncherExecutionDriver.java:159) > ! at org.apache.gobblin.runtime.job_exec.JobLauncherExecutionDriver$ > DriverRunnable.call(JobLauncherExecutionDriver.java:147) > ! at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ! at java.lang.Thread.run(Thread.java:745) > > > > -- > > Cheerio! > > *Rohit* > > > > > > -- > > Cheerio! > > *Rohit* > > > > > > -- > > Cheerio! > > *Rohit* > -- Cheerio! *Rohit*
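[Editor's note on the root cause discussed in this thread: Hadoop's Cluster.initialize() iterates over the ClientProtocolProvider implementations discovered via ServiceLoader and throws "Cannot initialize Cluster" when none of them accepts the configured mapreduce.framework.name. With the mapreduce jars missing from the classpath, only LocalClientProtocolProvider is discovered, and it declines "yarn". The following toy Python mimic illustrates that selection; the provider class names are real Hadoop classes, but everything else here is illustrative, not Hadoop's actual code.]

```python
class LocalClientProtocolProvider:
    # Accepts only the "local" framework, mirroring Hadoop's behavior.
    def create(self, conf):
        if conf.get("mapreduce.framework.name", "local") == "local":
            return "LocalJobRunner"
        return None

class YarnClientProtocolProvider:
    # Only discoverable when hadoop-mapreduce-client-jobclient is on the classpath.
    def create(self, conf):
        if conf.get("mapreduce.framework.name") == "yarn":
            return "YARNRunner"
        return None

def initialize_cluster(conf, providers):
    """Mimics the provider loop in org.apache.hadoop.mapreduce.Cluster.initialize()."""
    for provider in providers:
        runner = provider.create(conf)
        if runner is not None:
            return runner
    raise IOError("Cannot initialize Cluster. Please check your configuration "
                  "for mapreduce.framework.name and the correspond server addresses.")

conf = {"mapreduce.framework.name": "yarn"}

# With the jobclient jar missing, only the local provider is visible
# and initialization fails with the exact error seen in the thread:
try:
    initialize_cluster(conf, [LocalClientProtocolProvider()])
except IOError as e:
    print(e)

# With it on the classpath, the YARN runner is selected:
print(initialize_cluster(conf, [LocalClientProtocolProvider(), YarnClientProtocolProvider()]))
# -> YARNRunner
```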