Seems to me like a permissions problem! Can you check your user/folder permissions?
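An ENOENT during resource localization usually means one of the NodeManager local dirs is missing or not writable by the yarn user. A quick check you could run on each NodeManager host (the config path and the example directory below are assumptions; substitute whatever `yarn.nodemanager.local-dirs` is set to in your yarn-site.xml):

```shell
# Show which local dirs the NodeManager is configured to use
# (config path is an assumption; on HDP it is usually /etc/hadoop/conf).
CONF="${HADOOP_CONF_DIR:-/etc/hadoop/conf}/yarn-site.xml"
grep -A1 'yarn.nodemanager.local-dirs' "$CONF" 2>/dev/null \
  || echo "could not read $CONF"

# Verify each configured dir exists and is writable
# (replace the example path with the dirs printed above,
# and run as the user the NodeManager runs as).
for d in /hadoop/yarn/local; do
  if [ -d "$d" ] && [ -w "$d" ]; then
    echo "OK: $d"
  else
    echo "PROBLEM: $d is missing or not writable"
  fi
done
```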
Jorge Machado

> On 22 Mar 2018, at 08:21, nayan sharma <nayansharm...@gmail.com> wrote:
>
> Hi All,
> Druid uses Hadoop MapReduce to ingest batch data, but I am trying Spark for ingesting data into Druid, taking reference from https://github.com/metamx/druid-spark-batch
> But we are stuck at the following error.
>
> Application log:
> 2018-03-20T07:54:28,782 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Will allocate AM container, with 896 MB memory including 384 MB overhead
> 2018-03-20T07:54:28,782 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Setting up container launch context for our AM
> 2018-03-20T07:54:28,785 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Setting up the launch environment for our AM container
> 2018-03-20T07:54:28,793 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Preparing resources for our AM container
> 2018-03-20T07:54:29,364 WARN [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
> 2018-03-20T07:54:29,371 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Uploading resource file:/hdfs1/druid-0.11.0/var/tmp/spark-49af67df-1a21-4790-a02b-c737c7a44946/__spark_libs__8247917347016008883.zip -> hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_libs__8247917347016008883.zip
> 2018-03-20T07:54:29,607 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Uploading resource file:/hdfs1/druid-0.11.0/var/tmp/spark-49af67df-1a21-4790-a02b-c737c7a44946/__spark_conf__2240950972346324291.zip -> hdfs://n2pl-pa-hdn220.xxx.xxx:8020/user/yarn/.sparkStaging/application_1521457397747_0013/__spark_conf__.zip
> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing view acls to: yarn
> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing modify acls to: yarn
> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing view acls groups to:
> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - Changing modify acls groups to:
> 2018-03-20T07:54:29,673 INFO [task-runner-0-priority-0] org.apache.spark.SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn); groups with view permissions: Set(); users with modify permissions: Set(yarn); groups with modify permissions: Set()
> 2018-03-20T07:54:29,679 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Submitting application application_1521457397747_0013 to ResourceManager
> 2018-03-20T07:54:29,709 INFO [task-runner-0-priority-0] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1521457397747_0013
> 2018-03-20T07:54:29,713 INFO [task-runner-0-priority-0] org.apache.spark.scheduler.cluster.SchedulerExtensionServices - Starting Yarn extension services with app application_1521457397747_0013 and attemptId None
> 2018-03-20T07:54:30,722 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client - Application report for application_1521457397747_0013 (state: FAILED)
> 2018-03-20T07:54:30,729 INFO [task-runner-0-priority-0] org.apache.spark.deploy.yarn.Client -
>     client token: N/A
>     diagnostics: Application application_1521457397747_0013 failed 2 times due to AM Container for appattempt_1521457397747_0013_000002 exited with exitCode: -1000
> For more detailed output, check the application tracking page: http://n-pa-hdn220.xxx.xxxx:8088/cluster/app/application_1521457397747_0013
> Then click on links to logs of each attempt.
> Diagnostics: No such file or directory
> ENOENT: No such file or directory
>     at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmodImpl(Native Method)
>     at org.apache.hadoop.io.nativeio.NativeIO$POSIX.chmod(NativeIO.java:230)
>     at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:756)
>     at org.apache.hadoop.fs.DelegateToFileSystem.setPermission(DelegateToFileSystem.java:211)
>     at org.apache.hadoop.fs.FilterFs.setPermission(FilterFs.java:252)
>     at org.apache.hadoop.fs.FileContext$11.next(FileContext.java:1003)
>     at org.apache.hadoop.fs.FileContext$11.next(FileContext.java:999)
>     at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
>     at org.apache.hadoop.fs.FileContext.setPermission(FileContext.java:1006)
>     at org.apache.hadoop.yarn.util.FSDownload$3.run(FSDownload.java:421)
>     at org.apache.hadoop.yarn.util.FSDownload$3.run(FSDownload.java:419)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>     at org.apache.hadoop.yarn.util.FSDownload.changePermissions(FSDownload.java:419)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:365)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
>
> As far as I can understand, there is something wrong with the job submission through YARN.
>
> It runs on the local machine, but on the HDP cluster it fails with this error.
>
> <yarnlogs.txt>
>
> Thanks,
> Nayan
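For what it's worth, the stack trace above is YARN's resource localizer (FSDownload.changePermissions) calling chmod on a local path that does not exist, which is why it reports ENOENT rather than EACCES. The same errno is easy to see locally (the path below is purely illustrative):

```shell
# chmod on a nonexistent path fails with ENOENT -- the same error
# FSDownload.changePermissions hit while localizing the Spark archives.
# This points at the container's local staging dir vanishing or never
# being created, not at HDFS itself.
chmod 700 /no/such/localized/file 2>&1 || echo "chmod failed with ENOENT"
```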