Hi Gabe:

You can try configuring 'kylin.query.spark-conf.spark.yarn.stagingDir' in kylin.properties to make this setting take effect in Kylin. Sparder is launched with the Spark options given under the 'kylin.query.spark-conf.' prefix, which is likely why the value you set in spark-defaults.conf did not take effect.
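For example, a minimal sketch for kylin.properties. The staging root below is an assumption inferred from your working spark-submit log further down (Spark appends <user>/.sparkStaging/<application id> to it, which matches the hdfs://xxxxx:8020/tmp/spark-staging/hadoop/.sparkStaging/... paths you saw); substitute whatever HDFS directory you actually use:

    # Sparder (query) Spark context; staging root assumed from your spark-submit log
    kylin.query.spark-conf.spark.yarn.stagingDir=hdfs://xxxxx:8020/tmp/spark-staging
    # Optionally apply the same staging dir to the cube-build Spark jobs (same assumption)
    kylin.engine.spark-conf.spark.yarn.stagingDir=hdfs://xxxxx:8020/tmp/spark-staging

After editing kylin.properties, restart Kylin and rerun a query so Sparder is resubmitted with the new configuration. You can also sanity-check the value outside Kylin by flipping your own reproduction from file:// to HDFS, which should then succeed:

    spark-submit --master yarn --deploy-mode client --conf spark.yarn.stagingDir=hdfs://xxxxx:8020/tmp/spark-staging /home/hadoop/foo.py
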
> On Sep 13, 2021, at 9:56 PM, Michael, Gabe <[email protected]> wrote:
>
> Thank you for your reply.
>
> HADOOP_CONF_DIR is set correctly to /usr/local/kylin/hadoop_conf.
> fs.defaultFS in /usr/local/kylin/hadoop_conf/core-site.xml is set to hdfs://xxxxx:8020 (domain name omitted).
>
> I also tested submitting a simple Spark app from the command line with spark-submit, and it succeeds.
> According to the log messages, it uploads the files to HDFS when I submit directly with spark-submit:
>
> 21/09/13 13:49:19 INFO Client: Preparing resources for our AM container
> 21/09/13 13:49:19 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
> 21/09/13 13:49:23 INFO Client: Uploading resource file:/mnt/tmp/spark-7256648b-ffe0-4455-8a80-d56f1a7fd707/__spark_libs__3285017367714177339.zip -> hdfs://xxxxx:8020/tmp/spark-staging/hadoop/.sparkStaging/application_1631282030708_2987/__spark_libs__3285017367714177339.zip
> 21/09/13 13:49:25 INFO Client: Uploading resource file:/usr/local/kylin/spark/python/lib/pyspark.zip -> hdfs://xxxxx:8020/tmp/spark-staging/hadoop/.sparkStaging/application_1631282030708_2987/pyspark.zip
> 21/09/13 13:49:25 INFO Client: Uploading resource file:/usr/local/kylin/spark/python/lib/py4j-0.10.9-src.zip -> hdfs://xxxxx:8020/tmp/spark-staging/hadoop/.sparkStaging/application_1631282030708_2987/py4j-0.10.9-src.zip
> 21/09/13 13:49:25 INFO Client: Uploading resource file:/mnt/tmp/spark-7256648b-ffe0-4455-8a80-d56f1a7fd707/__spark_conf__6717448128964414860.zip -> hdfs://xxxxx/tmp/spark-staging/hadoop/.sparkStaging/application_1631282030708_2987/__spark_conf__.zip
>
> However, I can reproduce the same problem I encounter with Kylin by setting the spark.yarn.stagingDir configuration:
>
> spark-submit --master yarn --conf spark.yarn.stagingDir=file:///home/hadoop --deploy-mode client /home/hadoop/foo.py
>
> It will try to upload to a local destination "file:/home/hadoop/.sparkStaging/application_1631282030708_2945/…" and the application will fail.
>
> I am able to set spark.yarn.stagingDir to an HDFS location in /usr/local/kylin/spark/conf/spark-defaults.conf, and spark-submit succeeds.
>
> However, it seems Kylin ignores the value set for spark.yarn.stagingDir?
>
> If I am able to set spark.yarn.stagingDir correctly, I think it would work.
>
> Thank you for your assistance,
>
> Gabe
>
> From: Yaqian Zhang <[email protected]>
> Date: Sunday, September 12, 2021 at 22:45
> To: [email protected]
> Subject: Re: Kylin v4.0.0 GA on EMR 6.3.0 fail to start Sparder due to YARN staging files missing
>
> Hi:
> I noticed this in your kylin.log:
>
> "Uploading resource file:/usr/local/kylin/tomcat/temp/spark-8ec4dae7-5f3c-477e-bda3-4c4f00978586/__spark_libs__7584573487901234438.zip -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_libs__7584573487901234438.zip
> 2021-09-10 18:45:51,487 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/lib/kylin-parquet-job-4.0.0.jar -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/kylin-parquet-job-4.0.0.jar
> 2021-09-10 18:45:51,597 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/conf/spark-executor-log4j.properties -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/spark-executor-log4j.properties
> 2021-09-10 18:45:51,718 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/tomcat/temp/spark-8ec4dae7-5f3c-477e-bda3-4c4f00978586/__spark_conf__5546014978595262008.zip -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_conf__.zip"
>
> This does not look normal. When a Spark application is submitted, these libs need to be uploaded to HDFS or S3, but the paths here show they were uploaded to a local directory on the node running the driver, so the other nodes cannot find them.
>
> I'm not sure what caused these libs not to be uploaded to the correct path, but you can check whether the configuration 'HADOOP_CONF_DIR' is shown on the front page of Kylin, as in the following figure:
> <image001.png>
> If so, check whether 'fs.defaultFS' in core-site.xml under that path is configured to the correct directory.
>
> By the way, the configuration 'kylin.query.spark-conf.spark.executor.extraJavaOptions' in kylin.properties does not need to be modified manually; Kylin configures those variables automatically at runtime.
>
> On Sep 11, 2021, at 2:57 AM, Michael, Gabe <[email protected]> wrote:
>
> Hello,
>
> When running Kylin 4.0.0 on AWS EMR 6.3.0, I am able to successfully build a cube.
>
> But when I try to query it, the Sparder application cannot start.
>
> Kylin attempts to upload some files to a local directory, then the Spark job fails because it cannot read files from that directory.
>
> 2021-09-10 18:45:47,407 INFO [Thread-9] yarn.Client:57 : Preparing resources for our AM container
> 2021-09-10 18:45:47,428 WARN [Thread-9] yarn.Client:69 : Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
> 2021-09-10 18:45:50,861 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/tomcat/temp/spark-8ec4dae7-5f3c-477e-bda3-4c4f00978586/__spark_libs__7584573487901234438.zip -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_libs__7584573487901234438.zip
> 2021-09-10 18:45:51,487 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/lib/kylin-parquet-job-4.0.0.jar -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/kylin-parquet-job-4.0.0.jar
> 2021-09-10 18:45:51,597 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/conf/spark-executor-log4j.properties -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/spark-executor-log4j.properties
> 2021-09-10 18:45:51,718 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/tomcat/temp/spark-8ec4dae7-5f3c-477e-bda3-4c4f00978586/__spark_conf__5546014978595262008.zip -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_conf__.zip
> 2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : Changing view acls to: hadoop
> 2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : Changing modify acls to: hadoop
> 2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : Changing view acls groups to:
> 2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : Changing modify acls groups to:
> 2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
> 2021-09-10 18:45:51,814 INFO [Thread-9] yarn.Client:57 : Submitting application application_1631282030708_2863 to ResourceManager
> 2021-09-10 18:45:51,861 INFO [Thread-9] impl.YarnClientImpl:329 : Submitted application application_1631282030708_2863
> 2021-09-10 18:45:52,863 INFO [Thread-9] yarn.Client:57 : Application report for application_1631282030708_2863 (state: FAILED)
> 2021-09-10 18:45:52,866 INFO [Thread-9] yarn.Client:57 :
>      client token: N/A
>      diagnostics: Application application_1631282030708_2863 failed 2 times due to AM Container for appattempt_1631282030708_2863_000002 exited with exitCode: -1000
> Failing this attempt.Diagnostics: [2021-09-10 18:45:52.033]File file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_libs__7584573487901234438.zip does not exist
> java.io.FileNotFoundException: File file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_libs__7584573487901234438.zip does not exist
>         at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:671)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:992)
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:661)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:464)
>         at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
>         at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
>         at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
>         at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:243)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:236)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:224)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>
> For more detailed output, check the application tracking page: http://ip-10-240-102-189.bamtech.test.us-east-1.bamgrid.net:8088/cluster/app/application_1631282030708_2863
> Then click on links to logs of each attempt.
> . Failing the application.
>      ApplicationMaster host: N/A
>      ApplicationMaster RPC port: -1
>      queue: default
>      start time: 1631299551829
>      final status: FAILED
>      tracking URL: http://ip-10-240-102-189.bamtech.test.us-east-1.bamgrid.net:8088/cluster/app/application_1631282030708_2863
>      user: hadoop
> 2021-09-10 18:45:52,941 INFO [Thread-9] yarn.Client:57 : Deleted staging directory file:/home/hadoop/.sparkStaging/application_1631282030708_2863
> 2021-09-10 18:45:52,942 ERROR [Thread-9] cluster.YarnClientSchedulerBackend:73 : The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
> 2021-09-10 18:45:52,943 ERROR [Thread-9] spark.SparkContext:94 : Error initializing SparkContext.
>
> Here are my kylin.properties with irrelevant/sensitive values removed:
>
> kylin.env.hdfs-working-dir=s3a://XXXXX/qa/kylin/hdfs/
> kylin.env=QA
> kylin.server.mode=all
> kylin.server.cluster-servers=localhost:7070
> kylin.engine.default=6
> kylin.storage.default=4
> kylin.server.external-acl-provider=
> kylin.source.hive.database-for-flat-table=default
> kylin.web.default-time-filter=1
> kylin.storage.clean-after-delete-operation=false
> kylin.job.retry=1
> kylin.job.max-concurrent-jobs=1
> kylin.job.sampling-percentage=100
> kylin.job.scheduler.provider.100=org.apache.kylin.job.impl.curator.CuratorScheduler
> kylin.job.scheduler.default=2
> kylin.spark-conf.auto.prior=true
> kylin.engine.spark-conf.spark.master=yarn
> kylin.engine.spark-conf.spark.submit.deployMode=client
> kylin.engine.spark-conf.spark.yarn.queue=default
> kylin.engine.spark-conf.spark.eventLog.enabled=true
> kylin.engine.spark-conf.spark.eventLog.dir=hdfs:///kylin/spark-history
> kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs:///kylin/spark-history
> kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
> kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8 -Dhdp.version=current -Dlog4j.configuration=spark-executor-log4j.properties -Dlog4j.debug -Dkylin.hdfs.working.dir=${hdfs.working.dir} -Dkylin.metadata.identifier=kylin_metadata -Dkylin.spark.category=job -Dkylin.spark.project=${job.project} -Dkylin.spark.identifier=${job.id} -Dkylin.spark.jobName=${job.stepId} -Duser.timezone=${user.timezone}
> kylin.engine.spark-conf.spark.driver.extraJavaOptions=-XX:+CrashOnOutOfMemoryError
> kylin.query.auto-sparder-context-enabled-enabled=false
> kylin.query.spark-conf.spark.master=yarn
> kylin.query.spark-conf.spark.driver.cores=1
> kylin.query.spark-conf.spark.driver.memory=4G
> kylin.query.spark-conf.spark.driver.memoryOverhead=1G
> kylin.query.spark-conf.spark.executor.cores=1
> kylin.query.spark-conf.spark.executor.instances=1
> kylin.query.spark-conf.spark.executor.memory=4G
> kylin.query.spark-conf.spark.executor.memoryOverhead=1G
> kylin.query.spark-conf.spark.serializer=org.apache.spark.serializer.JavaSerializer
> kylin.query.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
> kylin.query.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current -Dlog4j.configuration=spark-executor-log4j.properties -Dlog4j.debug -Dkylin.hdfs.working.dir=s3a://dataeng-data-test/qa/kylin/hdfs/ -Dkylin.metadata.identifier=kylin_metadata -Dkylin.spark.category=sparder -Dkylin.spark.identifier={{APP_ID}}
> kylin.source.hive.redistribute-flat-table=false
> kylin.metadata.jdbc.dialect=mysql
> kylin.metadata.jdbc.json-always-small-cell=true
> kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperDistributedJobLock
> kylin.web.set-config-enable=true
> kylin.job.allow-empty-segment=false
> kylin.env.hadoop-conf-dir=/etc/hadoop/conf
> kylin.query.lazy-query-enabled=true
> kylin.query.cache-signature-enabled=true
> kylin.query.segment-cache-enabled=false
> kylin.engine.spark-fact-distinct=true
> kylin.engine.spark-dimension-dictionary=false
> kylin.engine.spark-uhc-dictionary=true
> kylin.engine.spark.rdd-partition-cut-mb=10
> kylin.engine.spark.min-partition=1
> kylin.engine.spark.max-partition=5000
> kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
> kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
> kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
> kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
> kylin.engine.spark-conf.spark.driver.memory=2G
> kylin.engine.spark-conf.spark.executor.memory=4G
> kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
> kylin.engine.spark-conf.spark.executor.cores=1
> kylin.engine.spark-conf.spark.network.timeout=600
> kylin.engine.spark-conf.spark.shuffle.service.enabled=true
> kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
> kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
> kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
> kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
> kylin.engine.spark-conf-mergedict.spark.executor.memory=6G
> kylin.engine.spark-conf-mergedict.spark.memory.fraction=0.2
> kylin.engine.spark-conf.spark.sql.hive.metastore.version=3.1.2
> kylin.engine.spark-conf.spark.sql.hive.metastore.jars=/usr/lib/hive/lib/*:/usr/lib/hadoop/lib/*
> kylin.query.spark-conf.spark.sql.hive.metastore.version=3.1.2
> kylin.query.spark-conf.spark.sql.hive.metastore.jars=/usr/lib/hive/lib/*:/usr/lib/hadoop/lib/*
> kylin.server.cluster-name=kylin_metadata
> kylin.log.spark-executor-properties-file=/usr/local/kylin/conf/spark-executor-log4j.properties
> kylin.metadata.url.identifier=kylin_metadata
>
> Thank you for your assistance,
>
> Gabe
>
> --
> Gabe Michael
> Principal Data Engineer
> Disney Streaming Services
