Yaqian Zhang, thank you for the suggestion. I configured "kylin.query.spark-conf.spark.yarn.stagingDir=hdfs://my-cluster-hostname:8020/tmp/spark-staging" (after first creating the directory on HDFS with "hdfs dfs -mkdir -p /tmp/spark-staging"), and now the file uploads go to HDFS, the Sparder Spark job runs successfully, and I receive query results!
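For anyone who hits the same problem later, here is the working fix from this thread distilled into two steps ("my-cluster-hostname" stands in for the actual NameNode address):

# 1. Create the staging directory on HDFS first
hdfs dfs -mkdir -p /tmp/spark-staging

# 2. In kylin.properties, point the Sparder (query) engine's YARN staging dir at HDFS
kylin.query.spark-conf.spark.yarn.stagingDir=hdfs://my-cluster-hostname:8020/tmp/spark-staging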
From: Yaqian Zhang <[email protected]>
Date: Monday, September 13, 2021 at 22:43
To: [email protected] <[email protected]>
Subject: Re: Kylin v4.0.0 GA on EMR 6.3.0 fail to start Sparder due to YARN staging files missing

Hi Gabe:

You can try configuring 'kylin.query.spark-conf.spark.yarn.stagingDir' in kylin.properties so that this setting takes effect in Kylin.

On September 13, 2021, at 9:56 PM, Michael, Gabe <[email protected]> wrote:

Thank you for your reply.

HADOOP_CONF_DIR is set correctly to /usr/local/kylin/hadoop_conf, and fs.defaultFS in /usr/local/kylin/hadoop_conf/core-site.xml is set to hdfs://xxxxx:8020 (domain name omitted).

I also tested submitting a simple Spark app from the command line with spark-submit, and it succeeds. According to the log messages, it uploads the files to HDFS when I submit directly with spark-submit:

21/09/13 13:49:19 INFO Client: Preparing resources for our AM container
21/09/13 13:49:19 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
21/09/13 13:49:23 INFO Client: Uploading resource file:/mnt/tmp/spark-7256648b-ffe0-4455-8a80-d56f1a7fd707/__spark_libs__3285017367714177339.zip -> hdfs://xxxxx:8020/tmp/spark-staging/hadoop/.sparkStaging/application_1631282030708_2987/__spark_libs__3285017367714177339.zip
21/09/13 13:49:25 INFO Client: Uploading resource file:/usr/local/kylin/spark/python/lib/pyspark.zip -> hdfs://xxxxx:8020/tmp/spark-staging/hadoop/.sparkStaging/application_1631282030708_2987/pyspark.zip
21/09/13 13:49:25 INFO Client: Uploading resource file:/usr/local/kylin/spark/python/lib/py4j-0.10.9-src.zip -> hdfs://xxxxx:8020/tmp/spark-staging/hadoop/.sparkStaging/application_1631282030708_2987/py4j-0.10.9-src.zip
21/09/13 13:49:25 INFO Client: Uploading resource file:/mnt/tmp/spark-7256648b-ffe0-4455-8a80-d56f1a7fd707/__spark_conf__6717448128964414860.zip -> hdfs://xxxxx/tmp/spark-staging/hadoop/.sparkStaging/application_1631282030708_2987/__spark_conf__.zip

However, I can reproduce the same problem I encounter with Kylin by setting the spark.yarn.stagingDir configuration:

spark-submit --master yarn --conf spark.yarn.stagingDir=file:///home/hadoop --deploy-mode client /home/hadoop/foo.py

It will try to upload to a local destination "file:/home/hadoop/.sparkStaging/application_1631282030708_2945/…" and the application will fail. I am able to set spark.yarn.stagingDir to an HDFS location in /usr/local/kylin/spark/conf/spark-defaults.conf and spark-submit succeeds.
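For reference, the spark-defaults.conf entry described above would look something like this (the hostname, port, and staging path are placeholders for the real cluster values):

# /usr/local/kylin/spark/conf/spark-defaults.conf
spark.yarn.stagingDir    hdfs://xxxxx:8020/tmp/spark-staging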
However, it seems Kylin ignores the value of spark.yarn.stagingDir set in spark-defaults.conf? If I can set spark.yarn.stagingDir correctly, I think it would work.

Thank you for your assistance,
Gabe

From: Yaqian Zhang <[email protected]>
Date: Sunday, September 12, 2021 at 22:45
To: [email protected] <[email protected]>
Subject: Re: Kylin v4.0.0 GA on EMR 6.3.0 fail to start Sparder due to YARN staging files missing

Hi:

I noticed this in your kylin.log:

"Uploading resource file:/usr/local/kylin/tomcat/temp/spark-8ec4dae7-5f3c-477e-bda3-4c4f00978586/__spark_libs__7584573487901234438.zip -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_libs__7584573487901234438.zip
2021-09-10 18:45:51,487 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/lib/kylin-parquet-job-4.0.0.jar -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/kylin-parquet-job-4.0.0.jar
2021-09-10 18:45:51,597 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/conf/spark-executor-log4j.properties -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/spark-executor-log4j.properties
2021-09-10 18:45:51,718 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/tomcat/temp/spark-8ec4dae7-5f3c-477e-bda3-4c4f00978586/__spark_conf__5546014978595262008.zip -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_conf__.zip"

This does not look normal. When a Spark application is submitted, these libs need to be uploaded to HDFS or S3, but the paths here show they were uploaded to a local directory on the node running the driver, so the other nodes cannot find them. I'm not sure what caused these libs not to be uploaded to the correct path, but you can check whether the configuration 'HADOOP_CONF_DIR' appears on the front page of Kylin, as shown in the following figure:

<image001.png>

If so, check whether 'fs.defaultFS' in core-site.xml under that path is configured to the correct filesystem.

By the way, the configuration 'kylin.query.spark-conf.spark.executor.extraJavaOptions' in kylin.properties does not need to be manually modified by the user; Kylin fills in those variables automatically at runtime.
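To make that check concrete, here is a sketch of the fs.defaultFS entry a correctly pointed core-site.xml would contain, plus a way to print the value the client configuration actually resolves to (the hostname and port are placeholders):

<!-- $HADOOP_CONF_DIR/core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://xxxxx:8020</value>
</property>

# Print the default filesystem the Hadoop client config resolves to
hdfs getconf -confKey fs.defaultFS

If fs.defaultFS resolves to the local filesystem, or spark.yarn.stagingDir is forced to a file:// URI, the YARN staging files end up on the driver's local disk and the NodeManagers cannot localize them, which matches the failure quoted below.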
On September 11, 2021, at 2:57 AM, Michael, Gabe <[email protected]> wrote:

Hello,

When running Kylin 4.0.0 on AWS EMR 6.3.0, I am able to build a cube successfully, but when I try to query it, the Sparder application cannot start. Kylin uploads some files to a local directory, and then the Spark job fails because it cannot read the files from that directory:

2021-09-10 18:45:47,407 INFO [Thread-9] yarn.Client:57 : Preparing resources for our AM container
2021-09-10 18:45:47,428 WARN [Thread-9] yarn.Client:69 : Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2021-09-10 18:45:50,861 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/tomcat/temp/spark-8ec4dae7-5f3c-477e-bda3-4c4f00978586/__spark_libs__7584573487901234438.zip -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_libs__7584573487901234438.zip
2021-09-10 18:45:51,487 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/lib/kylin-parquet-job-4.0.0.jar -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/kylin-parquet-job-4.0.0.jar
2021-09-10 18:45:51,597 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/conf/spark-executor-log4j.properties -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/spark-executor-log4j.properties
2021-09-10 18:45:51,718 INFO [Thread-9] yarn.Client:57 : Uploading resource file:/usr/local/kylin/tomcat/temp/spark-8ec4dae7-5f3c-477e-bda3-4c4f00978586/__spark_conf__5546014978595262008.zip -> file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_conf__.zip
2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : Changing view acls to: hadoop
2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : Changing modify acls to: hadoop
2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : Changing view acls groups to:
2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : Changing modify acls groups to:
2021-09-10 18:45:51,780 INFO [Thread-9] spark.SecurityManager:57 : SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
2021-09-10 18:45:51,814 INFO [Thread-9] yarn.Client:57 : Submitting application application_1631282030708_2863 to ResourceManager
2021-09-10 18:45:51,861 INFO [Thread-9] impl.YarnClientImpl:329 : Submitted application application_1631282030708_2863
2021-09-10 18:45:52,863 INFO [Thread-9] yarn.Client:57 : Application report for application_1631282030708_2863 (state: FAILED)
2021-09-10 18:45:52,866 INFO [Thread-9] yarn.Client:57 :
     client token: N/A
     diagnostics: Application application_1631282030708_2863 failed 2 times due to AM Container for appattempt_1631282030708_2863_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2021-09-10 18:45:52.033]File file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_libs__7584573487901234438.zip does not exist
java.io.FileNotFoundException: File file:/home/hadoop/.sparkStaging/application_1631282030708_2863/__spark_libs__7584573487901234438.zip does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:671)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:992)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:661)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:464)
    at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:243)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:224)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
For more detailed output, check the application tracking page: http://ip-10-240-102-189.bamtech.test.us-east-1.bamgrid.net:8088/cluster/app/application_1631282030708_2863 Then click on links to logs of each attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1631299551829
     final status: FAILED
     tracking URL: http://ip-10-240-102-189.bamtech.test.us-east-1.bamgrid.net:8088/cluster/app/application_1631282030708_2863
     user: hadoop
2021-09-10 18:45:52,941 INFO [Thread-9] yarn.Client:57 : Deleted staging directory file:/home/hadoop/.sparkStaging/application_1631282030708_2863
2021-09-10 18:45:52,942 ERROR [Thread-9] cluster.YarnClientSchedulerBackend:73 : The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
2021-09-10 18:45:52,943 ERROR [Thread-9] spark.SparkContext:94 : Error initializing SparkContext.
Here are my kylin.properties with irrelevant/sensitive values removed:

kylin.env.hdfs-working-dir=s3a://XXXXX/qa/kylin/hdfs/
kylin.env=QA
kylin.server.mode=all
kylin.server.cluster-servers=localhost:7070
kylin.engine.default=6
kylin.storage.default=4
kylin.server.external-acl-provider=
kylin.source.hive.database-for-flat-table=default
kylin.web.default-time-filter=1
kylin.storage.clean-after-delete-operation=false
kylin.job.retry=1
kylin.job.max-concurrent-jobs=1
kylin.job.sampling-percentage=100
kylin.job.scheduler.provider.100=org.apache.kylin.job.impl.curator.CuratorScheduler
kylin.job.scheduler.default=2
kylin.spark-conf.auto.prior=true
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=client
kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=hdfs:///kylin/spark-history
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs:///kylin/spark-history
kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8 -Dhdp.version=current -Dlog4j.configuration=spark-executor-log4j.properties -Dlog4j.debug -Dkylin.hdfs.working.dir=${hdfs.working.dir} -Dkylin.metadata.identifier=kylin_metadata -Dkylin.spark.category=job -Dkylin.spark.project=${job.project} -Dkylin.spark.identifier=${job.id} -Dkylin.spark.jobName=${job.stepId} -Duser.timezone=${user.timezone}
kylin.engine.spark-conf.spark.driver.extraJavaOptions=-XX:+CrashOnOutOfMemoryError
kylin.query.auto-sparder-context-enabled-enabled=false
kylin.query.spark-conf.spark.master=yarn
kylin.query.spark-conf.spark.driver.cores=1
kylin.query.spark-conf.spark.driver.memory=4G
kylin.query.spark-conf.spark.driver.memoryOverhead=1G
kylin.query.spark-conf.spark.executor.cores=1
kylin.query.spark-conf.spark.executor.instances=1
kylin.query.spark-conf.spark.executor.memory=4G
kylin.query.spark-conf.spark.executor.memoryOverhead=1G
kylin.query.spark-conf.spark.serializer=org.apache.spark.serializer.JavaSerializer
kylin.query.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
kylin.query.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current -Dlog4j.configuration=spark-executor-log4j.properties -Dlog4j.debug -Dkylin.hdfs.working.dir=s3a://dataeng-data-test/qa/kylin/hdfs/ -Dkylin.metadata.identifier=kylin_metadata -Dkylin.spark.category=sparder -Dkylin.spark.identifier={{APP_ID}}
kylin.source.hive.redistribute-flat-table=false
kylin.metadata.jdbc.dialect=mysql
kylin.metadata.jdbc.json-always-small-cell=true
kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperDistributedJobLock
kylin.web.set-config-enable=true
kylin.job.allow-empty-segment=false
kylin.env.hadoop-conf-dir=/etc/hadoop/conf
kylin.query.lazy-query-enabled=true
kylin.query.cache-signature-enabled=true
kylin.query.segment-cache-enabled=false
kylin.engine.spark-fact-distinct=true
kylin.engine.spark-dimension-dictionary=false
kylin.engine.spark-uhc-dictionary=true
kylin.engine.spark.rdd-partition-cut-mb=10
kylin.engine.spark.min-partition=1
kylin.engine.spark.max-partition=5000
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
kylin.engine.spark-conf.spark.driver.memory=2G
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
kylin.engine.spark-conf.spark.executor.cores=1
kylin.engine.spark-conf.spark.network.timeout=600
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
kylin.engine.spark-conf-mergedict.spark.executor.memory=6G
kylin.engine.spark-conf-mergedict.spark.memory.fraction=0.2
kylin.engine.spark-conf.spark.sql.hive.metastore.version=3.1.2
kylin.engine.spark-conf.spark.sql.hive.metastore.jars=/usr/lib/hive/lib/*:/usr/lib/hadoop/lib/*
kylin.query.spark-conf.spark.sql.hive.metastore.version=3.1.2
kylin.query.spark-conf.spark.sql.hive.metastore.jars=/usr/lib/hive/lib/*:/usr/lib/hadoop/lib/*
kylin.server.cluster-name=kylin_metadata
kylin.log.spark-executor-properties-file=/usr/local/kylin/conf/spark-executor-log4j.properties
kylin.metadata.url.identifier=kylin_metadata

Thank you for your assistance,
Gabe

--
Gabe Michael
Principal Data Engineer
Disney Streaming Services
