1. I am using the CDH distribution of Hadoop.
2. I configured the Spark cubing engine following http://kylin.apache.org/docs21/tutorial/cube_spark.html
3. I modified kylin.properties as follows:

    #### SPARK ENGINE CONFIGS ###
    ## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
    ## This must contain site xmls of core, yarn, hive, and hbase in one folder
    kylin.env.hadoop-conf-dir=/etc/kylin/conf
    ## Estimate the RDD partition numbers
    kylin.engine.spark.rdd-partition-cut-mb=100
    ## Minimal partition numbers of rdd
    kylin.engine.spark.min-partition=1
    ## Max partition numbers of rdd
    kylin.engine.spark.max-partition=5000
    ## Spark conf (default is in spark/conf/spark-defaults.conf)
    kylin.engine.spark-conf.spark.master=yarn
    kylin.engine.spark-conf.spark.submit.deployMode=cluster
    kylin.engine.spark-conf.spark.yarn.queue=default
    kylin.engine.spark-conf.spark.executor.memory=4G
    kylin.engine.spark-conf.spark.executor.cores=2
    kylin.engine.spark-conf.spark.executor.instances=1
    kylin.engine.spark-conf.spark.eventLog.enabled=true
    kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
    kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
    kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
    ## manually upload spark-assembly jar to HDFS and then set this property will avoid repeatedly uploading jar at runtime
    kylin.engine.spark-conf.spark.yarn.archive=hdfs://master:8020/kylin/spark/spark-assembly-1.6.0-cdh5.12.1-hadoop2.6.0-cdh5.12.1.jar
    kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec

4. I also copied the Hadoop site xmls into the HADOOP_CONF_DIR folder configured above:

    -rw-r--r-- 1 root root 3849 Jan 19 04:48 core-site.xml
    -rw-r--r-- 1 root root 2824 Jan 19 04:50 hbase-site.xml
    -rw-r--r-- 1 root root 1705 Jan 19 04:48 hdfs-site.xml
    -rw-r--r-- 1 root root 5377 Jan 19 04:50 hive-site.xml
    -rw-r--r-- 1 root root 5028 Jan 19 04:48 mapred-site.xml
    -rw-r--r-- 1 root root 3533 Jan 19 04:49 yarn-site.xml

When I run a cube build, it fails with the error below:

    2018-01-19 05:30:33,692 INFO [main] spark.SparkCubingByLayer (SparkCubingByLayer.java:execute(145)) - RDD Output path: hdfs://master:8020/kylin/kylin_metadata/kylin-3bbc0044-f353-4f7b-9db3-3c888dd73e3b/kylin_sales_cube/cuboid/
    2018-01-19 05:30:33,776 INFO [main] spark.SparkCubingByLayer (SparkCubingByLayer.java:execute(163)) - All measure are normal (agg on all cuboids) ?
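For reference, the manual upload that spark.yarn.archive relies on can be sketched like this. The HDFS destination matches the property above; the local jar path is an assumption based on a typical CDH parcel layout, so locate the jar first if your install differs:

```shell
# Sketch: stage the spark-assembly jar on HDFS so YARN jobs don't re-upload it.
# The local parcel path below is an assumption -- verify it on your cluster.
hadoop fs -mkdir -p hdfs://master:8020/kylin/spark/
hadoop fs -put \
    /opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly-1.6.0-cdh5.12.1-hadoop2.6.0-cdh5.12.1.jar \
    hdfs://master:8020/kylin/spark/
```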
    : true
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveVariableSource
        at org.apache.kylin.engine.spark.SparkCubingByLayer.execute(SparkCubingByLayer.java:166)
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveVariableSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 12 more
    2018-01-19 05:30:33,792 INFO [Thread-3] spark.SparkContext (Logging.scala:logInfo(58)) - Invoking stop() from shutdown hook
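For what it's worth, org.apache.hadoop.hive.conf.HiveVariableSource is a Hive class, so the ClassNotFoundException suggests the Hive jars visible to the Spark job do not contain it (older CDH Hive versions may not ship this class at all). A hedged way to check whether the staged assembly jar carries the class, using the paths from this post and assuming `unzip` is installed:

```shell
# Sketch: pull down the jar referenced by spark.yarn.archive and search its entries.
hadoop fs -get \
    hdfs://master:8020/kylin/spark/spark-assembly-1.6.0-cdh5.12.1-hadoop2.6.0-cdh5.12.1.jar /tmp/
unzip -l /tmp/spark-assembly-1.6.0-cdh5.12.1-hadoop2.6.0-cdh5.12.1.jar \
    | grep HiveVariableSource \
    || echo "HiveVariableSource is not in the assembly jar"
```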