1. I am using the CDH distribution of Hadoop.
2. I configured the Spark cubing engine following http://kylin.apache.org/docs21/tutorial/cube_spark.html
3. I modified kylin.properties as follows:

    #### SPARK ENGINE CONFIGS ###
    ## Hadoop conf folder, will export this as "HADOOP_CONF_DIR" to run spark-submit
    ## This must contain site xmls of core, yarn, hive, and hbase in one folder
    kylin.env.hadoop-conf-dir=/etc/kylin/conf
    ## Estimate the RDD partition numbers
    kylin.engine.spark.rdd-partition-cut-mb=100
    ## Minimal partition numbers of rdd
    kylin.engine.spark.min-partition=1
    ## Max partition numbers of rdd
    kylin.engine.spark.max-partition=5000
    ## Spark conf (default is in spark/conf/spark-defaults.conf)
    kylin.engine.spark-conf.spark.master=yarn
    kylin.engine.spark-conf.spark.submit.deployMode=cluster
    kylin.engine.spark-conf.spark.yarn.queue=default
    kylin.engine.spark-conf.spark.executor.memory=4G
    kylin.engine.spark-conf.spark.executor.cores=2
    kylin.engine.spark-conf.spark.executor.instances=1
    kylin.engine.spark-conf.spark.eventLog.enabled=true
    kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
    kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
    kylin.engine.spark-conf.spark.hadoop.yarn.timeline-service.enabled=false
    ## manually upload spark-assembly jar to HDFS and then set this property will avoid repeatedly uploading jar at runtime
    kylin.engine.spark-conf.spark.yarn.archive=hdfs://master:8020/kylin/spark/spark-assembly-1.6.0-cdh5.12.1-hadoop2.6.0-cdh5.12.1.jar
    kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec

4. I also copied the Hadoop site xmls into the HADOOP_CONF_DIR folder configured above:

    -rw-r--r-- 1 root root 3849 Jan 19 04:48 core-site.xml
    -rw-r--r-- 1 root root 2824 Jan 19 04:50 hbase-site.xml
    -rw-r--r-- 1 root root 1705 Jan 19 04:48 hdfs-site.xml
    -rw-r--r-- 1 root root 5377 Jan 19 04:50 hive-site.xml
    -rw-r--r-- 1 root root 5028 Jan 19 04:48 mapred-site.xml
    -rw-r--r-- 1 root root 3533 Jan 19 04:49 yarn-site.xml

When I run a cube build, it fails with the error below:

    2018-01-19 05:30:33,692 INFO [main] spark.SparkCubingByLayer (SparkCubingByLayer.java:execute(145)) - RDD Output path: hdfs://master:8020/kylin/kylin_metadata/kylin-3bbc0044-f353-4f7b-9db3-3c888dd73e3b/kylin_sales_cube/cuboid/
    2018-01-19 05:30:33,776 INFO [main] spark.SparkCubingByLayer (SparkCubingByLayer.java:execute(163)) - All measure are normal (agg on all cuboids) ?
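For reference, the manual upload that spark.yarn.archive relies on can be sketched like this. The HDFS destination matches the property above; the local jar path is an assumption based on a typical CDH parcel layout, so locate the jar first if your install differs:

```shell
# Sketch: stage the spark-assembly jar on HDFS so YARN jobs don't re-upload it.
# The local parcel path below is an assumption -- verify it on your cluster.
hadoop fs -mkdir -p hdfs://master:8020/kylin/spark/
hadoop fs -put \
    /opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly-1.6.0-cdh5.12.1-hadoop2.6.0-cdh5.12.1.jar \
    hdfs://master:8020/kylin/spark/
```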
    : true
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveVariableSource
        at org.apache.kylin.engine.spark.SparkCubingByLayer.execute(SparkCubingByLayer.java:166)
        at org.apache.kylin.common.util.AbstractApplication.execute(AbstractApplication.java:37)
        at org.apache.kylin.common.util.SparkEntry.main(SparkEntry.java:44)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveVariableSource
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 12 more
    2018-01-19 05:30:33,792 INFO [Thread-3] spark.SparkContext (Logging.scala:logInfo(58)) - Invoking stop() from shutdown hook
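For what it's worth, org.apache.hadoop.hive.conf.HiveVariableSource is a Hive class, so the ClassNotFoundException suggests the Hive jars visible to the Spark job do not contain it (older CDH Hive versions may not ship this class at all). A hedged way to check whether the staged assembly jar carries the class, using the paths from this post and assuming `unzip` is installed:

```shell
# Sketch: pull down the jar referenced by spark.yarn.archive and search its entries.
hadoop fs -get \
    hdfs://master:8020/kylin/spark/spark-assembly-1.6.0-cdh5.12.1-hadoop2.6.0-cdh5.12.1.jar /tmp/
unzip -l /tmp/spark-assembly-1.6.0-cdh5.12.1-hadoop2.6.0-cdh5.12.1.jar \
    | grep HiveVariableSource \
    || echo "HiveVariableSource is not in the assembly jar"
```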