Re: HCatalogFormat error
On Wed, Aug 3, 2016 at 10:51 PM, Jason Hale wrote:

> My understanding was that the 'kylin.job.mr.lib.dir' setting would distribute the jars through the hadoop tmpjars property for Kylin to use.

This is correct. However, it is not the ideal path. 'kylin.sh' should detect the hive/hcat installation and distribute the jars automatically. Users should not have to set 'kylin.job.mr.lib.dir' as of v1.5.3.

Thanks for the env info. I verified Kylin on EMR once and it worked at that time. Will double check again.

Yang
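For anyone hitting the same NoClassDefFoundError, one quick check is whether any jar in the directory you point Kylin at actually contains the missing class. The sketch below is not part of Kylin; the function name `find_jars_with_class` and the idea of scanning a directory by hand are assumptions made for illustration. It relies only on the fact that a class `a.b.C` is stored inside a jar as the entry `a/b/C.class`.

```python
import zipfile
from pathlib import Path

# Class reported missing in the stack trace of this thread.
MISSING_CLASS = "org.apache.hive.hcatalog.mapreduce.HCatInputFormat"


def find_jars_with_class(lib_dir, class_name=MISSING_CLASS):
    """Return the sorted paths of jars under lib_dir that contain class_name.

    Hypothetical helper for illustration; it is not a Kylin or Hive API.
    """
    # A class a.b.C lives in a jar as the archive entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar in sorted(Path(lib_dir).rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files named *.jar that are not valid archives.
            continue
    return matches
```

Calling `find_jars_with_class(...)` on the directory used for 'kylin.job.mr.lib.dir', or on the hive/hcatalog path returned by bin/find-hive-dependency.sh, and getting an empty list would suggest that the hcatalog jar is simply not where Kylin is looking.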
Re: HCatalogFormat error
Thanks for the response, Li Yang. This was an EMR cluster which I don't have running now. I switched to setting up an HDP sandbox to get it up and running for testing purposes. If I get a chance to spin up the EMR cluster again, I will look into this further.

To answer your question, though, it was the latest version of Kylin, 1.5.3, and I believe hadoop 2.4 on EMR, so this could very well have been the issue.

My understanding was that the 'kylin.job.mr.lib.dir' setting would distribute the jars through the hadoop tmpjars property for Kylin to use. Is this not correct, or not available on this version?

On Tue, Aug 2, 2016 at 11:52 PM, Li Yang wrote:

> What's your Kylin version?
>
> If it is 1.5.x, your problem is detecting the right hive jar on the Kylin node.
>
> Check out bin/find-hive-dependency.sh. See if it returns the right hive path.
>
> On Thu, Jul 28, 2016 at 6:20 AM, Jason Hale wrote:
>
> > I have set up a Kylin instance on the master node of my Hadoop cluster. I was trying on a separate client node, but had some permission issues, so to simplify the test case, I've just installed it on master. Now I am getting the below error.
> >
> > To correct this, I've tried the solution to distribute the jars in https://issues.apache.org/jira/browse/KYLIN-1082 using 'kylin.job.mr.lib.dir'.
> > I'm not sure how to append to 'kylin.hive.dependency' as I cannot find information on that (perhaps I'm not looking in the right place). But the lib dir setting did not help and it still is unable to find that class.
> >
> > On #2 Step Name: Extract Fact Table Distinct Columns
> >
> > Kylin executes with the following parameters:
> >
> > -conf /opt/kylin/bin/../conf/kylin_job_conf.xml -cubename Testing -output /kylin/kylin_metadata/kylin-40827168-d18f-4b17-a613-3febe773ce2c/Testing/fact_distinct_columns -segmentname 1970010100_2016073100 -statisticsenabled true -statisticsoutput /kylin/kylin_metadata/kylin-40827168-d18f-4b17-a613-3febe773ce2c/Testing/statistics -statisticssamplingpercent 100 -jobname Kylin_Fact_Distinct_Columns_Testing_Step
> >
> > Error Msg:
> >
> > 2016-07-27 21:54:03,387 ERROR [pool-6-thread-2] execution.AbstractExecutable:116 : error running Executable
> > java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/mapreduce/HCatInputFormat
> >     at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:81)
> >     at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:111)
> >     at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:91)
> >     at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:91)
> >     at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:121)
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
> >     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
> >     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.ClassNotFoundException: org.apache.hive.hcatalog.mapreduce.HCatInputFormat
> >     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
> >     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
> >     ... 12 more
> > 2016-07-27 21:54:03,399 INFO [pool-6-thread-2] manager.ExecutableManager:274 : job id:40827168-d18f-4b17-a613-3febe773ce2c-01 from RUNNING to ERROR
> > 2016-07-27 21:54:03,399 ERROR [pool-6-thread-2] execution.AbstractExecutable:116 : error running Executable
> > org.apache.kylin.job.exception.ExecuteException: java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/mapreduce/HCatInputFormat
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
> >     at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> >     at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
> >     at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at