Re: HCatalogFormat error

2016-08-10 Thread Li Yang
> My understanding was that the 'kylin.job.mr.lib.dir' setting would
> distribute the jars through the hadoop tmpjars property for Kylin to use.

This is correct. However, it is not the ideal path.

'kylin.sh' should detect the hive/hcat installation and distribute the jars
automatically. Users should not have to set 'kylin.job.mr.lib.dir' as of
v1.5.3.
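
To check the detection by hand, here is a minimal Python sketch (not part of
Kylin) that scans candidate directories for a jar containing the class named
in the stack trace. The function name `jars_providing` is hypothetical, and
the directories to search are whatever hive/hcat paths exist on the Kylin
node; none are given in this thread.

```python
import sys
import zipfile
from pathlib import Path

# Class from the stack trace, written as an entry path inside a jar.
TARGET = "org/apache/hive/hcatalog/mapreduce/HCatInputFormat.class"


def jars_providing(class_entry, search_dirs):
    """Return paths of jars under search_dirs that contain class_entry."""
    hits = []
    for d in search_dirs:
        root = Path(d)
        if not root.is_dir():
            continue  # ignore directories that do not exist on this node
        for jar in sorted(root.rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if class_entry in zf.namelist():
                        hits.append(str(jar))
            except (zipfile.BadZipFile, OSError):
                continue  # skip corrupt jars and broken symlinks
    return hits


if __name__ == "__main__":
    # Usage: python find_hcat_jar.py <dir> [<dir> ...]
    for hit in jars_providing(TARGET, sys.argv[1:]):
        print(hit)
```

If this prints nothing for the hive/hcat directories on the Kylin node, the
jar is missing or installed somewhere else.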

Thanks for the env info. I verified Kylin on EMR once and it worked at
that time. I will double-check.

Yang


Re: HCatalogFormat error

2016-08-03 Thread Jason Hale
Thanks for the response, Li Yang. This was an EMR cluster, which I don't have
running now. I switched to setting up an HDP sandbox to get it up and
running for testing purposes. If I get a chance to spin up the EMR cluster
again, I will look into this further.

To answer your question, though, it was the latest version of Kylin, 1.5.3,
and I believe hadoop 2.4 on EMR, so this could very well have been the
issue.
My understanding was that the 'kylin.job.mr.lib.dir' setting would
distribute the jars through the hadoop tmpjars property for Kylin to use.
Is this not correct, or not available on this version?
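
For illustration only, a minimal Python sketch of the kind of value a
tmpjars-style property holds: a comma-separated list of jar locations built
from a lib directory. Hadoop reads `tmpjars` as a comma-separated list; the
helper name `build_tmpjars` and the use of file URIs are assumptions, and
this is not Kylin's actual code.

```python
from pathlib import Path


def build_tmpjars(lib_dir):
    """Join every *.jar directly under lib_dir into a comma-separated URI list."""
    # resolve() makes the path absolute, which as_uri() requires.
    jars = sorted(Path(lib_dir).resolve().glob("*.jar"))
    return ",".join(jar.as_uri() for jar in jars)
```

Jars listed this way are shipped to the MapReduce tasks. The trace in the
original post fails in WebappClassLoader during job setup, which suggests the
class also has to be visible to the Kylin server process itself.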

On Tue, Aug 2, 2016 at 11:52 PM, Li Yang wrote:

> What's your Kylin version?
>
> If it is 1.5.x, your problem is detecting the right hive jar on the Kylin
> node.
>
> Check out bin/find-hive-dependency.sh and see if it returns the right hive
> path.
>
> On Thu, Jul 28, 2016 at 6:20 AM, Jason Hale wrote:
>
> > I have set up a Kylin instance on the master node of my Hadoop cluster. I
> > was trying on a separate client node, but had some permission issues, so to
> > simplify the test case, I've just installed it on master. Now I am getting
> > the below error.
> >
> > To correct this, I've tried the solution to distribute the jars in
> > https://issues.apache.org/jira/browse/KYLIN-1082 using
> > 'kylin.job.mr.lib.dir'.
> > I'm not sure how to append to 'kylin.hive.dependency' as I cannot find
> > information on that (perhaps I'm not looking in the right place). But the
> > lib dir setting did not help and it still is unable to find that class.
> >
> >
> > On #2 Step Name: Extract Fact Table Distinct Columns
> >
> > Kylin executes with the following parameters:
> >
> > -conf /opt/kylin/bin/../conf/kylin_job_conf.xml -cubename Testing -output
> > /kylin/kylin_metadata/kylin-40827168-d18f-4b17-a613-3febe773ce2c/Testing/fact_distinct_columns
> > -segmentname 1970010100_2016073100 -statisticsenabled true
> > -statisticsoutput
> > /kylin/kylin_metadata/kylin-40827168-d18f-4b17-a613-3febe773ce2c/Testing/statistics
> > -statisticssamplingpercent 100 -jobname
> > Kylin_Fact_Distinct_Columns_Testing_Step
> >
> > Error Msg:
> >
> > 2016-07-27 21:54:03,387 ERROR [pool-6-thread-2] execution.AbstractExecutable:116 : error running Executable
> > java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/mapreduce/HCatInputFormat
> > at org.apache.kylin.source.hive.HiveMRInput$HiveTableInputFormat.configureJob(HiveMRInput.java:81)
> > at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.setupMapper(FactDistinctColumnsJob.java:111)
> > at org.apache.kylin.engine.mr.steps.FactDistinctColumnsJob.run(FactDistinctColumnsJob.java:91)
> > at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:91)
> > at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:121)
> > at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
> > at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
> > at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.ClassNotFoundException: org.apache.hive.hcatalog.mapreduce.HCatInputFormat
> > at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1720)
> > at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1571)
> > ... 12 more
> > 2016-07-27 21:54:03,399 INFO  [pool-6-thread-2] manager.ExecutableManager:274 : job id:40827168-d18f-4b17-a613-3febe773ce2c-01 from RUNNING to ERROR
> > 2016-07-27 21:54:03,399 ERROR [pool-6-thread-2] execution.AbstractExecutable:116 : error running Executable
> > org.apache.kylin.job.exception.ExecuteException: java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/mapreduce/HCatInputFormat
> > at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:124)
> > at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
> > at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
> > at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at