Hi Rajat & Amareshwari,

Thanks very much for your suggestions. I will follow the steps.

BTW, could you please send me your lens-site.xml and hivedriver-site.xml
files?

It seems that once I add the MySQL database configuration to lens-site.xml,
Lens can talk to MySQL directly. Is that right?
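
For reference, this is roughly what I mean (a sketch only; I am assuming
the lens.server.db.* properties from the server defaults are the right ones
for pointing Lens at MySQL, and the host, database name, and credentials
below are placeholders):

<!-- lens-site.xml: point the Lens server's database at MySQL -->
<!-- (property names assumed from the server defaults; values are placeholders) -->
<property>
  <name>lens.server.db.driver.class</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>lens.server.db.jdbc.url</name>
  <value>jdbc:mysql://mysql-host:3306/lensserver</value>
</property>
<property>
  <name>lens.server.db.jdbc.user</name>
  <value>lens</value>
</property>
<property>
  <name>lens.server.db.jdbc.pass</name>
  <value>secret</value>
</property>

(I assume the MySQL JDBC driver jar also needs to be on the server classpath.)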

Thanks.

On Wed, Jul 27, 2016 at 5:13 AM, amareshwarisr . <[email protected]>
wrote:

> Tao,
>
> here are some points which can help you.
> - The default examples setup works only with a single-node setup. I am not
> sure whether you are running in a multinode cluster.
>
> - For the missing htrace class, can you try adding the htrace jar from the
> Hadoop libraries to LENSCPPATH? (See the sketch below.)
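>
> For example, something along these lines before starting the Lens server (a
> sketch; the jar path and version here depend on your Hadoop distribution):
>
> # add Hadoop's htrace jar to the Lens classpath (path/version are examples)
> export LENSCPPATH=$LENSCPPATH:$HADOOP_HOME/share/hadoop/common/lib/htrace-core-3.0.4.jar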
>
> Thanks
>
> On Wed, Jul 27, 2016 at 9:49 AM, Tao Yan <[email protected]> wrote:
>
>> Hi Lens Developers,
>>
>> I have been doing a Lens POC for my team for a while and have encountered
>> several issues that prevent me from querying the example data. A brief
>> summary of the issues:
>>
>> 1. The Lens query fails to execute.
>> 2. Partition data is not correctly loaded in Hive.
>>
>> I have read through all the documents on the Lens website and followed
>> the 20-minute demo step by step. Following are the details for the first
>> issue:
>>
>> a. The Lens query failed to execute and returned the following error:
>>
>> lens-shell>query execute cube select product_id, store_sales from sales
>> where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00')
>> 26 Jul 2016 23:08:15 [Spring Shell] INFO  cliLogger - Query handle:
>> 69f44b77-97e9-4500-b385-a1881e5c365e
>> 26 Jul 2016 23:08:22 [Spring Shell] INFO  cliLogger - User query: 'cube
>> select product_id, store_sales from sales where time_range_in(order_time,
>> '2015-04-11-00', '2015-04-13-00')' was submitted to hive/hive1
>> 26 Jul 2016 23:08:22 [Spring Shell] INFO  cliLogger -  Driver query:
>> 'INSERT OVERWRITE DIRECTORY
>> "file:/tmp/lensreports/hdfsout/69f44b77-97e9-4500-b385-a1881e5c365e"
>>  SELECT ( sales . product_id ), sum(( sales . store_sales )) FROM
>> newdb.local_sales_aggr_fact1 sales WHERE ((((( sales . ot ) =  '2015-04-11'
>> ) or (( sales . ot ) =  '2015-04-12' )))) GROUP BY ( sales . product_id ) '
>> and Driver handle: OperationHandle [opType=EXECUTE_STATEMENT,
>> getHandleIdentifier()=8d1f3b8a-a660-472f-bfb7-32d6b73d0532]
>> Command failed java.lang.NullPointerException
>>
>>
>> b. The server log shows the following error:
>>
>> 26 Jul 2016 23:34:44 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
>> [pool-19-thread-7] INFO  org.apache.hadoop.hive.ql.exec.mr.ExecDriver -
>> Executing: /export/apps/hadoop/latest/bin/hadoop jar
>> /export/home/dev_svc/lens/apache-hive-0.13.1-inm-bin/lib/hive-common-0.13.1-inm.jar
>> org.apache.hadoop.hive.ql.exec.mr.ExecDriver  -plan
>> file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10003/plan.xml
>>   -jobconffile
>> file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10002/jobconf.xml
>> 26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
>> [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Execution
>> failed with exit status: 1
>> 26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
>> [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Obtaining
>> error information
>> 26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
>> [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task -
>> Task failed!
>> Task ID:
>>   Stage-1
>>
>> Logs:
>>
>>
>> c. The session log shows the following error (*** replaces the internal
>> cluster's HDFS path):
>>
>> 2016-07-20 17:45:04,103 ERROR [main]: mr.ExecDriver
>> (SessionState.java:printError(572)) - Job Submission failed with exception
>> 'java.lang.IllegalArgumentException(Wrong FS:
>> hdfs:/*****/hadoop-2.6.1.*****.tar.gz, expected: file:///)'
>>
>> java.lang.IllegalArgumentException: Wrong FS: 
>> hdfs:/*****/hadoop-2.6.1.****.tar.gz,
>> expected: file:///
>>
>>        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>>
>>        at
>> org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
>>
>>        at
>> org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
>>
>>        at
>> org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:455)
>>
>>        at
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
>>
>>        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>>
>>        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>>
>>        at java.security.AccessController.doPrivileged(Native Method)
>>
>>        at javax.security.auth.Subject.doAs(Subject.java:422)
>>
>>        at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>
>>        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>>
>>        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>>
>>        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>>
>>        at java.security.AccessController.doPrivileged(Native Method)
>>
>>        at javax.security.auth.Subject.doAs(Subject.java:422)
>>
>>        at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>
>>        at
>> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>>
>>        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>>
>>        at
>> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
>>
>>        at
>> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:741)
>>
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>
>>        at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>
>>        at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>
>>        at java.lang.reflect.Method.invoke(Method.java:483)
>>
>>        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
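>>
>> If I read the stack trace correctly, addMRFrameworkToDistributedCache is
>> failing because the MapReduce framework tarball lives on HDFS while the
>> job's default filesystem resolves to file:///. In other words, the
>> effective configuration looks something like this mismatch (a sketch;
>> paths are placeholders, ***** as above):
>>
>> <!-- mapred-site.xml on the cluster: the MR framework tarball is on HDFS -->
>> <property>
>>   <name>mapreduce.application.framework.path</name>
>>   <value>hdfs://*****/hadoop-2.6.1.*****.tar.gz</value>
>> </property>
>>
>> <!-- ...but the Hive session's default filesystem is local -->
>> <property>
>>   <name>fs.defaultFS</name>
>>   <value>file:///</value>
>> </property>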
>>
>>
>> d. The Hive CLI shows the following result:
>>
>> hive (newdb)> describe formatted local_sales_aggr_fact1;
>> OK
>> # col_name             data_type           comment
>>
>> order_time           timestamp
>> delivery_time       timestamp
>> customer_id         int
>> product_id           int
>> promotion_id         int
>> customer_city_id     int
>> production_city_id   int
>> delivery_city_id     int
>> unit_sales           bigint
>> store_sales         double
>> store_cost           double
>> max_line_item_price float
>> max_line_item_discount float
>>
>> # Partition Information
>> # col_name             data_type           comment
>>
>> pt                   string               Process time partition
>> ot                   string               Order time partition
>> dt                   string               Delivery time partition
>>
>> # Detailed Table Information
>> Database:           newdb
>> Owner:               null
>> CreateTime:         Mon Jul 25 16:21:03 PDT 2016
>> LastAccessTime:     UNKNOWN
>> Protect Mode:       None
>> Retention:           0
>> Location:           file:/tmp/examples/aggrfact1
>> Table Type:         EXTERNAL_TABLE
>> Table Parameters:
>> EXTERNAL             TRUE
>> cube.storagetable.partition.timeline.cache.DAILY.dt.first
>>
>> cube.storagetable.partition.timeline.cache.DAILY.dt.holes.size 0
>>
>> cube.storagetable.partition.timeline.cache.DAILY.dt.latest
>>
>> cube.storagetable.partition.timeline.cache.DAILY.dt.storage.class
>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>> cube.storagetable.partition.timeline.cache.DAILY.ot.first
>>
>> cube.storagetable.partition.timeline.cache.DAILY.ot.holes.size 0
>>
>> cube.storagetable.partition.timeline.cache.DAILY.ot.latest
>>
>> cube.storagetable.partition.timeline.cache.DAILY.ot.storage.class
>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>> cube.storagetable.partition.timeline.cache.DAILY.pt.first
>>
>> cube.storagetable.partition.timeline.cache.DAILY.pt.holes.size 0
>>
>> cube.storagetable.partition.timeline.cache.DAILY.pt.latest
>>
>> cube.storagetable.partition.timeline.cache.DAILY.pt.storage.class
>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>> cube.storagetable.partition.timeline.cache.HOURLY.dt.first
>>
>> cube.storagetable.partition.timeline.cache.HOURLY.dt.holes.size 0
>>
>> cube.storagetable.partition.timeline.cache.HOURLY.dt.latest
>>
>> cube.storagetable.partition.timeline.cache.HOURLY.dt.storage.class
>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>> cube.storagetable.partition.timeline.cache.HOURLY.ot.first
>>
>> cube.storagetable.partition.timeline.cache.HOURLY.ot.holes.size 0
>>
>> cube.storagetable.partition.timeline.cache.HOURLY.ot.latest
>>
>> cube.storagetable.partition.timeline.cache.HOURLY.ot.storage.class
>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>> cube.storagetable.partition.timeline.cache.HOURLY.pt.first
>>
>> cube.storagetable.partition.timeline.cache.HOURLY.pt.holes.size 0
>>
>> cube.storagetable.partition.timeline.cache.HOURLY.pt.latest
>>
>> cube.storagetable.partition.timeline.cache.HOURLY.pt.storage.class
>> org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>> cube.storagetable.partition.timeline.cache.present true
>> cube.storagetable.time.partcols pt,ot,dt
>> transient_lastDdlTime 1469574936
>> # Storage Information
>> SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>> InputFormat:         org.apache.hadoop.mapred.TextInputFormat
>> OutputFormat:
>> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>> Compressed:         No
>> Num Buckets:         0
>> Bucket Columns:     []
>> Sort Columns:       []
>> Storage Desc Params:
>> field.delim         ,
>> serialization.format ,
>> Time taken: 6.435 seconds, Fetched: 73 row(s)
>>
>>
>> The query is ultimately run against the Hive table local_sales_aggr_fact1,
>> which uses the storage "local":
>>
>> <x_storage classname="org.apache.lens.cube.metadata.HDFSStorage"
>> name="local" xmlns="uri:lens:cube:0.1"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>   <properties>
>>     <property name="storage.url" value="file:///"/>
>>   </properties>
>> </x_storage>
>>
>>
>> e. So I tried to change the value from "file:///" to
>> "hdfs://hadoop-host:port/" (similar to
>> examples/resources/local-cluster-storage.xml), which didn't work.
>>
>> f. I also tried to change the location in the fact table definition
>> "examples/resources/sales-aggr-fact1.xml" from "/tmp/examples/aggrfact1"
>> to an HDFS path and then add partitions; that failed with the error:
>> lens-shell>fact add partitions --fact_name sales_aggr_fact1
>> --storage_name local --path
>> examples/resources/sales-aggr-fact1-local-parts.xml
>> Command failed javax.ws.rs.InternalServerErrorException: HTTP 500
>> Request failed.
>>
>>
>> Corresponding server logs:
>>
>> *Caused by: java.lang.NoClassDefFoundError: org/htrace/Trace*
>>         at
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:214)
>> ~[hadoop-common-2.6.1.52.jar:na]
>>         at com.sun.proxy.$Proxy89.getFileInfo(Unknown Source) ~[na:na]
>>         at
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
>> ~[hadoop-hdfs-2.6.1.52.jar:na]
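>>
>> For reference, the HDFS storage definition I tried in step e looked
>> roughly like this (host and port are placeholders):
>>
>> <x_storage classname="org.apache.lens.cube.metadata.HDFSStorage"
>> name="local" xmlns="uri:lens:cube:0.1"
>>   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
>> xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>   <properties>
>>     <!-- the cluster's NameNode instead of the local filesystem -->
>>     <property name="storage.url" value="hdfs://hadoop-host:port"/>
>>   </properties>
>> </x_storage>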
>>
>>
>> g. I also tried to change the Hive table's location from
>> file:/tmp/examples/aggrfact1 to an HDFS location, and the Lens query
>> returned the same error.
>>
>>
>> For the second issue: I queried the Hive table directly and found that
>> the data is not loaded, because in the examples the partitions' location
>> is examples/resources/..., which is not an absolute path from root '/'.
>> After I changed it to an absolute path, the Hive query 'select * from
>> newdb.local_sales_aggr_fact1' returned the data, but the converted Lens
>> query still failed in the MapReduce stage because it could not find the
>> local filesystem path I assigned.
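>>
>> For concreteness, the equivalent fix from the Hive side is along these
>> lines (a sketch; the partition values and path are examples only):
>>
>> use newdb;
>> -- repoint an example partition at an absolute location
>> ALTER TABLE local_sales_aggr_fact1
>>   PARTITION (pt='2015-04-13-00', ot='2015-04-12-00', dt='2015-04-13-00')
>>   SET LOCATION 'file:///absolute/path/to/partition-data';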
>>
>> BTW, I use the company's Hadoop cluster (Hadoop version 2.6.1). I set up
>> Hive on top of it using the hive-0.13.1-inm build downloaded from the
>> Lens website, and set up a MySQL database as the metastore.
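>>
>> The metastore part of hive-site.xml is the standard MySQL setup, roughly
>> as follows (host, database name, and credentials are placeholders):
>>
>> <!-- hive-site.xml: MySQL-backed metastore -->
>> <property>
>>   <name>javax.jdo.option.ConnectionURL</name>
>>   <value>jdbc:mysql://mysql-host:3306/hive_metastore</value>
>> </property>
>> <property>
>>   <name>javax.jdo.option.ConnectionDriverName</name>
>>   <value>com.mysql.jdbc.Driver</value>
>> </property>
>> <property>
>>   <name>javax.jdo.option.ConnectionUserName</name>
>>   <value>hive</value>
>> </property>
>> <property>
>>   <name>javax.jdo.option.ConnectionPassword</name>
>>   <value>secret</value>
>> </property>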
>>
>> Could you please help me resolve the above two issues? I would really
>> appreciate it.
>>
>>
>>
>
>


-- 

*Tao Yan*
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
[email protected]
https://www.linkedin.com/in/taousc
