Hi Lens Developers,
I have been doing a Lens POC for my team for a while and have run into
several issues that prevent me from querying the example data. A brief
summary of the issues:
1. The Lens query fails to execute.
2. The partition data is not correctly loaded into Hive.
I have read through all the documents on the Lens website and followed the
20-minute demo step by step. Following are the details for the first
issue:
a. Executing the Lens query fails and returns the following error:
lens-shell>query execute cube select product_id, store_sales from sales
where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00')
26 Jul 2016 23:08:15 [Spring Shell] INFO cliLogger - Query handle:
69f44b77-97e9-4500-b385-a1881e5c365e
26 Jul 2016 23:08:22 [Spring Shell] INFO cliLogger - User query: 'cube
select product_id, store_sales from sales where time_range_in(order_time,
'2015-04-11-00', '2015-04-13-00')' was submitted to hive/hive1
26 Jul 2016 23:08:22 [Spring Shell] INFO cliLogger - Driver query:
'INSERT OVERWRITE DIRECTORY
"file:/tmp/lensreports/hdfsout/69f44b77-97e9-4500-b385-a1881e5c365e"
SELECT ( sales . product_id ), sum(( sales . store_sales )) FROM
newdb.local_sales_aggr_fact1 sales WHERE ((((( sales . ot ) = '2015-04-11'
) or (( sales . ot ) = '2015-04-12' )))) GROUP BY ( sales . product_id ) '
and Driver handle: OperationHandle [opType=EXECUTE_STATEMENT,
getHandleIdentifier()=8d1f3b8a-a660-472f-bfb7-32d6b73d0532]
Command failed java.lang.NullPointerException
b. The server log shows the following error:
26 Jul 2016 23:34:44 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
[pool-19-thread-7] INFO org.apache.hadoop.hive.ql.exec.mr.ExecDriver -
Executing: /export/apps/hadoop/latest/bin/hadoop jar
/export/home/dev_svc/lens/apache-hive-0.13.1-inm-bin/lib/hive-common-0.13.1-inm.jar
org.apache.hadoop.hive.ql.exec.mr.ExecDriver -plan
file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10003/plan.xml
-jobconffile
file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10002/jobconf.xml
26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
[pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Execution
failed with exit status: 1
26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
[pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Obtaining
error information
26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
[pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task -
Task failed!
Task ID:
Stage-1
Logs:
c. The session log shows the following error (*** replaces our internal
cluster's HDFS paths):
2016-07-20 17:45:04,103 ERROR [main]: mr.ExecDriver
(SessionState.java:printError(572)) - Job Submission failed with exception
'java.lang.IllegalArgumentException(Wrong FS:
hdfs:/*****/hadoop-2.6.1.*****.tar.gz, expected: file:///)'
java.lang.IllegalArgumentException: Wrong FS:
hdfs:/*****/hadoop-2.6.1.****.tar.gz,
expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
at
org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
at
org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:455)
at
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
at
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:741)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
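Reading the stack trace, JobSubmitter.addMRFrameworkToDistributedCache
qualifies mapreduce.application.framework.path against the default
filesystem, so my (unverified) interpretation is that the job configuration
the Hive driver ends up with has fs.defaultFS pointing at the local
filesystem while the MR framework tarball lives on HDFS. These are the two
properties I am now comparing between the cluster config and the config the
Lens server picks up; the namenode host and port below are placeholders:

<!-- core-site.xml as seen by the Lens server / Hive driver (placeholder value) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:8020</value>
</property>

<!-- mapred-site.xml: the tarball the "Wrong FS" message complains about -->
<property>
  <name>mapreduce.application.framework.path</name>
  <value>hdfs:/*****/hadoop-2.6.1.*****.tar.gz</value>
</property>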
d. The Hive CLI shows the following result:
hive (newdb)> describe formatted local_sales_aggr_fact1;
OK
# col_name data_type comment
order_time timestamp
delivery_time timestamp
customer_id int
product_id int
promotion_id int
customer_city_id int
production_city_id int
delivery_city_id int
unit_sales bigint
store_sales double
store_cost double
max_line_item_price float
max_line_item_discount float
# Partition Information
# col_name data_type comment
pt string Process time partition
ot string Order time partition
dt string Delivery time partition
# Detailed Table Information
Database: newdb
Owner: null
CreateTime: Mon Jul 25 16:21:03 PDT 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: file:/tmp/examples/aggrfact1
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
cube.storagetable.partition.timeline.cache.DAILY.dt.first
cube.storagetable.partition.timeline.cache.DAILY.dt.holes.size 0
cube.storagetable.partition.timeline.cache.DAILY.dt.latest
cube.storagetable.partition.timeline.cache.DAILY.dt.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.DAILY.ot.first
cube.storagetable.partition.timeline.cache.DAILY.ot.holes.size 0
cube.storagetable.partition.timeline.cache.DAILY.ot.latest
cube.storagetable.partition.timeline.cache.DAILY.ot.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.DAILY.pt.first
cube.storagetable.partition.timeline.cache.DAILY.pt.holes.size 0
cube.storagetable.partition.timeline.cache.DAILY.pt.latest
cube.storagetable.partition.timeline.cache.DAILY.pt.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.HOURLY.dt.first
cube.storagetable.partition.timeline.cache.HOURLY.dt.holes.size 0
cube.storagetable.partition.timeline.cache.HOURLY.dt.latest
cube.storagetable.partition.timeline.cache.HOURLY.dt.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.HOURLY.ot.first
cube.storagetable.partition.timeline.cache.HOURLY.ot.holes.size 0
cube.storagetable.partition.timeline.cache.HOURLY.ot.latest
cube.storagetable.partition.timeline.cache.HOURLY.ot.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.HOURLY.pt.first
cube.storagetable.partition.timeline.cache.HOURLY.pt.holes.size 0
cube.storagetable.partition.timeline.cache.HOURLY.pt.latest
cube.storagetable.partition.timeline.cache.HOURLY.pt.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.present true
cube.storagetable.time.partcols pt,ot,dt
transient_lastDdlTime 1469574936
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: 0
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 6.435 seconds, Fetched: 73 row(s)
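Since the rewritten driver query above filters on the ot partition column,
the registered partitions are probably also relevant; I have not pasted
that output here, but the check I can run in the Hive CLI and share is
simply:

hive (newdb)> show partitions local_sales_aggr_fact1;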
The rewritten query ultimately runs against the Hive table
local_sales_aggr_fact1, which uses the storage "local":
<x_storage classname="org.apache.lens.cube.metadata.HDFSStorage"
  name="local" xmlns="uri:lens:cube:0.1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd">
  <properties>
    <property name="storage.url" value="file:///"/>
  </properties>
</x_storage>
e. So, I tried to change the value from "file:///" to
"hdfs:/hadoop-host:port/" (similar to
examples/resources/local-cluster-storage.xml), which didn't work.
f. I also tried to change the location in the fact table definition
"examples/resources/sales-aggr-fact1.xml" from "/tmp/examples/aggrfact1" to
an HDFS path and then add the partitions, which failed with the error:
lens-shell>fact add partitions --fact_name sales_aggr_fact1 --storage_name
local --path examples/resources/sales-aggr-fact1-local-parts.xml
Command failed javax.ws.rs.InternalServerErrorException: HTTP 500 Request
failed.
The corresponding server log shows:
Caused by: java.lang.NoClassDefFoundError: org/htrace/Trace
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:214)
~[hadoop-common-2.6.1.52.jar:na]
at com.sun.proxy.$Proxy89.getFileInfo(Unknown Source) ~[na:na]
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
~[hadoop-hdfs-2.6.1.52.jar:na]
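The NoClassDefFoundError above looks like a classpath issue rather than a
Lens problem: org.htrace.Trace comes from the htrace-core jar that the
Hadoop 2.6 RPC client depends on. What I plan to try (untested, and the
exact locations are guesses for our install) is to find the jar bundled
with our Hadoop and make sure the Lens server's classpath includes it,
roughly:

# locate the htrace jar under the Hadoop install referenced in the server log
find /export/apps/hadoop/latest -name 'htrace-core*.jar'

# then put it on the classpath the Lens server uses before restarting; whether
# HADOOP_CLASSPATH is the right hook for the Lens server is an assumption
export HADOOP_CLASSPATH=/path/to/htrace-core-x.y.z.jar:$HADOOP_CLASSPATH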
g. I also tried to change the Hive table's location from
file:/tmp/examples/aggrfact1 to an HDFS location, and the Lens query
returned the same error.
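The change in (g) was made directly from the Hive CLI; I do not have the
exact statement saved, but it was essentially the following (the HDFS path
is a placeholder for the location I used):

hive (newdb)> ALTER TABLE local_sales_aggr_fact1
            > SET LOCATION 'hdfs://namenode-host:8020/tmp/examples/aggrfact1';

(I realize that existing partitions keep their own locations, so this alone
may not have been sufficient.)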
For the second issue: I queried the Hive table directly and found that the
data is not loaded. In the examples, the partitions' location is given as
examples/resources/..., which is a relative path rather than an absolute
path from '/'. After I changed it to an absolute path, the Hive query
'select * from newdb.local_sales_aggr_fact1' returns the data, but the
rewritten Lens query still fails in the MapReduce stage because it cannot
find the local filesystem path I assigned.
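To make the second issue concrete: after fixing the partition location, a
plain select like the first statement below returns rows (as far as I
understand it runs as a local fetch task, not a MapReduce job), while an
aggregation of the same shape as the rewritten driver query, which is what
Lens submits, is where the MapReduce stage fails for me. The second
statement is only meant to illustrate the shape, not an exact reproduction:

hive (newdb)> select * from local_sales_aggr_fact1 limit 10;

hive (newdb)> select product_id, sum(store_sales)
            > from local_sales_aggr_fact1
            > where ot = '2015-04-11' or ot = '2015-04-12'
            > group by product_id;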
BTW, I am using the company's Hadoop cluster, which runs Hadoop 2.6.1. I
set up Hive on top of it using the hive-0.13.1-inm build downloaded from
the Lens website, and set up a MySQL database as the metastore.
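In case the environment matters, the metastore connection in hive-site.xml
is configured roughly as follows (host, database name, and credentials are
placeholders):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://mysql-host:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive_user</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>********</value>
</property>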
Could you please help me resolve the above two issues? I would really
appreciate it.
Tao Yan
Software Engineer
Data Analytics Infrastructure Tools and Services
206.250.5345
[email protected]
https://www.linkedin.com/in/taousc