Hi Lens Developers,

I have been doing a Lens POC for my team for a while and have encountered
several issues that prevent me from querying the example data. A brief
summary of the issues:

1. The Lens query fails to execute.
2. Partition data is not correctly loaded in Hive.

I have read through all the documents on the Lens website and followed the
20-minute demo step by step. Here are the details for the first
issue:

a. The Lens query failed to execute and returned the following error:

lens-shell>query execute cube select product_id, store_sales from sales
where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00')
26 Jul 2016 23:08:15 [Spring Shell] INFO  cliLogger - Query handle:
69f44b77-97e9-4500-b385-a1881e5c365e
26 Jul 2016 23:08:22 [Spring Shell] INFO  cliLogger - User query: 'cube
select product_id, store_sales from sales where time_range_in(order_time,
'2015-04-11-00', '2015-04-13-00')' was submitted to hive/hive1
26 Jul 2016 23:08:22 [Spring Shell] INFO  cliLogger -  Driver query:
'INSERT OVERWRITE DIRECTORY
"file:/tmp/lensreports/hdfsout/69f44b77-97e9-4500-b385-a1881e5c365e"
 SELECT ( sales . product_id ), sum(( sales . store_sales )) FROM
newdb.local_sales_aggr_fact1 sales WHERE ((((( sales . ot ) =  '2015-04-11'
) or (( sales . ot ) =  '2015-04-12' )))) GROUP BY ( sales . product_id ) '
and Driver handle: OperationHandle [opType=EXECUTE_STATEMENT,
getHandleIdentifier()=8d1f3b8a-a660-472f-bfb7-32d6b73d0532]
Command failed java.lang.NullPointerException


b. The server log shows the following error:

26 Jul 2016 23:34:44 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
[pool-19-thread-7] INFO  org.apache.hadoop.hive.ql.exec.mr.ExecDriver -
Executing: /export/apps/hadoop/latest/bin/hadoop jar
/export/home/dev_svc/lens/apache-hive-0.13.1-inm-bin/lib/hive-common-0.13.1-inm.jar
org.apache.hadoop.hive.ql.exec.mr.ExecDriver  -plan
file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10003/plan.xml
  -jobconffile
file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10002/jobconf.xml
26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
[pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Execution
failed with exit status: 1
26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
[pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Obtaining
error information
26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6]
[pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task -
Task failed!
Task ID:
  Stage-1

Logs:


c. The session log shows the following error (*** is used to mask our
internal cluster's HDFS paths):

2016-07-20 17:45:04,103 ERROR [main]: mr.ExecDriver
(SessionState.java:printError(572)) - Job Submission failed with exception
'java.lang.IllegalArgumentException(Wrong FS:
hdfs:/*****/hadoop-2.6.1.*****.tar.gz, expected: file:///)'

java.lang.IllegalArgumentException: Wrong FS:
hdfs:/*****/hadoop-2.6.1.****.tar.gz,
expected: file:///

       at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
       at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
       at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
       at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:455)
       at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
       at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
       at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
       at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
       at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
       at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:422)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
       at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
       at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
       at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
       at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:741)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
       at java.lang.reflect.Method.invoke(Method.java:483)
       at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
       at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


d. Hive CLI shows the following result:

hive (newdb)> describe formatted local_sales_aggr_fact1;
OK
# col_name             data_type           comment

order_time           timestamp
delivery_time       timestamp
customer_id         int
product_id           int
promotion_id         int
customer_city_id     int
production_city_id   int
delivery_city_id     int
unit_sales           bigint
store_sales         double
store_cost           double
max_line_item_price float
max_line_item_discount float

# Partition Information
# col_name             data_type           comment

pt                   string               Process time partition
ot                   string               Order time partition
dt                   string               Delivery time partition

# Detailed Table Information
Database:           newdb
Owner:               null
CreateTime:         Mon Jul 25 16:21:03 PDT 2016
LastAccessTime:     UNKNOWN
Protect Mode:       None
Retention:           0
Location:           file:/tmp/examples/aggrfact1
Table Type:         EXTERNAL_TABLE
Table Parameters:
EXTERNAL             TRUE
cube.storagetable.partition.timeline.cache.DAILY.dt.first

cube.storagetable.partition.timeline.cache.DAILY.dt.holes.size 0

cube.storagetable.partition.timeline.cache.DAILY.dt.latest

cube.storagetable.partition.timeline.cache.DAILY.dt.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.DAILY.ot.first

cube.storagetable.partition.timeline.cache.DAILY.ot.holes.size 0

cube.storagetable.partition.timeline.cache.DAILY.ot.latest

cube.storagetable.partition.timeline.cache.DAILY.ot.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.DAILY.pt.first

cube.storagetable.partition.timeline.cache.DAILY.pt.holes.size 0

cube.storagetable.partition.timeline.cache.DAILY.pt.latest

cube.storagetable.partition.timeline.cache.DAILY.pt.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.HOURLY.dt.first

cube.storagetable.partition.timeline.cache.HOURLY.dt.holes.size 0

cube.storagetable.partition.timeline.cache.HOURLY.dt.latest

cube.storagetable.partition.timeline.cache.HOURLY.dt.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.HOURLY.ot.first

cube.storagetable.partition.timeline.cache.HOURLY.ot.holes.size 0

cube.storagetable.partition.timeline.cache.HOURLY.ot.latest

cube.storagetable.partition.timeline.cache.HOURLY.ot.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.HOURLY.pt.first

cube.storagetable.partition.timeline.cache.HOURLY.pt.holes.size 0

cube.storagetable.partition.timeline.cache.HOURLY.pt.latest

cube.storagetable.partition.timeline.cache.HOURLY.pt.storage.class
org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
cube.storagetable.partition.timeline.cache.present true
cube.storagetable.time.partcols pt,ot,dt
transient_lastDdlTime 1469574936
# Storage Information
SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:         org.apache.hadoop.mapred.TextInputFormat
OutputFormat:
org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:         No
Num Buckets:         0
Bucket Columns:     []
Sort Columns:       []
Storage Desc Params:
field.delim         ,
serialization.format ,
Time taken: 6.435 seconds, Fetched: 73 row(s)


Because the query ultimately runs against the Hive table
local_sales_aggr_fact1, which uses the storage "local":

<x_storage classname="org.apache.lens.cube.metadata.HDFSStorage"
name="local" xmlns="uri:lens:cube:0.1"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd">
  <properties>
    <property name="storage.url" value="file:///"/>
  </properties>
</x_storage>


e. So, I tried to change the value from "file:///" to
"hdfs:/hadoop-host:port/" (similar to
examples/resources/local-cluster-storage.xml), but that did not work.

f. I also tried to change the fact table's location in
"examples/resources/sales-aggr-fact1.xml" from "/tmp/examples/aggrfact1" to
an HDFS path and then add the partitions, which failed with the following
error:
lens-shell>fact add partitions --fact_name sales_aggr_fact1 --storage_name
local --path examples/resources/sales-aggr-fact1-local-parts.xml
Command failed javax.ws.rs.InternalServerErrorException: HTTP 500 Request
failed.


The corresponding server log shows:

Caused by: java.lang.NoClassDefFoundError: org/htrace/Trace
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:214)
~[hadoop-common-2.6.1.52.jar:na]
        at com.sun.proxy.$Proxy89.getFileInfo(Unknown Source) ~[na:na]
        at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
~[hadoop-hdfs-2.6.1.52.jar:na]


g. I also tried to change the Hive table's location from
file:/tmp/examples/aggrfact1 to an HDFS location, and the Lens query
returned the same error.
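
For reference, the change in g. amounted to roughly the following Hive
statement (the HDFS host, port and path below are placeholders, not the
actual values from our cluster):

ALTER TABLE newdb.local_sales_aggr_fact1
  SET LOCATION 'hdfs://namenode-host:8020/tmp/examples/aggrfact1';  -- placeholder host:port and path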


For the second issue: I queried the Hive table directly and found that the
data was not loaded, because in the examples the partitions' location is
examples/resources/..., which is not an absolute path from the root '/'.
After I changed it to an absolute path, the Hive query 'Select * from
newdb.local_sales_aggr_fact1' returned the data, but the rewritten Lens
query still failed in the MapReduce stage because it could not find the
local filesystem path I assigned.
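
For clarity, registering one of the example partitions with an absolute
location amounts to roughly the following Hive statement (the partition
values and the path are placeholders for the ones I actually used, not the
exact values from the example files):

ALTER TABLE newdb.local_sales_aggr_fact1
  ADD IF NOT EXISTS PARTITION (pt='2015-04-13', ot='2015-04-12', dt='2015-04-12')
  LOCATION '/absolute/path/to/examples/resources/data';  -- placeholder for the absolute path I assigned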

BTW, I use our company's Hadoop cluster (Hadoop version 2.6.1). I set up
Hive on top of it using the hive-0.13.1-inm build downloaded from the Lens
website, and I set up a MySQL database as the metastore.

Could you please help me resolve the above two issues? I would really
appreciate it.


Tao Yan
Software Engineer
Data Analytics Infrastructure Tools and Services



206.250.5345
[email protected]
https://www.linkedin.com/in/taousc
