Hi Rajat & Amareshwari,

Thanks very much for your suggestions. I will follow the steps.
BTW, could you please send me your lens-site.xml and hivedriver-site.xml files? It seems that once I added the MySQL database configuration to lens-site.xml, Lens could talk to MySQL directly. Is that right? Thanks.
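To be specific, the MySQL-related properties I added are roughly the following (these are the standard Hive metastore JDO settings; the host, database name, user, and password below are placeholders, and whether these belong in lens-site.xml or hivedriver-site.xml is exactly what I would like to confirm):

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://mysql-host:3306/lens_metastore</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>metastore_user</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>metastore_password</value>
    </property>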
On Wed, Jul 27, 2016 at 5:13 AM, amareshwarisr . <[email protected]> wrote:

> Tao,
>
> Here are some points which may help you:
> - The default examples setup works only with a single-node setup. Not sure if you are running in a multinode cluster.
> - For the missing htrace class, can you try adding the htrace jar from the Hadoop libraries to LENSCPPATH?
>
> Thanks
>
> On Wed, Jul 27, 2016 at 9:49 AM, Tao Yan <[email protected]> wrote:
>
>> Hi Lens Developers,
>>
>> I have been doing a Lens POC for my team for a while and have run into several issues that prevent me from querying the example data. A brief summary of the issues:
>>
>> 1. The Lens query fails to execute.
>> 2. Partition data is not correctly loaded in Hive.
>>
>> I have read through all the documents on the Lens website and followed the 20-minute demo step by step. Following are the details for the first issue:
>>
>> a. The Lens query fails to execute and returns the following error:
>>
>> lens-shell>query execute cube select product_id, store_sales from sales where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00')
>> 26 Jul 2016 23:08:15 [Spring Shell] INFO cliLogger - Query handle: 69f44b77-97e9-4500-b385-a1881e5c365e
>> 26 Jul 2016 23:08:22 [Spring Shell] INFO cliLogger - User query: 'cube select product_id, store_sales from sales where time_range_in(order_time, '2015-04-11-00', '2015-04-13-00')' was submitted to hive/hive1
>> 26 Jul 2016 23:08:22 [Spring Shell] INFO cliLogger - Driver query: 'INSERT OVERWRITE DIRECTORY "file:/tmp/lensreports/hdfsout/69f44b77-97e9-4500-b385-a1881e5c365e" SELECT ( sales . product_id ), sum(( sales . store_sales )) FROM newdb.local_sales_aggr_fact1 sales WHERE ((((( sales . ot ) = '2015-04-11' ) or (( sales . ot ) = '2015-04-12' )))) GROUP BY ( sales . product_id ) ' and Driver handle: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=8d1f3b8a-a660-472f-bfb7-32d6b73d0532]
>> Command failed java.lang.NullPointerException
>>
>> b. The server logs show the following error:
>>
>> 26 Jul 2016 23:34:44 [f3f5c221-e099-4f8e-9c32-42733365d3b6] [pool-19-thread-7] INFO org.apache.hadoop.hive.ql.exec.mr.ExecDriver - Executing: /export/apps/hadoop/latest/bin/hadoop jar /export/home/dev_svc/lens/apache-hive-0.13.1-inm-bin/lib/hive-common-0.13.1-inm.jar org.apache.hadoop.hive.ql.exec.mr.ExecDriver -plan file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10003/plan.xml -jobconffile file:/tmp/dev_svc/hive_2016-07-26_16-34-41_205_3382750706353330702-8/-local-10002/jobconf.xml
>> 26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6] [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Execution failed with exit status: 1
>> 26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6] [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task - Obtaining error information
>> 26 Jul 2016 23:34:47 [f3f5c221-e099-4f8e-9c32-42733365d3b6] [pool-19-thread-7] ERROR org.apache.hadoop.hive.ql.exec.Task -
>> Task failed!
>> Task ID:
>>   Stage-1
>>
>> Logs:
>>
>> c. The session logs show the following error (*** is used to mask our internal cluster's HDFS path):
>>
>> 2016-07-20 17:45:04,103 ERROR [main]: mr.ExecDriver (SessionState.java:printError(572)) - Job Submission failed with exception 'java.lang.IllegalArgumentException(Wrong FS: hdfs:/*****/hadoop-2.6.1.*****.tar.gz, expected: file:///)'
>> java.lang.IllegalArgumentException: Wrong FS: hdfs:/*****/hadoop-2.6.1.****.tar.gz, expected: file:///
>>         at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:645)
>>         at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:465)
>>         at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:119)
>>         at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:455)
>>         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
>>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>>         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>>         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at javax.security.auth.Subject.doAs(Subject.java:422)
>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>>         at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
>>         at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:741)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:483)
>>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
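A note on the trace above: if I read the JobSubmitter.addMRFrameworkToDistributedCache frame correctly, my guess is that the job configuration carries the MR framework tarball as an hdfs: URI while the default filesystem of the locally submitted job is file:///, i.e. roughly these two settings (the internal path is still masked; this is only my reading of the stack trace, not something I have confirmed):

    <property>
      <name>mapreduce.application.framework.path</name>
      <value>hdfs:/*****/hadoop-2.6.1.*****.tar.gz</value>
    </property>
    <property>
      <name>fs.defaultFS</name>
      <value>file:///</value>
    </property>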
>> d. The Hive CLI shows the following result:
>>
>> hive (newdb)> describe formatted local_sales_aggr_fact1;
>> OK
>> # col_name                data_type    comment
>>
>> order_time                timestamp
>> delivery_time             timestamp
>> customer_id               int
>> product_id                int
>> promotion_id              int
>> customer_city_id          int
>> production_city_id        int
>> delivery_city_id          int
>> unit_sales                bigint
>> store_sales               double
>> store_cost                double
>> max_line_item_price       float
>> max_line_item_discount    float
>>
>> # Partition Information
>> # col_name                data_type    comment
>>
>> pt                        string       Process time partition
>> ot                        string       Order time partition
>> dt                        string       Delivery time partition
>>
>> # Detailed Table Information
>> Database:            newdb
>> Owner:               null
>> CreateTime:          Mon Jul 25 16:21:03 PDT 2016
>> LastAccessTime:      UNKNOWN
>> Protect Mode:        None
>> Retention:           0
>> Location:            file:/tmp/examples/aggrfact1
>> Table Type:          EXTERNAL_TABLE
>> Table Parameters:
>>   EXTERNAL  TRUE
>>   cube.storagetable.partition.timeline.cache.DAILY.dt.first
>>   cube.storagetable.partition.timeline.cache.DAILY.dt.holes.size  0
>>   cube.storagetable.partition.timeline.cache.DAILY.dt.latest
>>   cube.storagetable.partition.timeline.cache.DAILY.dt.storage.class  org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>   cube.storagetable.partition.timeline.cache.DAILY.ot.first
>>   cube.storagetable.partition.timeline.cache.DAILY.ot.holes.size  0
>>   cube.storagetable.partition.timeline.cache.DAILY.ot.latest
>>   cube.storagetable.partition.timeline.cache.DAILY.ot.storage.class  org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>   cube.storagetable.partition.timeline.cache.DAILY.pt.first
>>   cube.storagetable.partition.timeline.cache.DAILY.pt.holes.size  0
>>   cube.storagetable.partition.timeline.cache.DAILY.pt.latest
>>   cube.storagetable.partition.timeline.cache.DAILY.pt.storage.class  org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>   cube.storagetable.partition.timeline.cache.HOURLY.dt.first
>>   cube.storagetable.partition.timeline.cache.HOURLY.dt.holes.size  0
>>   cube.storagetable.partition.timeline.cache.HOURLY.dt.latest
>>   cube.storagetable.partition.timeline.cache.HOURLY.dt.storage.class  org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>   cube.storagetable.partition.timeline.cache.HOURLY.ot.first
>>   cube.storagetable.partition.timeline.cache.HOURLY.ot.holes.size  0
>>   cube.storagetable.partition.timeline.cache.HOURLY.ot.latest
>>   cube.storagetable.partition.timeline.cache.HOURLY.ot.storage.class  org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>   cube.storagetable.partition.timeline.cache.HOURLY.pt.first
>>   cube.storagetable.partition.timeline.cache.HOURLY.pt.holes.size  0
>>   cube.storagetable.partition.timeline.cache.HOURLY.pt.latest
>>   cube.storagetable.partition.timeline.cache.HOURLY.pt.storage.class  org.apache.lens.cube.metadata.timeline.EndsAndHolesPartitionTimeline
>>   cube.storagetable.partition.timeline.cache.present  true
>>   cube.storagetable.time.partcols  pt,ot,dt
>>   transient_lastDdlTime  1469574936
>>
>> # Storage Information
>> SerDe Library:       org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>> InputFormat:         org.apache.hadoop.mapred.TextInputFormat
>> OutputFormat:        org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>> Compressed:          No
>> Num Buckets:         0
>> Bucket Columns:      []
>> Sort Columns:        []
>> Storage Desc Params:
>>   field.delim  ,
>>   serialization.format  ,
>> Time taken: 6.435 seconds, Fetched: 73 row(s)
>>
>> Because the query ultimately runs against the Hive table local_sales_aggr_fact1, which uses the storage "local":
>>
>> <x_storage classname="org.apache.lens.cube.metadata.HDFSStorage" name="local" xmlns="uri:lens:cube:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
>>   <properties>
>>     <property name="storage.url" value="file:///"/>
>>   </properties>
>> </x_storage>
>>
>> e. So I tried to change the value from "file:///" to "hdfs:/hadoop-host:port/" (similar to examples/resources/local-cluster-storage.xml), which didn't work.
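To clarify the change in (e) above: the edited storage definition looked roughly like this (reconstructed from memory, with hadoop-host:port as a placeholder exactly as in the text above, so it may not be byte-for-byte what I had):

    <x_storage classname="org.apache.lens.cube.metadata.HDFSStorage" name="local" xmlns="uri:lens:cube:0.1"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="uri:lens:cube:0.1 cube-0.1.xsd ">
      <properties>
        <property name="storage.url" value="hdfs:/hadoop-host:port/"/>
      </properties>
    </x_storage>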
>> f. I also tried to change the fact table's location in examples/resources/sales-aggr-fact1.xml from "/tmp/examples/aggrfact1" to an HDFS path and then add partitions; that failed with this error:
>>
>> lens-shell>fact add partitions --fact_name sales_aggr_fact1 --storage_name local --path examples/resources/sales-aggr-fact1-local-parts.xml
>> Command failed javax.ws.rs.InternalServerErrorException: HTTP 500 Request failed.
>>
>> Corresponding server logs:
>>
>> Caused by: java.lang.NoClassDefFoundError: org/htrace/Trace
>>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:214) ~[hadoop-common-2.6.1.52.jar:na]
>>         at com.sun.proxy.$Proxy89.getFileInfo(Unknown Source) ~[na:na]
>>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752) ~[hadoop-hdfs-2.6.1.52.jar:na]
>>
>> g. I also tried to change the Hive table's location from file:/tmp/examples/aggrfact1 to an HDFS location, and the Lens query returned the same error.
>>
>> For the second issue: I queried the Hive table directly and found that the data is not loaded, because in the examples the partitions' location is example/resource/..., which is not an absolute path from root '/'. After I changed it to an absolute path, the Hive query 'Select * from newdb.local_sales_aggr_fact1' is able to return the data, but the converted Lens query still fails in the MapReduce stage because it cannot find the local filesystem path I assigned.
>>
>> BTW, I use the company's Hadoop cluster (Hadoop 2.6.1). I set up Hive on top of it using the hive-0.13.1-inm build downloaded from the Lens website, and set up a MySQL database as the metastore.
>>
>> Could you please help me resolve the above two issues? I really appreciate it.
>>
>> Tao Yan
>> Software Engineer
>> Data Analytics Infrastructure Tools and Services
>> 206.250.5345
>> [email protected]
>> https://www.linkedin.com/in/taousc
>

--
Tao Yan
Software Engineer
Data Analytics Infrastructure Tools and Services
206.250.5345
[email protected]
https://www.linkedin.com/in/taousc
