Review Request 55605: HIVE-15166 - Provide beeline option to set the jline history max size

2017-01-16 Thread Eric Lin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55605/
---

Review request for hive and Aihua Xu.


Bugs: HIVE-15166
https://issues.apache.org/jira/browse/HIVE-15166


Repository: hive-git


Description
---

Currently Beeline does not provide an option to limit the maximum size of the Beeline
history file. When individual queries are very large, they flood the history file and
slow Beeline down at startup and shutdown.
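
As a rough illustration of the intended behaviour (a sketch only, assuming the jline2
FileHistory that Beeline uses; the option name and the BeeLineOpts wiring are omitted here):

{code:java}
// Sketch: cap the number of entries kept in the jline2 history file.
// FileHistory inherits setMaxSize(int) from MemoryHistory; the path below
// mirrors Beeline's default ~/.beeline/history location.
import java.io.File;
import jline.console.history.FileHistory;

public class HistoryCapSketch {
  public static void main(String[] args) throws Exception {
    File histFile = new File(System.getProperty("user.home"), ".beeline/history");
    FileHistory history = new FileHistory(histFile);
    history.setMaxSize(500);        // hypothetical limit supplied by a new Beeline option
    history.add("SELECT 1;");       // entries beyond the cap are evicted oldest-first
    history.flush();                // persist the trimmed history on shutdown
  }
}
{code}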


Diffs
-

  beeline/src/java/org/apache/hive/beeline/BeeLine.java 65818dd 
  beeline/src/java/org/apache/hive/beeline/BeeLineOpts.java 9f330e3 
  beeline/src/main/resources/BeeLine.properties 141f0c6 
  beeline/src/test/org/apache/hive/beeline/TestBeelineArgParsing.java d73d374 

Diff: https://reviews.apache.org/r/55605/diff/


Testing
---

Manual testing + a simple test case.


Thanks,

Eric Lin



[jira] [Created] (HIVE-15646) Column level lineage is not available for table Views

2017-01-16 Thread Pengcheng Xiong (JIRA)
Pengcheng Xiong created HIVE-15646:
--

 Summary: Column level lineage is not available for table Views
 Key: HIVE-15646
 URL: https://issues.apache.org/jira/browse/HIVE-15646
 Project: Hive
  Issue Type: Bug
Reporter: Pengcheng Xiong
Assignee: Pengcheng Xiong






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15645) Tez session pool may restart sessions in a wrong queue

2017-01-16 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HIVE-15645:
---

 Summary: Tez session pool may restart sessions in a wrong queue
 Key: HIVE-15645
 URL: https://issues.apache.org/jira/browse/HIVE-15645
 Project: Hive
  Issue Type: Bug
Reporter: Carter Shanklin
Assignee: Sergey Shelukhin






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15644) Collect JVM metrics via JvmPauseMonitor

2017-01-16 Thread Wei Zheng (JIRA)
Wei Zheng created HIVE-15644:


 Summary: Collect JVM metrics via JvmPauseMonitor
 Key: HIVE-15644
 URL: https://issues.apache.org/jira/browse/HIVE-15644
 Project: Hive
  Issue Type: Bug
  Components: Hive
Affects Versions: 2.2.0
Reporter: Wei Zheng


Similar to what Hadoop's JvmMetrics is doing
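
For context, a minimal sketch of the pause-detection idea behind Hadoop's JvmPauseMonitor
(illustration only, not Hive or Hadoop code; the interval and threshold values are assumptions):

{code:java}
// A daemon thread sleeps for a fixed interval and treats any large overshoot
// in wall-clock time as a likely GC or JVM pause worth reporting as a metric.
public class PauseMonitorSketch implements Runnable {
  private static final long SLEEP_MS = 500;            // assumed polling interval
  private static final long WARN_THRESHOLD_MS = 1000;  // assumed pause threshold

  @Override
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      long start = System.nanoTime();
      try {
        Thread.sleep(SLEEP_MS);
      } catch (InterruptedException e) {
        return;
      }
      long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
      long pauseMs = elapsedMs - SLEEP_MS;
      if (pauseMs > WARN_THRESHOLD_MS) {
        // A real implementation would publish this as a metric instead of printing it.
        System.err.println("Detected pause of approximately " + pauseMs + " ms (likely GC)");
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Thread monitor = new Thread(new PauseMonitorSketch(), "pause-monitor");
    monitor.setDaemon(true);
    monitor.start();
    Thread.sleep(5_000);  // keep the demo process alive briefly
  }
}
{code}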



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Precommit jenkins is failing

2017-01-16 Thread Thejas Nair
+ Sergio,
Any idea what might be causing this? Will you be able to take a look?

On Mon, Jan 16, 2017 at 12:42 PM, Deepak Jaiswal wrote:

> Is there anyone who is looking into this?
>
> On 1/13/17, 10:46 AM, "Wei Zheng"  wrote:
>

Re: Precommit jenkins is failing

2017-01-16 Thread Deepak Jaiswal
Is there anyone who is looking into this?

On 1/13/17, 10:46 AM, "Wei Zheng"  wrote:

[INFO] 

[INFO] BUILD SUCCESS
[INFO] 

[INFO] Total time: 12.954 s
[INFO] Finished at: 2017-01-13T18:39:48+00:00
[INFO] Final Memory: 47M/705M
[INFO] 

+ local 
'PTEST_CLASSPATH=/home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target/hive-ptest-1.0-classes.jar:/home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target/lib/*'
+ java -cp 
'/home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target/hive-ptest-1.0-classes.jar:/home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target/lib/*'
 org.apache.hive.ptest.api.client.PTestClient --command testStart --outputDir 
/home/jenkins/jenkins-slave/workspace/PreCommit-HIVE-Build/hive/build/hive/testutils/ptest2/target
 --password '' --testHandle PreCommit-HIVE-Build-2939 --endpoint 
http://104.198.109.242:8080/hive-ptest-1.0 --logsEndpoint 
http://104.198.109.242/logs/ --profile master-mr2 --patch 
https://issues.apache.org/jira/secure/attachment/12847390/HIVE-15621.1.patch 
--jira HIVE-15621
Exception in thread "main" javax.net.ssl.SSLHandshakeException: 
sun.security.validator.ValidatorException: PKIX path building failed: 
sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
valid certification path to requested target
  at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
  at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1904)
  at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:279)
  at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:273)
  at 
sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1446)
  at 
sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:209)
  at sun.security.ssl.Handshaker.processLoop(Handshaker.java:913)
  at sun.security.ssl.Handshaker.process_record(Handshaker.java:849)
  at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1023)
  at 
sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1332)
  at 
sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1359)
  at 
sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1343)
  at 
sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
  at 
sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
  at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1301)
  at 
sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
  at java.net.URL.openStream(URL.java:1041)
  at 
com.google.common.io.Resources$UrlByteSource.openStream(Resources.java:72)
  at com.google.common.io.ByteSource.read(ByteSource.java:257)
  at com.google.common.io.Resources.toByteArray(Resources.java:99)
  at 
org.apache.hive.ptest.api.client.PTestClient.testStart(PTestClient.java:126)
  at 
org.apache.hive.ptest.api.client.PTestClient.main(PTestClient.java:320)
Caused by: sun.security.validator.ValidatorException: PKIX path building 
failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to 
find valid certification path to requested target
  at 
sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:385)
  at 
sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
  at sun.security.validator.Validator.validate(Validator.java:260)
  at 
sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:326)
  at 
sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:231)
  at 
sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:126)
  at 
sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1428)
  ... 17 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: 
unable to find valid certification path to requested target
  at 
sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:196)
  at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:268)
  at 
sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:380)
  ... 23 more



Thanks,
Wei




[jira] [Created] (HIVE-15643) remove use of default charset in FastHiveDecimal

2017-01-16 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-15643:


 Summary: remove use of default charset in FastHiveDecimal
 Key: HIVE-15643
 URL: https://issues.apache.org/jira/browse/HIVE-15643
 Project: Hive
  Issue Type: Bug
Reporter: Owen O'Malley


HIVE-15335 introduced some new uses of String.getBytes(), which uses the default
charset. These need to be replaced with the overload that always uses UTF-8.
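
A small before/after illustration of the requested change (the class and variable names
are invented for the example):

{code:java}
// String.getBytes() depends on the platform default charset, while the
// StandardCharsets.UTF_8 overload is deterministic across environments.
import java.nio.charset.StandardCharsets;

public class CharsetExample {
  public static void main(String[] args) {
    String decimalText = "123.45";
    byte[] platformDependent = decimalText.getBytes();                // uses the default charset
    byte[] alwaysUtf8 = decimalText.getBytes(StandardCharsets.UTF_8); // always UTF-8
    System.out.println(platformDependent.length + " vs " + alwaysUtf8.length);
  }
}
{code}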



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15642) Replicate Insert Overwrite & Dynamic Partition Inserts

2017-01-16 Thread Vaibhav Gumashta (JIRA)
Vaibhav Gumashta created HIVE-15642:
---

 Summary: Replicate Insert Overwrite & Dynamic Partition Inserts
 Key: HIVE-15642
 URL: https://issues.apache.org/jira/browse/HIVE-15642
 Project: Hive
  Issue Type: Sub-task
  Components: repl
Reporter: Vaibhav Gumashta
Assignee: Vaibhav Gumashta






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15641) Hive/Druid integration: filter on timestamp not pushed to DruidQuery

2017-01-16 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15641:
--

 Summary: Hive/Druid integration: filter on timestamp not pushed to 
DruidQuery
 Key: HIVE-15641
 URL: https://issues.apache.org/jira/browse/HIVE-15641
 Project: Hive
  Issue Type: Improvement
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


It seems we are missing the opportunity to push the Filter operation into the DruidQuery.

For instance, for the following query:
{code:sql}
EXPLAIN
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
limit:10
Stage-1
  Reducer 3 vectorized
  File Output Operator [FS_17]
Limit [LIM_16] (rows=1 width=0)
  Number of rows:10
  Select Operator [SEL_15] (rows=1 width=0)
Output:["_col0","_col1","_col2","_col3"]
  <-Reducer 2 [SIMPLE_EDGE] vectorized
SHUFFLE [RS_14]
  Group By Operator [GBY_13] (rows=1 width=0)

Output:["_col0","_col1","_col2","_col3"],aggregations:["max(VALUE._col0)","sum(VALUE._col1)"],keys:KEY._col0,
 KEY._col1
  <-Map 1 [SIMPLE_EDGE]
SHUFFLE [RS_5]
  PartitionCols:_col0, _col1
  Group By Operator [GBY_4] (rows=1 width=0)

Output:["_col0","_col1","_col2","_col3"],aggregations:["max(_col2)","sum(_col3)"],keys:_col0,
 _col1
Select Operator [SEL_2] (rows=1 width=0)
  Output:["_col0","_col1","_col2","_col3"]
  Filter Operator [FIL_12] (rows=1 width=0)
predicate:floor_day(__time) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
TableScan [TS_0] (rows=15888 width=0)
  
tpcds_druid_10@store_sales_sold_time_subset,store_sales_sold_time_subset,Tbl:PARTIAL,Col:NONE,Output:["__time","i_brand_id","ss_quantity","ss_wholesale_cost"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_tpcds_ss_sold_time_subset\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"i_item_id\",\"i_rec_start_date\",\"i_rec_end_date\",\"i_item_desc\",\"i_brand_id\",\"i_brand\",\"i_class_id\",\"i_class\",\"i_category_id\",\"i_category\",\"i_manufact_id\",\"i_manufact\",\"i_size\",\"i_formulation\",\"i_color\",\"i_units\",\"i_container\",\"i_manager_id\",\"i_product_name\",\"c_customer_id\",\"c_salutation\",\"c_first_name\",\"c_last_name\",\"c_preferred_cust_flag\",\"c_birth_day\",\"c_birth_month\",\"c_birth_year\",\"c_birth_country\",\"c_login\",\"c_email_address\",\"c_last_review_date\",\"ca_address_id\",\"ca_street_number\",\"ca_street_name\",\"ca_street_type\",\"ca_suite_number\",\"ca_city\",\"ca_county\",\"ca_state\",\"ca_zip\",\"ca_country\",\"ca_gmt_offset\",\"ca_location_type\",\"s_store_id\",\"s_rec_start_date\",\"s_rec_end_date\",\"s_store_name\",\"s_hours\",\"s_manager\",\"s_market_id\",\"s_geography_class\",\"s_market_desc\",\"s_market_manager\",\"s_division_id\",\"s_division_name\",\"s_company_id\",\"s_company_name\",\"s_street_number\",\"s_street_name\",\"s_street_type\",\"s_suite_number\",\"s_city\",\"s_county\",\"s_state\",\"s_zip\",\"s_country\",\"s_gmt_offset\"],\"metrics\":[\"ss_ticket_number\",\"ss_quantity\",\"ss_wholesale_cost\",\"ss_list_price\",\"ss_sales_price\",\"ss_ext_discount_amt\",\"ss_ext_sales_price\",\"ss_ext_wholesale_cost\",\"ss_ext_list_price\",\"ss_ext_tax\",\"ss_coupon_amt\",\"ss_net_paid\",\"ss_net_paid_inc_tax\",\"ss_net_profit\",\"i_current_price\",\"i_wholesale_cost\",\"s_number_employees\",\"s_floor_space\",\"s_tax_precentage\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15640) Hive/Druid integration: null handling for metrics

2017-01-16 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15640:
--

 Summary: Hive/Druid integration: null handling for metrics
 Key: HIVE-15640
 URL: https://issues.apache.org/jira/browse/HIVE-15640
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Critical


Null values for metrics in Druid and Hive are not handled the same way (_0.0_ 
vs _NULL_).

In Druid:
{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
6015006 1999-11-03 00:00:00 0.0 0.0
9011009 1999-11-05 00:00:00 0.0 0.0
8003009 1999-11-03 00:00:00 11.0  1.029713897705
10005014  1999-11-05 00:00:00 86.0  1.10023841858
6008007 1999-11-09 00:00:00 81.0  1.370047683716
6003003 1999-11-08 00:00:00 45.0  1.60023841858
8008009 1999-11-08 00:00:00 98.0  1.710381469727
8015003 1999-11-02 00:00:00 10.0  1.740095367432
8004008 1999-11-10 00:00:00 45.0  1.759904632568
8009009 1999-11-07 00:00:00 81.0  1.769809265137
{code}

In Hive:
{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset_hive
WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 00:00:00'
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
6015006 1999-11-03 00:00:00 NULL  NULL
9011009 1999-11-05 00:00:00 NULL  NULL
8003009 1999-11-03 00:00:00 11  1.03
10005014  1999-11-05 00:00:00 86  1.1
6008007 1999-11-09 00:00:00 81  1.37
6003003 1999-11-08 00:00:00 45  1.6
8008009 1999-11-08 00:00:00 98  1.71
8015003 1999-11-02 00:00:00 10  1.74
8004008 1999-11-10 00:00:00 45  1.76
8009009 1999-11-07 00:00:00 81  1.77
{code}

However, for Druid dimensions, NULL values seem to be handled properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15639) HI

2017-01-16 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15639:
--

 Summary: HI
 Key: HIVE-15639
 URL: https://issues.apache.org/jira/browse/HIVE-15639
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Critical


The scope of the ordering seems to be for each different granularity value 
instead of globally. 

{code:sql}
EXPLAIN
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY i_brand_id
LIMIT 10;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_1]
  Output:["_col0","_col1","_col2","_col3"]
  TableScan [TS_0]

Output:["i_brand_id","__time","$f2","$f3"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_tpcds_ss_sold_time_subset\",\"granularity\":\"DAY\",\"dimensions\":[\"i_brand_id\"],\"limitSpec\":{\"type\":\"default\",\"limit\":10,\"columns\":[{\"dimension\":\"i_brand_id\",\"direction\":\"ascending\"}]},\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f2\",\"fieldName\":\"ss_quantity\"},{\"type\":\"doubleSum\",\"name\":\"$f3\",\"fieldName\":\"ss_wholesale_cost\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15638) ArrayIndexOutOfBoundsException when output Columns for UDTF are pruned

2017-01-16 Thread Nemon Lou (JIRA)
Nemon Lou created HIVE-15638:


 Summary: ArrayIndexOutOfBoundsException when output Columns for 
UDTF are pruned 
 Key: HIVE-15638
 URL: https://issues.apache.org/jira/browse/HIVE-15638
 Project: Hive
  Issue Type: Bug
  Components: Query Planning
Affects Versions: 2.1.0, 1.3.0
Reporter: Nemon Lou


{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error 
while processing row [Error getting row data with exception 
java.lang.ArrayIndexOutOfBoundsException: 151
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:314)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:183)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:202)
at 
org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.buildJSONString(SerDeUtils.java:364)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:200)
at 
org.apache.hadoop.hive.serde2.SerDeUtils.getJSONString(SerDeUtils.java:186)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.toErrorMessage(MapOperator.java:525)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:494)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:180)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1710)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:174)
 ]
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:499)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
... 8 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ArrayIndexOutOfBoundsException: 151
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:416)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:878)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
at 
org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:149)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:489)
... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 151
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.readVInt(LazyBinaryUtils.java:314)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryUtils.checkObjectByteInfo(LazyBinaryUtils.java:183)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.parse(LazyBinaryStruct.java:142)
at 
org.apache.hadoop.hive.serde2.lazybinary.LazyBinaryStruct.getField(LazyBinaryStruct.java:202)
at 
org.apache.hadoop.hive.serde2.lazybinary.objectinspector.LazyBinaryStructObjectInspector.getStructFieldData(LazyBinaryStructObjectInspector.java:64)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator._evaluate(ExprNodeColumnEvaluator.java:94)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:77)
at 
org.apache.hadoop.hive.ql.exec.ExprNodeEvaluator.evaluate(ExprNodeEvaluator.java:65)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.populateCachedDistributionKeys(ReduceSinkOperator.java:443)
at 
org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.process(ReduceSinkOperator.java:350)
... 13 more
{noformat}

The way to reproduce:
DDL:
{noformat}
create table tb_a(data_dt string,key string,src string,data_id string,tag_id 
string, entity_src string);
create table tb_b(pos_tagging string,src string,data_id string);
create table tb_c(key string,start_time string,data_dt string);
insert into tb_a values('20160901','CPI','04','data_id','tag_id','entity_src');
insert into tb_b values('pos_tagging','04','data_id');
insert into tb_c values('data_id','start_time_','20160901');
create function hwrl as 'HotwordRelationUDTF' using jar 
'hdfs:///tmp/nemon/udf/hotword.jar';

{noformat}

UDF File :
{code}
import java.util.ArrayList;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import 

[jira] [Created] (HIVE-15637) Hive/Druid integration: wrong semantics of groupBy query limit with granularity

2017-01-16 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15637:
--

 Summary: Hive/Druid integration: wrong semantics of groupBy query 
limit with granularity
 Key: HIVE-15637
 URL: https://issues.apache.org/jira/browse/HIVE-15637
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Critical


Similar to HIVE-15635, but for GroupBy queries. Limit is applied per 
granularity unit, not globally for the query.

{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1 vectorized
  File Output Operator [FS_4]
Select Operator [SEL_3] (rows=15888 width=0)
  Output:["_col0","_col1","_col2","_col3"]
  TableScan [TS_0] (rows=15888 width=0)

tpcds_druid_10@store_sales_sold_time_subset,store_sales_sold_time_subset,Tbl:PARTIAL,Col:NONE,Output:["i_brand_id","__time","$f2","$f3"],properties:{"druid.query.json":"{\"queryType\":\"groupBy\",\"dataSource\":\"druid_tpcds_ss_sold_time_subset\",\"granularity\":\"DAY\",\"dimensions\":[\"i_brand_id\"],\"limitSpec\":{\"type\":\"default\",\"limit\":10,\"columns\":[{\"dimension\":\"$f3\",\"direction\":\"ascending\"}]},\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f2\",\"fieldName\":\"ss_quantity\"},{\"type\":\"doubleSum\",\"name\":\"$f3\",\"fieldName\":\"ss_wholesale_cost\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"groupBy"}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15636) Hive/Druid integration: wrong semantics of topN query limit with granularity

2017-01-16 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15636:
--

 Summary: Hive/Druid integration: wrong semantics of topN query 
limit with granularity
 Key: HIVE-15636
 URL: https://issues.apache.org/jira/browse/HIVE-15636
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Critical


The semantics of a Druid topN query with limit and granularity are not equivalent to
the input SQL. In particular, the limit is applied to each granularity value, not to
the overall query.

Currently, the following query will be transformed into a topN query:
{code:sql}
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s DESC
LIMIT 10;
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Stage-1
  Map 1 vectorized
  File Output Operator [FS_4]
Select Operator [SEL_3] (rows=15888 width=0)
  Output:["_col0","_col1","_col2","_col3"]
  TableScan [TS_0] (rows=15888 width=0)

tpcds_druid_10@store_sales_sold_time_subset,store_sales_sold_time_subset,Tbl:PARTIAL,Col:NONE,Output:["i_brand_id","__time","$f2","$f3"],properties:{"druid.query.json":"{\"queryType\":\"topN\",\"dataSource\":\"druid_tpcds_ss_sold_time_subset\",\"granularity\":\"DAY\",\"dimension\":\"i_brand_id\",\"metric\":\"$f3\",\"aggregations\":[{\"type\":\"longMax\",\"name\":\"$f2\",\"fieldName\":\"ss_quantity\"},{\"type\":\"doubleSum\",\"name\":\"$f3\",\"fieldName\":\"ss_wholesale_cost\"}],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"threshold\":10}","druid.query.type":"topN"}
{code}

It outputs 300 rows, 10 per day. Instead, the SQL query equivalent to a Druid
topN query should be expressed as:
{code:sql}
SELECT rs.i_brand_id, rs.d, rs.m, rs.s
FROM (
SELECT i_brand_id, floor_day(`__time`) as d, max(ss_quantity) as m, 
sum(ss_wholesale_cost) as s,
   ROW_NUMBER() OVER (PARTITION BY floor_day(`__time`) ORDER BY 
sum(ss_wholesale_cost) DESC ) AS rownum
FROM store_sales_sold_time_subset
GROUP BY i_brand_id, floor_day(`__time`)
) rs
WHERE rownum <= 10;
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Review Request 53619: HIVE-15161 migrate ColumnStats to use jackson

2017-01-16 Thread Zoltan Haindrich

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53619/
---

(Updated Jan. 16, 2017, 10:55 a.m.)


Review request for hive.


Changes
---

addressed comments; rbt #2 is patch #5 on the JIRA


Bugs: HIVE-15161
https://issues.apache.org/jira/browse/HIVE-15161


Repository: hive-git


Description
---

* json.org has license issues
* jackson can provide a fully compatible alternative to it
* there are a few flakiness issues caused by the order of the map entries of
the columns... this can be addressed; the org.json API was unfriendly in this manner
;)
* fully backward compatible change


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java 
25c7508f51662773e913a176bee7c8bd223202d4 
  common/src/test/org/apache/hadoop/hive/common/TestStatsSetupConst.java 
7a7ad424a8e53ed89c79592ced86c7c38eaf4e04 

Diff: https://reviews.apache.org/r/53619/diff/


Testing
---

added unit test


Thanks,

Zoltan Haindrich



Re: Review Request 53619: HIVE-15161 migrate ColumnStats to use jackson

2017-01-16 Thread Zoltan Haindrich


> On Jan. 4, 2017, 10 p.m., pengcheng xiong wrote:
> > common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java, line 200
> > 
> >
> > what is the difference between NON_DEFAULT and the following NON_EMPTY?

this field is only serialized when it is not false;

in the other case: the list is only serialized when it is not empty
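
For reference, a minimal illustration of the two Jackson inclusion modes being discussed
(the bean and field names below are invented, not the actual StatsSetupConst code):

{code:java}
// NON_DEFAULT skips a field that still holds its default value (e.g. false),
// NON_EMPTY skips a null or empty collection/string.
import java.util.List;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonInclude.Include;

public class ColumnStatsBean {
  @JsonInclude(Include.NON_DEFAULT)
  public boolean basicStatsTrue;        // written to JSON only once it becomes true

  @JsonInclude(Include.NON_EMPTY)
  public List<String> accurateColumns;  // written to JSON only when non-empty
}
{code}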


> On Jan. 4, 2017, 10 p.m., pengcheng xiong wrote:
> > common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java, line 215
> > 
> >
> > We still need to have a try catch block for "// For backward 
> > compatibility, if previous value can not be parsed to a json object, it 
> > will come here." Because we do not have a json object format in very old 
> > versions, this will throw exception but we would like to return false.

the backward compatibility checking code is inside the parseStatsAcc method - so the
outcome of this method depends on whether or not that is able to handle it


> On Jan. 4, 2017, 10 p.m., pengcheng xiong wrote:
> > common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java, line 223
> > 
> >
> > The same for the backward compatibility issue.

parseStatsAcc should take responsibility in this case too :)


> On Jan. 4, 2017, 10 p.m., pengcheng xiong wrote:
> > common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java, line 323
> > 
> >
> > startsWith sounds not as good as previous "try catch block"

I wanted to identify when the input is in a totally unexpected format - I assume it's
preferable to go ahead and throw away possibly bad data in case there are problems...
(and startsWith was also a bit stricter than it should be: e.g. ' {}' is also valid JSON :)

I've replaced it with try/catch


> On Jan. 4, 2017, 10 p.m., pengcheng xiong wrote:
> > common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java, line 337
> > 
> >
> > use TRUE please.

ok


> On Jan. 4, 2017, 10 p.m., pengcheng xiong wrote:
> > common/src/test/org/apache/hadoop/hive/common/TestStatsSetupConst.java, 
> > line 90
> > 
> >
> > This makes me worry about the difference between golden files for tests 
> > when we try to release a product... Do you mean that order is not preserved 
> > when we print them out? Could u add more test cases for "described extended 
> > [tableName]"? Thanks.

Sometimes the order is broken... a JDK version difference can cause that; I've seen
some test results with that error a while back. json.org usually preserves insertion
order - I made some improvements in this area a while ago to stabilize the outputs in
the StatsSetupConst class - this test is the "last step" to keep it that way,
protecting it with a JUnit test.

It's not easy to create a qtest for this, since it possibly depends on external
changes (like the JDK version) - I would prefer to skip it; this test will make sure
that we will not see any of those problems later on.


- Zoltan


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53619/#review160534
---


On Dec. 9, 2016, 1:14 p.m., Zoltan Haindrich wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53619/
> ---
> 
> (Updated Dec. 9, 2016, 1:14 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15161
> https://issues.apache.org/jira/browse/HIVE-15161
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> * json.org has license issues
> * jackson can provide a fully compatible alternative to it
> * there are a few flakiness issues caused by the order of the map entries of 
> the columns...this can be addressed, org.json api was unfriendly in this 
> manner ;)
> * fully backward compatible change
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/common/StatsSetupConst.java 
> 25c7508f51662773e913a176bee7c8bd223202d4 
>   common/src/test/org/apache/hadoop/hive/common/TestStatsSetupConst.java 
> 7a7ad424a8e53ed89c79592ced86c7c38eaf4e04 
> 
> Diff: https://reviews.apache.org/r/53619/diff/
> 
> 
> Testing
> ---
> 
> added unit test
> 
> 
> Thanks,
> 
> Zoltan Haindrich
> 
>



[jira] [Created] (HIVE-15635) Hive/Druid integration: timeseries query shows all days, even if no data

2017-01-16 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15635:
--

 Summary: Hive/Druid integration: timeseries query shows all days, 
even if no data
 Key: HIVE-15635
 URL: https://issues.apache.org/jira/browse/HIVE-15635
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Critical


We should have consistent results on Druid vs. Hive. However, the following query is
transformed into a timeseries Druid query, which yields different results in Druid,
since it will show all values for the given time granularity, even if there is no data
for the given _i\_brand\_id_.

In Druid:
{code:sql}
SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), 
sum(ss_wholesale_cost)
FROM store_sales_sold_time_subset
WHERE i_brand_id = 10001009
GROUP BY floor_day(`__time`)
ORDER BY `granularity`;
OK
1999-11-01 00:00:00 45  37.47
1999-11-02 00:00:00 -9223372036854775808  0.0
1999-11-03 00:00:00 -9223372036854775808  0.0
1999-11-04 00:00:00 39  61.52
1999-11-05 00:00:00 74  145.84
1999-11-06 00:00:00 62  14.5
1999-11-07 00:00:00 -9223372036854775808  0.0
1999-11-08 00:00:00 5   34.08
1999-11-09 00:00:00 -9223372036854775808  0.0
1999-11-10 00:00:00 -9223372036854775808  0.0
1999-11-11 00:00:00 -9223372036854775808  0.0
1999-11-12 00:00:00 66  67.22
1999-11-13 00:00:00 -9223372036854775808  0.0
1999-11-14 00:00:00 -9223372036854775808  0.0
1999-11-15 00:00:00 -9223372036854775808  0.0
1999-11-16 00:00:00 60  96.37
1999-11-17 00:00:00 50  79.11
1999-11-18 00:00:00 -9223372036854775808  0.0
1999-11-19 00:00:00 -9223372036854775808  0.0
1999-11-20 00:00:00 -9223372036854775808  0.0
1999-11-21 00:00:00 -9223372036854775808  0.0
1999-11-22 00:00:00 -9223372036854775808  0.0
1999-11-23 00:00:00 57  17.69
1999-11-24 00:00:00 -9223372036854775808  0.0
1999-11-25 00:00:00 -9223372036854775808  0.0
1999-11-26 00:00:00 -9223372036854775808  0.0
1999-11-27 00:00:00 86  91.59
1999-11-28 00:00:00 -9223372036854775808  0.0
1999-11-29 00:00:00 93  136.48
1999-11-30 00:00:00 -9223372036854775808  0.0
{code}

In Hive:
{code:sql}
SELECT floor_day(`__time`) as `granularity`, max(ss_quantity), 
sum(ss_wholesale_cost)
FROM store_sales_sold_time_subset_hive
WHERE i_brand_id = 10001009
GROUP BY floor_day(`__time`)
ORDER BY `granularity`;
OK
1999-11-01 00:00:00 45  37.47
1999-11-04 00:00:00 39  61.52
1999-11-05 00:00:00 74  145.84
1999-11-06 00:00:00 62  14.5
1999-11-08 00:00:00 5   34.08
1999-11-12 00:00:00 66  67.22
1999-11-16 00:00:00 60  96.36
1999-11-17 00:00:00 50  79.11
1999-11-23 00:00:00 57  17.688
1999-11-27 00:00:00 86  91.59
1999-11-29 00:00:00 93  136.48
{code}

Probably we should handle this in the _timeseries_ record reader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15634) Hive/Druid integration: Timestamp column inconsistent w/o Fetch optimization

2017-01-16 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15634:
--

 Summary: Hive/Druid integration: Timestamp column inconsistent w/o 
Fetch optimization
 Key: HIVE-15634
 URL: https://issues.apache.org/jira/browse/HIVE-15634
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Critical


When the Fetch optimizer kicks in because we can push the full query to Druid, we
obtain different values for the timestamp than when jobs are executed. This probably
has to do with the timezone on the client side (see the sketch after the example below).

For instance, this can be observed with the following query:
{code:sql}
set hive.fetch.task.conversion=more;
SELECT DISTINCT `__time`
FROM store_sales_sold_time_subset
WHERE `__time` < '1999-11-10 00:00:00';
OK
1999-10-31 19:00:00
1999-11-01 19:00:00
1999-11-02 19:00:00
1999-11-03 19:00:00
1999-11-04 19:00:00
1999-11-05 19:00:00
1999-11-06 19:00:00
1999-11-07 19:00:00
1999-11-08 19:00:00

set hive.fetch.task.conversion=none;
SELECT DISTINCT `__time`
FROM store_sales_sold_time_subset
WHERE `__time` < '1999-11-10 00:00:00';
OK
1999-11-01 00:00:00
1999-11-02 00:00:00
1999-11-03 00:00:00
1999-11-04 00:00:00
1999-11-05 00:00:00
1999-11-06 00:00:00
1999-11-07 00:00:00
1999-11-08 00:00:00
1999-11-09 00:00:00
{code}
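
For reference, a minimal sketch of the suspected client-side timezone effect using plain
java.time (the America/New_York zone is an assumption, chosen only because it reproduces
the 19:00 shift seen above):

{code:java}
// The same instant renders as 1999-11-01 00:00:00 in UTC but as
// 1999-10-31 19:00:00 in a UTC-5 zone, matching the shifted values above.
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class TimezoneShiftSketch {
  public static void main(String[] args) {
    Instant instant = Instant.parse("1999-11-01T00:00:00Z");
    DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    System.out.println(fmt.format(instant.atZone(ZoneId.of("UTC"))));              // 1999-11-01 00:00:00
    System.out.println(fmt.format(instant.atZone(ZoneId.of("America/New_York")))); // 1999-10-31 19:00:00
  }
}
{code}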



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15633) Hive/Druid integration: Exception when time filter is not in datasource range

2017-01-16 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15633:
--

 Summary: Hive/Druid integration: Exception when time filter is not 
in datasource range
 Key: HIVE-15633
 URL: https://issues.apache.org/jira/browse/HIVE-15633
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez


When _metadataList.isEmpty()_ (L222 in DruidQueryBasedInputFormat) returns true, we
throw an Exception. However, this is also true if the query filters on a range that is
not within the datasource's timestamp range. Thus, we should only throw the Exception
if _metadataList_ is null.
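
A hedged sketch of the proposed check (the method and types below are simplified,
not the actual DruidQueryBasedInputFormat code):

{code:java}
// Only a null metadata list should be treated as a failure to reach Druid;
// an empty list just means the filtered interval holds no segments.
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class MetadataCheckSketch {
  static int numSplits(List<?> metadataList) throws IOException {
    if (metadataList == null) {
      throw new IOException("Connected to Druid but could not retrieve datasource information");
    }
    if (metadataList.isEmpty()) {
      return 0;  // no data in the queried range: return no splits instead of failing
    }
    return metadataList.size();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(numSplits(Collections.emptyList()));  // prints 0, no exception
  }
}
{code}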

The issue can be reproduced with the following query if all timestamp values are
greater than or equal to '1999-11-01 00:00:00':

{code:sql}
SELECT COUNT(`__time`)
FROM store_sales_sold_time_subset
WHERE `__time` < '1999-11-01 00:00:00';
{code}

{noformat}
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1484282558103_0067_2_00, 
diagnostics=[Vertex vertex_1484282558103_0067_2_00 [Map 1] killed/failed due 
to:ROOT_INPUT_INIT_FAILURE, Vertex Input: store_sales_sold_time_subset 
initializer failed, vertex=vertex_1484282558103_0067_2_00 [Map 1], 
java.io.IOException: Connected to Druid but could not retrieve datasource 
information
at 
org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.splitSelectQuery(DruidQueryBasedInputFormat.java:224)
at 
org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getInputSplits(DruidQueryBasedInputFormat.java:140)
at 
org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getSplits(DruidQueryBasedInputFormat.java:92)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:367)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:485)
at 
org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at 
org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HIVE-15632) Hive/Druid integration: Incorrect result - Limit on timestamp disappears

2017-01-16 Thread Jesus Camacho Rodriguez (JIRA)
Jesus Camacho Rodriguez created HIVE-15632:
--

 Summary: Hive/Druid integration: Incorrect result - Limit on 
timestamp disappears
 Key: HIVE-15632
 URL: https://issues.apache.org/jira/browse/HIVE-15632
 Project: Hive
  Issue Type: Bug
  Components: Druid integration
Affects Versions: 2.2.0
Reporter: Jesus Camacho Rodriguez
Assignee: Jesus Camacho Rodriguez
Priority: Critical


This can be observed with the following query:

{code:sql}
SELECT DISTINCT `__time`
FROM store_sales_sold_time_subset_hive
ORDER BY `__time` ASC
LIMIT 10;
{code}

The query is translated correctly to a Druid _timeseries_ query, but the _limit_
operator disappears.
{code}
OK
Plan optimized by CBO.

Stage-0
  Fetch Operator
limit:-1
Select Operator [SEL_1]
  Output:["_col0"]
  TableScan [TS_0]

Output:["__time"],properties:{"druid.query.json":"{\"queryType\":\"timeseries\",\"dataSource\":\"druid_tpcds_ss_sold_time_subset\",\"descending\":false,\"granularity\":\"NONE\",\"aggregations\":[],\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"]}","druid.query.type":"timeseries"}
{code}

Thus, the result has more than 10 rows.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)