[jira] [Commented] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898070#comment-15898070
 ] 

Hive QA commented on HIVE-14901:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856329/HIVE-14901.9.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10324 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=224)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3970/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3970/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3970/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856329 - PreCommit-HIVE-Build

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.4.patch, HIVE-14901.5.patch, 
> HIVE-14901.6.patch, HIVE-14901.7.patch, HIVE-14901.8.patch, 
> HIVE-14901.9.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults' request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 
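
The clamping described above can be sketched as follows. This is a minimal illustration under assumed names ({{effectiveFetchSize}} is hypothetical), not Hive's actual code:

```java
public class FetchSizeClamp {
    // Sketch: clamp the client's requested fetch size to the server-side
    // maximum so neither tasks nor HS2 serialize unbounded row batches.
    static int effectiveFetchSize(int requested, int serverMax) {
        if (requested <= 0) {
            return serverMax; // no usable client value: fall back to the cap
        }
        return Math.min(requested, serverMax);
    }

    public static void main(String[] args) {
        System.out.println(effectiveFetchSize(500, 10000));   // prints 500
        System.out.println(effectiveFetchSize(50000, 10000)); // prints 10000
    }
}
```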



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16089) "trustStorePassword" is logged as part of jdbc connection url

2017-03-06 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898027#comment-15898027
 ] 

Peter Vary commented on HIVE-16089:
---

[~sfroehlich]: You could take a look here: 
https://archive.cloudera.com/cdh5/cdh/5/hive-1.1.0-cdh5.7.0.CHANGES.txt or 
here: 
https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/bk_HDP_RelNotes/content/fixed_issues.html.
 Pick your favorite :)

> "trustStorePassword" is logged as part of jdbc connection url
> -
>
> Key: HIVE-16089
> URL: https://issues.apache.org/jira/browse/HIVE-16089
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.1.0
>Reporter: Sebastian Fröhlich
>  Labels: security
>
> h5. General Story
> The use case is to connect via the Apache Hive JDBC driver to a Hive where 
> SSL encryption is enabled.
> It was required to set the SSL trust store password property 
> {{trustStorePassword}} in the JDBC connection URL.
> If the property is passed via the "properties" parameter into 
> {{Driver.connect(url, properties)}}, it is not recognized.
> h5. Log message
> {code}
> 2017-03-03 09:57:58,385 [INFO] [InputInitializer {Map for sheets:[import] 
> (fce7cd11-d489-4a13-a3a9-4c81d2907c87)} #0] 
> |jdbc.Utils|: Resolved authority: :
> 2017-03-03 09:57:58,539 [INFO] [InputInitializer {Map for sheets:[import] 
> (fce7cd11-d489-4a13-a3a9-4c81d2907c87)} #0] |jdbc.HiveConnection|: Will try 
> to open client transport with JDBC Uri: 
> jdbc:hive2://:/;ssl=true;sslTrustStore=/tmp/hs2keystore.jks;trustStorePassword=
> {code}
> This is produced, for example, by {{org.apache.hive.jdbc.HiveConnection#openTransport()}}.
> h5. Suggested Behavior
> The property {{trustStorePassword}} could be part of the "properties" 
> parameter. This way the password would not be part of the JDBC connection URL.
> h5. Acceptance Criteria
> The SSL trust store password should not be logged as part of the JDBC 
> connection string.
> Support the trust store password via the properties parameter within connect.
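
The acceptance criterion above (don't log the password) can be sketched as a small masking helper applied to the URL before logging. The class name, method name, and regex are illustrative assumptions, not Hive's API:

```java
public class JdbcUrlMasker {
    // Replace the value of trustStorePassword with *** before the
    // connection URL is written to the log.
    static String maskSensitiveParams(String url) {
        return url.replaceAll("(trustStorePassword=)[^;]*", "$1***");
    }

    public static void main(String[] args) {
        String url = "jdbc:hive2://host:10000/db;ssl=true;"
                + "sslTrustStore=/tmp/hs2keystore.jks;trustStorePassword=secret";
        System.out.println(maskSensitiveParams(url));
    }
}
```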



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16129) log Tez DAG ID in places

2017-03-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-16129:
---


> log Tez DAG ID in places
> 
>
> Key: HIVE-16129
> URL: https://issues.apache.org/jira/browse/HIVE-16129
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> After TEZ-3550, we should be able to log Tez DAG ID early to have 
> queryId-dagId mapping when debugging



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16101) QTest failure BeeLine escape_comments after HIVE-16045

2017-03-06 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897992#comment-15897992
 ] 

Peter Vary commented on HIVE-16101:
---

[~ngangam]: Yes, the first failures were because of {{No space left on 
device}}. The second run was ok. [~kgyrtkirk] kindly reviewed the patch, so I 
think it could be committed.

Thanks,
Peter

> QTest failure BeeLine escape_comments after HIVE-16045
> --
>
> Key: HIVE-16101
> URL: https://issues.apache.org/jira/browse/HIVE-16101
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-16101.2.patch, HIVE-16101.patch
>
>
> HIVE-16045 was committed immediately after HIVE-14459 and added two extra 
> lines to the output, which is written by another thread. We should remove 
> these lines before comparing the .out file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16024) MSCK Repair Requires nonstrict hive.mapred.mode

2017-03-06 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897994#comment-15897994
 ] 

Sergio Peña commented on HIVE-16024:


As [~vihangk1] mentioned, it seems this regression was introduced 
unintentionally by HIVE-13788. [~hsubramaniyan] Do you know if HIVE-13788 was 
attempting to prevent users from using MSCK when hive.mapred.mode was set to 
strict? In strict mode, users won't be allowed to recover unknown partitions 
unless they name each partition, but the partitions are by definition unknown.

> MSCK Repair Requires nonstrict hive.mapred.mode
> ---
>
> Key: HIVE-16024
> URL: https://issues.apache.org/jira/browse/HIVE-16024
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.2.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-16024.01.patch, HIVE-16024.02.patch, 
> HIVE-16024.03.patch, HIVE-16024.04.patch
>
>
> MSCK repair fails when hive.mapred.mode is set to strict.
> HIVE-13788 modified the way we read partitions for a table to improve 
> performance. Unfortunately, it uses PartitionPruner to load the partitions, 
> which in turn checks hive.mapred.mode.
> The previous code did not check hive.mapred.mode.
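
One possible shape of a fix can be sketched as a guard that exempts metadata-only operations from the strict-mode check. This is purely illustrative; the names and the approach are assumptions on my part, not the actual patch:

```java
public class StrictModeGuard {
    // MSCK cannot name the partitions it is recovering (they are unknown),
    // so a metadata-only operation must be allowed to fetch all partitions
    // even when hive.mapred.mode=strict; ordinary queries still need a filter.
    static boolean allowFullPartitionFetch(boolean strictMode, boolean metadataOnlyOp) {
        if (metadataOnlyOp) {
            return true;
        }
        return !strictMode;
    }

    public static void main(String[] args) {
        System.out.println(allowFullPartitionFetch(true, true));  // MSCK in strict mode: allowed
        System.out.println(allowFullPartitionFetch(true, false)); // unfiltered query in strict mode: rejected
    }
}
```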



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-16128) Hive Import : change of partitioning order in case of multilevel partitioned table

2017-03-06 Thread Aditya Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aditya Agarwal reassigned HIVE-16128:
-


> Hive Import : change of partitioning order in case of multilevel partitioned 
> table 
> ---
>
> Key: HIVE-16128
> URL: https://issues.apache.org/jira/browse/HIVE-16128
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 0.13.0
>Reporter: Aditya Agarwal
>Assignee: Aditya Agarwal
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15160) Can't order by an unselected column

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897928#comment-15897928
 ] 

Hive QA commented on HIVE-15160:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856323/HIVE-15160.07.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 366 failed/errored test(s), 10325 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=220)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[write_final_output_blobstore]
 (batchId=232)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization_partition]
 (batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization_project]
 (batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_char1] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_merge_2_orc] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[alter_varchar1] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_1] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_4] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_6] 
(batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_1] 
(batchId=18)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[authorization_view_2] 
(batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_partitioned] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_partitioned_native] 
(batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_schema_evolution_native]
 (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ba_table_union] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[binary_table_bincolserde]
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[binary_table_colserde] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cast_qualified_types] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_SortUnionTransposeRule]
 (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_input26] (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_cross_product_check_2]
 (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_1] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_2] (batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_nested_types] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[combine3] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_3] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlationoptimizer11] 
(batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cp_sel] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cross_product_check_2] 
(batchId=81)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cteViews] (batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_stats] 
(batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[decimal_udf] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_non_partitioned]
 (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_all_partitioned] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_tmp_table] 
(batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_where_no_match] 
(batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_where_non_partitioned]
 (batchId=36)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_where_partitioned]
 (batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[delete_whole_partition] 
(batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[distinct_windowing] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_basic2] 
(batchId=10)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[druid_topn] (batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynamic_rdd_cache] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[gen_udf_example_add10] 
(batchId=42)

[jira] [Comment Edited] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-03-06 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897927#comment-15897927
 ] 

Gopal V edited comment on HIVE-16113 at 3/6/17 7:51 PM:


The test failures are related - looks like some of the existing golden files 
are broken plans too.

Will update them after reviewing.


was (Author: gopalv):
The test failures are related - looks like some of the existing golden files 
are broken plans too.

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.2.0, 2.1.1
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16113.1.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 
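
The three-valued logic behind the wrong result can be reproduced with {{java.lang.Boolean}}, using {{null}} for SQL's UNKNOWN. This is a standalone illustration, not the pruner's code:

```java
public class ThreeValuedLogic {
    // SQL three-valued AND: FALSE dominates, then UNKNOWN (null), then TRUE.
    static Boolean and3(Boolean a, Boolean b) {
        if (Boolean.FALSE.equals(a) || Boolean.FALSE.equals(b)) {
            return Boolean.FALSE;
        }
        if (a == null || b == null) {
            return null; // unknown
        }
        return Boolean.TRUE;
    }

    // NVL(x, fallback): replace UNKNOWN with the fallback value.
    static Boolean nvl(Boolean x, Boolean fallback) {
        return x == null ? fallback : x;
    }

    public static void main(String[] args) {
        // The pruner replaces the non-partition predicate (customer = 1)
        // with null, so dt='2001-01-01' AND null evaluates to UNKNOWN ...
        Boolean pruned = and3(Boolean.TRUE, null);
        // ... and NVL(UNKNOWN, false) is false: the partition is wrongly pruned.
        System.out.println(nvl(pruned, Boolean.FALSE)); // prints false
    }
}
```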



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-03-06 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897927#comment-15897927
 ] 

Gopal V commented on HIVE-16113:


The test failures are related - looks like some of the existing golden files 
are broken plans too.

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.2.0, 2.1.1
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16113.1.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16113) PartitionPruner::removeNonPartCols needs to handle AND/OR cases

2017-03-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897921#comment-15897921
 ] 

Sergey Shelukhin commented on HIVE-16113:
-

+1 assuming tests are unrelated

> PartitionPruner::removeNonPartCols needs to handle AND/OR cases
> ---
>
> Key: HIVE-16113
> URL: https://issues.apache.org/jira/browse/HIVE-16113
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.2.0, 2.1.1
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16113.1.patch
>
>
> {code}
> create table daysales (customer int) partitioned by (dt string);
> insert into daysales partition(dt='2001-01-01') values(1);
> select * from daysales where nvl(dt='2001-01-01' and customer=1, false);
> 0 ROWS
> {code}
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L384
> {code}
> 2017-03-05T12:37:47,153  WARN [6f053d71-6ad6-4ad0-833d-337f2d499c82 main] 
> ppr.PartitionPruner: The expr = NVL(((dt = '2001-01-01') and null),false)
> {code}
> Because {{true and null => null}}, this turns into {{NVL(null, false)}} 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16064) Allow ALL set quantifier with aggregate functions

2017-03-06 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897916#comment-15897916
 ] 

Vineet Garg commented on HIVE-16064:


[~ashutoshc] It was intended, but I have uploaded a new patch which has a 
better change.

> Allow ALL set quantifier with aggregate functions
> -
>
> Key: HIVE-16064
> URL: https://issues.apache.org/jira/browse/HIVE-16064
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16064.1.patch, HIVE-16064.2.patch
>
>
> SQL:2011 allows ALL with aggregate functions, which is equivalent to the 
> aggregate function without ALL (e.g. {{count(ALL c)}} behaves the same as 
> {{count(c)}}).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16064) Allow ALL set quantifier with aggregate functions

2017-03-06 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16064:
---
Attachment: HIVE-16064.2.patch

> Allow ALL set quantifier with aggregate functions
> -
>
> Key: HIVE-16064
> URL: https://issues.apache.org/jira/browse/HIVE-16064
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16064.1.patch, HIVE-16064.2.patch
>
>
> SQL:2011 allows ALL with aggregate functions, which is equivalent to the 
> aggregate function without ALL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16064) Allow ALL set quantifier with aggregate functions

2017-03-06 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16064:
---
Status: Patch Available  (was: Open)

> Allow ALL set quantifier with aggregate functions
> -
>
> Key: HIVE-16064
> URL: https://issues.apache.org/jira/browse/HIVE-16064
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16064.1.patch, HIVE-16064.2.patch
>
>
> SQL:2011 allows ALL with aggregate functions, which is equivalent to the 
> aggregate function without ALL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16064) Allow ALL set quantifier with aggregate functions

2017-03-06 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16064:
---
Status: Open  (was: Patch Available)

> Allow ALL set quantifier with aggregate functions
> -
>
> Key: HIVE-16064
> URL: https://issues.apache.org/jira/browse/HIVE-16064
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Planning
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16064.1.patch, HIVE-16064.2.patch
>
>
> SQL:2011 allows ALL with aggregate functions, which is equivalent to the 
> aggregate function without ALL.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-03-06 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897889#comment-15897889
 ] 

Prasanth Jayachandran commented on HIVE-16100:
--

I can see what the patch is trying to do with respect to clearing the 
siblings, but the test case isn't showing the actual failure. Could you please 
update the test case so that we won't break it in the future?

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.2.0, 2.1.1
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 
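
The effect of {{clear()}} versus removing only the file sink can be shown with a toy child list (plain Java lists standing in for the operator tree, not Hive's Operator API):

```java
import java.util.ArrayList;
import java.util.List;

public class SiblingDemo {
    public static void main(String[] args) {
        // fsParent has two children: the FS being relinked and a sibling SEL.
        List<String> children = new ArrayList<>(List.of("FS", "SEL"));

        // What the optimizer does today: clear() also drops the sibling SEL.
        List<String> cleared = new ArrayList<>(children);
        cleared.clear();
        System.out.println(cleared); // prints []

        // Removing only the FS being rewritten would preserve the sibling.
        List<String> kept = new ArrayList<>(children);
        kept.remove("FS");
        System.out.println(kept); // prints [SEL]
    }
}
```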



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16101) QTest failure BeeLine escape_comments after HIVE-16045

2017-03-06 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897876#comment-15897876
 ] 

Naveen Gangam commented on HIVE-16101:
--

[~pvary] It appears that it has been reviewed already. {{No space left on 
device}} was the failure. So to confirm, this patch need not be committed to 
the source code, right? Thanks

> QTest failure BeeLine escape_comments after HIVE-16045
> --
>
> Key: HIVE-16101
> URL: https://issues.apache.org/jira/browse/HIVE-16101
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-16101.2.patch, HIVE-16101.patch
>
>
> HIVE-16045 was committed immediately after HIVE-14459 and added two extra 
> lines to the output, which is written by another thread. We should remove 
> these lines before comparing the .out file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15621) Use Hive's own JvmPauseMonitor instead of Hadoop's in LLAP

2017-03-06 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897860#comment-15897860
 ] 

Siddharth Seth commented on HIVE-15621:
---

HIVE-15644 is supposed to add these parameters back directly, instead of 
relying on Hadoop.

If this were an API break, we could ask Hadoop to fix the API.

> Use Hive's own JvmPauseMonitor instead of Hadoop's in LLAP
> --
>
> Key: HIVE-15621
> URL: https://issues.apache.org/jira/browse/HIVE-15621
> Project: Hive
>  Issue Type: Task
>  Components: llap
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 2.2.0
>
> Attachments: HIVE-15621.1.patch, HIVE-15621.2.patch, 
> HIVE-15621.3.patch, HIVE-15621.4.patch
>
>
> This is to avoid a dependency on Hadoop's JvmPauseMonitor, since Hive 
> already has its own implementation. HiveServer2 is already using Hive's 
> implementation.
> Need to follow up in HIVE-15644 to add the 3 missing JVM metrics for LLAP.
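
The core of a JVM pause monitor, in either implementation, can be sketched generically: a daemon thread sleeps for a short interval and flags any extra elapsed time as a probable GC or scheduling pause. The names below are hypothetical, not Hive's or Hadoop's classes:

```java
public class PauseCheck {
    // A pause is suspected when the thread overslept by more than the threshold.
    static boolean isPause(long expectedSleepMs, long actualElapsedMs, long thresholdMs) {
        return actualElapsedMs - expectedSleepMs > thresholdMs;
    }

    public static void main(String[] args) throws InterruptedException {
        long sleepMs = 100;
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // With a generous 1s threshold this should normally report no pause.
        System.out.println(isPause(sleepMs, elapsedMs, 1000));
    }
}
```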



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15857) Vectorization: Add string conversion case for UDFToInteger, etc

2017-03-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15857:

Attachment: HIVE-15857.04.patch

> Vectorization: Add string conversion case for UDFToInteger, etc
> ---
>
> Key: HIVE-15857
> URL: https://issues.apache.org/jira/browse/HIVE-15857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15857.01.patch, HIVE-15857.02.patch, 
> HIVE-15857.03.patch, HIVE-15857.04.patch
>
>
> Otherwise, VectorUDFAdaptor is used to convert a column from String to Int, 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15857) Vectorization: Add string conversion case for UDFToInteger, etc

2017-03-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15857:

Status: Patch Available  (was: In Progress)

> Vectorization: Add string conversion case for UDFToInteger, etc
> ---
>
> Key: HIVE-15857
> URL: https://issues.apache.org/jira/browse/HIVE-15857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15857.01.patch, HIVE-15857.02.patch, 
> HIVE-15857.03.patch, HIVE-15857.04.patch
>
>
> Otherwise, VectorUDFAdaptor is used to convert a column from String to Int, 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15857) Vectorization: Add string conversion case for UDFToInteger, etc

2017-03-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15857:

Status: In Progress  (was: Patch Available)

> Vectorization: Add string conversion case for UDFToInteger, etc
> ---
>
> Key: HIVE-15857
> URL: https://issues.apache.org/jira/browse/HIVE-15857
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-15857.01.patch, HIVE-15857.02.patch, 
> HIVE-15857.03.patch
>
>
> Otherwise, VectorUDFAdaptor is used to convert a column from String to Int, 
> etc.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-03-06 Thread Norris Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norris Lee updated HIVE-14901:
--
Attachment: (was: HIVE-14901.9.patch)

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.4.patch, HIVE-14901.5.patch, 
> HIVE-14901.6.patch, HIVE-14901.7.patch, HIVE-14901.8.patch, 
> HIVE-14901.9.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults' request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-03-06 Thread Norris Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norris Lee updated HIVE-14901:
--
Status: Patch Available  (was: In Progress)

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.4.patch, HIVE-14901.5.patch, 
> HIVE-14901.6.patch, HIVE-14901.7.patch, HIVE-14901.8.patch, 
> HIVE-14901.9.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults' request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-03-06 Thread Norris Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norris Lee updated HIVE-14901:
--
Status: In Progress  (was: Patch Available)

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.4.patch, HIVE-14901.5.patch, 
> HIVE-14901.6.patch, HIVE-14901.7.patch, HIVE-14901.8.patch, 
> HIVE-14901.9.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults' request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 





[jira] [Updated] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-03-06 Thread Norris Lee (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norris Lee updated HIVE-14901:
--
Attachment: HIVE-14901.9.patch

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.4.patch, HIVE-14901.5.patch, 
> HIVE-14901.6.patch, HIVE-14901.7.patch, HIVE-14901.8.patch, 
> HIVE-14901.9.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user-supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 





[jira] [Updated] (HIVE-16065) Vectorization: Wrong Key/Value information used by Vectorizer

2017-03-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16065:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Vectorization: Wrong Key/Value information used by Vectorizer
> -
>
> Key: HIVE-16065
> URL: https://issues.apache.org/jira/browse/HIVE-16065
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16065.01.patch, HIVE-16065.07.patch, 
> HIVE-16065.08.patch, HIVE-16065.091.patch, HIVE-16065.09.patch
>
>
> Make Vectorizer class get reducer key/value information the same way 
> ExecReducer/ReduceRecordProcessor do.





[jira] [Commented] (HIVE-16065) Vectorization: Wrong Key/Value information used by Vectorizer

2017-03-06 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897848#comment-15897848
 ] 

Matt McCline commented on HIVE-16065:
-

Patch #91 committed to master.

> Vectorization: Wrong Key/Value information used by Vectorizer
> -
>
> Key: HIVE-16065
> URL: https://issues.apache.org/jira/browse/HIVE-16065
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16065.01.patch, HIVE-16065.07.patch, 
> HIVE-16065.08.patch, HIVE-16065.091.patch, HIVE-16065.09.patch
>
>
> Make Vectorizer class get reducer key/value information the same way 
> ExecReducer/ReduceRecordProcessor do.





[jira] [Updated] (HIVE-16065) Vectorization: Wrong Key/Value information used by Vectorizer

2017-03-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16065:

Fix Version/s: 2.2.0

> Vectorization: Wrong Key/Value information used by Vectorizer
> -
>
> Key: HIVE-16065
> URL: https://issues.apache.org/jira/browse/HIVE-16065
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-16065.01.patch, HIVE-16065.07.patch, 
> HIVE-16065.08.patch, HIVE-16065.091.patch, HIVE-16065.09.patch
>
>
> Make Vectorizer class get reducer key/value information the same way 
> ExecReducer/ReduceRecordProcessor do.





[jira] [Commented] (HIVE-16105) LLAP: refactor executor pool to not depend on RejectedExecutionEx for preemption

2017-03-06 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897847#comment-15897847
 ] 

Siddharth Seth commented on HIVE-16105:
---

cc [~prasanth_j]

> LLAP: refactor executor pool to not depend on RejectedExecutionEx for 
> preemption
> 
>
> Key: HIVE-16105
> URL: https://issues.apache.org/jira/browse/HIVE-16105
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>
> There's a queue inside the threadpool consisting of one item (that's how we 
> set it up), which means that we can submit N+1 tasks and not get rejected, 
> with one task still not running and no preemption happening (note that the 
> SynchronousQueue we pass in does not in fact block in the thread pool, because 
> the pool calls offer, not put; and if it did, preemption would never trigger at all, because the only 
> thread adding stuff to the TP would be blocked until the item was gone from 
> the queue, meaning that there'd never be a rejection). Having a threadpool 
> like this also limits our options to e.g. move the task that is being killed 
> out immediately to start another one (that itself is out of the scope of this 
> JIRA).
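The failure mode described above can be simulated: with N workers and a one-slot queue, N+1 tasks are accepted without any rejection, leaving one task queued but not running. A minimal Python sketch, whose submit logic mirrors how Java's java.util.concurrent.ThreadPoolExecutor calls offer (non-blocking) rather than put; all names here are illustrative:

```python
import queue

def submit(workers_busy: int, num_workers: int, work_queue: "queue.Queue") -> str:
    """Mimic ThreadPoolExecutor.execute: use a free worker if one exists,
    otherwise offer the task to the queue, and reject only when the
    non-blocking offer fails. Hypothetical simplification."""
    if workers_busy < num_workers:
        return "run"                    # a free worker picks it up
    try:
        work_queue.put_nowait("task")   # offer(), not put(): never blocks
        return "queued"                 # accepted, but nothing is running it yet
    except queue.Full:
        return "rejected"               # only now would preemption trigger

q = queue.Queue(maxsize=1)              # the single-item queue described above
# 4 workers, 6 submissions: the 5th is silently queued, only the 6th is rejected.
outcomes = [submit(min(i, 4), 4, q) for i in range(6)]
```

So N+1 = 5 tasks are accepted with no rejection and hence no preemption, which is exactly the problem the description points out.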





[jira] [Comment Edited] (HIVE-12631) LLAP: support ORC ACID tables

2017-03-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897835#comment-15897835
 ] 

Eugene Koifman edited comment on HIVE-12631 at 3/6/17 7:04 PM:
---

[~teddy.choi]
Since this is only targeting acid 2.0, there should be 3 types of files (dirs):
base, delta and delete_delta.  There should not be any difference regarding 
caching base vs delta.

In fact, longer term we may even simplify this to just base and delete_delta, so 
it may be better to postpone the delta caching part of this.


was (Author: ekoifman):
[~teddy.choi]
Since this is only targeting acid 2.0, there should be 3 types of files (dirs):
base, delta and delete_delta.  There should not be any difference regarding 
caching base vs delta.

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, ACID logic is embedded inside the ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache the merged representation in the future.
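The caching idea above — deltas kept in cache with higher priority than bases — can be illustrated with a toy priority-aware cache. This is a Python sketch of the policy only; nothing here reflects LLAP's actual cache implementation:

```python
import heapq
import itertools

class PriorityCache:
    """Toy block cache where each entry carries an eviction priority; lower
    priorities are evicted first. Illustrates 'deltas should be cached with
    higher priority' from the description above."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = {}            # key -> (priority, data)
        self.heap = []               # (priority, seq, key): min-heap = eviction order
        self.seq = itertools.count()

    def put(self, key, data, priority: int):
        if len(self.entries) >= self.capacity:
            # Evict the lowest-priority (then oldest) entry.
            _, _, victim = heapq.heappop(self.heap)
            self.entries.pop(victim, None)
        self.entries[key] = (priority, data)
        heapq.heappush(self.heap, (priority, next(self.seq), key))

cache = PriorityCache(capacity=2)
cache.put("base/cb0", b"...", priority=1)     # base data: lower priority
cache.put("delta_1/cb0", b"...", priority=2)  # delta data: keep longer
cache.put("base/cb1", b"...", priority=1)     # evicts base/cb0, not the delta
```

Under memory pressure the base chunk is dropped first while the (smaller, more frequently merged) delta survives.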





[jira] [Commented] (HIVE-15920) Implement a blocking version of a command to compact

2017-03-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897844#comment-15897844
 ] 

Eugene Koifman commented on HIVE-15920:
---

All failed tests have age > 1.

> Implement a blocking version of a command to compact
> 
>
> Key: HIVE-15920
> URL: https://issues.apache.org/jira/browse/HIVE-15920
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15920.01.patch, HIVE-15920.02.patch
>
>
> Currently,
> {noformat}
> alter table AcidTable compact 'major'
> {noformat} 
> is supported, which enqueues a compaction request.
> It would be nice for testing and script building to support 
> {noformat} 
> alter table AcidTable compact 'major' blocking
> {noformat} 
> Perhaps another variation is to block until either the compaction is done or 
> the cleaning is finished.
> DDLTask.compact() gets a request id back, so it can then just block and wait 
> for it using some new API.
> It may also be useful to let users compact all partitions, but only if a 
> separate queue has been set up for compaction jobs.
> The latter is because, with a 1M-partition table, this may create very many 
> jobs and saturate the cluster.
> This probably requires HIVE-12376 to make sure the compaction queue does the 
> throttling, not the number of worker threads.
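A blocking variant along the lines described above could enqueue the request and then poll its state until done. A hedged Python sketch, where `enqueue` and `get_state` are hypothetical stand-ins for whatever metastore API DDLTask.compact() would use once it has a request id:

```python
import time

def compact_blocking(enqueue, get_state, poll_interval: float = 0.0) -> str:
    """Sketch of 'alter table ... compact ... blocking': enqueue the
    compaction request, then poll its state until it reaches a terminal
    state. Both callables are hypothetical, not Hive's real API."""
    request_id = enqueue()
    while True:
        state = get_state(request_id)
        if state in ("succeeded", "failed"):
            return state
        time.sleep(poll_interval)   # avoid a busy loop against the metastore

# Simulated request lifecycle: initiated -> working -> succeeded.
states = iter(["initiated", "working", "succeeded"])
result = compact_blocking(lambda: 42, lambda _id: next(states))
```

The "block until cleaning is finished" variation from the description would simply add a "cleaned" terminal state to the poll loop.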





[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-03-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897835#comment-15897835
 ] 

Eugene Koifman commented on HIVE-12631:
---

[~teddy.choi]
Since this is only targeting acid 2.0, there should be 3 types of files (dirs):
base, delta and delete_delta.  There should not be any difference regarding 
caching base vs delta.

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, ACID logic is embedded inside the ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache the merged representation in the future.





[jira] [Updated] (HIVE-16065) Vectorization: Wrong Key/Value information used by Vectorizer

2017-03-06 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16065:

Attachment: HIVE-16065.091.patch

Fixed up Spark golden files; Hive QA not re-run

> Vectorization: Wrong Key/Value information used by Vectorizer
> -
>
> Key: HIVE-16065
> URL: https://issues.apache.org/jira/browse/HIVE-16065
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16065.01.patch, HIVE-16065.07.patch, 
> HIVE-16065.08.patch, HIVE-16065.091.patch, HIVE-16065.09.patch
>
>
> Make Vectorizer class get reducer key/value information the same way 
> ExecReducer/ReduceRecordProcessor do.





[jira] [Commented] (HIVE-15920) Implement a blocking version of a command to compact

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897832#comment-15897832
 ] 

Hive QA commented on HIVE-15920:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856318/HIVE-15920.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10328 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3967/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3967/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3967/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856318 - PreCommit-HIVE-Build

> Implement a blocking version of a command to compact
> 
>
> Key: HIVE-15920
> URL: https://issues.apache.org/jira/browse/HIVE-15920
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15920.01.patch, HIVE-15920.02.patch
>
>
> Currently,
> {noformat}
> alter table AcidTable compact 'major'
> {noformat} 
> is supported, which enqueues a compaction request.
> It would be nice for testing and script building to support 
> {noformat} 
> alter table AcidTable compact 'major' blocking
> {noformat} 
> Perhaps another variation is to block until either the compaction is done or 
> the cleaning is finished.
> DDLTask.compact() gets a request id back, so it can then just block and wait 
> for it using some new API.
> It may also be useful to let users compact all partitions, but only if a 
> separate queue has been set up for compaction jobs.
> The latter is because, with a 1M-partition table, this may create very many 
> jobs and saturate the cluster.
> This probably requires HIVE-12376 to make sure the compaction queue does the 
> throttling, not the number of worker threads.





[jira] [Commented] (HIVE-16119) HiveMetaStoreChecker - singleThread/parallel logic duplication

2017-03-06 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897817#comment-15897817
 ] 

Ashutosh Chauhan commented on HIVE-16119:
-

I agree there is no need to disregard user config anymore.

> HiveMetaStoreChecker - singleThread/parallel logic duplication
> --
>
> Key: HIVE-16119
> URL: https://issues.apache.org/jira/browse/HIVE-16119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16119.1.patch
>
>
> It looks to me that the main logic is duplicated because of multithreading 
> support:
> * {{HiveMetaStoreChecker#PathDepthInfoCallable#processPathDepthInfo}}
> * {{HiveMetaStoreChecker#checkPartitionDirsSingleThreaded}}
> It might be possible to remove the singleThreaded methods by using a special 
> executor for single thread support: {{MoreExecutors.sameThreadExecutor()}}.
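The suggestion above — keep one traversal routine and swap in a same-thread executor for single-threaded mode — can be sketched in Python. This is illustrative only; in Hive's Java code the equivalent would be Guava's MoreExecutors.sameThreadExecutor() (directExecutor() in newer Guava):

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor

class DirectExecutor(Executor):
    """Runs tasks on the calling thread, analogous to Guava's
    sameThreadExecutor(). Lets one code path serve both modes."""
    def submit(self, fn, *args, **kwargs):
        f = Future()
        try:
            f.set_result(fn(*args, **kwargs))
        except Exception as e:
            f.set_exception(e)
        return f

def check_paths(paths, pool_size: int):
    # One traversal routine; only the executor choice differs between modes,
    # so the single-threaded duplicate method becomes unnecessary.
    executor = DirectExecutor() if pool_size <= 1 else ThreadPoolExecutor(pool_size)
    futures = [executor.submit(len, p) for p in paths]
    return [f.result() for f in futures]
```

Both `check_paths(paths, 1)` and `check_paths(paths, 4)` run the same logic, which is the deduplication the comment proposes.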





[jira] [Comment Edited] (HIVE-15997) Resource leaks when query is cancelled

2017-03-06 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897742#comment-15897742
 ] 

Chaoyu Tang edited comment on HIVE-15997 at 3/6/17 6:30 PM:


Will TezTask be affected as well? Also, I am not quite sure about this. For 
code like this:
{code}  
try {
curatorFramework.delete().forPath(zLock.getPath());
  } catch (InterruptedException ie) {
curatorFramework.delete().forPath(zLock.getPath());
  }
{code}
is catching InterruptedException guaranteed to clear the interrupted flag in 
the thread, and is calling the method a second time guaranteed to succeed?



was (Author: ctang.ma):
Will TezTask be affected as well?

> Resource leaks when query is cancelled 
> ---
>
> Key: HIVE-15997
> URL: https://issues.apache.org/jira/browse/HIVE-15997
> Project: Hive
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-15997.1.patch
>
>
> There may some resource leaks when query is cancelled.
> We see following stacks in the log:
> Possible files and folder leak:
> {noformat}
> 2017-02-02 06:23:25,410 WARN  hive.ql.Context: [HiveServer2-Background-Pool: 
> Thread-61]: Error Removing Scratch: java.io.IOException: Failed on local 
> exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
> host is: "ychencdh511t-1.vpc.cloudera.com/172.26.11.50"; destination host is: 
> "ychencdh511t-1.vpc.cloudera.com":8020; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>   at com.sun.proxy.$Proxy25.delete(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>   at com.sun.proxy.$Proxy26.delete(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
>   at org.apache.hadoop.hive.ql.Context.removeScratchDir(Context.java:405)
>   at org.apache.hadoop.hive.ql.Context.clear(Context.java:541)
>   at org.apache.hadoop.hive.ql.Driver.releaseContext(Driver.java:2109)
>   at org.apache.hadoop.hive.ql.Driver.closeInProcess(Driver.java:2150)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1472)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedByInterruptException
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681)
>   at 
> 

[jira] [Updated] (HIVE-16109) TestDbTxnManager generates a huge hive.log

2017-03-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16109:
--
Affects Version/s: 2.2.0

> TestDbTxnManager generates a huge hive.log
> --
>
> Key: HIVE-16109
> URL: https://issues.apache.org/jira/browse/HIVE-16109
> Project: Hive
>  Issue Type: Bug
>  Components: Tests, Transactions
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-16109.01.patch, HIVE-16109.02.patch
>
>
> Pre-commit jobs are currently failing due to running out of disk space. The 
> issue is the huge size of hive.log when the TestDbTxnManager test fails or 
> times out. When this test fails or times out, Ptest tries to persist these 
> logs for debugging. Since this test has been timing out frequently, a lot of 
> these log files accumulate and eventually the Ptest server runs out of disk 
> space. Each run of TestDbTxnManager generates ~30 GB of hive.log. I tried 
> to run it locally and it quickly reached 7 GB before I had to cancel it.
> The issue seems to be coming from this code block in TxnHandler.java
> {noformat}
> if(LOG.isDebugEnabled()) {
> LOG.debug("Locks to check(full): ");
> for(LockInfo info : locks) {
>   LOG.debug("  " + info);
> }
>   }
> {noformat}
> We should either change it to trace or change the log level of this test to 
> INFO so that it generates smaller log files.





[jira] [Updated] (HIVE-16109) TestDbTxnManager generates a huge hive.log

2017-03-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16109:
--
Component/s: Transactions
 Tests

> TestDbTxnManager generates a huge hive.log
> --
>
> Key: HIVE-16109
> URL: https://issues.apache.org/jira/browse/HIVE-16109
> Project: Hive
>  Issue Type: Bug
>  Components: Tests, Transactions
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-16109.01.patch, HIVE-16109.02.patch
>
>
> Pre-commit jobs are currently failing due to running out of disk space. The 
> issue is the huge size of hive.log when the TestDbTxnManager test fails or 
> times out. When this test fails or times out, Ptest tries to persist these 
> logs for debugging. Since this test has been timing out frequently, a lot of 
> these log files accumulate and eventually the Ptest server runs out of disk 
> space. Each run of TestDbTxnManager generates ~30 GB of hive.log. I tried 
> to run it locally and it quickly reached 7 GB before I had to cancel it.
> The issue seems to be coming from this code block in TxnHandler.java
> {noformat}
> if(LOG.isDebugEnabled()) {
> LOG.debug("Locks to check(full): ");
> for(LockInfo info : locks) {
>   LOG.debug("  " + info);
> }
>   }
> {noformat}
> We should either change it to trace or change the log level of this test to 
> INFO so that it generates smaller log files.





[jira] [Comment Edited] (HIVE-16119) HiveMetaStoreChecker - singleThread/parallel logic duplication

2017-03-06 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897755#comment-15897755
 ] 

Vihang Karajgaonkar edited comment on HIVE-16119 at 3/6/17 6:18 PM:


On second thought, I wonder if we really need to increase the pool size, 
disregarding user configs. The reason it was added in the first place was an 
issue with the recursive call. Since that is no longer the case 
(HIVE-15879), we should probably keep using the poolSize given by the config. 
Disregarding the user-given config also has a side effect, mentioned in 
HIVE-16014, where the mismatched pool size becomes a bottleneck, so increasing 
the pool size here will not necessarily mean that query performance improves 
proportionally.


was (Author: vihangk1):
On second thought, I wonder if we really need to increase the pool size, 
disregarding user configs. The reason it was added in the first place was an 
issue with the recursive call. Since that is no longer the case 
(HIVE-15879), we should probably keep using the poolSize given by the user.

> HiveMetaStoreChecker - singleThread/parallel logic duplication
> --
>
> Key: HIVE-16119
> URL: https://issues.apache.org/jira/browse/HIVE-16119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16119.1.patch
>
>
> It looks to me that the main logic is duplicated because of multithreading 
> support:
> * {{HiveMetaStoreChecker#PathDepthInfoCallable#processPathDepthInfo}}
> * {{HiveMetaStoreChecker#checkPartitionDirsSingleThreaded}}
> It might be possible to remove the singleThreaded methods by using a special 
> executor for single thread support: {{MoreExecutors.sameThreadExecutor()}}.





[jira] [Commented] (HIVE-16119) HiveMetaStoreChecker - singleThread/parallel logic duplication

2017-03-06 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897755#comment-15897755
 ] 

Vihang Karajgaonkar commented on HIVE-16119:


On second thought, I wonder if we really need to increase the pool size, 
disregarding user configs. The reason it was added in the first place was an 
issue with the recursive call. Since that is no longer the case 
(HIVE-15879), we should probably keep using the poolSize given by the user.

> HiveMetaStoreChecker - singleThread/parallel logic duplication
> --
>
> Key: HIVE-16119
> URL: https://issues.apache.org/jira/browse/HIVE-16119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16119.1.patch
>
>
> It looks to me that the main logic is duplicated because of multithreading 
> support:
> * {{HiveMetaStoreChecker#PathDepthInfoCallable#processPathDepthInfo}}
> * {{HiveMetaStoreChecker#checkPartitionDirsSingleThreaded}}
> It might be possible to remove the singleThreaded methods by using a special 
> executor for single thread support: {{MoreExecutors.sameThreadExecutor()}}.





[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-03-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Open  (was: Patch Available)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch
>
>
> In phase 2, we are going to enable auto-gather column stats by default. This 
> requires updating the golden files.





[jira] [Updated] (HIVE-15903) Compute table stats when user computes column stats

2017-03-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15903:
---
Status: Patch Available  (was: Open)

> Compute table stats when user computes column stats
> ---
>
> Key: HIVE-15903
> URL: https://issues.apache.org/jira/browse/HIVE-15903
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15903.01.patch, HIVE-15903.02.patch, 
> HIVE-15903.03.patch, HIVE-15903.04.patch, HIVE-15903.05.patch, 
> HIVE-15903.06.patch
>
>






[jira] [Updated] (HIVE-15903) Compute table stats when user computes column stats

2017-03-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15903:
---
Status: Open  (was: Patch Available)

> Compute table stats when user computes column stats
> ---
>
> Key: HIVE-15903
> URL: https://issues.apache.org/jira/browse/HIVE-15903
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15903.01.patch, HIVE-15903.02.patch, 
> HIVE-15903.03.patch, HIVE-15903.04.patch, HIVE-15903.05.patch, 
> HIVE-15903.06.patch
>
>






[jira] [Updated] (HIVE-13567) Auto-gather column stats - phase 2

2017-03-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-13567:
---
Status: Patch Available  (was: Open)

> Auto-gather column stats - phase 2
> --
>
> Key: HIVE-13567
> URL: https://issues.apache.org/jira/browse/HIVE-13567
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-13567.01.patch
>
>
> In phase 2, we are going to enable auto-gather column stats by default. This 
> requires updating the golden files.





[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-03-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Attachment: HIVE-15160.07.patch

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch, HIVE-15160.06.patch, 
> HIVE-15160.07.patch
>
>
> If a grouping key hasn't been selected, Hive complains. For comparison, 
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select  i_item_desc
>,i_category
>,i_class
>,i_current_price
>,sum(cs_ext_sales_price) as itemrevenue
>,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
>(partition by i_class) as revenueratio
>  from catalog_sales
>  ,item
>  ,date_dim
>  where cs_item_sk = i_item_sk
>and i_category in ('Jewelry', 'Sports', 'Books')
>and cs_sold_date_sk = d_date_sk
>  and d_date between cast('2001-01-12' as date)
>   and (cast('2001-01-12' as date) + 30 days)
>  group by i_item_id
>  ,i_item_desc
>  ,i_category
>  ,i_class
>  ,i_current_price
>  order by i_category
>  ,i_class
>  ,i_item_id
>  ,i_item_desc
>  ,revenueratio
> limit 100;
> {code}





[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-03-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Patch Available  (was: Open)

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch, HIVE-15160.06.patch, 
> HIVE-15160.07.patch
>
>





[jira] [Updated] (HIVE-15160) Can't order by an unselected column

2017-03-06 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15160:
---
Status: Open  (was: Patch Available)

> Can't order by an unselected column
> ---
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15160.01.patch, HIVE-15160.02.patch, 
> HIVE-15160.04.patch, HIVE-15160.05.patch, HIVE-15160.06.patch, 
> HIVE-15160.07.patch
>
>





[jira] [Commented] (HIVE-16119) HiveMetaStoreChecker - singleThread/parallel logic duplication

2017-03-06 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897749#comment-15897749
 ] 

Vihang Karajgaonkar commented on HIVE-16119:


Thanks for the patch [~kgyrtkirk]. It is always a good idea to remove redundant 
code. Just one comment from my side: the conditional check {{poolSize = 
poolSize == 0 ? poolSize : Math.max(poolSize, getMinPoolSize());}} is not 
needed, since it executes in the else block. We could just replace it with 
{{poolSize = Math.max(poolSize, getMinPoolSize());}}
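The simplification suggested above can be sketched as follows. This is only an illustration: `MIN_POOL_SIZE` is a hypothetical stand-in for `getMinPoolSize()`, whose real value in HiveMetaStoreChecker is not shown here.

```java
// Sketch of the suggested simplification. The assumption (from the review
// comment) is that this code runs only in the else branch where poolSize != 0,
// so the ternary guard is dead code there.
public class PoolSizeSketch {
    // Hypothetical stand-in for HiveMetaStoreChecker's getMinPoolSize().
    static final int MIN_POOL_SIZE = 2;

    // Original form: the poolSize == 0 guard can never fire inside the else block.
    static int originalForm(int poolSize) {
        return poolSize == 0 ? poolSize : Math.max(poolSize, MIN_POOL_SIZE);
    }

    // Simplified form proposed in the review comment.
    static int simplifiedForm(int poolSize) {
        return Math.max(poolSize, MIN_POOL_SIZE);
    }

    public static void main(String[] args) {
        // For every non-zero pool size, the two forms agree.
        for (int p = 1; p <= 8; p++) {
            System.out.println(p + " -> " + simplifiedForm(p));
        }
    }
}
```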

> HiveMetaStoreChecker - singleThread/parallel logic duplication
> --
>
> Key: HIVE-16119
> URL: https://issues.apache.org/jira/browse/HIVE-16119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16119.1.patch
>
>
> It looks to me like the main logic is duplicated because of multithreading 
> support:
> * {{HiveMetaStoreChecker#PathDepthInfoCallable#processPathDepthInfo}}
> * {{HiveMetaStoreChecker#checkPartitionDirsSingleThreaded}}
> It might be possible to remove the singleThreaded methods by using a special 
> executor for single thread support: {{MoreExecutors.sameThreadExecutor()}}.
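The unification idea above can be sketched as a single traversal routine whose threading is decided entirely by the injected executor; {{MoreExecutors.sameThreadExecutor()}} (mentioned in the issue) would be the Guava single-thread variant. The sketch below sticks to the JDK and uses a hypothetical per-path task, so it is an illustration of the pattern rather than HiveMetaStoreChecker's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UnifiedWalker {
    // One processing routine for both modes; only the executor differs.
    // A same-thread executor would make this behave exactly like the
    // single-threaded code path, with no logic duplication.
    static List<Integer> process(List<Integer> work, ExecutorService pool)
            throws Exception {
        List<Future<Integer>> futures = new ArrayList<>();
        for (int w : work) {
            futures.add(pool.submit(() -> w * w)); // hypothetical per-path work
        }
        List<Integer> out = new ArrayList<>();
        for (Future<Integer> f : futures) {
            out.add(f.get());
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService single = Executors.newSingleThreadExecutor();
        ExecutorService parallel = Executors.newFixedThreadPool(4);
        System.out.println(process(Arrays.asList(1, 2, 3), single));
        System.out.println(process(Arrays.asList(1, 2, 3), parallel));
        single.shutdown();
        parallel.shutdown();
    }
}
```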





[jira] [Commented] (HIVE-15997) Resource leaks when query is cancelled

2017-03-06 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897742#comment-15897742
 ] 

Chaoyu Tang commented on HIVE-15997:


Will TezTask be affected as well?

> Resource leaks when query is cancelled 
> ---
>
> Key: HIVE-15997
> URL: https://issues.apache.org/jira/browse/HIVE-15997
> Project: Hive
>  Issue Type: Bug
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-15997.1.patch
>
>
> There may be some resource leaks when a query is cancelled.
> We see the following stack traces in the log:
> Possible files and folder leak:
> {noformat}
> 2017-02-02 06:23:25,410 WARN  hive.ql.Context: [HiveServer2-Background-Pool: 
> Thread-61]: Error Removing Scratch: java.io.IOException: Failed on local 
> exception: java.nio.channels.ClosedByInterruptException; Host Details : local 
> host is: "ychencdh511t-1.vpc.cloudera.com/172.26.11.50"; destination host is: 
> "ychencdh511t-1.vpc.cloudera.com":8020; 
>   at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1476)
>   at org.apache.hadoop.ipc.Client.call(Client.java:1409)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>   at com.sun.proxy.$Proxy25.delete(Unknown Source)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
>   at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>   at com.sun.proxy.$Proxy26.delete(Unknown Source)
>   at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671)
>   at org.apache.hadoop.hive.ql.Context.removeScratchDir(Context.java:405)
>   at org.apache.hadoop.hive.ql.Context.clear(Context.java:541)
>   at org.apache.hadoop.hive.ql.Driver.releaseContext(Driver.java:2109)
>   at org.apache.hadoop.hive.ql.Driver.closeInProcess(Driver.java:2150)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1472)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1212)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1796)
>   at 
> org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.nio.channels.ClosedByInterruptException
>   at 
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
>   at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:681)
>   at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
>   at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615)
>   at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:714)
>   at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
>   at 

[jira] [Commented] (HIVE-16122) NPE Hive Druid split introduced by HIVE-15928

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897735#comment-15897735
 ] 

Hive QA commented on HIVE-16122:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856294/HIVE-16122.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3966/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3966/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3966/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-03-06 17:56:11.275
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3966/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-03-06 17:56:11.277
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 4904ab7 HIVE-16034: Hive/Druid integration: Fix type inference 
for Decimal DruidOutputFormat (Jesus Camacho Rodriguez, reviewed by Ashutosh 
Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 4904ab7 HIVE-16034: Hive/Druid integration: Fix type inference 
for Decimal DruidOutputFormat (Jesus Camacho Rodriguez, reviewed by Ashutosh 
Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-03-06 17:56:12.343
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: 
a/druid-handler/src/java/org/apache/hadoop/hive/druid/io/HiveDruidSplit.java: 
No such file or directory
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856294 - PreCommit-HIVE-Build

> NPE Hive Druid split introduced by HIVE-15928
> -
>
> Key: HIVE-16122
> URL: https://issues.apache.org/jira/browse/HIVE-16122
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
> Attachments: HIVE-16122.patch
>
>






[jira] [Updated] (HIVE-15920) Implement a blocking version of a command to compact

2017-03-06 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15920:
--
Attachment: HIVE-15920.02.patch

> Implement a blocking version of a command to compact
> 
>
> Key: HIVE-15920
> URL: https://issues.apache.org/jira/browse/HIVE-15920
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15920.01.patch, HIVE-15920.02.patch
>
>
> currently 
> {noformat}
> alter table AcidTable compact 'major'
> {noformat} 
> is supported, which enqueues a message to compact.
> It would be nice for testing and script building to also support 
> {noformat} 
> alter table AcidTable compact 'major' blocking
> {noformat} 
> Perhaps another variation is to block until either the compaction is done or 
> until cleaning is finished.
> DDLTask.compact() gets a request id back, so it could then just block and 
> wait for it using some new API.
> It may also be useful to let users compact all partitions, but only if a 
> separate queue has been set up for compaction jobs.
> The latter is because, with a 1M-partition table, this may create very many 
> jobs and saturate the cluster.
> This probably requires HIVE-12376 to make sure the compaction queue does the 
> throttling, not the number of worker threads.
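The block-and-wait behaviour requested above could look roughly like the loop below. Note that `CompactionClient`, `enqueueCompaction`, and `isCompactionDone` are hypothetical stand-ins for the request-id API the issue proposes; they are not existing Hive calls.

```java
public class BlockingCompactSketch {
    // Hypothetical request-id based API as described in the issue.
    interface CompactionClient {
        long enqueueCompaction(String table, String type); // returns request id
        boolean isCompactionDone(long requestId);
    }

    // Enqueue a major compaction, then block until it finishes or the
    // timeout elapses. Returns false on timeout.
    static boolean compactAndWait(CompactionClient client, String table,
                                  long timeoutMs, long pollMs)
            throws InterruptedException {
        long id = client.enqueueCompaction(table, "major");
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (client.isCompactionDone(id)) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        return false; // timed out; the compaction may still complete later
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake client that reports completion after a few polls.
        CompactionClient fake = new CompactionClient() {
            int polls = 0;
            public long enqueueCompaction(String table, String type) { return 1L; }
            public boolean isCompactionDone(long id) { return ++polls >= 3; }
        };
        System.out.println(compactAndWait(fake, "AcidTable", 5_000, 10));
    }
}
```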





[jira] [Commented] (HIVE-16089) "trustStorePassword" is logged as part of jdbc connection url

2017-03-06 Thread Sebastian Fröhlich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897680#comment-15897680
 ] 

Sebastian Fröhlich commented on HIVE-16089:
---

[~zsombor.klara],
Thank you for the information. This is helpful.
It would be great if you could also bring the fix down to Hive 1.1.x as a 
security fix. Not many commercial Hadoop vendors use Hive 1.2.1 in their 
distributions, so upgrading to Hive 1.2.1+ is not a real option for us.
But maybe this issue will be fixed separately in the affected commercially 
distributed Hive versions.

> "trustStorePassword" is logged as part of jdbc connection url
> -
>
> Key: HIVE-16089
> URL: https://issues.apache.org/jira/browse/HIVE-16089
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.1.0
>Reporter: Sebastian Fröhlich
>  Labels: security
>
> h5. General Story
> The use case is to connect via the Apache Hive JDBC driver to a Hive where 
> SSL encryption is enabled.
> It was required to set the SSL trust store password property 
> {{trustStorePassword}} in the JDBC connection URL.
> If the property is passed via the "properties" parameter into 
> {{Driver.connect(url, properties)}}, it is not recognized.
> h5. Log message
> {code}
> 2017-03-03 09:57:58,385 [INFO] [InputInitializer {Map for sheets:[import] 
> (fce7cd11-d489-4a13-a3a9-4c81d2907c87)} #0] 
> |jdbc.Utils|: Resolved authority: :
> 2017-03-03 09:57:58,539 [INFO] [InputInitializer {Map for sheets:[import] 
> (fce7cd11-d489-4a13-a3a9-4c81d2907c87)} #0] |jdbc.HiveConnection|: Will try 
> to open client transport with JDBC Uri: 
> jdbc:hive2://:/;ssl=true;sslTrustStore=/tmp/hs2keystore.jks;trustStorePassword=
> {code}
> E.g. produced by code {{org.apache.hive.jdbc.HiveConnection#openTransport()}}
> h5. Suggested Behavior
> The property {{trustStorePassword}} could be part of the "properties" 
> parameter. This way the password is not part of the JDBC connection url.
> h5. Acceptance Criteria
> The ssl trust store password should not be logged as part of the JDBC 
> connection string.
> Support the trust store password via the properties parameter within connect.
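One way to meet the acceptance criteria is to redact the password before the URL reaches any log line. The snippet below is a sketch of that idea, not the code actually used in {{HiveConnection}}.

```java
public class JdbcUrlRedactor {
    // Replace the value of trustStorePassword with *** so the secret never
    // appears in log output. The key name is matched case-insensitively and
    // the value is assumed to end at the next ';' URL separator.
    static String redact(String jdbcUrl) {
        return jdbcUrl.replaceAll("(?i)(trustStorePassword=)[^;]*", "$1***");
    }

    public static void main(String[] args) {
        String url = "jdbc:hive2://host:10000/db;ssl=true;"
                + "sslTrustStore=/tmp/hs2keystore.jks;trustStorePassword=secret";
        // Log the redacted form, never the raw URL.
        System.out.println(redact(url));
    }
}
```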





[jira] [Commented] (HIVE-16119) HiveMetaStoreChecker - singleThread/parallel logic duplication

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897643#comment-15897643
 ] 

Hive QA commented on HIVE-16119:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856282/HIVE-16119.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10327 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=224)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3964/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3964/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3964/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856282 - PreCommit-HIVE-Build

> HiveMetaStoreChecker - singleThread/parallel logic duplication
> --
>
> Key: HIVE-16119
> URL: https://issues.apache.org/jira/browse/HIVE-16119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16119.1.patch
>
>





[jira] [Commented] (HIVE-16102) Grouping sets do not conform to SQL standard

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897550#comment-15897550
 ] 

Hive QA commented on HIVE-16102:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856277/HIVE-16102.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10327 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cte_1] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_grouping_operators]
 (batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_grouping_sets] 
(batchId=77)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[multi_count_distinct_null]
 (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cte_1] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[groupby_grouping_id2]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_grouping_sets]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[multi_count_distinct]
 (batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=224)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_cube1] 
(batchId=96)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_grouping_id2]
 (batchId=110)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3963/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3963/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3963/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856277 - PreCommit-HIVE-Build

> Grouping sets do not conform to SQL standard
> 
>
> Key: HIVE-16102
> URL: https://issues.apache.org/jira/browse/HIVE-16102
> Project: Hive
>  Issue Type: Bug
>  Components: Operators, Parser
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-16102.01.patch, HIVE-16102.patch
>
>
> [~ashutoshc] realized that the implementation of GROUPING__ID in Hive was not 
> returning values as specified by SQL standard and other execution engines.
> After digging into this, I found out that the implementation was bogus: 
> internally it switched between big-endian and little-endian representations 
> of GROUPING__ID inconsistently, and in some cases conversions in both 
> directions were cancelling each other out.
> In the documentation in 
> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
>  we can already find the problem, even if we did not spot it at first.
> {quote}
> The following query: SELECT key, value, GROUPING__ID, count(\*) from T1 GROUP 
> BY key, value WITH ROLLUP
> will have the following results.
> | NULL | NULL | 0 | 6 |
> | 1 | NULL | 1 | 2 |
> | 1 | NULL | 3 | 1 |
> | 1 | 1 | 3 | 1 |
> ...
> {quote}
> Observe that the value of GROUPING__ID in the first row should be `3`, while 
> in the third and fourth rows it should be `0`.
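The standard-conformant GROUPING__ID described above can be computed by treating each grouping column as one bit, with the leftmost column in the GROUP BY as the most significant bit; a bit is 1 when that column is aggregated away (NULL-filled) in the row. This is a sketch of the expected semantics, not Hive's implementation.

```java
public class GroupingIdSketch {
    // One bit per grouping column, most significant first; true means the
    // column is aggregated away (rolled up) in this result row.
    static int groupingId(boolean... aggregated) {
        int id = 0;
        for (boolean a : aggregated) {
            id = (id << 1) | (a ? 1 : 0);
        }
        return id;
    }

    public static void main(String[] args) {
        // ROLLUP over (key, value):
        System.out.println(groupingId(true, true));   // grand-total row  -> 3
        System.out.println(groupingId(false, true));  // grouped by key   -> 1
        System.out.println(groupingId(false, false)); // fully grouped    -> 0
    }
}
```

These values match the expectation in the comment: the grand-total row gets `3`, and fully grouped rows get `0`.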





[jira] [Assigned] (HIVE-16127) Separate database initialization from actual query run in TestBeeLineDriver

2017-03-06 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary reassigned HIVE-16127:
-


> Separate database initialization from actual query run in TestBeeLineDriver
> ---
>
> Key: HIVE-16127
> URL: https://issues.apache.org/jira/browse/HIVE-16127
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
>
> Improve TestBeeLineDriver so that, when running multiple tests, the default 
> database is reused across runs. This helps keep runtimes in check.





[jira] [Commented] (HIVE-16122) NPE Hive Druid split introduced by HIVE-15928

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897511#comment-15897511
 ] 

Jesus Camacho Rodriguez commented on HIVE-16122:


[~bslim], I have been taking a look and we pass the hosts to the constructor of 
the superclass, which should set the locations? Could you share the stacktrace?

> NPE Hive Druid split introduced by HIVE-15928
> -
>
> Key: HIVE-16122
> URL: https://issues.apache.org/jira/browse/HIVE-16122
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
> Attachments: HIVE-16122.patch
>
>






[jira] [Commented] (HIVE-16122) NPE Hive Druid split introduced by HIVE-15928

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897475#comment-15897475
 ] 

Jesus Camacho Rodriguez commented on HIVE-16122:


+1

> NPE Hive Druid split introduced by HIVE-15928
> -
>
> Key: HIVE-16122
> URL: https://issues.apache.org/jira/browse/HIVE-16122
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
> Attachments: HIVE-16122.patch
>
>






[jira] [Commented] (HIVE-16116) Beeline throws NPE when beeline.hiveconfvariables={} in beeline.properties

2017-03-06 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897472#comment-15897472
 ] 

Peter Vary commented on HIVE-16116:
---

The errors are not related.
For the BeeLine failure (TestBeeLineDriver.testCliDriver[escape_comments]) see 
HIVE-16101.

Thanks for the patch [~rajesh.balamohan]!

+1 (non-binding)

> Beeline throws NPE when beeline.hiveconfvariables={} in beeline.properties
> --
>
> Key: HIVE-16116
> URL: https://issues.apache.org/jira/browse/HIVE-16116
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-16116.1.patch, HIVE-16116.2.patch
>
>
> Env: hive master
> Steps to reproduce:
> 1. clear previous beeline.properties (rm -rf ~/.beeline/beeline.properties)
> 2. Launch beeline, "!save" and exit. This would create new 
> "~/.beeline/beeline.properties", which would have 
> "beeline.hiveconfvariables={}"
> 3. Launch "beeline --hiveconf hive.tmp.dir=/tmp". This would throw NPE
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.hive.beeline.BeeLine.setHiveConfVar(BeeLine.java:885)
> at org.apache.hive.beeline.BeeLine.connectUsingArgs(BeeLine.java:832)
> at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:775)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1009)
> at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:519)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> {noformat}
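The NPE can be avoided by parsing the saved `{}` value defensively so callers always get a usable map instead of null. The parser below is a hypothetical sketch of that guard, not BeeLine's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class HiveConfVarsSketch {
    // Defensive parse of a saved "{k=v, k2=v2}" style property value.
    // Returns an empty, mutable map for "{}" or null input, which is the
    // kind of guard that avoids the NPE described above.
    static Map<String, String> parseMapProperty(String value) {
        Map<String, String> map = new HashMap<>();
        if (value == null) {
            return map;
        }
        String body = value.trim();
        if (body.startsWith("{") && body.endsWith("}")) {
            body = body.substring(1, body.length() - 1).trim();
        }
        if (body.isEmpty()) {
            return map;
        }
        for (String pair : body.split(",")) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2) {
                map.put(kv[0].trim(), kv[1].trim());
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> m = parseMapProperty("{}");
        m.put("hive.tmp.dir", "/tmp"); // safe: never null, always mutable
        System.out.println(m);
    }
}
```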





[jira] [Commented] (HIVE-16122) NPE Hive Druid split introduced by HIVE-15928

2017-03-06 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897466#comment-15897466
 ] 

slim bouguerra commented on HIVE-16122:
---

[~ashutoshc] and [~jcamachorodriguez] can you please look at this.


> NPE Hive Druid split introduced by HIVE-15928
> -
>
> Key: HIVE-16122
> URL: https://issues.apache.org/jira/browse/HIVE-16122
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
> Attachments: HIVE-16122.patch
>
>






[jira] [Updated] (HIVE-16124) Drop the segments data as soon it is pushed to HDFS

2017-03-06 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-16124:
--
Status: Patch Available  (was: Open)

> Drop the segments data as soon it is pushed to HDFS
> ---
>
> Key: HIVE-16124
> URL: https://issues.apache.org/jira/browse/HIVE-16124
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>
> Drop the pushed segments from the indexer as soon as the HDFS push is done.





[jira] [Updated] (HIVE-16123) Let user pick the granularity of bucketing.

2017-03-06 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-16123:
--
Status: Patch Available  (was: Open)

> Let user pick the granularity of bucketing.
> ---
>
> Key: HIVE-16123
> URL: https://issues.apache.org/jira/browse/HIVE-16123
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>
> Currently we index the data with a granularity of NONE, which puts a lot of 
> pressure on the indexer.





[jira] [Updated] (HIVE-16123) Let user pick the granularity of bucketing.

2017-03-06 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-16123:
--
Summary: Let user pick the granularity of bucketing.  (was: Let user chose 
the granularity of bucketing.)

> Let user pick the granularity of bucketing.
> ---
>
> Key: HIVE-16123
> URL: https://issues.apache.org/jira/browse/HIVE-16123
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>
> Currently we index the data with a granularity of NONE, which puts a lot of 
> pressure on the indexer.





[jira] [Updated] (HIVE-16122) NPE Hive Druid split introduced by HIVE-15928

2017-03-06 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-16122:
--
Attachment: HIVE-16122.patch

> NPE Hive Druid split introduced by HIVE-15928
> -
>
> Key: HIVE-16122
> URL: https://issues.apache.org/jira/browse/HIVE-16122
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
> Attachments: HIVE-16122.patch
>
>






[jira] [Updated] (HIVE-16122) NPE Hive Druid split introduced by HIVE-15928

2017-03-06 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-16122:
--
Status: Patch Available  (was: Open)

> NPE Hive Druid split introduced by HIVE-15928
> -
>
> Key: HIVE-16122
> URL: https://issues.apache.org/jira/browse/HIVE-16122
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>






[jira] [Commented] (HIVE-16116) Beeline throws NPE when beeline.hiveconfvariables={} in beeline.properties

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897453#comment-15897453
 ] 

Hive QA commented on HIVE-16116:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856276/HIVE-16116.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10329 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=224)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3962/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3962/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3962/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856276 - PreCommit-HIVE-Build

> Beeline throws NPE when beeline.hiveconfvariables={} in beeline.properties
> --
>
> Key: HIVE-16116
> URL: https://issues.apache.org/jira/browse/HIVE-16116
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-16116.1.patch, HIVE-16116.2.patch
>
>
> Env: hive master
> Steps to reproduce:
> 1. clear previous beeline.properties (rm -rf ~/.beeline/beeline.properties)
> 2. Launch beeline, "!save" and exit. This would create new 
> "~/.beeline/beeline.properties", which would have 
> "beeline.hiveconfvariables={}"
> 3. Launch "beeline --hiveconf hive.tmp.dir=/tmp". This would throw NPE
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.hive.beeline.BeeLine.setHiveConfVar(BeeLine.java:885)
> at org.apache.hive.beeline.BeeLine.connectUsingArgs(BeeLine.java:832)
> at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:775)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1009)
> at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:519)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> {noformat}
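A null guard of the sort such a fix would add can be sketched as follows. This is a minimal illustration only; the class and member names are hypothetical stand-ins, not BeeLine's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class HiveConfVars {
    // Parsed from beeline.properties; "beeline.hiveconfvariables={}" may
    // deserialize to null, which is what triggers the NPE on first use.
    private Map<String, String> hiveConfVariables;

    public void setHiveConfVar(String key, String value) {
        if (hiveConfVariables == null) {   // guard before first use
            hiveConfVariables = new HashMap<>();
        }
        hiveConfVariables.put(key, value);
    }

    public Map<String, String> getHiveConfVariables() {
        return hiveConfVariables;
    }
}
```

With the guard in place, `--hiveconf hive.tmp.dir=/tmp` simply populates a fresh map instead of dereferencing null.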





[jira] [Commented] (HIVE-16089) "trustStorePassword" is logged as part of jdbc connection url

2017-03-06 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897450#comment-15897450
 ] 

Barna Zsombor Klara commented on HIVE-16089:


Thank you for reporting the bug [~sfroehlich].
One part of this issue, the logging of the jdbc connection string, has 
already been fixed in HIVE-12235. Could you upgrade to a version of Hive 
that already contains the fix (Hive 1.2.1+)?

> "trustStorePassword" is logged as part of jdbc connection url
> -
>
> Key: HIVE-16089
> URL: https://issues.apache.org/jira/browse/HIVE-16089
> Project: Hive
>  Issue Type: Bug
>  Components: JDBC
>Affects Versions: 1.1.0
>Reporter: Sebastian Fröhlich
>  Labels: security
>
> h5. General Story
> The use case is to connect via the Apache Hive JDBC driver to a Hive where 
> SSL encryption is enabled.
> It was required to set the SSL trust store password property 
> {{trustStorePassword}} in the jdbc connection url.
> If the property is passed via the "properties" parameter to 
> {{Driver.connect(url, properties)}}, it is not recognized.
> h5. Log message
> {code}
> 2017-03-03 09:57:58,385 [INFO] [InputInitializer {Map for sheets:[import] 
> (fce7cd11-d489-4a13-a3a9-4c81d2907c87)} #0] 
> |jdbc.Utils|: Resolved authority: :
> 2017-03-03 09:57:58,539 [INFO] [InputInitializer {Map for sheets:[import] 
> (fce7cd11-d489-4a13-a3a9-4c81d2907c87)} #0] |jdbc.HiveConnection|: Will try 
> to open client transport with JDBC Uri: 
> jdbc:hive2://:/;ssl=true;sslTrustStore=/tmp/hs2keystore.jks;trustStorePassword=
> {code}
> E.g. produced by code {{org.apache.hive.jdbc.HiveConnection#openTransport()}}
> h5. Suggested Behavior
> The property {{trustStorePassword}} could be part of the "properties" 
> parameter. This way the password is not part of the JDBC connection url.
> h5. Acceptance Criteria
> The ssl trust store password should not be logged as part of the JDBC 
> connection string.
> Support the trust store password via the properties parameter within connect.
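The suggested behavior can be sketched as follows: keep the password out of the URL and hand it over via the Properties argument of {{Driver.connect(url, properties)}}. This assumes the driver is extended to honor {{trustStorePassword}} from Properties; {{SecureConnectSketch}} and its method names are hypothetical:

```java
import java.util.Properties;

public class SecureConnectSketch {
    // The URL carries only non-secret SSL settings, so logging it is safe.
    static String buildUrl(String host, int port) {
        return "jdbc:hive2://" + host + ":" + port
                + "/;ssl=true;sslTrustStore=/tmp/hs2keystore.jks";
    }

    // The password travels out-of-band in the Properties object passed to
    // Driver.connect(url, properties) and never appears in the URL.
    static Properties buildProps(String trustStorePassword) {
        Properties props = new Properties();
        props.setProperty("trustStorePassword", trustStorePassword);
        return props;
    }
}
```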





[jira] [Commented] (HIVE-16096) Predicate `__time` In ("date", "date") or Between "date" and "date" are not pushed to druid.

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897424#comment-15897424
 ] 

Jesus Camacho Rodriguez commented on HIVE-16096:


OK, then it can be tackled in a different Calcite issue; I did not mean to 
assign it to you, I just thought you had already fixed it.

> Predicate `__time` In ("date", "date")  or Between  "date" and "date" are not 
> pushed to druid.
> --
>
> Key: HIVE-16096
> URL: https://issues.apache.org/jira/browse/HIVE-16096
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>  Labels: calcite, druid
>
> {code}
>  explain select * from login_druid where `__time` in ("2003-1-1", "2004-1-1" 
> );
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
> limit:-1
> Select Operator [SEL_2]
>   Output:["_col0","_col1","_col2"]
>   Filter Operator [FIL_4]
> predicate:(__time) IN ('2003-1-1', '2004-1-1')
> TableScan [TS_0]
>   
> Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}
> {code}
> Between case
> {code}
>  explain select * from login_druid where `__time` between "2003-1-1" and 
> "2004-1-1" ;
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
> limit:-1
> Select Operator [SEL_2]
>   Output:["_col0","_col1","_col2"]
>   Filter Operator [FIL_4]
> predicate:__time BETWEEN '2003-1-1' AND '2004-1-1'
> TableScan [TS_0]
>   
> Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}
> {code}





[jira] [Comment Edited] (HIVE-16049) upgrade to jetty 9

2017-03-06 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897425#comment-15897425
 ] 

Sean Busbey edited comment on HIVE-16049 at 3/6/17 2:51 PM:


-01

  - updated dependencies for some additional jetty ones needed in hive-service
  - removed non-compatible sources of javax.servlet
  - updated to jdk 8 (updated to latest apache parent pom in the process)

I didn't see a [RESULT] for the vote Thejas started about jdk8+ yet, but it 
looks like it has consensus.

This patch passes doing {{mvn -DskipTests install}} at the top level followed 
by {{mvn verify}} of all the changed modules (common, hcatalog, llap-server, 
service, spark-client).

If there's additional testing folks would like to see beyond whatever the 
precommit process will check, let me know.


was (Author: busbey):
-01

  - updated dependencies for some additional jetty ones needed in hive-service
  - removed non-compatible sources of javax.servlet
  - updated to jdk 8 (updated to latest apache parent pom in the process)

I didn't see a [RESULT] for the vote Thejas started about jdk8+ yet, but it 
looks like it has consensus.

This patch passes doing {{mvn -DskipTests install}} at the top level followed 
by {{mvn verify}} of all the changed modules (common, hcatalog, llap-server, 
service, spark-client).

If there's additional testing beyond whatever the precommit process will check, 
let me know.

> upgrade to jetty 9
> --
>
> Key: HIVE-16049
> URL: https://issues.apache.org/jira/browse/HIVE-16049
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Sean Busbey
> Attachments: HIVE-16049.0.patch, HIVE-16049.1.patch
>
>
> Jetty 7 has been deprecated for a couple of years now. Hadoop and HBase have 
> both updated to Jetty 9 for their next major releases, which will complicate 
> classpath concerns.
> Proactively update to Jetty 9 in the few places we use a web server.





[jira] [Updated] (HIVE-16049) upgrade to jetty 9

2017-03-06 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HIVE-16049:
---
Release Note: JDK8+ is now required. Embedded web services now rely on 
Jetty 9; downstream users who rely on Hive's classpath for their Jetty jars 
will need to update their use for the change.
  Status: Patch Available  (was: In Progress)

> upgrade to jetty 9
> --
>
> Key: HIVE-16049
> URL: https://issues.apache.org/jira/browse/HIVE-16049
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Sean Busbey
> Attachments: HIVE-16049.0.patch, HIVE-16049.1.patch
>
>
> Jetty 7 has been deprecated for a couple of years now. Hadoop and HBase have 
> both updated to Jetty 9 for their next major releases, which will complicate 
> classpath concerns.
> Proactively update to Jetty 9 in the few places we use a web server.





[jira] [Updated] (HIVE-16049) upgrade to jetty 9

2017-03-06 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HIVE-16049:
---
Attachment: HIVE-16049.1.patch

-01

  - updated dependencies for some additional jetty ones needed in hive-service
  - removed non-compatible sources of javax.servlet
  - updated to jdk 8 (updated to latest apache parent pom in the process)

I didn't see a [RESULT] for the vote Thejas started about jdk8+ yet, but it 
looks like it has consensus.

This patch passes doing {{mvn -DskipTests install}} at the top level followed 
by {{mvn verify}} of all the changed modules (common, hcatalog, llap-server, 
service, spark-client).

If there's additional testing beyond whatever the precommit process will check, 
let me know.

> upgrade to jetty 9
> --
>
> Key: HIVE-16049
> URL: https://issues.apache.org/jira/browse/HIVE-16049
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sean Busbey
>Assignee: Sean Busbey
> Attachments: HIVE-16049.0.patch, HIVE-16049.1.patch
>
>
> Jetty 7 has been deprecated for a couple of years now. Hadoop and HBase have 
> both updated to Jetty 9 for their next major releases, which will complicate 
> classpath concerns.
> Proactively update to Jetty 9 in the few places we use a web server.





[jira] [Commented] (HIVE-16096) Predicate `__time` In ("date", "date") or Between "date" and "date" are not pushed to druid.

2017-03-06 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897417#comment-15897417
 ] 

slim bouguerra commented on HIVE-16096:
---

[~jcamachorodriguez] it is not; the time dimension will need more logic within 
the bound, or we can do it as part of the interval.
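The interval approach mentioned here — folding a BETWEEN or IN predicate on `__time` into Druid's "intervals" field instead of leaving it as a post-filter — could look roughly like this. {{TimeToIntervals}} and its methods are hypothetical; the real implementation would live in the Calcite/Druid adapter:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class TimeToIntervals {
    // BETWEEN lo AND hi on dates -> one half-open interval [lo, hi + 1 day)
    static String betweenToInterval(LocalDate lo, LocalDate hi) {
        return lo + "T00:00:00.000Z/" + hi.plusDays(1) + "T00:00:00.000Z";
    }

    // IN (d1, d2, ...) -> one single-day interval per listed date
    static List<String> inToIntervals(List<LocalDate> dates) {
        List<String> intervals = new ArrayList<>();
        for (LocalDate d : dates) {
            intervals.add(betweenToInterval(d, d));
        }
        return intervals;
    }
}
```

The rewritten query would then carry these intervals in `druid.query.json` instead of the default `1900-01-01/3000-01-01` span shown in the plans above.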


> Predicate `__time` In ("date", "date")  or Between  "date" and "date" are not 
> pushed to druid.
> --
>
> Key: HIVE-16096
> URL: https://issues.apache.org/jira/browse/HIVE-16096
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: calcite, druid
>
> {code}
>  explain select * from login_druid where `__time` in ("2003-1-1", "2004-1-1" 
> );
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
> limit:-1
> Select Operator [SEL_2]
>   Output:["_col0","_col1","_col2"]
>   Filter Operator [FIL_4]
> predicate:(__time) IN ('2003-1-1', '2004-1-1')
> TableScan [TS_0]
>   
> Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}
> {code}
> Between case
> {code}
>  explain select * from login_druid where `__time` between "2003-1-1" and 
> "2004-1-1" ;
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
> limit:-1
> Select Operator [SEL_2]
>   Output:["_col0","_col1","_col2"]
>   Filter Operator [FIL_4]
> predicate:__time BETWEEN '2003-1-1' AND '2004-1-1'
> TableScan [TS_0]
>   
> Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}
> {code}





[jira] [Commented] (HIVE-16096) Predicate `__time` In ("date", "date") or Between "date" and "date" are not pushed to druid.

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897389#comment-15897389
 ] 

Jesus Camacho Rodriguez commented on HIVE-16096:


[~bslim], is this one related to HIVE-16025/CALCITE-1655, or is there some 
specific logic involved because it is a filter on the time dimension?

> Predicate `__time` In ("date", "date")  or Between  "date" and "date" are not 
> pushed to druid.
> --
>
> Key: HIVE-16096
> URL: https://issues.apache.org/jira/browse/HIVE-16096
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: calcite, druid
>
> {code}
>  explain select * from login_druid where `__time` in ("2003-1-1", "2004-1-1" 
> );
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
> limit:-1
> Select Operator [SEL_2]
>   Output:["_col0","_col1","_col2"]
>   Filter Operator [FIL_4]
> predicate:(__time) IN ('2003-1-1', '2004-1-1')
> TableScan [TS_0]
>   
> Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}
> {code}
> Between case
> {code}
>  explain select * from login_druid where `__time` between "2003-1-1" and 
> "2004-1-1" ;
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
> limit:-1
> Select Operator [SEL_2]
>   Output:["_col0","_col1","_col2"]
>   Filter Operator [FIL_4]
> predicate:__time BETWEEN '2003-1-1' AND '2004-1-1'
> TableScan [TS_0]
>   
> Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}
> {code}





[jira] [Commented] (HIVE-15584) Early bail out when we use CTAS and Druid source already exists

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897383#comment-15897383
 ] 

Jesus Camacho Rodriguez commented on HIVE-15584:


[~bslim], this has been solved, right? If it has been, could you close it as 
duplicate and link it? Thanks

> Early bail out when we use CTAS and Druid source already exists
> ---
>
> Key: HIVE-15584
> URL: https://issues.apache.org/jira/browse/HIVE-15584
> Project: Hive
>  Issue Type: Sub-task
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: slim bouguerra
>Priority: Minor
>
> If we create a Druid source from Hive with CTAS, but a Druid source with the 
> same name already exists, we fail (as expected).
> However, we currently bail out only after the query that creates the results 
> has already been executed.
> We should bail out earlier so we do not execute the query (and thus, launch 
> the Tez job, etc).





[jira] [Assigned] (HIVE-16096) Predicate `__time` In ("date", "date") or Between "date" and "date" are not pushed to druid.

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-16096:
--

Assignee: slim bouguerra

> Predicate `__time` In ("date", "date")  or Between  "date" and "date" are not 
> pushed to druid.
> --
>
> Key: HIVE-16096
> URL: https://issues.apache.org/jira/browse/HIVE-16096
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>  Labels: calcite, druid
>
> {code}
>  explain select * from login_druid where `__time` in ("2003-1-1", "2004-1-1" 
> );
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
> limit:-1
> Select Operator [SEL_2]
>   Output:["_col0","_col1","_col2"]
>   Filter Operator [FIL_4]
> predicate:(__time) IN ('2003-1-1', '2004-1-1')
> TableScan [TS_0]
>   
> Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}
> {code}
> Between case
> {code}
>  explain select * from login_druid where `__time` between "2003-1-1" and 
> "2004-1-1" ;
> OK
> Plan optimized by CBO.
> Stage-0
>   Fetch Operator
> limit:-1
> Select Operator [SEL_2]
>   Output:["_col0","_col1","_col2"]
>   Filter Operator [FIL_4]
> predicate:__time BETWEEN '2003-1-1' AND '2004-1-1'
> TableScan [TS_0]
>   
> Output:["__time","userid","num_l"],properties:{"druid.query.json":"{\"queryType\":\"select\",\"dataSource\":\"druid_user_login\",\"descending\":false,\"intervals\":[\"1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z\"],\"dimensions\":[\"userid\"],\"metrics\":[\"num_l\"],\"granularity\":\"all\",\"pagingSpec\":{\"threshold\":16384},\"context\":{\"druid.query.fetch\":false}}","druid.query.type":"select"}
> {code}





[jira] [Commented] (HIVE-15641) Hive/Druid integration: filter on timestamp not pushed to DruidQuery

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897380#comment-15897380
 ] 

Jesus Camacho Rodriguez commented on HIVE-15641:


[~nishantbangarwa], apparently I had already logged a similar issue. I think 
maybe there is a CAST on floor_day because of the STRING type of the BETWEEN 
bounds? Maybe adding an explicit CAST on the date strings as follows would fix 
the issue so that the Filter would be pushed; could you let me know? Thanks

{code:sql}
EXPLAIN
SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
sum(ss_wholesale_cost) as s
FROM store_sales_sold_time_subset
WHERE floor_day(`__time`) BETWEEN CAST('1999-11-01 00:00:00' AS TIMESTAMP) AND 
CAST('1999-11-10 00:00:00' AS TIMESTAMP)
GROUP BY i_brand_id, floor_day(`__time`)
ORDER BY s
LIMIT 10;
{code}

> Hive/Druid integration: filter on timestamp not pushed to DruidQuery
> 
>
> Key: HIVE-15641
> URL: https://issues.apache.org/jira/browse/HIVE-15641
> Project: Hive
>  Issue Type: Improvement
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> It seems we are missing an opportunity to push the Filter operation to the 
> DruidQuery.
> For instance, for the following query:
> {code:sql}
> EXPLAIN
> SELECT i_brand_id, floor_day(`__time`), max(ss_quantity), 
> sum(ss_wholesale_cost) as s
> FROM store_sales_sold_time_subset
> WHERE floor_day(`__time`) BETWEEN '1999-11-01 00:00:00' AND '1999-11-10 
> 00:00:00'
> GROUP BY i_brand_id, floor_day(`__time`)
> ORDER BY s
> LIMIT 10;
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:10
> Stage-1
>   Reducer 3 vectorized
>   File Output Operator [FS_17]
> Limit [LIM_16] (rows=1 width=0)
>   Number of rows:10
>   Select Operator [SEL_15] (rows=1 width=0)
> Output:["_col0","_col1","_col2","_col3"]
>   <-Reducer 2 [SIMPLE_EDGE] vectorized
> SHUFFLE [RS_14]
>   Group By Operator [GBY_13] (rows=1 width=0)
> 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["max(VALUE._col0)","sum(VALUE._col1)"],keys:KEY._col0,
>  KEY._col1
>   <-Map 1 [SIMPLE_EDGE]
> SHUFFLE [RS_5]
>   PartitionCols:_col0, _col1
>   Group By Operator [GBY_4] (rows=1 width=0)
> 
> Output:["_col0","_col1","_col2","_col3"],aggregations:["max(_col2)","sum(_col3)"],keys:_col0,
>  _col1
> Select Operator [SEL_2] (rows=1 width=0)
>   Output:["_col0","_col1","_col2","_col3"]
>   Filter Operator [FIL_12] (rows=1 width=0)
> predicate:floor_day(__time) BETWEEN '1999-11-01 
> 00:00:00' AND '1999-11-10 00:00:00'
> TableScan [TS_0] (rows=15888 width=0)
>   
> 

[jira] [Assigned] (HIVE-16121) Add flag to allow approximate results coming from Druid

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-16121:
--


> Add flag to allow approximate results coming from Druid
> ---
>
> Key: HIVE-16121
> URL: https://issues.apache.org/jira/browse/HIVE-16121
> Project: Hive
>  Issue Type: Improvement
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Druid allows approximate results for some kinds of operations and queries 
> (count distinct, top n, decimal type, ...). There are some flags in Calcite 
> to control this behavior; we should expose these flags in Hive so users can 
> set their values.





[jira] [Commented] (HIVE-15708) Upgrade calcite version to 1.12

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897358#comment-15897358
 ] 

Hive QA commented on HIVE-15708:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856265/HIVE-15708.15.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10323 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[materialized_view_create_rewrite_multi_db]
 (batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[reduce_deduplicate_extended2]
 (batchId=55)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_ppd_decimal]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=224)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=231)
org.apache.hive.jdbc.TestJdbcDriver2.testPrepareSetTimestamp (batchId=216)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3961/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3961/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3961/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856265 - PreCommit-HIVE-Build

> Upgrade calcite version to 1.12
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch, HIVE-15708.04.patch, HIVE-15708.05.patch, 
> HIVE-15708.06.patch, HIVE-15708.07.patch, HIVE-15708.08.patch, 
> HIVE-15708.09.patch, HIVE-15708.10.patch, HIVE-15708.11.patch, 
> HIVE-15708.12.patch, HIVE-15708.13.patch, HIVE-15708.14.patch, 
> HIVE-15708.15.patch, HIVE-15708.15.patch
>
>
> Currently we are on 1.10. Need to upgrade the calcite version to 1.12.





[jira] [Updated] (HIVE-16034) Hive/Druid integration: Fix type inference for Decimal DruidOutputFormat

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16034:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for reviewing [~ashutoshc]!

> Hive/Druid integration: Fix type inference for Decimal DruidOutputFormat
> 
>
> Key: HIVE-16034
> URL: https://issues.apache.org/jira/browse/HIVE-16034
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Fix For: 2.2.0
>
> Attachments: HIVE-16034.01.patch, HIVE-16034.patch
>
>
> We are extracting the type name as a String, which might cause issues, e.g., 
> for Decimal, where the type name includes precision and scale. Instead, we 
> should check the PrimitiveCategory enum.
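The fix described — dispatching on the category enum rather than matching the type-name string (which for DECIMAL includes precision/scale, e.g. "decimal(10,2)") — can be sketched as follows. The enum here mirrors the shape of Hive's {{PrimitiveObjectInspector.PrimitiveCategory}} but is a simplified stand-in, and the metric-type mapping is illustrative:

```java
public class TypeDispatchSketch {
    // Simplified stand-in for Hive's PrimitiveCategory enum.
    enum PrimitiveCategory { LONG, DOUBLE, DECIMAL, STRING }

    static String druidMetricType(PrimitiveCategory category) {
        // Switching on the enum is robust: "decimal(10,2)" and "decimal(5,0)"
        // both map to DECIMAL, whereas a string comparison against "decimal"
        // would miss them.
        switch (category) {
            case LONG:    return "longSum";
            case DOUBLE:
            case DECIMAL: return "doubleSum";   // decimals stored as doubles
            default:      throw new IllegalArgumentException("unsupported: " + category);
        }
    }
}
```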





[jira] [Updated] (HIVE-16119) HiveMetaStoreChecker - singleThread/parallel logic duplication

2017-03-06 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16119:

Status: Patch Available  (was: Open)

> HiveMetaStoreChecker - singleThread/parallel logic duplication
> --
>
> Key: HIVE-16119
> URL: https://issues.apache.org/jira/browse/HIVE-16119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16119.1.patch
>
>
> It looks to me that the main logic is duplicated because of multithreading 
> support:
> * {{HiveMetaStoreChecker#PathDepthInfoCallable#processPathDepthInfo}}
> * {{HiveMetaStoreChecker#checkPartitionDirsSingleThreaded}}
> It might be possible to remove the singleThreaded methods by using a special 
> executor for single thread support: {{MoreExecutors.sameThreadExecutor()}}.
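The proposal above — replacing the dedicated single-threaded method with a same-thread executor — can be sketched like this. Guava's {{MoreExecutors.sameThreadExecutor()}} provides such an executor; a minimal stand-in is inlined here so the example is self-contained, and {{chooseExecutor}} is a hypothetical name:

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CheckerSketch {
    static ExecutorService chooseExecutor(int poolSize) {
        if (poolSize <= 1) {
            // Runs each task inline on the caller's thread, so the parallel
            // code path doubles as the single-threaded one and the duplicated
            // method can be deleted.
            return new AbstractExecutorService() {
                private volatile boolean shutdown;
                @Override public void execute(Runnable task) { task.run(); }
                @Override public void shutdown() { shutdown = true; }
                @Override public List<Runnable> shutdownNow() { shutdown = true; return Collections.emptyList(); }
                @Override public boolean isShutdown() { return shutdown; }
                @Override public boolean isTerminated() { return shutdown; }
                @Override public boolean awaitTermination(long timeout, TimeUnit unit) { return true; }
            };
        }
        return Executors.newFixedThreadPool(poolSize);
    }
}
```

The checker then always submits path-checking tasks through the executor; only the pool size decides whether work happens inline or in parallel.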





[jira] [Updated] (HIVE-16119) HiveMetaStoreChecker - singleThread/parallel logic duplication

2017-03-06 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-16119:

Attachment: HIVE-16119.1.patch

#1 - removed the single-threaded path; some renames; updated the test because 
there was a slight possibility that it would overlook some cases

> HiveMetaStoreChecker - singleThread/parallel logic duplication
> --
>
> Key: HIVE-16119
> URL: https://issues.apache.org/jira/browse/HIVE-16119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-16119.1.patch
>
>
> It looks to me that the main logic is duplicated because of multithreading 
> support:
> * {{HiveMetaStoreChecker#PathDepthInfoCallable#processPathDepthInfo}}
> * {{HiveMetaStoreChecker#checkPartitionDirsSingleThreaded}}
> It might be possible to remove the singleThreaded methods by using a special 
> executor for single thread support: {{MoreExecutors.sameThreadExecutor()}}.





[jira] [Assigned] (HIVE-16119) HiveMetaStoreChecker - singleThread/parallel logic duplication

2017-03-06 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich reassigned HIVE-16119:
---

Assignee: Zoltan Haindrich

> HiveMetaStoreChecker - singleThread/parallel logic duplication
> --
>
> Key: HIVE-16119
> URL: https://issues.apache.org/jira/browse/HIVE-16119
> Project: Hive
>  Issue Type: Improvement
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
>
> It looks to me that the main logic is duplicated because of multithreading 
> support:
> * {{HiveMetaStoreChecker#PathDepthInfoCallable#processPathDepthInfo}}
> * {{HiveMetaStoreChecker#checkPartitionDirsSingleThreaded}}
> It might be possible to remove the singleThreaded methods by using a special 
> executor for single thread support: {{MoreExecutors.sameThreadExecutor()}}.





[jira] [Assigned] (HIVE-16120) Remove empty grouping sets restriction

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-16120:
--


> Remove empty grouping sets restriction
> --
>
> Key: HIVE-16120
> URL: https://issues.apache.org/jira/browse/HIVE-16120
> Project: Hive
>  Issue Type: Improvement
>  Components: Parser
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Queries with empty grouping sets, such as the following:
> {code:sql}
> SELECT a FROM T1 GROUP BY a GROUPING SETS (());
> {code}
> are not allowed in Hive. The restriction was added in HIVE-3471, together 
> with some negative tests. However, the reason why this restriction is 
> included is not described in the JIRA case, and the review board link (where 
> there might be some additional information) does not work anymore. After 
> running some tests myself, empty grouping sets seem to work perfectly even 
> on their own; thus, it seems we could lift this restriction.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16102) Grouping sets do not conform to SQL standard

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16102:
---
Target Version/s: 1.3.0, 2.2.0

> Grouping sets do not conform to SQL standard
> 
>
> Key: HIVE-16102
> URL: https://issues.apache.org/jira/browse/HIVE-16102
> Project: Hive
>  Issue Type: Bug
>  Components: Operators, Parser
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-16102.01.patch, HIVE-16102.patch
>
>
> [~ashutoshc] realized that the implementation of GROUPING__ID in Hive was not 
> returning values as specified by SQL standard and other execution engines.
> After digging into this, I found out that the implementation was bogus: 
> internally it switched between big-endian and little-endian representations of 
> GROUPING__ID inconsistently, and in some cases conversions in both directions 
> cancelled each other out.
> In the documentation in 
> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
>  we can already find the problem, even if we did not spot it at first.
> {quote}
> The following query: SELECT key, value, GROUPING__ID, count(\*) from T1 GROUP 
> BY key, value WITH ROLLUP
> will have the following results.
> | NULL | NULL | 0 | 6 |
> | 1 | NULL | 1 | 2 |
> | 1 | NULL | 3 | 1 |
> | 1 | 1 | 3 | 1 |
> ...
> {quote}
> Observe that the value of GROUPING__ID in the first row should be `3`, while 
> for the third and fourth rows it should be `0`.
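The SQL-standard encoding the description refers to can be sketched as follows: one bit per GROUP BY column, with the leftmost column in the most significant position, and a bit set to 1 when that column is aggregated away in the current grouping set. This is a hedged illustration of the standard semantics, not Hive's actual implementation; names are illustrative.

```java
import java.util.List;
import java.util.Set;

public class GroupingIdDemo {

  /** groupByCols in query order; groupingSet = the columns actually grouped on. */
  static long groupingId(List<String> groupByCols, Set<String> groupingSet) {
    long id = 0;
    for (String col : groupByCols) {
      id <<= 1;                  // leftmost column ends up most significant
      if (!groupingSet.contains(col)) {
        id |= 1;                 // column is rolled up in this grouping set
      }
    }
    return id;
  }

  public static void main(String[] args) {
    List<String> cols = List.of("key", "value");
    // WITH ROLLUP expands to the grouping sets (key, value), (key), ()
    System.out.println(groupingId(cols, Set.of("key", "value"))); // 0 -> detail rows
    System.out.println(groupingId(cols, Set.of("key")));          // 1 -> value rolled up
    System.out.println(groupingId(cols, Set.of()));               // 3 -> grand total
  }
}
```

Under this encoding the grand-total row (NULL, NULL) gets GROUPING__ID 3 and the fully-grouped rows get 0, matching the correction above.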



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16102) Grouping sets do not conform to SQL standard

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16102:
---
Attachment: HIVE-16102.01.patch

> Grouping sets do not conform to SQL standard
> 
>
> Key: HIVE-16102
> URL: https://issues.apache.org/jira/browse/HIVE-16102
> Project: Hive
>  Issue Type: Bug
>  Components: Operators, Parser
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-16102.01.patch, HIVE-16102.patch
>
>
> [~ashutoshc] realized that the implementation of GROUPING__ID in Hive was not 
> returning values as specified by SQL standard and other execution engines.
> After digging into this, I found out that the implementation was bogus: 
> internally it switched between big-endian and little-endian representations of 
> GROUPING__ID inconsistently, and in some cases conversions in both directions 
> cancelled each other out.
> In the documentation in 
> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
>  we can already find the problem, even if we did not spot it at first.
> {quote}
> The following query: SELECT key, value, GROUPING__ID, count(\*) from T1 GROUP 
> BY key, value WITH ROLLUP
> will have the following results.
> | NULL | NULL | 0 | 6 |
> | 1 | NULL | 1 | 2 |
> | 1 | NULL | 3 | 1 |
> | 1 | 1 | 3 | 1 |
> ...
> {quote}
> Observe that the value of GROUPING__ID in the first row should be `3`, while 
> for the third and fourth rows it should be `0`.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16116) Beeline throws NPE when beeline.hiveconfvariables={} in beeline.properties

2017-03-06 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-16116:

Attachment: HIVE-16116.2.patch

Thanks [~pvary]. Yes, it affects {{setHiveVariables}} as well. Uploading the 
revised patch.

> Beeline throws NPE when beeline.hiveconfvariables={} in beeline.properties
> --
>
> Key: HIVE-16116
> URL: https://issues.apache.org/jira/browse/HIVE-16116
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-16116.1.patch, HIVE-16116.2.patch
>
>
> Env: hive master
> Steps to reproduce:
> 1. clear previous beeline.properties (rm -rf ~/.beeline/beeline.properties)
> 2. Launch beeline, "!save" and exit. This would create new 
> "~/.beeline/beeline.properties", which would have 
> "beeline.hiveconfvariables={}"
> 3. Launch "beeline --hiveconf hive.tmp.dir=/tmp". This would throw NPE
> {noformat}
> Exception in thread "main" java.lang.NullPointerException
> at org.apache.hive.beeline.BeeLine.setHiveConfVar(BeeLine.java:885)
> at org.apache.hive.beeline.BeeLine.connectUsingArgs(BeeLine.java:832)
> at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:775)
> at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1009)
> at 
> org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:519)
> at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
> {noformat}
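The stack trace above points at a later put into a conf-variable map that turned out to be null after loading the saved "{}" value. As a hedged illustration of the class of fix (a hypothetical helper, not Beeline's actual code), the defensive shape is to parse the saved property into an empty, mutable map rather than null, so a subsequent {{setHiveConfVar}}-style put cannot NPE:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HiveConfVarsDemo {

  /** Parses "{k1=v1, k2=v2}" (or "{}") into a map; never returns null. */
  static Map<String, String> parseConfVars(String saved) {
    Map<String, String> vars = new LinkedHashMap<>();
    if (saved == null) {
      return vars;                                  // nothing saved yet
    }
    String body = saved.trim();
    if (body.startsWith("{") && body.endsWith("}")) {
      body = body.substring(1, body.length() - 1);  // strip the braces
    }
    for (String pair : body.split(",")) {
      String[] kv = pair.split("=", 2);
      if (kv.length == 2 && !kv[0].trim().isEmpty()) {
        vars.put(kv[0].trim(), kv[1].trim());
      }
    }
    return vars;
  }

  public static void main(String[] args) {
    Map<String, String> empty = parseConfVars("{}");
    empty.put("hive.tmp.dir", "/tmp");              // safe: map exists and is mutable
    System.out.println(empty.size());               // 1
    System.out.println(parseConfVars("{a=1, b=2})".substring(0, 10).concat("}"))); // tolerant of odd input
  }
}
```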



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15708) Upgrade calcite version to 1.12

2017-03-06 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-15708:

Attachment: HIVE-15708.15.patch

The .15 patch consumes the CALCITE-1641 interface changes pushed in 
1.12-SNAPSHOT and fixes the Druid plan diff.

> Upgrade calcite version to 1.12
> ---
>
> Key: HIVE-15708
> URL: https://issues.apache.org/jira/browse/HIVE-15708
> Project: Hive
>  Issue Type: Task
>  Components: CBO, Logical Optimizer
>Affects Versions: 2.2.0
>Reporter: Ashutosh Chauhan
>Assignee: Remus Rusanu
> Attachments: HIVE-15708.01.patch, HIVE-15708.02.patch, 
> HIVE-15708.03.patch, HIVE-15708.04.patch, HIVE-15708.05.patch, 
> HIVE-15708.06.patch, HIVE-15708.07.patch, HIVE-15708.08.patch, 
> HIVE-15708.09.patch, HIVE-15708.10.patch, HIVE-15708.11.patch, 
> HIVE-15708.12.patch, HIVE-15708.13.patch, HIVE-15708.14.patch, 
> HIVE-15708.15.patch, HIVE-15708.15.patch
>
>
> Currently we are on 1.10; need to upgrade the Calcite version to 1.12.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16115) Stop printing progress info from operation logs with beeline progress bar

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897195#comment-15897195
 ] 

Hive QA commented on HIVE-16115:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856248/HIVE-16115.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10328 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables_compact]
 (batchId=33)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=224)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
org.apache.hive.beeline.TestBeeLineWithArgs.testQueryProgress (batchId=212)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3960/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3960/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3960/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856248 - PreCommit-HIVE-Build

> Stop printing progress info from operation logs with beeline progress bar
> -
>
> Key: HIVE-16115
> URL: https://issues.apache.org/jira/browse/HIVE-16115
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-16115.1.patch
>
>
> When the progress bar is enabled, we should not print the progress 
> information via the operation logs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15956) StackOverflowError when drop lots of partitions

2017-03-06 Thread Niklaus Xiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897176#comment-15897176
 ] 

Niklaus Xiao commented on HIVE-15956:
-

Test failures are not related.

> StackOverflowError when drop lots of partitions
> ---
>
> Key: HIVE-15956
> URL: https://issues.apache.org/jira/browse/HIVE-15956
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Niklaus Xiao
>Assignee: Niklaus Xiao
> Attachments: HIVE-15956.patch
>
>
> Repro steps:
> 1. Create partitioned table and add 1 partitions
> {code}
> create table test_partition(id int) partitioned by (dt int);
> alter table test_partition add partition(dt=1);
> alter table test_partition add partition(dt=3);
> alter table test_partition add partition(dt=4);
> ...
> alter table test_partition add partition(dt=1);
> {code}
> 2. Drop 9000 partitions:
> {code}
> alter table test_partition drop partition(dt<9000);
> {code}
> Step 2 will fail with StackOverflowError:
> {code}
> Exception in thread "pool-7-thread-161" java.lang.StackOverflowError
> at 
> org.datanucleus.query.expression.ExpressionCompiler.isOperator(ExpressionCompiler.java:819)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileOrAndExpression(ExpressionCompiler.java:190)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileExpression(ExpressionCompiler.java:179)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileOrAndExpression(ExpressionCompiler.java:192)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileExpression(ExpressionCompiler.java:179)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileOrAndExpression(ExpressionCompiler.java:192)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileExpression(ExpressionCompiler.java:179)
> {code}
> {code}
> Exception in thread "pool-7-thread-198" java.lang.StackOverflowError
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:83)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> {code}
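The deep {{ExpressionCompiler}}/{{DyadicExpression}} recursion above comes from compiling one filter with thousands of OR'd partition predicates. A common mitigation for this failure mode (illustrative only; names and the batch size are not Hive's actual fix) is to split the partition list into bounded batches and issue one drop per batch, so each compiled filter expression stays shallow:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedDropDemo {

  /** Splits the full partition list into chunks of at most batchSize. */
  static List<List<String>> batches(List<String> partitions, int batchSize) {
    List<List<String>> result = new ArrayList<>();
    for (int i = 0; i < partitions.size(); i += batchSize) {
      result.add(partitions.subList(i, Math.min(i + batchSize, partitions.size())));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> parts = new ArrayList<>();
    for (int dt = 1; dt < 9000; dt++) {
      parts.add("dt=" + dt);                 // 8999 partitions to drop
    }
    // Each batch would become one bounded metastore filter expression.
    List<List<String>> chunks = batches(parts, 1000);
    System.out.println(chunks.size());       // 9 batches
    System.out.println(chunks.get(8).size()); // last batch holds the remaining 999
  }
}
```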



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Work started] (HIVE-15556) Replicate views

2017-03-06 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-15556 started by Sankar Hariappan.
---
> Replicate views
> ---
>
> Key: HIVE-15556
> URL: https://issues.apache.org/jira/browse/HIVE-15556
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15956) StackOverflowError when drop lots of partitions

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897130#comment-15897130
 ] 

Hive QA commented on HIVE-15956:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856246/HIVE-15956.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10327 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=224)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=224)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3959/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3959/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3959/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856246 - PreCommit-HIVE-Build

> StackOverflowError when drop lots of partitions
> ---
>
> Key: HIVE-15956
> URL: https://issues.apache.org/jira/browse/HIVE-15956
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Niklaus Xiao
>Assignee: Niklaus Xiao
> Attachments: HIVE-15956.patch
>
>
> Repro steps:
> 1. Create partitioned table and add 1 partitions
> {code}
> create table test_partition(id int) partitioned by (dt int);
> alter table test_partition add partition(dt=1);
> alter table test_partition add partition(dt=3);
> alter table test_partition add partition(dt=4);
> ...
> alter table test_partition add partition(dt=1);
> {code}
> 2. Drop 9000 partitions:
> {code}
> alter table test_partition drop partition(dt<9000);
> {code}
> Step 2 will fail with StackOverflowError:
> {code}
> Exception in thread "pool-7-thread-161" java.lang.StackOverflowError
> at 
> org.datanucleus.query.expression.ExpressionCompiler.isOperator(ExpressionCompiler.java:819)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileOrAndExpression(ExpressionCompiler.java:190)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileExpression(ExpressionCompiler.java:179)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileOrAndExpression(ExpressionCompiler.java:192)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileExpression(ExpressionCompiler.java:179)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileOrAndExpression(ExpressionCompiler.java:192)
> at 
> org.datanucleus.query.expression.ExpressionCompiler.compileExpression(ExpressionCompiler.java:179)
> {code}
> {code}
> Exception in thread "pool-7-thread-198" java.lang.StackOverflowError
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:83)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> at 
> org.datanucleus.query.expression.DyadicExpression.bind(DyadicExpression.java:87)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16102) Grouping sets do not conform to SQL standard

2017-03-06 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897111#comment-15897111
 ] 

Jesus Camacho Rodriguez commented on HIVE-16102:


[~ashutoshc], I have been investigating the issue you pointed to a little bit 
more.

One of the problems is that the message does not even clearly describe what the 
error is about: _having only empty groups in the grouping sets is not allowed_. 
This was added in HIVE-3471, together with some negative tests. However, the 
reason why this restriction was introduced is not described in the JIRA case, 
and the review board link (where there might be some additional information) 
does not work anymore. I added tests myself, and empty grouping sets seem to 
work perfectly even on their own; I also do not see any reason why they should 
not.

I am thinking that 1) I will revert that part of the patch and just change the 
error message; this will fix every issue while also being less risky to 
backport, and 2) in a follow-up JIRA I will lift the restriction, as there is 
no need to backport that and we can just include it in the next release.

Concerning HIVE_GROUPING_SETS_EXPR_NOT_IN_GROUPBY error, that one makes sense, 
since it seeks to prevent cases such as:
{code:sql}
SELECT a FROM T1 GROUP BY a GROUPING SETS (a, b);
{code}
where _b_ is not part of the group by expression.

> Grouping sets do not conform to SQL standard
> 
>
> Key: HIVE-16102
> URL: https://issues.apache.org/jira/browse/HIVE-16102
> Project: Hive
>  Issue Type: Bug
>  Components: Operators, Parser
>Affects Versions: 1.3.0, 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Critical
> Attachments: HIVE-16102.patch
>
>
> [~ashutoshc] realized that the implementation of GROUPING__ID in Hive was not 
> returning values as specified by SQL standard and other execution engines.
> After digging into this, I found out that the implementation was bogus: 
> internally it switched between big-endian and little-endian representations of 
> GROUPING__ID inconsistently, and in some cases conversions in both directions 
> cancelled each other out.
> In the documentation in 
> https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation,+Cube,+Grouping+and+Rollup
>  we can already find the problem, even if we did not spot it at first.
> {quote}
> The following query: SELECT key, value, GROUPING__ID, count(\*) from T1 GROUP 
> BY key, value WITH ROLLUP
> will have the following results.
> | NULL | NULL | 0 | 6 |
> | 1 | NULL | 1 | 2 |
> | 1 | NULL | 3 | 1 |
> | 1 | 1 | 3 | 1 |
> ...
> {quote}
> Observe that the value of GROUPING__ID in the first row should be `3`, while 
> for the third and fourth rows it should be `0`.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-03-06 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897084#comment-15897084
 ] 

Gopal V edited comment on HIVE-16100 at 3/6/17 10:52 AM:
-

The scenario is more along these lines:

{code}
TS -> FIL -> SEL -> FS
             |
             + -> FS
{code}


{code}
Stage-4
  Stats-Aggr Operator
Stage-0
  Move Operator
table:{"name:":"testing.over1k_part4_0"}
Stage-3
  Dependency Collection{}
Stage-2
  Map 1 vectorized, llap
  File Output Operator [FS_10]
table:{"name:":"testing.over1k_part4_0"}
Select Operator [SEL_9] (rows=1 width=0)
  Output:["_col0","_col1"]
  Filter Operator [FIL_8] (rows=1 width=0)
predicate:(s like 'bob%')
TableScan [TS_0] (rows=1 width=0)
  
testing@over1k,over1k,Tbl:PARTIAL,Col:NONE,Output:["i","s"]
  File Output Operator [FS_11]
table:{"name:":"testing.over1k_part4_1"}
 Please refer to the previous Select Operator [SEL_9]
{code}

SEL_9 -> FS_11 
SEL_9 -> FS_10 

making the FS op have 2 parameters.

However, I see that the test-case passes even without the patch - because the 
backtracking is not cleared. Looks like the FS_11 -> SEL_9 parent relationship 
isn't modified by the optimizer.


was (Author: gopalv):
The scenario is more along these lines:

{code}
TS -> FIL -> SEL -> FS
             |
             + -> FS
{code}

{code}
Stage-4
  Stats-Aggr Operator
Stage-0
  Move Operator
table:{"name:":"testing.over1k_part4_0"}
Stage-3
  Dependency Collection{}
Stage-2
  Map 1 vectorized, llap
  File Output Operator [FS_10]
table:{"name:":"testing.over1k_part4_0"}
Select Operator [SEL_9] (rows=1 width=0)
  Output:["_col0","_col1"]
  Filter Operator [FIL_8] (rows=1 width=0)
predicate:(s like 'bob%')
TableScan [TS_0] (rows=1 width=0)
  
testing@over1k,over1k,Tbl:PARTIAL,Col:NONE,Output:["i","s"]
  File Output Operator [FS_11]
table:{"name:":"testing.over1k_part4_1"}
 Please refer to the previous Select Operator [SEL_9]
{code}

SEL_9 -> FS_11 
SEL_9 -> FS_10 

making the FS op have 2 parameters.

However, I see that the test-case passes even without the patch - because the 
backtracking is not cleared. Looks like the FS_11 -> SEL_9 parent relationship 
isn't modified by the optimizer.

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.2.0, 2.1.1
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 
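The failure mode in the snippet above can be sketched with a toy operator tree (the tiny {{Op}} class is illustrative, not Hive's {{Operator}} hierarchy): {{clear()}} severs *all* of the parent's children, silently dropping a sibling such as a second FileSink, whereas removing only the one child being rewired preserves the siblings.

```java
import java.util.ArrayList;
import java.util.List;

public class UnlinkDemo {
  static class Op {
    final String name;
    final List<Op> children = new ArrayList<>();
    Op(String name) { this.name = name; }
  }

  public static void main(String[] args) {
    Op sel = new Op("SEL_9");
    Op fs10 = new Op("FS_10");
    Op fs11 = new Op("FS_11");
    sel.children.add(fs10);
    sel.children.add(fs11);

    // Buggy shape: sel.children.clear() would also drop the sibling FS_11.
    // Safer shape: detach only the operator being rewritten.
    sel.children.remove(fs10);

    System.out.println(sel.children.size());       // 1
    System.out.println(sel.children.get(0).name);  // FS_11 survives
  }
}
```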



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-03-06 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897084#comment-15897084
 ] 

Gopal V edited comment on HIVE-16100 at 3/6/17 10:51 AM:
-

The scenario is more along these lines:

{code}
TS -> FIL -> SEL -> FS
             |
             + -> FS
{code}

{code}
Stage-4
  Stats-Aggr Operator
Stage-0
  Move Operator
table:{"name:":"testing.over1k_part4_0"}
Stage-3
  Dependency Collection{}
Stage-2
  Map 1 vectorized, llap
  File Output Operator [FS_10]
table:{"name:":"testing.over1k_part4_0"}
Select Operator [SEL_9] (rows=1 width=0)
  Output:["_col0","_col1"]
  Filter Operator [FIL_8] (rows=1 width=0)
predicate:(s like 'bob%')
TableScan [TS_0] (rows=1 width=0)
  
testing@over1k,over1k,Tbl:PARTIAL,Col:NONE,Output:["i","s"]
  File Output Operator [FS_11]
table:{"name:":"testing.over1k_part4_1"}
 Please refer to the previous Select Operator [SEL_9]
{code}

SEL_9 -> FS_11 
SEL_9 -> FS_10 

making the FS op have 2 parameters.

However, I see that the test-case passes even without the patch - because the 
backtracking is not cleared. Looks like the FS_11 -> SEL_9 parent relationship 
isn't modified by the optimizer.


was (Author: gopalv):
The scenario is more along these lines:

{code}
TS -> FIL -> SEL -> FS
             |
             + -> FS
{code}

{code}
Stage-4
  Stats-Aggr Operator
Stage-0
  Move Operator
table:{"name:":"testing.over1k_part4_0"}
Stage-3
  Dependency Collection{}
Stage-2
  Map 1 vectorized, llap
  File Output Operator [FS_10]
table:{"name:":"testing.over1k_part4_0"}
Select Operator [SEL_9] (rows=1 width=0)
  Output:["_col0","_col1"]
  Filter Operator [FIL_8] (rows=1 width=0)
predicate:(s like 'bob%')
TableScan [TS_0] (rows=1 width=0)
  
testing@over1k,over1k,Tbl:PARTIAL,Col:NONE,Output:["i","s"]
  File Output Operator [FS_11]
table:{"name:":"testing.over1k_part4_1"}
 Please refer to the previous Select Operator [SEL_9]
{code}

SEL_9 -> FS_11 
SEL_9 -> FS_10 

making the FS op have 2 parameters.

However, I see that the test-case passes even without the patch - because the 
backtracking is not cleared. Looks like the FS_11 -> SEL_9 parent relationship 
isn't modified by the optimizer.

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.2.0, 2.1.1
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators

2017-03-06 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897084#comment-15897084
 ] 

Gopal V commented on HIVE-16100:


The scenario is more along these lines:

{code}
TS -> FIL -> SEL -> FS
             |
             + -> FS
{code}

{code}
Stage-4
  Stats-Aggr Operator
Stage-0
  Move Operator
table:{"name:":"testing.over1k_part4_0"}
Stage-3
  Dependency Collection{}
Stage-2
  Map 1 vectorized, llap
  File Output Operator [FS_10]
table:{"name:":"testing.over1k_part4_0"}
Select Operator [SEL_9] (rows=1 width=0)
  Output:["_col0","_col1"]
  Filter Operator [FIL_8] (rows=1 width=0)
predicate:(s like 'bob%')
TableScan [TS_0] (rows=1 width=0)
  
testing@over1k,over1k,Tbl:PARTIAL,Col:NONE,Output:["i","s"]
  File Output Operator [FS_11]
table:{"name:":"testing.over1k_part4_1"}
 Please refer to the previous Select Operator [SEL_9]
{code}

SEL_9 -> FS_11 
SEL_9 -> FS_10 

making the FS op have 2 parameters.

However, I see that the test-case passes even without the patch - because the 
backtracking is not cleared. Looks like the FS_11 -> SEL_9 parent relationship 
isn't modified by the optimizer.

> Dynamic Sorted Partition optimizer loses sibling operators
> --
>
> Key: HIVE-16100
> URL: https://issues.apache.org/jira/browse/HIVE-16100
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 1.2.1, 2.2.0, 2.1.1
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, 
> HIVE-16100.2.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173
> {code}
>   // unlink connection between FS and its parent
>   fsParent = fsOp.getParentOperators().get(0);
>   fsParent.getChildOperators().clear();
> {code}
> The optimizer discards any cases where the fsParent has another SEL child 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16101) QTest failure BeeLine escape_comments after HIVE-16045

2017-03-06 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897063#comment-15897063
 ] 

Peter Vary commented on HIVE-16101:
---

Failures are not related.

> QTest failure BeeLine escape_comments after HIVE-16045
> --
>
> Key: HIVE-16101
> URL: https://issues.apache.org/jira/browse/HIVE-16101
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-16101.2.patch, HIVE-16101.patch
>
>
> HIVE-16045 was committed immediately after HIVE-14459 and added two extra 
> lines to the output, which are written there by another thread. We should 
> remove these lines before comparing the .out file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16101) QTest failure BeeLine escape_comments after HIVE-16045

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897061#comment-15897061
 ] 

Hive QA commented on HIVE-16101:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856244/HIVE-16101.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10327 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=224)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery 
(batchId=218)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3958/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3958/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3958/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856244 - PreCommit-HIVE-Build

> QTest failure BeeLine escape_comments after HIVE-16045
> --
>
> Key: HIVE-16101
> URL: https://issues.apache.org/jira/browse/HIVE-16101
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-16101.2.patch, HIVE-16101.patch
>
>
> HIVE-16045 was committed immediately after HIVE-14459 and added two extra 
> lines to the output, which is written there by another thread. We should 
> remove these lines before comparing the .out file





[jira] [Updated] (HIVE-16115) Stop printing progress info from operation logs with beeline progress bar

2017-03-06 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16115:
---
Attachment: HIVE-16115.1.patch

> Stop printing progress info from operation logs with beeline progress bar
> -
>
> Key: HIVE-16115
> URL: https://issues.apache.org/jira/browse/HIVE-16115
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-16115.1.patch
>
>
> When the progress bar is enabled, we should not print the progress information 
> via the operation logs.





[jira] [Updated] (HIVE-16115) Stop printing progress info from operation logs with beeline progress bar

2017-03-06 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anishek updated HIVE-16115:
---
Status: Patch Available  (was: In Progress)

> Stop printing progress info from operation logs with beeline progress bar
> -
>
> Key: HIVE-16115
> URL: https://issues.apache.org/jira/browse/HIVE-16115
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-16115.1.patch
>
>
> When the progress bar is enabled, we should not print the progress information 
> via the operation logs.





[jira] [Work started] (HIVE-16115) Stop printing progress info from operation logs with beeline progress bar

2017-03-06 Thread anishek (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16115 started by anishek.
--
> Stop printing progress info from operation logs with beeline progress bar
> -
>
> Key: HIVE-16115
> URL: https://issues.apache.org/jira/browse/HIVE-16115
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-16115.1.patch
>
>
> When the progress bar is enabled, we should not print the progress information 
> via the operation logs.





[jira] [Commented] (HIVE-16115) Stop printing progress info from operation logs with beeline progress bar

2017-03-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896996#comment-15896996
 ] 

ASF GitHub Bot commented on HIVE-16115:
---

GitHub user anishek opened a pull request:

https://github.com/apache/hive/pull/155

HIVE-16115: Stop printing progress info from operation logs with beeline 
progress bar

Also fixes the issue where the session Hive configuration was not read 
correctly for hive.server2.in.place.progress

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anishek/hive HIVE-16115

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/155.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #155


commit 48425d463c2f75f868a68a7f92ddcf57cc852192
Author: Anishek Agarwal 
Date:   2017-03-06T09:45:30Z

HIVE-16115: Stop printing progress info from operation logs with beeline 
progress bar

also fixing the issue where the session hive configuration was not read 
correctly for hive.server2.in.place.progress
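The "This closes #155" line above relies on the Apache convention of closing pull requests via a marker in the commit message body. As a hedged illustration of that convention (the repository, file, and author identity below are hypothetical placeholders, not taken from this thread), a closing commit can be made and its marker verified like this:

```shell
# Sketch of the PR-closing commit-message convention described above.
# The repo, file, and identity here are hypothetical placeholders.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
echo ok > file.txt
git add file.txt
# The closing line must appear in the commit message body:
git commit -q -m "HIVE-16115: stop printing progress info from operation logs

This closes #155"
# Verify the marker is present in the latest commit message:
git log -1 --format=%B | grep -q "This closes #155" && echo "closing marker found"
```

Tooling that mirrors the repository scans commit messages on the target branch for this marker and closes the referenced pull request when it lands.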




> Stop printing progress info from operation logs with beeline progress bar
> -
>
> Key: HIVE-16115
> URL: https://issues.apache.org/jira/browse/HIVE-16115
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Affects Versions: 2.2.0
>Reporter: anishek
>Assignee: anishek
>Priority: Minor
> Fix For: 2.2.0
>
>
> When the progress bar is enabled, we should not print the progress information 
> via the operation logs.





[jira] [Commented] (HIVE-16065) Vectorization: Wrong Key/Value information used by Vectorizer

2017-03-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15896971#comment-15896971
 ] 

Hive QA commented on HIVE-16065:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12856238/HIVE-16065.09.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10327 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=229)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table]
 (batchId=147)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=224)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=224)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_between_in] 
(batchId=119)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] 
(batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_shufflejoin]
 (batchId=126)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3957/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3957/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3957/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12856238 - PreCommit-HIVE-Build

> Vectorization: Wrong Key/Value information used by Vectorizer
> -
>
> Key: HIVE-16065
> URL: https://issues.apache.org/jira/browse/HIVE-16065
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16065.01.patch, HIVE-16065.07.patch, 
> HIVE-16065.08.patch, HIVE-16065.09.patch
>
>
> Make Vectorizer class get reducer key/value information the same way 
> ExecReducer/ReduceRecordProcessor do.




