[jira] [Commented] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106035#comment-16106035
 ] 

Hive QA commented on HIVE-17201:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879389/HIVE-17201.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 11014 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sample_islocalmode_hook] 
(batchId=12)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6181/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6181/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6181/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879389 - PreCommit-HIVE-Build

> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17201.1.patch
>
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
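
A minimal illustration of one way to disable them temporarily, assuming JUnit 4's {{@Ignore}} annotation (the class skeleton and test body below are placeholders, not the actual TestHCatClient code):
{code}
import org.junit.Ignore;
import org.junit.Test;

public class TestHCatClient {
  // Temporarily skipped until HIVE-16908 settles how these tests should be rephrased.
  @Ignore("Disabled pending HIVE-16908")
  @Test
  public void testTableSchemaPropagation() throws Exception {
    // ... original test body left unchanged ...
  }
}
{code}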



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16982) WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration option

2017-07-28 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16982:
--
Labels: TODOC3.0 newbie patch  (was: newbie patch)

> WebUI "Show Query" tab prints "UNKNOWN" instead of explaining configuration 
> option
> --
>
> Key: HIVE-16982
> URL: https://issues.apache.org/jira/browse/HIVE-16982
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration, Web UI
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>  Labels: TODOC3.0, newbie, patch
> Fix For: 3.0.0
>
> Attachments: HIVE-16982.3.patch
>
>
> In the Hive WebUI / Drilldown, the Show Query tab always displays "UNKNOWN."
> If the user wants to see the query plan here, they should set the configuration 
> property hive.log.explain.output to true. The user should be made aware of this 
> option:
> 1) in WebUI / Drilldown / Show Query, and
> 2) in HiveConf.java, line 2232.
> This configuration's description currently reads:
> "Whether to log explain output for every query
> When enabled, will log EXPLAIN EXTENDED output for the query at INFO log4j 
> log level."
> The following should be appended:
> "...and in the WebUI / Show Query tab."



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106018#comment-16106018
 ] 

Hive QA commented on HIVE-12631:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879340/HIVE-12631.25.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11020 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] 
(batchId=38)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[external_table_ppd] 
(batchId=93)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6180/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6180/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6180/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879340 - PreCommit-HIVE-Build

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, 
> HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, 
> HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.24.patch, 
> HIVE-12631.25.patch, HIVE-12631.2.patch, HIVE-12631.3.patch, 
> HIVE-12631.4.patch, HIVE-12631.5.patch, HIVE-12631.6.patch, 
> HIVE-12631.7.patch, HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember ACID logic is embedded inside ORC format; we need to 
> refactor it to be on top of some interface, if practical; or just port it to 
> LLAP read path.
> Another consideration is how the logic will work with cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache merged representation in future.
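
As a rough sketch of the "on top of some interface" idea (the names below are hypothetical and purely illustrative, not the actual design):
{code}
import java.io.IOException;
import java.util.List;

// Hypothetical abstraction: ACID merging expressed over a generic row-batch
// source, so it could sit on top of either the plain ORC reader or the LLAP
// cached read path.
interface AcidRowBatchSource {
  boolean nextBatch(Object batch) throws IOException; // fill the batch; false at end of data
  void close() throws IOException;
}

interface AcidMergePolicy {
  // Merge a base source with its delta sources (deltas cached with higher
  // priority, as noted above) and expose the merged rows as another source.
  AcidRowBatchSource merge(AcidRowBatchSource base, List<AcidRowBatchSource> deltas);
}
{code}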



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16985) LLAP IO: enable SMB join in elevator after the former is fixed

2017-07-28 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16985:
--
Attachment: HIVE-16985.1.patch

[~sershe] Can you please review? Enabled the IO elevator code.

> LLAP IO: enable SMB join in elevator after the former is fixed
> --
>
> Key: HIVE-16985
> URL: https://issues.apache.org/jira/browse/HIVE-16985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16985.1.patch
>
>
> We currently skip the IO elevator when we encounter an SMB join (see 
> HIVE-16761). However, it might work with elevator with the code commented out 
> in HIVE-16761. Need to look again after HIVE-16965 is fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17195) Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105991#comment-16105991
 ] 

Hive QA commented on HIVE-17195:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879338/HIVE-17195.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 33 failed/errored test(s), 11017 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[add_part_multiple] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables] 
(batchId=82)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables_compact]
 (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_partitioned] 
(batchId=11)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_bitmap] 
(batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_bitmap_auto_partitioned]
 (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_bitmap_rc] 
(batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_compact] 
(batchId=20)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_compact_2] 
(batchId=25)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[infer_bucket_sort_multi_insert]
 (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input12] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input13] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[input_part2] (batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[load_dyn_part1] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[metadata_only_queries_with_filters]
 (batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_date2] 
(batchId=17)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_timestamp2] 
(batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[pcr] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[stats4] (batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp_udf] 
(batchId=75)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union18] (batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union34] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[union_stats] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_multi_insert] 
(batchId=82)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6179/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6179/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6179/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 33 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879338 - PreCommit-HIVE-Build

> Long chain of tasks created by REPL LOAD shouldn't cause stack corruption.
> --
>
> Key: HIVE-17195
> URL: https://issues.apache.org/jira/browse/HIVE-17195
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DAG, DR, Executor, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17195.01.patch
>
>
> Currently, the long chain of tasks created by REPL LOAD leads to deeply 
> recursive calls when traversing the DAG.
> For example, the getMRTasks, getTezTasks, getSparkTasks and iterateTasks methods 
> run recursively to traverse the DAG.
> Need to modify this traversal logic to avoid unbounded recursion.
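
One common way to avoid deep recursion is an explicit work list. A minimal sketch (illustrative only, not the actual patch; TaskNode is a simplified stand-in for Hive's Task type):
{code}
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified stand-in for Hive's Task<?>; only the child links matter here.
class TaskNode {
  List<TaskNode> children = new ArrayList<>();
}

class TaskTraversal {
  // Depth-first walk of the task DAG using an explicit stack, so a long
  // REPL LOAD chain cannot overflow the Java call stack.
  static List<TaskNode> collect(TaskNode root) {
    List<TaskNode> result = new ArrayList<>();
    Set<TaskNode> visited = new HashSet<>();
    Deque<TaskNode> stack = new ArrayDeque<>();
    stack.push(root);
    while (!stack.isEmpty()) {
      TaskNode task = stack.pop();
      if (!visited.add(task)) {
        continue; // already reached through another parent
      }
      result.add(task);
      for (TaskNode child : task.children) {
        stack.push(child);
      }
    }
    return result;
  }
}
{code}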

[jira] [Work started] (HIVE-16985) LLAP IO: enable SMB join in elevator after the former is fixed

2017-07-28 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16985 started by Deepak Jaiswal.
-
> LLAP IO: enable SMB join in elevator after the former is fixed
> --
>
> Key: HIVE-16985
> URL: https://issues.apache.org/jira/browse/HIVE-16985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
>
> We currently skip the IO elevator when we encounter an SMB join (see 
> HIVE-16761). However, it might work with elevator with the code commented out 
> in HIVE-16761. Need to look again after HIVE-16965 is fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work stopped] (HIVE-16985) LLAP IO: enable SMB join in elevator after the former is fixed

2017-07-28 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16985 stopped by Deepak Jaiswal.
-
> LLAP IO: enable SMB join in elevator after the former is fixed
> --
>
> Key: HIVE-16985
> URL: https://issues.apache.org/jira/browse/HIVE-16985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
>
> We currently skip the IO elevator when we encounter an SMB join (see 
> HIVE-16761). However, it might work with elevator with the code commented out 
> in HIVE-16761. Need to look again after HIVE-16965 is fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-16985) LLAP IO: enable SMB join in elevator after the former is fixed

2017-07-28 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16985 started by Deepak Jaiswal.
-
> LLAP IO: enable SMB join in elevator after the former is fixed
> --
>
> Key: HIVE-16985
> URL: https://issues.apache.org/jira/browse/HIVE-16985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
>
> We currently skip the IO elevator when we encounter an SMB join (see 
> HIVE-16761). However, it might work with elevator with the code commented out 
> in HIVE-16761. Need to look again after HIVE-16965 is fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16985) LLAP IO: enable SMB join in elevator after the former is fixed

2017-07-28 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-16985:
--
Status: Patch Available  (was: In Progress)

> LLAP IO: enable SMB join in elevator after the former is fixed
> --
>
> Key: HIVE-16985
> URL: https://issues.apache.org/jira/browse/HIVE-16985
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
>
> We currently skip the IO elevator when we encounter an SMB join (see 
> HIVE-16761). However, it might work with elevator with the code commented out 
> in HIVE-16761. Need to look again after HIVE-16965 is fixed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-07-28 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal resolved HIVE-16791.
---
Resolution: Fixed

Fixed with HIVE-16965

> Tez engine giving inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> --
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt, sample-data.tar.gz-aa, 
> sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-07-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105972#comment-16105972
 ] 

Sergey Shelukhin commented on HIVE-15665:
-

The test failures are due to counter changes in the LLAP tests. Will update with 
the next iteration.

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.01.patch, HIVE-15665.02.patch, 
> HIVE-15665.03.patch, HIVE-15665.04.patch, HIVE-15665.05.patch, 
> HIVE-15665.06.patch, HIVE-15665.07.patch, HIVE-15665.08.patch, 
> HIVE-15665.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16945) Add method to compare Operators

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105961#comment-16105961
 ] 

Hive QA commented on HIVE-16945:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879330/HIVE-16945.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth 
(batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth 
(batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=241)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=241)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6178/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6178/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6178/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879330 - PreCommit-HIVE-Build

> Add method to compare Operators 
> 
>
> Key: HIVE-16945
> URL: https://issues.apache.org/jira/browse/HIVE-16945
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Rui Li
> Attachments: HIVE-16945.1.patch, HIVE-16945.2.patch
>
>
> HIVE-10844 introduced a comparator factory class for operators that 
> encapsulates all the logic to assess whether two operators are equal:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
> The current design might create problems as any change in fields of operators 
> will break the comparators. It would be better to do this via inheritance 
> from Operator base class, by adding a {{logicalEquals(Operator other)}} 
> method.
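
A minimal sketch of what such a method might look like (simplified and illustrative; the real Operator hierarchy and descriptor classes have many more fields):
{code}
// Illustrative only: a logicalEquals hook on the Operator base class, so each
// subclass compares the fields it owns instead of a central comparator factory.
abstract class Operator<D> {
  protected D conf; // the operator descriptor

  public boolean logicalEquals(Operator<?> other) {
    return other != null
        && getClass() == other.getClass()
        && logicalEqualsConf(other);
  }

  // Subclasses compare their descriptor fields here.
  protected abstract boolean logicalEqualsConf(Operator<?> other);
}

// String stands in for a real descriptor such as FilterDesc.
class FilterOperator extends Operator<String> {
  @Override
  protected boolean logicalEqualsConf(Operator<?> other) {
    Object otherConf = ((FilterOperator) other).conf;
    return conf == null ? otherConf == null : conf.equals(otherConf);
  }
}
{code}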



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17205) add functional support

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17205:
--
Status: Patch Available  (was: Open)

> add functional support
> --
>
> Key: HIVE-17205
> URL: https://issues.apache.org/jira/browse/HIVE-17205
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17205.01.patch
>
>
> make sure unbucketed tables can be marked transactional=true
> make insert/update/delete/compaction work



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17205) add functional support

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17205:
--
Attachment: HIVE-17205.01.patch

> add functional support
> --
>
> Key: HIVE-17205
> URL: https://issues.apache.org/jira/browse/HIVE-17205
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17205.01.patch
>
>
> make sure unbucketed tables can be marked transactional=true
> make insert/update/delete/compaction work



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-16077:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Patch 8 committed to master (3.0).
Thanks, Prasanth, for the review.

> UPDATE/DELETE fails with numBuckets > numReducers
> -
>
> Key: HIVE-16077
> URL: https://issues.apache.org/jira/browse/HIVE-16077
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, 
> HIVE-16077.03.patch, HIVE-16077.08.patch
>
>
> don't think we have such tests for Acid path
> check if they exist for non-acid path
> way to record expected files on disk in ptest/qfile
> https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25
> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/;



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17206) make a version of Compactor specific to unbucketed tables

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17206:
-


> make a version of Compactor specific to unbucketed tables
> -
>
> Key: HIVE-17206
> URL: https://issues.apache.org/jira/browse/HIVE-17206
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> current Compactor will work but is not optimized/flexible enough



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17204) support un-bucketed tables in acid

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17204:
--
Description: this is only supported for "stored as ORC TBLPROPERTIES 
('transactional'='true', 'transactional_properties'='default')" introduced in 
HIVE-14035  (was: this is only supported for "stored as ORC TBLPROPERTIES 
('transactional'='true', 'transactional_properties'='default')")

> support un-bucketed tables in acid
> --
>
> Key: HIVE-17204
> URL: https://issues.apache.org/jira/browse/HIVE-17204
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> this is only supported for "stored as ORC TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='default')" introduced in 
> HIVE-14035



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17204) support un-bucketed tables in acid

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17204:
--
Description: this is only supported for "stored as ORC TBLPROPERTIES 
('transactional'='true', 'transactional_properties'='default')"

> support un-bucketed tables in acid
> --
>
> Key: HIVE-17204
> URL: https://issues.apache.org/jira/browse/HIVE-17204
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> this is only supported for "stored as ORC TBLPROPERTIES 
> ('transactional'='true', 'transactional_properties'='default')"



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17205) add functional support

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17205:
-


> add functional support
> --
>
> Key: HIVE-17205
> URL: https://issues.apache.org/jira/browse/HIVE-17205
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> make sure unbucketed tables can be marked transactional=true
> make insert/update/delete/compaction work



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17204) support un-bucketed tables in acid

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17204:
-


> support un-bucketed tables in acid
> --
>
> Key: HIVE-17204
> URL: https://issues.apache.org/jira/browse/HIVE-17204
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-16908:

Status: Open  (was: Patch Available)

Resubmitting, to run tests.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second instance of metastore thread with a different conf object that results 
> in the PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-16908:

Status: Patch Available  (was: Open)

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second instance of metastore thread with a different conf object that results 
> in the PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-16908:

Attachment: HIVE-16908.3.patch

Not my finest moment, but the following is the least obtrusive change to fix 
the problem, methinks. 

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch, 
> HIVE-16908.3.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second instance of metastore thread with a different conf object that results 
> in the PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105880#comment-16105880
 ] 

Mithun Radhakrishnan commented on HIVE-16908:
-

I'm taking a crack at this failure. Assigning this to myself, for the moment, 
for the sake of uploading a patch.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second instance of metastore thread with a different conf object that results 
> in the PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-16908:
---

Assignee: Mithun Radhakrishnan  (was: Sunitha Beeram)

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for example:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second instance of metastore thread with a different conf object that results 
> in the PersistenceManagerFactory being closed, and hence the tests fail.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17194) JDBC: Implement Gzip servlet filter

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105878#comment-16105878
 ] 

Hive QA commented on HIVE-17194:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879319/HIVE-17194.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6177/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6177/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6177/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-07-28 23:59:20.663
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-6177/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-07-28 23:59:20.666
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 5fc4900 HIVE-16965: SMB join may produce incorrect results 
(Deepak Jaiswal, reviewed by gopalv)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 5fc4900 HIVE-16965: SMB join may produce incorrect results 
(Deepak Jaiswal, reviewed by gopalv)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-07-28 23:59:26.882
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: common/src/java/org/apache/hive/http/HttpServer.java:33
error: common/src/java/org/apache/hive/http/HttpServer.java: patch does not 
apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879319 - PreCommit-HIVE-Build

> JDBC: Implement Gzip servlet filter
> ---
>
> Key: HIVE-17194
> URL: https://issues.apache.org/jira/browse/HIVE-17194
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2, JDBC
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-17194.1.patch
>
>
> {code}
> POST /cliservice HTTP/1.1
> Content-Type: application/x-thrift
> Accept: application/x-thrift
> User-Agent: Java/THttpClient/HC
> Authorization: Basic YW5vbnltb3VzOmFub255bW91cw==
> Content-Length: 71
> Host: localhost:10007
> Connection: Keep-Alive
> Accept-Encoding: gzip,deflate
> X-XSRF-HEADER: true
> {code}
> The Beeline client clearly sends out HTTP compression headers which are 
> ignored by the HTTP service layer in HS2.
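
One possible shape for such a filter, sketched with Jetty's built-in gzip support; this assumes the /cliservice endpoint is served through a Jetty handler chain and is not the actual patch:
{code}
import org.eclipse.jetty.server.Handler;
import org.eclipse.jetty.server.handler.gzip.GzipHandler;

public class GzipSupport {
  // Wrap an existing Jetty handler (e.g. the /cliservice context) so responses
  // are gzip-compressed when the client sends Accept-Encoding: gzip.
  public static Handler wrapWithGzip(Handler inner) {
    GzipHandler gzip = new GzipHandler();
    gzip.setIncludedMimeTypes("application/x-thrift");
    gzip.setMinGzipSize(1024); // skip very small payloads
    gzip.setHandler(inner);
    return gzip;
  }
}
{code}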



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in Hive On Spark

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105873#comment-16105873
 ] 

Hive QA commented on HIVE-16948:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879315/HIVE-16948.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6176/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6176/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6176/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879315 - PreCommit-HIVE-Build

> Invalid explain when running dynamic partition pruning query in Hive On Spark
> -
>
> Key: HIVE-16948
> URL: https://issues.apache.org/jira/browse/HIVE-16948
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
> Attachments: HIVE-16948_1.patch, HIVE-16948.2.patch, HIVE-16948.patch
>
>
> in 
> [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107]
>  in spark_dynamic_partition_pruning.q
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all 
> select distinct(ds) as ds from srcpart) s where s.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>   Edges:
> Reducer 11 <- Map 10 (GROUP, 1)
> Reducer 13 <- Map 12 (GROUP, 1)
>   DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>   Vertices:
> Map 10 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
> Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
> Group By Operator
>   aggregations: max(ds)
>   mode: hash
>   outputColumnNames: _col0
>   Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
>   Reduce Output Operator
> sort order: 
> Statistics: Num rows: 1 Data size: 184 Basic stats: 
> COMPLETE Column stats: NONE
> value expressions: _col0 (type: string)
> Map 12 
> Map Operator Tree:
> TableScan
>   alias: srcpart
>   Statistics: Num rows: 1 Data size: 23248 Basic stats: 
> PARTIAL Column stats: NONE
>   Select Operator
> expressions: ds (type: string)
> outputColumnNames: ds
>

[jira] [Commented] (HIVE-13989) Extended ACLs are not handled according to specification

2017-07-28 Thread Vaibhav Gumashta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105861#comment-16105861
 ] 

Vaibhav Gumashta commented on HIVE-13989:
-

[~cdrome] Thanks for the patch. I have a couple of questions on the overall 
approach (doc I'm using for reference: 
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-access-control#permissions-on-new-files-and-folders).
1. It appears that, for child directories, HDFS should correctly transfer the default 
ACLs. However, I understand that in Hive we want to avoid the HDFS permissions 
umasking (the traditional file permissions and not ACLs). Would it make sense 
to first let HDFS create the child directory (so that it transfers the 
default/access ACLs) and then set the desired permissions?
2. This comment will be relevant if we decide to manage the ACL transfer from 
parent to child: referring to the above doc, it seems that when transferring access 
ACLs, the rwx on other should be removed if it exists. We might need to consider 
that in the code.
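
To make the ACL discussion concrete, here is a small sketch of reading the effective group permission from the ACL entries (rather than FsPermission.getGroupAction()) and of keeping the parent's DEFAULT entries, as the quoted description below explains; the method names are illustrative, not the actual patch:
{code}
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.AclStatus;
import org.apache.hadoop.fs.permission.FsAction;

public class AclSketch {
  // With extended ACLs, the GROUP bits of FsPermission hold the mask, so read
  // the unnamed ACCESS group entry from the ACL itself instead.
  static FsAction effectiveGroupAction(AclStatus aclStatus, FsAction fallback) {
    for (AclEntry e : aclStatus.getEntries()) {
      if (e.getScope() == AclEntryScope.ACCESS
          && e.getType() == AclEntryType.GROUP
          && e.getName() == null) {
        return e.getPermission();
      }
    }
    return fallback; // no extended group entry; the plain permission bits apply
  }

  // Keep the parent's DEFAULT entries so child directories continue to inherit them.
  static List<AclEntry> defaultEntries(AclStatus aclStatus) {
    List<AclEntry> defaults = new ArrayList<>();
    for (AclEntry e : aclStatus.getEntries()) {
      if (e.getScope() == AclEntryScope.DEFAULT) {
        defaults.add(e);
      }
    }
    return defaults;
  }
}
{code}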



> Extended ACLs are not handled according to specification
> 
>
> Key: HIVE-13989
> URL: https://issues.apache.org/jira/browse/HIVE-13989
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-13989.1-branch-1.patch, HIVE-13989.1.patch, 
> HIVE-13989-branch-1.patch, HIVE-13989-branch-2.2.patch, 
> HIVE-13989-branch-2.2.patch, HIVE-13989-branch-2.2.patch
>
>
> Hive takes two approaches to working with extended ACLs depending on whether 
> data is being produced via a Hive query or HCatalog APIs. A Hive query will 
> run an FsShell command to recursively set the extended ACLs for a directory 
> sub-tree. HCatalog APIs will attempt to build up the directory sub-tree 
> programmatically and runs some code to set the ACLs to match the parent 
> directory.
> Some incorrect assumptions were made when implementing the extended ACLs 
> support. Refer to https://issues.apache.org/jira/browse/HDFS-4685 for the 
> design documents of extended ACLs in HDFS. These documents model the 
> implementation after the POSIX implementation on Linux, which can be found at 
> http://www.vanemery.com/Linux/ACL/POSIX_ACL_on_Linux.html.
> The code for setting extended ACLs via HCatalog APIs is found in 
> HdfsUtils.java:
> {code}
> if (aclEnabled) {
>   aclStatus =  sourceStatus.getAclStatus();
>   if (aclStatus != null) {
> LOG.trace(aclStatus.toString());
> aclEntries = aclStatus.getEntries();
> removeBaseAclEntries(aclEntries);
> //the ACL api's also expect the tradition user/group/other permission 
> in the form of ACL
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.USER, 
> sourcePerm.getUserAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.GROUP, 
> sourcePerm.getGroupAction()));
> aclEntries.add(newAclEntry(AclEntryScope.ACCESS, AclEntryType.OTHER, 
> sourcePerm.getOtherAction()));
>   }
> }
> {code}
> We found that DEFAULT extended ACL rules were not being inherited properly by 
> the directory sub-tree, so the above code is incomplete because it 
> effectively drops the DEFAULT rules. The second problem is with the call to 
> {{sourcePerm.getGroupAction()}}, which is incorrect in the case of extended 
> ACLs. When extended ACLs are used the GROUP permission is replaced with the 
> extended ACL mask. So the above code will apply the wrong permissions to the 
> GROUP. Instead the correct GROUP permissions now need to be pulled from the 
> AclEntry as returned by {{getAclStatus().getEntries()}}. See the 
> implementation of the new method {{getDefaultAclEntries}} for details.
> Similar issues exist with the HCatalog API. None of the APIs account for 
> setting extended ACLs on the directory sub-tree. The changes to the HCatalog 
> API allow the extended ACLs to be passed into the required methods similar to 
> how basic permissions are passed in. When building the directory sub-tree the 
> extended ACLs of the table directory are inherited by all sub-directories, 
> including the DEFAULT rules.
> Replicating the problem:
> Create a table to write data into (I will use acl_test as the destination and 
> words_text as the source) and set the ACLs as follows:
> {noformat}
> $ hdfs dfs -setfacl -m 
> default:user::rwx,default:group::r-x,default:mask::rwx,default:user:hdfs:rwx,group::r-x,user:hdfs:rwx
>  /user/cdrome/hive/acl_test
> $ hdfs dfs -ls -d /user/cdrome/hive/acl_test
> drwxrwx---+  - cdrome hdfs  0 2016-07-13 20:36 
> /user/cdrome/hive/acl_test
> $ hdfs dfs -getfacl -R /user/cdrome/hive/acl_test
> # file: /user/cdrome/hive/acl_test
> # owner: cdrome
> # group: 

[jira] [Commented] (HIVE-17174) LLAP: ShuffleHandler: optimize fadvise calls for broadcast edge

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105803#comment-16105803
 ] 

Hive QA commented on HIVE-17174:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879309/HIVE-17174.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=228)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6175/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6175/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6175/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879309 - PreCommit-HIVE-Build

> LLAP: ShuffleHandler: optimize fadvise calls for broadcast edge
> ---
>
> Key: HIVE-17174
> URL: https://issues.apache.org/jira/browse/HIVE-17174
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-17174.1.patch, HIVE-17174.2.patch
>
>
> Currently, once the data is transferred, an `fadvise` call is invoked to throw 
> away the pages. This may not be very helpful for broadcast edges, as they tend 
> to transfer the same data to multiple downstream tasks. 
> e.g Q50 at 1 TB scale
> {noformat}
>   Edges:
> Map 1 <- Map 5 (BROADCAST_EDGE)
> Map 6 <- Reducer 2 (BROADCAST_EDGE), Reducer 3 (BROADCAST_EDGE), 
> Reducer 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 3 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 4 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Reducer 7 <- Map 1 (CUSTOM_SIMPLE_EDGE), Map 10 (BROADCAST_EDGE), Map 
> 11 (BROADCAST_EDGE), Map 6 (CUSTOM_SIMPLE_EDGE)
> Reducer 8 <- Reducer 7 (SIMPLE_EDGE)
> Reducer 9 <- Reducer 8 (SIMPLE_EDGE)
> Status: Running (Executing on YARN cluster with App id 
> application_1490656001509_6084)
> --
> VERTICES  MODESTATUS  TOTAL  COMPLETED  RUNNING  PENDING  
> FAILED  KILLED
> --
> Map 5 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 1 ..  llap SUCCEEDED 11 1100  
>  0   0
> Reducer 4 ..  llap SUCCEEDED  1  100  
>  0   0
> Reducer 2 ..  llap SUCCEEDED  1  100  
>  0   0
> Reducer 3 ..  llap SUCCEEDED  1  100  
>  0   0
> Map 6 ..  llap SUCCEEDED13913900  
>  0   0
> Map 10 .  llap SUCCEEDED  1  100  
>  0   0
> Map 11 .  llap SUCCEEDED  1  100  
>  0   0
> Reducer 7 ..  llap SUCCEEDED83483400  
>  0   0
> Reducer 8 ..  llap SUCCEEDED 24 2400  
>  0   0
> Reducer 9 ..  llap SUCCEEDED  1  100  
>  0   0
> --
> e.g count of evictions on files
> 139 
> 

[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results

2017-07-28 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-16965:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Committed to master

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Fix For: 3.0.0
>
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17191) Add InterfaceAudience and InterfaceStability annotations for StorageHandler APIs

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105676#comment-16105676
 ] 

Hive QA commented on HIVE-17191:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879304/HIVE-17191.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=226)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6174/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6174/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6174/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879304 - PreCommit-HIVE-Build

> Add InterfaceAudience and InterfaceStability annotations for StorageHandler 
> APIs
> 
>
> Key: HIVE-17191
> URL: https://issues.apache.org/jira/browse/HIVE-17191
> Project: Hive
>  Issue Type: Sub-task
>  Components: StorageHandler
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17191.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool

2017-07-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105613#comment-16105613
 ] 

Alan Gates commented on HIVE-17167:
---

Sort of, but not quite like that.  The constructor to MetastoreConf is private 
so that an instance of this object can never be instantiated.  The intent is to 
use Configuration instead, since this allows passing in a HiveConf and it will 
"just work".  So doing:
{code}
Configuration conf = MetastoreConf.newMetastoreConf();
MetastoreConf.setVar(conf, ConfVars.X, "val");
{code}
will work and avoid the issue you are concerned about.

The loophole is that a user could do:
{code}
Configuration conf = MetastoreConf.newMetastoreConf();
conf.set("metastore.myconfig.name", "x");
conf.set("hive.metastore.myconfig.name", "y");
{code}

So the rule is, as long as the users use the MetastoreConf methods, all is 
good.  If not, things can go sideways on them.  For clients this should be ok 
as they should only be interacting with the config by calling the setMetaConf 
methods in IMetaStoreClient, which will do the right thing.  For hook writers 
and Hive developers, they will have to be aware of this subtlety if they want 
to set metastore configuration variables.  That is unfortunate.

If MetastoreConf subclasses Configuration (as HiveConf does) then it will not 
be able to operate on a HiveConf object as is.  It would have to construct a 
new instance of Configuration from HiveConf, which is expensive.  It seems to 
me better to optimize for interoperability.
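
For illustration, the pattern described above boils down to something like the following sketch. The names here are hypothetical (this is not the actual MetastoreConf code); it only shows a non-instantiable utility class whose static accessors operate on a plain Hadoop Configuration and keep the new and legacy key spellings consistent:

{code:java}
import org.apache.hadoop.conf.Configuration;

public final class ConfUtilSketch {

  public enum Vars {
    MYCONFIG("metastore.myconfig.name", "hive.metastore.myconfig.name", "default");

    final String key;      // new metastore-only key
    final String hiveKey;  // legacy HiveConf key
    final String defVal;

    Vars(String key, String hiveKey, String defVal) {
      this.key = key;
      this.hiveKey = hiveKey;
      this.defVal = defVal;
    }
  }

  private ConfUtilSketch() {}  // never instantiated; static methods only

  public static void setVar(Configuration conf, Vars var, String val) {
    // Writing through the utility keeps both spellings of the key in sync.
    conf.set(var.key, val);
    conf.set(var.hiveKey, val);
  }

  public static String getVar(Configuration conf, Vars var) {
    // Prefer the new key, fall back to the legacy key, then the default.
    String v = conf.get(var.key);
    if (v == null) {
      v = conf.get(var.hiveKey, var.defVal);
    }
    return v;
  }
}
{code}

As long as callers go through the static setVar/getVar methods, the two key spellings cannot diverge, which is the guarantee being described above.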

> Create metastore specific configuration tool
> 
>
> Key: HIVE-17167
> URL: https://issues.apache.org/jira/browse/HIVE-17167
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17167.patch
>
>
> As part of making the metastore a separately releasable module we need 
> configuration tools that are specific to that module.  It cannot use or 
> extend HiveConf as that is in hive common.  But it must take a HiveConf 
> object and be able to operate on it.
> The best way to achieve this is using Hadoop's Configuration object (which 
> HiveConf extends) together with enums and static methods.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17202) Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17202:

Attachment: HIVE-17202.1.patch

> Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs
> --
>
> Key: HIVE-17202
> URL: https://issues.apache.org/jira/browse/HIVE-17202
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17202.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Attachment: (was: HIVE-17190.patch)

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.2.patch
>
>
> Missed in HIVE-16997



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17202) Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17202:

Status: Patch Available  (was: Open)

> Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs
> --
>
> Key: HIVE-17202
> URL: https://issues.apache.org/jira/browse/HIVE-17202
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17202.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Attachment: HIVE-17190.2.patch

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.2.patch
>
>
> Missed in HIVE-16997



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Status: Patch Available  (was: Open)

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.2.patch
>
>
> Missed in HIVE-16997



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Summary: Schema changes for bitvectors for unpartitioned tables  (was: 
Don't store bitvectors for unpartitioned table)

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.patch
>
>
> Since current ones can't be intersected, there is no advantage of storing 
> them for unpartitioned tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17190) Schema changes for bitvectors for unpartitioned tables

2017-07-28 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17190:

Description: Missed in HIVE-16997  (was: Since current ones can't be 
intersected, there is no advantage of storing them for unpartitioned tables.)

> Schema changes for bitvectors for unpartitioned tables
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.patch
>
>
> Missed in HIVE-16997



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-4362) Allow Hive unit tests to run against fully-distributed cluster

2017-07-28 Thread Mark Grover (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Grover reassigned HIVE-4362:
-

Assignee: (was: Mark Grover)

> Allow Hive unit tests to run against fully-distributed cluster
> --
>
> Key: HIVE-4362
> URL: https://issues.apache.org/jira/browse/HIVE-4362
> Project: Hive
>  Issue Type: Improvement
>  Components: Testing Infrastructure
>Affects Versions: 0.10.0
>Reporter: Mark Grover
>
> It seems like Hive unit tests can run in (Hadoop) local mode or miniMR mode. 
> It would be nice (especially for projects like Apache Bigtop) to be able to 
> run Hive tests in fully distributed mode.
> This JIRA tracks the introduction of such functionality.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105597#comment-16105597
 ] 

Alan Gates commented on HIVE-17189:
---

Adding a test that exercises the "new" code paths would be good, especially for 
the alter_table call.

Other than that, +1.

> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17189.01.patch
>
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of @public 
> interface of MetastoreClient and removes some methods which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into newly added method 
> {{alter_table_with_environment_context}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17192) Add InterfaceAudience and InterfaceStability annotations for Stats Collection APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17192:

Status: Patch Available  (was: Open)

> Add InterfaceAudience and InterfaceStability annotations for Stats Collection 
> APIs
> --
>
> Key: HIVE-17192
> URL: https://issues.apache.org/jira/browse/HIVE-17192
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17192.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17192) Add InterfaceAudience and InterfaceStability annotations for Stats Collection APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17192:

Attachment: HIVE-17192.1.patch

> Add InterfaceAudience and InterfaceStability annotations for Stats Collection 
> APIs
> --
>
> Key: HIVE-17192
> URL: https://issues.apache.org/jira/browse/HIVE-17192
> Project: Hive
>  Issue Type: Sub-task
>  Components: Statistics
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17192.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105554#comment-16105554
 ] 

Hive QA commented on HIVE-17139:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879301/HIVE-17139.4.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce_3]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets_grouping]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6173/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6173/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6173/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879301 - PreCommit-HIVE-Build

> Conditional expressions optimization: skip the expression evaluation if the 
> condition is not satisfied for vectorization engine.
> 
>
> Key: HIVE-17139
> URL: https://issues.apache.org/jira/browse/HIVE-17139
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ke Jia
>Assignee: Ke Jia
> Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch, 
> HIVE-17139.3.patch, HIVE-17139.4.patch
>
>
> CASE WHEN and IF statement execution for Hive vectorization is not 
> optimal: in the current implementation, all of the conditional and else 
> expressions are evaluated. The optimized approach is to update the selected 
> array of the batch parameter after the conditional expression is executed, so 
> that the else expression only processes the selected rows instead of all of them.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17129) Increase usage of InterfaceAudience and InterfaceStability annotations

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105541#comment-16105541
 ] 

Sergio Peña commented on HIVE-17129:


Got it.
+1

> Increase usage of InterfaceAudience and InterfaceStability annotations 
> ---
>
> Key: HIVE-17129
> URL: https://issues.apache.org/jira/browse/HIVE-17129
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{InterfaceAudience}} and {{InterfaceStability}} annotations were added a 
> while ago to mark certain classes as available for public use. However, they 
> were only added to a few classes. The annotations are largely missing for 
> major APIs such as the SerDe and UDF APIs. We should update these interfaces 
> to use these annotations.
> When done in conjunction with HIVE-17130, we should have an automated way to 
> prevent backwards incompatible changes to Hive APIs.
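
As a rough illustration of what such markings look like, here is a hedged sketch. The class and method are placeholders (not a real Hive API), and the annotation package is assumed to be the Hadoop-style classification annotations shipped with Hive; the exact package may differ:

{code:java}
// Assumed package for the classification annotations; placeholders below are not real APIs.
import org.apache.hadoop.hive.common.classification.InterfaceAudience;
import org.apache.hadoop.hive.common.classification.InterfaceStability;

@InterfaceAudience.Public     // intended for use by external clients
@InterfaceStability.Stable    // incompatible signature changes need a deprecation cycle
public class ExamplePublicApi {
  public String describe() {
    return "part of the supported public surface";
  }
}
{code}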



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105539#comment-16105539
 ] 

Sergio Peña commented on HIVE-16357:


I'm not sure why either. I don't think we should trigger events for failed 
operations, but the code was already there. Perhaps we could fix this and 
trigger the events only when the operation succeeds?
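
A minimal sketch of the proposed behavior, using hypothetical types rather than the actual HiveMetaStore code: the listener notification in the finally block is guarded by a success flag, so a failed directory creation surfaces the original MetaException instead of a FileNotFoundException from the listener.

{code:java}
public class CreateTableSketch {

  interface TableListener {
    void onCreateTable(String tableName) throws Exception;
  }

  private final java.util.List<TableListener> listeners = new java.util.ArrayList<>();

  public void createTable(String tableName, java.nio.file.Path tblPath) throws Exception {
    boolean success = false;
    try {
      // Stand-in for wh.mkdirs(tblPath, true); throws if the directory cannot be created.
      java.nio.file.Files.createDirectories(tblPath);
      // ... persist table metadata here ...
      success = true;
    } finally {
      if (success) {                       // skip notification when the create failed
        for (TableListener l : listeners) {
          l.onCreateTable(tableName);
        }
      }
    }
  }
}
{code}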

> Failed folder creation when creating a new table is reported incorrectly
> 
>
> Key: HIVE-16357
> URL: https://issues.apache.org/jira/browse/HIVE-16357
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16357.01.patch, HIVE-16357.02.patch
>
>
> If the directory for a Hive table could not be created, then the HMS will 
> throw a MetaException:
> {code}
>  if (tblPath != null) {
>   if (!wh.isDir(tblPath)) {
> if (!wh.mkdirs(tblPath, true)) {
>   throw new MetaException(tblPath
>   + " is not a directory or unable to create one");
> }
> madeDir = true;
>   }
> }
> {code}
> However in the finally block we always try to call the 
> DbNotificationListener, which in turn will also throw an exception because 
> the directory is missing, overwriting the initial exception with a 
> FileNotFoundException.
> Actual stacktrace seen by the caller:
> {code}
> 2017-04-03T05:58:00,128 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: 
> MetaException(message:java.lang.RuntimeException: 
> java.io.FileNotFoundException: File file:/.../0 does not exist)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6074)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1496)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11125)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11109)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File 
> file:/.../0 does not exist
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener$FileIterator.<init>(DbNotificationListener.java:203)
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateTable(DbNotificationListener.java:137)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1463)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1482)
>   ... 20 more
> Caused by: java.io.FileNotFoundException: File file:/.../0 does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
>   at 

[jira] [Commented] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration

2017-07-28 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105538#comment-16105538
 ] 

slim bouguerra commented on HIVE-17160:
---

[~sseth] the old path was not working, so I guess there is no need to add any 
new method.

> Adding kerberos Authorization to the Druid hive integration
> ---
>
> Key: HIVE-17160
> URL: https://issues.apache.org/jira/browse/HIVE-17160
> Project: Hive
>  Issue Type: New Feature
>  Components: Druid integration
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-17160.patch
>
>
> This goal of this feature is to allow hive querying a secured druid cluster 
> using kerberos credentials.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16357) Failed folder creation when creating a new table is reported incorrectly

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105523#comment-16105523
 ] 

Sahil Takiar commented on HIVE-16357:
-

I think [~pvary] brings up a valid question. Why do we fire events even if the 
operation failed? [~spena] any ideas?

It seems {{NotificationListener}} always checks the status of a given event 
before firing a notification for it.

> Failed folder creation when creating a new table is reported incorrectly
> 
>
> Key: HIVE-16357
> URL: https://issues.apache.org/jira/browse/HIVE-16357
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-16357.01.patch, HIVE-16357.02.patch
>
>
> If the directory for a Hive table could not be created, then the HMS will 
> throw a MetaException:
> {code}
>  if (tblPath != null) {
>   if (!wh.isDir(tblPath)) {
> if (!wh.mkdirs(tblPath, true)) {
>   throw new MetaException(tblPath
>   + " is not a directory or unable to create one");
> }
> madeDir = true;
>   }
> }
> {code}
> However in the finally block we always try to call the 
> DbNotificationListener, which in turn will also throw an exception because 
> the directory is missing, overwriting the initial exception with a 
> FileNotFoundException.
> Actual stacktrace seen by the caller:
> {code}
> 2017-04-03T05:58:00,128 ERROR [pool-7-thread-2] metastore.RetryingHMSHandler: 
> MetaException(message:java.lang.RuntimeException: 
> java.io.FileNotFoundException: File file:/.../0 does not exist)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:6074)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1496)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
>   at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
>   at com.sun.proxy.$Proxy28.create_table_with_environment_context(Unknown 
> Source)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11125)
>   at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:11109)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at 
> org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File 
> file:/.../0 does not exist
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener$FileIterator.<init>(DbNotificationListener.java:203)
>   at 
> org.apache.hive.hcatalog.listener.DbNotificationListener.onCreateTable(DbNotificationListener.java:137)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1463)
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1482)
>   ... 20 more
> Caused by: java.io.FileNotFoundException: File file:/.../0 does not exist
>   at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:429)
>   at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1515)
>   at 

[jira] [Commented] (HIVE-17167) Create metastore specific configuration tool

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105495#comment-16105495
 ] 

Vihang Karajgaonkar commented on HIVE-17167:


Hi [~alangates], thanks for the patch. Quick question regarding the patch: it looks 
like it is possible to store two different values for MetaConf.Confvars.varname 
and MetaConf.Confvars.hivename. For example, a user could do the following:

{noformat}
MetaConf metaConf = newMetaConf();
metaConf.set("metastore.myconfig.name", "X");
metaConf.set("hive.metastore.myconfig.name", "Y");
{noformat}

In this case will MetaConf.get(metaConf, "metastore.myconfig.name") and 
MetaConf.get(metaConf, "hive.metastore.myconfig.name") return two different 
values? Shouldn't the set call check if a corresponding equivalent key is set 
as well and if yes, overwrite it as well?


> Create metastore specific configuration tool
> 
>
> Key: HIVE-17167
> URL: https://issues.apache.org/jira/browse/HIVE-17167
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17167.patch
>
>
> As part of making the metastore a separately releasable module we need 
> configuration tools that are specific to that module.  It cannot use or 
> extend HiveConf as that is in hive common.  But it must take a HiveConf 
> object and be able to operate on it.
> The best way to achieve this is using Hadoop's Configuration object (which 
> HiveConf extends) together with enums and static methods.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17190) Don't store bitvectors for unpartitioned table

2017-07-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105492#comment-16105492
 ] 

Ashutosh Chauhan commented on HIVE-17190:
-

Yes, it will help with auto-gathering of column stats, where we collect stats for 
newly inserted data. We can update the bit vectors in those scenarios to get a 
better NDV estimate.
So, yes, we shall store bitvectors for unpartitioned tables too. Currently, some 
of the upgrade scripts don't cover that. I will use this JIRA for that and change 
the description accordingly.

> Don't store bitvectors for unpartitioned table
> --
>
> Key: HIVE-17190
> URL: https://issues.apache.org/jira/browse/HIVE-17190
> Project: Hive
>  Issue Type: Test
>  Components: Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-17190.patch
>
>
> Since current ones can't be intersected, there is no advantage of storing 
> them for unpartitioned tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17203) Add InterfaceAudience and InterfaceStability annotations for HCat APIs

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105476#comment-16105476
 ] 

Sahil Takiar commented on HIVE-17203:
-

Right now I'm thinking of marking the following classes as Public:

* Most of the classes under {{org.apache.hive.hcatalog.api}}
* Maybe a few more classes under the hive-hcatalog-core package

> Add InterfaceAudience and InterfaceStability annotations for HCat APIs
> --
>
> Key: HIVE-17203
> URL: https://issues.apache.org/jira/browse/HIVE-17203
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17179) Add InterfaceAudience and InterfaceStability annotations for Hook APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17179:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the review [~aihuaxu]. Merged to master.

> Add InterfaceAudience and InterfaceStability annotations for Hook APIs
> --
>
> Key: HIVE-17179
> URL: https://issues.apache.org/jira/browse/HIVE-17179
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hooks
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17179.1.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17138) FileSinkOperator doesn't create empty files for acid path

2017-07-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17138:
--
Description: 
For bucketed tables, FileSinkOperator is expected (in some cases)  to produce a 
specific number of files even if they are empty.
FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty.

This doesn't work properly for the Acid path.  For Insert, the OrcRecordUpdater(s) 
is set up in createBucketForFileIdx(), which creates the actual bucketN file (as 
of HIVE-14007, it does so regardless of whether the RecordUpdater sees any rows).  
This causes empty (i.e. ORC metadata only) bucket files to be created for 
multiFileSpray=true if a particular FileSinkOperator.process() sees at least 1 
row.  For example,
{noformat}
create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored 
as orc TBLPROPERTIES ('transactional'='true');
insert into fourbuckets values(0,1),(1,1);
with mapreduce.job.reduces = 1 or 2 
{noformat}

For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st 
row that needs to land there is seen.  Thus it never creates empty buckets no 
matter what the value of _skipFiles_ is in closeOp(boolean).

Once Split Update does the split early (in the operator pipeline), only the 
Insert path will matter, since base and delta are the only files that split 
computation, etc. looks at.  delete_delta is only for Acid internals, so there 
is never any reason to create empty files there.


Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters()

  was:
For bucketed tables, FileSinkOperator is expected (in some cases)  to produce a 
specific number of files even if they are empty.
FileSinkOperator.closeOp(boolean abort) has logic to create files even if empty.

This doesn't work properly for the Acid path.  For Insert, the OrcRecordUpdater(s) 
is set up in createBucketForFileIdx(), which creates the actual bucketN file (as 
of HIVE-14007, it does so regardless of whether the RecordUpdater sees any rows).  
This causes empty (i.e. ORC metadata only) bucket files to be created for 
multiFileSpray=true if a particular FileSinkOperator.process() sees at least 1 
row.  For example,
{noformat}
create table fourbuckets (a int, b int) clustered by (a) into 4 buckets stored 
as orc TBLPROPERTIES ('transactional'='true');
insert into fourbuckets values(0,1),(1,1);
with mapreduce.job.reduces = 1 or 2 
{noformat}

For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st 
row that needs to land there is seen.  Thus it never creates empty buckets no 
matter what the value of _skipFiles_ is in closeOp(boolean).

Once Split Update does the split early (in the operator pipeline), only the 
Insert path will matter, since base and delta are the only files that split 
computation, etc. looks at.  delete_delta is only for Acid internals, so there 
is never any reason to create empty files there.



> FileSinkOperator doesn't create empty files for acid path
> -
>
> Key: HIVE-17138
> URL: https://issues.apache.org/jira/browse/HIVE-17138
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> For bucketed tables, FileSinkOperator is expected (in some cases)  to produce 
> a specific number of files even if they are empty.
> FileSinkOperator.closeOp(boolean abort) has logic to create files even if 
> empty.
> This doesn't work properly for the Acid path.  For Insert, the 
> OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
> actual bucketN file (as of HIVE-14007, it does so regardless of whether the 
> RecordUpdater sees any rows).  This causes empty (i.e. ORC metadata only) 
> bucket files to be created for multiFileSpray=true if a particular 
> FileSinkOperator.process() sees at least 1 row.  For example,
> {noformat}
> create table fourbuckets (a int, b int) clustered by (a) into 4 buckets 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into fourbuckets values(0,1),(1,1);
> with mapreduce.job.reduces = 1 or 2 
> {noformat}
> For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st 
> row that needs to land there is seen.  Thus it never creates empty buckets no 
> matter what the value of _skipFiles_ is in closeOp(boolean).
> Once Split Update does the split early (in the operator pipeline), only the 
> Insert path will matter, since base and delta are the only files that split 
> computation, etc. looks at.  delete_delta is only for Acid internals, so there 
> is never any reason to create empty files there.
> Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters()
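
As a rough sketch of the abort-time cleanup mentioned in the last line above (hypothetical field and interface names, not the actual FileSinkOperator code): every RecordUpdater that was opened is closed with the abort flag set, so partial output is discarded rather than committed.

{code:java}
public class AbortWritersSketch {

  interface RecordUpdater {
    void close(boolean abort) throws java.io.IOException;
  }

  private RecordUpdater[][] updaters;  // [file spray index][bucket]

  void abortWriters() {
    if (updaters == null) {
      return;
    }
    for (RecordUpdater[] row : updaters) {
      if (row == null) {
        continue;
      }
      for (RecordUpdater updater : row) {
        if (updater == null) {
          continue;
        }
        try {
          updater.close(true);           // true => abort: drop buffered rows
        } catch (java.io.IOException e) {
          // best effort during abort; keep closing the remaining updaters
        }
      }
    }
  }
}
{code}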



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105428#comment-16105428
 ] 

Hive QA commented on HIVE-16965:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879285/HIVE-16965.8.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6172/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6172/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6172/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879285 - PreCommit-HIVE-Build

> SMB join may produce incorrect results
> --
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Deepak Jaiswal
> Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, 
> HIVE-16965.3.patch, HIVE-16965.4.patch, HIVE-16965.5.patch, 
> HIVE-16965.6.patch, HIVE-16965.7.patch, HIVE-16965.8.patch
>
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q 
> smallint)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
>   CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b 
> select cbigint, cfloat from alltypesorc
>   where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect. 
> cc [~djaiswal] [~hagleitn]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers

2017-07-28 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105429#comment-16105429
 ] 

Prasanth Jayachandran commented on HIVE-16077:
--

+1. Looks good to me. Only comment is the one you had already mentioned in the 
patch related to closing on abort. Will be good to add that too. 

> UPDATE/DELETE fails with numBuckets > numReducers
> -
>
> Key: HIVE-16077
> URL: https://issues.apache.org/jira/browse/HIVE-16077
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.1.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, 
> HIVE-16077.03.patch, HIVE-16077.08.patch
>
>
> don't think we have such tests for Acid path
> check if they exist for non-acid path
> way to record expected files on disk in ptest/qfile
> https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25
> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/;



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105399#comment-16105399
 ] 

Zoltan Haindrich commented on HIVE-17201:
-

I feel that disabling unit tests is a bad idea... it's not the tests which have 
been broken - I perfectly understand that the code has used these things 
incorrectly... but in the earlier state it was working.
Because these failures indicate that the current code does not perform as expected, 
I think we should instead consider reverting the original change (HIVE-16844).

> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17201.1.patch
>
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105400#comment-16105400
 ] 

Sahil Takiar commented on HIVE-16998:
-

Final patch LGTM pending results from Hive QA.

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.
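
A hedged sketch of the kind of gate being proposed (the config key names below are invented for illustration and are not the keys added by the patch): DPP, and therefore the operator-tree split, is applied only when either the map-join-only mode is off or the consuming join is a map-join.

{code:java}
import org.apache.hadoop.conf.Configuration;

public final class DppGateSketch {

  private DppGateSketch() {}

  public static boolean shouldApplyDpp(Configuration conf, boolean targetIsMapJoin) {
    // Hypothetical key names, for illustration only.
    boolean dppEnabled = conf.getBoolean("hive.spark.dpp.enabled.sketch", false);
    boolean mapJoinOnly = conf.getBoolean("hive.spark.dpp.only.mapjoin.sketch", false);
    if (!dppEnabled) {
      return false;
    }
    // In map-join-only mode, skip DPP (and the operator-tree split) unless the
    // partitioned side is consumed by a map-join.
    return !mapJoinOnly || targetIsMapJoin;
  }
}
{code}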



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-07-28 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Patch Available  (was: Open)

Re-uploading same patch after renaming. Hopefully ptests will run this time.

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch
>
>
> Currently, join ordering completely bails out in the absence of statistics, and 
> this could lead to bad joins such as cross joins.
> e.g. the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING)
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out and help 
> it come up with a join that is at least better than a cross join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-07-28 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Attachment: HIVE-16811.3.patch

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch
>
>
> Currently, join ordering completely bails out in the absence of statistics, and 
> this could lead to bad joins such as cross joins.
> e.g. the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING)
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out and help 
> it come up with a join that is at least better than a cross join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-28 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105383#comment-16105383
 ] 

Janaki Lahorani commented on HIVE-16998:


Resolved merge conflicts with HIVE-17087.  Addressed comments from [~stakiar]

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16811) Estimate statistics in absence of stats

2017-07-28 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-16811:
---
Status: Open  (was: Patch Available)

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch
>
>
> Currently, join ordering completely bails out in the absence of statistics, and 
> this could lead to bad joins such as cross joins.
> e.g. the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING)
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out and help 
> it come up with a join that is at least better than a cross join.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins

2017-07-28 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-16998:
---
Attachment: HIVE16998.5.patch

> Add config to enable HoS DPP only for map-joins
> ---
>
> Key: HIVE-16998
> URL: https://issues.apache.org/jira/browse/HIVE-16998
> Project: Hive
>  Issue Type: Sub-task
>  Components: Logical Optimizer, Spark
>Reporter: Sahil Takiar
>Assignee: Janaki Lahorani
> Attachments: HIVE16998.1.patch, HIVE16998.2.patch, HIVE16998.3.patch, 
> HIVE16998.4.patch, HIVE16998.5.patch
>
>
> HoS DPP will split a given operator tree in two under the following 
> conditions: it has detected that the query can benefit from DPP, and the 
> filter is not a map-join (see SplitOpTreeForDPP).
> This can hurt performance if the non-partitioned side of the join 
> involves a complex operator tree - e.g. the query {{select count(*) from 
> srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all 
> select min(srcpart.ds) from srcpart)}} will require running the subquery 
> twice, once in each Spark job.
> Queries with map-joins don't get split into two operator trees and thus don't 
> suffer from this drawback. Thus, it would be nice to have a config key that 
> just enables DPP on HoS for map-joins.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16908) Failures in TestHcatClient due to HIVE-16844

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105373#comment-16105373
 ] 

Mithun Radhakrishnan commented on HIVE-16908:
-

[~sbeeram]: I have raised HIVE-17201 to temporarily disable these three tests, 
to clean up the tests for others. This is only temporary, until we fix the 
tests properly.

bq. If the target metastore instance were accessed through a different 
classloader...
I made an initial pass at doing this. I don't have a proper solution yet. Will 
update.

> Failures in TestHcatClient due to HIVE-16844
> 
>
> Key: HIVE-16908
> URL: https://issues.apache.org/jira/browse/HIVE-16908
> Project: Hive
>  Issue Type: Bug
>Reporter: Sunitha Beeram
>Assignee: Sunitha Beeram
> Attachments: HIVE-16908.1.patch, HIVE-16908.2.patch
>
>
> Some of the tests in TestHCatClient.java, for ex:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}
> are failing due to HIVE-16844. HIVE-16844 fixes a connection leak when a new 
> configuration object is set on the ObjectStore. TestHCatClient fires up a 
> second metastore thread with a different conf object, which results 
> in the PersistenceManagerFactory being closed, and hence the tests fail.
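
One possible direction, sketched abstractly below (the class, field, and helper names are hypothetical; this is neither the actual ObjectStore code nor the eventual fix): only recreate the PersistenceManagerFactory when the datastore-relevant properties really change, so a second embedded metastore started with an equivalent conf does not close the factory out from under the first one.

{code:java}
import java.util.Properties;
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManagerFactory;

// Sketch only: the point is the "compare before closing" guard.
class PmfGuardSketch {
  private Properties currentProps;
  private PersistenceManagerFactory pmf;

  synchronized void setDatastoreProperties(Properties newProps) {
    if (newProps.equals(currentProps)) {
      return;               // same connection settings: keep the existing PMF
    }
    if (pmf != null) {
      pmf.close();          // this closure is what the failing tests trip over
    }
    currentProps = newProps;
    pmf = JDOHelper.getPersistenceManagerFactory(newProps);
  }
}
{code}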



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17201:

Status: Patch Available  (was: Open)

Submitting, to check the tests.

> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17201.1.patch
>
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-17201:

Attachment: HIVE-17201.1.patch

The following should clean up the build, for now.
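
Presumably the patch just marks the three tests with JUnit's {{@Ignore}}; a sketch of that shape (not the actual patch contents) would be:

{code:java}
import org.junit.Ignore;
import org.junit.Test;

public class TestHCatClient {
  @Ignore("Temporarily disabled until the rework in HIVE-16908 lands")
  @Test
  public void testTableSchemaPropagation() throws Exception {
    // ... existing test body left unchanged ...
  }
  // testPartitionRegistrationWithCustomSchema and
  // testPartitionSpecRegistrationWithCustomSchema would get the same annotation.
}
{code}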

> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17201.1.patch
>
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17203) Add InterfaceAudience and InterfaceStability annotations for HCat APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-17203:
---


> Add InterfaceAudience and InterfaceStability annotations for HCat APIs
> --
>
> Key: HIVE-17203
> URL: https://issues.apache.org/jira/browse/HIVE-17203
> Project: Hive
>  Issue Type: Sub-task
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17202) Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar reassigned HIVE-17202:
---


> Add InterfaceAudience and InterfaceStability annotations for HMS Listener APIs
> --
>
> Key: HIVE-17202
> URL: https://issues.apache.org/jira/browse/HIVE-17202
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17129) Increase usage of InterfaceAudience and InterfaceStability annotations

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105359#comment-16105359
 ] 

Sahil Takiar commented on HIVE-17129:
-

Technically, users can use whatever Hive classes they want, as long as the class 
declaration is public, e.g. {{public class CreateTableEvent}}. The annotations 
don't actually stop users from using a class; they are just there to inform 
users which classes they *should* (or shouldn't) be using.

If {{MetaStoreEventListener}} should be public, then I suggest we make 
{{ListenerEvent}} and all classes used by {{MetaStoreEventListener}} public 
too. Generally, if an interface is marked as Public, then all classes 
used by that interface should also be Public. For example, if we make 
{{MetaStoreEventListener}} Public, then {{ConfigChangeEvent}}, {{CreateTableEvent}}, 
{{DropTableEvent}}, etc. should be Public too.
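
For reference, applying the convention is just a class-level marker, assuming Hive's own annotation classes in {{org.apache.hadoop.hive.common.classification}} (illustrative placement, not a patch):

{code:java}
import org.apache.hadoop.hive.common.classification.InterfaceAudience;
import org.apache.hadoop.hive.common.classification.InterfaceStability;

// The annotations document intent only; they do not restrict access at
// compile time or runtime.
@InterfaceAudience.Public
@InterfaceStability.Stable
public abstract class ListenerEvent {
  // ... event status, environment context, etc. ...
}
{code}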

> Increase usage of InterfaceAudience and InterfaceStability annotations 
> ---
>
> Key: HIVE-17129
> URL: https://issues.apache.org/jira/browse/HIVE-17129
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{InterfaceAudience}} and {{InterfaceStability}} annotations were added a 
> while ago to mark certain classes as available for public use. However, they 
> were only added to a few classes. The annotations are largely missing for 
> major APIs such as the SerDe and UDF APIs. We should update these interfaces 
> to use these annotations.
> When done in conjunction with HIVE-17130, we should have an automated way to 
> prevent backwards incompatible changes to Hive APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16974) Change the sort key for the schema tool validator to be

2017-07-28 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105354#comment-16105354
 ] 

Aihua Xu commented on HIVE-16974:
-

Thanks for the explanation. I think it's already a good improvement now.

+1.

> Change the sort key for the schema tool validator to be 
> 
>
> Key: HIVE-16974
> URL: https://issues.apache.org/jira/browse/HIVE-16974
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-16974.patch, HIVE-16974.patch
>
>
> In HIVE-16729, we introduced ordering of results/failures returned by 
> schematool's validators. This allows fault injection testing to expect 
> results that can be verified. However, they were sorted on NAME values which 
> in the HMS schema can be NULL. So if the introduced fault has a NULL/BLANK 
> name column value, the result could be different depending on the backend 
> database (if they sort NULLs first or last).
> So I think it is better to sort on a non-null column value.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105355#comment-16105355
 ] 

Vihang Karajgaonkar commented on HIVE-17189:


[~ashutoshc] [~pxiong] Can you please take a look? Thanks!

> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17189.01.patch
>
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of the @public 
> interface of MetastoreClient and removes some methods, which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into the newly added method 
> {{alter_table_with_environment_context}}.
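
A sketch of the described shim (a method fragment shown without imports; the exact parameter names and the delegate's signature are assumptions and may differ in the actual patch):

{code:java}
// Re-introduced for backwards compatibility: the old signature simply delegates
// to the newer method, passing a null EnvironmentContext.
public void alter_table(String dbname, String tblName, Table newTbl)
    throws InvalidOperationException, MetaException, TException {
  alter_table_with_environment_context(dbname, tblName, newTbl, null);
}
{code}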



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17184) Unexpected new line in beeline output when running with -f option

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17184:
---
Fix Version/s: 2.4.0
   3.0.0

pushed to branch-2 as well.

> Unexpected new line in beeline output when running with -f option
> -
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17201) (Temporarily) Disable failing tests in TestHCatClient

2017-07-28 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan reassigned HIVE-17201:
---


> (Temporarily) Disable failing tests in TestHCatClient
> -
>
> Key: HIVE-17201
> URL: https://issues.apache.org/jira/browse/HIVE-17201
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog, Tests
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
>
> This is with regard to the recent test-failures in {{TestHCatClient}}. 
> While [~sbeeram] and I joust over the best way to rephrase the failing tests 
> (in HIVE-16908), perhaps it's best that we temporarily disable the following 
> failing tests:
> {noformat}
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
>  (batchId=177)
> org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
> (batchId=177)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16974) Change the sort key for the schema tool validator to be

2017-07-28 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105332#comment-16105332
 ] 

Naveen Gangam commented on HIVE-16974:
--

Thanks for the suggestion [~aihuaxu]. I have tested out the fix with {{order by 
NAME, ID}}.
We are back to the problem we started with, which is having NULLs first on 
certain DBs vs. NULLs last on others.
on mysql
{code}
SD_ID in TBLS should not be NULL for Table Name=null, Table ID=101, Table 
Type=EXTERNAL_TABLE
SD_ID in TBLS should not be NULL for Table Name=table1, Table ID=100, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=table2, Table ID=106, Table 
Type=EXTERNAL_TABLE
SD_ID in TBLS should not be NULL for Table Name=table3, Table ID=102, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=table3, Table ID=107, Table 
Type=MANAGED_TABLE
{code}
or others
{code}
SD_ID in TBLS should not be NULL for Table Name=table1, Table ID=100, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=table2, Table ID=106, Table 
Type=EXTERNAL_TABLE
SD_ID in TBLS should not be NULL for Table Name=table3, Table ID=102, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=table3, Table ID=107, Table 
Type=MANAGED_TABLE
SD_ID in TBLS should not be NULL for Table Name=null, Table ID=101, Table 
Type=EXTERNAL_TABLE
{code}

The other option is to change the ordering to {{order by ID, NAME}}, which produces 
output pretty similar to just {{order by ID}} for search purposes. 

In both cases, we still print out the NAME value of the entity, so I do not 
think adding the second column to the ordering adds much value.

Hope this helps. Thanks
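
For what it's worth, a small sketch of what sorting on the non-null key looks like against the HMS backend (hypothetical helper, not the actual HiveSchemaTool code; it assumes the standard TBLS columns TBL_ID, TBL_NAME, TBL_TYPE, SD_ID):

{code:java}
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Ordering on TBL_ID (a NOT NULL key in the HMS schema) keeps the validator
// output stable across backend databases, regardless of how they order NULL names.
public class ValidatorOrderingSketch {
  static void reportTablesWithNullSd(Connection conn) throws SQLException {
    String q = "SELECT TBL_ID, TBL_NAME, TBL_TYPE FROM TBLS WHERE SD_ID IS NULL ORDER BY TBL_ID";
    try (Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery(q)) {
      while (rs.next()) {
        System.out.println("SD_ID in TBLS should not be NULL for Table Name=" + rs.getString("TBL_NAME")
            + ", Table ID=" + rs.getLong("TBL_ID") + ", Table Type=" + rs.getString("TBL_TYPE"));
      }
    }
  }
}
{code}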

> Change the sort key for the schema tool validator to be 
> 
>
> Key: HIVE-16974
> URL: https://issues.apache.org/jira/browse/HIVE-16974
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2
>Affects Versions: 3.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-16974.patch, HIVE-16974.patch
>
>
> In HIVE-16729, we introduced ordering of results/failures returned by 
> schematool's validators. This allows fault injection testing to expect 
> results that can be verified. However, they were sorted on NAME values which 
> in the HMS schema can be NULL. So if the introduced fault has a NULL/BLANK 
> name column value, the result could be different depending on the backend 
> database (if they sort NULLs first or last).
> So I think it is better to sort on a non-null column value.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105319#comment-16105319
 ] 

Vihang Karajgaonkar commented on HIVE-16759:


The patch doesn't apply cleanly on branch-2; there are some conflicts. Hi 
[~janulatha], can you please provide a patch for branch-2 as well? Thanks!

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.
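
A consumer-side illustration of why the table type matters (sketch only, not part of the patch): once the table type is carried with the event, a listener can branch on views vs. tables.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.TableType;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.api.Table;
import org.apache.hadoop.hive.metastore.events.CreateTableEvent;

public class TableTypeAwareListener extends MetaStoreEventListener {
  public TableTypeAwareListener(Configuration conf) {
    super(conf);
  }

  @Override
  public void onCreateTable(CreateTableEvent tableEvent) throws MetaException {
    Table table = tableEvent.getTable();
    if (TableType.VIRTUAL_VIEW.name().equals(table.getTableType())) {
      // handle a view creation
    } else {
      // handle a regular (managed/external) table
    }
  }
}
{code}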



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17184) Unexpected new line in beeline output when running with -f option

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17184:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master. Thanks for the review [~pvary]

> Unexpected new line in beeline output when running with -f option
> -
>
> Key: HIVE-17184
> URL: https://issues.apache.org/jira/browse/HIVE-17184
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-17184.01.patch
>
>
> When running in -f mode on BeeLine I see an extra new line getting added at 
> the end of the results.
> {noformat}
> vihang-MBP:bin vihang$ beeline -f /tmp/query.sql 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$ beeline -e "select * from test;" 2>/dev/null
> +--+---+
> | test.id  | test.val  |
> +--+---+
> | 1| one   |
> | 2| two   |
> | 1| three |
> +--+---+
> vihang-MBP:bin vihang$
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17129) Increase usage of InterfaceAudience and InterfaceStability annotations

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105311#comment-16105311
 ] 

Sergio Peña commented on HIVE-17129:


{{MetaStoreEventListener}} is used by other components, so this must be public. 
Regarding {{ListenerEvent}}, I don't know. It is not used directly, but is used 
by inherited objects that {{MetaStoreEventListener}} uses, such as 
{{CreateTableEvent}}.

How does this work? If you mark {{ListenerEvent}} as private, then can users 
use {{CreateTableEvent}} for instance?

> Increase usage of InterfaceAudience and InterfaceStability annotations 
> ---
>
> Key: HIVE-17129
> URL: https://issues.apache.org/jira/browse/HIVE-17129
> Project: Hive
>  Issue Type: Improvement
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
>
> The {{InterfaceAudience}} and {{InterfaceStability}} annotations were added a 
> while ago to mark certain classes as available for public use. However, they 
> were only added to a few classes. The annotations are largely missing for 
> major APIs such as the SerDe and UDF APIs. We should update these interfaces 
> to use these annotations.
> When done in conjunction with HIVE-17130, we should have an automated way to 
> prevent backwards incompatible changes to Hive APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105304#comment-16105304
 ] 

Sergio Peña commented on HIVE-16759:


[~vihangk1] could you commit this to branch-2 as well?

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16759:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105294#comment-16105294
 ] 

Vihang Karajgaonkar commented on HIVE-16759:


Pushed to master. Thanks for your contribution [~janulatha]

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17189) Fix backwards incompatibility in HiveMetaStoreClient

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105293#comment-16105293
 ] 

Hive QA commented on HIVE-17189:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879263/HIVE-17189.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11007 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=242)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6171/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6171/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6171/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879263 - PreCommit-HIVE-Build

> Fix backwards incompatibility in HiveMetaStoreClient
> 
>
> Key: HIVE-17189
> URL: https://issues.apache.org/jira/browse/HIVE-17189
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-17189.01.patch
>
>
> HIVE-12730 adds the ability to edit the basic stats using {{alter table}} and 
> {{alter partition}} commands. However, it changes the signature of @public 
> interface of MetastoreClient and removes some methods which breaks backwards 
> compatibility. This can be fixed easily by re-introducing the removed methods 
> and making them call into newly added method 
> {{alter_table_with_environment_context}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17131:

Fix Version/s: 2.4.0

> Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
> ---
>
> Key: HIVE-17131
> URL: https://issues.apache.org/jira/browse/HIVE-17131
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 2.4.0
>
> Attachments: HIVE-17131.1.branch-2.patch, HIVE-17131.1.patch
>
>
> Adding InterfaceAudience and InterfaceStability annotations for the core 
> SerDe APIs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17087:

Fix Version/s: 3.0.0

> Remove unnecessary HoS DPP trees during map-join conversion
> ---
>
> Key: HIVE-17087
> URL: https://issues.apache.org/jira/browse/HIVE-17087
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, 
> HIVE-17087.3.patch, HIVE-17087.4.patch, HIVE-17087.5.patch
>
>
> Ran the following query in the {{TestSparkCliDriver}}:
> {code:sql}
> set hive.spark.dynamic.partition.pruning=true;
> set hive.auto.convert.join=true;
> create table partitioned_table1 (col int) partitioned by (part_col int);
> create table partitioned_table2 (col int) partitioned by (part_col int);
> create table regular_table (col int);
> insert into table regular_table values (1);
> alter table partitioned_table1 add partition (part_col = 1);
> insert into table partitioned_table1 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> alter table partitioned_table2 add partition (part_col = 1);
> insert into table partitioned_table2 partition (part_col = 1) values (1), 
> (2), (3), (4), (5), (6), (7), (8), (9), (10);
> explain select * from partitioned_table1, partitioned_table2 where 
> partitioned_table1.part_col = partitioned_table2.part_col;
> {code}
> and got the following explain plan:
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-3 depends on stages: Stage-2
>   Stage-1 depends on stages: Stage-3
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 3 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Select Operator
>   expressions: _col1 (type: int)
>   outputColumnNames: _col0
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark Partition Pruning Sink Operator
>   partition key expr: part_col
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   target column name: part_col
>   target work: Map 2
>   Stage: Stage-3
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 2 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table2
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Spark HashTable Sink Operator
>   keys:
> 0 _col1 (type: int)
> 1 _col1 (type: int)
> Local Work:
>   Map Reduce Local Work
>   Stage: Stage-1
> Spark
>  A masked pattern was here 
>   Vertices:
> Map 1 
> Map Operator Tree:
> TableScan
>   alias: partitioned_table1
>   Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
>   Select Operator
> expressions: col (type: int), part_col (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 10 Data size: 11 Basic stats: 
> COMPLETE Column stats: NONE
> Map Join Operator
>   condition map:
>Inner Join 0 to 1
>   keys:
> 0 _col1 (type: int)
> 1 

[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17153:

Fix Version/s: 3.0.0

> Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
> -
>
> Key: HIVE-17153
> URL: https://issues.apache.org/jira/browse/HIVE-17153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-17153.1.patch, HIVE-17153.2.patch
>
>
> {code}
> Client Execution succeeded but contained differences (error code = 1) after 
> executing spark_dynamic_partition_pruning.q 
> 3703c3703
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3717c3717
> <   target work: Map 1
> ---
> >   target work: Map 4
> 3746c3746
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3760c3760
> <   target work: Map 1
> ---
> >   target work: Map 4
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]

2017-07-28 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105290#comment-16105290
 ] 

Sahil Takiar commented on HIVE-17153:
-

Thanks for the review [~lirui]. Merged this into master.

> Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
> -
>
> Key: HIVE-17153
> URL: https://issues.apache.org/jira/browse/HIVE-17153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17153.1.patch, HIVE-17153.2.patch
>
>
> {code}
> Client Execution succeeded but contained differences (error code = 1) after 
> executing spark_dynamic_partition_pruning.q 
> 3703c3703
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3717c3717
> <   target work: Map 1
> ---
> >   target work: Map 4
> 3746c3746
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3760c3760
> <   target work: Map 1
> ---
> >   target work: Map 4
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17153) Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]

2017-07-28 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-17153:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Flaky test: TestMiniSparkOnYarnCliDriver[spark_dynamic_partition_pruning]
> -
>
> Key: HIVE-17153
> URL: https://issues.apache.org/jira/browse/HIVE-17153
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark, Test
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-17153.1.patch, HIVE-17153.2.patch
>
>
> {code}
> Client Execution succeeded but contained differences (error code = 1) after 
> executing spark_dynamic_partition_pruning.q 
> 3703c3703
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3717c3717
> <   target work: Map 1
> ---
> >   target work: Map 4
> 3746c3746
> <   target work: Map 4
> ---
> >   target work: Map 1
> 3760c3760
> <   target work: Map 1
> ---
> >   target work: Map 4
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105274#comment-16105274
 ] 

Sergio Peña commented on HIVE-16886:


Btw, the getNextNotification() method fetches all notifications with 
EVENT_ID > X, so if we already fetched up to EVENT_ID = 5098, then 
getNextNotification won't pick up events that are written after that point but 
carry an EVENT_ID lower than 5098. Seems we can handle this better with NL_ID.
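
A hypothetical consumer-side sketch of what polling on NL_ID (instead of EVENT_ID) would look like against the NOTIFICATION_LOG table; this is not existing Hive code, just an illustration of why the monotonic NL_ID avoids skipping late-arriving rows:

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class NotificationPollSketch {
  // Returns the highest NL_ID seen, to be used as the watermark for the next poll.
  static long pollSince(Connection conn, long lastSeenNlId) throws SQLException {
    String q = "SELECT NL_ID, EVENT_ID, EVENT_TYPE, DB_NAME FROM NOTIFICATION_LOG "
             + "WHERE NL_ID > ? ORDER BY NL_ID";
    try (PreparedStatement ps = conn.prepareStatement(q)) {
      ps.setLong(1, lastSeenNlId);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          lastSeenNlId = rs.getLong("NL_ID");
          // process rs.getLong("EVENT_ID"), rs.getString("EVENT_TYPE"), ...
        }
      }
    }
    return lastSeenNlId;
  }
}
{code}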

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i=0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> Assert.assertEquals(2, eventResponse.getEvents().get(1).getEventId());
>   }
> {noformat}
> The last assertion fails expecting an event ID 1 instead of 2. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16886) HMS log notifications may have duplicated event IDs if multiple HMS are running concurrently

2017-07-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105271#comment-16105271
 ] 

Sergio Peña commented on HIVE-16886:


[~anishek] [~thejas] While running some tests with duplicated event IDs in HMS 
HA mode, I see that the NL_ID is never duplicated and is always consecutive and 
in order. Do you know why we're not using this ID instead? Seems more 
consistent and better to use.

[~akolb] FYI

{noformat}
[hive1]> select NL_ID, EVENT_ID, EVENT_TIME, EVENT_TYPE, DB_NAME from 
NOTIFICATION_LOG where NL_ID >= 5431 and NL_ID <= 5440;
+---+--++-++
| NL_ID | EVENT_ID | EVENT_TIME | EVENT_TYPE  | DB_NAME 
   |
+---+--++-++
|  5431 | 5094 | 1501109698 | CREATE_DATABASE | 
metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5432 | 5097 | 1501109698 | CREATE_TABLE| 
metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5433 | 5098 | 1501109699 | ADD_PARTITION   | 
metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5434 | 5101 | 1501109791 | DROP_TABLE  | 
metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5435 | 5104 | 1501109792 | DROP_DATABASE   | 
metastore_test_db_HIVE_HIVEMETASTORE_2 |
|  5436 | 5096 | 1501109698 | CREATE_DATABASE | 
metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5437 | 5097 | 1501109698 | CREATE_TABLE| 
metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5438 | 5100 | 1501109699 | ADD_PARTITION   | 
metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5439 | 5102 | 1501109791 | DROP_TABLE  | 
metastore_test_db_HIVE_HIVEMETASTORE_1 |
|  5440 | 5105 | 1501109792 | DROP_DATABASE   | 
metastore_test_db_HIVE_HIVEMETASTORE_1 |
+---+--++-++
{noformat}

> HMS log notifications may have duplicated event IDs if multiple HMS are 
> running concurrently
> 
>
> Key: HIVE-16886
> URL: https://issues.apache.org/jira/browse/HIVE-16886
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Metastore
>Reporter: Sergio Peña
>
> When running multiple Hive Metastore servers and DB notifications are 
> enabled, I could see that notifications can be persisted with a duplicated 
> event ID. 
> This does not happen when running multiple threads in a single HMS node due 
> to the locking acquired on the DbNotificationsLog class, but multiple HMS 
> could cause conflicts.
> The issue is in the ObjectStore#addNotificationEvent() method. The event ID 
> fetched from the datastore is used for the new notification, incremented in 
> the server itself, then persisted or updated back to the datastore. If 2 
> servers read the same ID, then these 2 servers write a new notification with 
> the same ID.
> The event ID is neither unique nor a primary key.
> Here's a test case using the TestObjectStore class that confirms this issue:
> {noformat}
> @Test
>   public void testConcurrentAddNotifications() throws ExecutionException, 
> InterruptedException {
> final int NUM_THREADS = 2;
> CountDownLatch countIn = new CountDownLatch(NUM_THREADS);
> CountDownLatch countOut = new CountDownLatch(1);
> HiveConf conf = new HiveConf();
> conf.setVar(HiveConf.ConfVars.METASTORE_EXPRESSION_PROXY_CLASS, 
> MockPartitionExpressionProxy.class.getName());
> ExecutorService executorService = 
> Executors.newFixedThreadPool(NUM_THREADS);
> FutureTask tasks[] = new FutureTask[NUM_THREADS];
> for (int i=0; i < NUM_THREADS; ++i) {
>   final int n = i;
>   tasks[i] = new FutureTask(new Callable() {
> @Override
> public Void call() throws Exception {
>   ObjectStore store = new ObjectStore();
>   store.setConf(conf);
>   NotificationEvent dbEvent =
>   new NotificationEvent(0, 0, 
> EventMessage.EventType.CREATE_DATABASE.toString(), "CREATE DATABASE DB" + n);
>   System.out.println("ADDING NOTIFICATION");
>   countIn.countDown();
>   countOut.await();
>   store.addNotificationEvent(dbEvent);
>   System.out.println("FINISH NOTIFICATION");
>   return null;
> }
>   });
>   executorService.execute(tasks[i]);
> }
> countIn.await();
> countOut.countDown();
> for (int i = 0; i < NUM_THREADS; ++i) {
>   tasks[i].get();
> }
> NotificationEventResponse eventResponse = 
> objectStore.getNextNotification(new NotificationEventRequest());
> Assert.assertEquals(2, eventResponse.getEventsSize());
> Assert.assertEquals(1, eventResponse.getEvents().get(0).getEventId());
> // This fails because the next notification has an event ID = 1
> 

[jira] [Commented] (HIVE-16759) Add table type information to HMS log notifications

2017-07-28 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105115#comment-16105115
 ] 

Janaki Lahorani commented on HIVE-16759:


The test failures are not related to this patch.  The following are tracked as 
part of HIVE-15058.
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)

The following are tracked as part of HIVE-16908.
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)

> Add table type information to HMS log notifications
> ---
>
> Key: HIVE-16759
> URL: https://issues.apache.org/jira/browse/HIVE-16759
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.1
>Reporter: Sergio Peña
>Assignee: Janaki Lahorani
> Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch, 
> HIVE16759.3.patch, HIVE16759.4.patch
>
>
> The DB notifications used by HiveMetaStore should include the table type for 
> all notifications that include table events, such as create, drop and alter 
> table.
> This would be useful for consumers to identify views vs tables.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17008) DbNotificationListener should skip failed events

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105105#comment-16105105
 ] 

Hive QA commented on HIVE-17008:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879261/HIVE-17008.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6170/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6170/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6170/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879261 - PreCommit-HIVE-Build

> DbNotificationListener should skip failed events
> 
>
> Key: HIVE-17008
> URL: https://issues.apache.org/jira/browse/HIVE-17008
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Reporter: Dan Burkert
>Assignee: Dan Burkert
> Attachments: HIVE-17008.0.patch, HIVE-17008.1.patch, 
> HIVE-17008.2.patch
>
>
> When dropping a non-existent database, the HMS will still fire registered 
> {{DROP_DATABASE}} event listeners.  This results in an NPE when the listeners 
> attempt to deref the {{null}} database parameter.
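
For illustration, the defensive shape on the listener side would be something like the sketch below (illustration only; the actual patch may instead avoid firing the event in the HMS when the database does not exist):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.metastore.MetaStoreEventListener;
import org.apache.hadoop.hive.metastore.api.Database;
import org.apache.hadoop.hive.metastore.api.MetaException;
import org.apache.hadoop.hive.metastore.events.DropDatabaseEvent;

public class NullSafeDropDbListener extends MetaStoreEventListener {
  public NullSafeDropDbListener(Configuration conf) {
    super(conf);
  }

  @Override
  public void onDropDatabase(DropDatabaseEvent dbEvent) throws MetaException {
    Database db = dbEvent.getDatabase();
    if (db == null) {
      return;  // nothing was actually dropped; skip instead of NPE-ing
    }
    // ... record a notification for db.getName() ...
  }
}
{code}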



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17057) Flaky test: TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema

2017-07-28 Thread Janaki Lahorani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105090#comment-16105090
 ] 

Janaki Lahorani commented on HIVE-17057:


Thanks [~pgolash].

> Flaky test: 
> TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema
> -
>
> Key: HIVE-17057
> URL: https://issues.apache.org/jira/browse/HIVE-17057
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janaki Lahorani
>Assignee: PRASHANT GOLASH
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17057) Flaky test: TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema

2017-07-28 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani resolved HIVE-17057.

Resolution: Duplicate

> Flaky test: 
> TestHCatClient.testTableSchemaPropagation,testPartitionRegistrationWithCustomSchema,testPartitionSpecRegistrationWithCustomSchema
> -
>
> Key: HIVE-17057
> URL: https://issues.apache.org/jira/browse/HIVE-17057
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janaki Lahorani
>Assignee: PRASHANT GOLASH
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17001) Insert overwrite table doesn't clean partition directory on HDFS if partition is missing from HMS

2017-07-28 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara resolved HIVE-17001.

Resolution: Won't Fix

> Insert overwrite table doesn't clean partition directory on HDFS if partition 
> is missing from HMS
> -
>
> Key: HIVE-17001
> URL: https://issues.apache.org/jira/browse/HIVE-17001
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2, Metastore
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-17001.01.patch
>
>
> Insert overwrite table should clear existing data before creating the new 
> data files.
> For a partitioned table, we will clean any folder of existing partitions on 
> HDFS; however, if the partition folder exists only on HDFS and the partition 
> definition is missing in HMS, the folder is not cleared.
> Reproduction steps:
> 1. CREATE TABLE test( col1 string) PARTITIONED BY (ds string);
> 2. INSERT INTO test PARTITION(ds='p1') values ('a');
> 3. Copy the data to a different folder with different name.
> 4. ALTER TABLE test DROP PARTITION (ds='p1');
> 5. Recreate the partition directory, copy and rename the data file back
> 6. INSERT OVERWRITE TABLE test PARTITION(ds='p1') values ('b');
> 7. SELECT * from test;
> will result in 2 records being returned instead of 1.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17169) Avoid extra call to KeyProvider::getMetadata()

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16105001#comment-16105001
 ] 

Hive QA commented on HIVE-17169:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12878839/HIVE-17169.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 11013 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6169/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6169/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6169/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12878839 - PreCommit-HIVE-Build

> Avoid extra call to KeyProvider::getMetadata()
> --
>
> Key: HIVE-17169
> URL: https://issues.apache.org/jira/browse/HIVE-17169
> Project: Hive
>  Issue Type: Bug
>  Components: Shims
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-17169.1.patch
>
>
> Here's the code from {{Hadoop23Shims}}:
> {code:title=Hadoop23Shims.java|borderStyle=solid}
> @Override
> public int comparePathKeyStrength(Path path1, Path path2) throws 
> IOException {
>   EncryptionZone zone1, zone2;
>   zone1 = hdfsAdmin.getEncryptionZoneForPath(path1);
>   zone2 = hdfsAdmin.getEncryptionZoneForPath(path2);
>   if (zone1 == null && zone2 == null) {
> return 0;
>   } else if (zone1 == null) {
> return -1;
>   } else if (zone2 == null) {
> return 1;
>   }
>   return compareKeyStrength(zone1.getKeyName(), zone2.getKeyName());
> }
> private int compareKeyStrength(String keyname1, String keyname2) throws 
> IOException {
>   KeyProvider.Metadata meta1, meta2;
>   if (keyProvider == null) {
> throw new IOException("HDFS security key provider is not configured 
> on your server.");
>   }
>   meta1 = keyProvider.getMetadata(keyname1);
>   meta2 = keyProvider.getMetadata(keyname2);
>   if (meta1.getBitLength() < meta2.getBitLength()) {
> return -1;
>   } else if (meta1.getBitLength() == meta2.getBitLength()) {
> return 0;
>   } else {
> return 1;
>   }
> }
>   }
> {code}
> It turns out that {{EncryptionZone}} already has the cipher's bit-length 
> stored in a member variable. One shouldn't need an additional name-node call 
> ({{KeyProvider::getMetadata()}}) only to fetch it again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-28 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104919#comment-16104919
 ] 

Vlad Gudikov edited comment on HIVE-17148 at 7/28/17 1:08 PM:
--

ROOT-CAUSE:
The problem is with the predicates created by HiveJoinAddNotNullRule. This rule 
creates IS NOT NULL predicates from the fields that take part in the join 
filter, regardless of whether those fields are used as parameters of functions 
or not.

SOLUTION:
Create the predicate based on the functions that take part in the filter, not 
just the fields. The point is to check that the left side and the right side of 
the filter expression are not null, rather than only the individual fields that 
appear in the join filter. I.e. we have two tables 
*test1(a1 int, a2 int)* and *test2(b1)*. When we execute the following query 
*select * from ct1 c1 inner join ct2 c2 on (COALESCE(a1,b1)=a2);* we get two 
predicates for the filter operator:
b1 is not null --- right part 
a1 is not null and a2 is not null -- left part

Applying the predicate for the left part of the join will result in data loss, 
as we exclude rows with null fields. COALESCE is a good example for this case, 
as the main purpose of the COALESCE function is to produce non-null values. To 
fix the data loss we need to check that the COALESCE itself won't produce null 
values, since we can't join nulls. With my fix the left and right parts will 
look like:

b1 is not null -- right part (still checking the fields for the null condition)
COALESCE(a1,a2) is not null (checking that the whole function won't produce 
null values)

In the next patch I'm going to update the related failed tests with the fixed 
stage plans.



was (Author: allgoodok):
ROOT-CAUSE:
The problem was with the predicates that were created according to 
HiveJoinAddNotNullRule. This rule is creating predicates from fields that take 
part in join filter, no matter if this fields are used as parameters of 
functions or not.

SOLUTION:
Create predicate based on functions that take part in filters as well as 
fields. The point is to check if left part and right part of the filter is not 
null, not just fields that are part of the join filter. I.e we have to tables 
test1(a1 int, a2 int) and test2(b1). When we execute following query *select * 
from ct1 c1 inner join ct2 c2 on (COALESCE(a1,b1)=a2);* we get to predicates 
for filter operator:
b1 is not null --- right part 
a1 is not null and a2 is not null -- left part

Applying predicate for left part of join will result in data loss as we exclude 
rows with null fields. COALESCE is a good example for this case as the main 
purpose of COALESCE function is to get not null values from tables. To fix the 
data loss we need to check that coalesce won't bring us null values as we can't 
join nulls. My fix will check that left part and right part will look like:

b1 is not null -- right part (still checking fields on null condition)
COALESCE(a1,a2) is not null (checking that whole function won't bring us null 
values)

In next patch I'm going to change related failed tests with the fixed stage 
plans.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL   1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST 


[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-07-28 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104919#comment-16104919
 ] 

Vlad Gudikov commented on HIVE-17148:
-

ROOT CAUSE:
The problem is with the predicates created by HiveJoinAddNotNullRule. The rule 
builds IS NOT NULL predicates from the individual fields that take part in the 
join filter, regardless of whether those fields are only used as arguments of a 
function.

SOLUTION:
Build the predicates from the whole expressions that take part in the filter, 
not just from the bare fields. The point is to check that each side of the join 
condition is not null, rather than every field it references. For example, with 
the tables *ct1(a1, b1)* and *ct2(a2)* from the description, the query *select * 
from ct1 c1 inner join ct2 c2 on (COALESCE(a1,b1)=a2);* currently produces two 
predicates for the filter operators:
a2 is not null -- on the ct2 side
a1 is not null and b1 is not null -- on the ct1 side

Applying the second predicate causes data loss, because it excludes rows in 
which either column is null. COALESCE is a good example of the problem, since 
its whole purpose is to produce a non-null value from possibly-null columns. To 
avoid the data loss we only need to verify that the join keys themselves are 
not null, since nulls cannot be joined. With the fix the predicates become:

a2 is not null -- fields used directly are still checked for nulls
COALESCE(a1,b1) is not null -- the whole function result is checked instead of 
its individual arguments

In the next patch I will update the affected tests with the fixed stage plans.


> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL   1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it is filtering out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if join is inner type, otherwise HiveJoinAddNotRule which 
> creates this problem is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17006) LLAP: Parquet caching

2017-07-28 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104890#comment-16104890
 ] 

Hive QA commented on HIVE-17006:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12879254/HIVE-17006.01.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 11014 tests 
executed
*Failed tests:*
{noformat}
TestPerfCliDriver - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_parquet_types]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=168)
org.apache.hadoop.hive.metastore.TestHiveMetaStoreStatsMerge.testStatsMerge 
(batchId=206)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=179)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=179)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6168/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6168/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6168/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12879254 - PreCommit-HIVE-Build

> LLAP: Parquet caching
> -
>
> Key: HIVE-17006
> URL: https://issues.apache.org/jira/browse/HIVE-17006
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-17006.01.patch, HIVE-17006.patch, 
> HIVE-17006.WIP.patch
>
>
> There are multiple options to do Parquet caching in LLAP:
> 1) Full elevator (too intrusive for now).
> 2) Page based cache like ORC (requires some changes to Parquet or 
> copy-pasted).
> 3) Cache disk data on column chunk level as is.
> Given that Parquet reads at column chunk granularity, (2) is not as useful as 
> for ORC, but still a good idea. I messaged the dev list about it but didn't 
> get a response; we may follow up later.
> For now, do (3). 
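> As a rough smoke test for option (3) (purely illustrative, not from the patch; 
> the table name is hypothetical and it assumes an LLAP daemon is running):
> {code}
> SET hive.llap.io.enabled=true;
> CREATE TABLE parquet_cache_demo (id INT, name STRING) STORED AS PARQUET;
> INSERT INTO parquet_cache_demo VALUES (1, 'a'), (2, 'b');
> -- the second scan should be served from the LLAP cache at column chunk granularity
> SELECT COUNT(*) FROM parquet_cache_demo;
> SELECT COUNT(*) FROM parquet_cache_demo;
> {code}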



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables

2017-07-28 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-12631:
--
Attachment: HIVE-12631.25.patch

Fixed a NullPointerException bug and bugs with null partition values.

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, 
> HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, 
> HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, 
> HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, 
> HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, 
> HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.24.patch, 
> HIVE-12631.25.patch, HIVE-12631.2.patch, HIVE-12631.3.patch, 
> HIVE-12631.4.patch, HIVE-12631.5.patch, HIVE-12631.6.patch, 
> HIVE-12631.7.patch, HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, the ACID logic is embedded inside the ORC format; we need to 
> refactor it to sit on top of some interface, if practical, or just port it to 
> the LLAP read path.
> Another consideration is how the logic will work with the cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache the merged representation in the future.
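> For reference, a minimal sketch (not from the patch; table name and settings 
> are illustrative) of the kind of table this read path has to handle, where 
> ACID writes produce base and delta files that must be merged on read:
> {code}
> SET hive.support.concurrency=true;
> SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> -- transactional ORC table; the UPDATE creates a delta that the reader must merge with the base
> CREATE TABLE acid_orc_demo (id INT, name STRING)
> CLUSTERED BY (id) INTO 2 BUCKETS
> STORED AS ORC
> TBLPROPERTIES ('transactional'='true');
> INSERT INTO acid_orc_demo VALUES (1, 'a'), (2, 'b');
> UPDATE acid_orc_demo SET name = 'c' WHERE id = 2;
> SELECT * FROM acid_orc_demo;
> {code}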



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16901) Distcp optimization - One distcp per ReplCopyTask

2017-07-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104832#comment-16104832
 ] 

ASF GitHub Bot commented on HIVE-16901:
---

Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/200


> Distcp optimization - One distcp per ReplCopyTask 
> --
>
> Key: HIVE-16901
> URL: https://issues.apache.org/jira/browse/HIVE-16901
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16901.01.patch, HIVE-16901.02.patch, 
> HIVE-16901.03.patch, HIVE-16901.04.patch
>
>
> Currently, if a ReplCopyTask is created to copy a list of files, distcp 
> is invoked for each and every file. Instead, we need to pass the whole list of 
> source files to the distcp tool, which copies the files in parallel and hence 
> gives a large performance gain.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16750) Support change management for rename table/partition.

2017-07-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16104831#comment-16104831
 ] 

ASF GitHub Bot commented on HIVE-16750:
---

Github user sankarh closed the pull request at:

https://github.com/apache/hive/pull/199


> Support change management for rename table/partition.
> -
>
> Key: HIVE-16750
> URL: https://issues.apache.org/jira/browse/HIVE-16750
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16750.01.patch, HIVE-16750.02.patch, 
> HIVE-16750.03.patch
>
>
> Currently, rename table/partition updates the data location by renaming the 
> directory, which is equivalent to moving the files to a new path and deleting 
> the old path. So this should trigger a move of the files into $CMROOT.
> Scenario:
> 1. Create a table (T1)
> 2. Insert a record
> 3. Rename the table(T1 -> T2)
> 4. Repl Dump till Insert.
> 5. Repl Load from the dump.
> 6. Target DB should have table T1 with the record.
> Similar scenario with rename partition as well.
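> In HiveQL terms, a rough sketch of the scenario above (the table, the target 
> database name, and the dump path are placeholders; the "till Insert" bound on 
> the dump is elided):
> {code}
> CREATE TABLE t1 (id INT);
> INSERT INTO t1 VALUES (1);
> ALTER TABLE t1 RENAME TO t2;
> -- the rename moved t1's files, so change management must preserve them under $CMROOT
> REPL DUMP default;
> -- on the target cluster, load from the dump directory returned by REPL DUMP
> REPL LOAD replica FROM '/apps/hive/repl/<dump-dir>';
> {code}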



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

