[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Resolved with 
https://git-wip-us.apache.org/repos/asf?p=hive.git;a=commit;h=8aee8d4f2b124fcfa093724b4de0a54287a8084f

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Fix For: 3.0.0
>
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch, 
> HIVE-16757.06.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we call the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which 
> accepts the RelMetadataQuery that most call sites already have handy to 
> pass. Looking at a complex query (49 joins), there are 2,995,340 calls to 
> {{AbstractRelNode.getRows}}, each one discarding the current memoization 
> cache.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, this 
> results in an explosion of calls. For example, for a query with 49 joins, 
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount 
> gets called 6,442 times as a top-level call, but the recursion explodes 
> this to 501,729 calls. Memoizing the result would stop the recursion early. 
> In my testing this reduced the join reordering time for said query from 
> 11s to <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.
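To illustrate the cost pattern (a self-contained analogy, not the actual Calcite API): recreating the memoization cache on every call throws away all cached estimates, while threading one shared query object through the call tree computes each estimate once.

```java
import java.util.HashMap;
import java.util.Map;

public class MemoSketch {
    static int computations = 0;

    // Stand-in for RelMetadataQuery: caches row-count results per node id.
    static class MetadataQuery {
        private final Map<Integer, Double> cache = new HashMap<>();
        double getRowCount(int nodeId) {
            return cache.computeIfAbsent(nodeId, id -> {
                computations++;            // count the expensive recomputations
                return expensiveEstimate(id);
            });
        }
    }

    static double expensiveEstimate(int nodeId) {
        return nodeId * 100.0;             // placeholder for real estimation work
    }

    // Deprecated-style call: a fresh cache every time, nothing is reused.
    static double getRows(int nodeId) {
        return new MetadataQuery().getRowCount(nodeId);
    }

    // New-style call: the caller supplies the shared MetadataQuery.
    static double estimateRowCount(int nodeId, MetadataQuery mq) {
        return mq.getRowCount(nodeId);
    }

    public static void main(String[] args) {
        computations = 0;
        for (int i = 0; i < 1000; i++) getRows(42);
        System.out.println("fresh instance per call: " + computations); // 1000

        computations = 0;
        MetadataQuery mq = new MetadataQuery();
        for (int i = 0; i < 1000; i++) estimateRowCount(42, mq);
        System.out.println("shared instance: " + computations);        // 1
    }
}
```

With a 49-join plan making millions of getRows calls, the difference between the two call styles is the difference between millions of estimations and a handful.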



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16775) Skip HiveFilterAggregateTransposeRule when filter is always false

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16775:
---
Status: Patch Available  (was: Open)

> Skip HiveFilterAggregateTransposeRule when filter is always false
> -
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}





[jira] [Updated] (HIVE-16775) Skip HiveFilterAggregateTransposeRule when filter is always false

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16775:
---
Attachment: HIVE-16775.01.patch

> Skip HiveFilterAggregateTransposeRule when filter is always false
> -
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}





[jira] [Updated] (HIVE-16775) Skip HiveFilterAggregateTransposeRule when filter is always false

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16775:
---
Summary: Skip HiveFilterAggregateTransposeRule when filter is always false  
(was: Augment ASTConverter for TPCDS queries)

> Skip HiveFilterAggregateTransposeRule when filter is always false
> -
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16775.01.patch
>
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}





[jira] [Commented] (HIVE-16498) [Tez] ReduceRecordProcessor has no check to see if all the operators are done or not and is reading complete data

2017-05-30 Thread Adesh Kumar Rao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030618#comment-16030618
 ] 

Adesh Kumar Rao commented on HIVE-16498:


cc [~sershe] [~vikram.dixit] Can you please review it?

> [Tez] ReduceRecordProcessor has no check to see if all the operators are done 
> or not and is reading complete data
> -
>
> Key: HIVE-16498
> URL: https://issues.apache.org/jira/browse/HIVE-16498
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Adesh Kumar Rao
> Fix For: 1.2.0
>
> Attachments: HIVE-16498.1.patch
>
>
> ReduceRecordProcessor does not check whether the reducer (Operator) is done, 
> which causes it to keep reading useless data.
> It can be reproduced by a reduce-side join.
> The data for large_table is generated by the following shell script, and a 
> table can be created from the file `large.txt`:
> {code:java}
> for (( j=1 ; j <=20; j++))
> do
>   for (( i=1; i <= 100; i++ ))
>   do
> echo "$i,$j" >> large.txt
>   done
> done
> {code}
> {code:java}
> create external table large_table ( i int, j int) row format delimited fields 
> terminated by ',' location "hdfs://";
> set hive.auto.convert.join=false; -- So that a reduce-side join is used instead 
> of MapJoin
> select * from large_table a join large_table b on a.j = b.j limit 100;
> {code}
> The above join query gets stuck reading all the data from the table (because 
> of the missing check) and does not finish in reasonable time compared to MR, 
> or even Tez with MapJoin enabled.
> For reference, the same query takes around 5-6 minutes on MR and 2-3 minutes 
> with MapJoin on Tez.





[jira] [Commented] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database

2017-05-30 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030612#comment-16030612
 ] 

Vihang Karajgaonkar commented on HIVE-16771:


Test failures are unrelated and have been failing for other builds as well. The 
ones below are also failing in 
https://builds.apache.org/job/PreCommit-HIVE-Build/5483.

org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23]
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14]
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]

Created HIVE-16796 for the TestBeeLineDriver failure, which has been failing 
consistently.

Below are known failures.
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)

TestTextFileHCatStorer.testWriteSmallint is failing with a disk error. 

> Schematool should use MetastoreSchemaInfo to get the metastore schema version 
> from database
> ---
>
> Key: HIVE-16771
> URL: https://issues.apache.org/jira/browse/HIVE-16771
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, 
> HIVE-16771.03.patch, HIVE-16771.04.patch
>
>
> HIVE-16723 gives the ability to have a custom MetastoreSchemaInfo 
> implementation to manage schema upgrades and initialization if needed. In 
> order to make HiveSchemaTool completely agnostic, it should depend on the 
> configured IMetastoreSchemaInfo implementation to get the metastore schema 
> version information from the database. It should also not assume and 
> hardcode the scripts directory itself; rather, it should ask the 
> MetastoreSchemaInfo class for the metastore scripts directory.
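A minimal sketch of the proposed shape (the method names here are illustrative guesses, not Hive's actual IMetastoreSchemaInfo API): the tool holds only the interface and asks it for both the schema version and the scripts directory, so neither is hardcoded.

```java
import java.sql.Connection;

// Hypothetical contract: one implementation per schema-management strategy.
interface IMetastoreSchemaInfo {
    String getMetaStoreSchemaVersion(Connection metastoreDb);
    String getMetaStoreScriptDir();
}

class SchemaToolSketch {
    private final IMetastoreSchemaInfo schemaInfo;

    SchemaToolSketch(IMetastoreSchemaInfo schemaInfo) {
        this.schemaInfo = schemaInfo;
    }

    // The tool delegates both lookups instead of hardcoding them.
    String describe(Connection db) {
        return "version=" + schemaInfo.getMetaStoreSchemaVersion(db)
                + " scripts=" + schemaInfo.getMetaStoreScriptDir();
    }
}
```

Swapping in a custom implementation (per HIVE-16723) then changes both behaviors without touching the tool.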





[jira] [Updated] (HIVE-16498) [Tez] ReduceRecordProcessor has no check to see if all the operators are done or not and is reading complete data

2017-05-30 Thread Adesh Kumar Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao updated HIVE-16498:
---
Status: Patch Available  (was: Open)

> [Tez] ReduceRecordProcessor has no check to see if all the operators are done 
> or not and is reading complete data
> -
>
> Key: HIVE-16498
> URL: https://issues.apache.org/jira/browse/HIVE-16498
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Adesh Kumar Rao
> Fix For: 1.2.0
>
> Attachments: HIVE-16498.1.patch
>
>
> ReduceRecordProcessor does not check whether the reducer (Operator) is done, 
> which causes it to keep reading useless data.
> It can be reproduced by a reduce-side join.
> The data for large_table is generated by the following shell script, and a 
> table can be created from the file `large.txt`:
> {code:java}
> for (( j=1 ; j <=20; j++))
> do
>   for (( i=1; i <= 100; i++ ))
>   do
> echo "$i,$j" >> large.txt
>   done
> done
> {code}
> {code:java}
> create external table large_table ( i int, j int) row format delimited fields 
> terminated by ',' location "hdfs://";
> set hive.auto.convert.join=false; -- So that a reduce-side join is used instead 
> of MapJoin
> select * from large_table a join large_table b on a.j = b.j limit 100;
> {code}
> The above join query gets stuck reading all the data from the table (because 
> of the missing check) and does not finish in reasonable time compared to MR, 
> or even Tez with MapJoin enabled.
> For reference, the same query takes around 5-6 minutes on MR and 2-3 minutes 
> with MapJoin on Tez.





[jira] [Updated] (HIVE-16498) [Tez] ReduceRecordProcessor has no check to see if all the operators are done or not and is reading complete data

2017-05-30 Thread Adesh Kumar Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao updated HIVE-16498:
---
Status: Open  (was: Patch Available)

> [Tez] ReduceRecordProcessor has no check to see if all the operators are done 
> or not and is reading complete data
> -
>
> Key: HIVE-16498
> URL: https://issues.apache.org/jira/browse/HIVE-16498
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Adesh Kumar Rao
> Fix For: 1.2.0
>
> Attachments: HIVE-16498.1.patch
>
>
> ReduceRecordProcessor does not check whether the reducer (Operator) is done, 
> which causes it to keep reading useless data.
> It can be reproduced by a reduce-side join.
> The data for large_table is generated by the following shell script, and a 
> table can be created from the file `large.txt`:
> {code:java}
> for (( j=1 ; j <=20; j++))
> do
>   for (( i=1; i <= 100; i++ ))
>   do
> echo "$i,$j" >> large.txt
>   done
> done
> {code}
> {code:java}
> create external table large_table ( i int, j int) row format delimited fields 
> terminated by ',' location "hdfs://";
> set hive.auto.convert.join=false; -- So that a reduce-side join is used instead 
> of MapJoin
> select * from large_table a join large_table b on a.j = b.j limit 100;
> {code}
> The above join query gets stuck reading all the data from the table (because 
> of the missing check) and does not finish in reasonable time compared to MR, 
> or even Tez with MapJoin enabled.
> For reference, the same query takes around 5-6 minutes on MR and 2-3 minutes 
> with MapJoin on Tez.





[jira] [Updated] (HIVE-16498) [Tez] ReduceRecordProcessor has no check to see if all the operators are done or not and is reading complete data

2017-05-30 Thread Adesh Kumar Rao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adesh Kumar Rao updated HIVE-16498:
---
Attachment: (was: HIVE-16498-branch-1.2.patch)

> [Tez] ReduceRecordProcessor has no check to see if all the operators are done 
> or not and is reading complete data
> -
>
> Key: HIVE-16498
> URL: https://issues.apache.org/jira/browse/HIVE-16498
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Adesh Kumar Rao
> Fix For: 1.2.0
>
> Attachments: HIVE-16498.1.patch
>
>
> ReduceRecordProcessor does not check whether the reducer (Operator) is done, 
> which causes it to keep reading useless data.
> It can be reproduced by a reduce-side join.
> The data for large_table is generated by the following shell script, and a 
> table can be created from the file `large.txt`:
> {code:java}
> for (( j=1 ; j <=20; j++))
> do
>   for (( i=1; i <= 100; i++ ))
>   do
> echo "$i,$j" >> large.txt
>   done
> done
> {code}
> {code:java}
> create external table large_table ( i int, j int) row format delimited fields 
> terminated by ',' location "hdfs://";
> set hive.auto.convert.join=false; -- So that a reduce-side join is used instead 
> of MapJoin
> select * from large_table a join large_table b on a.j = b.j limit 100;
> {code}
> The above join query gets stuck reading all the data from the table (because 
> of the missing check) and does not finish in reasonable time compared to MR, 
> or even Tez with MapJoin enabled.
> For reference, the same query takes around 5-6 minutes on MR and 2-3 minutes 
> with MapJoin on Tez.





[jira] [Commented] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030610#comment-16030610
 ] 

Hive QA commented on HIVE-16774:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870477/HIVE-16774.02.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10807 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[skewjoin_mapjoin1] 
(batchId=80)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5485/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5485/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5485/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870477 - PreCommit-HIVE-Build

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16774.01.patch, HIVE-16774.02.patch
>
>
> query47.q query57.q





[jira] [Commented] (HIVE-15744) Flaky test: TestPerfCliDriver.query23, query14

2017-05-30 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030602#comment-16030602
 ] 

Vihang Karajgaonkar commented on HIVE-15744:


[~ashutoshc] Looks like these are failing again on master. Do we need to reopen 
this?

https://builds.apache.org/job/PreCommit-HIVE-Build/5483
https://builds.apache.org/job/PreCommit-HIVE-Build/5482/

> Flaky test: TestPerfCliDriver.query23, query14
> --
>
> Key: HIVE-15744
> URL: https://issues.apache.org/jira/browse/HIVE-15744
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0
>Reporter: Thejas M Nair
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
>
> There is some flakiness in these tests -
> https://builds.apache.org/job/PreCommit-HIVE-Build/3206/testReport/org.apache.hadoop.hive.cli/TestPerfCliDriver/testCliDriver_query23_/
> https://builds.apache.org/job/PreCommit-HIVE-Build/3204/testReport/org.apache.hadoop.hive.cli/TestPerfCliDriver/testCliDriver_query14_/
> The diff looks like this - 
> {code}
> Running: diff -a 
> /home/hiveptest/130.211.230.155-hiveptest-1/apache-github-source-source/itests/qtest/target/qfile-results/clientpositive/query14.q.out
>  
> /home/hiveptest/130.211.230.155-hiveptest-1/apache-github-source-source/ql/src/test/results/clientpositive/perf/query14.q.out
> 0a1,2
> > Warning: Shuffle Join MERGEJOIN[916][tables = [$hdt$_1, $hdt$_2]] in Stage 
> > 'Reducer 114' is a cross product
> > Warning: Shuffle Join MERGEJOIN[917][tables = [$hdt$_1, $hdt$_2, $hdt$_0]] 
> > in Stage 'Reducer 115' is a cross product
> 5,6d6
> < Warning: Shuffle Join MERGEJOIN[916][tables = [$hdt$_1, $hdt$_2]] in Stage 
> 'Reducer 114' is a cross product
> < Warning: Shuffle Join MERGEJOIN[917][tables = [$hdt$_1, $hdt$_2, $hdt$_0]] 
> in Stage 'Reducer 115' is a cross product
> {code}





[jira] [Resolved] (HIVE-16767) Update people website with recent changes

2017-05-30 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li resolved HIVE-16767.
---
Resolution: Fixed

Patch committed. Thanks guys.

> Update people website with recent changes
> -
>
> Key: HIVE-16767
> URL: https://issues.apache.org/jira/browse/HIVE-16767
> Project: Hive
>  Issue Type: Task
>  Components: Documentation
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-16767.1.patch
>
>






[jira] [Updated] (HIVE-10483) insert overwrite partition deadlocks on itself with DbTxnManager

2017-05-30 Thread Yechao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yechao Chen updated HIVE-10483:
---
Description: 
insert overwrite table ta partition(part=) select xxx from tb join ta where 
part=

It seems like the Shared lock conflicts with the Exclusive lock for Insert 
Overwrite even though both are part of the same txn.
More precisely, insert overwrite requires an X lock on the partition and the 
read side needs an S lock for the query.

A simpler case is
insert overwrite table ta partition(part=) select * from ta

  was:
insert overwrite ta partition(part=) select xxx from tb join ta where 
part=

It seems like the Shared conflicts with the Exclusive lock for Insert Overwrite 
even though both are part of the same txn.
More precisely insert overwrite requires X lock on partition and the read side 
needs an S lock on the query.

A simpler case is
insert overwrite ta partition(part=) select * from ta


> insert overwrite partition deadlocks on itself with DbTxnManager
> 
>
> Key: HIVE-10483
> URL: https://issues.apache.org/jira/browse/HIVE-10483
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Query Processor, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-10483.2.patch, HIVE-10483.patch
>
>
> insert overwrite table ta partition(part=) select xxx from tb join ta 
> where part=
> It seems like the Shared lock conflicts with the Exclusive lock for Insert 
> Overwrite even though both are part of the same txn.
> More precisely, insert overwrite requires an X lock on the partition and the 
> read side needs an S lock for the query.
> A simpler case is
> insert overwrite table ta partition(part=) select * from ta
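The conflict can be sketched abstractly (hypothetical code, not Hive's actual DbTxnManager logic, and not necessarily the fix this issue took): with a lock-compatibility check that ignores which transaction holds a lock, the statement's own X lock on the target partition blocks its own S read request; one possible fix direction is to treat locks held by the same transaction as always compatible.

```java
// Two lock modes, as in the description: S for reads, X for insert overwrite.
enum LockType { SHARED, EXCLUSIVE }

class LockCheckSketch {
    // Naive matrix: X conflicts with everything, S is compatible only with S.
    static boolean compatible(LockType held, LockType requested) {
        return held == LockType.SHARED && requested == LockType.SHARED;
    }

    // Transaction-aware variant: a txn never deadlocks against its own locks.
    static boolean compatible(LockType held, long heldTxn,
                              LockType requested, long requestingTxn) {
        return heldTxn == requestingTxn || compatible(held, requested);
    }
}
```

Under the naive check the insert-overwrite statement (X on the partition, then S for its own read) blocks itself; under the transaction-aware check it proceeds.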





[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-05-30 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated HIVE-11297:

Attachment: HIVE-11297.1.patch

[~csun]: Updated the patch. Because the [case "multiple sources, single 
key"|https://issues.apache.org/jira/browse/HIVE-16780] in 
spark_dynamic_pruning.q fails in my environment, I could not generate a new 
spark_dynamic_partition_pruning.q.out. I extracted the test case about "multi 
columns, single source" into a new qfile, 
"spark_dynamic_partition_pruning_combine.q". (Here I create a configuration 
item, "hive.spark.dynamic.partition.pruning.combine", so that if this config 
item is not enabled, combining op trees for partition info will not happen.)
{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
set hive.spark.dynamic.partition.pruning=true;
set hive.spark.dynamic.partition.pruning.combine=true;


-- SORT_QUERY_RESULTS
create table srcpart_date_hour as select ds as ds, ds as `date`, hr as hr, hr 
as hour from srcpart group by ds, hr;
-- multiple columns single source
EXPLAIN select count(*) from srcpart join srcpart_date_hour on (srcpart.ds = 
srcpart_date_hour.ds and srcpart.hr = srcpart_date_hour.hr) where 
srcpart_date_hour.`date` = '2008-04-08' and srcpart_date_hour.hour = 11;
select count(*) from srcpart join srcpart_date_hour on (srcpart.ds = 
srcpart_date_hour.ds and srcpart.hr = srcpart_date_hour.hr) where 
srcpart_date_hour.`date` = '2008-04-08' and srcpart_date_hour.hour = 11;
set hive.spark.dynamic.partition.pruning.combine=false;
EXPLAIN select count(*) from srcpart join srcpart_date_hour on (srcpart.ds = 
srcpart_date_hour.ds and srcpart.hr = srcpart_date_hour.hr) where 
srcpart_date_hour.`date` = '2008-04-08' and srcpart_date_hour.hour = 11;
select count(*) from srcpart join srcpart_date_hour on (srcpart.ds = 
srcpart_date_hour.ds and srcpart.hr = srcpart_date_hour.hr) where 
srcpart_date_hour.`date` = '2008-04-08' and srcpart_date_hour.hour = 11;
{code}

I think we can work in parallel: you can review while I continue to fix 
HIVE-16780. After fixing HIVE-16780 in my environment, I can update 
spark_dynamic_partition_pruning.q.out with the change of HIVE-11297.

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to do 
> the table scan multiple times.





[jira] [Assigned] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

2017-05-30 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel reassigned HIVE-11297:
---

Assignee: liyunzhang_intel  (was: Jianguo Tian)

> Combine op trees for partition info generating tasks [Spark branch]
> ---
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
>  Issue Type: Bug
>Affects Versions: spark-branch
>Reporter: Chao Sun
>Assignee: liyunzhang_intel
>
> Currently, for dynamic partition pruning in Spark, if a small table generates 
> partition info for more than one partition column, multiple operator trees 
> are created, which all start from the same table scan op but have different 
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so we don't have to do 
> the table scan multiple times.





[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030512#comment-16030512
 ] 

Hive QA commented on HIVE-12631:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870464/HIVE-12631.8.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10804 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5483/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5483/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5483/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870464 - PreCommit-HIVE-Build

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, the ACID logic is embedded inside the ORC format; we need 
> to refactor it to sit on top of some interface, if practical, or just port it 
> to the LLAP read path.
> Another consideration is how the logic will work with the cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache the merged representation in the future.
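The base-plus-delta merge mentioned above can be sketched conceptually (a heavily simplified stand-in, not the real ORC ACID format or the LLAP read path): committed rows live in a base keyed by row id, later deltas record inserts and deletes, and a read reconciles them in key order.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class AcidMergeSketch {
    // Each delta entry is (rowId, value); a null value marks a delete.
    static List<String> merge(Map<Long, String> base,
                              List<Map.Entry<Long, String>> deltas) {
        // Start from the base snapshot, keeping rows sorted by row id.
        TreeMap<Long, String> merged = new TreeMap<>(base);
        for (Map.Entry<Long, String> op : deltas) {
            if (op.getValue() == null) {
                merged.remove(op.getKey());          // apply a delete
            } else {
                merged.put(op.getKey(), op.getValue()); // apply insert/update
            }
        }
        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        Map<Long, String> base = new TreeMap<>();
        base.put(1L, "a");
        base.put(2L, "b");
        List<Map.Entry<Long, String>> deltas = new ArrayList<>();
        deltas.add(new AbstractMap.SimpleEntry<>(2L, null)); // delete row 2
        deltas.add(new AbstractMap.SimpleEntry<>(3L, "c"));  // insert row 3
        System.out.println(merge(base, deltas)); // prints [a, c]
    }
}
```

A cache-friendly design can then cache bases and deltas separately (as the comment suggests, with deltas at higher priority) and run this merge per read, or cache the merged view itself.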





[jira] [Commented] (HIVE-16795) Measure Performance for Parquet Vectorization Reader

2017-05-30 Thread Colin Ma (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030509#comment-16030509
 ] 

Colin Ma commented on HIVE-16795:
-

[~Ferd], sure, I'll take this.

> Measure Performance for Parquet Vectorization Reader
> 
>
> Key: HIVE-16795
> URL: https://issues.apache.org/jira/browse/HIVE-16795
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Colin Ma
>
> We need to measure the performance of the Parquet vectorized reader feature 
> using TPCx-BB or TPC-DS to see how much performance gain we can achieve.





[jira] [Assigned] (HIVE-16795) Measure Performance for Parquet Vectorization Reader

2017-05-30 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma reassigned HIVE-16795:
---

Assignee: Colin Ma

> Measure Performance for Parquet Vectorization Reader
> 
>
> Key: HIVE-16795
> URL: https://issues.apache.org/jira/browse/HIVE-16795
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>Assignee: Colin Ma
>
> We need to measure the performance of the Parquet vectorized reader feature 
> using TPCx-BB or TPC-DS to see how much performance gain we can achieve.





[jira] [Commented] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030475#comment-16030475
 ] 

Hive QA commented on HIVE-16771:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870468/HIVE-16771.04.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10804 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_replicate_rows] 
(batchId=79)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=232)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=232)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteSmallint 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5482/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5482/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5482/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870468 - PreCommit-HIVE-Build

> Schematool should use MetastoreSchemaInfo to get the metastore schema version 
> from database
> ---
>
> Key: HIVE-16771
> URL: https://issues.apache.org/jira/browse/HIVE-16771
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, 
> HIVE-16771.03.patch, HIVE-16771.04.patch
>
>
> HIVE-16723 gives the ability to have a custom MetastoreSchemaInfo 
> implementation to manage schema upgrades and initialization if needed. To 
> make HiveSchemaTool completely agnostic, it should depend on the configured 
> IMetastoreSchemaInfo implementation to get the metastore schema version 
> information from the database. It should also not assume and hardcode the 
> scripts directory itself; it should instead ask the MetastoreSchemaInfo 
> class for the metastore scripts directory.





[jira] [Commented] (HIVE-16665) Race condition in Utilities.GetInputPathsCallable --> createDummyFileForEmptyPartition

2017-05-30 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030469#comment-16030469
 ] 

Sahil Takiar commented on HIVE-16665:
-

master branch should be good. Thanks Sergio. Will close this.

> Race condition in Utilities.GetInputPathsCallable --> 
> createDummyFileForEmptyPartition
> --
>
> Key: HIVE-16665
> URL: https://issues.apache.org/jira/browse/HIVE-16665
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-16665.1.patch, HIVE-16665.2.patch, 
> HIVE-16665.3.patch
>
>
> Looks like there is a race condition in the {{GetInputPathsCallable}} thread 
> when modifying the input {{MapWork}} object.





[jira] [Updated] (HIVE-16665) Race condition in Utilities.GetInputPathsCallable --> createDummyFileForEmptyPartition

2017-05-30 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-16665:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Race condition in Utilities.GetInputPathsCallable --> 
> createDummyFileForEmptyPartition
> --
>
> Key: HIVE-16665
> URL: https://issues.apache.org/jira/browse/HIVE-16665
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-16665.1.patch, HIVE-16665.2.patch, 
> HIVE-16665.3.patch
>
>
> Looks like there is a race condition in the {{GetInputPathsCallable}} thread 
> when modifying the input {{MapWork}} object.





[jira] [Comment Edited] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-05-30 Thread Saumil Mayani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030136#comment-16030136
 ] 

Saumil Mayani edited comment on HIVE-16791 at 5/31/17 1:15 AM:
---

Merge the files and then extract.
{code}
cat sample-data.tar.gz-a* > sample-data.tar.gz
tar -xzvf sample-data.tar.gz -C /tmp
{code}


was (Author: smayani):
Merge the files and then extract.
{code}
cat sample-data.tar.gz-a
tar -xzvf sample-data.tar.gz -C /tmp
{code}

> Tez engine giving inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> --
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt, sample-data.tar.gz-aa, 
> sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}





[jira] [Commented] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030435#comment-16030435
 ] 

Ashutosh Chauhan commented on HIVE-16757:
-

+1

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch, 
> HIVE-16757.06.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which 
> accepts the RelMetadataQuery that in most places we already have handy to 
> pass. Looking at a complex query (49 joins), there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one blowing away the current memoization 
> cache.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, in a query with 49 joins, 
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 
> 501729 calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s..-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.
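The cost difference described above can be sketched without Calcite. This is a minimal, hypothetical stand-in (plain Java; {{MetadataQuery}}, {{expensiveEstimate}} and the counters are illustrative only, not the real Calcite API): calling {{instance()}} per lookup starts every call with an empty memoization cache, while threading one shared query object through reuses cached estimates.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Calcite's RelMetadataQuery: each instance()
// call starts with a fresh, empty memoization cache.
class MetadataQuery {
    static int computations = 0;
    private final Map<String, Double> cache = new HashMap<>();

    static MetadataQuery instance() {
        return new MetadataQuery();
    }

    double getRowCount(String relNode) {
        // Memoized: only the first lookup per node pays the full cost.
        return cache.computeIfAbsent(relNode, k -> {
            computations++;
            return expensiveEstimate(k);
        });
    }

    private double expensiveEstimate(String relNode) {
        return relNode.length() * 100.0; // placeholder for a costly recursive estimate
    }
}

public class MemoizationDemo {
    // Deprecated-style pattern: every call builds a new query, busting the cache.
    static double getRows(String relNode) {
        return MetadataQuery.instance().getRowCount(relNode);
    }

    // New-style pattern: the caller threads a single query through, sharing the cache.
    static double estimateRowCount(String relNode, MetadataQuery mq) {
        return mq.getRowCount(relNode);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            getRows("JoinRel#42");
        }
        System.out.println("per-call instance(): " + MetadataQuery.computations + " computations");

        MetadataQuery.computations = 0;
        MetadataQuery mq = MetadataQuery.instance();
        for (int i = 0; i < 1000; i++) {
            estimateRowCount("JoinRel#42", mq);
        }
        System.out.println("shared query: " + MetadataQuery.computations + " computation(s)");
    }
}
```

The patch amounts to the second shape: passing the {{RelMetadataQuery}} that is already available at most call sites instead of letting each {{getRows()}} rebuild it.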





[jira] [Updated] (HIVE-16258) Suggestion: simplify type 2 SCDs with this non-standard extension to MERGE

2017-05-30 Thread Carter Shanklin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter Shanklin updated HIVE-16258:
---
Summary: Suggestion: simplify type 2 SCDs with this non-standard extension 
to MERGE  (was: Suggesting a non-standard extension to MERGE)

> Suggestion: simplify type 2 SCDs with this non-standard extension to MERGE
> --
>
> Key: HIVE-16258
> URL: https://issues.apache.org/jira/browse/HIVE-16258
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Carter Shanklin
>
> Some common data maintenance strategies, especially the Type 2 SCD update, 
> would become substantially easier with a small extension to the SQL standard 
> for MERGE, specifically the ability to say "when matched then insert". Per 
> the standard, matched records can only be updated or deleted.
> In the Type 2 SCD, when a new record comes in you update the old version of 
> the record and insert the new version of the same record. If this extension 
> were supported, sample Type 2 SCD code would look as follows:
> {code}
> merge into customer
> using new_customer_stage stage
> on stage.source_pk = customer.source_pk
> when not matched then insert values/* Insert a net new record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null)
> when matched then update set   /* Update an old record to mark it as 
> out-of-date */
>   is_current = false, end_date = current_date()
> when matched then insert values/* Insert a new current record */
>   (stage.source_pk, upper(substr(stage.name, 0, 3)), stage.name, stage.state, 
> true, null);
> {code}
> Without this support, the user needs to devise some sort of workaround. A 
> common approach is to first left join the staging table against the table to 
> be updated, then to join these results to a helper table that will spit out 
> two records for each match and one record for each miss. One of the matching 
> records needs to have a join key that can never occur in the source data so 
> this requires precise knowledge of the source dataset.
> An example of this:
> {code}
> merge into customer
> using (
>   select
> *,
> coalesce(invalid_key, source_pk) as join_key
>   from (
> select
>   stage.source_pk, stage.name, stage.state,
>   case when customer.source_pk is null then 1
>   when stage.name <> customer.name or stage.state <> customer.state then 2
>   else 0 end as scd_row_type
> from
>   new_customer_stage stage
> left join
>   customer
> on (stage.source_pk = customer.source_pk and customer.is_current = true)
>   ) updates
>   join scd_types on scd_types.type = scd_row_type
> ) sub
> on sub.join_key = customer.source_pk
> when matched then update set
>   is_current = false,
>   end_date = current_date()
> when not matched then insert values
>   (sub.source_pk, upper(substr(sub.name, 0, 3)), sub.name, sub.state, true, 
> null);
> select * from customer order by source_pk;
> {code}
> This code is very complicated and will fail if the "invalid" key ever shows 
> up in the source dataset. This simple extension provides a lot of value and 
> would likely add very little maintenance overhead.
> /cc [~ekoifman]





[jira] [Commented] (HIVE-16795) Measure Performance for Parquet Vectorization Reader

2017-05-30 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030420#comment-16030420
 ] 

Ferdinand Xu commented on HIVE-16795:
-

[~colin_mjj] do you want to take this JIRA?

> Measure Performance for Parquet Vectorization Reader
> 
>
> Key: HIVE-16795
> URL: https://issues.apache.org/jira/browse/HIVE-16795
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Ferdinand Xu
>
> We need to measure the performance of the Parquet vectorized reader feature 
> using TPCx-BB or TPC-DS to see how much performance gain we can achieve.





[jira] [Updated] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16774:
---
Attachment: HIVE-16774.02.patch

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16774.01.patch, HIVE-16774.02.patch
>
>
> query47.q query57.q





[jira] [Commented] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030410#comment-16030410
 ] 

Hive QA commented on HIVE-16757:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870455/HIVE-16757.06.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10791 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=228)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_decimal] 
(batchId=9)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5481/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5481/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5481/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870455 - PreCommit-HIVE-Build

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch, 
> HIVE-16757.06.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we're calling the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which 
> accepts the RelMetadataQuery that in most places we already have handy to 
> pass. Looking at a complex query (49 joins), there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one blowing away the current memoization 
> cache.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, in a query with 49 joins, 
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount gets 
> called 6442 times as a top-level call, but the recursion exploded this to 
> 501729 calls. Memoization of the result would stop the recursion early. In my 
> testing this reduced the join reordering time for said query from 11s to 
> <1s..-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.





[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2017-05-30 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14990:
-
Attachment: HIVE-14990.21.patch

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Fix For: hive-14535
>
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, 
> HIVE-14990.10.patch, HIVE-14990.12.patch, HIVE-14990.13.patch, 
> HIVE-14990.14.patch, HIVE-14990.15.patch, HIVE-14990.16.patch, 
> HIVE-14990.17.patch, HIVE-14990.18.patch, HIVE-14990.19.patch, 
> HIVE-14990.20.patch, HIVE-14990.21.patch, HIVE-14990.patch
>
>
> I am running the tests with isMmTable returning true for most tables (except 
> ACID, temporary tables, views, etc.).
> Many tests will fail because of various expected issues with such an 
> approach; however we can find issues in MM tables from other failures.
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets
> 11) rand - different results due to different sequence of processing.
> 12) Many (not all, i.e. not the ones with just one insert) tests that have 
> stats output, such as file count, for obvious reasons.
> 13) Materialized views, not handled by design - the test check erroneously 
> makes them "mm"; there is no easy way to tell them apart, and I don't want to 
> plumb more stuff through just for this test.
> I'm filing JIRAs for some test failures that are not obvious and need 
> investigation later.





[jira] [Updated] (HIVE-16764) Support numeric as same as decimal

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16764:
---
Affects Version/s: 2.3.0
   2.2.0

> Support numeric as same as decimal
> --
>
> Key: HIVE-16764
> URL: https://issues.apache.org/jira/browse/HIVE-16764
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: incompatibleChange
> Fix For: 3.0.0
>
> Attachments: HIVE-16764.01.patch, HIVE-16764.02.patch
>
>
> for example numeric(12,2) -> decimal(12,2) 
> This will make Numeric reserved keyword





[jira] [Updated] (HIVE-16764) Support numeric as same as decimal

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16764:
---
Fix Version/s: 3.0.0

> Support numeric as same as decimal
> --
>
> Key: HIVE-16764
> URL: https://issues.apache.org/jira/browse/HIVE-16764
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: incompatibleChange
> Fix For: 3.0.0
>
> Attachments: HIVE-16764.01.patch, HIVE-16764.02.patch
>
>
> for example numeric(12,2) -> decimal(12,2) 
> This will make Numeric reserved keyword





[jira] [Updated] (HIVE-16764) Support numeric as same as decimal

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16764:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Support numeric as same as decimal
> --
>
> Key: HIVE-16764
> URL: https://issues.apache.org/jira/browse/HIVE-16764
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: incompatibleChange
> Fix For: 3.0.0
>
> Attachments: HIVE-16764.01.patch, HIVE-16764.02.patch
>
>
> for example numeric(12,2) -> decimal(12,2) 
> This will make Numeric reserved keyword





[jira] [Commented] (HIVE-14309) Fix naming of classes in orc module to not conflict with standalone orc

2017-05-30 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030386#comment-16030386
 ] 

Alan Gates commented on HIVE-14309:
---

+1, as long as this is applied only to the 2.2 branch. As it just changes 
import statements, it seems OK.

It would be good to put a detailed comment in the orc/pom.xml explaining how 
this works now, as anyone who has to maintain this later is going to be 
seriously confused as to why the files have imports for org.apache.hive.orc yet 
the files list their packages as org.apache.orc.



> Fix naming of classes in orc module to not conflict with standalone orc
> ---
>
> Key: HIVE-14309
> URL: https://issues.apache.org/jira/browse/HIVE-14309
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> The current Hive 2.0 and 2.1 releases have classes in the org.apache.orc 
> namespace that clash with the ORC project's classes. From Hive 2.2 onward, 
> the classes will only be on ORC, but we'll reduce the problems of classpath 
> issues if we rename the classes to org.apache.hive.orc.
> I've looked at a set of projects (pig, spark, oozie, flume, & storm) and 
> can't find any uses of Hive's versions of the org.apache.orc classes, so I 
> believe this is a safe change that will reduce the integration problems 
> downstream.





[jira] [Updated] (HIVE-16764) Support numeric as same as decimal

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16764:
---
Description: 
for example numeric(12,2) -> decimal(12,2) 

This will make Numeric reserved keyword

  was:for example numeric(12,2) -> decimal(12,2) 


> Support numeric as same as decimal
> --
>
> Key: HIVE-16764
> URL: https://issues.apache.org/jira/browse/HIVE-16764
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: incompatibleChange
> Attachments: HIVE-16764.01.patch, HIVE-16764.02.patch
>
>
> for example numeric(12,2) -> decimal(12,2) 
> This will make Numeric reserved keyword





[jira] [Updated] (HIVE-16764) Support numeric as same as decimal

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16764:
---
Labels: incompatibleChange  (was: )

> Support numeric as same as decimal
> --
>
> Key: HIVE-16764
> URL: https://issues.apache.org/jira/browse/HIVE-16764
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>  Labels: incompatibleChange
> Attachments: HIVE-16764.01.patch, HIVE-16764.02.patch
>
>
> for example numeric(12,2) -> decimal(12,2) 





[jira] [Updated] (HIVE-14309) Fix naming of classes in orc module to not conflict with standalone orc

2017-05-30 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-14309:
-
Status: Patch Available  (was: Reopened)

> Fix naming of classes in orc module to not conflict with standalone orc
> ---
>
> Key: HIVE-14309
> URL: https://issues.apache.org/jira/browse/HIVE-14309
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> The current Hive 2.0 and 2.1 releases have classes in the org.apache.orc 
> namespace that clash with the ORC project's classes. From Hive 2.2 onward, 
> the classes will only be on ORC, but we'll reduce the problems of classpath 
> issues if we rename the classes to org.apache.hive.orc.
> I've looked at a set of projects (pig, spark, oozie, flume, & storm) and 
> can't find any uses of Hive's versions of the org.apache.orc classes, so I 
> believe this is a safe change that will reduce the integration problems 
> downstream.





[jira] [Reopened] (HIVE-14309) Fix naming of classes in orc module to not conflict with standalone orc

2017-05-30 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reopened HIVE-14309:
--

> Fix naming of classes in orc module to not conflict with standalone orc
> ---
>
> Key: HIVE-14309
> URL: https://issues.apache.org/jira/browse/HIVE-14309
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> The current Hive 2.0 and 2.1 releases have classes in the org.apache.orc 
> namespace that clash with the ORC project's classes. From Hive 2.2 onward, 
> the classes will only be on ORC, but we'll reduce the problems of classpath 
> issues if we rename the classes to org.apache.hive.orc.
> I've looked at a set of projects (pig, spark, oozie, flume, & storm) and 
> can't find any uses of Hive's versions of the org.apache.orc classes, so I 
> believe this is a safe change that will reduce the integration problems 
> downstream.





[jira] [Commented] (HIVE-14309) Fix naming of classes in orc module to not conflict with standalone orc

2017-05-30 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030376#comment-16030376
 ] 

Owen O'Malley commented on HIVE-14309:
--

OK, this pull request https://github.com/apache/hive/pull/191 does two things:
* renames the packages in hive-orc to org.apache.hive.orc
* updates the references to the moved packages in other modules

No clients use org.apache.hive:hive-orc, so this just makes the classes internal 
to Hive, where they don't conflict if clients use both the Hive and ORC jars. In 
Hive 2.3 and later, this is unnecessary because Hive uses the ORC project's 
artifacts.

> Fix naming of classes in orc module to not conflict with standalone orc
> ---
>
> Key: HIVE-14309
> URL: https://issues.apache.org/jira/browse/HIVE-14309
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> The current Hive 2.0 and 2.1 releases have classes in the org.apache.orc 
> namespace that clash with the ORC project's classes. From Hive 2.2 onward, 
> the classes will only be on ORC, but we'll reduce the problems of classpath 
> issues if we rename the classes to org.apache.hive.orc.
> I've looked at a set of projects (pig, spark, oozie, flume, & storm) and 
> can't find any uses of Hive's versions of the org.apache.orc classes, so I 
> believe this is a safe change that will reduce the integration problems 
> downstream.





[jira] [Assigned] (HIVE-16794) Default value for hive.spark.client.connect.timeout of 1000ms is too low

2017-05-30 Thread Eric Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Lin reassigned HIVE-16794:
---

Assignee: Eric Lin

> Default value for hive.spark.client.connect.timeout of 1000ms is too low
> 
>
> Key: HIVE-16794
> URL: https://issues.apache.org/jira/browse/HIVE-16794
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Affects Versions: 2.1.1
>Reporter: Eric Lin
>Assignee: Eric Lin
>
> Currently the default timeout value for hive.spark.client.connect.timeout is 
> set at 1000ms, which is only 1 second. This is not enough when the cluster is 
> busy, and users will constantly get the following timeout errors:
> {code}
> 17/05/03 03:20:08 ERROR yarn.ApplicationMaster: User class threw exception: 
> java.util.concurrent.ExecutionException: 
> io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915 
> java.util.concurrent.ExecutionException: 
> io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915 
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) 
> at org.apache.hive.spark.client.RemoteDriver.<init>(RemoteDriver.java:156) 
> at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556) 
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  
> at java.lang.reflect.Method.invoke(Method.java:606) 
> at 
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
>  
> Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915 
> at 
> io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:220)
>  
> at 
> io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
>  
> at 
> io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120)
>  
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>  
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) 
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>  
> at java.lang.Thread.run(Thread.java:745) 
> 17/05/03 03:20:08 INFO yarn.ApplicationMaster: Final app status: FAILED, 
> exitCode: 15, (reason: User class threw exception: 
> java.util.concurrent.ExecutionException: 
> io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915) 
> 17/05/03 03:20:16 ERROR yarn.ApplicationMaster: SparkContext did not 
> initialize after waiting for 10 ms. Please check earlier log output for 
> errors. Failing the application. 
> 17/05/03 03:20:16 INFO yarn.ApplicationMaster: Unregistering 
> ApplicationMaster with FAILED (diag message: User class threw exception: 
> java.util.concurrent.ExecutionException: 
> io.netty.channel.ConnectTimeoutException: connection timed out: 
> /172.19.22.11:35915) 
> 17/05/03 03:20:16 INFO yarn.ApplicationMaster: Deleting staging directory 
> .sparkStaging/application_1492040605432_11445 
> 17/05/03 03:20:16 INFO util.ShutdownHookManager: Shutdown hook called
> {code}
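Raising the timeout is a one-line configuration change. A hedged hive-site.xml fragment (the 30000ms value is an illustrative example, not an official recommendation):

```xml
<!-- Illustrative hive-site.xml fragment; 30000ms is an example value,
     not an official recommendation. -->
<property>
  <name>hive.spark.client.connect.timeout</name>
  <value>30000ms</value>
</property>
```

The same value can also be set per session with `set hive.spark.client.connect.timeout=30000ms;`.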





[jira] [Commented] (HIVE-14309) Fix naming of classes in orc module to not conflict with standalone orc

2017-05-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030362#comment-16030362
 ] 

ASF GitHub Bot commented on HIVE-14309:
---

GitHub user omalley opened a pull request:

https://github.com/apache/hive/pull/191

HIVE-14309 Shade the contents of hive-orc so that they don't conflict with 
orc project.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/omalley/hive hive-14309

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/191.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #191


commit aaab5c65161f29e00685ed74eee8ad48271dd692
Author: Owen O'Malley 
Date:   2017-05-26T15:52:26Z

Incorporate fixes from storage-api 2.3.1.

commit 2dd6d489ee6710e5a49c937bfa0fcb04a131bb9e
Author: Thejas Nair 
Date:   2016-07-28T07:47:40Z

HIVE-16787. Fix itests in branch-2.2.

commit 18a55ec80bf1ddda179230ca43e7959f76d7357e
Author: Tao LI 
Date:   2016-08-05T06:38:28Z

BUG-63720: TestOperationLogging fails

commit e694efca226b72cc712fecd3f399284faf3dc596
Author: Vaibhav Gumashta 
Date:   2017-02-14T02:31:49Z

BUG-67801: Disable TestJdbcWithMiniHS2.testAddJarConstructorUnCaching

Change-Id: Ie7810f48979720fffdb6934d169cc88edc2baa61

commit 29dd342bc910577b2d26bb4b9e4a3e9f01d12e80
Author: Sergey Shelukhin 
Date:   2017-01-11T20:43:40Z

BUG-72004 : disable Hive-on-Spark tests (Sergey Shelukhin)

commit a7a0afa3363592707226b9de4b1fa124e97f388a
Author: Owen O'Malley 
Date:   2017-05-30T17:09:02Z

More patchup

commit f3a37c654f4bda8f980ac4e51c62f56f6ebd4405
Author: Ashutosh Chauhan 
Date:   2016-07-29T19:02:04Z

BUG-61003 : Hive UT test failures

commit 3d1c03f478c2c5d6b3b05ddafd81da9176a14a06
Author: Matt McCline 
Date:   2016-07-06T04:25:31Z

BUG-61003: Hive2 UT test failure -- the schema_evol_*.q.out failures 
(adjusted tests for the rollback of HIVE-13380)

commit 3fc6200699a5d635ab41446bfaf8a7555481b281
Author: Prasanth Jayachandran 
Date:   2016-12-19T19:35:59Z

BUG-70968: Move schema evolution and some tez tests to MiniLlap

Change-Id: I82d8ab5fd18f54d780f84548ccf045a9c4f3885d

commit 51658a576f8ec3d3260b1a76062d10ee11d8d955
Author: Pengcheng Xiong 
Date:   2017-01-03T22:36:02Z

BUG-70522: Fenton UT - TestCliDriver[autoColumnStats_4] failure

commit 093e2dec7a329243c4d964d3659d617ad8526f45
Author: Ashutosh Chauhan 
Date:   2017-01-26T19:48:18Z

BUG-70522 : Fenton UT - TestCliDriver[autoColumnStats_4] failure

Change-Id: Ic3dd59398f841b9963306ca52a69d8b00a80ffc8

commit 4d28f69f7e3f4a2abf72a76c4ff82e0c7719ae10
Author: Prasanth Jayachandran 
Date:   2016-12-24T05:12:00Z

BUG-69023: addendum patch

Change-Id: I4f81ac7da124fc51f725e52f09fe2c97647f34e5

commit 77d0cc04c7a1c141acf79af89bcde179cfe7cafe
Author: Ashutosh Chauhan 
Date:   2017-01-24T22:35:02Z

BUG-73211 : Update golden files for fenton hive2

Change-Id: I4d73a505bdd10d195a417490c31df28d4b2a8fc7

commit 1f8a083e24ccb184c7e73d3a1bc75c148a186b9d
Author: Pengcheng Xiong 
Date:   2017-01-03T22:46:04Z

BUG-70684: Fenton UT - TestCliDriver.testCliDriver[udtf_explode]

commit 2274215016c934969d2f60bd3af6a1fcae70bcbb
Author: Ashutosh Chauhan 
Date:   2016-08-05T23:47:15Z

BUG-63023 : 10s of errors related to order of keys in hashmap

commit 1617d272d4d215c139c3e1a94f5f70f13202677e
Author: Ashutosh Chauhan 
Date:   2017-01-24T05:15:51Z

BUG-72915 : Fenton UT - TestMiniTezCliDriver.testCliDriver_mapjoin3 is flaky

Change-Id: Ia070bf7a664492d1889edc52ada0cb0aa0871b81

commit 34507b82683aae7690b00ac5170bb401ece49c0a
Author: Matt McCline 
Date:   2016-08-05T20:00:21Z

BUG-63886: schema evol UT fails

commit d6a266969886c462bfac6b6c0f5d23868d697c0a
Author: Owen O'Malley 
Date:   2017-05-30T17:49:20Z

null optimizer

commit 1729970c77d20af27cf7977a513e339d814d65dc
Author: Ashutosh Chauhan 
Date:   2017-01-23T19:45:09Z

BUG-72965 : Backport HIVE-15544 & HIVE-15481 to fenton hive2
Addendum for HIVE-15481

Change-Id: Iee4acb5831c99f120708786ff62018fe65a21a47

commit aa27c6da16fccf95e5b1c43f7a01f6eaee2a6050
Author: Owen O'Malley 
Date:   2017-05-30T17:53:15Z

more fixes

commit f1fbadd268920843283ec1cad072c9f0bc335788
Author: Ashutosh Chauhan 

[jira] [Updated] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database

2017-05-30 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16771:
---
Attachment: HIVE-16771.04.patch

Addressed review comments

> Schematool should use MetastoreSchemaInfo to get the metastore schema version 
> from database
> ---
>
> Key: HIVE-16771
> URL: https://issues.apache.org/jira/browse/HIVE-16771
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, 
> HIVE-16771.03.patch, HIVE-16771.04.patch
>
>
> HIVE-16723 gives the ability to have a custom MetastoreSchemaInfo 
> implementation to manage schema upgrades and initialization if needed. In 
> order to make HiveSchemaTool completely agnostic, it should depend on the 
> IMetastoreSchemaInfo implementation that is configured to get the metastore 
> schema version information from the database. It should also not assume and 
> hardcode the scripts directory itself; it should instead ask the 
> MetastoreSchemaInfo class for the metastore scripts directory.
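The dependency inversion the description asks for can be sketched as follows. This is an illustrative Python model of the pattern, not Hive's actual Java code, and the method names are hypothetical:

```python
# Hypothetical sketch of the schema-tool / SchemaInfo decoupling: the tool
# asks a pluggable object for the schema version and the scripts directory
# instead of hardcoding either. Not Hive's actual API.
from abc import ABC, abstractmethod

class MetastoreSchemaInfo(ABC):
    @abstractmethod
    def get_schema_version(self, db): ...

    @abstractmethod
    def get_scripts_dir(self): ...

class DefaultSchemaInfo(MetastoreSchemaInfo):
    def get_schema_version(self, db):
        # A real implementation would query the metastore's VERSION table.
        return db["schema_version"]

    def get_scripts_dir(self):
        return "scripts/metastore/upgrade"

def check_schema(info: MetastoreSchemaInfo, db, expected):
    """Schema-tool logic that stays agnostic of where the data comes from."""
    return info.get_schema_version(db) == expected, info.get_scripts_dir()

ok, scripts_dir = check_schema(DefaultSchemaInfo(), {"schema_version": "3.0.0"}, "3.0.0")
```

A custom deployment would swap in its own `MetastoreSchemaInfo` subclass without touching the tool's logic.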





[jira] [Commented] (HIVE-15051) Test framework integration with findbugs, rat checks etc.

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030355#comment-16030355
 ] 

Hive QA commented on HIVE-15051:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870449/HIVE-15051.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10791 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5480/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5480/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5480/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870449 - PreCommit-HIVE-Build

> Test framework integration with findbugs, rat checks etc.
> -
>
> Key: HIVE-15051
> URL: https://issues.apache.org/jira/browse/HIVE-15051
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: beeline.out, HIVE-15051.02.patch, HIVE-15051.patch, 
> Interim.patch, ql.out
>
>
> Find a way to integrate code analysis tools like findbugs and rat checks into 
> the PreCommit tests, thus removing from reviewers the burden of checking code 
> style and other things that could be checked automatically. 
> It might be worth taking a look at Yetus, but keep in mind that Hive has a 
> specific parallel test framework.





[jira] [Commented] (HIVE-16792) Estimate Rows When Joining BIGINT to INT Column

2017-05-30 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030343#comment-16030343
 ] 

Pengcheng Xiong commented on HIVE-16792:


ccing [~jcamachorodriguez]

> Estimate Rows When Joining BIGINT to INT Column
> ---
>
> Key: HIVE-16792
> URL: https://issues.apache.org/jira/browse/HIVE-16792
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.1
>Reporter: BELUGA BEHR
>Priority: Minor
>
> {code:sql}
> create table test1
> (a int);
> create table test2
> (z bigint);
> INSERT INTO test1 VALUES (1);
> INSERT INTO test2 VALUES (2147483648);
> analyze table test1 compute statistics for columns;
> analyze table test2 compute statistics for columns;
> EXPLAIN SELECT * FROM test1 t1 INNER JOIN test2 t2 ON t1.a=t2.z;
> {code}
> {code}
> Explain
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
>   Stage: Stage-4
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> t2 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> t2 
>   TableScan
> alias: t2
> filterExpr: z is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: COMPLETE
> Filter Operator
>   predicate: z is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: COMPLETE
>   HashTable Sink Operator
> keys:
>   0 UDFToLong(a) (type: bigint)
>   1 z (type: bigint)
>   Stage: Stage-3
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: t1
> filterExpr: UDFToLong(a) is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: UDFToLong(a) is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> keys:
>   0 UDFToLong(a) (type: bigint)
>   1 z (type: bigint)
> outputColumnNames: _col0, _col4
> Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: _col0 (type: int), _col4 (type: bigint)
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Local Work:
> Map Reduce Local Work
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {code}
> I would expect Hive to be smart enough to know that this join cannot produce 
> any rows, because the minimum value of test2's column z is greater than 
> Integer.MAX_VALUE.
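The reporter's expectation reduces to a range-overlap test over the columns' min/max statistics. A minimal sketch of that check (an illustrative model in Python, not Hive's optimizer code):

```python
# Illustrative model of detecting a provably-empty equi-join from
# column min/max statistics (not Hive's actual implementation).

INT_MAX = 2_147_483_647  # java.lang.Integer.MAX_VALUE

def join_can_match(stats_a, stats_b):
    """Two equi-join keys can only match if their [min, max] ranges overlap."""
    return stats_a["min"] <= stats_b["max"] and stats_b["min"] <= stats_a["max"]

# Stats as gathered by "analyze table ... compute statistics for columns":
# test1.a holds 1; test2.z holds 2147483648, which exceeds INT_MAX.
a_stats = {"min": 1, "max": 1}
z_stats = {"min": INT_MAX + 1, "max": INT_MAX + 1}

# Since the ranges cannot overlap, the join's row estimate should be 0.
estimated_rows = 1 if join_can_match(a_stats, z_stats) else 0
print(estimated_rows)  # prints 0
```

Because every `int` value fits below `Integer.MAX_VALUE`, any bigint column whose minimum exceeds it can never match an int join key.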





[jira] [Commented] (HIVE-16792) Estimate Rows When Joining BIGINT to INT Column

2017-05-30 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030342#comment-16030342
 ] 

Pengcheng Xiong commented on HIVE-16792:


Yes, this can be an improvement to hive.optimize.filter.stats.reduction

> Estimate Rows When Joining BIGINT to INT Column
> ---
>
> Key: HIVE-16792
> URL: https://issues.apache.org/jira/browse/HIVE-16792
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.1
>Reporter: BELUGA BEHR
>Priority: Minor
>
> {code:sql}
> create table test1
> (a int);
> create table test2
> (z bigint);
> INSERT INTO test1 VALUES (1);
> INSERT INTO test2 VALUES (2147483648);
> analyze table test1 compute statistics for columns;
> analyze table test2 compute statistics for columns;
> EXPLAIN SELECT * FROM test1 t1 INNER JOIN test2 t2 ON t1.a=t2.z;
> {code}
> {code}
> Explain
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
> STAGE PLANS:
>   Stage: Stage-4
> Map Reduce Local Work
>   Alias -> Map Local Tables:
> t2 
>   Fetch Operator
> limit: -1
>   Alias -> Map Local Operator Tree:
> t2 
>   TableScan
> alias: t2
> filterExpr: z is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column 
> stats: COMPLETE
> Filter Operator
>   predicate: z is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL 
> Column stats: COMPLETE
>   HashTable Sink Operator
> keys:
>   0 UDFToLong(a) (type: bigint)
>   1 z (type: bigint)
>   Stage: Stage-3
> Map Reduce
>   Map Operator Tree:
>   TableScan
> alias: t1
> filterExpr: UDFToLong(a) is not null (type: boolean)
> Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column 
> stats: NONE
> Filter Operator
>   predicate: UDFToLong(a) is not null (type: boolean)
>   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
>   Map Join Operator
> condition map:
>  Inner Join 0 to 1
> keys:
>   0 UDFToLong(a) (type: bigint)
>   1 z (type: bigint)
> outputColumnNames: _col0, _col4
> Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
> Select Operator
>   expressions: _col0 (type: int), _col4 (type: bigint)
>   outputColumnNames: _col0, _col1
>   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE 
> Column stats: NONE
>   File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 1 Basic stats: 
> COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: 
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Local Work:
> Map Reduce Local Work
>   Stage: Stage-0
> Fetch Operator
>   limit: -1
>   Processor Tree:
> ListSink
> {code}
> I would expect Hive to be smart enough to know that this join cannot produce 
> any rows, because the minimum value of test2's column z is greater than 
> Integer.MAX_VALUE.





[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables

2017-05-30 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-12631:
--
Attachment: HIVE-12631.8.patch

Uploading the patch again.

> LLAP: support ORC ACID tables
> -
>
> Key: HIVE-12631
> URL: https://issues.apache.org/jira/browse/HIVE-12631
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Transactions
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-12631.1.patch, HIVE-12631.2.patch, 
> HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, 
> HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch
>
>
> LLAP uses a completely separate read path in ORC to allow for caching and 
> parallelization of reads and processing. This path does not support ACID. As 
> far as I remember, ACID logic is embedded inside the ORC format; we need to 
> refactor it to sit on top of some interface, if practical, or just port it to 
> the LLAP read path.
> Another consideration is how the logic will work with the cache. The cache is 
> currently low-level (CB-level in ORC), so we could just use it to read bases 
> and deltas (deltas should be cached with higher priority) and merge as usual. 
> We could also cache the merged representation in the future.
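The idea of caching deltas at a higher priority than bases can be sketched as a cache whose eviction order depends on a per-entry priority. An illustrative toy model (not LLAP's actual cache; names and policy are hypothetical):

```python
# Illustrative sketch (not LLAP's actual cache) of caching base and delta
# files with different eviction priorities, as the description suggests.

BASE_PRIORITY, DELTA_PRIORITY = 1, 2  # higher number = evicted later

class PriorityCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # path -> (priority, data)

    def put(self, path, data, priority):
        if len(self.entries) >= self.capacity:
            # Evict the lowest-priority entry first, keeping deltas longer.
            victim = min(self.entries, key=lambda p: self.entries[p][0])
            del self.entries[victim]
        self.entries[path] = (priority, data)

    def get(self, path):
        entry = self.entries.get(path)
        return entry[1] if entry else None

cache = PriorityCache(capacity=2)
cache.put("base_001", b"base bytes", BASE_PRIORITY)
cache.put("delta_002", b"delta bytes", DELTA_PRIORITY)
cache.put("base_003", b"more base", BASE_PRIORITY)  # evicts base_001, not the delta
```

A real cache would also need sizing, pinning, and concurrency; the sketch only shows the priority-ordered eviction.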





[jira] [Updated] (HIVE-16793) Scalar sub-query: Scalar safety checks for explicit group-bys

2017-05-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-16793:
---
Summary: Scalar sub-query: Scalar safety checks for explicit group-bys  
(was: Scalar sub-query: Scalar safety checks for group-bys)

> Scalar sub-query: Scalar safety checks for explicit group-bys
> -
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
>
> This query has an sq_count check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>   <-Select Operator [SEL_59] (rows=2 width=621)

[jira] [Commented] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database

2017-05-30 Thread Sergio Peña (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030273#comment-16030273
 ] 

Sergio Peña commented on HIVE-16771:


The patch looks good. It might need more refactoring, but that looks like a 
bigger task.
+1

> Schematool should use MetastoreSchemaInfo to get the metastore schema version 
> from database
> ---
>
> Key: HIVE-16771
> URL: https://issues.apache.org/jira/browse/HIVE-16771
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, 
> HIVE-16771.03.patch
>
>
> HIVE-16723 gives the ability to have a custom MetastoreSchemaInfo 
> implementation to manage schema upgrades and initialization if needed. In 
> order to make HiveSchemaTool completely agnostic, it should depend on the 
> IMetastoreSchemaInfo implementation that is configured to get the metastore 
> schema version information from the database. It should also not assume and 
> hardcode the scripts directory itself; it should instead ask the 
> MetastoreSchemaInfo class for the metastore scripts directory.





[jira] [Assigned] (HIVE-16793) Scalar sub-query: Scalar safety checks for group-bys

2017-05-30 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-16793:
--


> Scalar sub-query: Scalar safety checks for group-bys
> 
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Vineet Garg
>
> This query has an sq_count check, though it is useless on a constant key.
> {code}
> hive> explain select * from part where p_size > (select max(p_size) from part 
> where p_type = '1' group by p_type);
> Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
> Reducer 3 <- Map 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
> Reducer 6 <- Map 5 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1 vectorized, llap
>   File Output Operator [FS_64]
> Select Operator [SEL_63] (rows= width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   Filter Operator [FIL_62] (rows= width=625)
> predicate:(_col5 > _col10)
> Map Join Operator [MAPJOIN_61] (rows=2 width=625)
>   
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
> <-Reducer 6 [BROADCAST_EDGE] vectorized, llap
>   BROADCAST [RS_58]
> Select Operator [SEL_57] (rows=1 width=4)
>   Output:["_col0"]
>   Group By Operator [GBY_56] (rows=1 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
>   <-Map 5 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_55]
>   PartitionCols:_col0
>   Group By Operator [GBY_54] (rows=86 width=89)
> 
> Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
> Select Operator [SEL_53] (rows=1212121 width=109)
>   Output:["_col1"]
>   Filter Operator [FIL_52] (rows=1212121 width=109)
> predicate:(p_type = '1')
> TableScan [TS_17] (rows=2 width=109)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
> <-Map Join Operator [MAPJOIN_60] (rows=2 width=621)
> 
> Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   <-Reducer 4 [BROADCAST_EDGE] vectorized, llap
> BROADCAST [RS_51]
>   Select Operator [SEL_50] (rows=1 width=8)
> Filter Operator [FIL_49] (rows=1 width=8)
>   predicate:(sq_count_check(_col0) <= 1)
>   Group By Operator [GBY_48] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
>   <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_47]
>   Group By Operator [GBY_46] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Select Operator [SEL_45] (rows=1 width=85)
>   Group By Operator [GBY_44] (rows=1 width=85)
> Output:["_col0"],keys:KEY._col0
>   <-Map 2 [SIMPLE_EDGE] vectorized, llap
> SHUFFLE [RS_43]
>   PartitionCols:_col0
>   Group By Operator [GBY_42] (rows=83 
> width=85)
> Output:["_col0"],keys:'1'
> Select Operator [SEL_41] (rows=1212121 
> width=105)
>   Filter Operator [FIL_40] (rows=1212121 
> width=105)
> predicate:(p_type = '1')
> TableScan [TS_2] (rows=2 
> width=105)
>   
> tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
>   <-Select Operator [SEL_59] (rows=2 width=621)
>   
> Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
>   TableScan [TS_0] 

[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030263#comment-16030263
 ] 

Hive QA commented on HIVE-14990:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870446/HIVE-14990.20.patch

{color:green}SUCCESS:{color} +1 due to 19 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1394 failed/errored test(s), 6921 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_custom_key2]
 (batchId=228)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_custom_key]
 (batchId=228)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=228)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_joins] 
(batchId=228)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=228)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=228)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=228)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[mapjoin2] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[select_dummy_source] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_10] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_11] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_12] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_13] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_16] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_1] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_2] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_3] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=237)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[create_like] 
(batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas_blobstore_to_blobstore]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas_blobstore_to_hdfs]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[ctas_hdfs_to_blobstore]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_blobstore]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_local]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_blobstore_to_warehouse]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_addpartition_local_to_blobstore]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_blobstore]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_blobstore_nonpart]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_local]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_warehouse]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_blobstore_to_warehouse_nonpart]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[import_local_to_blobstore]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_blobstore_to_blobstore]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_empty_into_blobstore]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_dynamic_partitions]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_into_table]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_directory]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_table]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[join2] 
(batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[join] 
(batchId=240)

[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: HIVE-16757.06.patch

Use mq.getRowCount when appropriate

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch, 
> HIVE-16757.06.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> call the deprecated {{getRows()}} in a number of places instead of the new 
> API {{estimateRowCount(RelMetadataQuery mq)}}, which accepts a 
> RelMetadataQuery that in most places we already have handy to pass. Looking 
> at a complex query (49 joins), there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, this 
> results in an explosion of calls. For example, for a query with 49 joins, 
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount is 
> called 6442 times as a top-level call, but the recursion exploded this to 
> 501729 calls. Memoizing the result would stop the recursion early. In my 
> testing this reduced the join-reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.
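To see why per-call cache creation hurts, here is a small illustrative sketch (Python, an analogy only — not Hive/Calcite code): a recursive cost function where every call builds its own cache, mimicking getRows() calling RelMetadataQuery.instance() internally, versus one where the caller threads a single cache through, as estimateRowCount(RelMetadataQuery mq) permits.

```python
calls_fresh = 0
calls_shared = 0

def row_count_fresh(n):
    """Analogue of the deprecated getRows(): each call builds a brand-new
    cache (like RelMetadataQuery.instance()), so it is always empty on
    arrival and nothing is shared across the recursion."""
    global calls_fresh
    calls_fresh += 1
    cache = {}  # fresh per call -- never gets a hit
    if n in cache:
        return cache[n]
    result = 1 if n < 2 else row_count_fresh(n - 1) + row_count_fresh(n - 2)
    cache[n] = result
    return result

def row_count_shared(n, cache):
    """Analogue of estimateRowCount(RelMetadataQuery mq): the caller threads
    one cache through the whole recursion, so subproblems are shared."""
    global calls_shared
    calls_shared += 1
    if n in cache:
        return cache[n]
    result = 1 if n < 2 else row_count_shared(n - 1, cache) + row_count_shared(n - 2, cache)
    cache[n] = result
    return result

fresh_result = row_count_fresh(20)
shared_result = row_count_shared(20, {})
print(fresh_result, "via", calls_fresh, "calls")    # tens of thousands of calls
print(shared_result, "via", calls_shared, "calls")  # a few dozen calls
```

Both functions return the same value; only the number of invocations differs, which is the same effect the 2995340-call figure above reflects at query-planning scale.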



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030205#comment-16030205
 ] 

Hive QA commented on HIVE-16774:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870436/HIVE-16774.01.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10789 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_position] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[masking_10] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[position_alias_test_1] 
(batchId=41)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.org.apache.hadoop.hive.cli.TestNegativeCliDriver
 (batchId=90)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[groupby_invalid_position]
 (batchId=89)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[orderby_position_unsupported]
 (batchId=89)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[groupby_position] 
(batchId=116)
org.apache.hive.jdbc.TestJdbcDriver2.testSelectExecAsync2 (batchId=224)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5478/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5478/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5478/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870436 - PreCommit-HIVE-Build

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16774.01.patch
>
>
> query47.q query57.q





[jira] [Commented] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030194#comment-16030194
 ] 

Sergey Shelukhin commented on HIVE-16774:
-

Also, I like the ParseUtils.parse changes that move the duplicated code to one 
place :)

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16774.01.patch
>
>
> query47.q query57.q





[jira] [Commented] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030190#comment-16030190
 ] 

Sergey Shelukhin commented on HIVE-16774:
-

If this one works (e.g. union_pos_alias test passes) without that one, sure.

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16774.01.patch
>
>
> query47.q query57.q





[jira] [Updated] (HIVE-15051) Test framework integration with findbugs, rat checks etc.

2017-05-30 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-15051:
--
Attachment: HIVE-15051.02.patch

As [~kgyrtkirk] pointed out, the previous version did not compile the original 
version of the branch before starting the test. This caused errors when the 
Maven repository was empty.

This patch addresses the issue by compiling the root before starting the 
patch analysis.

> Test framework integration with findbugs, rat checks etc.
> -
>
> Key: HIVE-15051
> URL: https://issues.apache.org/jira/browse/HIVE-15051
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: beeline.out, HIVE-15051.02.patch, HIVE-15051.patch, 
> Interim.patch, ql.out
>
>
> Find a way to integrate code analysis tools like findbugs, rat checks to 
> PreCommit tests, thus removing the burden from reviewers to check the code 
> style and other checks which could be done by code. 
> Might worth to take a look on Yetus, but keep in mind the Hive has a specific 
> parallel test framework.





[jira] [Commented] (HIVE-16771) Schematool should use MetastoreSchemaInfo to get the metastore schema version from database

2017-05-30 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030171#comment-16030171
 ] 

Sahil Takiar commented on HIVE-16771:
-

Could you create an RB for this?

> Schematool should use MetastoreSchemaInfo to get the metastore schema version 
> from database
> ---
>
> Key: HIVE-16771
> URL: https://issues.apache.org/jira/browse/HIVE-16771
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Minor
> Attachments: HIVE-16771.01.patch, HIVE-16771.02.patch, 
> HIVE-16771.03.patch
>
>
> HIVE-16723 gives the ability to have a custom MetastoreSchemaInfo 
> implementation to manage schema upgrades and initialization if needed. In 
> order to make HiveSchemaTool completely agnostic it should depend on 
> IMetastoreSchemaInfo implementation which is configured to get the metastore 
> schema version information from the database. It should also not assume the 
> scripts directory and hardcode it itself. It would rather ask 
> MetastoreSchemaInfo class to get the metastore scripts directory.





[jira] [Commented] (HIVE-16788) ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK arguments

2017-05-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030168#comment-16030168
 ] 

Ashutosh Chauhan commented on HIVE-16788:
-

+1

> ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK 
> arguments
> --
>
> Key: HIVE-16788
> URL: https://issues.apache.org/jira/browse/HIVE-16788
> Project: Hive
>  Issue Type: Bug
>  Components: ODBC
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16788.patch
>
>
> This ODBC call is meant to allow you to determine FK relationships either 
> from the PK side or from the FK side.
> Hive only allows you to traverse from the FK side; trying it from the PK side 
> leads to an NPE.
> Example using the table "customer" from TPC-H with FKs defined in Hive:
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'HIVE', u'tpch_bin_flat_orc_2', u'nation', u'n_nationkey', u'HIVE', 
> u'tpch_bin_flat_orc_2', u'customer', u'c_nationkey', 1, 0, 0, u'custome
> r_c2', u'nation_c1', 0)
> Not using table as foreign source
> Got an error from the server for customer!
> {code}
> Compare: Postgres
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'vagrant', u'public', u'nation', u'n_nationkey', u'vagrant', u'public', 
> u'customer', u'c_nationkey', 1, 3, 3, u'customer_c_nationkey_fkey', 
> u'nation_pkey', 7)
> Not using table as foreign source
> (u'vagrant', u'public', u'customer', u'c_custkey', u'vagrant', u'public', 
> u'orders', u'o_custkey', 1, 3, 3, u'orders_o_custkey_fkey', u'customer_pkey', 
> 7)
> {code}
> Note that Postgres allows traversal from either way. The traceback you get in 
> the HS2 logs is this:
> {code}
> 2016-12-04T21:08:55,398 ERROR [8998ca98-9940-49f8-8833-7c6ebd8c96a2 
> HiveServer2-Handler-Pool: Thread-53] metastore.RetryingHMSHandler: MetaEx
> ception(message:java.lang.NullPointerException)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5785)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_foreign_keys(HiveMetaStore.java:6474)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy25.get_foreign_keys(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getForeignKeys(HiveMetaStoreClient.java:1596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy26.getForeignKeys(Unknown Source)
> at 
> org.apache.hive.service.cli.operation.GetCrossReferenceOperation.runInternal(GetCrossReferenceOperation.java:128)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getCrossReference(HiveSessionImpl.java:933)
> at 
> org.apache.hive.service.cli.CLIService.getCrossReference(CLIService.java:411)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetCrossReference(ThriftCLIService.java:738)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1617)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1602)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 

[jira] [Commented] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030161#comment-16030161
 ] 

Ashutosh Chauhan commented on HIVE-16757:
-

Left few comments on RB.

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> call the deprecated {{getRows()}} in a number of places instead of the new 
> API {{estimateRowCount(RelMetadataQuery mq)}}, which accepts a 
> RelMetadataQuery that in most places we already have handy to pass. Looking 
> at a complex query (49 joins), there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one busting the current memoization cache.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, this 
> results in an explosion of calls. For example, for a query with 49 joins, 
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount is 
> called 6442 times as a top-level call, but the recursion exploded this to 
> 501729 calls. Memoizing the result would stop the recursion early. In my 
> testing this reduced the join-reordering time for said query from 11s to 
> <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.





[jira] [Updated] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2017-05-30 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14990:
-
Attachment: HIVE-14990.20.patch

> run all tests for MM tables and fix the issues that are found
> -
>
> Key: HIVE-14990
> URL: https://issues.apache.org/jira/browse/HIVE-14990
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Wei Zheng
> Fix For: hive-14535
>
> Attachments: HIVE-14990.01.patch, HIVE-14990.02.patch, 
> HIVE-14990.03.patch, HIVE-14990.04.patch, HIVE-14990.04.patch, 
> HIVE-14990.05.patch, HIVE-14990.05.patch, HIVE-14990.06.patch, 
> HIVE-14990.06.patch, HIVE-14990.07.patch, HIVE-14990.08.patch, 
> HIVE-14990.09.patch, HIVE-14990.10.patch, HIVE-14990.10.patch, 
> HIVE-14990.10.patch, HIVE-14990.12.patch, HIVE-14990.13.patch, 
> HIVE-14990.14.patch, HIVE-14990.15.patch, HIVE-14990.16.patch, 
> HIVE-14990.17.patch, HIVE-14990.18.patch, HIVE-14990.19.patch, 
> HIVE-14990.20.patch, HIVE-14990.patch
>
>
> I am running the tests with isMmTable returning true for most tables (except 
> ACID, temporary tables, views, etc.).
> Many tests will fail because of various expected issues with such an 
> approach; however, we can find issues in MM tables from other failures.
> Expected failures 
> 1) All HCat tests (cannot write MM tables via the HCat writer)
> 2) Almost all merge tests (alter .. concat is not supported).
> 3) Tests that run dfs commands with specific paths (path changes).
> 4) Truncate column (not supported).
> 5) Describe formatted will have the new table fields in the output (before 
> merging MM with ACID).
> 6) Many tests w/explain extended - diff in partition "base file name" (path 
> changes).
> 7) TestTxnCommands - all the conversion tests, as they check for bucket count 
> using file lists (path changes).
> 8) HBase metastore tests, because methods are not implemented.
> 9) Some load and ExIm tests that export a table and then rely on specific 
> path for load (path changes).
> 10) Bucket map join/etc. - diffs; disabled the optimization for MM tables due 
> to how it accounts for buckets
> 11) rand - different results due to different sequence of processing.
> 12) Many (though not all, i.e. not the ones with just one insert) tests that 
> have stats output, such as file counts, for obvious reasons.
> 13) Materialized views, not handled by design - the test check erroneously 
> makes them "mm"; there is no easy way to tell them apart, and I don't want to 
> plumb more stuff through just for this test.
> I'm filing JIRAs for some test failures that are not obvious and need 
> investigation later.





[jira] [Commented] (HIVE-16764) Support numeric as same as decimal

2017-05-30 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030154#comment-16030154
 ] 

Ashutosh Chauhan commented on HIVE-16764:
-

+1

> Support numeric as same as decimal
> --
>
> Key: HIVE-16764
> URL: https://issues.apache.org/jira/browse/HIVE-16764
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16764.01.patch, HIVE-16764.02.patch
>
>
> for example numeric(12,2) -> decimal(12,2) 





[jira] [Updated] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-05-30 Thread Saumil Mayani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saumil Mayani updated HIVE-16791:
-
Summary: Tez engine giving inaccurate results on SMB Map joins while 
map-join and shuffle join gets correct results  (was: Tez engine giving very 
inaccurate results on SMB Map joins while map-join and shuffle join gets 
correct results)

> Tez engine giving inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> --
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt, sample-data.tar.gz-aa, 
> sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}





[jira] [Updated] (HIVE-16791) Tez engine giving very inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-05-30 Thread Saumil Mayani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saumil Mayani updated HIVE-16791:
-
Attachment: sample-data.tar.gz-ad
sample-data.tar.gz-ac
sample-data.tar.gz-ab
sample-data.tar.gz-aa

Merge the files and then extract.
{code}
cat sample-data.tar.gz-a* > sample-data.tar.gz
tar -xzvf sample-data.tar.gz -C /tmp
{code}
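Expanded into a self-contained rehearsal (dummy data standing in for the real attachments; the chunk names mirror the -aa … -ad suffixes produced by split(1), but the contents below are made up):

```shell
# Rehearse the merge-then-extract workflow with throwaway data.
set -e
work=$(mktemp -d)
echo "hello" > "$work/data.txt"
tar -czf "$work/sample-data.tar.gz" -C "$work" data.txt
# Split the archive into chunks named sample-data.tar.gz-aa, -ab, ...
split -b 1k -a 2 "$work/sample-data.tar.gz" "$work/sample-data.tar.gz-"
rm "$work/sample-data.tar.gz"
# Merge the chunks back in order, then extract:
cat "$work"/sample-data.tar.gz-a* > "$work/sample-data.tar.gz"
mkdir "$work/out"
tar -xzf "$work/sample-data.tar.gz" -C "$work/out"
cat "$work/out/data.txt"   # prints: hello
```

The glob expands the chunks in lexical order (-aa, -ab, …), which is exactly the order split(1) wrote them in, so the concatenated file is byte-identical to the original archive.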

> Tez engine giving very inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> ---
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt, sample-data.tar.gz-aa, 
> sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}





[jira] [Assigned] (HIVE-16775) Augment ASTConverter for TPCDS queries

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16775:
--

Assignee: Pengcheng Xiong

> Augment ASTConverter for TPCDS queries
> --
>
> Key: HIVE-16775
> URL: https://issues.apache.org/jira/browse/HIVE-16775
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> query4.q,query74.q
> {code}
> [7e490527-156a-48c7-aa87-8c80093cdfa8 main] ql.Driver: FAILED: 
> NullPointerException null
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter$QBVisitor.visit(ASTConverter.java:457)
> at org.apache.calcite.rel.RelVisitor.go(RelVisitor.java:61)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:110)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convertSource(ASTConverter.java:393)
> at 
> org.apache.hadoop.hive.ql.optimizer.calcite.translator.ASTConverter.convert(ASTConverter.java:115)
> {code}





[jira] [Commented] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-30 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030113#comment-16030113
 ] 

Pengcheng Xiong commented on HIVE-16774:


[~sershe], do you think we should revert HIVE-15938? Thanks.

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16774.01.patch
>
>
> query47.q query57.q





[jira] [Updated] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16774:
---
Attachment: HIVE-16774.01.patch

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16774.01.patch
>
>
> query47.q query57.q





[jira] [Updated] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16774:
---
Status: Patch Available  (was: Open)

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-16774.01.patch
>
>
> query47.q query57.q





[jira] [Assigned] (HIVE-16774) Support position in ORDER BY when using SELECT *

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong reassigned HIVE-16774:
--

Assignee: Pengcheng Xiong

> Support position in ORDER BY when using SELECT *
> 
>
> Key: HIVE-16774
> URL: https://issues.apache.org/jira/browse/HIVE-16774
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
>
> query47.q query57.q





[jira] [Updated] (HIVE-16791) Tez engine giving very inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-05-30 Thread Saumil Mayani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saumil Mayani updated HIVE-16791:
-
Attachment: sample-data-query.txt

Steps to reproduce the issue.

> Tez engine giving very inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> ---
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}





[jira] [Comment Edited] (HIVE-16791) Tez engine giving very inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-05-30 Thread Saumil Mayani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030099#comment-16030099
 ] 

Saumil Mayani edited comment on HIVE-16791 at 5/30/17 8:51 PM:
---

Attached are the steps to reproduce the issue.


was (Author: smayani):
Steps to reproduce the issue.

> Tez engine giving very inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> ---
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
> Attachments: sample-data-query.txt
>
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}





[jira] [Updated] (HIVE-16791) Tez engine giving very inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-05-30 Thread Saumil Mayani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saumil Mayani updated HIVE-16791:
-
Description: 
SMB Join gives incorrect results. 
{code}
SMB-Join
set hive.execution.engine=tez;
set hive.enforce.sortmergebucketmapjoin=false;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=50;

OK
2016  1   11999639
2016  2   18955110
2017  2   22217437
Time taken: 92.647 seconds, Fetched: 3 row(s)
{code}
{code}
MAP-JOIN
set hive.execution.engine=tez;
set hive.enforce.sortmergebucketmapjoin=false;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=5000;
OK
2016  1   26586093
2016  2   17724062
2017  2   8862031
Time taken: 17.49 seconds, Fetched: 3 row(s)
{code}
{code}
Shuffle Join
set hive.execution.engine=tez;
set hive.enforce.sortmergebucketmapjoin=false;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=false;
set hive.auto.convert.join=false;
set hive.auto.convert.join.noconditionaltask.size=5000;

OK
2016  1   26586093
2016  2   17724062
2017  2   8862031
Time taken: 38.575 seconds, Fetched: 3 row(s)
{code}

  was:
SMB Join gives incorrect results.
{code}
SMB-Join
set hive.execution.engine=tez;
set hive.enforce.sortmergebucketmapjoin=false;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=50;

OK
2016  1   11999639
2016  2   18955110
2017  2   22217437
Time taken: 92.647 seconds, Fetched: 3 row(s)
{code}
{code}
MAP-JOIN
set hive.execution.engine=tez;
set hive.enforce.sortmergebucketmapjoin=false;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=true;
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask.size=5000;
OK
2016  1   26586093
2016  2   17724062
2017  2   8862031
Time taken: 17.49 seconds, Fetched: 3 row(s)
{code}
{code}
Shuffle Join
set hive.execution.engine=tez;
set hive.enforce.sortmergebucketmapjoin=false;
set hive.optimize.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.auto.convert.sortmerge.join=false;
set hive.auto.convert.join=false;
set hive.auto.convert.join.noconditionaltask.size=5000;

OK
2016  1   26586093
2016  2   17724062
2017  2   8862031
Time taken: 38.575 seconds, Fetched: 3 row(s)
{code}


> Tez engine giving very inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> ---
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
>
> SMB Join gives incorrect results. 
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}




[jira] [Commented] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030093#comment-16030093
 ] 

Hive QA commented on HIVE-16757:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870425/HIVE-16757.05.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10791 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5477/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5477/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5477/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870425 - PreCommit-HIVE-Build

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we call the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which 
> accepts a RelMetadataQuery that in most places we already have handy to 
> pass. Looking at a complex query (49 joins), there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one discarding the current memoization 
> cache.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, 
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount 
> gets called 6442 times as a top-level call, but the recursion exploded this 
> to 501729 calls. Memoization of the result would stop the recursion early. 
> In my testing this reduced the join reordering time for said query from 
> 11s to <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.





[jira] [Assigned] (HIVE-16791) Tez engine giving very inaccurate results on SMB Map joins while map-join and shuffle join gets correct results

2017-05-30 Thread Saumil Mayani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saumil Mayani reassigned HIVE-16791:


Assignee: Deepak Jaiswal

> Tez engine giving very inaccurate results on SMB Map joins while map-join and 
> shuffle join gets correct results
> ---
>
> Key: HIVE-16791
> URL: https://issues.apache.org/jira/browse/HIVE-16791
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, HiveServer2
>Reporter: Saumil Mayani
>Assignee: Deepak Jaiswal
>
> SMB Join gives incorrect results.
> {code}
> SMB-Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=50;
> OK
> 2016  1   11999639
> 2016  2   18955110
> 2017  2   22217437
> Time taken: 92.647 seconds, Fetched: 3 row(s)
> {code}
> {code}
> MAP-JOIN
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 17.49 seconds, Fetched: 3 row(s)
> {code}
> {code}
> Shuffle Join
> set hive.execution.engine=tez;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=false;
> set hive.auto.convert.join=false;
> set hive.auto.convert.join.noconditionaltask.size=5000;
> OK
> 2016  1   26586093
> 2016  2   17724062
> 2017  2   8862031
> Time taken: 38.575 seconds, Fetched: 3 row(s)
> {code}





[jira] [Commented] (HIVE-16233) llap: Query failed with AllocatorOutOfMemoryException

2017-05-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030010#comment-16030010
 ] 

Sergey Shelukhin commented on HIVE-16233:
-

[~prasanth_j] [~gopalv] ping?

> llap: Query failed with AllocatorOutOfMemoryException
> -
>
> Key: HIVE-16233
> URL: https://issues.apache.org/jira/browse/HIVE-16233
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Siddharth Seth
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16233.01.patch, HIVE-16233.02.patch, 
> HIVE-16233.03.patch, HIVE-16233.04.patch, HIVE-16233.05.patch, 
> HIVE-16233.06.patch, HIVE-16233.06.patch
>
>
> {code}
> TaskAttempt 5 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1488231257387_2288_25_05_56_5:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
> at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at 
> org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:74)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
> ... 15 more
> Caused by: java.io.IOException: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
> at 
> org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:62)
> ... 17 more
> Caused by: java.io.IOException: 
> org.apache.hadoop.hive.common.io.Allocator$AllocatorOutOfMemoryException: 
> Failed to allocate 262144; at 0 out of 1
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:425)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:413)
> at 
> 

[jira] [Commented] (HIVE-16151) BytesBytesHashTable allocates large arrays

2017-05-30 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16030009#comment-16030009
 ] 

Sergey Shelukhin commented on HIVE-16151:
-

[~gopalv] ping?

> BytesBytesHashTable allocates large arrays
> --
>
> Key: HIVE-16151
> URL: https://issues.apache.org/jira/browse/HIVE-16151
> Project: Hive
>  Issue Type: Bug
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-16151.01.patch, HIVE-16151.patch
>
>
> These arrays cause GC pressure and also impose key-count limitations on the 
> table. Regarding the latter, we won't be able to get rid of it without a 
> 64-bit hash function, but for now we can get rid of the former. If we need 
> the latter we'd add murmur64 and probably account for it differently for 
> resize (we don't want to blow up the hashtable by 4 bytes/key in the common 
> case where the # of keys is less than ~1.5B :))
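One way to remove the single large backing array, sketched here as a minimal illustration (the class name, chunk size, and layout below are invented for this sketch, not Hive's actual BytesBytesMultiHashMap internals): back the slot table with fixed-size chunks addressed by the high and low bits of the slot index, so no single allocation grows with the key count.

```java
// Hedged sketch: a long-indexed array built from modest fixed-size
// chunks, avoiding one humongous allocation that stresses the GC.
public class ChunkedLongArray {
    private static final int CHUNK_BITS = 20;            // 1M slots per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;
    private final long[][] chunks;

    public ChunkedLongArray(long slots) {
        // Round up to a whole number of chunks.
        int n = (int) ((slots + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        chunks = new long[n][];
        for (int i = 0; i < n; i++) {
            chunks[i] = new long[CHUNK_SIZE]; // each allocation stays small
        }
    }

    public long get(long idx) {
        return chunks[(int) (idx >>> CHUNK_BITS)][(int) (idx & CHUNK_MASK)];
    }

    public void set(long idx, long v) {
        chunks[(int) (idx >>> CHUNK_BITS)][(int) (idx & CHUNK_MASK)] = v;
    }
}
```

The trade-off is one extra array dereference per access in exchange for allocations the collector can place in normal regions.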





[jira] [Updated] (HIVE-16224) modify off-heap ORC metadata cache after ORC changes

2017-05-30 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-16224:

Description: 
Right now, the patch copy-pastes some code and also makes unnecessary copies. 
When released, we should use:
ORC-158
ORC-159
ORC-160
ORC-197

  was:
Right now, the patch copy-pastes some code and also makes unnecessary copies. 
When released, we should use:
ORC-158
ORC-159
ORC-160



> modify off-heap ORC metadata cache after ORC changes
> 
>
> Key: HIVE-16224
> URL: https://issues.apache.org/jira/browse/HIVE-16224
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>
> Right now, the patch copy-pastes some code and also makes unnecessary copies. 
> When released, we should use:
> ORC-158
> ORC-159
> ORC-160
> ORC-197





[jira] [Updated] (HIVE-16757) Use of deprecated getRows() instead of new estimateRowCount(RelMetadataQuery..) has serious performance impact

2017-05-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu updated HIVE-16757:

Attachment: HIVE-16757.05.patch

Patch 05 fixes an omitted instance() call in FilterSelectivityEstimator.getMaxNDV.

> Use of deprecated getRows() instead of new 
> estimateRowCount(RelMetadataQuery..) has serious performance impact
> --
>
> Key: HIVE-16757
> URL: https://issues.apache.org/jira/browse/HIVE-16757
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16757.01.patch, HIVE-16757.02.patch, 
> HIVE-16757.03.patch, HIVE-16757.04.patch, HIVE-16757.05.patch
>
>
> Calling Calcite's {{RelMetadataQuery.instance()}} is very expensive because 
> it places a new memoization cache on the stack. Hidden in the deprecated 
> {{AbstractRelNode.getRows()}} call is a call to {{instance()}}. In Hive we 
> have a number of places where we call the deprecated {{getRows()}} 
> instead of the new API {{estimateRowCount(RelMetadataQuery mq)}}, which 
> accepts a RelMetadataQuery that in most places we already have handy to 
> pass. Looking at a complex query (49 joins), there are 2995340 calls to 
> {{AbstractRelNode.getRows}}, each one discarding the current memoization 
> cache.
> Was: -On complex queries HiveRelMdRowCount.getRowCount can get called many 
> times. Since it does not memoize its result and the call is recursive, it 
> results in an explosion of calls. For example, for a query with 49 joins, 
> during join ordering (LoptOptimizeJoinRule) HiveRelMdRowCount.getRowCount 
> gets called 6442 times as a top-level call, but the recursion exploded this 
> to 501729 calls. Memoization of the result would stop the recursion early. 
> In my testing this reduced the join reordering time for said query from 
> 11s to <1s.-
> Note there is no need for {{HiveRelMdRowCount}} memoization because the 
> function is called in stacks similar to this:
> {code}
>   at 
> org.apache.hadoop.hive.ql.optimizer.calcite.stats.HiveRelMdRowCount.getRowCount(HiveRelMdRowCount.java:66)
>   at GeneratedMetadataHandler_RowCount.getRowCount_$
>   at GeneratedMetadataHandler_RowCount.getRowCount
>   at 
> org.apache.calcite.rel.metadata.RelMetadataQuery.getRowCount(RelMetadataQuery.java:204)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.swapInputs(LoptOptimizeJoinRule.java:1865)
>   at 
> org.apache.calcite.rel.rules.LoptOptimizeJoinRule.createJoinSubtree(LoptOptimizeJoinRule.java:1739)
> {code}
> and {{GeneratedMetadataHandler_RowCount.getRowCount}} handles memoization.
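The caching behavior described above can be illustrated with a toy model. Nothing below is the real Calcite API: the class, the synthetic plan tree, and the counters are invented purely to show why a fresh memoization cache per top-level call (the deprecated-getRows() pattern) repeats work that a single shared RelMetadataQuery would memoize.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: node i of a complete binary "plan tree" has children
// 2i+1 and 2i+2. rowCount() memoizes per-node results in the cache
// it is handed, mimicking how estimateRowCount(mq) reuses one
// metadata-query cache across a whole traversal.
public class RowCountMemoDemo {
    static int calls = 0;

    static long rowCount(int node, int depth, Map<Integer, Long> cache) {
        Long hit = cache.get(node);
        if (hit != null) return hit;
        calls++;
        long rows;
        if (depth == 0) {
            rows = 1;                                  // leaf: base estimate
        } else {
            // Toy join estimate: product of child estimates.
            rows = rowCount(2 * node + 1, depth - 1, cache)
                 * rowCount(2 * node + 2, depth - 1, cache);
        }
        cache.put(node, rows);
        return rows;
    }

    public static void main(String[] args) {
        int depth = 10;
        // Deprecated-getRows() style: fresh cache on every top-level call.
        calls = 0;
        for (int i = 0; i < 100; i++) {
            rowCount(0, depth, new HashMap<>());
        }
        int freshCacheCalls = calls;

        // estimateRowCount(mq) style: one cache shared across calls.
        calls = 0;
        Map<Integer, Long> shared = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            rowCount(0, depth, shared);
        }
        System.out.println(freshCacheCalls + " vs " + calls);
    }
}
```

With a fresh cache, every top-level call revisits the whole tree; with a shared cache, only the first call does any work.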





[jira] [Assigned] (HIVE-16790) HiveCostModel.JoinAlgorithm and derivatives abuse RelMetadataQuery.instance

2017-05-30 Thread Remus Rusanu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Remus Rusanu reassigned HIVE-16790:
---


> HiveCostModel.JoinAlgorithm and derivatives abuse RelMetadataQuery.instance
> ---
>
> Key: HIVE-16790
> URL: https://issues.apache.org/jira/browse/HIVE-16790
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
>
> Calling {{RelMetadataQuery.instance()}} has serious performance implications, 
> as it invalidates the memoization cache used in Calcite; see HIVE-16757.
> {{HiveCostModel.JoinAlgorithm}} and the derived classes abuse this call, 
> sometimes multiple times per function. All methods in JoinAlgorithm that need 
> a RelMetadataQuery should accept it as an argument, not build a new instance.
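The proposed refactor is purely structural: thread one metadata-query object through the call chain instead of constructing a fresh one inside each method. The sketch below uses stand-in types (MetadataQuery, SortMergeJoin, and the toy cost formula are all invented for illustration, not the real HiveCostModel or Calcite classes):

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for RelMetadataQuery: memoizes per-relation estimates.
class MetadataQuery {
    final Map<String, Double> memo = new HashMap<>();
    double rowCount(String rel) {
        // Toy estimate keyed by relation name; real code queries stats.
        return memo.computeIfAbsent(rel, r -> (double) r.length());
    }
}

interface JoinAlgorithm {
    // Before: each method did `MetadataQuery mq = new MetadataQuery();`
    // internally, throwing away memoized results between methods.
    // After: the caller supplies mq, so the cache survives the whole
    // cost computation.
    double cost(String leftRel, String rightRel, MetadataQuery mq);
}

class SortMergeJoin implements JoinAlgorithm {
    @Override
    public double cost(String leftRel, String rightRel, MetadataQuery mq) {
        double l = mq.rowCount(leftRel);
        double r = mq.rowCount(rightRel);
        // Toy n·log n style cost; the real model is far more involved.
        return l * Math.log(l + 1) + r * Math.log(r + 1);
    }
}
```

The caller that already holds a MetadataQuery (the optimizer rule, in the real code) passes it down, and repeated row-count lookups hit the memo instead of recomputing.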





[jira] [Commented] (HIVE-16788) ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK arguments

2017-05-30 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029976#comment-16029976
 ] 

Jesus Camacho Rodriguez commented on HIVE-16788:


[~ashutoshc], this is a small fix, could you take a look? Thanks

> ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK 
> arguments
> --
>
> Key: HIVE-16788
> URL: https://issues.apache.org/jira/browse/HIVE-16788
> Project: Hive
>  Issue Type: Bug
>  Components: ODBC
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16788.patch
>
>
> This ODBC call is meant to allow you to determine FK relationships either 
> from the PK side or from the FK side.
> Hive only allows you to traverse from the FK side; trying it from the PK 
> side leads to an NPE.
> Example using the table "customer" from TPC-H with FKs defined in Hive:
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'HIVE', u'tpch_bin_flat_orc_2', u'nation', u'n_nationkey', u'HIVE', 
> u'tpch_bin_flat_orc_2', u'customer', u'c_nationkey', 1, 0, 0, u'custome
> r_c2', u'nation_c1', 0)
> Not using table as foreign source
> Got an error from the server for customer!
> {code}
> Compare: Postgres
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'vagrant', u'public', u'nation', u'n_nationkey', u'vagrant', u'public', 
> u'customer', u'c_nationkey', 1, 3, 3, u'customer_c_nationkey_fkey', 
> u'nation_pkey', 7)
> Not using table as foreign source
> (u'vagrant', u'public', u'customer', u'c_custkey', u'vagrant', u'public', 
> u'orders', u'o_custkey', 1, 3, 3, u'orders_o_custkey_fkey', u'customer_pkey', 
> 7)
> {code}
> Note that Postgres allows traversal in either direction. The traceback you get in 
> the HS2 logs is this:
> {code}
> 2016-12-04T21:08:55,398 ERROR [8998ca98-9940-49f8-8833-7c6ebd8c96a2 
> HiveServer2-Handler-Pool: Thread-53] metastore.RetryingHMSHandler: MetaEx
> ception(message:java.lang.NullPointerException)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5785)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_foreign_keys(HiveMetaStore.java:6474)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy25.get_foreign_keys(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getForeignKeys(HiveMetaStoreClient.java:1596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy26.getForeignKeys(Unknown Source)
> at 
> org.apache.hive.service.cli.operation.GetCrossReferenceOperation.runInternal(GetCrossReferenceOperation.java:128)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getCrossReference(HiveSessionImpl.java:933)
> at 
> org.apache.hive.service.cli.CLIService.getCrossReference(CLIService.java:411)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetCrossReference(ThriftCLIService.java:738)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1617)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1602)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> 

[jira] [Commented] (HIVE-16788) ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK arguments

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029955#comment-16029955
 ] 

Hive QA commented on HIVE-16788:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870413/HIVE-16788.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10790 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=237)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener.org.apache.hadoop.hive.ql.security.TestMultiAuthorizationPreEventListener
 (batchId=219)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5476/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5476/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5476/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870413 - PreCommit-HIVE-Build

> ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK 
> arguments
> --
>
> Key: HIVE-16788
> URL: https://issues.apache.org/jira/browse/HIVE-16788
> Project: Hive
>  Issue Type: Bug
>  Components: ODBC
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16788.patch
>
>
> This ODBC call is meant to allow you to determine FK relationships either 
> from the PK side or from the FK side.
> Hive only allows you to traverse from the FK side; trying it from the PK 
> side leads to an NPE.
> Example using the table "customer" from TPC-H with FKs defined in Hive:
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'HIVE', u'tpch_bin_flat_orc_2', u'nation', u'n_nationkey', u'HIVE', 
> u'tpch_bin_flat_orc_2', u'customer', u'c_nationkey', 1, 0, 0, u'customer_c2', u'nation_c1', 0)
> Not using table as foreign source
> Got an error from the server for customer!
> {code}
> Compare: Postgres
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'vagrant', u'public', u'nation', u'n_nationkey', u'vagrant', u'public', 
> u'customer', u'c_nationkey', 1, 3, 3, u'customer_c_nationkey_fkey', 
> u'nation_pkey', 7)
> Not using table as foreign source
> (u'vagrant', u'public', u'customer', u'c_custkey', u'vagrant', u'public', 
> u'orders', u'o_custkey', 1, 3, 3, u'orders_o_custkey_fkey', u'customer_pkey', 
> 7)
> {code}
> Note that Postgres allows traversal in either direction. The traceback you get in 
> the HS2 logs is this:
> {code}
> 2016-12-04T21:08:55,398 ERROR [8998ca98-9940-49f8-8833-7c6ebd8c96a2 
> HiveServer2-Handler-Pool: Thread-53] metastore.RetryingHMSHandler: MetaException(message:java.lang.NullPointerException)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5785)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_foreign_keys(HiveMetaStore.java:6474)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy25.get_foreign_keys(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getForeignKeys(HiveMetaStoreClient.java:1596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at 

[jira] [Commented] (HIVE-16665) Race condition in Utilities.GetInputPathsCallable --> createDummyFileForEmptyPartition

2017-05-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029906#comment-16029906
 ] 

Sergio Peña commented on HIVE-16665:


[~stakiar] I committed this to master. Should we add this to branch-2? If so, 
note that cherry-picking the patch onto that branch fails; otherwise, please 
close the jira.

> Race condition in Utilities.GetInputPathsCallable --> 
> createDummyFileForEmptyPartition
> --
>
> Key: HIVE-16665
> URL: https://issues.apache.org/jira/browse/HIVE-16665
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-16665.1.patch, HIVE-16665.2.patch, 
> HIVE-16665.3.patch
>
>
> Looks like there is a race condition in the {{GetInputPathsCallable}} thread 
> when modifying the input {{MapWork}} object.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
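The race the issue describes, several GetInputPathsCallable threads mutating one shared MapWork, is the classic unsynchronized-shared-state hazard. A generic sketch under assumed, hypothetical class names (not Hive's actual code) of guarding such shared mutation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

// Hypothetical sketch, not Hive's MapWork: several callables that mutate one
// shared work object must synchronize, otherwise concurrent updates race.
public class SharedWorkDemo {
    static class Work {
        private final List<String> paths = new ArrayList<>();
        // Synchronizing the mutator makes concurrent additions safe.
        synchronized void addPath(String p) { paths.add(p); }
        synchronized int size() { return paths.size(); }
    }

    public static int run() {
        Work work = new Work();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            futures.add(pool.submit(() -> {
                for (int j = 0; j < 1000; j++) {
                    work.addPath("dummy-" + id + "-" + j);
                }
            }));
        }
        try {
            for (Future<?> f : futures) f.get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        pool.shutdown();
        // With synchronization the count is deterministic; without it, adds
        // to the unguarded ArrayList could be lost or corrupt the list.
        return work.size();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```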


[jira] [Updated] (HIVE-16665) Race condition in Utilities.GetInputPathsCallable --> createDummyFileForEmptyPartition

2017-05-30 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-16665:
---
Fix Version/s: 3.0.0

> Race condition in Utilities.GetInputPathsCallable --> 
> createDummyFileForEmptyPartition
> --
>
> Key: HIVE-16665
> URL: https://issues.apache.org/jira/browse/HIVE-16665
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-16665.1.patch, HIVE-16665.2.patch, 
> HIVE-16665.3.patch
>
>
> Looks like there is a race condition in the {{GetInputPathsCallable}} thread 
> when modifying the input {{MapWork}} object.





[jira] [Updated] (HIVE-16789) Query with window function fails when where clause returns no records

2017-05-30 Thread Narayana (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narayana updated HIVE-16789:

Description: 
When the outer where clause finds no matching records, queries like the 
following fail:

{code}
select 
a,b,c,d,e,
min(c) over (partition by a,b order by d rows between 1 preceding and current 
row) prev_c
from
(
select '1234' a ,'abc' b,10 c,2 d,'test1' e
union all 
select '1234' a ,'abc' b,9 c,1 d,'test2' e
union all
select '1234' a ,'abc' b,11 c,3 d,'test2' e
union all  
select '1234' a ,'abcd' b,1 c,5 d,'test2' e
union all 
select '1234' a ,'abcd' b,6 c,9 d,'test1' e
)X
where e='test3'
;
{code}

Error:

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: 
null
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.util.NoSuchElementException
at java.util.ArrayDeque.getFirst(Unknown Source)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$MaxStreamingFixedWindow.terminate(GenericUDAFMax.java:280)
at 
org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
... 7 more

  was:
When the outer where clause finds no matching records, queries like the 
following fail:

{code}
select 
a,b,c,d,e,
min(case when e='test1' then null else c end) over (partition by a,b order by d 
rows between 1 preceding and current row) prev_c
from
(
select '1234' a ,'abc' b,10 c,2 d,'test1' e
union all 
select '1234' a ,'abc' b,9 c,1 d,'test2' e
union all
select '1234' a ,'abc' b,11 c,3 d,'test2' e
union all  
select '1234' a ,'abcd' b,1 c,5 d,'test2' e
union all 
select '1234' a ,'abcd' b,6 c,9 d,'test1' e
)X
where e='test3'
;
{code}

Error:

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: 
null
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.util.NoSuchElementException
at java.util.ArrayDeque.getFirst(Unknown Source)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$MaxStreamingFixedWindow.terminate(GenericUDAFMax.java:280)
at 
org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
... 7 more
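The NoSuchElementException in the trace comes from ArrayDeque.getFirst() on an empty window: when the where clause removes every row, the PTF still calls terminate() on an empty partition. A minimal sketch of a streaming max that tolerates the empty case; this is a hypothetical illustration of the monotonic-deque pattern, not the actual GenericUDAFMax code:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the real MaxStreamingFixedWindow: a running max
// kept in a monotonically decreasing deque, whose terminate() must tolerate
// an empty partition instead of calling getFirst() unconditionally.
public class StreamingMax {
    private final Deque<Integer> window = new ArrayDeque<>();

    public void accept(int value) {
        // Drop trailing entries smaller than the new value so the head of
        // the deque is always the current maximum.
        while (!window.isEmpty() && window.peekLast() < value) {
            window.removeLast();
        }
        window.addLast(value);
    }

    // Returning null for an empty partition avoids the
    // NoSuchElementException that ArrayDeque.getFirst() would throw.
    public Integer terminate() {
        return window.isEmpty() ? null : window.getFirst();
    }

    public static void main(String[] args) {
        StreamingMax m = new StreamingMax();
        System.out.println(m.terminate()); // null on an empty window, no exception
        m.accept(3);
        m.accept(7);
        m.accept(5);
        System.out.println(m.terminate()); // 7
    }
}
```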


> Query with window function fails when where clause returns no records
> -
>
> Key: HIVE-16789
> URL: https://issues.apache.org/jira/browse/HIVE-16789
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.1, 1.2.1
> Environment: OS: CentOS release 6.6 (Final)
> HDP: 2.7.3
> Hive: 1.0.1,1.2.1
>Reporter: Narayana
>
> When the outer where clause finds no matching records, queries like the 
> following fail:
> {code}
> select 
> a,b,c,d,e,
> min(c) over (partition by a,b order by d rows between 1 preceding and current 
> row) prev_c
> from
> (
> select '1234' a ,'abc' 

[jira] [Updated] (HIVE-16789) Query with window function fails when where clause returns no records

2017-05-30 Thread Narayana (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Narayana updated HIVE-16789:

Description: 
When the outer where clause finds no matching records, queries like the 
following fail:

{code}
select 
a,b,c,d,e,
min(case when e='test1' then null else c end) over (partition by a,b order by d 
rows between 1 preceding and current row) prev_c
from
(
select '1234' a ,'abc' b,10 c,2 d,'test1' e
union all 
select '1234' a ,'abc' b,9 c,1 d,'test2' e
union all
select '1234' a ,'abc' b,11 c,3 d,'test2' e
union all  
select '1234' a ,'abcd' b,1 c,5 d,'test2' e
union all 
select '1234' a ,'abcd' b,6 c,9 d,'test1' e
)X
where e='test3'
;
{code}

Error:

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: 
null
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.util.NoSuchElementException
at java.util.ArrayDeque.getFirst(Unknown Source)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$MaxStreamingFixedWindow.terminate(GenericUDAFMax.java:280)
at 
org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
... 7 more

  was:
When the outer where clause finds no matching records, queries like the 
following fail:

{code}
select 
a,b,c,d,e,
min(case when e='test1' then null else c end) over (partition by a,b order by d 
rows between 1 preceding and current row) prev_c
from
(
select '1234' a ,'abc' b,10 c,2 d,'test1' e
union all 
select '1234' a ,'abc' b,9 c,1 d,'test2' e
union all
select '1234' a ,'abc' b,11 c,3 d,'test2' e
union all  
select '1234' a ,'abcd' b,1 c,5 d,'test2' e
union all 
select '1234' a ,'abcd' b,6 c,9 d,'test1' e
)X
where e='test3'
;
{code}

Error:

Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: 
null
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:295)
at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Unknown Source)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.util.NoSuchElementException
at java.util.ArrayDeque.getFirst(Unknown Source)
at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMax$MaxStreamingFixedWindow.terminate(GenericUDAFMax.java:280)
at 
org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.finishPartition(WindowingTableFunction.java:413)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:337)
at 
org.apache.hadoop.hive.ql.exec.PTFOperator.closeOp(PTFOperator.java:95)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
... 7 more


> Query with window function fails when where clause returns no records
> -
>
> Key: HIVE-16789
> URL: https://issues.apache.org/jira/browse/HIVE-16789
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 1.0.1, 1.2.1
> Environment: OS: CentOS release 6.6 (Final)
> HDP: 2.7.3
> Hive: 1.0.1,1.2.1
>Reporter: Narayana
>
> When the outer where clause finds no matching records, queries like the 
> following fail:
> {code}
> select 
> a,b,c,d,e,
> min(case when e='test1' then null else c end) over (partition by a,b order by 
> d 

[jira] [Commented] (HIVE-16665) Race condition in Utilities.GetInputPathsCallable --> createDummyFileForEmptyPartition

2017-05-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029892#comment-16029892
 ] 

Sergio Peña commented on HIVE-16665:


LGTM +1

> Race condition in Utilities.GetInputPathsCallable --> 
> createDummyFileForEmptyPartition
> --
>
> Key: HIVE-16665
> URL: https://issues.apache.org/jira/browse/HIVE-16665
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16665.1.patch, HIVE-16665.2.patch, 
> HIVE-16665.3.patch
>
>
> Looks like there is a race condition in the {{GetInputPathsCallable}} thread 
> when modifying the input {{MapWork}} object.





[jira] [Commented] (HIVE-16776) Strange cast behavior for table backed by druid

2017-05-30 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029886#comment-16029886
 ] 

Jesus Camacho Rodriguez commented on HIVE-16776:


Failures are unrelated. [~ashutoshc], could you take a look? Thanks

> Strange cast behavior for table backed by druid
> ---
>
> Key: HIVE-16776
> URL: https://issues.apache.org/jira/browse/HIVE-16776
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16776.patch
>
>
> The following query 
> {code} 
> explain select SUBSTRING(`Calcs`.`str0`,CAST(`Calcs`.`int2` AS int), 3) from 
> `druid_tableau`.`calcs` `Calcs`;
> OK
> Plan not optimized by CBO. 
> {code}
> fails CBO with the following exception:
> {code} org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Wrong 
> arguments '3': No matching method for class 
> org.apache.hadoop.hive.ql.udf.UDFSubstr with (string, bigint, int). Possible
> choices: _FUNC_(binary, int)  _FUNC_(binary, int, int)  _FUNC_(string, 
> int)  _FUNC_(string, int, int)
> at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1355)
>  ~[hive-exec-2.1.0.2.6.0.2-SNAPSHOT.jar:2.1.0.2.6.0.2-SNAPSHOT]{code}



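The mismatch reported above, (string, bigint, int) against overloads that only take int positions, is ordinary strict overload matching: a wider integer type is not accepted where an int is required without an explicit narrowing. A small Java analogue (hypothetical helper, not Hive's UDFSubstr implementation) of the same rule:

```java
// Hypothetical illustration, not Hive's UDF matcher: a method with only
// int-typed position parameters rejects a long argument unless it is
// explicitly narrowed, mirroring the (string, bigint, int) mismatch.
public class SubstrDemo {
    public static String substr(String s, int start, int len) {
        int from = start - 1; // SQL substring positions are 1-based
        return s.substring(from, Math.min(from + len, s.length()));
    }

    public static void main(String[] args) {
        long pos = 2L;                       // plays the role of the bigint
        // substr("abcdef", pos, 3);         // would not compile: long != int
        String out = substr("abcdef", (int) pos, 3); // explicit narrowing cast
        System.out.println(out); // bcd
    }
}
```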


[jira] [Updated] (HIVE-16765) ParquetFileReader should be closed to avoid resource leak

2017-05-30 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-16765:

Affects Version/s: 2.3.0

> ParquetFileReader should be closed to avoid resource leak
> -
>
> Key: HIVE-16765
> URL: https://issues.apache.org/jira/browse/HIVE-16765
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.3.0, 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16765.001.patch, HIVE-16765-branch-2.3.patch
>
>
> ParquetFileReader should be closed to avoid resource leak





[jira] [Updated] (HIVE-16765) ParquetFileReader should be closed to avoid resource leak

2017-05-30 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-16765:

   Resolution: Fixed
Fix Version/s: 2.3.0
   Status: Resolved  (was: Patch Available)

Ptest passed locally. Pushed to branch-2.3.

> ParquetFileReader should be closed to avoid resource leak
> -
>
> Key: HIVE-16765
> URL: https://issues.apache.org/jira/browse/HIVE-16765
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Fix For: 2.3.0, 3.0.0
>
> Attachments: HIVE-16765.001.patch, HIVE-16765-branch-2.3.patch
>
>
> ParquetFileReader should be closed to avoid resource leak





[jira] [Work started] (HIVE-16788) ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK arguments

2017-05-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-16788 started by Jesus Camacho Rodriguez.
--
> ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK 
> arguments
> --
>
> Key: HIVE-16788
> URL: https://issues.apache.org/jira/browse/HIVE-16788
> Project: Hive
>  Issue Type: Bug
>  Components: ODBC
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16788.patch
>
>
> This ODBC call is meant to allow you to determine FK relationships either 
> from the PK side or from the FK side.
> Hive only allows you to traverse from the FK side; trying it from the PK side 
> leads to an NPE.
> Example using the table "customer" from TPC-H with FKs defined in Hive:
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'HIVE', u'tpch_bin_flat_orc_2', u'nation', u'n_nationkey', u'HIVE', 
> u'tpch_bin_flat_orc_2', u'customer', u'c_nationkey', 1, 0, 0, u'customer_c2', u'nation_c1', 0)
> Not using table as foreign source
> Got an error from the server for customer!
> {code}
> Compare: Postgres
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'vagrant', u'public', u'nation', u'n_nationkey', u'vagrant', u'public', 
> u'customer', u'c_nationkey', 1, 3, 3, u'customer_c_nationkey_fkey', 
> u'nation_pkey', 7)
> Not using table as foreign source
> (u'vagrant', u'public', u'customer', u'c_custkey', u'vagrant', u'public', 
> u'orders', u'o_custkey', 1, 3, 3, u'orders_o_custkey_fkey', u'customer_pkey', 
> 7)
> {code}
> Note that Postgres allows traversal in either direction. The traceback you get in 
> the HS2 logs is this:
> {code}
> 2016-12-04T21:08:55,398 ERROR [8998ca98-9940-49f8-8833-7c6ebd8c96a2 
> HiveServer2-Handler-Pool: Thread-53] metastore.RetryingHMSHandler: MetaException(message:java.lang.NullPointerException)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5785)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_foreign_keys(HiveMetaStore.java:6474)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy25.get_foreign_keys(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getForeignKeys(HiveMetaStoreClient.java:1596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy26.getForeignKeys(Unknown Source)
> at 
> org.apache.hive.service.cli.operation.GetCrossReferenceOperation.runInternal(GetCrossReferenceOperation.java:128)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getCrossReference(HiveSessionImpl.java:933)
> at 
> org.apache.hive.service.cli.CLIService.getCrossReference(CLIService.java:411)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetCrossReference(ThriftCLIService.java:738)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1617)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1602)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 

[jira] [Updated] (HIVE-16788) ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK arguments

2017-05-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16788:
---
Status: Patch Available  (was: In Progress)

> ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK 
> arguments
> --
>
> Key: HIVE-16788
> URL: https://issues.apache.org/jira/browse/HIVE-16788
> Project: Hive
>  Issue Type: Bug
>  Components: ODBC
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16788.patch
>
>
> This ODBC call is meant to allow you to determine FK relationships either 
> from the PK side or from the FK side.
> Hive only allows you to traverse from the FK side; trying it from the PK side 
> leads to an NPE.
> Example using the table "customer" from TPC-H with FKs defined in Hive:
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'HIVE', u'tpch_bin_flat_orc_2', u'nation', u'n_nationkey', u'HIVE', 
> u'tpch_bin_flat_orc_2', u'customer', u'c_nationkey', 1, 0, 0, u'customer_c2', u'nation_c1', 0)
> Not using table as foreign source
> Got an error from the server for customer!
> {code}
> Compare: Postgres
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'vagrant', u'public', u'nation', u'n_nationkey', u'vagrant', u'public', 
> u'customer', u'c_nationkey', 1, 3, 3, u'customer_c_nationkey_fkey', 
> u'nation_pkey', 7)
> Not using table as foreign source
> (u'vagrant', u'public', u'customer', u'c_custkey', u'vagrant', u'public', 
> u'orders', u'o_custkey', 1, 3, 3, u'orders_o_custkey_fkey', u'customer_pkey', 
> 7)
> {code}
> Note that Postgres allows traversal in either direction. The traceback you get in 
> the HS2 logs is this:
> {code}
> 2016-12-04T21:08:55,398 ERROR [8998ca98-9940-49f8-8833-7c6ebd8c96a2 
> HiveServer2-Handler-Pool: Thread-53] metastore.RetryingHMSHandler: MetaException(message:java.lang.NullPointerException)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5785)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_foreign_keys(HiveMetaStore.java:6474)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy25.get_foreign_keys(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getForeignKeys(HiveMetaStoreClient.java:1596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy26.getForeignKeys(Unknown Source)
> at 
> org.apache.hive.service.cli.operation.GetCrossReferenceOperation.runInternal(GetCrossReferenceOperation.java:128)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getCrossReference(HiveSessionImpl.java:933)
> at 
> org.apache.hive.service.cli.CLIService.getCrossReference(CLIService.java:411)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetCrossReference(ThriftCLIService.java:738)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1617)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1602)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 

[jira] [Updated] (HIVE-16788) ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK arguments

2017-05-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-16788:
---
Attachment: HIVE-16788.patch

> ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK 
> arguments
> --
>
> Key: HIVE-16788
> URL: https://issues.apache.org/jira/browse/HIVE-16788
> Project: Hive
>  Issue Type: Bug
>  Components: ODBC
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16788.patch
>
>
> This ODBC call is meant to allow you to determine FK relationships either 
> from the PK side or from the FK side.
> Hive only allows you to traverse from the FK side; trying it from the PK side 
> leads to an NPE.
> Example using the table "customer" from TPC-H with FKs defined in Hive:
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'HIVE', u'tpch_bin_flat_orc_2', u'nation', u'n_nationkey', u'HIVE', 
> u'tpch_bin_flat_orc_2', u'customer', u'c_nationkey', 1, 0, 0, u'customer_c2', u'nation_c1', 0)
> Not using table as foreign source
> Got an error from the server for customer!
> {code}
> Compare: Postgres
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'vagrant', u'public', u'nation', u'n_nationkey', u'vagrant', u'public', 
> u'customer', u'c_nationkey', 1, 3, 3, u'customer_c_nationkey_fkey', 
> u'nation_pkey', 7)
> Not using table as foreign source
> (u'vagrant', u'public', u'customer', u'c_custkey', u'vagrant', u'public', 
> u'orders', u'o_custkey', 1, 3, 3, u'orders_o_custkey_fkey', u'customer_pkey', 
> 7)
> {code}
> Note that Postgres allows traversal in either direction. The traceback you get in 
> the HS2 logs is this:
> {code}
> 2016-12-04T21:08:55,398 ERROR [8998ca98-9940-49f8-8833-7c6ebd8c96a2 
> HiveServer2-Handler-Pool: Thread-53] metastore.RetryingHMSHandler: MetaException(message:java.lang.NullPointerException)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5785)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_foreign_keys(HiveMetaStore.java:6474)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy25.get_foreign_keys(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getForeignKeys(HiveMetaStoreClient.java:1596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy26.getForeignKeys(Unknown Source)
> at 
> org.apache.hive.service.cli.operation.GetCrossReferenceOperation.runInternal(GetCrossReferenceOperation.java:128)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getCrossReference(HiveSessionImpl.java:933)
> at 
> org.apache.hive.service.cli.CLIService.getCrossReference(CLIService.java:411)
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetCrossReference(ThriftCLIService.java:738)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1617)
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1602)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at 

[jira] [Assigned] (HIVE-16788) ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK arguments

2017-05-30 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-16788:
--


> ODBC call SQLForeignKeys leads to NPE if you use PK arguments rather than FK 
> arguments
> --
>
> Key: HIVE-16788
> URL: https://issues.apache.org/jira/browse/HIVE-16788
> Project: Hive
>  Issue Type: Bug
>  Components: ODBC
>Reporter: Carter Shanklin
>Assignee: Jesus Camacho Rodriguez
>
> This ODBC call is meant to allow you to determine FK relationships either 
> from the PK side or from the FK side.
> Hive only allows you to traverse from the FK side; trying it from the PK side 
> leads to an NPE.
> Example using the table "customer" from TPC-H with FKs defined in Hive:
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'HIVE', u'tpch_bin_flat_orc_2', u'nation', u'n_nationkey', u'HIVE', 
> u'tpch_bin_flat_orc_2', u'customer', u'c_nationkey', 1, 0, 0, u'custome
> r_c2', u'nation_c1', 0)
> Not using table as foreign source
> Got an error from the server for customer!
> {code}
> Compare: Postgres
> {code}
> === Foreign Keys ===
> Using table as foreign source
> (u'vagrant', u'public', u'nation', u'n_nationkey', u'vagrant', u'public', 
> u'customer', u'c_nationkey', 1, 3, 3, u'customer_c_nationkey_fkey', 
> u'nation_pkey', 7)
> Not using table as foreign source
> (u'vagrant', u'public', u'customer', u'c_custkey', u'vagrant', u'public', 
> u'orders', u'o_custkey', 1, 3, 3, u'orders_o_custkey_fkey', u'customer_pkey', 
> 7)
> {code}
> Note that Postgres allows traversal from either way. The traceback you get in 
> the HS2 logs is this:
> {code}
> 2016-12-04T21:08:55,398 ERROR [8998ca98-9940-49f8-8833-7c6ebd8c96a2 HiveServer2-Handler-Pool: Thread-53] metastore.RetryingHMSHandler: MetaException(message:java.lang.NullPointerException)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newMetaException(HiveMetaStore.java:5785)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_foreign_keys(HiveMetaStore.java:6474)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy25.get_foreign_keys(Unknown Source)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getForeignKeys(HiveMetaStoreClient.java:1596)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy26.getForeignKeys(Unknown Source)
> at org.apache.hive.service.cli.operation.GetCrossReferenceOperation.runInternal(GetCrossReferenceOperation.java:128)
> at org.apache.hive.service.cli.operation.Operation.run(Operation.java:324)
> at org.apache.hive.service.cli.session.HiveSessionImpl.getCrossReference(HiveSessionImpl.java:933)
> at org.apache.hive.service.cli.CLIService.getCrossReference(CLIService.java:411)
> at org.apache.hive.service.cli.thrift.ThriftCLIService.GetCrossReference(ThriftCLIService.java:738)
> at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1617)
> at org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetCrossReference.getResult(TCLIService.java:1602)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>   
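The NPE above is consistent with the PK-side code path dereferencing a foreign-table argument that is absent when the caller queries from the PK side only. A minimal sketch of the guard (hypothetical names and data shapes, not the actual HiveMetaStore code), where each side of the relationship is used as a filter only when the caller supplied it:

```python
def get_foreign_keys(parent_table, foreign_table, fk_records):
    # Hypothetical stand-in for the metastore lookup: both arguments are
    # optional, so each must be null-checked before use. Filtering only
    # on the sides the caller supplied supports traversal from either
    # the PK side or the FK side.
    matches = []
    for fk in fk_records:
        if foreign_table is not None and fk["fk_table"] != foreign_table:
            continue
        if parent_table is not None and fk["pk_table"] != parent_table:
            continue
        matches.append(fk)
    return matches
```

Called with only the PK side (`parent_table="nation"`, `foreign_table=None`), this returns the customer FK instead of failing.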

[jira] [Resolved] (HIVE-16733) Support conflict column name in order by

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong resolved HIVE-16733.

Resolution: Fixed

> Support conflict column name in order by
> 
>
> Key: HIVE-16733
> URL: https://issues.apache.org/jira/browse/HIVE-16733
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: rr.patch
>
>
> There is a bug in RR which is exposed in HIVE-15160. After resolving the bug, 
> we can support both:
> select key as value from src order by src.value
> select key as value from src order by value
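The two statements above differ in how the ORDER BY reference should resolve when a SELECT alias shadows a real column. An illustrative sketch of that resolution rule (a hypothetical helper, not Hive's actual RowResolver):

```python
def resolve_order_by(ref, select_aliases):
    # A qualified reference (src.value) bypasses SELECT-list aliases and
    # binds to the underlying table column; an unqualified reference
    # prefers the alias when one exists.
    qualifier, _, column = ref.rpartition(".")
    if qualifier:
        return ("column", column)
    if ref in select_aliases:
        return ("alias", select_aliases[ref])
    return ("column", ref)
```

Under this rule, `order by src.value` sorts on the table's value column while `order by value` sorts on the aliased key column, matching the two queries in the description.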



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16507) Hive Explain User-Level may print out "Vertex dependency in root stage" twice

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16507:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Hive Explain User-Level may print out "Vertex dependency in root stage" twice
> -
>
> Key: HIVE-16507
> URL: https://issues.apache.org/jira/browse/HIVE-16507
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.2.0, 2.3.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-16507.1.patch, HIVE-16507.2.patch
>
>
> User-level explain plans have a section titled {{Vertex dependency in root 
> stage}} - which (according to the name) prints out the dependencies between 
> all vertices that are in the root stage.
> This logic is controlled by {{DagJsonParser#print}} and it may print out 
> {{Vertex dependency in root stage}} twice.
> The logic in this method first extracts all stages and plans. It then 
> iterates over all the stages, and if the stage contains any edges, it prints 
> them out.
> If we want to be consistent with the statement {{Vertex dependency in root 
> stage}} then we should add a check to see if the stage we are processing 
> during the iteration is the root stage or not.
> Alternatively, we could print out the edges for each stage and change the 
> line from {{Vertex dependency in root stage}} to {{Vertex dependency in 
> [stage-id]}}
> I'm not sure if it's possible for Hive-on-Tez to create a plan with a non-root 
> stage that contains edges, but it is possible for Hive-on-Spark (support 
> added for HoS in HIVE-11133).
> Example for HoS:
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.explain.user=true;
> set hive.spark.dynamic.partition.pruning=true;
> EXPLAIN select count(*) from srcpart where srcpart.ds in (select 
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> Prints
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 10 <- Map 9 (GROUP)
> Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
> Reducer 13 <- Map 12 (GROUP)
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
> Reducer 3 <- Reducer 2 (GROUP)
> Reducer 5 <- Map 4 (GROUP)
> Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
> Reducer 8 <- Map 7 (GROUP)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Reducer 3
>   File Output Operator [FS_34]
> Group By Operator [GBY_32] (rows=1 width=8)
>   Output:["_col0"],aggregations:["count(VALUE._col0)"]
> <-Reducer 2 [GROUP]
>   GROUP [RS_31]
> Group By Operator [GBY_30] (rows=1 width=8)
>   Output:["_col0"],aggregations:["count()"]
>   Join Operator [JOIN_28] (rows=2200 width=10)
> condition 
> map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
>   <-Map 1 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_26]
>   PartitionCols:_col0
>   Select Operator [SEL_2] (rows=2000 width=10)
> Output:["_col0"]
> TableScan [TS_0] (rows=2000 width=10)
>   default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
>   <-Reducer 6 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_27]
>   PartitionCols:_col0
>   Group By Operator [GBY_24] (rows=1 width=184)
> Output:["_col0"],keys:KEY._col0
>   <-Reducer 5 [GROUP]
> GROUP [RS_23]
>   PartitionCols:_col0
>   Group By Operator [GBY_22] (rows=2 width=184)
> Output:["_col0"],keys:_col0
> Filter Operator [FIL_9] (rows=1 width=184)
>   predicate:_col0 is not null
>   Group By Operator [GBY_7] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(VALUE._col0)"]
>   <-Map 4 [GROUP]
> GROUP [RS_6]
>   Group By Operator [GBY_5] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(ds)"]
> Select Operator [SEL_4] (rows=2000 width=10)
>   Output:["ds"]
>   TableScan [TS_3] (rows=2000 width=10)
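The fix proposed in the report, printing the "root stage" banner only for the root stage and (per the alternative) labeling other stages by their id, can be sketched as follows (illustrative Python, not the actual DagJsonParser code):

```python
def render_dependencies(stages, root_stage_id):
    # Each stage is assumed to be a dict with an "id" and a list of
    # "edges" (hypothetical shape). Only the root stage gets the
    # "root stage" banner, so it can never be printed twice.
    lines = []
    for stage in stages:
        if not stage["edges"]:
            continue
        if stage["id"] == root_stage_id:
            lines.append("Vertex dependency in root stage")
        else:
            # the alternative fix: label non-root stages explicitly
            lines.append("Vertex dependency in %s" % stage["id"])
        lines.extend(stage["edges"])
    return lines
```

With the HoS plan above, the second group of reducer edges would then be labeled with its own stage id rather than repeating the root-stage banner.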

[jira] [Updated] (HIVE-16507) Hive Explain User-Level may print out "Vertex dependency in root stage" twice

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16507:
---
Affects Version/s: 2.3.0
   2.2.0
   2.0.0

> Hive Explain User-Level may print out "Vertex dependency in root stage" twice
> -
>
> Key: HIVE-16507
> URL: https://issues.apache.org/jira/browse/HIVE-16507
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.2.0, 2.3.0
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-16507.1.patch, HIVE-16507.2.patch
>
>

[jira] [Updated] (HIVE-16507) Hive Explain User-Level may print out "Vertex dependency in root stage" twice

2017-05-30 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-16507:
---
Fix Version/s: 3.0.0

> Hive Explain User-Level may print out "Vertex dependency in root stage" twice
> -
>
> Key: HIVE-16507
> URL: https://issues.apache.org/jira/browse/HIVE-16507
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Fix For: 3.0.0
>
> Attachments: HIVE-16507.1.patch, HIVE-16507.2.patch
>
>

[jira] [Commented] (HIVE-16507) Hive Explain User-Level may print out "Vertex dependency in root stage" twice

2017-05-30 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029783#comment-16029783
 ] 

Sahil Takiar commented on HIVE-16507:
-

[~pxiong] can this be merged? Thanks.

> Hive Explain User-Level may print out "Vertex dependency in root stage" twice
> -
>
> Key: HIVE-16507
> URL: https://issues.apache.org/jira/browse/HIVE-16507
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16507.1.patch, HIVE-16507.2.patch
>
>

[jira] [Commented] (HIVE-16665) Race condition in Utilities.GetInputPathsCallable --> createDummyFileForEmptyPartition

2017-05-30 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029784#comment-16029784
 ] 

Sahil Takiar commented on HIVE-16665:
-

[~spena] could you take a look?

> Race condition in Utilities.GetInputPathsCallable --> 
> createDummyFileForEmptyPartition
> --
>
> Key: HIVE-16665
> URL: https://issues.apache.org/jira/browse/HIVE-16665
> Project: Hive
>  Issue Type: Bug
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16665.1.patch, HIVE-16665.2.patch, 
> HIVE-16665.3.patch
>
>
> Looks like there is a race condition in the {{GetInputPathsCallable}} thread 
> when modifying the input {{MapWork}} object.
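The lost-update pattern behind such a race, and the usual fix of serializing mutation of the shared object, can be sketched like this (a toy Python stand-in for the shared MapWork state; the actual HIVE-16665 patch may differ):

```python
import threading

class MapWorkLike:
    """Toy stand-in for a shared MapWork-style object mutated by
    several worker callables concurrently (hypothetical shape)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.path_to_aliases = {}

    def add_path(self, path, alias):
        # The lock makes the read-modify-write on the shared map atomic,
        # so concurrent workers cannot interleave and drop updates.
        with self._lock:
            aliases = self.path_to_aliases.setdefault(path, [])
            aliases.append(alias)

work = MapWorkLike()
threads = [
    threading.Thread(target=work.add_path, args=("dummy_path", "t%d" % i))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two workers could both read the alias list, append to their own copy of the state, and write back, losing one update.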





[jira] [Commented] (HIVE-16765) ParquetFileReader should be closed to avoid resource leak

2017-05-30 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029742#comment-16029742
 ] 

Ferdinand Xu commented on HIVE-16765:
-

Thanks [~spena] for this information. Yes, this is a critical issue: it leaves 
the Parquet vectorized reader unusable. 

> ParquetFileReader should be closed to avoid resource leak
> -
>
> Key: HIVE-16765
> URL: https://issues.apache.org/jira/browse/HIVE-16765
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16765.001.patch, HIVE-16765-branch-2.3.patch
>
>
> ParquetFileReader should be closed to avoid resource leak
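The leak fix is the standard "always close the reader" pattern, which in Java is try-with-resources around the reader. A toy Python equivalent using a context manager (hypothetical class, for illustration only):

```python
class TrackedReader:
    """Toy reader standing in for a file reader that holds an OS-level
    resource. Using it as a context manager guarantees close() runs even
    if reading throws, which is the shape of the leak fix."""

    open_count = 0  # counts readers still holding their resource

    def __init__(self, path):
        self.path = path
        TrackedReader.open_count += 1

    def read_footer(self):
        return "footer:%s" % self.path

    def close(self):
        TrackedReader.open_count -= 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions

with TrackedReader("part-00000.parquet") as reader:
    footer = reader.read_footer()
```

After the `with` block, the reader is closed regardless of how the block exited, so no resource is leaked.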





[jira] [Assigned] (HIVE-16787) Fix itests in branch-2.2

2017-05-30 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-16787:



> Fix itests in branch-2.2
> 
>
> Key: HIVE-16787
> URL: https://issues.apache.org/jira/browse/HIVE-16787
> Project: Hive
>  Issue Type: Bug
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
>
> The itests are broken in branch 2.2 and need to be fixed before release.





[jira] [Commented] (HIVE-14990) run all tests for MM tables and fix the issues that are found

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029627#comment-16029627
 ] 

Hive QA commented on HIVE-14990:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870147/HIVE-14990.19.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5475/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5475/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5475/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-05-30 16:12:02.642
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-5475/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-05-30 16:12:02.645
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   e2ecc92..fea9142  storage-branch-2.3 -> origin/storage-branch-2.3
 * [new tag] rel/storage-release-2.3.1 -> rel/storage-release-2.3.1
+ git reset --hard HEAD
HEAD is now at 8dcc78a HIVE-16727 : REPL DUMP for insert event should't fail if 
the table is already dropped. (Sankar Hariappan via Thejas Nair
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 8dcc78a HIVE-16727 : REPL DUMP for insert event should't fail if 
the table is already dropped. (Sankar Hariappan via Thejas Nair
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-05-30 16:12:09.902
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: a/common/src/java/org/apache/hadoop/hive/common/FileUtils.java: No such 
file or directory
error: a/common/src/java/org/apache/hadoop/hive/common/HiveStatsUtils.java: No 
such file or directory
error: a/common/src/java/org/apache/hadoop/hive/common/JavaUtils.java: No such 
file or directory
error: a/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java: No such 
file or directory
error: a/data/scripts/q_test_init.sql: No such file or directory
error: a/data/scripts/q_test_init_compare.sql: No such file or directory
error: a/data/scripts/q_test_init_contrib.sql: No such file or directory
error: a/data/scripts/q_test_init_for_minimr.sql: No such file or directory
error: a/data/scripts/q_test_init_src.sql: No such file or directory
error: a/data/scripts/q_test_init_src_with_stats.sql: No such file or directory
error: a/data/scripts/q_test_init_tez.sql: No such file or directory
error: 
a/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FileOutputCommitterContainer.java:
 No such file or directory
error: 
a/hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatOutputFormat.java:
 No such file or directory
error: 
a/itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java:
 No such file or directory
error: 
a/itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/history/TestHiveHistory.java:
 No such file or directory
error: a/itests/src/test/resources/testconfiguration.properties: No such file 
or directory
error: 
a/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java:
 No such file or directory
error: 
a/llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/VectorDeserializeOrcWriter.java:
 No such file or directory
error: a/metastore/if/hive_metastore.thrift: No such file or directory
error: a/metastore/scripts/upgrade/derby/hive-schema-2.2.0.derby.sql: No such 
file or directory
error: 

[jira] [Commented] (HIVE-16765) ParquetFileReader should be closed to avoid resource leak

2017-05-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-16765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029603#comment-16029603
 ] 

Sergio Peña commented on HIVE-16765:


[~Ferd] There is only one Jenkins job for all branches. As long as you specify 
the branch name, the Jenkins job should run the tests on it. You already did 
it, so we're good. Make sure to run the tests on branch-2 as well, as this will 
become the 2.4 version. I think 2.3 is frozen now. Is this a critical issue?

> ParquetFileReader should be closed to avoid resource leak
> -
>
> Key: HIVE-16765
> URL: https://issues.apache.org/jira/browse/HIVE-16765
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-16765.001.patch, HIVE-16765-branch-2.3.patch
>
>
> ParquetFileReader should be closed to avoid resource leak





[jira] [Commented] (HIVE-16776) Strange cast behavior for table backed by druid

2017-05-30 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029479#comment-16029479
 ] 

Hive QA commented on HIVE-16776:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12870389/HIVE-16776.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10791 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_boolean] 
(batchId=78)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=145)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5474/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5474/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5474/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12870389 - PreCommit-HIVE-Build

> Strange cast behavior for table backed by druid
> ---
>
> Key: HIVE-16776
> URL: https://issues.apache.org/jira/browse/HIVE-16776
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 3.0.0
>Reporter: slim bouguerra
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-16776.patch
>
>
> The following query 
> {code} 
> explain select SUBSTRING(`Calcs`.`str0`,CAST(`Calcs`.`int2` AS int), 3) from 
> `druid_tableau`.`calcs` `Calcs`;
> OK
> Plan not optimized by CBO. 
> {code}
> fails the cbo with the following exception 
> {code} org.apache.hadoop.hive.ql.parse.SemanticException: Line 0:-1 Wrong arguments '3': No matching method for class org.apache.hadoop.hive.ql.udf.UDFSubstr with (string, bigint, int). Possible choices: _FUNC_(binary, int)  _FUNC_(binary, int, int)  _FUNC_(string, int)  _FUNC_(string, int, int)
> at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:1355) ~[hive-exec-2.1.0.2.6.0.2-SNAPSHOT.jar:2.1.0.2.6.0.2-SNAPSHOT]{code}.
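The failure is strict overload matching: the second argument appears to come through as bigint despite the CAST, and UDFSubstr has no (string, bigint, int) signature. A sketch of that exact-match rule (illustrative only, not Hive's actual function resolver):

```python
# Signature set taken from the error message above.
SUBSTR_SIGNATURES = {
    ("string", "int"),
    ("string", "int", "int"),
    ("binary", "int"),
    ("binary", "int", "int"),
}

def matches(arg_types):
    # Exact-match resolution: bigint is not implicitly narrowed to int,
    # so a call site must cast the offending argument down to int for
    # any signature to match.
    return tuple(arg_types) in SUBSTR_SIGNATURES
```

This is why the (string, bigint, int) call fails while the same call with the second argument as a plain int would resolve.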




