[jira] [Updated] (HIVE-14578) Fixes to TestEncryptedHDFSCliDriver

2016-09-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14578:
-
Attachment: HIVE-14578.3.patch

> Fixes to TestEncryptedHDFSCliDriver
> ---
>
> Key: HIVE-14578
> URL: https://issues.apache.org/jira/browse/HIVE-14578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14578.1.patch, HIVE-14578.2.patch, 
> HIVE-14578.3.patch
>
>
> 900s runtime. One of the stragglers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14578) Fixes to TestEncryptedHDFSCliDriver

2016-09-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14578:
-
Status: Patch Available  (was: Open)

> Fixes to TestEncryptedHDFSCliDriver
> ---
>
> Key: HIVE-14578
> URL: https://issues.apache.org/jira/browse/HIVE-14578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14578.1.patch, HIVE-14578.2.patch
>
>
> 900s runtime. One of the stragglers.





[jira] [Updated] (HIVE-14578) Fixes to TestEncryptedHDFSCliDriver

2016-09-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14578:
-
Attachment: HIVE-14578.2.patch

> Fixes to TestEncryptedHDFSCliDriver
> ---
>
> Key: HIVE-14578
> URL: https://issues.apache.org/jira/browse/HIVE-14578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14578.1.patch, HIVE-14578.2.patch
>
>
> 900s runtime. One of the stragglers.





[jira] [Updated] (HIVE-14578) Fixes to TestEncryptedHDFSCliDriver

2016-09-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14578:
-
Status: Open  (was: Patch Available)

> Fixes to TestEncryptedHDFSCliDriver
> ---
>
> Key: HIVE-14578
> URL: https://issues.apache.org/jira/browse/HIVE-14578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14578.1.patch, HIVE-14578.2.patch
>
>
> 900s runtime. One of the stragglers.





[jira] [Updated] (HIVE-14578) Fixes to TestEncryptedHDFSCliDriver

2016-09-09 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14578:
-
Status: Patch Available  (was: Open)

> Fixes to TestEncryptedHDFSCliDriver
> ---
>
> Key: HIVE-14578
> URL: https://issues.apache.org/jira/browse/HIVE-14578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14578.1.patch
>
>
> 900s runtime. One of the stragglers.





[jira] [Updated] (HIVE-14578) Fixes to TestEncryptedHDFSCliDriver

2016-09-09 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14578:
-
Attachment: HIVE-14578.1.patch

> Fixes to TestEncryptedHDFSCliDriver
> ---
>
> Key: HIVE-14578
> URL: https://issues.apache.org/jira/browse/HIVE-14578
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14578.1.patch
>
>
> 900s runtime. One of the stragglers.





[jira] [Commented] (HIVE-14665) vector_join_part_col_char.q failure

2016-08-31 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453362#comment-15453362
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14665:
--

IF to SequenceIF happens because of llap mode (as it happens with other tests 
as part of HIVE-14502, e.g. vectorized_timestamp_funcs.q). The _col2 is a bit 
of a surprise; however, it is an improvement, since we do effective column 
pruning for the query.



> vector_join_part_col_char.q failure
> ---
>
> Key: HIVE-14665
> URL: https://issues.apache.org/jira/browse/HIVE-14665
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14665.1.patch
>
>
> Happens 100% of the time. Looks like a missed golden file update from 
> HIVE-14502.





[jira] [Commented] (HIVE-14681) Remove AssertionError for non zero return codes in QTestUtil cleanup

2016-08-31 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453311#comment-15453311
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14681:
--

+1. Thanks for adding a patch to fix this issue.

> Remove AssertionError for non zero return codes in QTestUtil cleanup
> 
>
> Key: HIVE-14681
> URL: https://issues.apache.org/jira/browse/HIVE-14681
> Project: Hive
>  Issue Type: Sub-task
>  Components: Testing Infrastructure
>Reporter: Siddharth Seth
> Attachments: HIVE-14681.01.patch
>
>
> This was added in HIVE-14625.
> Causes some tests to fail randomly - after executing all the actual queries.
> Shows up as 
> {code}
> org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
> {code}
> without any test name.
> The failures are related to the inability to drop databases. I suspect this 
> has something to do with specific q files running grant etc statements.
> Will create a separate jira to track this down.
> For now, I think it makes sense to convert the Assertion to a warning.
> {code}java.lang.AssertionError: Failed during cleanup processLine with 
> code=403{code}
> {code}2016-08-31T11:47:05,627 ERROR [adab525e-198b-49f9-a2c3-8cc7df47b8ea 
> main] ql.Driver: Authorization failed:No privilege 'Drop' found for outputs { 
> database:default, table:src}. Use SHOW GRANT to get more details.{code}
> {code}java.lang.AssertionError: Failed during cleanup processLine with 
> code=4{code}
> {code}2016-08-31T11:41:29,849 ERROR [3906df31-d034-4dfe-8f78-239140c686aa 
> main] ql.Driver: FAILED: HiveAccessControlException Permission denied: 
> Principal [name=user1, type=USER] does not have following privileges for 
> operation DROPTABLE [[OBJECT OWNERSHIP] on Object [type=TABLE_OR_VIEW, 
> name=default.src]]
> org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException:
>  Permission denied: Principal [name=user1, type=USER] does not have following 
> privileges for operation DROPTABLE [[OBJECT OWNERSHIP] on Object 
> [type=TABLE_OR_VIEW, name=default.src]]
>   at 
> org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLAuthorizationUtils.assertNoDeniedPermissions(SQLAuthorizationUtils.java:476)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidator.checkPrivileges(SQLStdHiveAuthorizationValidator.java:86)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidatorForTest.checkPrivileges(SQLStdHiveAuthorizationValidatorForTest.java:104)
>   at 
> org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizerImpl.checkPrivileges(HiveAuthorizerImpl.java:87)
>   at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:868)
>   at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:655)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:502)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:342)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1235)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1355)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1143)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335)
>   at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:861)
>   at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:838)
>   at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:500)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver$6.invokeInternal(CoreCliDriver.java:138)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver$6.invokeInternal(CoreCliDriver.java:135)
>   at 
> org.apache.hadoop.hive.util.ElapsedTimeLoggingWrapper.invoke(ElapsedTimeLoggingWrapper.java:33)
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.shutdown(CoreCliDriver.java:141)
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter$1$1.evaluate(CliAdapter.java:75)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:367)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:274)
>   at 
> 
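
The change proposed in the description above (turning the cleanup assertion into a warning) can be illustrated with a minimal, self-contained sketch. This is not the actual QTestUtil code or the attached patch: the class `CleanupSketch` and both method names are hypothetical, and `java.util.logging` is used only to keep the example free of dependencies.

```java
import java.util.logging.Logger;

// Hypothetical stand-in for a test-harness cleanup step; not Hive's QTestUtil.
class CleanupSketch {
    private static final Logger LOG = Logger.getLogger(CleanupSketch.class.getName());

    // Strict variant: any non-zero return code fails the whole run.
    static void cleanUpStrict(int returnCode) {
        if (returnCode != 0) {
            throw new AssertionError(
                "Failed during cleanup processLine with code=" + returnCode);
        }
    }

    // Lenient variant: a non-zero return code is logged and reported to the
    // caller, but no error is thrown, so finished tests are not failed.
    static boolean cleanUpLenient(int returnCode) {
        if (returnCode != 0) {
            LOG.warning("Cleanup processLine returned code=" + returnCode
                + "; continuing");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(cleanUpLenient(0));   // clean shutdown
        System.out.println(cleanUpLenient(403)); // logged, not thrown
    }
}
```

With the lenient variant, a failure to drop a table during shutdown still shows up in the log, but it no longer surfaces as a test failure without a test name.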

[jira] [Commented] (HIVE-14673) Orc orc_merge_diff_fs.q and orc_llap.q test improvement

2016-08-31 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453112#comment-15453112
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14673:
--

+1

> Orc orc_merge_diff_fs.q and orc_llap.q test improvement
> ---
>
> Key: HIVE-14673
> URL: https://issues.apache.org/jira/browse/HIVE-14673
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14673.1.patch
>
>
> orc_merge_diff_fs.q and orc_llap.q are slow (350.487s and 290.877s 
> respectively). We can move orc_merge_diff_fs.q to MiniLlap as we are testing 
> merge across filesystems and there are several orc merge tests for mr. 
> orc_llap.q seems to be creating a lot of temp tables and running sum(hash(*)) 
> against them, which seems to be slow.





[jira] [Updated] (HIVE-14670) org.apache.hadoop.hive.ql.TestMTQueries failure

2016-08-31 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14670:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

> org.apache.hadoop.hive.ql.TestMTQueries failure
> ---
>
> Key: HIVE-14670
> URL: https://issues.apache.org/jira/browse/HIVE-14670
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.2.0
>
> Attachments: HIVE-14670.1.patch
>
>
> Introduced by HIVE-14627. We used to have a common q_init file for MR and 
> CliDriver tests till HIVE-14627 was committed. Now that the init files are 
> separate and join1.q and groupby2.q are run as part of minimr tests, we 
> cannot use these tests to test multi-threaded queries with the same setup 
> file because they would result in different stats (due to the way the init 
> scripts are written). The easy fix would be to substitute join1.q and 
> groupby2.q with 2 files that actually run in CliDriver mode or to remove the 
> files altogether. I am going with the latter option since we already have 6 
> query files in parallel and additionally we are trying to achieve a faster 
> run. 





[jira] [Updated] (HIVE-14670) org.apache.hadoop.hive.ql.TestMTQueries failure

2016-08-30 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14670:
-
Attachment: HIVE-14670.1.patch

[~sseth] or [~prasanth_j], can either of you please take a look?

> org.apache.hadoop.hive.ql.TestMTQueries failure
> ---
>
> Key: HIVE-14670
> URL: https://issues.apache.org/jira/browse/HIVE-14670
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14670.1.patch
>
>
> Introduced by HIVE-14627. We used to have a common q_init file for MR and 
> CliDriver tests till HIVE-14627 was committed. Now that the init files are 
> separate and join1.q and groupby2.q are run as part of minimr tests, we 
> cannot use these tests to test multi-threaded queries with the same setup 
> file because they would result in different stats (due to the way the init 
> scripts are written). The easy fix would be to substitute join1.q and 
> groupby2.q with 2 files that actually run in CliDriver mode or to remove the 
> files altogether. I am going with the latter option since we already have 6 
> query files in parallel and additionally we are trying to achieve a faster 
> run. 





[jira] [Updated] (HIVE-14670) org.apache.hadoop.hive.ql.TestMTQueries failure

2016-08-30 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14670:
-
Status: Patch Available  (was: Open)

> org.apache.hadoop.hive.ql.TestMTQueries failure
> ---
>
> Key: HIVE-14670
> URL: https://issues.apache.org/jira/browse/HIVE-14670
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14670.1.patch
>
>
> Introduced by HIVE-14627. We used to have a common q_init file for MR and 
> CliDriver tests till HIVE-14627 was committed. Now that the init files are 
> separate and join1.q and groupby2.q are run as part of minimr tests, we 
> cannot use these tests to test multi-threaded queries with the same setup 
> file because they would result in different stats (due to the way the init 
> scripts are written). The easy fix would be to substitute join1.q and 
> groupby2.q with 2 files that actually run in CliDriver mode or to remove the 
> files altogether. I am going with the latter option since we already have 6 
> query files in parallel and additionally we are trying to achieve a faster 
> run. 





[jira] [Updated] (HIVE-14670) org.apache.hadoop.hive.ql.TestMTQueries failure

2016-08-30 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14670:
-
Description: Introduced by HIVE-14627. We used to have a common q_init file 
for MR and CliDriver tests till HIVE-14627 was committed. Now that the init 
files are separate and join1.q and groupby2.q are run as part of minimr tests, 
we cannot use these tests to test multi-threaded queries with the same setup 
file because they would result in different stats (due to the way the init 
scripts are written). The easy fix would be to substitute join1.q and 
groupby2.q with 2 files that actually run in CliDriver mode or to remove the 
files altogether. I am going with the latter option since we already have 6 
query files in parallel and additionally we are trying to achieve a faster run. 
  (was: Introduced by HIVE-14627. We used to have a common q_init file for MR 
and CliDriver tests till HIVE-14627 was committed. Now, that the init files are 
separate and join1.q and groupby2.q are run as part of minimr tests, we cannot 
use these tests to test multi-threaded queries with the same setup file because 
they would result in different stats (due to the way the init scripts are 
written). The easy fix would  be to substitute join1.q and groupby2.q with 2 
files that actually run in CliDriver mode.)

> org.apache.hadoop.hive.ql.TestMTQueries failure
> ---
>
> Key: HIVE-14670
> URL: https://issues.apache.org/jira/browse/HIVE-14670
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> Introduced by HIVE-14627. We used to have a common q_init file for MR and 
> CliDriver tests till HIVE-14627 was committed. Now that the init files are 
> separate and join1.q and groupby2.q are run as part of minimr tests, we 
> cannot use these tests to test multi-threaded queries with the same setup 
> file because they would result in different stats (due to the way the init 
> scripts are written). The easy fix would be to substitute join1.q and 
> groupby2.q with 2 files that actually run in CliDriver mode or to remove the 
> files altogether. I am going with the latter option since we already have 6 
> query files in parallel and additionally we are trying to achieve a faster 
> run. 





[jira] [Commented] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-30 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449833#comment-15449833
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14576:
--

[~sseth] There is already a HBaseCliConfig which does HBase tests without any 
MiniCluster, not sure why HBaseMinimrCliConfig is required for unit tests. 
[~sushanth], would it be safe to remove this altogether given that 
TestHBaseCliDriver performs these tests?

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14576.1.patch
>
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Comment Edited] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-30 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449833#comment-15449833
 ] 

Hari Sankar Sivarama Subramaniyan edited comment on HIVE-14576 at 8/30/16 7:00 
PM:
---

[~sseth] There is already a HBaseCliConfig which does HBase tests without any 
MiniCluster, not sure why HBaseMinimrCliConfig is additionally required for 
unit tests. [~sushanth], would it be safe to remove this altogether given that 
TestHBaseCliDriver performs these tests?


was (Author: hsubramaniyan):
[~sseth] There is already a HBaseCliConfig which does HBase tests without any 
MiniCluster, not sure why HBaseMinimrCliConfig is required for unit tests. 
[~sushanth], would it be safe to remove this altogether given that 
TestHBaseCliDriver performs these tests.

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14576.1.patch
>
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Comment Edited] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447457#comment-15447457
 ] 

Hari Sankar Sivarama Subramaniyan edited comment on HIVE-14576 at 8/29/16 11:48 
PM:


Brings down the total runtime from ~1000 seconds to ~255 seconds.


was (Author: hsubramaniyan):
Brings down the runtime to ~255 seconds.

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14576.1.patch
>
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Updated] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14576:
-
Attachment: HIVE-14576.1.patch

Brings down the runtime to ~255 seconds.

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14576.1.patch
>
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Updated] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14576:
-
Status: Patch Available  (was: Open)

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14576.1.patch
>
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Commented] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447454#comment-15447454
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14576:
--

The immediate fix to point 1 mentioned in the description would be to use 
setInitScript("q_test_init_for_minimr.sql");

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Assigned] (HIVE-14576) Testing: Fixes to TestHBaseMinimrCliDriver

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan reassigned HIVE-14576:


Assignee: Hari Sankar Sivarama Subramaniyan  (was: Vaibhav Gumashta)

> Testing: Fixes to TestHBaseMinimrCliDriver
> --
>
> Key: HIVE-14576
> URL: https://issues.apache.org/jira/browse/HIVE-14576
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vaibhav Gumashta
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> 1. Runtime over 1000s.
> 2. Runs as an isolated test.
> Need to fix both.





[jira] [Updated] (HIVE-14665) vector_join_part_col_char.q failure

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14665:
-
Status: Patch Available  (was: Open)

> vector_join_part_col_char.q failure
> ---
>
> Key: HIVE-14665
> URL: https://issues.apache.org/jira/browse/HIVE-14665
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14665.1.patch
>
>
> Happens 100% of the time. Looks like a missed golden file update from 
> HIVE-14502.





[jira] [Updated] (HIVE-14665) vector_join_part_col_char.q failure

2016-08-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14665:
-
Attachment: HIVE-14665.1.patch

cc [~prasanth_j]

> vector_join_part_col_char.q failure
> ---
>
> Key: HIVE-14665
> URL: https://issues.apache.org/jira/browse/HIVE-14665
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14665.1.patch
>
>
> Happens 100% of the time. Looks like a missed golden file update from 
> HIVE-14502.





[jira] [Commented] (HIVE-14612) org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout failure

2016-08-27 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15441749#comment-15441749
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14612:
--

Thanks Lefty for reminding me. I've updated the same.

> org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
>  failure
> 
>
> Key: HIVE-14612
> URL: https://issues.apache.org/jira/browse/HIVE-14612
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.2.0
>
> Attachments: HIVE-14612.1.patch
>
>
> Failing for some time





[jira] [Updated] (HIVE-14612) org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout failure

2016-08-27 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14612:
-
Fix Version/s: 2.2.0

> org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
>  failure
> 
>
> Key: HIVE-14612
> URL: https://issues.apache.org/jira/browse/HIVE-14612
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.2.0
>
> Attachments: HIVE-14612.1.patch
>
>
> Failing for some time





[jira] [Updated] (HIVE-14612) org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout failure

2016-08-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14612:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
>  failure
> 
>
> Key: HIVE-14612
> URL: https://issues.apache.org/jira/browse/HIVE-14612
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14612.1.patch
>
>
> Failing for some time





[jira] [Comment Edited] (HIVE-14612) org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout failure

2016-08-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438121#comment-15438121
 ] 

Hari Sankar Sivarama Subramaniyan edited comment on HIVE-14612 at 8/25/16 11:21 
PM:


[~prasanth_j] The reason we are changing the layout is because the console 
layout is defined in hive-log4j2.properties via the following lines: 
{code}
rootLogger.appenderRef.console.ref = console

# console appender
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n
{code}
And, the hardcoded patterns are used only if the layout for the console was set 
to null, which in this case is not so.
https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java#L213

Also, I tested via the debugger (with and without HIVE-14405) to ensure that 
this is the case.
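
The fallback rule described above (a hardcoded pattern applies only when no console layout is configured) can be sketched as follows. This is an illustration only, not the code in LogDivertAppender: the class `LayoutFallbackSketch`, the method `effectivePattern`, and the value of `FALLBACK_PATTERN` are all hypothetical. Only `CONSOLE_PATTERN` is taken from the properties excerpt quoted above.

```java
// Hypothetical illustration of "use the configured layout, else a hardcoded one".
class LayoutFallbackSketch {
    // Pattern from the hive-log4j2.properties excerpt quoted in this comment.
    static final String CONSOLE_PATTERN = "%d{ISO8601} %5p [%t] %c{2}: %m%n";

    // Assumed placeholder for a hardcoded default; the real value is not
    // given in this thread.
    static final String FALLBACK_PATTERN = "%-5p : %m%n";

    // Returns the configured console pattern when one exists, and falls back
    // to the hardcoded pattern only when the configured value is null.
    static String effectivePattern(String configuredConsolePattern) {
        return configuredConsolePattern != null
            ? configuredConsolePattern
            : FALLBACK_PATTERN;
    }

    public static void main(String[] args) {
        // Console layout is defined in the properties file: it wins.
        System.out.println(effectivePattern(CONSOLE_PATTERN));
        // No console layout configured: the hardcoded fallback is used.
        System.out.println(effectivePattern(null));
    }
}
```

Under this rule a test that expects the hardcoded pattern will not match once the properties file defines a console layout, which is consistent with the behavior reported in this comment.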



was (Author: hsubramaniyan):
[~prasanth_j] The reason we are changing the layout is because the console 
layout is defined in hive-log4j2.properties via the following lines: 

rootLogger.appenderRef.console.ref = console

# console appender
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n

And, the hardcoded patterns are used only if the layout for the console was set 
to null, which in this case is not so.
https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java#L213

Also, I tested via the debugger (with and without HIVE-14405) to ensure that 
this is the case.


> org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
>  failure
> 
>
> Key: HIVE-14612
> URL: https://issues.apache.org/jira/browse/HIVE-14612
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14612.1.patch
>
>
> Failing for some time





[jira] [Commented] (HIVE-14612) org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout failure

2016-08-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438121#comment-15438121
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14612:
--

[~prasanth_j] The reason we are changing the layout is because the console 
layout is defined in hive-log4j2.properties via the following lines: 

rootLogger.appenderRef.console.ref = console

# console appender
appender.console.type = Console
appender.console.name = console
appender.console.target = SYSTEM_ERR
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %d{ISO8601} %5p [%t] %c{2}: %m%n

The hardcoded patterns are used only if the console layout is null, which is 
not the case here.
https://github.com/apache/hive/blob/master/service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java#L213

Also, I tested via the debugger (with and without HIVE-14405) to ensure that 
this is the case.
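
To illustrate the null-layout fallback described above, here is a minimal Java 
sketch. It is not the code in LogDivertAppender: the class name, the method 
name, and the hardcoded pattern string are hypothetical placeholders. It only 
shows why a layout configured in the properties file takes precedence over a 
hardcoded one.

```java
final class LayoutFallbackSketch {
    // Hypothetical stand-in for a hardcoded pattern; not taken from Hive.
    static final String HARDCODED_PATTERN = "%-5p : %m%n";

    // Returns the configured pattern when one exists, otherwise the hardcoded default.
    static String effectivePattern(String configuredPattern) {
        return configuredPattern != null ? configuredPattern : HARDCODED_PATTERN;
    }

    public static void main(String[] args) {
        // Pattern copied from the hive-log4j2.properties excerpt above.
        String fromProperties = "%d{ISO8601} %5p [%t] %c{2}: %m%n";
        System.out.println(effectivePattern(fromProperties));
        // Only a null layout would fall back to the hardcoded pattern.
        System.out.println(effectivePattern(null));
    }
}
```

Because the console appender does define a layout, the first branch is the one 
that applies in the failing test.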


> org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
>  failure
> 
>
> Key: HIVE-14612
> URL: https://issues.apache.org/jira/browse/HIVE-14612
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14612.1.patch
>
>
> Failing for some time





[jira] [Resolved] (HIVE-14546) TestMiniTezCliDriver plan diffs

2016-08-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan resolved HIVE-14546.
--
Resolution: Cannot Reproduce

> TestMiniTezCliDriver plan diffs
> ---
>
> Key: HIVE-14546
> URL: https://issues.apache.org/jira/browse/HIVE-14546
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> The following tests have been failing consistently for the last few days in 
> precommit runs:
> {code}
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[tez_join_hash]   56 sec   5
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]   35 sec   19
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]   58 sec   19
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]  2.8 sec  19
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]  2.4 sec  19
> {code}





[jira] [Commented] (HIVE-14546) TestMiniTezCliDriver plan diffs

2016-08-25 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437728#comment-15437728
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14546:
--

Verified these have been resolved by HIVE-14502. Closing as cannot reproduce.

> TestMiniTezCliDriver plan diffs
> ---
>
> Key: HIVE-14546
> URL: https://issues.apache.org/jira/browse/HIVE-14546
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> The following tests have been failing consistently for the last few days in 
> precommit runs:
> {code}
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[tez_join_hash]   56 sec   5
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]   35 sec   19
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]   58 sec   19
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]  2.8 sec  19
>  org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]  2.4 sec  19
> {code}





[jira] [Updated] (HIVE-14612) org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout failure

2016-08-24 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14612:
-
Status: Patch Available  (was: Open)

> org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
>  failure
> 
>
> Key: HIVE-14612
> URL: https://issues.apache.org/jira/browse/HIVE-14612
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14612.1.patch
>
>
> Failing for some time





[jira] [Updated] (HIVE-14612) org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout failure

2016-08-24 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14612:
-
Attachment: HIVE-14612.1.patch

cc [~ashutoshc]

> org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
>  failure
> 
>
> Key: HIVE-14612
> URL: https://issues.apache.org/jira/browse/HIVE-14612
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14612.1.patch
>
>
> Failing for some time





[jira] [Commented] (HIVE-14612) org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout failure

2016-08-24 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435509#comment-15435509
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14612:
--

This seems to have been introduced by HIVE-14405. Setting 
rootLogger.appenderRef.console.ref = console appears to apply the layout 
pattern specified in data/conf/hive-log4j2.properties, so additional date 
information is printed before the actual INFO/DEBUG message. 
The fix would be a simple regex change.
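
As a rough illustration of the kind of regex change meant here, the Java 
sketch below accepts a log line with or without a leading ISO8601 timestamp. 
The pattern, class name, and sample lines are assumptions made for 
illustration; the actual expression used by TestOperationLoggingLayout is not 
shown in this thread.

```java
import java.util.regex.Pattern;

final class LogLineMatcher {
    // Hypothetical pattern: optional ISO8601 timestamp, then level, [thread],
    // logger name, and message. Not the regex from the Hive test.
    static final Pattern LINE = Pattern.compile(
        "^(?:\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2},\\d{3}\\s+)?"
        + "\\s*(?:INFO|DEBUG|WARN|ERROR)\\s+\\[[^\\]]+\\]\\s+\\S+:\\s.*$");

    static boolean matches(String line) {
        return LINE.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Line with the date prefix produced by a %d{ISO8601} layout.
        System.out.println(matches("2016-08-24T10:15:30,123  INFO [main] ql.Driver: Compiling command"));
        // Same line without the date prefix.
        System.out.println(matches(" INFO [main] ql.Driver: Compiling command"));
    }
}
```

Making the timestamp group optional keeps the check valid whether or not the 
console layout from the properties file is in effect.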


> org.apache.hive.service.cli.operation.TestOperationLoggingLayout.testSwitchLogLayout
>  failure
> 
>
> Key: HIVE-14612
> URL: https://issues.apache.org/jira/browse/HIVE-14612
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> Failing for some time





[jira] [Commented] (HIVE-14503) Remove explicit order by in qfiles for union tests

2016-08-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427270#comment-15427270
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14503:
--

Looks good; conditional +1 pending a clean run. Addressing the comment I added 
in the RB would also be nice to have.


> Remove explicit order by in qfiles for union tests
> --
>
> Key: HIVE-14503
> URL: https://issues.apache.org/jira/browse/HIVE-14503
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14503.1.patch, HIVE-14503.2.patch, 
> HIVE-14503.3.patch, HIVE-14503.4.patch
>
>
> Identify qfiles with explicit order by and replace them with 
> SORT_QUERY_RESULTS
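
The idea behind replacing an explicit order by with SORT_QUERY_RESULTS can be 
sketched as follows: instead of forcing the query to sort, the test harness 
sorts the output rows before comparing them with the expected rows. The Java 
below is only an illustration of that comparison; the class and method names 
are hypothetical and this is not the Hive test driver's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class SortedResultCompare {
    // Compares two result sets as multisets of rows by sorting copies of both.
    static boolean sameRowsIgnoringOrder(List<String> expected, List<String> actual) {
        List<String> e = new ArrayList<>(expected);
        List<String> a = new ArrayList<>(actual);
        Collections.sort(e);
        Collections.sort(a);
        return e.equals(a);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("1\ta", "2\tb", "3\tc");
        // A union may emit the same rows in a different order on each run.
        List<String> actual = Arrays.asList("3\tc", "1\ta", "2\tb");
        System.out.println(sameRowsIgnoringOrder(expected, actual));
    }
}
```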





[jira] [Updated] (HIVE-14552) TestParseNegative fix

2016-08-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14552:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

> TestParseNegative fix
> -
>
> Key: HIVE-14552
> URL: https://issues.apache.org/jira/browse/HIVE-14552
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.2.0
>
> Attachments: HIVE-14552.1.patch, HIVE-14552.2.patch
>
>
> 1300s runtime.
> Straggler towards the end of the build.





[jira] [Updated] (HIVE-14552) TestParseNegative fix

2016-08-17 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14552:
-
Status: Patch Available  (was: Open)

> TestParseNegative fix
> -
>
> Key: HIVE-14552
> URL: https://issues.apache.org/jira/browse/HIVE-14552
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14552.1.patch, HIVE-14552.2.patch
>
>
> 1300s runtime.
> Straggler towards the end of the build.





[jira] [Updated] (HIVE-14552) TestParseNegative fix

2016-08-17 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14552:
-
Status: Open  (was: Patch Available)

> TestParseNegative fix
> -
>
> Key: HIVE-14552
> URL: https://issues.apache.org/jira/browse/HIVE-14552
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14552.1.patch, HIVE-14552.2.patch
>
>
> 1300s runtime.
> Straggler towards the end of the build.





[jira] [Updated] (HIVE-14552) TestParseNegative fix

2016-08-17 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14552:
-
Attachment: HIVE-14552.2.patch

> TestParseNegative fix
> -
>
> Key: HIVE-14552
> URL: https://issues.apache.org/jira/browse/HIVE-14552
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14552.1.patch, HIVE-14552.2.patch
>
>
> 1300s runtime.
> Straggler towards the end of the build.





[jira] [Commented] (HIVE-14552) TestParseNegative fix

2016-08-17 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15425011#comment-15425011
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14552:
--

[~ashutoshc] That makes sense. I am adding it in the new patch.
[~sseth] Yes, the unnecessary data generation and removal between the tests 
are the cause of this issue. Since these tests fail in the parse stage, these 
steps are entirely unnecessary.


> TestParseNegative fix
> -
>
> Key: HIVE-14552
> URL: https://issues.apache.org/jira/browse/HIVE-14552
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14552.1.patch
>
>
> 1300s runtime.
> Straggler towards the end of the build.





[jira] [Updated] (HIVE-14552) TestParseNegative fix

2016-08-16 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14552:
-
Attachment: HIVE-14552.1.patch

[~ashutoshc] I verified locally that with this change the tests pass in 
~80 seconds instead of 1300 seconds. Can you please review?

> TestParseNegative fix
> -
>
> Key: HIVE-14552
> URL: https://issues.apache.org/jira/browse/HIVE-14552
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14552.1.patch
>
>
> 1300s runtime.
> Straggler towards the end of the build.





[jira] [Updated] (HIVE-14552) TestParseNegative fix

2016-08-16 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14552:
-
Status: Patch Available  (was: Open)

> TestParseNegative fix
> -
>
> Key: HIVE-14552
> URL: https://issues.apache.org/jira/browse/HIVE-14552
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14552.1.patch
>
>
> 1300s runtime.
> Straggler towards the end of the build.





[jira] [Updated] (HIVE-14463) hcatalog server extensions test cases getting stuck

2016-08-16 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14463:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

> hcatalog server extensions test cases getting stuck
> ---
>
> Key: HIVE-14463
> URL: https://issues.apache.org/jira/browse/HIVE-14463
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Rajat Khandelwal
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.2.0
>
> Attachments: HIVE-14463.1.patch
>
>
> The module is getting stuck in tests and not coming out for as long as 2 
> days. 
> Specifically, TestMsgBusConnection is the test case which has this problem. I 
> ran the tests on local environment and took a thread dump after it got stuck. 
> {noformat}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):
> "InactivityMonitor Async Task: 
> java.util.concurrent.ThreadPoolExecutor$Worker@2c040428[State = -1, empty 
> queue]" daemon prio=5 tid=0x7fe90d89e000 nid=0x8827 waiting on condition 
> [0x000117b74000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "InactivityMonitor Async Task: 
> java.util.concurrent.ThreadPoolExecutor$Worker@182a483f[State = -1, empty 
> queue]" daemon prio=5 tid=0x7fe90d801000 nid=0x585f waiting on condition 
> [0x00011786b000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "ActiveMQ Transport: tcp:///127.0.0.1:56883" daemon prio=5 
> tid=0x7fe90c83e800 nid=0x8403 runnable [0x0001196ab000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at 
> org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport$2.fill(TcpTransport.java:576)
>   at 
> org.apache.activemq.transport.tcp.TcpBufferedInputStream.read(TcpBufferedInputStream.java:58)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport$2.read(TcpTransport.java:561)
>   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>   at 
> org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202)
>   at java.lang.Thread.run(Thread.java:745)
> "ActiveMQ Transport: tcp://localhost/127.0.0.1:61616" prio=5 
> tid=0x7fe90b81e800 nid=0x8003 runnable [0x0001194a5000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at 

[jira] [Updated] (HIVE-14463) hcatalog server extensions test cases getting stuck

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14463:
-
Status: Patch Available  (was: Open)

> hcatalog server extensions test cases getting stuck
> ---
>
> Key: HIVE-14463
> URL: https://issues.apache.org/jira/browse/HIVE-14463
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Rajat Khandelwal
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14463.1.patch
>
>
> The module is getting stuck in tests and not coming out for as long as 2 
> days. 
> Specifically, TestMsgBusConnection is the test case which has this problem. I 
> ran the tests on local environment and took a thread dump after it got stuck. 
> {noformat}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):
> "InactivityMonitor Async Task: 
> java.util.concurrent.ThreadPoolExecutor$Worker@2c040428[State = -1, empty 
> queue]" daemon prio=5 tid=0x7fe90d89e000 nid=0x8827 waiting on condition 
> [0x000117b74000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "InactivityMonitor Async Task: 
> java.util.concurrent.ThreadPoolExecutor$Worker@182a483f[State = -1, empty 
> queue]" daemon prio=5 tid=0x7fe90d801000 nid=0x585f waiting on condition 
> [0x00011786b000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "ActiveMQ Transport: tcp:///127.0.0.1:56883" daemon prio=5 
> tid=0x7fe90c83e800 nid=0x8403 runnable [0x0001196ab000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at 
> org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport$2.fill(TcpTransport.java:576)
>   at 
> org.apache.activemq.transport.tcp.TcpBufferedInputStream.read(TcpBufferedInputStream.java:58)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport$2.read(TcpTransport.java:561)
>   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>   at 
> org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202)
>   at java.lang.Thread.run(Thread.java:745)
> "ActiveMQ Transport: tcp://localhost/127.0.0.1:61616" prio=5 
> tid=0x7fe90b81e800 nid=0x8003 runnable [0x0001194a5000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at 
> 

[jira] [Comment Edited] (HIVE-14463) hcatalog server extensions test cases getting stuck

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421904#comment-15421904
 ] 

Hari Sankar Sivarama Subramaniyan edited comment on HIVE-14463 at 8/15/16 11:55 
PM:


The root cause of this problem is similar to HIVE-14424. The query is failing 
because of:
{code}
java.lang.RuntimeException: Error applying authorization policy on hive 
configuration: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.ClassNotFoundException: 
org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest
{code}
The test used to hang because consumer.receive() is a blocking call. Part of 
the issue, the hang, was fixed by HIVE-14520, which converted the blocking 
calls to non-blocking calls. The other part, the actual error, is fixed by the 
attached patch.
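
To show why a blocking receive hangs when the producing query fails, and how a 
bounded wait avoids it, here is a self-contained Java sketch. It uses a 
java.util.concurrent.BlockingQueue as a stand-in for the message consumer; the 
class and method names are hypothetical, and this is not the change made in 
HIVE-14520. In JMS terms the analogous calls would be 
MessageConsumer.receive(long timeout) or receiveNoWait(), but which one that 
patch used is not stated here.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class TimedReceiveSketch {
    // Waits at most timeoutMs for a message; returns null if none arrives.
    static String receiveWithTimeout(BlockingQueue<String> queue, long timeoutMs) {
        try {
            return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status and report "no message".
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // The query failed, so nothing was published: queue.take() would block forever here.
        String msg = receiveWithTimeout(queue, 200);
        System.out.println(msg == null ? "no message received, test can fail fast" : msg);

        // Once a message is available, the same call returns it immediately.
        queue.offer("event");
        System.out.println(receiveWithTimeout(queue, 200));
    }
}
```

With a bounded wait, the underlying ClassNotFoundException surfaces as a test 
failure instead of an indefinite hang.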


was (Author: hsubramaniyan):
Looks like a problem similar to HIVE-14424. The query is failing and the 
consumer.receive() is a blocking call. Part of the issue, i.e. hang has been 
fixed by HIVE-14520 by converting the blocking to non-blocking calls. The other 
issue, i.e. the actual error is fixed by the patch attached.

> hcatalog server extensions test cases getting stuck
> ---
>
> Key: HIVE-14463
> URL: https://issues.apache.org/jira/browse/HIVE-14463
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Rajat Khandelwal
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14463.1.patch
>
>
> The module is getting stuck in tests and not coming out for as long as 2 
> days. 
> Specifically, TestMsgBusConnection is the test case which has this problem. I 
> ran the tests on local environment and took a thread dump after it got stuck. 
> {noformat}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):
> "InactivityMonitor Async Task: 
> java.util.concurrent.ThreadPoolExecutor$Worker@2c040428[State = -1, empty 
> queue]" daemon prio=5 tid=0x7fe90d89e000 nid=0x8827 waiting on condition 
> [0x000117b74000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "InactivityMonitor Async Task: 
> java.util.concurrent.ThreadPoolExecutor$Worker@182a483f[State = -1, empty 
> queue]" daemon prio=5 tid=0x7fe90d801000 nid=0x585f waiting on condition 
> [0x00011786b000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "ActiveMQ Transport: tcp:///127.0.0.1:56883" daemon prio=5 
> tid=0x7fe90c83e800 nid=0x8403 runnable [0x0001196ab000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at 
> org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport$2.fill(TcpTransport.java:576)
>   at 
> 

[jira] [Updated] (HIVE-14463) hcatalog server extensions test cases getting stuck

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14463:
-
Attachment: HIVE-14463.1.patch

cc [~ashutoshc] for review.

> hcatalog server extensions test cases getting stuck
> ---
>
> Key: HIVE-14463
> URL: https://issues.apache.org/jira/browse/HIVE-14463
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Rajat Khandelwal
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14463.1.patch
>
>
> The module is getting stuck in tests and not coming out for as long as 2 
> days. 
> Specifically, TestMsgBusConnection is the test case which has this problem. I 
> ran the tests on local environment and took a thread dump after it got stuck. 
> {noformat}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):
> "InactivityMonitor Async Task: 
> java.util.concurrent.ThreadPoolExecutor$Worker@2c040428[State = -1, empty 
> queue]" daemon prio=5 tid=0x7fe90d89e000 nid=0x8827 waiting on condition 
> [0x000117b74000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "InactivityMonitor Async Task: 
> java.util.concurrent.ThreadPoolExecutor$Worker@182a483f[State = -1, empty 
> queue]" daemon prio=5 tid=0x7fe90d801000 nid=0x585f waiting on condition 
> [0x00011786b000]
>java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a 
> java.util.concurrent.SynchronousQueue$TransferStack)
>   at 
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at 
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at 
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "ActiveMQ Transport: tcp:///127.0.0.1:56883" daemon prio=5 
> tid=0x7fe90c83e800 nid=0x8403 runnable [0x0001196ab000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at 
> org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport$2.fill(TcpTransport.java:576)
>   at 
> org.apache.activemq.transport.tcp.TcpBufferedInputStream.read(TcpBufferedInputStream.java:58)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport$2.read(TcpTransport.java:561)
>   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>   at 
> org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219)
>   at 
> org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202)
>   at java.lang.Thread.run(Thread.java:745)
> "ActiveMQ Transport: tcp://localhost/127.0.0.1:61616" prio=5 
> tid=0x7fe90b81e800 nid=0x8003 runnable [0x0001194a5000]
>java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at 
> 

[jira] [Updated] (HIVE-14463) hcatalog server extensions test cases getting stuck

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14463:
-
Attachment: HIVE-14463.1.patch

> hcatalog server extensions test cases getting stuck
> ---
>
> Key: HIVE-14463
> URL: https://issues.apache.org/jira/browse/HIVE-14463
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Rajat Khandelwal
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14463.1.patch
>
>
> The module gets stuck in tests and does not finish for as long as 2 
> days. 
> Specifically, TestMsgBusConnection is the test case that has this problem. I 
> ran the tests in a local environment and took a thread dump after it got stuck. 
> {noformat}
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.80-b11 mixed mode):
> "InactivityMonitor Async Task: java.util.concurrent.ThreadPoolExecutor$Worker@2c040428[State = -1, empty queue]" daemon prio=5 tid=0x7fe90d89e000 nid=0x8827 waiting on condition [0x000117b74000]
>   java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a java.util.concurrent.SynchronousQueue$TransferStack)
>   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "InactivityMonitor Async Task: java.util.concurrent.ThreadPoolExecutor$Worker@182a483f[State = -1, empty queue]" daemon prio=5 tid=0x7fe90d801000 nid=0x585f waiting on condition [0x00011786b000]
>   java.lang.Thread.State: TIMED_WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0x00078166f0b8> (a java.util.concurrent.SynchronousQueue$TransferStack)
>   at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>   at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>   at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>   at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>   at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> "ActiveMQ Transport: tcp:///127.0.0.1:56883" daemon prio=5 tid=0x7fe90c83e800 nid=0x8403 runnable [0x0001196ab000]
>   java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at org.apache.activemq.transport.tcp.TcpBufferedInputStream.fill(TcpBufferedInputStream.java:50)
>   at org.apache.activemq.transport.tcp.TcpTransport$2.fill(TcpTransport.java:576)
>   at org.apache.activemq.transport.tcp.TcpBufferedInputStream.read(TcpBufferedInputStream.java:58)
>   at org.apache.activemq.transport.tcp.TcpTransport$2.read(TcpTransport.java:561)
>   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>   at org.apache.activemq.openwire.OpenWireFormat.unmarshal(OpenWireFormat.java:269)
>   at org.apache.activemq.transport.tcp.TcpTransport.readCommand(TcpTransport.java:227)
>   at org.apache.activemq.transport.tcp.TcpTransport.doRun(TcpTransport.java:219)
>   at org.apache.activemq.transport.tcp.TcpTransport.run(TcpTransport.java:202)
>   at java.lang.Thread.run(Thread.java:745)
> "ActiveMQ Transport: tcp://localhost/127.0.0.1:61616" prio=5 tid=0x7fe90b81e800 nid=0x8003 runnable [0x0001194a5000]
>   java.lang.Thread.State: RUNNABLE
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at 
> 

[jira] [Updated] (HIVE-14463) hcatalog server extensions test cases getting stuck

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14463:
-
Attachment: (was: HIVE-14463.1.patch)

> hcatalog server extensions test cases getting stuck
> ---
>
> Key: HIVE-14463
> URL: https://issues.apache.org/jira/browse/HIVE-14463
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Rajat Khandelwal
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14463.1.patch
>
>
> The module gets stuck in tests and does not finish for as long as 2 
> days. 
> Specifically, TestMsgBusConnection is the test case that has this problem. I 
> ran the tests in a local environment and took a thread dump after it got stuck. 

[jira] [Commented] (HIVE-14463) hcatalog server extensions test cases getting stuck

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15421904#comment-15421904
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14463:
--

Looks like a problem similar to HIVE-14424. The query is failing, and 
consumer.receive() is a blocking call, so the test hangs. Part of the issue, 
i.e. the hang, has been fixed by HIVE-14520 by converting the blocking calls to 
non-blocking calls. The other part, i.e. the actual error, is fixed by the 
attached patch.

> hcatalog server extensions test cases getting stuck
> ---
>
> Key: HIVE-14463
> URL: https://issues.apache.org/jira/browse/HIVE-14463
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Rajat Khandelwal
> Assignee: Hari Sankar Sivarama Subramaniyan
>
> The module gets stuck in tests and does not finish for as long as 2 
> days. 
> Specifically, TestMsgBusConnection is the test case that has this problem. I 
> ran the tests in a local environment and took a thread dump after it got stuck. 

[jira] [Assigned] (HIVE-14463) hcatalog server extensions test cases getting stuck

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan reassigned HIVE-14463:


Assignee: Hari Sankar Sivarama Subramaniyan

> hcatalog server extensions test cases getting stuck
> ---
>
> Key: HIVE-14463
> URL: https://issues.apache.org/jira/browse/HIVE-14463
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: Rajat Khandelwal
> Assignee: Hari Sankar Sivarama Subramaniyan
>
> The module gets stuck in tests and does not finish for as long as 2 
> days. 
> Specifically, TestMsgBusConnection is the test case that has this problem. I 
> ran the tests in a local environment and took a thread dump after it got stuck. 

[jira] [Updated] (HIVE-14505) Analyze org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching failure

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14505:
-
Assignee: Vaibhav Gumashta  (was: Hari Sankar Sivarama Subramaniyan)

>  Analyze 
> org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching 
> failure
> 
>
> Key: HIVE-14505
> URL: https://issues.apache.org/jira/browse/HIVE-14505
> Project: Hive
> Issue Type: Sub-task
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Vaibhav Gumashta
> Attachments: HIVE-14505.1.patch
>
>
> Flaky test failure. Fails ~50% of the time locally.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14505) Analyze org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching failure

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14505:
-
Status: Patch Available  (was: Open)

>  Analyze 
> org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching 
> failure
> 
>
> Key: HIVE-14505
> URL: https://issues.apache.org/jira/browse/HIVE-14505
> Project: Hive
> Issue Type: Sub-task
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14505.1.patch
>
>
> Flaky test failure. Fails ~50% of the time locally.





[jira] [Updated] (HIVE-14505) Analyze org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching failure

2016-08-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14505:
-
Attachment: HIVE-14505.1.patch

>  Analyze 
> org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching 
> failure
> 
>
> Key: HIVE-14505
> URL: https://issues.apache.org/jira/browse/HIVE-14505
> Project: Hive
> Issue Type: Sub-task
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14505.1.patch
>
>
> Flaky test failure. Fails ~50% of the time locally.





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14506:
-
Status: Patch Available  (was: Open)

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
> Issue Type: Sub-task
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14506.1.patch
>
>
> The test fails because there are no tests to be executed





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14506:
-
Attachment: HIVE-14506.1.patch

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
> Issue Type: Sub-task
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14506.1.patch
>
>
> The test fails because there are no tests to be executed





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14506:
-
Description: The test fails because there are no tests to be executed and 
the file name starts with  'Test'  (was: The test fails because there are no 
tests to be executed)

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
> Issue Type: Sub-task
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14506.1.patch
>
>
> The test fails because there are no tests to be executed and the file name 
> starts with  'Test'





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14506:
-
Description: The test fails because there are no tests to be executed  
(was: The test hangs locally.)

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
> Issue Type: Sub-task
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
>
> The test fails because there are no tests to be executed





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14506:
-
Summary: TestQueryLifeTimeHook fail  (was: TestQueryLifeTimeHook hang)

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
> Issue Type: Sub-task
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
>
> The test hangs locally.





[jira] [Updated] (HIVE-14520) We should set a timeout for the blocking calls in TestMsgBusConnection

2016-08-11 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14520:
-
Status: Patch Available  (was: Open)

> We should set a timeout for the blocking calls in TestMsgBusConnection
> --
>
> Key: HIVE-14520
> URL: https://issues.apache.org/jira/browse/HIVE-14520
> Project: Hive
> Issue Type: Bug
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14520.1.patch
>
>
> consumer.receive() is a blocking call; if the message it waits for never 
> arrives, it blocks forever. At the bare minimum, we need to set a timeout so 
> that the test fails in case of failure rather than hanging until it times out.
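
For illustration, the difference between an unbounded and a bounded wait can be sketched in plain Java. JMS offers MessageConsumer.receive(long timeout), which returns null when the timeout expires. The sketch below uses neither JMS nor the actual HIVE-14520 patch; it only mimics that behaviour with a java.util.concurrent.BlockingQueue. The class name ReceiveWithTimeout, the helper receiveOrNull, the message text and the 100 ms timeout are all assumptions made for this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

final class ReceiveWithTimeout {

    // Hypothetical helper (not part of Hive or JMS): waits at most timeoutMs
    // for a message and returns null if nothing arrives in time.
    static String receiveOrNull(BlockingQueue<String> queue, long timeoutMs) {
        try {
            return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report "no message".
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        // Nothing has been published yet. An unbounded queue.take() would
        // block here indefinitely; the bounded wait returns null instead.
        String first = receiveOrNull(queue, 100);
        System.out.println(first == null ? "timed out" : first);

        // Once a message is available, the same call returns it immediately.
        queue.offer("ADD_PARTITION");
        System.out.println(receiveOrNull(queue, 100));
    }
}
```

A test written this way can assert that the result is non-null and fail right away when the expected message never shows up.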





[jira] [Updated] (HIVE-14520) We should set a timeout for the blocking calls in TestMsgBusConnection

2016-08-11 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14520:
-
Attachment: HIVE-14520.1.patch

cc [~ashutoshc] for review.

> We should set a timeout for the blocking calls in TestMsgBusConnection
> --
>
> Key: HIVE-14520
> URL: https://issues.apache.org/jira/browse/HIVE-14520
> Project: Hive
> Issue Type: Bug
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14520.1.patch
>
>
> consumer.receive() is a blocking call; if the message it waits for never 
> arrives, it blocks forever. At the bare minimum, we need to set a timeout so 
> that the test fails in case of failure rather than hanging until it times out.





[jira] [Commented] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-08-03 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406797#comment-15406797
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13822:
--

The failures were file diffs, which went away after a rebase. Re-uploading the 
patch.

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
> Issue Type: Bug
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch, 
> HIVE-13822.3.patch, HIVE-13822.4.patch
>
>
> Thanks to [~jcamachorodriguez] for uncovering this issue as part of 
> HIVE-13269. StatsSetupConst.areColumnStatsUptoDate() is used to check whether 
> stats are up-to-date. In the case of PerfCliDriver, ‘false’ (thus, not 
> up-to-date) is returned and the following debug message appears in the logs:
> {code}
> In StatsSetupConst, JsonParser can not parse COLUMN_STATS. (line 190 in 
> StatsSetupConst)
> {code}
> It looks like the issue started happening after HIVE-12261 went in. 
> The fix would be to replace
> {color:red}COLUMN_STATS_ACCURATE,true{color}
> with
> {color:green}COLUMN_STATS_ACCURATE,{"COLUMN_STATS":{"key":"true","value":"true"},"BASIC_STATS":"true"}{color}
> in data/files/tpcds-perf/metastore_export/csv/TABLE_PARAMS.txt, where key and 
> value are the column names.
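
As a rough illustration of the replacement value described above, the following Java sketch assembles the JSON-style string for a given list of column names. It is not taken from the HIVE-13822 patch or from StatsSetupConst; the class ColumnStatsAccurate and its method value are hypothetical, and the sketch assumes column names contain no characters that would need JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

final class ColumnStatsAccurate {

    // Hypothetical helper (not part of Hive): builds the value that follows
    // "COLUMN_STATS_ACCURATE," for the given column names.
    // Assumes the names need no JSON escaping.
    static String value(List<String> columns) {
        StringBuilder sb = new StringBuilder("{\"COLUMN_STATS\":{");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            // Each column is marked as having accurate stats.
            sb.append('"').append(columns.get(i)).append("\":\"true\"");
        }
        return sb.append("},\"BASIC_STATS\":\"true\"}").toString();
    }

    public static void main(String[] args) {
        // Prints the replacement line for a table with columns key and value.
        System.out.println("COLUMN_STATS_ACCURATE,"
                + value(Arrays.asList("key", "value")));
    }
}
```

For the columns key and value this produces the same string as the green replacement text quoted in the issue description.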





[jira] [Commented] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-03 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15406796#comment-15406796
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14396:
--

Looks like a duplicate of HIVE-12803.

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
> Issue Type: Sub-task
> Components: CBO
> Reporter: Vineet Garg
> Assignee: Vineet Garg
>
> Currently there are three different failures.
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) The first case is a wrong result for the following query:
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes. While looking up the 
> expression corresponding to an aggregate function's argument, the wrong index 
> is used.
> 2) An out-of-bounds exception for the following:
> {code}
> set hive.map.aggr=false;
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting the Calcite aggregation to Hive's group by 
> operator.
> 3) Once the above case with the exception is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that, 
> while creating the expression for the aggregate function's argument, we end up 
> with the wrong column info from the underlying reduce sink operator. 





[jira] [Updated] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-08-03 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13822:
-
Status: Open  (was: Patch Available)

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch, 
> HIVE-13822.3.patch, HIVE-13822.4.patch
>
>
> Thanks to [~jcamachorodriguez] for uncovering this issue as part of 
> HIVE-13269. StatsSetupConst.areColumnStatsUptoDate() is used to check whether 
> stats are up-to-date. In the case of PerfCliDriver, ‘false’ (thus, not 
> up-to-date) is returned and the following debug message appears in the logs:
> {code}
> In StatsSetupConst, JsonParser can not parse COLUMN_STATS. (line 190 in 
> StatsSetupConst)
> {code}
> Looks like the issue started happening after HIVE-12261 went in. 
> The fix would be to replace
> {color:red}COLUMN_STATS_ACCURATE,true{color}
> with
> {color:green}COLUMN_STATS_ACCURATE,{"COLUMN_STATS":{"key":"true","value":"true"},"BASIC_STATS":"true"}{color}
> in data/files/tpcds-perf/metastore_export/csv/TABLE_PARAMS.txt, where key and 
> value are the column names.
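
To illustrate why the old value is rejected, here is a minimal Python sketch of the check described above. It is not Hive's Java implementation of StatsSetupConst; the function names build_stats_accurate and column_stats_up_to_date are hypothetical, and the sketch only assumes that the new COLUMN_STATS_ACCURATE value is a JSON object of the shape quoted in the description.

```python
import json


def build_stats_accurate(column_names):
    """Build the JSON form of COLUMN_STATS_ACCURATE for the given columns (hypothetical helper)."""
    value = {
        "COLUMN_STATS": {name: "true" for name in column_names},
        "BASIC_STATS": "true",
    }
    # Compact separators reproduce the spacing used in the issue description.
    return json.dumps(value, separators=(",", ":"))


def column_stats_up_to_date(raw_value, column_name):
    """Return True only if raw_value is a JSON object marking column_name as accurate."""
    try:
        parsed = json.loads(raw_value)
    except ValueError:
        return False  # not JSON at all
    if not isinstance(parsed, dict):
        return False  # the legacy bare "true" parses to a boolean, not an object
    column_stats = parsed.get("COLUMN_STATS")
    if not isinstance(column_stats, dict):
        return False
    return column_stats.get(column_name) == "true"


if __name__ == "__main__":
    new_value = build_stats_accurate(["key", "value"])
    print(new_value)
    print(column_stats_up_to_date("true", "key"))
    print(column_stats_up_to_date(new_value, "key"))
```

Under these assumptions the legacy value is treated as not up-to-date, which matches the 'false' result reported for PerfCliDriver.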





[jira] [Updated] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-08-03 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13822:
-
Status: Patch Available  (was: Open)

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch, 
> HIVE-13822.3.patch, HIVE-13822.4.patch
>
>





[jira] [Updated] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-08-03 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13822:
-
Attachment: HIVE-13822.4.patch

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch, 
> HIVE-13822.3.patch, HIVE-13822.4.patch
>
>





[jira] [Commented] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-08-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402998#comment-15402998
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13822:
--

[~ashutoshc] hive.in.test=true is already set in 
data/conf/perf-reg/hive-site.xml.

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch, 
> HIVE-13822.3.patch
>
>





[jira] [Updated] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-08-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13822:
-
Status: Patch Available  (was: Open)

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch, 
> HIVE-13822.3.patch
>
>





[jira] [Updated] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-08-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13822:
-
Status: Open  (was: Patch Available)

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch, 
> HIVE-13822.3.patch
>
>





[jira] [Updated] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-08-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13822:
-
Attachment: HIVE-13822.3.patch

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch, 
> HIVE-13822.3.patch
>
>





[jira] [Commented] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-08-01 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15402908#comment-15402908
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13822:
--

Looked at why some of these qfile changes are missing:
1.
{code}
2016-08-01T14:50:03,831 DEBUG [630ce616-f45b-4c59-b8c2-6e27e80e2cca main] 
parse.TypeCheckCtx: Setting error: [Line 2:121 Invalid table alias or column 
reference 'ws_ext_sales_price': (possible column names are: .(tok_table_or_col 
i_item_id), .(tok_table_or_col i_item_desc), .(tok_table_or_col i_category), 
.(tok_table_or_col i_class), 
.(tok_table_or_col i_current_price), .(tok_function sum (tok_table_or_col 
ws_ext_sales_price)))] from (tok_table_or_col ws_ext_sales_price)
java.lang.Exception
at 
org.apache.hadoop.hive.ql.parse.TypeCheckCtx.setError(TypeCheckCtx.java:162) 
[hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$ColumnExprProcessor.process(TypeCheckProcFactory.java:653)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:217)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:163)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:11252)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:11208)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:4195)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3977)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:9428)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:9383)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10250)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:10128)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genOPTree(SemanticAnalyzer.java:10801)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10812)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10507)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:75)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:250)
 [hive-exec-2.2.0-SNAPSHOT.jar:2.2.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:435) 
[hive-exec-2.2.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:326) 
[hive-exec-2.2.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1169) 
[hive-exec-2.2.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1288) 
[hive-exec-2.2.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1095) 
[hive-exec-2.2.0-SNAPSHOT.jar:?]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1083) 
[hive-exec-2.2.0-SNAPSHOT.jar:?]
at 

[jira] [Updated] (HIVE-14363) bucketmap inner join query fails due to NullPointerException in some cases

2016-07-30 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14363:
-
Resolution: Fixed
Fix Version/s: 2.1.1, 2.2.0
Status: Resolved  (was: Patch Available)

> bucketmap inner join query fails due to NullPointerException in some cases
> --
>
> Key: HIVE-14363
> URL: https://issues.apache.org/jira/browse/HIVE-14363
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jagruti Varia
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14363.1.patch, HIVE-14363.final.patch
>
>
> A bucketmap inner join query between bucketed tables throws the following 
> exception when all buckets of one table are empty while all buckets of the 
> other are non-empty.
> {noformat}
> Vertex failed, vertexName=Map 2, vertexId=vertex_1466710232033_0432_4_01, 
> diagnostics=[Task failed, taskId=task_1466710232033_0432_4_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1466710232033_0432_4_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:330)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getKeyValueReader(MapRecordProcessor.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.initializeMapRecordSources(MapRecordProcessor.java:344)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:292)
>   ... 15 more
> ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1466710232033_0432_4_01_00_1:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Updated] (HIVE-14363) bucketmap inner join query fails due to NullPointerException in some cases

2016-07-30 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14363:
-
Attachment: HIVE-14363.final.patch

Adding comments. The issue was reproduced only on a cluster, so I haven't added 
a q-file test case. I am not rerunning the unit tests just for adding comments.

> bucketmap inner join query fails due to NullPointerException in some cases
> --
>
> Key: HIVE-14363
> URL: https://issues.apache.org/jira/browse/HIVE-14363
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jagruti Varia
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14363.1.patch, HIVE-14363.final.patch
>
>
> A bucketmap inner join query between bucketed tables throws the following 
> exception when all buckets of one table are empty while all buckets of the 
> other are non-empty.
> {noformat}
> Vertex failed, vertexName=Map 2, vertexId=vertex_1466710232033_0432_4_01, 
> diagnostics=[Task failed, taskId=task_1466710232033_0432_4_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1466710232033_0432_4_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:330)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getKeyValueReader(MapRecordProcessor.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.initializeMapRecordSources(MapRecordProcessor.java:344)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:292)
>   ... 15 more
> ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1466710232033_0432_4_01_00_1:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> 

[jira] [Commented] (HIVE-14363) bucketmap inner join query fails due to NullPointerException in some cases

2016-07-29 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400436#comment-15400436
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14363:
--

None of the errors look related.

> bucketmap inner join query fails due to NullPointerException in some cases
> --
>
> Key: HIVE-14363
> URL: https://issues.apache.org/jira/browse/HIVE-14363
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jagruti Varia
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14363.1.patch
>
>
> A bucketmap inner join query between bucketed tables throws the following 
> exception when all buckets of one table are empty while all buckets of the 
> other are non-empty.
> {noformat}
> Vertex failed, vertexName=Map 2, vertexId=vertex_1466710232033_0432_4_01, 
> diagnostics=[Task failed, taskId=task_1466710232033_0432_4_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1466710232033_0432_4_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:330)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getKeyValueReader(MapRecordProcessor.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.initializeMapRecordSources(MapRecordProcessor.java:344)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:292)
>   ... 15 more
> ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1466710232033_0432_4_01_00_1:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map 

[jira] [Commented] (HIVE-14363) bucketmap inner join query fails due to NullPointerException in some cases

2016-07-28 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397928#comment-15397928
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-14363:
--

[~mmccline] Can you please look at the change?

> bucketmap inner join query fails due to NullPointerException in some cases
> --
>
> Key: HIVE-14363
> URL: https://issues.apache.org/jira/browse/HIVE-14363
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jagruti Varia
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14363.1.patch
>
>
> A bucketmap inner join query between bucketed tables throws the following 
> exception when all buckets of one table are empty while all buckets of the 
> other are non-empty.
> {noformat}
> Vertex failed, vertexName=Map 2, vertexId=vertex_1466710232033_0432_4_01, 
> diagnostics=[Task failed, taskId=task_1466710232033_0432_4_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1466710232033_0432_4_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:330)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getKeyValueReader(MapRecordProcessor.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.initializeMapRecordSources(MapRecordProcessor.java:344)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:292)
>   ... 15 more
> ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1466710232033_0432_4_01_00_1:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: 

[jira] [Updated] (HIVE-14363) bucketmap inner join query fails due to NullPointerException in some cases

2016-07-27 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14363:
-
Attachment: HIVE-14363.1.patch

> bucketmap inner join query fails due to NullPointerException in some cases
> --
>
> Key: HIVE-14363
> URL: https://issues.apache.org/jira/browse/HIVE-14363
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jagruti Varia
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14363.1.patch
>
>
> A bucketmap inner join query between bucketed tables throws the following 
> exception when one table contains only empty buckets while the other 
> contains only non-empty buckets.
> {noformat}
> Vertex failed, vertexName=Map 2, vertexId=vertex_1466710232033_0432_4_01, 
> diagnostics=[Task failed, taskId=task_1466710232033_0432_4_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1466710232033_0432_4_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:330)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getKeyValueReader(MapRecordProcessor.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.initializeMapRecordSources(MapRecordProcessor.java:344)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:292)
>   ... 15 more
> ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1466710232033_0432_4_01_00_1:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   

[jira] [Updated] (HIVE-14363) bucketmap inner join query fails due to NullPointerException in some cases

2016-07-27 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14363:
-
Status: Patch Available  (was: Open)

> bucketmap inner join query fails due to NullPointerException in some cases
> --
>
> Key: HIVE-14363
> URL: https://issues.apache.org/jira/browse/HIVE-14363
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.2.0
>Reporter: Jagruti Varia
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14363.1.patch
>
>
> A bucketmap inner join query between bucketed tables throws the following 
> exception when one table contains only empty buckets while the other 
> contains only non-empty buckets.
> {noformat}
> Vertex failed, vertexName=Map 2, vertexId=vertex_1466710232033_0432_4_01, 
> diagnostics=[Task failed, taskId=task_1466710232033_0432_4_01_00, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1466710232033_0432_4_01_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:330)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:184)
>   ... 14 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getKeyValueReader(MapRecordProcessor.java:372)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.initializeMapRecordSources(MapRecordProcessor.java:344)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:292)
>   ... 15 more
> ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : 
> attempt_1466710232033_0432_4_01_00_1:java.lang.RuntimeException: 
> java.lang.RuntimeException: Map operator initialization failed
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Map operator initialization failed
>   

[jira] [Updated] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-07-22 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13822:
-
Status: Patch Available  (was: Open)

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch
>
>
> Thanks to [~jcamachorodriguez] for uncovering this issue as part of 
> HIVE-13269. StatsSetupConst.areColumnStatsUptoDate() is used to check whether 
> stats are up-to-date. In the case of PerfCliDriver, it returns ‘false’ (that 
> is, not up-to-date) and the following debug message appears in the logs:
> {code}
> In StatsSetupConst, JsonParser can not parse COLUMN_STATS. (line 190 in 
> StatsSetupConst)
> {code}
> It looks like the issue started happening after HIVE-12261 went in. 
> The fix would be to edit 
> data/files/tpcds-perf/metastore_export/csv/TABLE_PARAMS.txt and replace
> {color:red}COLUMN_STATS_ACCURATE,true{color}
> with
> {color:green}COLUMN_STATS_ACCURATE,{"COLUMN_STATS":{"key":"true","value":"true"},"BASIC_STATS":"true"}{color}
> where key and value are the column names.
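
To illustrate why the old parameter value is treated as not up-to-date, here is a minimal Python sketch. Hive itself is Java, and this is not the StatsSetupConst code: the function names build_accurate_value and column_stats_up_to_date are hypothetical, and the check only approximates the behaviour described in the issue. It assumes stats for a column count as up-to-date only when the value is a JSON object whose COLUMN_STATS entry marks that column as "true".

```python
import json

# The two parameter values quoted in the issue description.
OLD_VALUE = "true"
NEW_VALUE = '{"COLUMN_STATS":{"key":"true","value":"true"},"BASIC_STATS":"true"}'


def build_accurate_value(column_names):
    """Hypothetical helper: build the JSON form for the given column names."""
    payload = {
        "COLUMN_STATS": {name: "true" for name in column_names},
        "BASIC_STATS": "true",
    }
    # Compact separators reproduce the layout shown in the issue.
    return json.dumps(payload, separators=(",", ":"))


def column_stats_up_to_date(param_value, column_name):
    """Hypothetical stand-in for the up-to-date check (an assumption, not Hive code)."""
    try:
        parsed = json.loads(param_value)
    except json.JSONDecodeError:
        return False
    # In Python the bare string "true" is valid JSON (a boolean), so it is
    # rejected here because it is not an object. Per the log message quoted in
    # the issue, Hive's parser reports this case as a parse failure instead.
    if not isinstance(parsed, dict):
        return False
    column_stats = parsed.get("COLUMN_STATS")
    if not isinstance(column_stats, dict):
        return False
    return column_stats.get(column_name) == "true"


print(column_stats_up_to_date(OLD_VALUE, "key"))
print(column_stats_up_to_date(NEW_VALUE, "key"))
```

With the old value the check fails for every column, which corresponds to the "not up-to-date" result in the report. With the new value it succeeds for the columns named key and value, and build_accurate_value(["key", "value"]) reproduces the replacement string.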





[jira] [Updated] (HIVE-13822) TestPerfCliDriver throws warning in StatsSetupConst that JsonParser cannot parse COLUMN_STATS

2016-07-22 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13822:
-
Attachment: HIVE-13822.2.patch

> TestPerfCliDriver throws warning in StatsSetupConst that  JsonParser cannot 
> parse COLUMN_STATS
> --
>
> Key: HIVE-13822
> URL: https://issues.apache.org/jira/browse/HIVE-13822
> Project: Hive
>  Issue Type: Bug
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13822.1.patch, HIVE-13822.2.patch
>
>
> Thanks to [~jcamachorodriguez] for uncovering this issue as part of 
> HIVE-13269. StatsSetupConst.areColumnStatsUptoDate() is used to check whether 
> stats are up-to-date. In the case of PerfCliDriver, it returns ‘false’ (that 
> is, not up-to-date) and the following debug message appears in the logs:
> {code}
> In StatsSetupConst, JsonParser can not parse COLUMN_STATS. (line 190 in 
> StatsSetupConst)
> {code}
> It looks like the issue started happening after HIVE-12261 went in. 
> The fix would be to edit 
> data/files/tpcds-perf/metastore_export/csv/TABLE_PARAMS.txt and replace
> {color:red}COLUMN_STATS_ACCURATE,true{color}
> with
> {color:green}COLUMN_STATS_ACCURATE,{"COLUMN_STATS":{"key":"true","value":"true"},"BASIC_STATS":"true"}{color}
> where key and value are the column names.





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: (was: HIVE-13995.7.patch)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch, 
> HIVE-13995.6.patch, HIVE-13995.7.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMSs have trouble optimizing a large IN clause, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not lead to the 
> best execution time.
> When all partitions are chosen, omitting the partition list and filtering 
> only on table and column name will generate the same result set, as long as 
> there are no concurrent modifications to the partition list of the Hive 
> table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in about 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs about as well as the previous query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. It lists the columns 
> in the projection list of the Hive query. It is not clear whether statistics 
> for these columns are required for Hive query optimization.
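
The equivalence claimed in the description can be checked on a small scale. The following Python sketch uses an in-memory SQLite table as a stand-in for the metastore's PART_COL_STATS table. That is an assumption: the real metastore in the report runs on MySQL, and the schema here has only the four columns used in the queries. The helper build_stats_query, the database name demo_db, the 50 partitions, and the extra "PARTITION_NAME" column in the select list are all made up for illustration. It does not measure performance. It only shows that the three predicate forms return the same rows when every partition is selected.

```python
import sqlite3


def build_stats_query(db, table, columns, partitions, mode):
    """Build the stats query with one of three partition predicates.

    mode: "in_list" (explicit IN list), "none" (no partition filter),
          or "range" (lower/upper bounds on the partition names).
    """
    col_marks = ",".join(["?"] * len(columns))
    sql = ('select "PARTITION_NAME", count("COLUMN_NAME") from "PART_COL_STATS" '
           'where "DB_NAME" = ? and "TABLE_NAME" = ? '
           f'and "COLUMN_NAME" in ({col_marks})')
    params = [db, table, *columns]
    if mode == "in_list":
        part_marks = ",".join(["?"] * len(partitions))
        sql += f' and "PARTITION_NAME" in ({part_marks})'
        params += partitions
    elif mode == "range":
        # String comparison: relies on all names sharing a prefix and digit count.
        sql += ' and "PARTITION_NAME" >= ? and "PARTITION_NAME" <= ?'
        params += [min(partitions), max(partitions)]
    elif mode != "none":
        raise ValueError(f"unknown mode: {mode}")
    sql += ' group by "PARTITION_NAME" order by "PARTITION_NAME"'
    return sql, params


# Toy stand-in for the metastore table (hypothetical, reduced schema).
conn = sqlite3.connect(":memory:")
conn.execute('create table "PART_COL_STATS" '
             '("DB_NAME" text, "TABLE_NAME" text, '
             '"COLUMN_NAME" text, "PARTITION_NAME" text)')
partitions = [f"cs_sold_date_sk={2450815 + i}" for i in range(50)]
columns = ["cs_item_sk", "cs_quantity"]
conn.executemany(
    'insert into "PART_COL_STATS" values (?, ?, ?, ?)',
    [("demo_db", "catalog_sales", c, p) for p in partitions for c in columns],
)

# Run all three forms against the same data and compare the result sets.
results = {}
for mode in ("in_list", "none", "range"):
    sql, params = build_stats_query("demo_db", "catalog_sales",
                                    columns, partitions, mode)
    results[mode] = conn.execute(sql, params).fetchall()

print(results["in_list"] == results["none"] == results["range"])
```

Two caveats apply to the sketch as well as to the proposal. The unfiltered form matches only while the table has no partitions other than the selected ones, which is the concurrent-modification condition mentioned in the description. The range form compares strings, so it depends on the partition names sorting in the same order as their numeric values, which holds here because every name has the same prefix and the same number of digits.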





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.7.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch, 
> HIVE-13995.6.patch, HIVE-13995.7.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMSs have trouble optimizing a large IN clause, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not lead to the 
> best execution time.
> When all partitions are chosen, omitting the partition list and filtering 
> only on table and column name will generate the same result set, as long as 
> there are no concurrent modifications to the partition list of the Hive 
> table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in about 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs about as well as the previous query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. It lists the columns 
> in the projection list of the Hive query. It is not clear whether statistics 
> for these columns are required for Hive query optimization.





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Patch Available  (was: Open)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch, 
> HIVE-13995.6.patch, HIVE-13995.7.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMSs have trouble optimizing a large IN clause, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not lead to the 
> best execution time.
> When all partitions are chosen, omitting the partition list and filtering 
> only on table and column name will generate the same result set, as long as 
> there are no concurrent modifications to the partition list of the Hive 
> table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in about 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs about as well as the previous query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. It lists the columns 
> in the projection list of the Hive query. It is not clear whether statistics 
> for these columns are required for Hive query optimization.





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: (was: HIVE-13995.7.patch)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch, 
> HIVE-13995.6.patch, HIVE-13995.7.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMSs have trouble optimizing a large IN clause, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not lead to the 
> best execution time.
> When all partitions are chosen, omitting the partition list and filtering 
> only on table and column name will generate the same result set, as long as 
> there are no concurrent modifications to the partition list of the Hive 
> table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs about as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are mentioned here. Not sure if statistics for 
> these columns are required for Hive query optimization.
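
As a rough illustration of the range idea in the description above (not the code in the attached patches), the following Java sketch builds both predicate forms from a list of partition names. The class and method names are hypothetical. It assumes the list is non-empty, already sorted in the order the database compares strings, and covers every partition of the table that falls between its first and last name; otherwise the range form could match additional partitions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only; not part of Hive.
final class PartitionPredicateSketch {

  // Full IN-list form: one quoted literal per partition name.
  static String inList(List<String> names) {
    return names.stream()
        .map(PartitionPredicateSketch::quote)
        .collect(Collectors.joining(",", "\"PARTITION_NAME\" in (", ")"));
  }

  // Range form: only the first and last names of the sorted list are sent.
  // Assumes sortedNames is non-empty and sorted ascending.
  static String range(List<String> sortedNames) {
    String lo = sortedNames.get(0);
    String hi = sortedNames.get(sortedNames.size() - 1);
    return "\"PARTITION_NAME\" >= " + quote(lo)
        + " and \"PARTITION_NAME\" <= " + quote(hi);
  }

  // Wraps a value in single quotes, doubling any embedded single quote.
  static String quote(String s) {
    return "'" + s.replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList(
        "cs_sold_date_sk=2450815",
        "cs_sold_date_sk=2450816",
        "cs_sold_date_sk=2452654");
    System.out.println(inList(names));
    System.out.println(range(names));
  }
}
```

The length of the range predicate stays constant however many partitions the table has, whereas the IN list grows by one literal per partition.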





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Open  (was: Patch Available)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch, 
> HIVE-13995.6.patch, HIVE-13995.7.patch
>
>





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-21 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.7.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch, 
> HIVE-13995.6.patch, HIVE-13995.7.patch
>
>





[jira] [Commented] (HIVE-13708) Create table should verify datatypes supported by the serde

2016-07-20 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15386762#comment-15386762
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13708:
--

[~ashutoshc] My .1 patch does not add support for non-string column types with 
OpenCSVSerde. Instead, it throws an error when non-string columns are used. The 
change for HIVE-13709 might be to replace the code below with object inspectors 
corresponding to each field's type, and to make the corresponding changes 
everywhere else that is affected:
{code}
for (int i = 0; i < numCols; i++) {
  columnOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
}
{code}

Thanks

> Create table should verify datatypes supported by the serde
> ---
>
> Key: HIVE-13708
> URL: https://issues.apache.org/jira/browse/HIVE-13708
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Reporter: Thejas M Nair
>Assignee: Hari Sankar Sivarama Subramaniyan
>Priority: Critical
> Attachments: HIVE-13708.1.patch, HIVE-13708.2.patch, 
> HIVE-13708.3.patch, HIVE-13708.4.patch
>
>
> As [~Goldshuv] mentioned in HIVE-.
> Creating a table with a serde such as OpenCSVSerde allows columns of 
> arbitrary types. But 'describe table' would still return 
> string datatypes, and so do selects on the table.
> This is misleading and would result in users not getting the intended results.
> Ideally, create table should disallow the creation of such tables with 
> unsupported types.
> Example posted by [~Goldshuv] in HIVE- -
> {noformat}
> CREATE EXTERNAL TABLE test (totalprice DECIMAL(38,10)) 
> ROW FORMAT SERDE 'com.bizo.hive.serde.csv.CSVSerde' with 
> serdeproperties ("separatorChar" = ",","quoteChar"= "'","escapeChar"= "\\") 
> STORED AS TEXTFILE 
> LOCATION '' 
> tblproperties ("skip.header.line.count"="1");
> {noformat}
> Now consider this SQL:
> hive> select min(totalprice) from test;
> In this case, given my data, the result should have been 874.89, but the 
> actual result was 11.57 (as it is first according to the byte ordering of 
> a string type). This is a wrong result.
> hive> desc extended test;
> OK
> o_totalprice  string  from deserializer
> ...
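
The ordering problem described above can be reproduced outside Hive. The sketch below is plain Java with hypothetical sample values (not the reporter's data) and hypothetical names; it takes the minimum of the same values first as strings and then as decimals. Java compares strings by UTF-16 code unit, which for ASCII digits gives the same order as a byte comparison.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class, for illustration only; not part of Hive.
final class StringVsDecimalMin {

  // Minimum under lexicographic (character-by-character) ordering.
  static String minAsString(List<String> raw) {
    return Collections.min(raw);
  }

  // Minimum under numeric ordering, after parsing each value as a decimal.
  static BigDecimal minAsDecimal(List<String> raw) {
    return raw.stream()
        .map(BigDecimal::new)
        .min(Comparator.naturalOrder())
        .get(); // safe here because the sample list is non-empty
  }

  public static void main(String[] args) {
    // Made-up sample values chosen to show the effect.
    List<String> raw = Arrays.asList("874.89", "1157.20", "9000.00");
    System.out.println(minAsString(raw));  // 1157.20, because '1' sorts before '8'
    System.out.println(minAsDecimal(raw)); // 874.89
  }
}
```

A column that is declared DECIMAL but deserialized as a string behaves like the first method, which is why the aggregate can look plausible and still be wrong.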





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-20 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.7.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch, 
> HIVE-13995.6.patch, HIVE-13995.7.patch
>
>





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-20 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.6.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch, HIVE-13995.6.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.5.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: (was: HIVE-13995.5.patch)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions and when 
> the query does not a filter on the partition column, metastore queries 
> generated have a large IN clause listing all the partition names. Most RDBMS 
> systems have issues optimizing large IN clause and even when a good index 
> plan is chosen , comparing to 1800+ string values will not lead to best 
> execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on table and column name will generate the same result set as 
> long as there are no concurrent modifications to partition list of the hive 
> table (adding/dropping partitions).
> For eg: For TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 secs in Mysql. Following is output from mysql log
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query without the partition list runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs about as well as the previous query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in 
> the projection list of the Hive query are listed here. It is not clear whether 
> statistics for these columns are required for Hive query optimization.
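
The range form in the description is only equivalent to the IN list when the selected partitions are one contiguous run of the full, sorted partition list. A minimal sketch of that check follows; the function name `contiguous_range` is hypothetical, and it assumes partition names are unique and that Python's string ordering matches the collation the metastore database uses for "PARTITION_NAME", which is not guaranteed.

```python
def contiguous_range(selected, all_parts):
    """Return (low, high) if `selected` is one contiguous run of the
    sorted partition names, otherwise None.

    Hypothetical helper for illustration; not taken from the Hive patch.
    """
    ordered = sorted(all_parts)      # assumed to mirror the database's ordering
    chosen = sorted(set(selected))
    if not chosen:
        return None
    try:
        start = ordered.index(chosen[0])
    except ValueError:
        return None                  # selected name is not a known partition
    # The run is contiguous only if the slice of the full list matches exactly.
    if ordered[start:start + len(chosen)] == chosen:
        return chosen[0], chosen[-1]
    return None
```

When the function returns a pair, the IN list can be replaced by `"PARTITION_NAME" >= low and "PARTITION_NAME" <= high`; when it returns None, a range predicate would pull in partitions that were pruned, so the explicit list has to stay.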





[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384877#comment-15384877
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13995:
--

Updated RB. Did some basic testing on the failed tests to make sure that (1) the 
NPE is no longer encountered and (2) we remove the unnecessary PART_NAME IN () clause whenever we do not prune any partitions.
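
The idea in item 2 can be sketched as follows. This is not the code from the patch: it is a minimal illustration in Python against an in-memory SQLite table, and the names `build_stats_query`, `make_demo_db`, `skip_unpruned` and `demo_db` are made up for the example. The table and column names follow the queries quoted in the description below.

```python
import sqlite3

def build_stats_query(db, table, columns, selected_parts, all_parts,
                      skip_unpruned=True):
    """Build the partition-statistics count query.

    When no partition was pruned (selected == all) and skip_unpruned is set,
    the "PARTITION_NAME" IN (...) list is left out entirely.
    """
    col_marks = ",".join("?" for _ in columns)
    sql = ('select "PARTITION_NAME", count("COLUMN_NAME") from "PART_COL_STATS" '
           'where "DB_NAME" = ? and "TABLE_NAME" = ? '
           'and "COLUMN_NAME" in (' + col_marks + ')')
    params = [db, table, *columns]
    pruned = set(selected_parts) != set(all_parts)
    if pruned or not skip_unpruned:
        part_marks = ",".join("?" for _ in selected_parts)
        sql += ' and "PARTITION_NAME" in (' + part_marks + ')'
        params.extend(selected_parts)
    sql += ' group by "PARTITION_NAME" order by "PARTITION_NAME"'
    return sql, params

def make_demo_db():
    """Create a tiny stand-in for PART_COL_STATS (5 partitions, 2 columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute('create table "PART_COL_STATS" ("DB_NAME" text, '
                 '"TABLE_NAME" text, "COLUMN_NAME" text, "PARTITION_NAME" text)')
    parts = ["cs_sold_date_sk=%d" % (2450815 + i) for i in range(5)]
    cols = ["cs_item_sk", "cs_quantity"]
    conn.executemany('insert into "PART_COL_STATS" values (?, ?, ?, ?)',
                     [("demo_db", "catalog_sales", c, p)
                      for p in parts for c in cols])
    return conn, parts, cols
```

With all partitions selected, the short query and the query with the full IN list return the same rows on this table, which is the equivalence the description relies on. As the description notes, that only holds while no partitions are added or dropped concurrently.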

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the metastore queries 
> generated have a large IN clause listing all the partition names. Most RDBMSs 
> have trouble optimizing a large IN clause, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not lead to the best 
> execution time.
> When all partitions are chosen, omitting the partition list and filtering 
> only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the Hive 
> table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query without the partition list runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs about as well as the previous query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in 
> the projection list of the Hive query are listed here. Not sure if statistics 

[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.5.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the metastore queries 
> generated have a large IN clause listing all the partition names. Most RDBMSs 
> have trouble optimizing a large IN clause, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not lead to the best 
> execution time.
> When all partitions are chosen, omitting the partition list and filtering 
> only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the Hive 
> table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query without the partition list runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs about as well as the previous query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in 
> the projection list of the Hive query are listed here. It is not clear whether 
> statistics for these columns are required for Hive query optimization.





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: (was: HIVE-13995.5.patch)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the metastore queries 
> generated have a large IN clause listing all the partition names. Most RDBMSs 
> have trouble optimizing a large IN clause, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not lead to the best 
> execution time.
> When all partitions are chosen, omitting the partition list and filtering 
> only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the Hive 
> table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query without the partition list runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs about as well as the previous query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in 
> the projection list of the Hive query are listed here. It is not clear whether 
> statistics for these columns are required for Hive query optimization.





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Open  (was: Patch Available)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the metastore queries 
> generated have a large IN clause listing all the partition names. Most RDBMSs 
> have trouble optimizing a large IN clause, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not lead to the best 
> execution time.
> When all partitions are chosen, omitting the partition list and filtering 
> only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the Hive 
> table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in about 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs about as well as the previous query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. Columns in the 
> projection list of the Hive query are listed here. It is not clear whether 
> statistics for these columns are required for Hive query optimization.
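
The equivalence claimed in the description can be checked on a small scale. The following is a minimal sketch, assuming an in-memory SQLite database and a simplified four-column stand-in for "PART_COL_STATS" (the real metastore table has more columns). It uses only ten partitions and two columns, and it adds "PARTITION_NAME" to the select list so the three result sets can be compared row by row; none of this is taken from the attached patches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for the metastore table.
conn.execute(
    'CREATE TABLE "PART_COL_STATS" '
    '("DB_NAME" TEXT, "TABLE_NAME" TEXT, "PARTITION_NAME" TEXT, "COLUMN_NAME" TEXT)'
)

db, table = "tpcds_bin_partitioned_orc_3", "catalog_sales"
columns = ["cs_item_sk", "cs_quantity"]
partitions = [f"cs_sold_date_sk={k}" for k in range(2450815, 2450825)]
conn.executemany(
    'INSERT INTO "PART_COL_STATS" VALUES (?, ?, ?, ?)',
    [(db, table, p, c) for p in partitions for c in columns],
)

base = (
    'SELECT "PARTITION_NAME", count("COLUMN_NAME") FROM "PART_COL_STATS" '
    'WHERE "DB_NAME" = ? AND "TABLE_NAME" = ? AND "COLUMN_NAME" IN (?, ?)'
)
tail = ' GROUP BY "PARTITION_NAME" ORDER BY "PARTITION_NAME"'
args = [db, table, *columns]

# 1. Explicit IN list over every partition name.
in_list = ' AND "PARTITION_NAME" IN (%s)' % ", ".join(["?"] * len(partitions))
by_in = conn.execute(base + in_list + tail, args + partitions).fetchall()

# 2. No partition predicate at all.
by_none = conn.execute(base + tail, args).fetchall()

# 3. Range over the first and last partition names.
by_range = conn.execute(
    base + ' AND "PARTITION_NAME" >= ? AND "PARTITION_NAME" <= ?' + tail,
    args + [partitions[0], partitions[-1]],
).fetchall()

print(by_in == by_none == by_range)
```

The three forms agree here only because every partition of the table is selected and nothing modifies the table between the queries, which is the same condition the description states.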





[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Patch Available  (was: Open)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.4.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch
>
>


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Open  (was: Patch Available)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch
>
>


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.3.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch
>
>


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-18 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Patch Available  (was: Open)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch
>
>


[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-15 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15380321#comment-15380321
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13995:
--

I tried to modify this query by adding PARTNAME to the group-by columns and 
doing a join on PART_COL_STATS and PARTITIONS, but it turns out that the 
resulting query would be semantically incorrect.

One way to do this in a single query would be along these lines, which I am 
trying now:
{code}
WITH sq1 as 
SELECT 'a', sq1.* from sq1
UNION ALL
select b,  from PART_COLS
where PartId in (SELECT partid from sq1)
group by COLNAME, COLTYPE
{code}
This way we can use the tags 'a' and 'b' to distinguish between the rows coming 
out of the two branches of the UNION ALL query.
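
The tagging idea can be sketched as follows. This is only an illustration run against an in-memory SQLite database; the tables stats and cols, their columns, and the sample rows are hypothetical and are not the metastore schema, and the body of sq1 is a placeholder because the outline above does not give one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for the statistics and column metadata.
conn.executescript(
    """
    CREATE TABLE stats (part_id INTEGER, col_name TEXT);
    CREATE TABLE cols  (part_id INTEGER, col_name TEXT, col_type TEXT);
    INSERT INTO stats VALUES (1, 'cs_item_sk'), (2, 'cs_item_sk');
    INSERT INTO cols  VALUES (1, 'cs_item_sk', 'bigint'),
                             (2, 'cs_item_sk', 'bigint'),
                             (3, 'cs_item_sk', 'bigint');
    """
)

rows = conn.execute(
    """
    WITH sq1 AS (SELECT part_id, col_name FROM stats)
    SELECT 'a' AS tag, CAST(part_id AS TEXT) AS v1, col_name AS v2 FROM sq1
    UNION ALL
    SELECT 'b', col_name, col_type FROM cols
    WHERE part_id IN (SELECT part_id FROM sq1)
    GROUP BY col_name, col_type
    ORDER BY tag, v1
    """
).fetchall()

# Split the single result set back into its two logical parts using the tag.
branch_a = [r[1:] for r in rows if r[0] == "a"]
branch_b = [r[1:] for r in rows if r[0] == "b"]
print(branch_a, branch_b)
```

Both branches of a UNION ALL must return the same number of columns, so in practice the narrower branch would need padding columns; the sketch sidesteps that by giving both branches two value columns.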



> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch
>
>