[jira] [Commented] (HIVE-14418) Hive config validation prevents unsetting the settings

2016-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419819#comment-15419819
 ] 

Hive QA commented on HIVE-14418:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823532/HIVE-14418.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 68 failed/errored test(s), 10466 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_00_nonpart_empty]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_01_nonpart]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_02_00_part_empty]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_02_part]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_03_nonpart_over_compat]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_04_all_part]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_04_evolved_parts]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_05_some_part]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_06_one_part]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_07_all_part_over_nonoverlap]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_08_nonpart_rename]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_09_part_spec_nonoverlap]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_10_external_managed]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_11_managed_external]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_12_external_location]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_13_managed_location]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_14_managed_location_over_existing]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_15_external_part]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_16_part_external]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_17_part_managed]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_18_part_external]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_19_00_part_external_location]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_19_part_external_location]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_20_part_managed_location]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_21_export_authsuccess]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_22_import_exist_authsuccess]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_23_import_part_authsuccess]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_24_import_nonexist_authsuccess]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_25_export_parentpath_has_inaccessible_children]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exim_hidden_files]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_1_drop]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_2_exim_basic]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repl_3_exim_metadata]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[tez_join_hash]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_import]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_uri_export]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_00_unsupported_schema]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_01_nonpart_over_loaded]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_02_all_part_over_overlap]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_03_nonpart_noncompat_colschema]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_04_nonpart_noncompat_colnumber]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_05_nonpart_noncompat_coltype]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_06_nonpart_noncompat_storage]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_07_nonpart_noncompat_ifof]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_08_nonpart_noncompat_serde]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_09_nonpart_noncompat_serdeparam]
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[exim_10_nonpart_noncompat_bucketing]

[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419818#comment-15419818
 ] 

Matt McCline commented on HIVE-14448:
-

Test failures are unrelated.

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug is usually 
> exposed when working with data at scale, because in most other cases only the 
> BI split strategy is chosen. My guess is that this happens because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     // Force the ETL split strategy and enable predicate pushdown (SARG).
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     // Request a major compaction and run the compactor worker.
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     // The predicate on column a exercises SARG handling during split generation.
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL
>         + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC split generation failed with exception: java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NegativeArraySizeException
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> 

[jira] [Updated] (HIVE-14345) Beeline result table has erroneous characters

2016-08-12 Thread Miklos Csanady (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Csanady updated HIVE-14345:
--
Attachment: HIVE-14345.5.patch

fixing formatting issue

> Beeline result table has erroneous characters 
> --
>
> Key: HIVE-14345
> URL: https://issues.apache.org/jira/browse/HIVE-14345
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Jeremy Beard
>Assignee: Miklos Csanady
>Priority: Minor
> Attachments: HIVE-14345.3.patch, HIVE-14345.4.patch, 
> HIVE-14345.5.patch, HIVE-14345.patch
>
>
> Beeline returns query results with erroneous characters. For example:
> {code}
> 0: jdbc:hive2://:1/def> select 10;
> +--+--+
> | _c0  |
> +--+--+
> | 10   |
> +--+--+
> 1 row selected (3.207 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-12 Thread Saket Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419809#comment-15419809
 ] 

Saket Saurabh commented on HIVE-14233:
--

Note that this patch for improved vectorization does not handle the case 
where the split is on an original file (a file with a non-ACID schema). 
In such cases, it falls back to the older strategy of creating vectorized row 
batches using row-by-row stitching. However, this performance penalty applies 
only to tables converted from non-ACID to ACID, and even then only until the 
first major compaction on the table produces a base file.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the batch is passed up the operator pipeline. This 
> row-by-row stitching was needed because the ACID insert/update/delete events 
> from various delta files had to be merged before the current version of a 
> given row could be determined. HIVE-14035 has removed that limitation by 
> splitting ACID update events into a combination of delete+insert. In fact, it 
> has also made it possible to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to remove this earlier 
> bottleneck in the vectorized code path for ACID by reading row batches 
> directly from the underlying ORC files and avoiding stitching altogether. 
> Once a row batch is read from the split (which may be on a base or delta 
> file), the deleted rows will be found by cross-referencing them against a 
> data structure that keeps track only of delete events (found in the 
> deleted_delta files). This should lead to a large performance gain when 
> reading ACID files in vectorized fashion, while enabling further 
> optimizations on top of it in the future.
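
The cross-referencing step described above can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class and method names are hypothetical, and a row is identified by a single long here, whereas a real ACID row identifier carries more fields. The sketch marks the surviving positions of a batch with a "selected" index array instead of rebuilding the batch one row at a time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; none of these names come from the Hive code base.
public class DeleteFilterSketch {

    /**
     * Returns the positions within the batch whose row id is NOT present in
     * the set of delete events. The batch itself is left untouched.
     */
    public static int[] selectLiveRows(long[] batchRowIds, Set<Long> deletedRowIds) {
        int[] selected = new int[batchRowIds.length];
        int live = 0;
        for (int i = 0; i < batchRowIds.length; i++) {
            // One set lookup per row replaces the merge across delta files.
            if (!deletedRowIds.contains(batchRowIds[i])) {
                selected[live++] = i;
            }
        }
        return Arrays.copyOf(selected, live);
    }

    public static void main(String[] args) {
        long[] batch = {100L, 101L, 102L, 103L};
        // Assumed to have been loaded once from the delete events.
        Set<Long> deleted = new HashSet<>(Arrays.asList(101L, 103L));
        System.out.println(Arrays.toString(selectLiveRows(batch, deleted)));
    }
}
```

With the sample data in main, positions 0 and 2 survive, since row ids 101 and 103 appear in the delete set.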





[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-12 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: Design.Document.Improving ACID performance in Hive.02.docx

Updated version of the design document with a few minor corrections and typo fixes

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: Design.Document.Improving ACID performance in 
> Hive.01.docx, Design.Document.Improving ACID performance in Hive.02.docx, 
> HIVE-14035.02.patch, HIVE-14035.03.patch, HIVE-14035.04.patch, 
> HIVE-14035.05.patch, HIVE-14035.06.patch, HIVE-14035.07.patch, 
> HIVE-14035.08.patch, HIVE-14035.09.patch, HIVE-14035.10.patch, 
> HIVE-14035.11.patch, HIVE-14035.12.patch, HIVE-14035.13.patch, 
> HIVE-14035.14.patch, HIVE-14035.15.patch, HIVE-14035.16.patch, 
> HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, which can enable predicate pushdown to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some form of 
> versioning to ACID that allows different versions of ACID transactions to 
> coexist.
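
As a rough illustration of the proposal above, the sketch below rewrites a stream of events so that every update becomes a delete followed by an insert. All names are hypothetical and the event model is deliberately minimal. For readability the new insert reuses the row id of the updated row, which is an assumption of this sketch and not a statement about how the patch assigns row identities.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; the event model is not taken from the Hive code base.
public class UpdateSplitSketch {

    public enum Op { INSERT, UPDATE, DELETE }

    public static final class Event {
        public final Op op;
        public final long rowId;
        public final String value; // null for delete events

        public Event(Op op, long rowId, String value) {
            this.op = op;
            this.rowId = rowId;
            this.value = value;
        }

        @Override
        public String toString() {
            return op + "(" + rowId + (value == null ? "" : "," + value) + ")";
        }
    }

    /** Replaces each UPDATE with a DELETE followed by an INSERT; other events pass through. */
    public static List<Event> splitUpdates(List<Event> events) {
        List<Event> out = new ArrayList<>();
        for (Event e : events) {
            if (e.op == Op.UPDATE) {
                out.add(new Event(Op.DELETE, e.rowId, null));
                out.add(new Event(Op.INSERT, e.rowId, e.value));
            } else {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
            new Event(Op.INSERT, 1L, "a"),
            new Event(Op.UPDATE, 1L, "b"));
        System.out.println(splitUpdates(events));
    }
}
```

After the rewrite no event overwrites another in place: inserts only add row versions and deletes only remove them. That appears to be the property the description relies on for pushing predicates down to every delta file.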





[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419796#comment-15419796
 ] 

Hive QA commented on HIVE-14448:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823528/HIVE-14448.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10470 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables_compact]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_2]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[load_dyn_part1]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[tez_join_hash]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[transform_ppr1]
org.apache.hive.hcatalog.listener.TestMsgBusConnection.testConnection
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/872/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/872/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-872/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823528 - PreCommit-HIVE-MASTER-Build

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug is usually 
> exposed when working with data at scale, because in most other cases only the 
> BI split strategy is chosen. My guess is that this happens because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     // Force the ETL split strategy and enable predicate pushdown (SARG).
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     // Request a major compaction and run the compactor worker.
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     // The predicate on column a exercises SARG handling during split generation.
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL
>         + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC split generation failed with exception: java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NegativeArraySizeException
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
> 

[jira] [Commented] (HIVE-14362) Support explain analyze in Hive

2016-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419783#comment-15419783
 ] 

Hive QA commented on HIVE-14362:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823510/HIVE-14362.01.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 234 failed/errored test(s), 10471 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join32]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_smb_mapjoin_14]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_13]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_9]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketizedhiveinputformat_auto]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketsortoptimize_insert_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_auto_join1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_0]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explainanalyze_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_filters]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_nulls]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[join_nullsafe]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_join]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_join_partition_key]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin9]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_11]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_12]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_13]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_14]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_15]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_16]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_17]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_4]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_6]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smb_mapjoin_7]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[smblimit]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_1]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_2]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_3]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_5]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[sort_merge_join_desc_8]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_auto_smb_mapjoin_14]

[jira] [Updated] (HIVE-14413) Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and extract more deterministic pieces out

2016-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14413:

Attachment: HIVE-14413.4.patch

> Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and 
> extract more deterministic pieces out
> 
>
> Key: HIVE-14413
> URL: https://issues.apache.org/jira/browse/HIVE-14413
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14413.01.patch, HIVE-14413.02.patch, 
> HIVE-14413.03.patch, HIVE-14413.4.patch
>
>






[jira] [Updated] (HIVE-14413) Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and extract more deterministic pieces out

2016-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14413:

Status: Patch Available  (was: Open)

> Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and 
> extract more deterministic pieces out
> 
>
> Key: HIVE-14413
> URL: https://issues.apache.org/jira/browse/HIVE-14413
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14413.01.patch, HIVE-14413.02.patch, 
> HIVE-14413.03.patch, HIVE-14413.4.patch
>
>






[jira] [Updated] (HIVE-14413) Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and extract more deterministic pieces out

2016-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14413:

Status: Open  (was: Patch Available)

> Extend HivePreFilteringRule to traverse inside elements of DNF/CNF and 
> extract more deterministic pieces out
> 
>
> Key: HIVE-14413
> URL: https://issues.apache.org/jira/browse/HIVE-14413
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-14413.01.patch, HIVE-14413.02.patch, 
> HIVE-14413.03.patch
>
>






[jira] [Commented] (HIVE-14519) Multi insert query bug

2016-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419725#comment-15419725
 ] 

Hive QA commented on HIVE-14519:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823456/HIVE-14519.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10467 tests 
executed
*Failed tests:*
{noformat}
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hive.hcatalog.listener.TestMsgBusConnection.testConnection
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/868/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/868/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-868/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823456 - PreCommit-HIVE-MASTER-Build

> Multi insert query bug
> --
>
> Key: HIVE-14519
> URL: https://issues.apache.org/jira/browse/HIVE-14519
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Reporter: Yongzhi Chen
>Assignee: Yongzhi Chen
> Attachments: HIVE-14519.1.patch
>
>
> When running a multi-insert query in which one of the inserts returns no 
> rows, another insert does not return the right result.
> For example:
> After the following query, there is no value in /tmp/emp/dir3/00_0
> {noformat}
> From (select * from src) a
> insert overwrite directory '/tmp/emp/dir1/'
> select key, value
> insert overwrite directory '/tmp/emp/dir2/'
> select 'header'
> where 1=2
> insert overwrite directory '/tmp/emp/dir3/'
> select key, value 
> where key = 100;
> {noformat}
> The where clause in the second insert should not affect the third insert. 





[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419717#comment-15419717
 ] 

Gopal V commented on HIVE-14199:


[~ekoifman]: The acid_bucket_pruning.q does not have vectorization enabled.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables the delta files to be split as well. What 
> this means is that we now have enough information at the split generation 
> level to determine the appropriate buckets to process for the delta files. This 
> allows us to efficiently prune unnecessary buckets for delta files and 
> should lead to a good performance gain for a large number of selective queries on 
> ACID tables.
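
The pruning idea described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the Split class, the prune method, and the example paths are hypothetical and are not Hive's actual split-generation classes. It assumes that each split (base or delta) carries the id of the bucket it belongs to, and that the set of buckets the query needs has already been derived from the predicate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of bucket pruning at split-generation time; not Hive's actual classes.
final class BucketPruningSketch {

    /** A split of a base or delta file, tagged with the bucket it belongs to. */
    static final class Split {
        final String path;
        final int bucketId;

        Split(String path, int bucketId) {
            this.path = path;
            this.bucketId = bucketId;
        }
    }

    /** Keeps only the splits whose bucket is in the set required by the query. */
    static List<Split> prune(List<Split> candidates, Set<Integer> requiredBuckets) {
        List<Split> kept = new ArrayList<>();
        for (Split s : candidates) {
            if (requiredBuckets.contains(s.bucketId)) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

With delta files split per bucket, a worker that only needs bucket 1 would no longer have to read the delta data of every other bucket.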





[jira] [Commented] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419715#comment-15419715
 ] 

Eugene Koifman commented on HIVE-14199:
---

[~mmccline], is there anything in the query plan that shows whether the plan is 
vectorized?
In the attached patch, I don't see anything in the plan that indicates that 
it's been vectorized.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason for this has been the fact that bucket 
> pruning happens at split generation level and for ACID, traditionally the 
> delta files were never split. The parallelism for ACID was then restricted to 
> the number of buckets. There would be as many splits as the number of buckets 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query may have originally required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables the delta files to be split as well. What 
> this means is that we now have enough information at the split generation 
> level to determine the appropriate buckets to process for the delta files. This 
> allows us to efficiently prune unnecessary buckets for delta files and 
> should lead to a good performance gain for a large number of selective queries on 
> ACID tables.





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14506:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Hari!

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 2.2.0
>
> Attachments: HIVE-14506.1.patch
>
>
> The test fails because there are no tests to be executed and the file name 
> starts with  'Test'





[jira] [Updated] (HIVE-14444) Upgrade qtest execution framework to junit4 - migrate most of them

2016-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14444:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Zoltan!

> Upgrade qtest execution framework to junit4 - migrate most of them
> --
>
> Key: HIVE-14444
> URL: https://issues.apache.org/jira/browse/HIVE-14444
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-14444.1.patch, HIVE-14444.2.patch, 
> HIVE-14444.3.patch, HIVE-14444.4.patch
>
>
> This is the second step: migrating all existing qtestgen-generated tests to 
> junit4.
> It is possible that not all of them will be migrated in this ticket; I will 
> leave out the problematic ones.





[jira] [Commented] (HIVE-14444) Upgrade qtest execution framework to junit4 - migrate most of them

2016-08-12 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419712#comment-15419712
 ] 

Peter Vary commented on HIVE-14444:
---

Let's see what we could do about the BeeLine driver, and once we know the 
possibilities and the potential problems, we can discuss them on the dev list.

> Upgrade qtest execution framework to junit4 - migrate most of them
> --
>
> Key: HIVE-14444
> URL: https://issues.apache.org/jira/browse/HIVE-14444
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14444.1.patch, HIVE-14444.2.patch, 
> HIVE-14444.3.patch, HIVE-14444.4.patch
>
>
> This is the second step: migrating all existing qtestgen-generated tests to 
> junit4.
> It is possible that not all of them will be migrated in this ticket; I will 
> leave out the problematic ones.





[jira] [Commented] (HIVE-13175) Disallow making external tables transactional

2016-08-12 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419707#comment-15419707
 ] 

Wei Zheng commented on HIVE-13175:
--

Wiki updated.

> Disallow making external tables transactional
> -
>
> Key: HIVE-13175
> URL: https://issues.apache.org/jira/browse/HIVE-13175
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: TODOC1.3, TODOC2.1
> Fix For: 1.3.0, 2.1.0
>
> Attachments: HIVE-13175.1.patch, HIVE-13175.2.patch, 
> HIVE-13175.3.patch, HIVE-13175.4.patch
>
>
> The fact that the compactor rewrites the contents of ACID tables is in conflict with 
> what is expected of external tables.
> Conversely, an end user can write to an external table, which is certainly not what is 
> expected of an ACID table.
> So we should explicitly disallow making an external table ACID.





[jira] [Commented] (HIVE-14444) Upgrade qtest execution framework to junit4 - migrate most of them

2016-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419699#comment-15419699
 ] 

Ashutosh Chauhan commented on HIVE-14444:
-

+1 for this current patch.

[~pvary] I agree: now that the CLI is deprecated, we should migrate our tests to 
beeline. It would be good to discuss this on dev@hive, since folks may have 
opinions on which path we should take for this.

> Upgrade qtest execution framework to junit4 - migrate most of them
> --
>
> Key: HIVE-14444
> URL: https://issues.apache.org/jira/browse/HIVE-14444
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14444.1.patch, HIVE-14444.2.patch, 
> HIVE-14444.3.patch, HIVE-14444.4.patch
>
>
> This is the second step: migrating all existing qtestgen-generated tests to 
> junit4.
> It is possible that not all of them will be migrated in this ticket; I will 
> leave out the problematic ones.





[jira] [Commented] (HIVE-12366) Refactor Heartbeater logic for transaction

2016-08-12 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419698#comment-15419698
 ] 

Wei Zheng commented on HIVE-12366:
--

Configuration Properties and Hive Transactions wikis have been updated.

> Refactor Heartbeater logic for transaction
> --
>
> Key: HIVE-12366
> URL: https://issues.apache.org/jira/browse/HIVE-12366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>  Labels: TODOC1.3
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12366.1.patch, HIVE-12366.11.patch, 
> HIVE-12366.12.patch, HIVE-12366.13.patch, HIVE-12366.14.patch, 
> HIVE-12366.15.patch, HIVE-12366.2.patch, HIVE-12366.3.patch, 
> HIVE-12366.4.patch, HIVE-12366.5.patch, HIVE-12366.6.patch, 
> HIVE-12366.7.patch, HIVE-12366.8.patch, HIVE-12366.9.patch, 
> HIVE-12366.branch-1.patch, HIVE-12366.branch-2.0.patch
>
>
> Currently there is a gap between the time the locks are acquired and the first 
> heartbeat being sent out. Normally the gap is negligible, but when it's big 
> it will cause the query to fail, since the locks have timed out by the time the 
> heartbeat is sent.
> We need to remove this gap.
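
One way to picture closing such a gap, as a minimal sketch and not the approach taken in the attached patches: schedule the heartbeat with an initial delay of zero, so the first beat goes out as soon as the task is scheduled instead of one full interval later. The class and method names below are hypothetical; only the java.util.concurrent calls are real.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; it does not reflect Hive's Heartbeater implementation.
final class HeartbeatSketch {

    /**
     * Starts periodic heartbeats with no initial delay, so the first heartbeat
     * is sent right after this call (for example, right after lock acquisition)
     * rather than one full interval later.
     */
    static ScheduledFuture<?> start(ScheduledExecutorService pool,
                                    Runnable sendHeartbeat,
                                    long intervalMs) {
        return pool.scheduleAtFixedRate(sendHeartbeat, 0L, intervalMs, TimeUnit.MILLISECONDS);
    }
}
```

The caller would invoke start(...) immediately after the locks are acquired and cancel the returned future when the query finishes.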





[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419697#comment-15419697
 ] 

Matt McCline commented on HIVE-14448:
-

[~sershe] Thank you very much for your code review.

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because in most other cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     // Force the ETL split strategy and enable predicate pushdown (SARG).
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     // A query with a predicate exercises split generation with a SARG.
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL
>         + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> 

[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419694#comment-15419694
 ] 

Sergey Shelukhin commented on HIVE-14448:
-

+1

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because in most other cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     // Force the ETL split strategy and enable predicate pushdown (SARG).
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     // A query with a predicate exercises split generation with a SARG.
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL
>         + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> 

[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14035:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: Design.Document.Improving ACID performance in 
> Hive.01.docx, HIVE-14035.02.patch, HIVE-14035.03.patch, HIVE-14035.04.patch, 
> HIVE-14035.05.patch, HIVE-14035.06.patch, HIVE-14035.07.patch, 
> HIVE-14035.08.patch, HIVE-14035.09.patch, HIVE-14035.10.patch, 
> HIVE-14035.11.patch, HIVE-14035.12.patch, HIVE-14035.13.patch, 
> HIVE-14035.14.patch, HIVE-14035.15.patch, HIVE-14035.16.patch, 
> HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, which can enable predicate pushdown to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist.
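
The proposed rewrite of an update into a delete plus an insert can be illustrated with a toy event model. This is a sketch under assumptions: the Event type, the Kind enum, and the idea that the new version is written under a fresh row id are all illustrative and are not taken from the patch or from Hive's ACID classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical event model; the names are illustrative and not Hive's ACID classes.
final class UpdateSplitSketch {

    enum Kind { INSERT, DELETE }

    static final class Event {
        final Kind kind;
        final long rowId;
        final String value; // null for DELETE events

        Event(Kind kind, long rowId, String value) {
            this.kind = kind;
            this.rowId = rowId;
            this.value = value;
        }
    }

    /**
     * Expresses "update row oldRowId to newValue" as a delete of the old row
     * followed by an insert of the new version under newRowId.
     */
    static List<Event> splitUpdate(long oldRowId, long newRowId, String newValue) {
        return Arrays.asList(
            new Event(Kind.DELETE, oldRowId, null),
            new Event(Kind.INSERT, newRowId, newValue));
    }
}
```

In this model no event ever overwrites an earlier insert in place, which is the property the description relies on for pushing predicates down to all delta files.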





[jira] [Resolved] (HIVE-14521) codahale metrics exceptions

2016-08-12 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta resolved HIVE-14521.
-
Resolution: Duplicate

Looks like a false alarm. The logs were from an older build which did not have 
HIVE-13410. Closing as a dup.

> codahale metrics exceptions
> ---
>
> Key: HIVE-14521
> URL: https://issues.apache.org/jira/browse/HIVE-14521
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Vaibhav Gumashta
>
> On some random setup, I see bazillions of errors like this in HS2 log:
> {noformat}
> 2016-08-08 04:52:18,619 WARN  [HiveServer2-Handler-Pool: Thread-101]: 
> log.PerfLogger (PerfLogger.java:beginMetrics(226)) - Error recording metrics
> java.io.IOException: Scope named api_Driver.run is not closed, cannot be 
> opened.
> at 
> org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope.open(CodahaleMetrics.java:133)
> at 
> org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics.startStoredScope(CodahaleMetrics.java:220)
> at 
> org.apache.hadoop.hive.ql.log.PerfLogger.beginMetrics(PerfLogger.java:223)
> at 
> org.apache.hadoop.hive.ql.log.PerfLogger.PerfLogBegin(PerfLogger.java:143)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:378)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1214)
> at 
> org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1208)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:226)
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:276)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:468)
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:456)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {noformat}
> I suspect that either, just like the metastore deadline, this needs better 
> error handling when the code that the metrics surround fails; or it is just not 
> thread safe.
> But I actually haven't looked at the code yet.
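
The first suspicion (a scope left open when the surrounded code fails) is the kind of problem a try/finally guard addresses. The sketch below is hypothetical: ScopeSketch is a stand-in that only mimics the "is not closed, cannot be opened" check from the log, it models a single thread's scopes, and it is not the CodahaleMetrics API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical scope registry for one thread; not the CodahaleMetrics API.
final class ScopeSketch {
    private final Set<String> openScopes = new HashSet<>();

    void open(String name) {
        // Mirrors the check behind the logged error: a scope cannot be opened twice.
        if (!openScopes.add(name)) {
            throw new IllegalStateException(
                "Scope named " + name + " is not closed, cannot be opened.");
        }
    }

    void close(String name) {
        openScopes.remove(name);
    }

    /** Runs the work inside the named scope and closes the scope even if the work throws. */
    <T> T inScope(String name, Supplier<T> work) {
        open(name);
        try {
            return work.get();
        } finally {
            close(name);
        }
    }
}
```

Without the finally block, a failure inside the work would leave the scope open and every later attempt to open it would fail, which matches the symptom in the log but is only one possible explanation.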





[jira] [Commented] (HIVE-14527) Schema evolution tests are not running in TestCliDriver

2016-08-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419673#comment-15419673
 ] 

Prasanth Jayachandran commented on HIVE-14527:
--

[~sseth] Can you please review this patch? This moves the schema evolution 
files from minillap to minillap.shared so that they are shared with the default 
CliDriver. It also excludes the llap tests from MiniTez so that MiniTez doesn't 
see any of the llap tests.

> Schema evolution tests are not running in TestCliDriver
> ---
>
> Key: HIVE-14527
> URL: https://issues.apache.org/jira/browse/HIVE-14527
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Matt McCline
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14527.1.patch
>
>
> HIVE-14376 broke something that causes schema evolution tests to be excluded 
> from the TestCliDriver test suite.





[jira] [Updated] (HIVE-14527) Schema evolution tests are not running in TestCliDriver

2016-08-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14527:
-
Status: Patch Available  (was: Open)

> Schema evolution tests are not running in TestCliDriver
> ---
>
> Key: HIVE-14527
> URL: https://issues.apache.org/jira/browse/HIVE-14527
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Matt McCline
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14527.1.patch
>
>
> HIVE-14376 broke something that causes schema evolution tests to be excluded 
> from the TestCliDriver test suite.





[jira] [Commented] (HIVE-14444) Upgrade qtest execution framework to junit4 - migrate most of them

2016-08-12 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419671#comment-15419671
 ] 

Peter Vary commented on HIVE-14444:
---

I am only concerned about the end result, and not really about the steps that 
lead there. So if you agree, after you commit the patch I will incorporate the 
changes I proposed in the review. Of course, I would like you both to review 
my patch.

Regarding the BeeLine testcases, I think we should check HIVE-4161. It was 
reported by [~thejas] a long time ago, and it was about creating a small 
subset of the test cases that should use BeeLine for testing Hive. The funny 
thing is that it was only partially solved, with the result that every single 
BeeLine test was removed in the end, and we are still waiting for someone to 
create a BeeLine test package.

On the other hand, in this document we stated that the HiveCLI is deprecated 
and that users should use BeeLine instead 
(https://cwiki.apache.org/confluence/display/Hive/Replacing+the+Implementation+of+Hive+CLI+Using+Beeline)

With that in mind, I think that in the medium term we should consider 
migrating the integration test cases to BeeLine. The exact method and steps 
are up for discussion. Maybe the first thing would be to create a working 
TestBeeLine testcase for several chosen queries from the existing mix, and 
with that check the viability of the idea. If this is possible, we might 
consider running the testcases in parallel. This has serious risks, but can 
have a huge impact on the overall runtime of the tests.

I do not think I would like to create a patch independently of this one to make 
the BeeLine testcases work. What I did to be able to run the alter3.q 
testcase query on my laptop was some trivial refactoring, plus the following 
almost trivial changes:
{noformat}
diff --git beeline/src/java/org/apache/hive/beeline/util/QFileClient.java beeline/src/java/org/apache/hive/beeline/util/QFileClient.java
index 81f1b0e..44b9ca5 100644
--- beeline/src/java/org/apache/hive/beeline/util/QFileClient.java
+++ beeline/src/java/org/apache/hive/beeline/util/QFileClient.java
@@ -125,6 +125,7 @@ void initFilterSet() {
 .addFilter(outputDirectory.toString(), "!!{outputDirectory}!!")
 .addFilter(qFileDirectory.toString(), "!!{qFileDirectory}!!")
 .addFilter(hiveRootDirectory.toString(), "!!{hive.root}!!")
+.addFilter("\\(queryId=[^\\)]*\\)","queryId=(!!{queryId}!!)")
 .addFilter("file:/\\w\\S+", "file:/!!ELIDED!!")
 .addFilter("pfile:/\\w\\S+", "pfile:/!!ELIDED!!")
 .addFilter("hdfs:/\\w\\S+", "hdfs:/!!ELIDED!!")
@@ -134,6 +135,7 @@ void initFilterSet() {
 .addFilter("(\\D)" + currentTimePrefix + "\\d{9}(\\D)", "$1!!UNIXTIMEMILLIS!!$2")
 .addFilter(userName, "!!{user.name}!!")
 .addFilter(operatorPattern, "\"$1_!!ELIDED!!\"")
+.addFilter("Time taken: [0-9\\.]* seconds", "Time taken: !!ELIDED!! seconds")
 ;
   };
{noformat}

With that, I think we will have a more or less working test driver for BeeLine 
testcases, which we can then improve on.
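
To see what the two added filters do to a line of output, the sketch below applies the same patterns and replacements with String.replaceAll. How QFileClient applies its filters internally is not shown in the diff, so treating them as plain regex replacements is an assumption, and the OutputFilterSketch class is hypothetical.

```java
// Hypothetical demo of the two patterns from the diff above, applied as plain
// regex replacements; QFileClient's own filter mechanism may differ.
final class OutputFilterSketch {

    static String mask(String line) {
        return line
            // Replace a parenthesized query id with a fixed placeholder.
            .replaceAll("\\(queryId=[^\\)]*\\)", "queryId=(!!{queryId}!!)")
            // Replace the measured run time, which differs between runs.
            .replaceAll("Time taken: [0-9\\.]* seconds", "Time taken: !!ELIDED!! seconds");
    }

    public static void main(String[] args) {
        // Prints: Time taken: !!ELIDED!! seconds
        System.out.println(mask("Time taken: 0.123 seconds"));
    }
}
```

Masking values that change between runs is what lets the filtered output be compared against a stored expected-output file.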

> Upgrade qtest execution framework to junit4 - migrate most of them
> --
>
> Key: HIVE-14444
> URL: https://issues.apache.org/jira/browse/HIVE-14444
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14444.1.patch, HIVE-14444.2.patch, 
> HIVE-14444.3.patch, HIVE-14444.4.patch
>
>
> This is the second step: migrating all existing qtestgen-generated tests to 
> junit4.
> It is possible that not all of them will be migrated in this ticket; I will 
> leave out the problematic ones.





[jira] [Updated] (HIVE-14527) Schema evolution tests are not running in TestCliDriver

2016-08-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14527:
-
Attachment: HIVE-14527.1.patch

> Schema evolution tests are not running in TestCliDriver
> ---
>
> Key: HIVE-14527
> URL: https://issues.apache.org/jira/browse/HIVE-14527
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Matt McCline
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14527.1.patch
>
>
> HIVE-14376 broke something that causes schema evolution tests to be excluded 
> from the TestCliDriver test suite.





[jira] [Commented] (HIVE-14533) improve performance of enforceMaxLength in HiveCharWritable/HiveVarcharWritable

2016-08-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419665#comment-15419665
 ] 

Prasanth Jayachandran commented on HIVE-14533:
--

lgtm, +1. Pending tests

> improve performance of enforceMaxLength in 
> HiveCharWritable/HiveVarcharWritable
> ---
>
> Key: HIVE-14533
> URL: https://issues.apache.org/jira/browse/HIVE-14533
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
>  Labels: performance
> Attachments: HIVE-14533.patch
>
>
> The enforceMaxLength method in HiveVarcharWritable calls 
> set(getHiveVarchar(), maxLength); and in HiveCharWritable set(getHiveChar(), 
> maxLength); no matter how long the string is. The calls to getHiveVarchar() 
> and getHiveChar() decode the string every time the method is called 
> (Text.toString() calls Text.decode). This can be very expensive and is 
> unnecessary if the string is shorter than maxLength for HiveVarcharWritable 
> or different than maxLength for HiveCharWritable.
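
The shortcut described above for the varchar case can be sketched without any Hive or Hadoop classes. This is an assumption-laden illustration, not the attached patch: EnforceMaxLengthSketch is hypothetical, the value is modeled as raw UTF-8 bytes, and length is counted in code points. The key observation is that every character takes at least one byte in UTF-8, so a value with at most maxLength bytes cannot exceed maxLength characters and needs no decoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the varchar case; it does not use Hadoop's Text
// or Hive's writable classes.
final class EnforceMaxLengthSketch {

    /** Returns the UTF-8 bytes, truncated to at most maxLength characters (code points). */
    static byte[] enforceMaxLength(byte[] utf8, int maxLength) {
        // Fast path: each character occupies at least one byte in UTF-8, so a
        // value of at most maxLength bytes cannot be too long. No decoding needed.
        if (utf8.length <= maxLength) {
            return utf8;
        }
        // Slow path: decode once and truncate by code points if necessary.
        String s = new String(utf8, StandardCharsets.UTF_8);
        if (s.codePointCount(0, s.length()) <= maxLength) {
            return utf8;
        }
        int end = s.offsetByCodePoints(0, maxLength);
        return s.substring(0, end).getBytes(StandardCharsets.UTF_8);
    }
}
```

The char case differs because values are padded to exactly maxLength, which is why the description says the work is unnecessary only when the length already equals maxLength; that case is not covered by this sketch.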





[jira] [Updated] (HIVE-14533) improve performance of enforceMaxLength in HiveCharWritable/HiveVarcharWritable

2016-08-12 Thread Thomas Friedrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Friedrich updated HIVE-14533:

Status: Patch Available  (was: Open)

> improve performance of enforceMaxLength in 
> HiveCharWritable/HiveVarcharWritable
> ---
>
> Key: HIVE-14533
> URL: https://issues.apache.org/jira/browse/HIVE-14533
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
>  Labels: performance
> Attachments: HIVE-14533.patch
>
>
> The enforceMaxLength method in HiveVarcharWritable calls 
> set(getHiveVarchar(), maxLength); and in HiveCharWritable set(getHiveChar(), 
> maxLength); no matter how long the string is. The calls to getHiveVarchar() 
> and getHiveChar() decode the string every time the method is called 
> (Text.toString() calls Text.decode). This can be very expensive and is 
> unnecessary if the string is shorter than maxLength for HiveVarcharWritable 
> or different than maxLength for HiveCharWritable.





[jira] [Comment Edited] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419662#comment-15419662
 ] 

Chao Sun edited comment on HIVE-14506 at 8/12/16 11:13 PM:
---

+1. Sorry didn't know class name starting with "Test" matters...


was (Author: csun):
+1

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14506.1.patch
>
>
> The test fails because there are no tests to be executed and the file name 
> starts with 'Test'





[jira] [Updated] (HIVE-14533) improve performance of enforceMaxLength in HiveCharWritable/HiveVarcharWritable

2016-08-12 Thread Thomas Friedrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Friedrich updated HIVE-14533:

Status: Open  (was: Patch Available)

> improve performance of enforceMaxLength in 
> HiveCharWritable/HiveVarcharWritable
> ---
>
> Key: HIVE-14533
> URL: https://issues.apache.org/jira/browse/HIVE-14533
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
>  Labels: performance
> Attachments: HIVE-14533.patch
>
>
> The enforceMaxLength method in HiveVarcharWritable calls 
> set(getHiveVarchar(), maxLength); and in HiveCharWritable set(getHiveChar(), 
> maxLength); no matter how long the string is. The calls to getHiveVarchar() 
> and getHiveChar() decode the string every time the method is called 
> (Text.toString() calls Text.decode). This can be very expensive and is 
> unnecessary if the string is no longer than maxLength for HiveVarcharWritable 
> or already exactly maxLength characters long for HiveCharWritable.





[jira] [Commented] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419662#comment-15419662
 ] 

Chao Sun commented on HIVE-14506:
-

+1

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14506.1.patch
>
>
> The test fails because there are no tests to be executed and the file name 
> starts with 'Test'





[jira] [Updated] (HIVE-14533) improve performance of enforceMaxLength in HiveCharWritable/HiveVarcharWritable

2016-08-12 Thread Thomas Friedrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Friedrich updated HIVE-14533:

Labels: performance  (was: )
Status: Patch Available  (was: Open)

> improve performance of enforceMaxLength in 
> HiveCharWritable/HiveVarcharWritable
> ---
>
> Key: HIVE-14533
> URL: https://issues.apache.org/jira/browse/HIVE-14533
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 2.1.0, 1.2.1
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
>  Labels: performance
> Attachments: HIVE-14533.patch
>
>
> The enforceMaxLength method in HiveVarcharWritable calls 
> set(getHiveVarchar(), maxLength); and in HiveCharWritable set(getHiveChar(), 
> maxLength); no matter how long the string is. The calls to getHiveVarchar() 
> and getHiveChar() decode the string every time the method is called 
> (Text.toString() calls Text.decode). This can be very expensive and is 
> unnecessary if the string is no longer than maxLength for HiveVarcharWritable 
> or already exactly maxLength characters long for HiveCharWritable.





[jira] [Commented] (HIVE-14533) improve performance of enforceMaxLength in HiveCharWritable/HiveVarcharWritable

2016-08-12 Thread Thomas Friedrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419657#comment-15419657
 ] 

Thomas Friedrich commented on HIVE-14533:
-

The patch adds a check to enforceMaxLength so that set() is only called when 
the value is longer than maxLength (varchar) or differs from it in length 
(char). This check can be done without decoding the string, so it avoids the 
unnecessary decoding of every value.

HiveVarcharWritable: if (value.getLength()>maxLength && 
getCharacterLength()>maxLength)
- value.getLength() is the number of bytes of the string
- maxLength is the max number of characters
For single-byte characters, the number of bytes equals the number of 
characters. For multi-byte characters, the number of characters is less than 
the number of bytes. If the number of bytes is not larger than maxLength, then 
the string has at most maxLength characters and we don't have to truncate it. 
If the number of bytes is larger than maxLength, we need to compare the 
character length with maxLength. We could just compare 
getCharacterLength()>maxLength in every case, but getCharacterLength calls 
getTextUtfLength, which takes more time than just comparing the byte length 
with maxLength.

HiveCharWritable: if (getCharacterLength()!=maxLength)
For char values, we can only compare the number of characters with 
maxLength, and if it's different we need to call set to enforce the right 
length. This ensures we get the padded value if the string is not long 
enough and the truncated value if it's longer. If we were to compare the bytes 
(value.getLength()) with maxLength, then it might not enforce the maxLength if 
multi-byte characters are involved.
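
As a rough illustration of the two checks described above, here is a minimal 
standalone sketch. It does not use the Hive or Hadoop classes: the class and 
method names are hypothetical, and utf8CharLength is only assumed to behave 
like getCharacterLength (counting UTF-8 start bytes without decoding).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the checks in the patch; not the Hive classes themselves.
final class MaxLengthCheckSketch {
    // Counts characters (code points) in UTF-8 bytes without decoding:
    // every byte that is not a continuation byte (10xxxxxx) starts a character.
    static int utf8CharLength(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    // varchar: the cheap byte-length test short-circuits the character count.
    static boolean varcharNeedsSet(byte[] utf8, int maxLength) {
        return utf8.length > maxLength && utf8CharLength(utf8) > maxLength;
    }

    // char: values are padded as well as truncated, so any difference matters.
    static boolean charNeedsSet(byte[] utf8, int maxLength) {
        return utf8CharLength(utf8) != maxLength;
    }

    public static void main(String[] args) {
        byte[] ascii = "hello".getBytes(StandardCharsets.UTF_8);         // 5 bytes, 5 characters
        byte[] accented = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 6 bytes, 5 characters
        System.out.println(varcharNeedsSet(ascii, 5));    // false
        System.out.println(varcharNeedsSet(accented, 5)); // false
        System.out.println(varcharNeedsSet(accented, 4)); // true
        System.out.println(charNeedsSet(accented, 6));    // true
    }
}
```

The last call shows why the char case cannot use the byte length: the value 
has 6 bytes, which matches maxLength, but only 5 characters, so it still 
needs padding.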



> improve performance of enforceMaxLength in 
> HiveCharWritable/HiveVarcharWritable
> ---
>
> Key: HIVE-14533
> URL: https://issues.apache.org/jira/browse/HIVE-14533
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
>  Labels: performance
> Attachments: HIVE-14533.patch
>
>
> The enforceMaxLength method in HiveVarcharWritable calls 
> set(getHiveVarchar(), maxLength); and in HiveCharWritable set(getHiveChar(), 
> maxLength); no matter how long the string is. The calls to getHiveVarchar() 
> and getHiveChar() decode the string every time the method is called 
> (Text.toString() calls Text.decode). This can be very expensive and is 
> unnecessary if the string is no longer than maxLength for HiveVarcharWritable 
> or already exactly maxLength characters long for HiveCharWritable.





[jira] [Updated] (HIVE-14533) improve performance of enforceMaxLength in HiveCharWritable/HiveVarcharWritable

2016-08-12 Thread Thomas Friedrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Friedrich updated HIVE-14533:

Attachment: HIVE-14533.patch

> improve performance of enforceMaxLength in 
> HiveCharWritable/HiveVarcharWritable
> ---
>
> Key: HIVE-14533
> URL: https://issues.apache.org/jira/browse/HIVE-14533
> Project: Hive
>  Issue Type: Improvement
>  Components: Serializers/Deserializers
>Affects Versions: 1.2.1, 2.1.0
>Reporter: Thomas Friedrich
>Assignee: Thomas Friedrich
>Priority: Minor
> Attachments: HIVE-14533.patch
>
>
> The enforceMaxLength method in HiveVarcharWritable calls 
> set(getHiveVarchar(), maxLength); and in HiveCharWritable set(getHiveChar(), 
> maxLength); no matter how long the string is. The calls to getHiveVarchar() 
> and getHiveChar() decode the string every time the method is called 
> (Text.toString() calls Text.decode). This can be very expensive and is 
> unnecessary if the string is no longer than maxLength for HiveVarcharWritable 
> or already exactly maxLength characters long for HiveCharWritable.





[jira] [Updated] (HIVE-14504) tez_join_hash.q test is slow

2016-08-12 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-14504:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks for the review! Committed to master.

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Fix For: 2.2.0
>
> Attachments: HIVE-14504.1.patch, HIVE-14504.1.patch, 
> HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets execution engine to mr which slows down 
> the entire test. Test takes around 7 mins. 





[jira] [Updated] (HIVE-14233) Improve vectorization for ACID by eliminating row-by-row stitching

2016-08-12 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14233:
-
Attachment: HIVE-14233.08.patch

Rebase with master after HIVE-14035 got committed. Also fixes some comments at 
RB.

> Improve vectorization for ACID by eliminating row-by-row stitching
> --
>
> Key: HIVE-14233
> URL: https://issues.apache.org/jira/browse/HIVE-14233
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions, Vectorization
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14233.01.patch, HIVE-14233.02.patch, 
> HIVE-14233.03.patch, HIVE-14233.04.patch, HIVE-14233.05.patch, 
> HIVE-14233.06.patch, HIVE-14233.07.patch, HIVE-14233.08.patch
>
>
> This JIRA proposes to improve vectorization for ACID by eliminating 
> row-by-row stitching when reading back ACID files. In the current 
> implementation, a vectorized row batch is created by populating the batch one 
> row at a time, before the vectorized batch is passed up along the operator 
> pipeline. This row-by-row stitching was needed because the ACID 
> insert/update/delete events from various delta files had to be merged 
> before the current version of a given row could be determined. 
> HIVE-14035 has enabled us to break away from that limitation by splitting 
> ACID update events into a combination of delete+insert. In fact, it has now 
> enabled us to create splits on delta files.
> Building on top of HIVE-14035, this JIRA proposes to remove this 
> bottleneck in the vectorized code path for ACID by reading row batches 
> directly from the underlying ORC files and avoiding any stitching altogether. 
> Once a row batch is read from the split (which may be on a base/delta file), 
> the deleted rows will be found by cross-referencing the batch against a data 
> structure that keeps track only of delete events (found in the 
> deleted_delta files). This will lead to a large performance gain when reading 
> ACID files in vectorized fashion, while enabling further optimizations 
> on top of it in the future.
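
The cross-referencing step described above could look roughly like the 
following sketch. It is only an illustration under simplifying assumptions: 
rows are identified by a single long id, the delete events are held in an 
in-memory set, and all names are hypothetical rather than taken from the 
patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; the real row identifiers and data structures differ.
final class DeleteFilterSketch {
    // Returns the positions of the rows in the batch whose ids are not deleted,
    // which is the kind of "selected" list a vectorized batch can carry.
    static int[] selectLiveRows(long[] batchRowIds, Set<Long> deletedRowIds) {
        int[] selected = new int[batchRowIds.length];
        int size = 0;
        for (int i = 0; i < batchRowIds.length; i++) {
            if (!deletedRowIds.contains(batchRowIds[i])) {
                selected[size++] = i;
            }
        }
        return Arrays.copyOf(selected, size);
    }

    public static void main(String[] args) {
        Set<Long> deleted = new HashSet<>(Arrays.asList(101L, 103L));
        long[] batch = {100L, 101L, 102L, 103L, 104L};
        System.out.println(Arrays.toString(selectLiveRows(batch, deleted))); // [0, 2, 4]
    }
}
```

The batch itself is never rebuilt row by row; only a list of surviving 
positions is produced.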





[jira] [Commented] (HIVE-14345) Beeline result table has erroneous characters

2016-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419627#comment-15419627
 ] 

Hive QA commented on HIVE-14345:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823533/HIVE-14345.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10463 tests 
executed
*Failed tests:*
{noformat}
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hive.hcatalog.listener.TestMsgBusConnection.testConnection
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/867/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/867/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-867/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823533 - PreCommit-HIVE-MASTER-Build

> Beeline result table has erroneous characters 
> --
>
> Key: HIVE-14345
> URL: https://issues.apache.org/jira/browse/HIVE-14345
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Jeremy Beard
>Assignee: Miklos Csanady
>Priority: Minor
> Attachments: HIVE-14345.3.patch, HIVE-14345.4.patch, HIVE-14345.patch
>
>
> Beeline returns query results with erroneous characters. For example:
> {code}
> 0: jdbc:hive2://:1/def> select 10;
> +--+--+
> | _c0  |
> +--+--+
> | 10   |
> +--+--+
> 1 row selected (3.207 seconds)
> {code}





[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-12 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14199:
-
Status: Patch Available  (was: Open)

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason is that bucket pruning happens at the 
> split generation level, and for ACID the delta files were traditionally 
> never split. The parallelism for ACID was therefore restricted to 
> the number of buckets. There would be as many splits as buckets, 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables the delta files to be split as well. This 
> means that we now have enough information at the split generation 
> level to determine the appropriate buckets to process for the delta files. 
> This allows us to prune unnecessary buckets for delta files and 
> should lead to a good performance gain for a large number of selective 
> queries on ACID tables.
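
A minimal sketch of the pruning idea at split generation, for an equality 
predicate on the bucketing column. The names are hypothetical, splits are 
reduced to their bucket ids, and the bucketing rule (non-negative hash 
modulo the number of buckets) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of bucket pruning; not the code from the patch.
final class BucketPruningSketch {
    // Assumed bucketing rule: non-negative hash modulo the bucket count.
    static int bucketFor(int keyHash, int numBuckets) {
        return (keyHash & Integer.MAX_VALUE) % numBuckets;
    }

    // Keeps only the splits (identified here by bucket id) that can hold the key.
    static List<Integer> pruneSplits(List<Integer> splitBucketIds, int keyHash, int numBuckets) {
        int wanted = bucketFor(keyHash, numBuckets);
        List<Integer> kept = new ArrayList<>();
        for (int bucketId : splitBucketIds) {
            if (bucketId == wanted) {
                kept.add(bucketId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Base-file splits for buckets 0..3 plus one delta split for bucket 2.
        List<Integer> splits = Arrays.asList(0, 1, 2, 3, 2);
        System.out.println(pruneSplits(splits, 42, 4)); // [2, 2]
    }
}
```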





[jira] [Updated] (HIVE-14199) Enable Bucket Pruning for ACID tables

2016-08-12 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14199:
-
Attachment: HIVE-14199.03.patch

Rebased with master after HIVE-14035 got committed. Submitting for Ptest.

> Enable Bucket Pruning for ACID tables
> -
>
> Key: HIVE-14199
> URL: https://issues.apache.org/jira/browse/HIVE-14199
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Attachments: HIVE-14199.01.patch, HIVE-14199.02.patch, 
> HIVE-14199.03.patch
>
>
> Currently, ACID tables do not benefit from the bucket pruning feature 
> introduced in HIVE-11525. The reason is that bucket pruning happens at the 
> split generation level, and for ACID the delta files were traditionally 
> never split. The parallelism for ACID was therefore restricted to 
> the number of buckets. There would be as many splits as buckets, 
> and each worker processing one split would inevitably read all the delta 
> files for that bucket, even when the query required only 
> one of the buckets to be read.
> However, HIVE-14035 now enables the delta files to be split as well. This 
> means that we now have enough information at the split generation 
> level to determine the appropriate buckets to process for the delta files. 
> This allows us to prune unnecessary buckets for delta files and 
> should lead to a good performance gain for a large number of selective 
> queries on ACID tables.





[jira] [Commented] (HIVE-12656) Turn hive.compute.query.using.stats on by default

2016-08-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419604#comment-15419604
 ] 

Pengcheng Xiong commented on HIVE-12656:


i do not think so. let's resubmit the patch and rerun?

> Turn hive.compute.query.using.stats on by default
> -
>
> Key: HIVE-12656
> URL: https://issues.apache.org/jira/browse/HIVE-12656
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12656.01.patch, HIVE-12656.02.patch, 
> HIVE-12656.03.patch, HIVE-12656.04.patch
>
>
> We now have hive.compute.query.using.stats=false by default. We plan to turn 
> it on by default so that we can have better performance. We can also set it 
> to false in some test cases to maintain the original purpose of those tests.





[jira] [Commented] (HIVE-12656) Turn hive.compute.query.using.stats on by default

2016-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419599#comment-15419599
 ] 

Ashutosh Chauhan commented on HIVE-12656:
-

[~pxiong] Any of the failures related?

> Turn hive.compute.query.using.stats on by default
> -
>
> Key: HIVE-12656
> URL: https://issues.apache.org/jira/browse/HIVE-12656
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12656.01.patch, HIVE-12656.02.patch, 
> HIVE-12656.03.patch, HIVE-12656.04.patch
>
>
> We now have hive.compute.query.using.stats=false by default. We plan to turn 
> it on by default so that we can have better performance. We can also set it 
> to false in some test cases to maintain the original purpose of those tests.





[jira] [Updated] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-12 Thread Saket Saurabh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saket Saurabh updated HIVE-14035:
-
Attachment: Design.Document.Improving ACID performance in Hive.01.docx

Initial version of the design document for reference that describes high level 
changes to ACID introduced by HIVE-14035, HIVE-14199 & HIVE-14233. 

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: Design.Document.Improving ACID performance in 
> Hive.01.docx, HIVE-14035.02.patch, HIVE-14035.03.patch, HIVE-14035.04.patch, 
> HIVE-14035.05.patch, HIVE-14035.06.patch, HIVE-14035.07.patch, 
> HIVE-14035.08.patch, HIVE-14035.09.patch, HIVE-14035.10.patch, 
> HIVE-14035.11.patch, HIVE-14035.12.patch, HIVE-14035.13.patch, 
> HIVE-14035.14.patch, HIVE-14035.15.patch, HIVE-14035.16.patch, 
> HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In the current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, which enables predicate pushdown to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that allows different versions of ACID transactions to 
> coexist.
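
The proposed split of an update into delete+insert can be pictured with the 
following sketch. The event model is hypothetical and much simpler than the 
real ACID event layout; in particular, the sketch reuses the same row id for 
the new insert purely for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical event model; the real ACID events carry more fields.
final class UpdateSplitSketch {
    enum Op { INSERT, UPDATE, DELETE }

    static final class Event {
        final Op op;
        final long rowId;
        final String value;

        Event(Op op, long rowId, String value) {
            this.op = op;
            this.rowId = rowId;
            this.value = value;
        }

        @Override
        public String toString() {
            return op + "(" + rowId + ")";
        }
    }

    // Rewrites every UPDATE as a DELETE of the old row followed by an INSERT
    // of the new value; all other events pass through unchanged.
    static List<Event> splitUpdates(List<Event> events) {
        List<Event> out = new ArrayList<>();
        for (Event e : events) {
            if (e.op == Op.UPDATE) {
                out.add(new Event(Op.DELETE, e.rowId, null));
                out.add(new Event(Op.INSERT, e.rowId, e.value));
            } else {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> events = Arrays.asList(
                new Event(Op.INSERT, 1, "a"),
                new Event(Op.UPDATE, 1, "b"),
                new Event(Op.DELETE, 2, null));
        System.out.println(splitUpdates(events)); // [INSERT(1), DELETE(1), INSERT(1), DELETE(2)]
    }
}
```

After the rewrite no event overwrites another, so a reader only ever sees 
inserts plus a separate stream of deletes, which is what makes pushing 
predicates down to the insert side safe.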





[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow

2016-08-12 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419579#comment-15419579
 ] 

Siddharth Seth commented on HIVE-14504:
---

+1

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14504.1.patch, HIVE-14504.1.patch, 
> HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets execution engine to mr which slows down 
> the entire test. Test takes around 7 mins. 





[jira] [Comment Edited] (HIVE-14458) change relative data refernces in qfiles

2016-08-12 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419562#comment-15419562
 ] 

Zoltan Haindrich edited comment on HIVE-14458 at 8/12/16 9:57 PM:
--

this might end up being an unneeded change... it looks to me like 
qtests containing relative references are working (for me at least - in 
eclipse, HIVE-14532 applied).

if this is not always true: please drop a comment here...


was (Author: kgyrtkirk):
this might possibly end-up as a wont needed change...it looks like to me that 
qtests containing relative references are working (for me at least - in 
eclipse).

if this is not always true: please drop a comment here...

> change relative data refernces in qfiles
> 
>
> Key: HIVE-14458
> URL: https://issues.apache.org/jira/browse/HIVE-14458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>
> there are many relative ({{../..}}) references inside qfiles and q.out files;
> because these references are heavily dependent on the current working directory,  
> these should be changed to
> * either use properties like {{test.data.dir}} or {{hive.root}} ...
> * or any other reliable method to access those files.
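
A small sketch of the property-based alternative: resolve data files against 
a directory taken from a system property instead of relying on the current 
working directory. The class name, the fallback argument and the example file 
name are hypothetical; only the property name {{test.data.dir}} comes from 
the description above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of the Hive test framework.
final class TestDataPathSketch {
    // Resolves a data file against the directory named by a system property,
    // falling back to a caller-supplied directory when the property is unset.
    static Path resolve(String property, String fallbackDir, String fileName) {
        String base = System.getProperty(property, fallbackDir);
        return Paths.get(base).resolve(fileName).normalize();
    }

    public static void main(String[] args) {
        System.setProperty("test.data.dir", "/tmp/hive-data");
        // The result no longer depends on where the JVM was started.
        System.out.println(resolve("test.data.dir", ".", "files/example.txt"));
    }
}
```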





[jira] [Commented] (HIVE-14458) change relative data refernces in qfiles

2016-08-12 Thread Zoltan Haindrich (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419562#comment-15419562
 ] 

Zoltan Haindrich commented on HIVE-14458:
-

this might end up being an unneeded change... it looks to me like 
qtests containing relative references are working (for me at least - in 
eclipse).

if this is not always true: please drop a comment here...

> change relative data refernces in qfiles
> 
>
> Key: HIVE-14458
> URL: https://issues.apache.org/jira/browse/HIVE-14458
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>
> there are many relative ({{../..}}) references inside qfiles and q.out files;
> because these references are heavily dependent on the current working directory,  
> these should be changed to
> * either use properties like {{test.data.dir}} or {{hive.root}} ...
> * or any other reliable method to access those files.





[jira] [Updated] (HIVE-14532) Enable qtests from IDE

2016-08-12 Thread Zoltan Haindrich (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Haindrich updated HIVE-14532:

Attachment: HIVE-14532.1.patch

> Enable qtests from IDE
> --
>
> Key: HIVE-14532
> URL: https://issues.apache.org/jira/browse/HIVE-14532
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Minor
> Attachments: HIVE-14532.1.patch
>
>
> with HIVE-1 applied, I've played around with executing qtests from 
> eclipse... after the patch seemed ok, I checked it with:
> {code}
> git clean -dfx
> mvn package install eclipse:eclipse -Pitests -DskipTests
> mvn -q test -Pitests -Dtest=TestNegativeCliDriver -Dqfile=combine2.q
> {code}
> the last step is, I think, not required... but I bootstrapped and checked my 
> project integrity this way.
> After this I was able to execute {{TestCliDriver}} from eclipse using 
> {{-Dqfile=combine.q}}; other qfiles may or may not work... but they will have 
> at least some chance of being usable.
> To my great surprise, {{alter_concatenate_indexed_table.q}} also 
> passed... it contains relative file references, and I suspected that it 
> would have issues with those.
> note: I have the datanucleus plugin installed... and I use it when I need to.





[jira] [Updated] (HIVE-14432) LLAP signing unit test may be timing-dependent

2016-08-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14432:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Committed to master

> LLAP signing unit test may be timing-dependent
> --
>
> Key: HIVE-14432
> URL: https://issues.apache.org/jira/browse/HIVE-14432
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0
>
> Attachments: HIVE-14432.patch
>
>
> Seems like it's possible for a slow background thread to roll the key after 
> we have signed with it.





[jira] [Updated] (HIVE-14479) Add some join tests for acid table

2016-08-12 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-14479:
-
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   1.3.0
   Status: Resolved  (was: Patch Available)

Committed to master, branch-2.1 and branch-1. Thanks Eugene for review!

> Add some join tests for acid table
> --
>
> Key: HIVE-14479
> URL: https://issues.apache.org/jira/browse/HIVE-14479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-14479.1.patch, HIVE-14479.2.patch, 
> HIVE-14479.3.patch, HIVE-14479.4.patch
>
>






[jira] [Updated] (HIVE-14478) Remove seemingly unused common/src/test/resources/core-site.xml

2016-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14478:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Zoltan!

> Remove seemingly unused common/src/test/resources/core-site.xml
> ---
>
> Key: HIVE-14478
> URL: https://issues.apache.org/jira/browse/HIVE-14478
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-14478.1.patch, HIVE-14478.1.patch
>
>
> this resource file confuses eclipse...
> its content advertises that it belongs to {{TestHiveConf}} ...which passes 
> without it...I think its removal will be painless





[jira] [Commented] (HIVE-14478) Remove seemingly unused common/src/test/resources/core-site.xml

2016-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419496#comment-15419496
 ] 

Ashutosh Chauhan commented on HIVE-14478:
-

+1

> Remove seemingly unused common/src/test/resources/core-site.xml
> ---
>
> Key: HIVE-14478
> URL: https://issues.apache.org/jira/browse/HIVE-14478
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Trivial
> Attachments: HIVE-14478.1.patch, HIVE-14478.1.patch
>
>
> this resource file confuses eclipse...
> its content advertises that it belongs to {{TestHiveConf}} ...which passes 
> without it...I think its removal will be painless





[jira] [Updated] (HIVE-14525) beeline still writing log data to stdout as of version 2.1.0

2016-08-12 Thread stephen sprague (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stephen sprague updated HIVE-14525:
---
Description: 
simple test. note that i'm looking to get a tsv file back.

{code}
$ beeline -u dwrdevnn1 --showHeader=false --outputformat=tsv2 <<SQL >stdout 2>stderr
> select count(*)
> from default.dual;
> SQL
{code}

instead i get this in stdout:

{code}
$ cat stdout
0: jdbc:hive2://dwrdevnn1.sv2.trulia.com:1000> select count(*)
. . . . . . . . . . . . . . . . . . . . . . .> from default.dual;
0
0: jdbc:hive2://dwrdevnn1.sv2.trulia.com:1000> 
{code}

i should only get one row which is the *result* of the query (which is 0) - not 
the loggy kind of lines you see above. that stuff goes to stderr my 
friends.

also i refer to this ticket b/c the last comment suggested so - it's close but 
not exactly the same.
https://issues.apache.org/jira/browse/HIVE-14183


  was:
simple test. note that i'm looking to get a tsv file back.

{code}
$ beeline -u dwrdevnn1 --showHeader=false --outputformat=tsv2 <<SQL >stdout 2>stderr
> select count(*)
> from default.dual;
> SQL
{code}

instead i get this in stdout:

{code}
$ cat stdout
0: jdbc:hive2://dwrdevnn1.sv2.trulia.com:1000> select count(*)
. . . . . . . . . . . . . . . . . . . . . . .> from default.dual;
0
0: jdbc:hive2://dwrdevnn1.sv2.trulia.com:1000> 
{code}

i should only get one row which is the *result* of the query (which is 0) - not 
the over loggy kind lines you see above. that stuff goes to stderr my friends.

also i refer to this ticket b/c the last comment suggested so - its close but 
not exactly the same.
https://issues.apache.org/jira/browse/HIVE-14183



> beeline still writing log data to stdout as of version 2.1.0
> 
>
> Key: HIVE-14525
> URL: https://issues.apache.org/jira/browse/HIVE-14525
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
>Reporter: stephen sprague
>
> simple test. note that i'm looking to get a tsv file back.
> {code}
> $ beeline -u dwrdevnn1 --showHeader=false --outputformat=tsv2 <<SQL >stdout 2>stderr
> > select count(*)
> > from default.dual;
> > SQL
> {code}
> instead i get this in stdout:
> {code}
> $ cat stdout
> 0: jdbc:hive2://dwrdevnn1.sv2.trulia.com:1000> select count(*)
> . . . . . . . . . . . . . . . . . . . . . . .> from default.dual;
> 0
> 0: jdbc:hive2://dwrdevnn1.sv2.trulia.com:1000> 
> {code}
> i should only get one row which is the *result* of the query (which is 0) - 
> not the over loggy kind of lines you see above. that stuff goes to stderr my 
> friends.
> also i refer to this ticket b/c the last comment suggested so - it's close but 
> not exactly the same.
> https://issues.apache.org/jira/browse/HIVE-14183
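
The complaint above comes down to which stream each kind of output is written to. A minimal Java sketch of that separation, assuming a toy client rather than Beeline's actual code (the class `StreamSplitSketch`, the method `runQuery`, and the prompt string are all hypothetical and exist only for illustration):

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

/** Toy client, not Beeline: result rows go to one stream, prompts and echo to another. */
final class StreamSplitSketch {

    /** Writes the statement echo and prompts to diag, and only result rows to data. */
    static void runQuery(String prompt, List<String> statementLines, List<String> resultRows,
                         PrintStream data, PrintStream diag) {
        for (String line : statementLines) {
            diag.println(prompt + "> " + line);   // echo of the statement: diagnostic output
        }
        for (String row : resultRows) {
            data.println(row);                    // result rows: all a TSV consumer should see
        }
        diag.println(prompt + "> ");              // trailing prompt: diagnostic output
    }

    public static void main(String[] args) {
        // Hypothetical prompt and canned result; no server is contacted.
        runQuery("0: jdbc:hive2://example",
                 Arrays.asList("select count(*)", "from default.dual;"),
                 Arrays.asList("0"),
                 System.out, System.err);
    }
}
```

With the streams split this way, redirecting standard output to a file would leave only the result row in it, which is the behaviour the reporter asks for.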



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14396:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Vineet!

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Fix For: 2.2.0
>
> Attachments: HIVE-14396.1.patch, HIVE-14396.2.patch
>
>
> Currently there are three different failures.
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) The first case is a wrong result for the following query:
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes. While looking up the 
> expression that corresponds to an aggregate function's argument, the wrong 
> index is used.
> 2) Out-of-bounds exception for the following:
> {code}
> set hive.map.aggr=false;
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting Calcite Aggregation to Hive's group by 
> operator.
> 3) Once the above exception is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that, 
> while creating the expression for an aggregate function's argument, we end up 
> with the wrong column info from the underlying reduce sink operator.





[jira] [Commented] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419492#comment-15419492
 ] 

Ashutosh Chauhan commented on HIVE-14396:
-

+1

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch, HIVE-14396.2.patch
>
>
> Currently there are three different failures.
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) The first case is a wrong result for the following query:
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes. While looking up the 
> expression that corresponds to an aggregate function's argument, the wrong 
> index is used.
> 2) Out-of-bounds exception for the following:
> {code}
> set hive.map.aggr=false;
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting Calcite Aggregation to Hive's group by 
> operator.
> 3) Once the above exception is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that, 
> while creating the expression for an aggregate function's argument, we end up 
> with the wrong column info from the underlying reduce sink operator.





[jira] [Updated] (HIVE-14433) refactor LLAP plan cache avoidance and fix issue in merge processor

2016-08-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14433:

   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed. Thanks for the review!

> refactor LLAP plan cache avoidance and fix issue in merge processor
> ---
>
> Key: HIVE-14433
> URL: https://issues.apache.org/jira/browse/HIVE-14433
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.2.0, 2.1.1
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14433.01.patch, HIVE-14433.02.patch, 
> HIVE-14433.03.patch, HIVE-14433.patch
>
>
> Map and reduce processors do this:
> {noformat}
> if (LlapProxy.isDaemon()) {
>   cache = new org.apache.hadoop.hive.ql.exec.mr.ObjectCache(); // do not cache plan
> ...
> {noformat}
> but the merge processor just gets the plan. If it runs in LLAP, it can get a 
> cached plan. Need to move this logic into ObjectCache itself, via an isPlan 
> arg or something. That will also fix this issue for the merge processor.
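
The idea of moving the check into the cache itself could look roughly like the sketch below. This is only an illustration under stated assumptions: `PlanAwareCacheSketch`, its `retrieve` signature, and the `runningInDaemon` flag (standing in for `LlapProxy.isDaemon()`) are hypothetical and are not Hive's ObjectCache API or the committed patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for the cache discussed above; not Hive's ObjectCache API. */
final class PlanAwareCacheSketch {
    private final Map<String, Object> shared = new ConcurrentHashMap<>();
    private final boolean runningInDaemon;   // assumption: stands in for LlapProxy.isDaemon()

    PlanAwareCacheSketch(boolean runningInDaemon) {
        this.runningInDaemon = runningInDaemon;
    }

    /** Returns a cached value, except that plans are always rebuilt when running in the daemon. */
    @SuppressWarnings("unchecked")
    <T> T retrieve(String key, boolean isPlan, Supplier<T> loader) {
        if (isPlan && runningInDaemon) {
            return loader.get();             // bypass the shared cache for plans
        }
        // Everything else is loaded once and then served from the shared map.
        return (T) shared.computeIfAbsent(key, k -> loader.get());
    }
}
```

Because the decision sits inside `retrieve`, every caller that passes `isPlan = true` gets the same behaviour, so a processor that previously skipped the check would no longer need its own copy of it.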





[jira] [Commented] (HIVE-14478) Remove seemingly unused common/src/test/resources/core-site.xml

2016-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419445#comment-15419445
 ] 

Hive QA commented on HIVE-14478:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823426/HIVE-14478.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10462 tests 
executed
*Failed tests:*
{noformat}
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver_orc_llap_counters
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hive.hcatalog.listener.TestMsgBusConnection.testConnection
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/866/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/866/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-866/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823426 - PreCommit-HIVE-MASTER-Build

> Remove seemingly unused common/src/test/resources/core-site.xml
> ---
>
> Key: HIVE-14478
> URL: https://issues.apache.org/jira/browse/HIVE-14478
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
>Priority: Trivial
> Attachments: HIVE-14478.1.patch, HIVE-14478.1.patch
>
>
> this resource file confuses eclipse...
> its content advertises that it belongs to {{TestHiveConf}} ...which passes 
> without it...I think its removal will be painless





[jira] [Updated] (HIVE-14345) Beeline result table has erroneous characters

2016-08-12 Thread Miklos Csanady (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Csanady updated HIVE-14345:
--
Attachment: HIVE-14345.4.patch

Formatting fixes

> Beeline result table has erroneous characters 
> --
>
> Key: HIVE-14345
> URL: https://issues.apache.org/jira/browse/HIVE-14345
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Jeremy Beard
>Assignee: Miklos Csanady
>Priority: Minor
> Attachments: HIVE-14345.3.patch, HIVE-14345.4.patch, HIVE-14345.patch
>
>
> Beeline returns query results with erroneous characters. For example:
> {code}
> 0: jdbc:hive2://:1/def> select 10;
> +--+--+
> | _c0  |
> +--+--+
> | 10   |
> +--+--+
> 1 row selected (3.207 seconds)
> {code}





[jira] [Assigned] (HIVE-14521) codahale metrics exceptions

2016-08-12 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta reassigned HIVE-14521:
---

Assignee: Vaibhav Gumashta

> codahale metrics exceptions
> ---
>
> Key: HIVE-14521
> URL: https://issues.apache.org/jira/browse/HIVE-14521
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Vaibhav Gumashta
>
> On some random setup, I see bazillions of errors like this in HS2 log:
> {noformat}
> 2016-08-08 04:52:18,619 WARN  [HiveServer2-Handler-Pool: Thread-101]: log.PerfLogger (PerfLogger.java:beginMetrics(226)) - Error recording metrics
> java.io.IOException: Scope named api_Driver.run is not closed, cannot be opened.
>     at org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope.open(CodahaleMetrics.java:133)
>     at org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics.startStoredScope(CodahaleMetrics.java:220)
>     at org.apache.hadoop.hive.ql.log.PerfLogger.beginMetrics(PerfLogger.java:223)
>     at org.apache.hadoop.hive.ql.log.PerfLogger.PerfLogBegin(PerfLogger.java:143)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:378)
>     at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:320)
>     at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1214)
>     at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1208)
>     at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
>     at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:226)
>     at org.apache.hive.service.cli.operation.Operation.run(Operation.java:276)
>     at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:468)
>     at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:456)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> {noformat}
> I suspect that either, just like the metastore deadline, this needs better 
> error handling when whatever the metrics surround fails; or, it is just not 
> thread safe.
> But I actually haven't looked at the code yet.
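
One of the two suspicions above, a scope left open when the surrounded work fails, can be illustrated with a small sketch. It is a guess at the failure mode, not an analysis of the real code: `ScopeSketch` and `timed` are hypothetical names, the exception type is simplified to an unchecked one, and thread safety is deliberately not addressed.

```java
/** Hypothetical metrics scope: it refuses to be opened twice, like the error in the log above. */
final class ScopeSketch {
    private final String name;
    private boolean open;

    ScopeSketch(String name) {
        this.name = name;
    }

    void open() {
        if (open) {
            // Same wording as the logged message; an unchecked exception is used for brevity.
            throw new IllegalStateException("Scope named " + name + " is not closed, cannot be opened.");
        }
        open = true;
    }

    void close() {
        open = false;
    }

    boolean isOpen() {
        return open;
    }

    /** Runs the work inside the scope and closes the scope even if the work throws. */
    static void timed(ScopeSketch scope, Runnable work) {
        scope.open();
        try {
            work.run();
        } finally {
            scope.close();   // without this, a failed run would poison every later open()
        }
    }
}
```

If the close were skipped on failure, every later attempt to open the same named scope would fail in the way the log shows, which would be consistent with seeing the error many times after a single underlying failure.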





[jira] [Commented] (HIVE-14444) Upgrade qtest execution framework to junit4 - migrate most of them

2016-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419419#comment-15419419
 ] 

Ashutosh Chauhan commented on HIVE-14444:
-

Sounds fine to me. [~pvary] What do you think?

> Upgrade qtest execution framework to junit4 - migrate most of them
> --
>
> Key: HIVE-14444
> URL: https://issues.apache.org/jira/browse/HIVE-14444
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14444.1.patch, HIVE-14444.2.patch, 
> HIVE-14444.3.patch, HIVE-14444.4.patch
>
>
> this is the second step..migrating all existing qtestgen generated tests to 
> junit4
> it might be possible that not all will get migrated in this ticket...I will 
> leave out the problematic ones...





[jira] [Updated] (HIVE-14418) Hive config validation prevents unsetting the settings

2016-08-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14418:

Attachment: HIVE-14418.02.patch

Trying again.

> Hive config validation prevents unsetting the settings
> --
>
> Key: HIVE-14418
> URL: https://issues.apache.org/jira/browse/HIVE-14418
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14418.01.patch, HIVE-14418.02.patch, 
> HIVE-14418.patch
>
>
> {noformat}
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=;
> Query returned non-zero code: 1, cause: 'SET hive.tez.task.scale.memory.reserve.fraction.max=' FAILED because hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=null;
> Query returned non-zero code: 1, cause: 'SET hive.tez.task.scale.memory.reserve.fraction.max=null' FAILED because hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> {noformat}
> unset also doesn't work.
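
The behaviour being asked for can be sketched as a validator that treats an empty value as "remove the override" instead of type-checking it. This is a toy under stated assumptions, not HiveConf and not necessarily what the attached patches do: `UnsetAwareConfSketch`, `setFloat`, and `get` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical config holder: an empty value unsets a key instead of failing validation. */
final class UnsetAwareConfSketch {
    private final Map<String, String> overrides = new HashMap<>();

    /** Sets a FLOAT-typed key; null or blank input removes any existing override. */
    void setFloat(String key, String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            overrides.remove(key);           // "set key=;" is treated as an unset
            return;
        }
        String value = raw.trim();
        try {
            Float.parseFloat(value);         // type validation only applies to real values
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(key + " expects FLOAT type value.", e);
        }
        overrides.put(key, value);
    }

    /** Returns the override if one is set, otherwise the supplied default. */
    String get(String key, String defaultValue) {
        return overrides.getOrDefault(key, defaultValue);
    }
}
```

In this sketch a non-numeric value such as `null` is still rejected; whether the literal string `null` should also count as an unset is a design choice the ticket does not settle.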





[jira] [Commented] (HIVE-14418) Hive config validation prevents unsetting the settings

2016-08-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419411#comment-15419411
 ] 

Sergey Shelukhin commented on HIVE-14418:
-

I cannot repro these... 

> Hive config validation prevents unsetting the settings
> --
>
> Key: HIVE-14418
> URL: https://issues.apache.org/jira/browse/HIVE-14418
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14418.01.patch, HIVE-14418.02.patch, 
> HIVE-14418.patch
>
>
> {noformat}
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=;
> Query returned non-zero code: 1, cause: 'SET hive.tez.task.scale.memory.reserve.fraction.max=' FAILED because hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> hive> set hive.tez.task.scale.memory.reserve.fraction.max=null;
> Query returned non-zero code: 1, cause: 'SET hive.tez.task.scale.memory.reserve.fraction.max=null' FAILED because hive.tez.task.scale.memory.reserve.fraction.max expects FLOAT type value.
> {noformat}
> unset also doesn't work.





[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: Patch Available  (was: In Progress)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug is usually 
> exposed when working with data at scale, because in most other cases only 
> the BI split strategy is chosen. My guess is that this happens because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 'java.lang.RuntimeException(ORC split generation failed with exception: java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NegativeArraySizeException
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> 

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Fix Version/s: 2.1.1
   2.2.0

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Status: In Progress  (was: Patch Available)

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>

[jira] [Updated] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14448:

Attachment: HIVE-14448.04.patch

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug is usually 
> exposed when working with data at scale, because in most other cases only 
> the BI split strategy is chosen. My guess is that this happens because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> org.apache.hadoop.hive.ql.TestTxnCommands2.testETLSplitStrategyForACID(TestTxnCommands2.java:280)
>

[jira] [Updated] (HIVE-14345) Beeline result table has erroneous characters

2016-08-12 Thread Miklos Csanady (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miklos Csanady updated HIVE-14345:
--
Attachment: HIVE-14345.3.patch

Checkstyle fixes

> Beeline result table has erroneous characters 
> --
>
> Key: HIVE-14345
> URL: https://issues.apache.org/jira/browse/HIVE-14345
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 1.1.0, 2.2.0
>Reporter: Jeremy Beard
>Assignee: Miklos Csanady
>Priority: Minor
> Attachments: HIVE-14345.3.patch, HIVE-14345.patch
>
>
> Beeline returns query results with erroneous characters. For example:
> {code}
> 0: jdbc:hive2://:1/def> select 10;
> +------+--+
> | _c0  |
> +------+--+
> | 10   |
> +------+--+
> 1 row selected (3.207 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419386#comment-15419386
 ] 

Matt McCline commented on HIVE-14448:
-

Ok, more review comment changes.

ALSO NOTE: For general style improvements there is HIVE-14354 "Cleanup ORC 
reader interfaces and redundant metadata objects", too.

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.04.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because in most other cases only the 
> BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL
>         + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> 

[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419377#comment-15419377
 ] 

Eugene Koifman commented on HIVE-14448:
---

Usually "isOriginal" means that the base file in the split is from before the 
table was converted to ACID.  This means that the file itself doesn't have any 
ACID meta columns, so they need to be injected on the fly.

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because in most other cases only the 
> BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at 
> 

[jira] [Commented] (HIVE-14479) Add some join tests for acid table

2016-08-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419371#comment-15419371
 ] 

Eugene Koifman commented on HIVE-14479:
---

+1

> Add some join tests for acid table
> --
>
> Key: HIVE-14479
> URL: https://issues.apache.org/jira/browse/HIVE-14479
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-14479.1.patch, HIVE-14479.2.patch, 
> HIVE-14479.3.patch, HIVE-14479.4.patch
>
>






[jira] [Updated] (HIVE-14342) Beeline output is garbled when executed from a remote shell

2016-08-12 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-14342:
---
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   Status: Resolved  (was: Patch Available)

Committed to 2.2.0 and 2.1.1, thanks [~ngangam] for the patch and 
[~mohitsabharwal] for review.

> Beeline output is garbled when executed from a remote shell
> ---
>
> Key: HIVE-14342
> URL: https://issues.apache.org/jira/browse/HIVE-14342
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14342.2.patch, HIVE-14342.patch, HIVE-14342.patch
>
>
> {code}
> use default;
> create table clitest (key int, name String, value String);
> insert into table clitest values 
> (1,"TRUE","1"),(2,"TRUE","1"),(3,"TRUE","1"),(4,"TRUE","1"),(5,"FALSE","0"),(6,"FALSE","0"),(7,"FALSE","0");
> {code}
> then run a select query
> {code} 
> # cat /tmp/select.sql 
> set hive.execution.engine=mr;
> select key,name,value 
> from clitest 
> where value="1" limit 1;
> {code}
> Then run beeline via a remote shell, for example
> {code}
> $ ssh -l root  "sudo -u hive beeline -u 
> jdbc:hive2://localhost:1 -n hive -p hive --silent=true 
> --outputformat=csv2 -f /tmp/select.sql" 
> root@'s password: 
> 16/07/12 14:59:22 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree 
> module jar containing PrefixTreeCodec is not present.  Continuing without it.
> nullkey,name,value 
> 1,TRUE,1
> null   
> $
> {code}
> In older releases, the output is as follows:
> {code}
> $ ssh -l root  "sudo -u hive beeline -u 
> jdbc:hive2://localhost:1 -n hive -p hive --silent=true 
> --outputformat=csv2 -f /tmp/run.sql" 
> Are you sure you want to continue connecting (yes/no)? yes
> root@'s password: 
> 16/07/12 14:57:55 WARN mapreduce.TableMapReduceUtil: The hbase-prefix-tree 
> module jar containing PrefixTreeCodec is not present.  Continuing without it.
> key,name,value
> 1,TRUE,1
> $
> {code}
> The output contains nulls instead of blank lines. This is due to the use of 
> -Djline.terminal=jline.UnsupportedTerminal, introduced in HIVE-6758 to make it 
> possible to run beeline as a background process. The nulls are an unfortunate 
> side effect of that fix.
> Running beeline in the background also produces garbled output.
> {code}
> # beeline -u "jdbc:hive2://localhost:1" -n hive -p hive --silent=true 
> --outputformat=csv2 --showHeader=false -f /tmp/run.sql 2>&1 > 
> /tmp/beeline.txt &
> # cat /tmp/beeline.txt 
> null1,TRUE,1   
> #
> {code}
> So I think the use of jline.UnsupportedTerminal should be documented but not 
> used automatically by beeline under the covers.





[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419312#comment-15419312
 ] 

Sergey Shelukhin commented on HIVE-14448:
-

{noformat}
+  // File included has ACID columns, so always pass true for isOriginal.
{noformat}
What is the logic behind this? Original is used to determine root column 
IIRC... so if ACID always pretends to be original (which it isn't), causing 
rootColumn to become 0, this argument becomes useless... is it used for any 
other purpose? I wonder if these methods should just take root column at the 
top to avoid confusion.
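
To make the root-column question concrete, here is a minimal sketch of how an "isOriginal" flag could map to a root column. The class, the method names and the six-field event layout are assumptions for illustration (modeled on the usual ACID event struct), not the code in the patch. It also assumes ORC-style pre-order column ids and primitive user columns only.

```java
public class RootColumnSketch {
    // Assumed ACID event layout: five metadata fields followed by the user row struct.
    public static final String[] ACID_EVENT_FIELDS = {
        "operation", "originalTransaction", "bucket", "rowId", "currentTransaction", "row"
    };

    // Position of the "row" field inside the assumed event struct.
    public static final int ROW_FIELD_INDEX = 5;

    /**
     * Column id of the struct that holds the user columns.
     * Original (pre-ACID) file: the top-level struct itself, id 0.
     * ACID file: the nested "row" struct, which follows the top-level
     * struct (id 0) and the five metadata columns (ids 1..5).
     */
    public static int rootColumn(boolean isOriginal) {
        return isOriginal ? 0 : ROW_FIELD_INDEX + 1;
    }

    /** Column id of the i-th user column (0-based), assuming primitive columns. */
    public static int columnId(boolean isOriginal, int userColumnIndex) {
        return rootColumn(isOriginal) + 1 + userColumnIndex;
    }

    public static void main(String[] args) {
        // Column "a" of a table (a, b) in an original file versus an ACID file.
        System.out.println("original: " + columnId(true, 0));
        System.out.println("acid:     " + columnId(false, 0));
    }
}
```

Under these assumptions, passing isOriginal=true for a file that does carry the ACID columns would resolve "a" to column id 1 rather than 7, which is why taking the root column directly might be less confusing.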

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because in most other cases only the 
> BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>  @Test
>   public void testETLSplitStrategyForACID() throws Exception {
> hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
> hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
> runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
> runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
> runWorker(hiveConf);
> List rs = runStatementOnDriver("select * from " +  Table.ACIDTBL  
> + " where a = 1");
> int[][] resultData = new int[][] {{1,2}};
> Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at 
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>  

[jira] [Updated] (HIVE-14445) upgrade maven surefire to 2.19.1

2016-08-12 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-14445:

   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Zoltan!

> upgrade maven surefire to 2.19.1
> 
>
> Key: HIVE-14445
> URL: https://issues.apache.org/jira/browse/HIVE-14445
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 2.2.0
>
> Attachments: HIVE-14445.1.patch
>
>
> Newer Maven Surefire has a great feature:
> * it is possible to select test methods by regular expressions, and there are 
> also improvements in using '#' to address test methods
> I've looked into this earlier. The upgrade is "almost" seamless: I'm 
> already using 2.19.1, but the Spark modules don't really like the empty 
> spark.home variable





[jira] [Commented] (HIVE-14035) Enable predicate pushdown to delta files created by ACID Transactions

2016-08-12 Thread Saket Saurabh (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419309#comment-15419309
 ] 

Saket Saurabh commented on HIVE-14035:
--

Thanks [~ekoifman] and [~sershe] for the review.

> Enable predicate pushdown to delta files created by ACID Transactions
> -
>
> Key: HIVE-14035
> URL: https://issues.apache.org/jira/browse/HIVE-14035
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Saket Saurabh
>Assignee: Saket Saurabh
> Fix For: 2.2.0
>
> Attachments: HIVE-14035.02.patch, HIVE-14035.03.patch, 
> HIVE-14035.04.patch, HIVE-14035.05.patch, HIVE-14035.06.patch, 
> HIVE-14035.07.patch, HIVE-14035.08.patch, HIVE-14035.09.patch, 
> HIVE-14035.10.patch, HIVE-14035.11.patch, HIVE-14035.12.patch, 
> HIVE-14035.13.patch, HIVE-14035.14.patch, HIVE-14035.15.patch, 
> HIVE-14035.16.patch, HIVE-14035.17.patch, HIVE-14035.patch
>
>
> In current Hive version, delta files created by ACID transactions do not 
> allow predicate pushdown if they contain any update/delete events. This is 
> done to preserve correctness when following a multi-version approach during 
> event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete 
> event followed by a new insert event, that can enable predicate push down to 
> all delta files without breaking correctness. To support backward 
> compatibility for this feature, this JIRA also proposes to add some sort of 
> versioning to ACID that can allow different versions of ACID transactions to 
> co-exist together.
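
The proposed split of update events can be sketched as follows. This is only an illustration of the idea in the description: the Event and Op types and the splitUpdates helper are hypothetical, and the new insert reuses the old row id purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitUpdateSketch {
    public enum Op { INSERT, UPDATE, DELETE }

    /** Hypothetical delta event: an operation on a row id, with an optional value. */
    public static final class Event {
        public final Op op;
        public final long rowId;
        public final String value; // null for DELETE

        public Event(Op op, long rowId, String value) {
            this.op = op;
            this.rowId = rowId;
            this.value = value;
        }

        @Override
        public String toString() {
            return op + "(" + rowId + (value == null ? "" : "," + value) + ")";
        }
    }

    /** Rewrites every UPDATE as a DELETE of the old row followed by an INSERT of the new one. */
    public static List<Event> splitUpdates(List<Event> events) {
        List<Event> out = new ArrayList<>();
        for (Event e : events) {
            if (e.op == Op.UPDATE) {
                out.add(new Event(Op.DELETE, e.rowId, null));
                out.add(new Event(Op.INSERT, e.rowId, e.value));
            } else {
                out.add(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> delta = Arrays.asList(
            new Event(Op.INSERT, 1L, "a"),
            new Event(Op.UPDATE, 1L, "b"));
        System.out.println(splitUpdates(delta));
    }
}
```

Once no event overwrites another in place, a predicate on the value only ever filters insert events, while delete events can still be matched by row id, which is the property the description relies on.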





[jira] [Commented] (HIVE-14445) upgrade maven surefire to 2.19.1

2016-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419306#comment-15419306
 ] 

Ashutosh Chauhan commented on HIVE-14445:
-

+1

> upgrade maven surefire to 2.19.1
> 
>
> Key: HIVE-14445
> URL: https://issues.apache.org/jira/browse/HIVE-14445
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14445.1.patch
>
>
> Newer Maven Surefire has a great feature:
> * it is possible to select test methods by regular expressions, and there are 
> also improvements in using '#' to address test methods
> I've looked into this earlier. The upgrade is "almost" seamless: I'm 
> already using 2.19.1, but the Spark modules don't really like the empty 
> spark.home variable





[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow

2016-08-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419302#comment-15419302
 ] 

Prasanth Jayachandran commented on HIVE-14504:
--

I reverted the addendum commit. I will look at the next test run to see if that 
succeeds. 

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14504.1.patch, HIVE-14504.1.patch, 
> HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets execution engine to mr which slows down 
> the entire test. Test takes around 7 mins. 





[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow

2016-08-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419300#comment-15419300
 ] 

Prasanth Jayachandran commented on HIVE-14504:
--

The problem is that the test doesn't fail if I run it individually.

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14504.1.patch, HIVE-14504.1.patch, 
> HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets execution engine to mr which slows down 
> the entire test. Test takes around 7 mins. 





[jira] [Commented] (HIVE-14432) LLAP signing unit test may be timing-dependent

2016-08-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419295#comment-15419295
 ] 

Prasanth Jayachandran commented on HIVE-14432:
--

+1

> LLAP signing unit test may be timing-dependent
> --
>
> Key: HIVE-14432
> URL: https://issues.apache.org/jira/browse/HIVE-14432
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14432.patch
>
>
> Seems like it's possible for slow background thread to roll the key after we 
> have signed with it.
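
A deterministic stand-in for the suspected race, as a rough sketch: the class below is hypothetical and is not the LLAP signer. The explicit rollKey() call plays the role of the slow background thread, and the key material is made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class KeyRollSketch {
    private final Map<Integer, byte[]> keys = new HashMap<>();
    private int currentKeyId = 0;

    public KeyRollSketch() {
        rollKey(); // start with key id 1
    }

    /** Installs a new current key; older keys stay available by id. */
    public synchronized void rollKey() {
        currentKeyId++;
        keys.put(currentKeyId, ("secret-" + currentKeyId).getBytes(StandardCharsets.UTF_8));
    }

    public synchronized int currentKeyId() {
        return currentKeyId;
    }

    /** HMAC-SHA256 of the message under the key with the given id. */
    public synchronized byte[] sign(int keyId, String message) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(keys.get(keyId), "HmacSHA256"));
            return mac.doFinal(message.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public boolean verify(int keyId, String message, byte[] signature) {
        return MessageDigest.isEqual(sign(keyId, message), signature);
    }

    public static void main(String[] args) {
        KeyRollSketch signer = new KeyRollSketch();
        int signedWith = signer.currentKeyId();
        byte[] sig = signer.sign(signedWith, "payload");

        signer.rollKey(); // the key is rolled after signing

        // Checking against whatever key is current at verification time fails...
        System.out.println(signer.verify(signer.currentKeyId(), "payload", sig));
        // ...while checking against the key id captured at signing time succeeds.
        System.out.println(signer.verify(signedWith, "payload", sig));
    }
}
```

A test that compares against the current key rather than the key id recorded at signing time would pass or fail depending on when the roll happens, which matches the timing dependence suspected here.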





[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow

2016-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419296#comment-15419296
 ] 

Ashutosh Chauhan commented on HIVE-14504:
-

The test was unstable in a few previous runs, so I added that addendum. You may 
overwrite it to stabilize this. 

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14504.1.patch, HIVE-14504.1.patch, 
> HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets execution engine to mr which slows down 
> the entire test. Test takes around 7 mins. 





[jira] [Commented] (HIVE-14445) upgrade maven surefire to 2.19.1

2016-08-12 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419294#comment-15419294
 ] 

Hive QA commented on HIVE-14445:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12823412/HIVE-14445.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10421 tests 
executed
*Failed tests:*
{noformat}
TestQueryLifeTimeHook - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_load_dyn_part1
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_join_hash
org.apache.hive.hcatalog.listener.TestMsgBusConnection.testConnection
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/865/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/865/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-865/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12823412 - PreCommit-HIVE-MASTER-Build

> upgrade maven surefire to 2.19.1
> 
>
> Key: HIVE-14445
> URL: https://issues.apache.org/jira/browse/HIVE-14445
> Project: Hive
>  Issue Type: Sub-task
>  Components: Tests
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-14445.1.patch
>
>
> Newer Maven Surefire has a great feature:
> * it is possible to select test methods by regular expressions, and there are 
> also improvements in using '#' to address test methods
> I've looked into this earlier. The upgrade is "almost" seamless: I'm 
> already using 2.19.1, but the Spark modules don't really like the empty 
> spark.home variable





[jira] [Commented] (HIVE-14448) Queries with predicate fail when ETL split strategy is chosen for ACID tables

2016-08-12 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419291#comment-15419291
 ] 

Matt McCline commented on HIVE-14448:
-

Test failures are not related.  Ready for final code review [~sershe].  Thanks

> Queries with predicate fail when ETL split strategy is chosen for ACID tables
> -
>
> Key: HIVE-14448
> URL: https://issues.apache.org/jira/browse/HIVE-14448
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Saket Saurabh
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14448.01.patch, HIVE-14448.02.patch, 
> HIVE-14448.03.patch, HIVE-14448.patch
>
>
> When the ETL split strategy is applied to ACID tables with predicate pushdown 
> (SARG enabled), split generation fails for ACID. This bug will usually be 
> exposed when working with data at scale, because in most other cases only 
> the BI split strategy is chosen. My guess is that this is happening because the 
> correct readerSchema is not being picked up when we try to extract SARG 
> column names.
> The quickest way to reproduce is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testETLSplitStrategyForACID() throws Exception {
>     // Force the ETL split strategy and enable predicate pushdown (SARG).
>     hiveConf.setVar(HiveConf.ConfVars.HIVE_ORC_SPLIT_STRATEGY, "ETL");
>     hiveConf.setBoolVar(HiveConf.ConfVars.HIVEOPTINDEXFILTER, true);
>     runStatementOnDriver("insert into " + Table.ACIDTBL + " values(1,2)");
>     runStatementOnDriver("alter table " + Table.ACIDTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     List<String> rs = runStatementOnDriver("select * from " + Table.ACIDTBL
>         + " where a = 1");
>     int[][] resultData = new int[][] {{1,2}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
> Back-trace for this failed test is as follows:
> {code}
> exec.Task: Job Submission failed with exception 
> 'java.lang.RuntimeException(ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException)'
> java.lang.RuntimeException: ORC split generation failed with exception: 
> java.lang.NegativeArraySizeException
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1570)
>   at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1656)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:370)
>   at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:488)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:329)
>   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:321)
>   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:197)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1297)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1294)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1294)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
>   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:417)
>   at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:141)
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
>   at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1962)
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1653)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1389)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1131)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1119)
>   at org.apache.hadoop.hive.ql.TestTxnCommands2.runStatementOnDriver(TestTxnCommands2.java:1292)
>   at 
> 

[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow

2016-08-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419287#comment-15419287
 ] 

Prasanth Jayachandran commented on HIVE-14504:
--

https://github.com/apache/hive/commit/09272628e908d42882fd0eaea63f7520aceea341 
this addendum patch is causing tez_join_hash.q to fail. [~ashutoshc] Is this 
addendum patch required? I don't see this failing locally on master. 

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14504.1.patch, HIVE-14504.1.patch, 
> HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets execution engine to mr which slows down 
> the entire test. Test takes around 7 mins. 





[jira] [Commented] (HIVE-14432) LLAP signing unit test may be timing-dependent

2016-08-12 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419288#comment-15419288
 ] 

Sergey Shelukhin commented on HIVE-14432:
-

[~prasanth_j] [~jdere] ping?

> LLAP signing unit test may be timing-dependent
> --
>
> Key: HIVE-14432
> URL: https://issues.apache.org/jira/browse/HIVE-14432
> Project: Hive
>  Issue Type: Bug
>  Components: Test
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14432.patch
>
>
> Seems like it's possible for a slow background thread to roll the key after we 
> have signed with it.
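
One way such a timing dependency can arise, sketched in Java. This is a minimal illustration and not the LLAP signing code: the class, the toy "signature", and every method name are hypothetical. The background roll is modelled as an explicit `rollKey()` call so the sketch stays deterministic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; this is not the LLAP signing code.
class KeyRollSketch {
    private final Map<Integer, String> keysById = new HashMap<>();
    private int currentKeyId = 0;

    KeyRollSketch() {
        keysById.put(currentKeyId, "secret-0");
    }

    // Stand-in for the background thread that rolls the key.
    void rollKey() {
        currentKeyId++;
        keysById.put(currentKeyId, "secret-" + currentKeyId);
    }

    int currentKeyId() {
        return currentKeyId;
    }

    // Toy "signature": message plus key material, only to show which key was used.
    String sign(String message) {
        return message + "|" + keysById.get(currentKeyId);
    }

    // Fragile check: assumes the current key is still the one used for signing.
    boolean verifyWithCurrentKey(String message, String signature) {
        return signature.equals(message + "|" + keysById.get(currentKeyId));
    }

    // Sturdier check: looks the key up by the id recorded at signing time.
    boolean verifyWithKeyId(String message, String signature, int keyId) {
        String key = keysById.get(keyId);
        return key != null && signature.equals(message + "|" + key);
    }

    public static void main(String[] args) {
        KeyRollSketch signer = new KeyRollSketch();
        int idAtSigning = signer.currentKeyId();
        String signature = signer.sign("payload");
        signer.rollKey(); // the roll happens after signing
        System.out.println(signer.verifyWithCurrentKey("payload", signature));
        System.out.println(signer.verifyWithKeyId("payload", signature, idAtSigning));
    }
}
```

A test that compares against the current key only passes if no roll happens between signing and checking, which is the kind of ordering a slow background thread can break.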





[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Status: Patch Available  (was: Open)

Addressed review comments

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch, HIVE-14396.2.patch
>
>
> Currently there are three different failures
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) First case is wrong result for following query
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes. While looking for 
> the expression corresponding to an aggregate function's argument, the wrong index is 
> being used.
> 2) Out-of-bounds exception for the following
> {code}
> set hive.map.aggr=false;
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting Calcite Aggregation to Hive's group-by 
> operator.
> 3) Once the above case with the exception is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that while 
> creating the expression for the aggregate function's argument we end up with the wrong 
> column info from the underlying reduce sink operator. 





[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Status: Open  (was: Patch Available)

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch, HIVE-14396.2.patch
>
>
> Currently there are three different failures
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) First case is wrong result for following query
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes. While looking for 
> the expression corresponding to an aggregate function's argument, the wrong index is 
> being used.
> 2) Out-of-bounds exception for the following
> {code}
> set hive.map.aggr=false;
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting Calcite Aggregation to Hive's group-by 
> operator.
> 3) Once the above case with the exception is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that while 
> creating the expression for the aggregate function's argument we end up with the wrong 
> column info from the underlying reduce sink operator. 





[jira] [Updated] (HIVE-14396) CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver count.q failure

2016-08-12 Thread Vineet Garg (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vineet Garg updated HIVE-14396:
---
Attachment: HIVE-14396.2.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): TestCliDriver 
> count.q failure
> ---
>
> Key: HIVE-14396
> URL: https://issues.apache.org/jira/browse/HIVE-14396
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-14396.1.patch, HIVE-14396.2.patch
>
>
> Currently there are three different failures
> Set hive.cbo.returnpath.hiveop=true for all cases.
> 1) First case is wrong result for following query
> {code:title=failure 1 Wrong result}
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> This occurs due to a bug in HiveCalciteUtil.getExprNodes. While looking for 
> the expression corresponding to an aggregate function's argument, the wrong index is 
> being used.
> 2) Out-of-bounds exception for the following
> {code}
> set hive.map.aggr=false;
> explain select count(1), count(*), count(a), count(b), count(c), count(d), 
> count(distinct a), count(distinct b), count(distinct c), count(distinct d), 
> count(distinct a,b), count(distinct b,c), count(distinct c,d), count(distinct 
> a,d), count(distinct a,c), count(distinct b,d), count(distinct a,b,c), 
> count(distinct b,c,d), count(distinct a,c,d), count(distinct a,b,d), 
> count(distinct a,b,c,d) from abcd;
> {code}
> The above happens while converting Calcite Aggregation to Hive's group-by 
> operator.
> 3) Once the above case with the exception is fixed, the same query with 
> hive.map.aggr=false gives wrong results. The problem in this case is that while 
> creating the expression for the aggregate function's argument we end up with the wrong 
> column info from the underlying reduce sink operator. 





[jira] [Updated] (HIVE-14526) HadoopMetrics2Reporter logs way, way too much on INFO level

2016-08-12 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-14526:

Resolution: Invalid
Status: Resolved  (was: Patch Available)

Makes sense.. the build was using the old version of the plugin

> HadoopMetrics2Reporter logs way, way too much on INFO level
> ---
>
> Key: HIVE-14526
> URL: https://issues.apache.org/jira/browse/HIVE-14526
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-14526.patch
>
>
> {noformat}
> # grep -c HadoopMetrics2Reporter hiveserver2.log.2016-08-11
> 547524076
> # grep -c . hiveserver2.log.2016-08-11
> 548430185
> # ll hiveserver2.log.2016-08-11
> -rw-r--r-- 1 hive hadoop 204695432463 Aug 11 23:59 hiveserver2.log.2016-08-11
> {noformat}
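
To put the counts above in proportion, here is a minimal Java sketch that redoes the arithmetic on the three numbers from the report. The class and method names are illustrative only. Note that `grep -c .` counts non-empty lines, so the share is relative to those.

```java
// Illustrative helper; only the three input numbers come from the report above.
class LogVolumeSketch {
    // Share of matching lines, as a percentage of all counted lines.
    static double percent(long matching, long total) {
        return 100.0 * matching / total;
    }

    // Average size of one counted line in bytes.
    static double bytesPerLine(long bytes, long lines) {
        return (double) bytes / lines;
    }

    // File size expressed in GiB (2^30 bytes).
    static double gib(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long reporterLines = 547_524_076L; // grep -c HadoopMetrics2Reporter
        long nonEmptyLines = 548_430_185L; // grep -c .
        long fileBytes = 204_695_432_463L; // size shown by ll

        System.out.printf("share of lines: %.2f%%%n", percent(reporterLines, nonEmptyLines));
        System.out.printf("avg bytes/line: %.1f%n", bytesPerLine(fileBytes, nonEmptyLines));
        System.out.printf("file size: %.1f GiB%n", gib(fileBytes));
    }
}
```

On these inputs the reporter accounts for roughly 99.8% of the non-empty lines in a file of about 190 GiB.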





[jira] [Comment Edited] (HIVE-14362) Support explain analyze in Hive

2016-08-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419249#comment-15419249
 ] 

Gopal V edited comment on HIVE-14362 at 8/12/16 6:25 PM:
-

[~pxiong]: the branch + counter to profile was abandoned earlier due to known 
performance issues. 

https://issues.apache.org/jira/browse/HIVE-4318?focusedCommentId=13629957&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13629957

The crucial test is how much this extra code impacts perf when it is disabled.


was (Author: gopalv):
[~pxiong]: this approach was abandoned earlier due to known performance issues 
- 
https://issues.apache.org/jira/browse/HIVE-4318?focusedCommentId=13629957&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13629957

> Support explain analyze in Hive
> ---
>
> Key: HIVE-14362
> URL: https://issues.apache.org/jira/browse/HIVE-14362
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14362.01.patch
>
>
> Right now all the explain levels only support stats before query runs. We 
> would like to have an explain analyze similar to Postgres for real stats 
> after query runs. This will help to identify the major gap between 
> estimated/real stats and make not only query optimization better but also 
> query performance debugging easier.





[jira] [Commented] (HIVE-14513) Enhance custom query feature in LDAP atn to support resultset of ldap groups

2016-08-12 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419259#comment-15419259
 ] 

Naveen Gangam commented on HIVE-14513:
--

Thanks Chaoyu for review and the commit.

> Enhance custom query feature in LDAP atn to support resultset of ldap groups
> 
>
> Key: HIVE-14513
> URL: https://issues.apache.org/jira/browse/HIVE-14513
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.0.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14513.patch
>
>
> The LDAP Authenticator can be configured to use a result set from an LDAP query to 
> authenticate. However, it is expected that this LDAP query would only return 
> a set of users (i.e., full DNs for the users in LDAP).
> However, it's not always straightforward to author queries that 
> return users. For example, say you would like to allow "all users from group1 
> and group2" to be authenticated. The LDAP query has to return a union of all 
> members of group1 and group2.
> For example, one common configuration is that a group contains a list of its 
> users
>   "dn: uid=group1,ou=Groups,dc=example,dc=com",
>   "distinguishedName: uid=group1,ou=Groups,dc=example,dc=com",
>   "objectClass: top",
>   "objectClass: groupOfNames",
>   "objectClass: ExtensibleObject",
>   "cn: group1",
>   "ou: Groups",
>   "sn: group1",
>   "member: uid=user1,ou=People,dc=example,dc=com",
> The query 
> {{(&(objectClass=groupOfNames)(|(cn=group1)(cn=group2)))}}
> will return the entries
> uid=group1,ou=Groups,dc=example,dc=com
> uid=group2,ou=Groups,dc=example,dc=com
> but there is no means to form a query that would return just the values of the 
> "member" attributes. (LDAP client tools are able to do this by filtering out the 
> attributes on these entries.)
> So it would be useful to have support for specifying queries that 
> return groups.
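
The client-side step described above (take the group entries a query returns and collect their "member" values) can be sketched in plain Java. This is a hedged illustration, not the HiveServer2 implementation: entries are modelled as simple attribute maps rather than JNDI objects, the class and method names are made up, and `user2` is a hypothetical second member.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model: each LDAP entry is an attribute-name -> values map.
class GroupMemberSketch {
    // Collect the distinct values of the "member" attribute across all group entries.
    static Set<String> memberDns(List<Map<String, List<String>>> groupEntries) {
        Set<String> users = new LinkedHashSet<>();
        for (Map<String, List<String>> entry : groupEntries) {
            users.addAll(entry.getOrDefault("member", Collections.emptyList()));
        }
        return users;
    }

    public static void main(String[] args) {
        Map<String, List<String>> group1 = new HashMap<>();
        group1.put("cn", Collections.singletonList("group1"));
        group1.put("member", Collections.singletonList("uid=user1,ou=People,dc=example,dc=com"));

        Map<String, List<String>> group2 = new HashMap<>();
        group2.put("cn", Collections.singletonList("group2"));
        group2.put("member", Arrays.asList(
                "uid=user1,ou=People,dc=example,dc=com",
                "uid=user2,ou=People,dc=example,dc=com")); // user2 is hypothetical

        // Union of members of group1 and group2; duplicates collapse in the set.
        System.out.println(memberDns(Arrays.asList(group1, group2)));
    }
}
```

The resulting set of user DNs is what an authenticator would then compare the connecting user against.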





[jira] [Commented] (HIVE-14504) tez_join_hash.q test is slow

2016-08-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419252#comment-15419252
 ] 

Prasanth Jayachandran commented on HIVE-14504:
--

tez_join_hash.q failure is not related. It happens on master as well. [~sseth] 
Can you please review? I will see if tez_join_hash.q failure is just a golden 
file update caused by another patch.

> tez_join_hash.q test is slow
> 
>
> Key: HIVE-14504
> URL: https://issues.apache.org/jira/browse/HIVE-14504
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.2.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-14504.1.patch, HIVE-14504.1.patch, 
> HIVE-14504.1.patch
>
>
> tez_join_hash.q also explicitly sets execution engine to mr which slows down 
> the entire test. Test takes around 7 mins. 





[jira] [Commented] (HIVE-14362) Support explain analyze in Hive

2016-08-12 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419249#comment-15419249
 ] 

Gopal V commented on HIVE-14362:


[~pxiong]: this approach was abandoned earlier due to known performance issues 
- 
https://issues.apache.org/jira/browse/HIVE-4318?focusedCommentId=13629957&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13629957

> Support explain analyze in Hive
> ---
>
> Key: HIVE-14362
> URL: https://issues.apache.org/jira/browse/HIVE-14362
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14362.01.patch
>
>
> Right now all the explain levels only support stats before query runs. We 
> would like to have an explain analyze similar to Postgres for real stats 
> after query runs. This will help to identify the major gap between 
> estimated/real stats and make not only query optimization better but also 
> query performance debugging easier.





[jira] [Updated] (HIVE-14362) Support explain analyze in Hive

2016-08-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14362:
---
Status: Patch Available  (was: Open)

> Support explain analyze in Hive
> ---
>
> Key: HIVE-14362
> URL: https://issues.apache.org/jira/browse/HIVE-14362
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14362.01.patch
>
>
> Right now all the explain levels only support stats before query runs. We 
> would like to have an explain analyze similar to Postgres for real stats 
> after query runs. This will help to identify the major gap between 
> estimated/real stats and make not only query optimization better but also 
> query performance debugging easier.





[jira] [Comment Edited] (HIVE-14362) Support explain analyze in Hive

2016-08-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419242#comment-15419242
 ] 

Pengcheng Xiong edited comment on HIVE-14362 at 8/12/16 6:16 PM:
-

ccing [~gopalv], will do a performance test and upload the design doc soon.


was (Author: pxiong):
ccing [~gopalv], will do a performance test soon.

> Support explain analyze in Hive
> ---
>
> Key: HIVE-14362
> URL: https://issues.apache.org/jira/browse/HIVE-14362
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14362.01.patch
>
>
> Right now all the explain levels only support stats before query runs. We 
> would like to have an explain analyze similar to Postgres for real stats 
> after query runs. This will help to identify the major gap between 
> estimated/real stats and make not only query optimization better but also 
> query performance debugging easier.





[jira] [Updated] (HIVE-14362) Support explain analyze in Hive

2016-08-12 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14362:
---
Attachment: HIVE-14362.01.patch

Initial patch; there are still some problems with UpdateDelete, columnStats, Index, Masking, and 
anything involving a new context.

> Support explain analyze in Hive
> ---
>
> Key: HIVE-14362
> URL: https://issues.apache.org/jira/browse/HIVE-14362
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14362.01.patch
>
>
> Right now all the explain levels only support stats before query runs. We 
> would like to have an explain analyze similar to Postgres for real stats 
> after query runs. This will help to identify the major gap between 
> estimated/real stats and make not only query optimization better but also 
> query performance debugging easier.





[jira] [Commented] (HIVE-14362) Support explain analyze in Hive

2016-08-12 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419242#comment-15419242
 ] 

Pengcheng Xiong commented on HIVE-14362:


ccing [~gopalv], will do a performance test soon.

> Support explain analyze in Hive
> ---
>
> Key: HIVE-14362
> URL: https://issues.apache.org/jira/browse/HIVE-14362
> Project: Hive
>  Issue Type: New Feature
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14362.01.patch
>
>
> Right now all the explain levels only support stats before query runs. We 
> would like to have an explain analyze similar to Postgres for real stats 
> after query runs. This will help to identify the major gap between 
> estimated/real stats and make not only query optimization better but also 
> query performance debugging easier.





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14506:
-
Status: Patch Available  (was: Open)

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14506.1.patch
>
>
> The test fails because there are no tests to be executed





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14506:
-
Attachment: HIVE-14506.1.patch

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14506.1.patch
>
>
> The test fails because there are no tests to be executed





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14506:
-
Description: The test fails because there are no tests to be executed and 
the file name starts with  'Test'  (was: The test fails because there are no 
tests to be executed)

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-14506.1.patch
>
>
> The test fails because there are no tests to be executed and the file name 
> starts with  'Test'
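
The failure mode described here (a class whose name starts with 'Test' but which declares no test methods) can be checked for with a little reflection. The sketch below is an assumption-laden illustration: it uses a local stand-in `@Test` annotation so it has no JUnit dependency, and the two sample classes and all method names are hypothetical.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

class EmptyTestClassSketch {
    // Local stand-in for a test annotation so the sketch has no JUnit dependency.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Test {}

    // Named like a test class, but it declares no test methods.
    static class TestHelperHook {
        public void run() {}
    }

    // Named like a test class and declares one test method.
    static class TestWithCases {
        @Test
        public void testSomething() {}
    }

    // A runner that selects classes by name prefix would pick this class up.
    static boolean nameLooksLikeTest(Class<?> c) {
        return c.getSimpleName().startsWith("Test");
    }

    // True if any declared method carries an annotation named "Test".
    static boolean hasTestMethods(Class<?> c) {
        for (Method m : c.getDeclaredMethods()) {
            for (Annotation a : m.getAnnotations()) {
                if (a.annotationType().getSimpleName().equals("Test")) {
                    return true;
                }
            }
        }
        return false;
    }

    // Selected by name, yet nothing to execute.
    static boolean pickedUpButEmpty(Class<?> c) {
        return nameLooksLikeTest(c) && !hasTestMethods(c);
    }

    public static void main(String[] args) {
        System.out.println(pickedUpButEmpty(TestHelperHook.class));
        System.out.println(pickedUpButEmpty(TestWithCases.class));
    }
}
```

Renaming such a helper so it no longer matches the test-name pattern, or giving it real test methods, would both make the check return false.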





[jira] [Commented] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-08-12 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419237#comment-15419237
 ] 

Prasanth Jayachandran commented on HIVE-12181:
--

Looks like tez_dynpart_hashjoin_1.q is related. Is there a jira to track the 
failure?

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Fix For: 2.2.0
>
> Attachments: HIVE-12181.1.patch, HIVE-12181.10.patch, 
> HIVE-12181.12.patch, HIVE-12181.13.patch, HIVE-12181.15.patch, 
> HIVE-12181.2.patch, HIVE-12181.3.patch, HIVE-12181.4.patch, 
> HIVE-12181.7.patch, HIVE-12181.8.patch, HIVE-12181.9.patch, HIVE-12181.patch, 
> HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.





[jira] [Commented] (HIVE-14257) CBO: Push Join through Groupby to trigger shuffle reductions

2016-08-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419219#comment-15419219
 ] 

Ashutosh Chauhan commented on HIVE-14257:
-

[~gopalv] I am not sure what optimization you are suggesting here. For the 
query in the description, the predicate {{d_date_sk = 1}} should be pushed to the TS, and if 
it is, there will be no shuffle of unnecessary tuples. Is that not 
happening?

> CBO: Push Join through Groupby to trigger shuffle reductions
> 
>
> Key: HIVE-14257
> URL: https://issues.apache.org/jira/browse/HIVE-14257
> Project: Hive
>  Issue Type: Improvement
>  Components: CBO
>Reporter: Gopal V
>
> Similar to the optimizations already in Hive which push aggregates through a 
> join (hive.transpose.aggr.join=true).
> {code}
> select count(v) from (select d_year, count(ss_item_sk) as v from store_sales, 
> date_dim where ss_sold_date_sk=d_Date_sk group by d_year) w, date_dim d where 
> d.d_year = w.d_year and d_date_sk = 1;
> {code}
> currently produces an entire aggregate of all years before discarding all of 
> that (because, obviously, there's no data for d_date_sk=1).
> This particular example is a simplified version of TPC-DS Query59's join 
> condition, which can have a reduction in scans by applying the d_month_seq 
> between 1185 and 1185 + 11 into the wss alias.
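
The effect being asked for can be illustrated with a toy Java model. This is not Hive or Calcite code: the class, the method names, and the sample years are invented for illustration. It compares aggregating every year and then letting the join discard groups against applying the join's year filter before aggregating.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model of the query shape in the description; all names are illustrative.
class JoinThroughGroupBySketch {
    // Count sales per year; if keepYears is non-null, rows for other years are
    // skipped before they reach the aggregation.
    static Map<Integer, Long> countByYear(List<Integer> saleYears, Set<Integer> keepYears) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (int year : saleYears) {
            if (keepYears == null || keepYears.contains(year)) {
                counts.merge(year, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Current plan shape: aggregate every year, then let the join discard groups.
    static Map<Integer, Long> aggregateThenJoin(List<Integer> saleYears, Set<Integer> joinYears) {
        Map<Integer, Long> all = countByYear(saleYears, null);
        all.keySet().retainAll(joinYears);
        return all;
    }

    // Suggested plan shape: apply the join's year filter first, then aggregate.
    static Map<Integer, Long> joinThenAggregate(List<Integer> saleYears, Set<Integer> joinYears) {
        return countByYear(saleYears, joinYears);
    }

    public static void main(String[] args) {
        List<Integer> saleYears = Arrays.asList(1998, 1998, 1999, 2000, 2000, 2000);
        Set<Integer> joinYears = Collections.singleton(1999);
        System.out.println(aggregateThenJoin(saleYears, joinYears));
        System.out.println(joinThenAggregate(saleYears, joinYears));
    }
}
```

Both methods return the same groups; the second simply never builds the groups that the join would throw away, which is where the shuffle reduction would come from.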





[jira] [Updated] (HIVE-14506) TestQueryLifeTimeHook fail

2016-08-12 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-14506:
-
Description: The test fails because there are no tests to be executed  
(was: The test hangs locally.)

> TestQueryLifeTimeHook fail
> --
>
> Key: HIVE-14506
> URL: https://issues.apache.org/jira/browse/HIVE-14506
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
>
> The test fails because there are no tests to be executed




