[jira] [Updated] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction

2017-04-29 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-16553:
--
Labels: TODOC3.0  (was: )

> Change default value for hive.tez.bigtable.minsize.semijoin.reduction
> ----------------------------------------------------------------------
>
> Key: HIVE-16553
> URL: https://issues.apache.org/jira/browse/HIVE-16553
> Project: Hive
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Jason Dere
>Assignee: Jason Dere
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-16553.1.patch
>
>
> The current value is 1M rows; we would like to bump this up to make sure we are 
> not creating semijoin optimizations on dimension tables. Having too many 
> semijoin optimizations can cause serialized execution of tasks, because many 
> tasks end up waiting for the semijoin reductions to be computed.
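For reference, the setting can be overridden per deployment in hive-site.xml; the value below is illustrative only, not the new default chosen by this patch:

```xml
<!-- Illustrative override of the semijoin-reduction threshold; pick a value
     large enough that dimension tables never qualify as the "big" table. -->
<property>
  <name>hive.tez.bigtable.minsize.semijoin.reduction</name>
  <value>100000000</value>
</property>
```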



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-16557) Vectorization: Specialize ReduceSink empty key case

2017-04-29 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-16557:

Attachment: HIVE-16557.01.patch.tar.gz

Unable to attach the actual patch due to an infrastructure issue 
(https://issues.apache.org/jira/browse/INFRA-14051), so I attached a 
compressed version of the patch instead.

NOTE: Patch #01 also contains the changes for 
https://issues.apache.org/jira/browse/HIVE-16541.

> Vectorization: Specialize ReduceSink empty key case
> ---------------------------------------------------
>
> Key: HIVE-16557
> URL: https://issues.apache.org/jira/browse/HIVE-16557
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-16557.01.patch.tar.gz
>
>
> Gopal pointed out that native Vectorization of ReduceSink is missing the 
> empty key case.





[jira] [Commented] (HIVE-15160) Can't order by an unselected column

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989904#comment-15989904
 ] 

Hive QA commented on HIVE-15160:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865657/HIVE-15160.12.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 44 failed/errored test(s), 10637 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask]
 (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_vc] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[regex_col] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_json_tuple] 
(batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_parse_url_tuple] 
(batchId=69)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_groupby]
 (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_gby] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_limit] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_gby] 
(batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_limit]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_semijoin]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_semijoin]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3]
 (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[selectDistinctStar]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[special_character_in_tabnames_1]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin]
 (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_date_1]
 (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_2]
 (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_round]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets_grouping]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets_limit]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_1]
 (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_arithmetic]
 (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_null_projection]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_short_regress]
 (batchId=151)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query66] 
(batchId=229)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join8] 
(batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cbo_gby] 
(batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cbo_limit] 
(batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cbo_semijoin] 
(batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[dynamic_rdd_cache] 
(batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join8] (batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[limit_pushdown] 
(batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[pcr] (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_25] 
(batchId=102)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] 
(batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress]
 (batchId=118)
org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testStoreMultiTables 
(batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4946/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4946/console
Test logs: 

[jira] [Commented] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989887#comment-15989887
 ] 

Hive QA commented on HIVE-12636:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864848/HIVE-12636.17.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10627 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_blobstore_to_blobstore]
 (batchId=237)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4945/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4945/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4945/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12864848 - PreCommit-HIVE-Build

> Ensure that all queries (with DbTxnManager) run in a transaction
> ----------------------------------------------------------------
>
> Key: HIVE-12636
> URL: https://issues.apache.org/jira/browse/HIVE-12636
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, 
> HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, 
> HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, 
> HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, 
> HIVE-12636.17.patch
>
>
> Assuming Hive is using DbTxnManager.
> Currently (as of this writing, only auto-commit mode is supported), only 
> queries that write to an ACID table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes the internal structures confusing/odd: there are constantly two 
> code paths to deal with, which is inconvenient and error prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track the locks it 
> acquired, which enables further improvements on the metastore side of ACID.
> # Add a metastore call that does openTxn() and acquireLocks() in a single 
> call, to make sure performance doesn't degrade for read-only queries. (It 
> would also be useful for auto-commit write queries.)
> # Should read-only queries generate txn ids from the same sequence? (They 
> could, for example, use negative values of a different sequence.) The txn id 
> is part of the delta/base file name, and it is currently 7 digits. If we use 
> the same sequence, we'll exceed 7 digits faster (a possible upgrade issue). 
> On the other hand, there is value in being able to pick the txn id and commit 
> timestamp out of the same logical sequence.
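A hypothetical sketch of item 1's combined call, with an in-memory stand-in for the metastore. The names (`openTxnAndAcquireLocks`, `TxnLockResponse`) are illustrative only, not the actual metastore API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Response carrying both the txn id and the lock handles from one round trip.
class TxnLockResponse {
    final long txnId;
    final List<String> lockIds;
    TxnLockResponse(long txnId, List<String> lockIds) {
        this.txnId = txnId;
        this.lockIds = lockIds;
    }
}

class InMemoryTxnManager {
    private final AtomicLong txnSeq = new AtomicLong(0);

    // One metastore round trip instead of openTxn() followed by acquireLocks(),
    // so read-only queries pay no extra latency for getting a txn id.
    TxnLockResponse openTxnAndAcquireLocks(String user, List<String> resources) {
        long txnId = txnSeq.incrementAndGet();
        List<String> locks = new ArrayList<>();
        for (String r : resources) {
            locks.add("lock-" + txnId + "-" + r);
        }
        return new TxnLockResponse(txnId, locks);
    }
}
```

With a single call, the txn id itself becomes the handle for all acquired locks, so the client no longer has to track them individually.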





[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989874#comment-15989874
 ] 

Hive QA commented on HIVE-16484:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865595/HIVE-16484.7.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] 
(batchId=234)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4944/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4944/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4944/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865595 - PreCommit-HIVE-Build

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> --------------------------------------------------------------------
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, 
> HIVE-16484.3.patch, HIVE-16484.4.patch, HIVE-16484.5.patch, 
> HIVE-16484.6.patch, HIVE-16484.7.patch
>
>
> {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS session --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}}, which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners
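A rough sketch of what the in-process launch path could look like, assuming the spark-launcher artifact is on the classpath; the app resource path and main class below are illustrative, not what the patch actually wires up:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

class LauncherSketch {
    static SparkAppHandle launch() throws Exception {
        return new SparkLauncher()
            .setAppResource("/path/to/hive-exec.jar")   // illustrative path
            .setMainClass("org.apache.hive.spark.client.RemoteDriver")
            .setMaster("yarn")
            // startApplication() launches in-process and returns a handle,
            // replacing the forked bin/spark-submit child process.
            .startApplication(new SparkAppHandle.Listener() {
                @Override public void stateChanged(SparkAppHandle h) {
                    System.out.println("Spark app state: " + h.getState());
                }
                @Override public void infoChanged(SparkAppHandle h) { }
            });
    }
}
```

The returned {{SparkAppHandle}} can then be polled (or listened to) for state transitions instead of parsing spark-submit's output.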





[jira] [Commented] (HIVE-16143) Improve msck repair batching

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989862#comment-15989862
 ] 

Hive QA commented on HIVE-16143:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865599/HIVE-16143.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10647 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[create_like] 
(batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_batchsize] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repair] (batchId=32)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4943/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4943/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4943/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865599 - PreCommit-HIVE-Build

> Improve msck repair batching
> ----------------------------
>
> Key: HIVE-16143
> URL: https://issues.apache.org/jira/browse/HIVE-16143
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, 
> HIVE-16143.03.patch
>
>
> Currently, the {{msck repair table}} command batches the partitions it 
> creates in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. 
> The following snippet shows the batching logic. There are a couple of 
> possible improvements to it:
> {noformat}
> int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
> if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>   int counter = 0;
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     counter++;
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>     if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>       db.createPartitions(apd);
>       apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>     }
>   }
> } else {
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>   }
>   db.createPartitions(apd);
> }
> } catch (Exception e) {
>   LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>   repairOutput.clear();
>   msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
> }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding 
> partitions one by one, which is almost always very slow. A user can easily 
> increase the batch size to make the command run faster, yet end up with worse 
> performance because the code falls back to one-by-one adds. Users are then 
> expected to find a batch size tuned to their environment. The code could 
> handle this better by exponentially decaying the batch size instead of 
> falling straight back to one-by-one adds.
> 2. The other issue with this implementation is that if, say, the first batch 
> succeeds and the second one fails, the code retries all the partitions one by 
> one, irrespective of whether some of them were already added successfully. If 
> we need to fall back to one-by-one adds, we should at least skip the ones we 
> know for sure were already added.
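A minimal sketch of the decaying-batch idea from point 1 (which also addresses point 2 by never re-adding items that already succeeded). `addBatch` is a stand-in for {{db.createPartitions(apd)}}; all names are illustrative, not the patch's actual code:

```java
import java.util.List;
import java.util.function.Consumer;

class DecayingBatcher {
    /**
     * Adds all items, halving the batch size on each failure instead of
     * dropping straight to one-by-one adds. Items already committed by a
     * successful batch are never retried. Returns the number of items added.
     */
    static <T> int addAll(List<T> items, int initialBatch, Consumer<List<T>> addBatch) {
        int batch = Math.max(1, initialBatch);
        int added = 0;
        while (added < items.size()) {
            List<T> slice = items.subList(added, Math.min(added + batch, items.size()));
            try {
                addBatch.accept(slice);
                added += slice.size();   // only advance past confirmed adds
            } catch (RuntimeException e) {
                if (batch == 1) {
                    throw e;             // even single adds fail: give up
                }
                batch = batch / 2;       // exponential decay, retry same slice start
            }
        }
        return added;
    }
}
```

Starting from a too-aggressive batch size, the cost of finding a workable size is logarithmic in the initial batch size rather than linear in the partition count.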





[jira] [Commented] (HIVE-16488) Support replicating into existing db if the db is empty

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989848#comment-15989848
 ] 

Hive QA commented on HIVE-16488:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865591/HIVE-16488.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10636 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4942/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4942/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4942/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865591 - PreCommit-HIVE-Build

> Support replicating into existing db if the db is empty
> -------------------------------------------------------
>
> Key: HIVE-16488
> URL: https://issues.apache.org/jira/browse/HIVE-16488
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, Replication
> Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
>
> This is a potential use case where a user may want to manually create a db on 
> the destination to make sure it goes to a certain dir root, or where the db 
> (default, for instance) was created automatically. We should still allow 
> replicating into such a db without failing, as long as the db is empty.





[jira] [Commented] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989836#comment-15989836
 ] 

Hive QA commented on HIVE-16527:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865594/HIVE-16527.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10631 tests 
executed
*Failed tests:*
{noformat}
TestHs2Hooks - did not produce a TEST-*.xml file (likely timed out) 
(batchId=214)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=236)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4941/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4941/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4941/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865594 - PreCommit-HIVE-Build

> Support outer and mixed reference aggregates in windowed functions
> ------------------------------------------------------------------
>
> Key: HIVE-16527
> URL: https://issues.apache.org/jira/browse/HIVE-16527
> Project: Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Reporter: Remus Rusanu
>Assignee: Remus Rusanu
> Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch, 
> HIVE-16527.03.patch
>
>
> {noformat}
> select sum(sum(c1)) over() from e011_01;
> select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by 
> e011_01.c1, e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, 
> e011_01.c2;
> select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) 
> from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, 
> e011_03.c2;
> select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order 
> by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by 
> e011_03.c2, e011_01.c2;
> {noformat}
> We fail to generate a plan for any of the above. The issue is that in 
> {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we 
> ignore all children except the last (the window spec child). Additionally, 
> the typecheck processor is not prepared to encounter UDAF expressions 
> ({{TypeCheckProcFactory.DefaultExprProcessor.validateUDF}}, 
> {{getXpathOrFuncExprNodeDesc}}).





[jira] [Commented] (HIVE-16487) Serious Zookeeper exception is logged when a race condition happens

2017-04-29 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989833#comment-15989833
 ] 

Peter Vary commented on HIVE-16487:
---

The failures are not related:
- There is a patch waiting for commit which will solve 
TestBeeLineDriver.testCliDriver[smb_mapjoin_2]. See HIVE-16451 - Race 
condition between HiveStatement.getQueryLog and HiveStatement.runAsyncOnServer.
- I created a new flaky-test jira, since I have already seen this one 
recently: HIVE-16561 - Flaky test: 
TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery.
- There is ongoing work for 
TestAccumuloCliDriver.testCliDriver[accumulo_index]. See HIVE-15795 - 
Support Accumulo Index Tables in Hive Accumulo Connector.

Thanks for the review,
Peter

> Serious Zookeeper exception is logged when a race condition happens
> -------------------------------------------------------------------
>
> Key: HIVE-16487
> URL: https://issues.apache.org/jira/browse/HIVE-16487
> Project: Hive
>  Issue Type: Bug
>  Components: Locking
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-16487.02.patch, HIVE-16487.patch
>
>
> A customer started to see this in the logs, though fortunately everything was 
> working as intended:
> {code}
> 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager: 
> [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /hive_zookeeper_namespace//LOCK-SHARED-
> {code}
> This was happening because of a race condition between lock releasing and 
> lock acquiring: the thread releasing the lock removes the parent ZK node just 
> after the thread acquiring the lock has made sure that the parent node exists.
> Since this can happen without any real problem, I plan to add NODEEXISTS and 
> NONODE as transient ZooKeeper exceptions, so users are not confused.
> Also, the original author of ZooKeeperHiveLockManager may have planned to 
> handle different ZooKeeperExceptions differently, and the code is hard to 
> understand. See the {{continue}} and the {{break}}: the {{break}} only breaks 
> the switch, not the loop, which IMHO is not intuitive:
> {code}
> do {
>   try {
>     [..]
>     ret = lockPrimitive(key, mode, keepAlive, parentCreated, [..]
>   } catch (Exception e1) {
>     if (e1 instanceof KeeperException) {
>       KeeperException e = (KeeperException) e1;
>       switch (e.code()) {
>       case CONNECTIONLOSS:
>       case OPERATIONTIMEOUT:
>         LOG.debug("Possibly transient ZooKeeper exception: ", e);
>         continue;
>       default:
>         LOG.error("Serious Zookeeper exception: ", e);
>         break;
>       }
>     }
>     [..]
>   }
> } while (tryNum < numRetriesForLock);
> {code}
> If we do not want to retry in case of a "Serious Zookeeper exception:", then 
> we should add a label to the do loop and break it in the switch.
> If we do want to retry regardless of the type of the ZK exception, then we 
> should just change the {{continue;}} to {{break;}} and move the lines that 
> did not run in case of {{continue}} into the {{default}} branch of the 
> switch, so the code is easier to understand.
> Any suggestions or ideas [~ctang.ma] or [~szehon]?
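The labeled-break option can be illustrated with a self-contained mock (the enum and the retry bookkeeping below are illustrative, not the actual ZooKeeperHiveLockManager code): without the label, `break` only exits the `switch` and the `do/while` keeps retrying; with the label, a non-transient code stops the loop.

```java
class LabeledBreakDemo {
    // Mock of the ZooKeeper KeeperException codes involved.
    enum Code { CONNECTIONLOSS, OPERATIONTIMEOUT, NONODE }

    /** Returns how many attempts were made before the loop exited. */
    static int attempts(Code[] outcomes, int numRetriesForLock) {
        int tryNum = 0;
        retry:
        do {
            Code c = outcomes[tryNum];
            tryNum++;
            switch (c) {
            case CONNECTIONLOSS:
            case OPERATIONTIMEOUT:
                continue;        // transient: retry the loop
            default:
                break retry;     // serious: leave the loop, not just the switch
            }
        } while (tryNum < numRetriesForLock);
        return tryNum;
    }
}
```

With a plain `break` in the `default` branch, the serious case would fall through to the `while` condition and silently retry, which is exactly the ambiguity the description points out.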





[jira] [Commented] (HIVE-16143) Improve msck repair batching

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989827#comment-15989827
 ] 

Hive QA commented on HIVE-16143:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865599/HIVE-16143.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10647 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[create_like] 
(batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_batchsize] 
(batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repair] (batchId=32)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4939/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4939/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4939/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865599 - PreCommit-HIVE-Build

> Improve msck repair batching
> ----------------------------
>
> Key: HIVE-16143
> URL: https://issues.apache.org/jira/browse/HIVE-16143
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, 
> HIVE-16143.03.patch
>
>
> Currently, the {{msck repair table}} command batches the partitions it 
> creates in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}. 
> The following snippet shows the batching logic. There are a couple of 
> possible improvements to it:
> {noformat}
> int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
> if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>   int counter = 0;
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     counter++;
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>     if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>       db.createPartitions(apd);
>       apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>     }
>   }
> } else {
>   for (CheckResult.PartitionResult part : partsNotInMs) {
>     apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>     repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>         + ':' + part.getPartitionName());
>   }
>   db.createPartitions(apd);
> }
> } catch (Exception e) {
>   LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>   repairOutput.clear();
>   msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
> }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding 
> partitions one by one, which is almost always very slow. A user can easily 
> increase the batch size to make the command run faster, yet end up with worse 
> performance because the code falls back to one-by-one adds. Users are then 
> expected to find a batch size tuned to their environment. The code could 
> handle this better by exponentially decaying the batch size instead of 
> falling straight back to one-by-one adds.
> 2. The other issue with this implementation is that if, say, the first batch 
> succeeds and the second one fails, the code retries all the partitions one by 
> one, irrespective of whether some of them were already added successfully. If 
> we need to fall back to one-by-one adds, we should at least skip the ones we 
> know for sure were already added.





[jira] [Commented] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989820#comment-15989820
 ] 

Hive QA commented on HIVE-15642:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865589/HIVE-15642.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] 
(batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4937/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4937/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4937/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865589 - PreCommit-HIVE-Build

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> 
>
> Key: HIVE-15642
> URL: https://issues.apache.org/jira/browse/HIVE-15642
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Vaibhav Gumashta
>Assignee: Sankar Hariappan
> Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch
>
>
> 1. Insert overwrites to a new partition should not capture new files as part
> of the insert event; instead, the subsequent add-partition event should
> capture the files + checksums.
> 2. Insert overwrites to an existing partition should capture new files as
> part of the insert event.
> Similar behaviour applies to dynamic partition (DP) inserts and loads.
> This will need changes from HIVE-15478.





[jira] [Commented] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989800#comment-15989800
 ] 

Hive QA commented on HIVE-16213:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865585/HIVE-16213.08.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10636 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4936/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4936/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4936/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865585 - PreCommit-HIVE-Build

> ObjectStore can leak Queries when rollbackTransaction throws an exception
> -------------------------------------------------------------------------
>
> Key: HIVE-16213
> URL: https://issues.apache.org/jira/browse/HIVE-16213
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Alexander Kolbasov
>Assignee: Vihang Karajgaonkar
> Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch, 
> HIVE-16213.03.patch, HIVE-16213.04.patch, HIVE-16213.05.patch, 
> HIVE-16213.06.patch, HIVE-16213.07.patch, HIVE-16213.08.patch
>
>
> In ObjectStore.java there are a few places with the code similar to:
> {code}
> Query query = null;
> try {
>   openTransaction();
>   query = pm.newQuery(Something.class);
>   ...
>   commited = commitTransaction();
> } finally {
>   if (!commited) {
> rollbackTransaction();
>   }
>   if (query != null) {
> query.closeAll();
>   }
> }
> {code}
> The problem is that rollbackTransaction() may throw an exception, in which
> case query.closeAll() wouldn't be executed.
> The fix would be to wrap rollbackTransaction() in its own try-catch block.
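The proposed fix can be sketched as below. This is an illustrative, self-contained mock, not Hive's actual ObjectStore code: `Query`, `rollbackTransaction()`, and `runWithFix()` are stand-ins that demonstrate why guarding the rollback keeps `closeAll()` reachable.

```java
// Mock demonstration of the proposed fix: rollbackTransaction() is wrapped in
// its own try-catch so that an exception during rollback can no longer skip
// query.closeAll(). All names here are illustrative stand-ins.
public class RollbackLeakFix {

    static class Query {
        boolean closed = false;
        void closeAll() { closed = true; }
    }

    static void rollbackTransaction() {
        // Simulate the failure mode described in the report.
        throw new RuntimeException("rollback failed");
    }

    /** Returns true if the query was closed even though rollback threw. */
    static boolean runWithFix() {
        Query query = new Query();
        boolean committed = false;   // pretend commitTransaction() never ran
        try {
            // openTransaction(); query = pm.newQuery(...); ... would go here
        } finally {
            if (!committed) {
                try {
                    rollbackTransaction();
                } catch (RuntimeException e) {
                    // log and swallow: the cleanup below must still run
                }
            }
            if (query != null) {
                query.closeAll();    // now guaranteed to execute
            }
        }
        return query.closed;
    }

    public static void main(String[] args) {
        System.out.println(runWithFix());   // prints "true"
    }
}
```

Without the inner try-catch, the `RuntimeException` from the rollback would propagate out of the `finally` block before `closeAll()` runs, which is exactly the leak described above.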





[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ

2017-04-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989784#comment-15989784
 ] 

Hive QA commented on HIVE-16559:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865552/HIVE-16559.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10636 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4934/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4934/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4934/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865552 - PreCommit-HIVE-Build

> Parquet schema evolution for partitioned tables may break if table and 
> partition serdes differ
> --------------------------------------------------------------------------------
>
> Key: HIVE-16559
> URL: https://issues.apache.org/jira/browse/HIVE-16559
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Barna Zsombor Klara
>Assignee: Barna Zsombor Klara
> Fix For: 3.0.0
>
> Attachments: HIVE-16559.01.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables
> backed by files with different schemas. Hive should match the table columns
> with file columns based on the column name if possible.
> However, if the table's serde is missing columns that are present in a
> partition's serde, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>   'mary' AS name,
>   5 AS favnumber,
>   'blue' AS favcolor,
>   35 AS age,
>   'dog' AS favpet;
> ALTER TABLE myparquettable_parted
> REPLACE COLUMNS
> (
>   favnumber int,
>   age int
> );