[jira] [Updated] (HIVE-16553) Change default value for hive.tez.bigtable.minsize.semijoin.reduction
[ https://issues.apache.org/jira/browse/HIVE-16553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lefty Leverenz updated HIVE-16553:
----------------------------------
    Labels: TODOC3.0  (was: )

> Change default value for hive.tez.bigtable.minsize.semijoin.reduction
> ---------------------------------------------------------------------
>
>                 Key: HIVE-16553
>                 URL: https://issues.apache.org/jira/browse/HIVE-16553
>             Project: Hive
>          Issue Type: Bug
>          Components: Configuration
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>              Labels: TODOC3.0
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16553.1.patch
>
> The current value is 1M rows; we would like to bump this up to make sure we are
> not creating semijoin optimizations on dimension tables, since having too many
> semijoin optimizations can cause serialized execution of tasks if lots of
> tasks are waiting for semijoin optimizations to be computed.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
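As a rough illustration of what this knob gates, here is a hypothetical sketch. None of these class or method names are Hive's actual internals, and the 100M-row threshold below is only a placeholder for the raised default (the issue does not state the new value here): the optimizer applies a semijoin reduction only when the big side's row count exceeds the configured minimum, so small dimension tables no longer trigger it.

```java
// Hypothetical sketch (not Hive's actual code) of the gating check that
// hive.tez.bigtable.minsize.semijoin.reduction controls: a semijoin
// reduction is only worthwhile when the big side is large enough that
// filtering it pays for the cost of computing the reduction.
public class SemijoinGate {
    // Placeholder for the raised default; the old default was 1M rows.
    static final long DEFAULT_MIN_BIGTABLE_ROWS = 100_000_000L;

    static boolean shouldApplySemijoinReduction(long bigTableRows, long minRows) {
        return bigTableRows >= minRows;
    }

    public static void main(String[] args) {
        // A 1M-row dimension table no longer triggers the optimization.
        System.out.println(shouldApplySemijoinReduction(1_000_000L, DEFAULT_MIN_BIGTABLE_ROWS));
        // A 500M-row fact table still does.
        System.out.println(shouldApplySemijoinReduction(500_000_000L, DEFAULT_MIN_BIGTABLE_ROWS));
    }
}
```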
[jira] [Updated] (HIVE-16557) Vectorization: Specialize ReduceSink empty key case
[ https://issues.apache.org/jira/browse/HIVE-16557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-16557: Attachment: HIVE-16557.01.patch.tar.gz Unable to attach the actual patch due to infrastructure issue https://issues.apache.org/jira/browse/INFRA-14051 So, attached compressed version of the patch instead. NOTE: Patch #01 contains the changes for https://issues.apache.org/jira/browse/HIVE-16541, too. > Vectorization: Specialize ReduceSink empty key case > --- > > Key: HIVE-16557 > URL: https://issues.apache.org/jira/browse/HIVE-16557 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-16557.01.patch.tar.gz > > > Gopal pointed out that native Vectorization of ReduceSink is missing the > empty key case. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-15160) Can't order by an unselected column
[ https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989904#comment-15989904 ]

Hive QA commented on HIVE-15160:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865657/HIVE-15160.12.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 44 failed/errored test(s), 10637 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join_without_localtask] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_vc] (batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[regex_col] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_json_tuple] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udtf_parse_url_tuple] (batchId=69)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_groupby] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_gby] (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_limit] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_gby] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_limit] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_rp_semijoin] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cbo_semijoin] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[explainuser_1] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown3] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[limit_pushdown] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[offset_limit_ppd_optimizer] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[selectDistinctStar] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[special_character_in_tabnames_1] (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_in] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_notin] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_coalesce] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_date_1] (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_2] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_decimal_round] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets_grouping] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_groupby_grouping_sets_limit] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_1] (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_arithmetic] (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_null_projection] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_short_regress] (batchId=151)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query66] (batchId=229)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join8] (batchId=134)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cbo_gby] (batchId=117)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cbo_limit] (batchId=136)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cbo_semijoin] (batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[dynamic_rdd_cache] (batchId=121)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join8] (batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[limit_pushdown] (batchId=127)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[pcr] (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_25] (batchId=102)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_in] (batchId=125)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_short_regress] (batchId=118)
org.apache.hive.hcatalog.pig.TestRCFileHCatStorer.testStoreMultiTables (batchId=178)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4946/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4946/console
Test logs:
[jira] [Commented] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction
[ https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989887#comment-15989887 ] Hive QA commented on HIVE-12636: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864848/HIVE-12636.17.patch {color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10627 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_blobstore_to_blobstore] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4945/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4945/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4945/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12864848 - PreCommit-HIVE-Build

> Ensure that all queries (with DbTxnManager) run in a transaction
> ----------------------------------------------------------------
>
>                 Key: HIVE-12636
>                 URL: https://issues.apache.org/jira/browse/HIVE-12636
>             Project: Hive
>          Issue Type: Improvement
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>            Priority: Critical
>         Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, HIVE-12636.03.patch,
> HIVE-12636.04.patch, HIVE-12636.05.patch, HIVE-12636.06.patch, HIVE-12636.07.patch,
> HIVE-12636.09.patch, HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch,
> HIVE-12636.17.patch
>
> Assuming Hive is using DbTxnManager:
> Currently (as of this writing only auto-commit mode is supported), only
> queries that write to an Acid table start a transaction.
> Read-only queries don't open a txn but still acquire locks.
> This makes internal structures confusing/odd. There are always 2 code paths
> to deal with, which is inconvenient and error prone.
> Also, a txn id is a convenient "handle" for all locks/resources within a txn.
> Doing this would mean the client no longer needs to track the locks that it
> acquired. This enables further improvements to the metastore side of Acid.
> # Add a metastore call that does openTxn() and acquireLocks() in a single
> call. This is to make sure perf doesn't degrade for read-only queries.
> (Would also be useful for auto-commit write queries.)
> # Should RO queries generate txn ids from the same sequence? (They could, for
> example, use negative values of a different sequence.) Txnid is part of the
> delta/base file name. Currently it's 7 digits. If we use the same sequence,
> we'll exceed 7 digits faster (possible upgrade issue). On the other hand,
> there is value in being able to pick the txn id and commit timestamp out of
> the same logical sequence.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
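Point 1 above (a single round trip that both opens the txn and acquires its locks) can be sketched as follows. This is a hypothetical illustration, not Hive's actual metastore API; all names and the lock-id format are made up, and the point is only the shape of the combined call and the txn id serving as the handle for all locks:

```java
// Hypothetical sketch of the combined call suggested in the issue: open a
// txn and acquire its locks in one round trip, so read-only queries don't
// pay for two metastore calls. Names are NOT Hive's real API.
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;

public class TxnLockService {
    public static final class TxnAndLocks {
        public final long txnId;
        public final List<String> lockIds;
        TxnAndLocks(long txnId, List<String> lockIds) {
            this.txnId = txnId;
            this.lockIds = lockIds;
        }
    }

    private final AtomicLong txnSequence = new AtomicLong();

    // One round trip instead of openTxn() followed by acquireLocks().
    public TxnAndLocks openTxnAndAcquireLocks(String user, List<String> resources) {
        long txnId = txnSequence.incrementAndGet();
        // The txn id doubles as the handle for every lock in the txn, so the
        // client no longer has to track individual lock ids.
        List<String> lockIds = resources.stream()
            .map(r -> txnId + ":" + r)
            .collect(Collectors.toList());
        return new TxnAndLocks(txnId, lockIds);
    }
}
```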
[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit
[ https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989874#comment-15989874 ] Hive QA commented on HIVE-16484: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865595/HIVE-16484.7.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[escape_comments] (batchId=234) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4944/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4944/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4944/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12865595 - PreCommit-HIVE-Build > Investigate SparkLauncher for HoS as alternative to bin/spark-submit > > > Key: HIVE-16484 > URL: https://issues.apache.org/jira/browse/HIVE-16484 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, > HIVE-16484.3.patch, HIVE-16484.4.patch, HIVE-16484.5.patch, > HIVE-16484.6.patch, HIVE-16484.7.patch > > > The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} > directory and invokes the {{bin/spark-submit}} script, which spawns a > separate process to run the Spark application. 
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS session --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
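A minimal sketch of what the proposed switch might look like, using Spark's public {{SparkLauncher}} API from SPARK-4924. This is not the patch's actual code: the jar path and master URL are placeholders, it assumes {{spark-launcher}} on the classpath, and real wiring in {{SparkClientImpl}} would differ.

```java
// Sketch only: programmatic launch via SparkLauncher instead of Hive
// hand-rolling a bin/spark-submit invocation. Paths/master are placeholders.
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class LauncherSketch {
    public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
            .setAppResource("/path/to/hive-exec.jar")                 // placeholder
            .setMainClass("org.apache.hive.spark.client.RemoteDriver")
            .setMaster("yarn")                                        // placeholder
            .setConf(SparkLauncher.EXECUTOR_MEMORY, "2g")
            .startApplication(new SparkAppHandle.Listener() {
                @Override public void stateChanged(SparkAppHandle h) {
                    // SparkAppHandle lets the caller observe job state directly,
                    // instead of scraping spark-submit's output.
                    System.out.println("state: " + h.getState());
                }
                @Override public void infoChanged(SparkAppHandle h) { }
            });
        System.out.println("initial state: " + handle.getState());
        // handle.stop() / handle.kill() are also available for cleanup.
    }
}
```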
[jira] [Commented] (HIVE-16143) Improve msck repair batching
[ https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989862#comment-15989862 ] Hive QA commented on HIVE-16143: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865599/HIVE-16143.03.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10647 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[create_like] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_batchsize] (batchId=64) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repair] (batchId=32) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4943/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4943/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4943/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12865599 - PreCommit-HIVE-Build

> Improve msck repair batching
> ----------------------------
>
>                 Key: HIVE-16143
>                 URL: https://issues.apache.org/jira/browse/HIVE-16143
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, HIVE-16143.03.patch
>
> Currently, the {{msck repair table}} command batches the number of partitions
> created in the metastore using the config {{HIVE_MSCK_REPAIR_BATCH_SIZE}}.
> The following snippet shows the batching logic. There are a couple of possible
> improvements to this batching logic:
> {noformat}
>       int batch_size = conf.getIntVar(ConfVars.HIVE_MSCK_REPAIR_BATCH_SIZE);
>       if (batch_size > 0 && partsNotInMs.size() > batch_size) {
>         int counter = 0;
>         for (CheckResult.PartitionResult part : partsNotInMs) {
>           counter++;
>           apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>           repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>               + ':' + part.getPartitionName());
>           if (counter % batch_size == 0 || counter == partsNotInMs.size()) {
>             db.createPartitions(apd);
>             apd = new AddPartitionDesc(table.getDbName(), table.getTableName(), false);
>           }
>         }
>       } else {
>         for (CheckResult.PartitionResult part : partsNotInMs) {
>           apd.addPartition(Warehouse.makeSpecFromName(part.getPartitionName()), null);
>           repairOutput.add("Repair: Added partition to metastore " + msckDesc.getTableName()
>               + ':' + part.getPartitionName());
>         }
>         db.createPartitions(apd);
>       }
>     } catch (Exception e) {
>       LOG.info("Could not bulk-add partitions to metastore; trying one by one", e);
>       repairOutput.clear();
>       msckAddPartitionsOneByOne(db, table, partsNotInMs, repairOutput);
>     }
> {noformat}
> 1. If the batch size is too aggressive, the code falls back to adding
> partitions one by one, which is almost always very slow. It is easily possible
> that users increase the batch size to a higher value to make the command run
> faster but end up with worse performance because the code falls back to adding
> partitions one by one. Users are then expected to determine a tuned batch size
> which works well for their environment. The code could handle this situation
> better by exponentially decaying the batch size instead of falling back to one
> by one.
> 2. The other issue with this implementation is that if, say, the first batch
> succeeds and the second one fails, the code tries to add all the partitions
> one by one, irrespective of whether some of them were successfully added or
> not. If we need to fall back to one by one, we should at least skip the ones
> which we know for sure were already added successfully.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
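The exponential-decay idea from point 1 can be sketched as below. This is a hypothetical illustration, not the committed fix: all names are made up, and `BatchSink` stands in for `Hive.createPartitions`. On failure the window is halved and retried rather than abandoning batching, and already-committed batches are never re-added, which also addresses point 2:

```java
// Sketch of exponentially decaying the batch size on failure instead of
// falling back to one-by-one. Hypothetical names; BatchSink stands in for
// the metastore bulk-add call.
import java.util.List;

public class DecayingBatcher {
    interface BatchSink { void createPartitions(List<String> batch) throws Exception; }

    // Returns the number of createPartitions() calls that succeeded.
    static int addInBatches(List<String> parts, int initialBatchSize, BatchSink sink) {
        int batchSize = Math.max(1, initialBatchSize);
        int pos = 0;            // everything before pos is durably added
        int succeeded = 0;
        while (pos < parts.size()) {
            int end = Math.min(pos + batchSize, parts.size());
            try {
                sink.createPartitions(parts.subList(pos, end));
                pos = end;      // commit the window; these are never re-added
                succeeded++;
            } catch (Exception e) {
                if (batchSize == 1) {
                    // Even a single partition failed: give up on this one.
                    throw new RuntimeException("Could not add partition " + parts.get(pos), e);
                }
                batchSize = Math.max(1, batchSize / 2);  // decay instead of 1-by-1
            }
        }
        return succeeded;
    }
}
```

A transient failure on one batch now costs a halved retry of just that window, not a full one-by-one pass over every partition.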
[jira] [Commented] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989848#comment-15989848 ]

Hive QA commented on HIVE-16488:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865591/HIVE-16488.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10636 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4942/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4942/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4942/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865591 - PreCommit-HIVE-Build

> Support replicating into existing db if the db is empty
> --------------------------------------------------------
>
>                 Key: HIVE-16488
>                 URL: https://issues.apache.org/jira/browse/HIVE-16488
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>    Affects Versions: 2.1.0
>            Reporter: Sankar Hariappan
>            Assignee: Sankar Hariappan
>              Labels: DR, Replication
>         Attachments: HIVE-16488.01.patch, HIVE-16488.02.patch
>
> This is a potential use case where a user may want to manually create a db on
> the destination to make sure it goes to a certain dir root, or they may have
> cases where the db (default, for instance) was automatically created. We
> should still allow replicating into this db without failing, as long as the
> db is empty.
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16527) Support outer and mixed reference aggregates in windowed functions
[ https://issues.apache.org/jira/browse/HIVE-16527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989836#comment-15989836 ] Hive QA commented on HIVE-16527: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12865594/HIVE-16527.03.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10631 tests executed *Failed tests:* {noformat} TestHs2Hooks - did not produce a TEST-*.xml file (likely timed out) (batchId=214) org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=236) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4941/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4941/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4941/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12865594 - PreCommit-HIVE-Build > Support outer and mixed reference aggregates in windowed functions > -- > > Key: HIVE-16527 > URL: https://issues.apache.org/jira/browse/HIVE-16527 > Project: Hive > Issue Type: Improvement > Components: Query Processor >Reporter: Remus Rusanu >Assignee: Remus Rusanu > Attachments: HIVE-16527.00.patch, HIVE-16527.02.patch, > HIVE-16527.03.patch > > > {noformat} > select sum(sum(c1)) over() from e011_01; > select sum(sum(c1)) over(partition by c2 order by c1) from e011_01 group by > e011_01.c1, e011_01.c2; > select sum(sum(e011_01.c1)) over(partition by e011_01.c2 order by e011_01.c1) > from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_01.c1, > e011_01.c2; > select sum(sum(e011_01.c1)) over(partition by e011_03.c2 order by e011_03.c1) > from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by e011_03.c1, > e011_03.c2; > select sum(corr(e011_01.c1, e011_03.c1)) over(partition by e011_01.c2 order > by e011_03.c2) from e011_01 join e011_03 on e011_01.c1 = e011_03.c1 group by > e011_03.c2, e011_01.c2; > {noformat} > We fail to generate a plan for any of the above. The issue is that in > {{SemanticAnalyzer.doPhase1GetAllAggregations}}, for {{TOK_WINDOWSPEC}} we > ignore all children except the last (the window spec child). Additionally the > typecheck processor is not prepared to encounter UDAF expressions > ({{TypeCheckProcFactory.DefaultExpreProcessor.validateUDF}}, > {{getXpathOrFuncExprNodeDesc}}). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
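The traversal bug described above (for {{TOK_WINDOWSPEC}}, all children except the last are ignored, so aggregates referenced in the earlier children are never collected) can be illustrated with a simplified tree walk. This is a hypothetical illustration using a made-up node class, not Hive's actual {{ASTNode}} or {{SemanticAnalyzer}} code:

```java
// Hypothetical illustration of the fix direction: collect aggregations from
// EVERY child of a window-spec node, not just the last (window spec) child.
// Simplified AST; not Hive's real classes.
import java.util.ArrayList;
import java.util.List;

public class AggCollector {
    static class Node {
        final String token;
        final List<Node> children = new ArrayList<>();
        Node(String token, Node... kids) {
            this.token = token;
            for (Node k : kids) children.add(k);
        }
    }

    // Visit every child; the buggy version skipped all but the last one.
    static void collectAggregations(Node n, List<String> out) {
        if (n.token.startsWith("UDAF:")) {
            out.add(n.token);
        }
        for (Node child : n.children) {
            collectAggregations(child, out);
        }
    }
}
```

With only the last child visited, an aggregate like the inner `sum(c1)` of `sum(sum(c1)) over()` sitting in an earlier child would never reach the aggregation map, which matches the plan-generation failures listed in the queries above.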
[jira] [Commented] (HIVE-16487) Serious Zookeeper exception is logged when a race condition happens
[ https://issues.apache.org/jira/browse/HIVE-16487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989833#comment-15989833 ]

Peter Vary commented on HIVE-16487:
-----------------------------------

The failures are not related:
- There is a patch waiting for commit which will solve TestBeeLineDriver.testCliDriver[smb_mapjoin_2]. See: HIVE-16451 - Race condition between HiveStatement.getQueryLog and HiveStatement.runAsyncOnServer
- Created a new flaky-test jira, since I have already seen this recently: HIVE-16561 - Flaky test: TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
- There is ongoing work for TestAccumuloCliDriver.testCliDriver[accumulo_index]. See: HIVE-15795 - Support Accumulo Index Tables in Hive Accumulo Connector

Thanks for the review,
Peter

> Serious Zookeeper exception is logged when a race condition happens
> -------------------------------------------------------------------
>
>                 Key: HIVE-16487
>                 URL: https://issues.apache.org/jira/browse/HIVE-16487
>             Project: Hive
>          Issue Type: Bug
>          Components: Locking
>    Affects Versions: 3.0.0
>            Reporter: Peter Vary
>            Assignee: Peter Vary
>         Attachments: HIVE-16487.02.patch, HIVE-16487.patch
>
> A customer started to see this in the logs, but happily everything was
> working as intended:
> {code}
> 2017-03-30 12:01:59,446 ERROR ZooKeeperHiveLockManager: [HiveServer2-Background-Pool: Thread-620]: Serious Zookeeper exception: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hive_zookeeper_namespace//LOCK-SHARED-
> {code}
> This was happening because of a race condition between lock release and lock
> acquisition. The thread releasing the lock removes the parent ZK node just
> after the thread acquiring the lock made sure that the parent node exists.
> Since this can happen without any real problem, I plan to add NODEEXISTS and
> NONODE as transient ZooKeeper exceptions, so the users are not confused.
> Also, the original author of ZooKeeperHiveLockManager maybe planned to handle
> different ZooKeeperExceptions differently, and the code is hard to
> understand. See the {{continue}} and the {{break}}: the {{break}} only breaks
> the switch, not the loop, which IMHO is not intuitive:
> {code}
> do {
>   try {
>     [..]
>     ret = lockPrimitive(key, mode, keepAlive, parentCreated,
>   } catch (Exception e1) {
>     if (e1 instanceof KeeperException) {
>       KeeperException e = (KeeperException) e1;
>       switch (e.code()) {
>         case CONNECTIONLOSS:
>         case OPERATIONTIMEOUT:
>           LOG.debug("Possibly transient ZooKeeper exception: ", e);
>           continue;
>         default:
>           LOG.error("Serious Zookeeper exception: ", e);
>           break;
>       }
>     }
>     [..]
>   }
> } while (tryNum < numRetriesForLock);
> {code}
> If we do not want to try again in case of a "Serious Zookeeper exception:",
> then we should add a label to the do loop and break it in the switch.
> If we do want to try again regardless of the type of the ZK exception, then
> we should just change the {{continue;}} to {{break;}} and move the part of
> the code which did not run in case of {{continue}} into the {{default}}
> branch, so it is easier to understand the code.
> Any suggestions or ideas [~ctang.ma] or [~szehon]?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
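The labeled-loop variant suggested in the comment looks like this in isolation. It is a minimal sketch, not Hive's actual lock manager: the error codes are stand-in enum values rather than the real {{KeeperException}} codes, and the lock attempt is simulated by an outcome array:

```java
// Sketch of the labeled do-loop: transient errors retry via `continue`,
// serious errors exit the LOOP (not just the switch) via `break retry`.
// Stand-in error codes; not wired to real ZooKeeper.
public class RetryLoop {
    enum Code { CONNECTIONLOSS, OPERATIONTIMEOUT, NONODE }

    // outcomes[i] is the simulated error of attempt i; null means success.
    // Returns the number of attempts made before success or giving up.
    static int acquireWithRetries(int maxRetries, Code[] outcomes) {
        int tryNum = 0;
        retry:
        do {
            Code c = tryNum < outcomes.length ? outcomes[tryNum] : null;
            tryNum++;
            if (c == null) {
                return tryNum;               // lock acquired
            }
            switch (c) {
                case CONNECTIONLOSS:
                case OPERATIONTIMEOUT:
                    continue;                // transient: next loop iteration
                default:
                    break retry;             // serious: break the labeled loop
            }
        } while (tryNum < maxRetries);
        return tryNum;
    }
}
```

Without the `retry` label, the `default` branch's `break` would only end the `switch` and the loop would spin again, which is exactly the ambiguity the comment points out.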
[jira] [Commented] (HIVE-16143) Improve msck repair batching
[ https://issues.apache.org/jira/browse/HIVE-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989827#comment-15989827 ]

Hive QA commented on HIVE-16143:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865599/HIVE-16143.03.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10647 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[create_like] (batchId=237)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[msck_repair_batchsize] (batchId=64)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[repair] (batchId=32)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4939/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4939/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4939/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865599 - PreCommit-HIVE-Build

> Improve msck repair batching
> ----------------------------
>
>                 Key: HIVE-16143
>                 URL: https://issues.apache.org/jira/browse/HIVE-16143
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-16143.01.patch, HIVE-16143.02.patch, HIVE-16143.03.patch
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (HIVE-15642) Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
    [ https://issues.apache.org/jira/browse/HIVE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989820#comment-15989820 ]

Hive QA commented on HIVE-15642:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865589/HIVE-15642.02.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10635 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4937/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4937/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4937/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865589 - PreCommit-HIVE-Build

> Replicate Insert Overwrites, Dynamic Partition Inserts and Loads
> ----------------------------------------------------------------
>
>                 Key: HIVE-15642
>                 URL: https://issues.apache.org/jira/browse/HIVE-15642
>             Project: Hive
>          Issue Type: Sub-task
>          Components: repl
>            Reporter: Vaibhav Gumashta
>            Assignee: Sankar Hariappan
>         Attachments: HIVE-15642.02.patch, HIVE-15642.1.patch
>
>
> 1. Insert Overwrites to a new partition should not capture new files as part
> of the insert event, but should instead use the subsequent add partition event
> to capture the files + checksums.
> 2. Insert Overwrites to an existing partition should capture new files as
> part of the insert event.
> Similar behaviour applies for DP inserts and loads.
> This will need changes from HIVE-15478.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
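The two rules in the description above reduce to a single decision: an insert-overwrite event carries the new file list only when the target partition already exists; for a new partition, the files and checksums ride on the subsequent add partition event. A minimal illustrative sketch of that decision (a hypothetical helper, not Hive's actual replication code):

```java
public class InsertEventFiles {
    /**
     * Hypothetical illustration of the rule in the issue description,
     * not Hive's actual repl code: an insert-overwrite event captures
     * the new files only when the partition already exists. For a new
     * partition, the later ADD PARTITION event captures files + checksums.
     */
    static boolean insertEventCarriesFiles(boolean partitionAlreadyExists) {
        return partitionAlreadyExists;
    }

    public static void main(String[] args) {
        // Existing partition: files captured by the insert event itself.
        System.out.println(insertEventCarriesFiles(true));
        // New partition: defer files to the add-partition event.
        System.out.println(insertEventCarriesFiles(false));
    }
}
```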
[jira] [Commented] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception
    [ https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989800#comment-15989800 ]

Hive QA commented on HIVE-16213:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865585/HIVE-16213.08.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 10636 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4936/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4936/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4936/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865585 - PreCommit-HIVE-Build

> ObjectStore can leak Queries when rollbackTransaction throws an exception
> -------------------------------------------------------------------------
>
>                 Key: HIVE-16213
>                 URL: https://issues.apache.org/jira/browse/HIVE-16213
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Alexander Kolbasov
>            Assignee: Vihang Karajgaonkar
>         Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch,
> HIVE-16213.03.patch, HIVE-16213.04.patch, HIVE-16213.05.patch,
> HIVE-16213.06.patch, HIVE-16213.07.patch, HIVE-16213.08.patch
>
>
> In ObjectStore.java there are a few places with code similar to:
> {code}
> Query query = null;
> try {
>   openTransaction();
>   query = pm.newQuery(Something.class);
>   ...
>   commited = commitTransaction();
> } finally {
>   if (!commited) {
>     rollbackTransaction();
>   }
>   if (query != null) {
>     query.closeAll();
>   }
> }
> {code}
> The problem is that rollbackTransaction() may throw an exception, in which
> case query.closeAll() wouldn't be executed.
> The fix would be to wrap rollbackTransaction() in its own try-catch block.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
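The proposed fix can be sketched as follows. This is a self-contained illustrative mock, not Hive's actual ObjectStore code (MockQuery and the simulated rollbackTransaction() are stand-ins): because rollbackTransaction() is wrapped in its own try-catch inside the finally block, query.closeAll() is guaranteed to run even when the rollback itself throws.

```java
public class RollbackLeakFix {
    // Stand-in for a javax.jdo Query: records whether closeAll() ran.
    static class MockQuery {
        boolean closed = false;
        void closeAll() { closed = true; }
    }

    // Simulates a rollback that itself fails, the case from the issue.
    static void rollbackTransaction() {
        throw new RuntimeException("rollback failed");
    }

    static MockQuery runWithFix() {
        MockQuery query = null;
        boolean commited = false; // spelling kept from the quoted snippet
        try {
            query = new MockQuery();
            // ... transactional work; assume commitTransaction() never
            // succeeds, so commited stays false ...
        } finally {
            if (!commited) {
                try {
                    rollbackTransaction();
                } catch (Exception e) {
                    // Log and swallow so the cleanup below still executes.
                }
            }
            if (query != null) {
                query.closeAll(); // now guaranteed to run
            }
        }
        return query;
    }

    public static void main(String[] args) {
        MockQuery q = runWithFix();
        System.out.println("query closed: " + q.closed); // prints "query closed: true"
    }
}
```

Without the inner try-catch, the exception from rollbackTransaction() would propagate out of the finally block before query.closeAll() was reached, leaking the query.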
[jira] [Commented] (HIVE-16559) Parquet schema evolution for partitioned tables may break if table and partition serdes differ
    [ https://issues.apache.org/jira/browse/HIVE-16559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15989784#comment-15989784 ]

Hive QA commented on HIVE-16559:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12865552/HIVE-16559.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 10636 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4934/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4934/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4934/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12865552 - PreCommit-HIVE-Build

> Parquet schema evolution for partitioned tables may break if table and
> partition serdes differ
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16559
>                 URL: https://issues.apache.org/jira/browse/HIVE-16559
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Barna Zsombor Klara
>            Assignee: Barna Zsombor Klara
>             Fix For: 3.0.0
>
>         Attachments: HIVE-16559.01.patch
>
>
> Parquet schema evolution should make it possible to have partitions/tables
> backed by files with different schemas. Hive should match the table columns
> with file columns based on the column name if possible.
> However, if the serde for a table is missing columns from the serde of a
> partition, Hive fails to match the columns together.
> Steps to reproduce:
> {code}
> CREATE TABLE myparquettable_parted
> (
>   name string,
>   favnumber int,
>   favcolor string,
>   age int,
>   favpet string
> )
> PARTITIONED BY (day string)
> STORED AS PARQUET;
>
> INSERT OVERWRITE TABLE myparquettable_parted
> PARTITION(day='2017-04-04')
> SELECT
>    'mary' as name,
>    5 AS favnumber,
>    'blue' AS favcolor,
>    35 AS age,
>    'dog' AS favpet;
>
> alter table myparquettable_parted
> REPLACE COLUMNS
> (
>   favnumber int,
>   age int
> );