[jira] [Created] (HIVE-23732) [refactor] Move allocating write IDs to ValidTxnManager.getTxnWriteIds
Kishen Das created HIVE-23732: - Summary: [refactor] Move allocating write IDs to ValidTxnManager.getTxnWriteIds Key: HIVE-23732 URL: https://issues.apache.org/jira/browse/HIVE-23732 Project: Hive Issue Type: Task Components: HiveServer2 Reporter: Kishen Das Right now some of this logic is spread across Driver.java and ValidTxnManager.java. It would be nice if we could consolidate the logic of allocating write IDs and move it into ValidTxnManager. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (HIVE-23731) Review of AvroInstance Cache
David Mollitor created HIVE-23731: - Summary: Review of AvroInstance Cache Key: HIVE-23731 URL: https://issues.apache.org/jira/browse/HIVE-23731 Project: Hive Issue Type: Improvement Reporter: David Mollitor Assignee: David Mollitor
[jira] [Created] (HIVE-23730) Compiler support tracking TS keyColName for Probe MapJoin
Panagiotis Garefalakis created HIVE-23730: - Summary: Compiler support tracking TS keyColName for Probe MapJoin Key: HIVE-23730 URL: https://issues.apache.org/jira/browse/HIVE-23730 Project: Hive Issue Type: Sub-task Reporter: Panagiotis Garefalakis Assignee: Panagiotis Garefalakis TezCompiler needs to track the original TS key columnName used for MJ probedecode. Even though we know the MJ keyCol at compile time, it could be generated by previous (parent) operators, so we don't always know the original TS column it maps to. To find the original column mapping, we need to track the MJ keyCol through the operator pipeline. The tracking is done through each operator's map from output column names to input expressions.
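The walk described above can be sketched roughly as follows. This is a minimal illustration with hypothetical names - the real TezCompiler works over Hive's operator and expression classes, not plain string maps:

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of tracing a MapJoin key column back to its
// originating TableScan column. Each operator between the TableScan and
// the MapJoin contributes a map from its output column names to the
// input column each output was derived from; following that chain from
// the MJ keyCol yields the original TS column, or nothing if the key
// was computed by an intermediate expression.
class ColumnOriginTracker {

    // opColumnMaps is ordered from the operator closest to the MapJoin
    // back toward the TableScan.
    static String traceToTableScan(String mjKeyCol,
                                   List<Map<String, String>> opColumnMaps) {
        String current = mjKeyCol;
        for (Map<String, String> colMap : opColumnMaps) {
            String source = colMap.get(current);
            if (source == null) {
                // derived/computed column: no direct TS mapping exists
                return null;
            }
            current = source;
        }
        return current;
    }
}
```

The null return models the case the issue mentions: when an intermediate operator generated the column, the original TS column cannot be recovered and probedecode would have to be skipped for that join.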
[jira] [Created] (HIVE-23729) LLAP text cache fails when using multiple tables/schemas on the same files
Ádám Szita created HIVE-23729: - Summary: LLAP text cache fails when using multiple tables/schemas on the same files Key: HIVE-23729 URL: https://issues.apache.org/jira/browse/HIVE-23729 Project: Hive Issue Type: Bug Reporter: Ádám Szita Assignee: Ádám Szita When using the text based cache we will hit exceptions in the following case:
* Table A with 3 columns is defined on location X (where we have text based data files)
* Table B with 2 columns is defined on the same location X
* User runs a query on table A, thereby filling the LLAP cache.
* If the next query goes against table B, which has a different schema, LLAP will throw an error:
{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
	at org.apache.hadoop.hive.llap.cache.SerDeLowLevelCacheImpl.getCacheDataForOneSlice(SerDeLowLevelCacheImpl.java:411)
	at org.apache.hadoop.hive.llap.cache.SerDeLowLevelCacheImpl.getFileData(SerDeLowLevelCacheImpl.java:389)
	at org.apache.hadoop.hive.llap.io.encoded.SerDeEncodedDataReader.readFileWithCache(SerDeEncodedDataReader.java:819)
	at org.apache.hadoop.hive.llap.io.encoded.SerDeEncodedDataReader.performDataRead(SerDeEncodedDataReader.java:720)
	at org.apache.hadoop.hive.llap.io.encoded.SerDeEncodedDataReader$5.run(SerDeEncodedDataReader.java:274)
	at org.apache.hadoop.hive.llap.io.encoded.SerDeEncodedDataReader$5.run(SerDeEncodedDataReader.java:271)
{code}
This is because the cache lookup is based on the file ID, which in this case is the same for both tables. However, unlike with ORC files, the cached content differs from the file content: the original text is encoded into ORC in the cache, so the cached form depends on the schema the user defined. I think for the text cache case we will need to extend the cache key from being just the simple file ID to something that tracks the schema too.
This will result in caching the *same file content* multiple times (if there are multiple schemas like this), but as we can see the *cached content itself could be quite different* (e.g. different streams with different encodings), and in turn we gain correctness.
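The proposed key extension could look roughly like this. A hedged sketch only - the class and field names are hypothetical, not actual LLAP identifiers, and a real implementation would likely use a more robust schema fingerprint than a hashCode over the type list:

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch: extend the cache key from a plain file ID to
// (file ID, schema fingerprint), so two tables reading the same text
// file with different schemas get separate cache entries instead of
// colliding on the file ID alone.
class SchemaAwareCacheKey {
    private final long fileId;           // identical for both tables on location X
    private final int schemaFingerprint; // differs when the column layout differs

    SchemaAwareCacheKey(long fileId, List<String> columnTypes) {
        this.fileId = fileId;
        // Cheap fingerprint over the declared column types; illustrative only.
        this.schemaFingerprint = columnTypes.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SchemaAwareCacheKey)) return false;
        SchemaAwareCacheKey k = (SchemaAwareCacheKey) o;
        return fileId == k.fileId && schemaFingerprint == k.schemaFingerprint;
    }

    @Override
    public int hashCode() {
        return Objects.hash(fileId, schemaFingerprint);
    }
}
```

With such a key, table A's 3-column read and table B's 2-column read of the same file hash to different cache slots, trading some cache duplication for correctness, as the issue proposes.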
[jira] [Created] (HIVE-23728) Run metastore verification tests during precommit
Zoltan Haindrich created HIVE-23728: --- Summary: Run metastore verification tests during precommit Key: HIVE-23728 URL: https://issues.apache.org/jira/browse/HIVE-23728 Project: Hive Issue Type: Sub-task Reporter: Zoltan Haindrich Assignee: Zoltan Haindrich
[jira] [Created] (HIVE-23727) Improve SQLOperation log handling during cleanup
Zhihua Deng created HIVE-23727: -- Summary: Improve SQLOperation log handling during cleanup Key: HIVE-23727 URL: https://issues.apache.org/jira/browse/HIVE-23727 Project: Hive Issue Type: Improvement Reporter: Zhihua Deng The SQLOperation checks _if (shouldRunAsync() && state != OperationState.CANCELED && state != OperationState.TIMEDOUT)_ before cancelling the background task. If that condition is true, the state cannot be OperationState.CANCELED, so the logging guarded by state == OperationState.CANCELED can never happen.
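A minimal model of the dead branch described above (simplified stand-in code, not the actual SQLOperation source):

```java
// Simplified model of the cleanup path: if the guard already excludes
// CANCELED and TIMEDOUT, any CANCELED-specific logging inside the
// guarded block is unreachable.
class CleanupSketch {
    enum OperationState { RUNNING, CANCELED, TIMEDOUT, FINISHED }

    static boolean canceledLogReachable(boolean shouldRunAsync, OperationState state) {
        if (shouldRunAsync
                && state != OperationState.CANCELED
                && state != OperationState.TIMEDOUT) {
            // Within this block state == CANCELED can never hold, so this
            // check (mirroring the logging condition) is dead code.
            return state == OperationState.CANCELED;
        }
        return false;
    }
}
```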
[jira] [Created] (HIVE-23726) Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
Istvan Fajth created HIVE-23726: --- Summary: Create table may throw MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string) Key: HIVE-23726 URL: https://issues.apache.org/jira/browse/HIVE-23726 Project: Hive Issue Type: Bug Reporter: Istvan Fajth
- Given: metastore.warehouse.tenant.colocation is set to true, and a test database was created as {{create database test location '/data'}}
- When: I try to create a table as {{create table t1 (a int) location '/data/t1'}}
- Then: The create table fails with the following exception:
{code}
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from a null string)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1138)
	at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1143)
	at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:148)
	at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98)
	at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:80)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
	at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:359)
	at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
	at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
	at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: java.lang.IllegalArgumentException: Can not create a Path from a null string
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63325)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63293)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63219)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1780)
	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1767)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3518)
	at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:145)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1052)
	at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1037)
	at sun.reflect.GeneratedMethodAccessor66.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at
Re: Some tests started hanging recently
Hello Zoltan,

Thank you for this. So, I ran a few tests under itests to replicate the issue. There are 10 files inside itests which use Tez in one form or another. I ran tests for all of them; all finished, with a maximum run duration of 8 minutes. Below are the timings:

|Testname|Duration|
|TestMmCompactorOnTez|3.41 min|
|TestAcidOnTez| |
|TestCrudCompactorOnTez|4.00 min|
|TestBeeLineWithArgs|2.59 min|
|TestCopyUtils|49 s|
|TestTriggersTezSessionPoolManager|22 s|
|TestTriggersWorkloadManager|21 s|
|TestTriggersNoTezSessionPool|1.50 min|
|TestTezPerfConstraintsCliDriver|8 min|

I am not sure what is causing the issue seen on the build server.

Regards,
Jagat Singh

On Fri, 19 Jun 2020 at 00:30, Zoltan Haindrich wrote:
> Hey all,
>
> Since yesterday some tests started to hang - most frequently
> TestCrudCompactorOnTez or TestMmCompactorOnTez, but I've seen a replication
> test as well - so I don't think it's limited to those 2 tests.
>
> I was not able to figure out what has caused this - my current guess is
> that somehow the tez 0.9.2 upgrade has caused it.
> To validate this guess I've started the flaky checker with and without
> that patch from the current state...
>
> I've collected some jstacks from the containers running for more than 20
> hours
>
> https://termbin.com/z1eoc
> https://termbin.com/2m0j
> https://termbin.com/027t
> https://termbin.com/1dbe
>
> cheers,
> Zoltan
>
Re: Review Request 72528: ValidTxnManager doesn't consider txns opened and committed between snapshot generation and locking when evaluating ValidTxnListState
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/72528/#review221036
---

Ship it!

+1 since the remaining issue will be fixed in HIVE-23725

- Peter Vary

On June 9, 2020, 8:52 a.m., Denys Kuzmenko wrote:
>
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/72528/
> ---
>
> (Updated June 9, 2020, 8:52 a.m.)
>
>
> Review request for hive, Jesús Camacho Rodríguez, Peter Varga, and Peter Vary.
>
>
> Bugs: HIVE-23503
>     https://issues.apache.org/jira/browse/HIVE-23503
>
>
> Repository: hive-git
>
>
> Description
> ---
>
> ValidTxnManager doesn't consider txns opened and committed between snapshot
> generation and locking when evaluating ValidTxnListState. This causes issues
> like duplicate inserts in the case of a concurrent merge insert & insert.
>
>
> Diffs
> -
>
> ql/src/java/org/apache/hadoop/hive/ql/Driver.java e70c92eef4
> ql/src/java/org/apache/hadoop/hive/ql/DriverContext.java a8c83fc504
> ql/src/java/org/apache/hadoop/hive/ql/ValidTxnManager.java 7d49c57dda
> ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 71afcbdc68
> ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DummyTxnManager.java 0383881acc
> ql/src/java/org/apache/hadoop/hive/ql/lockmgr/HiveTxnManager.java 600289f837
> ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 8a15b7cc5d
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java 65df9c2ba9
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java 887d4303f4
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java 312936efa8
> storage-api/src/java/org/apache/hadoop/hive/common/ValidReadTxnList.java b8ff03f9c4
> storage-api/src/java/org/apache/hadoop/hive/common/ValidTxnList.java d4c3b09730
>
>
> Diff: https://reviews.apache.org/r/72528/diff/1/
>
>
> Testing
> ---
>
> DbTxnManager tests.
>
> Faulty scenario:
> 1. Open t1 and generate a snapshot; t1 merge-inserts data from a source table into the target one.
> 2. Open, run and commit t2, which inserts source table data into the target table.
> 3. Run t1 - duplicate data would be inserted into the target table as t2's changes won't be visible to t1.
>
>
> Thanks,
>
> Denys Kuzmenko
>
>
[jira] [Created] (HIVE-23725) ValidTxnManager snapshot outdating causing partial reads in merge inserts
Peter Varga created HIVE-23725: -- Summary: ValidTxnManager snapshot outdating causing partial reads in merge inserts Key: HIVE-23725 URL: https://issues.apache.org/jira/browse/HIVE-23725 Project: Hive Issue Type: Bug Reporter: Peter Varga Assignee: Peter Varga When the ValidTxnManager invalidates the snapshot during a merge insert and starts to read transactions that were committed after the query compilation happened, it can cause partial read problems if the committed transaction created a new partition in the source or target table. The solution should not only fix the snapshot but also recompile the query and acquire the locks again.
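The proposed fix direction can be sketched like this. The interface and names below are hypothetical, not the actual Hive Driver code; it only illustrates the recompile-and-relock loop described above:

```java
// Hedged sketch: reacquire locks and recompile the query whenever the
// snapshot taken at compile time turns out to be outdated, so changes
// (e.g. new partitions) committed by concurrent transactions are read
// consistently rather than partially.
class DriverSketch {

    interface Query {
        void acquireLocks();
        boolean snapshotStillValid(); // false if txns committed since compilation
        void recompile();             // rebuild the plan against a fresh snapshot
    }

    // Returns the attempt number on which a consistent snapshot was obtained.
    static int runWithRecompile(Query q, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            q.acquireLocks();
            if (q.snapshotStillValid()) {
                return attempt; // safe to execute
            }
            q.recompile(); // snapshot outdated: recompile and reacquire locks
        }
        throw new IllegalStateException("could not obtain a consistent snapshot");
    }
}
```

The key point is the ordering: validity is checked after locks are held, so no transaction can commit between the check and execution.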
Re: Reviewers and assignees of PRs
Hey Jagat!

Thank you for your feedback - we definitely need to improve on these fronts. I'll get to that when I can - but I'll also answer your questions here as well.

On 6/19/20 3:16 AM, Jagat Singh wrote:

One thing which needs improvement is updating the Hive Contributors wiki with whatever process happens on GitHub and the build server side. The current confluence is silent on what to expect when we create a PR as a contributor: who will review, what will the build system do, where to look for errors?

definitely - I've added some parts about opening a PR but that's not enough. I think the "who will review" part is something we definitely have to work on.

Based on my first PR experience, do you manually label PRs as test stable, unstable etc? I am not sure if that can be automated if not done already, along with auto-assigning of reviews as you intend to do with this current proposal.

no; I don't do it manually :D the job is moving those labels around. Since adding/removing labels needs access to the repo - and ASF doesn't give this kind of access to non-committers - that has left me to configure my own user to do this.

I can update a few things based on what I learnt as I recently started contributing, and I feel all of these things are missing. But there are many things for which I don't know the answer yet, and I would appreciate it if someone experienced updated the wiki to add details like the above questions.

It would be great if you could help with this - you know, someone new to a project sees problems differently - and may ask about stuff which has just become the way it is. IIRC there was a thread some time ago about how the way we host our hive.apache.org pages has its days numbered - and we should be moving away from it to something more modern - like hugo (or some other alternative). I'm bringing this up here because moving to something like that will place the site inside the repo itself - which could enable reviewing site changes on PRs as well.

I think we could move at least parts of the wiki to a '.md' file based world - I think we could start with the contributors guide. There could be low-hanging benefits of doing this - like asking for adjustments to the contribution guide within a patch which changes the way we test stuff.

cheers, Zoltan

Thanks, Jagat Singh

On Thu, 18 Jun 2020 at 20:43, Zoltan Haindrich wrote:

Hey Panos!

On 6/18/20 11:54 AM, Panos Garefalakis wrote:

My only suggestion would be to make reviewing per package/label instead of files. This will make the process a bit more clear.

we could use path globs to select the files - so it could match on packages as well; I've not really used it beyond '**/schq/**'.

I recently bumped into this GitHub action that lets you automatically label PRs based on what paths they modify and could help us towards that goal. https://github.com/actions/labeler

Sure; we can have that as well! they may fit different purposes. Actually - based on the "absence" of some labels (e.g. metastore) we may "skip" some tests.

cheers, Zoltan

Thoughts? Cheers, Panagiotis

On Thu, Jun 18, 2020 at 10:42 AM Zoltan Haindrich wrote:

Hey all! I'm happy to see that (I guess) everyone is using the PR based stuff without issues - there are still some flaky things from time to time; but I feel that patches go in faster - and I have a feeling we have more reviews going on as well - which is awesome! I've read a bit about the github "reviewers" / "assignee" stuff - because it seemed somewhat confusing... Basically both of them could be a group of users - the meaning of these fields should be filled in by the community. I would like to propose using the "reviewers" field to list people from whom reviews might be expected.

And use the assignee field to list those who should approve the change to go in (anyone may add assignees/reviewers). We sometimes forget PRs and they may become "stale"; most of them just fall through the cracks... To prevent this, the best would be if everyone self-assigned PRs which are in his/her area of interest. There are times when a given feature needs to change not closely related parts of the codebase - this is usually fine; but there are places which might need "more eyes" on reviews. In the past I was sometimes surprised by some interesting changes in, say, the thrift api / package.jdo / antlr stuff. Because the jira title may not suggest what files will be changed - I wanted to find a way to auto-add some kind of notifications to PRs. Today I've found a neat solution to this [1] - which goes a little bit beyond what I anticipated - there is a small plugin which could enable auto-adding reviewers based on the changed files (adding a reviewer will also emit an email) - I had to fix a few small issues with it to ensure that it works/etc [2]. I really like this approach because it could enable changing the direction of things - and could enable that contributors don't
Re: Hive Dev Unit tests parallel execution
Hey Jagat!

that page is totally outdated... it takes almost a full day to execute the tests on a single machine; you can go parallel - but that may surface some other things (ports already in use and such). The easiest is to open a PR and let precommit testing do the job.

cheers, Zoltan

On 6/17/20 10:07 PM, Jagat Singh wrote:

Hello everyone, Is it possible to run Hive unit tests in parallel? Is this document up to date? https://cwiki.apache.org/confluence/display/Hive/Unit+Test+Parallel+Execution Thanks in advance for your help.

Regards, Jagat Singh
Re: Some tests started hanging recently
Hey Jagat!

On 6/19/20 3:19 AM, Jagat Singh wrote:

I was not expecting to hear this for my first PR :(

No worries - this could happen; I think we've bumped into some nasty concurrency bug...

I will also try to re-run the tests locally on my system and report back to you.

thank you! It took a while, but the flaky checker got stuck after 8 runs with one of the tests - while the other (with the tez update patch reverted) finished successfully, running it 100 times.

http://130.211.9.232/job/hive-flaky-check/51/
http://130.211.9.232/job/hive-flaky-check/52/

I'm going to revert tez 0.9.2 for now.

cheers, Zoltan

Thanks, Jagat Singh

On Fri, 19 Jun 2020 at 00:30, Zoltan Haindrich wrote:

Hey all, Since yesterday some tests started to hang - most frequently TestCrudCompactorOnTez or TestMmCompactorOnTez, but I've seen a replication test as well - so I don't think it's limited to those 2 tests. I was not able to figure out what has caused this - my current guess is that somehow the tez 0.9.2 upgrade has caused it. To validate this guess I've started the flaky checker with and without that patch from the current state... I've collected some jstacks from the containers running for more than 20 hours:

https://termbin.com/z1eoc
https://termbin.com/2m0j
https://termbin.com/027t
https://termbin.com/1dbe

cheers, Zoltan