[jira] [Updated] (HIVE-18675) make HIVE_LOCKS.HL_TXNID NOT NULL
[ https://issues.apache.org/jira/browse/HIVE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18675: -- Component/s: Metastore > make HIVE_LOCKS.HL_TXNID NOT NULL > - > > Key: HIVE-18675 > URL: https://issues.apache.org/jira/browse/HIVE-18675 > Project: Hive > Issue Type: Bug > Components: Metastore, Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Kryvenko Igor >Priority: Major > Fix For: 3.0.0 > > Attachments: HIVE-18675.01.patch, HIVE-18675.02.patch > > > In Hive 3.0 all statements that may need locks run in a transaction -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18675) make HIVE_LOCKS.HL_TXNID NOT NULL
[ https://issues.apache.org/jira/browse/HIVE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18675: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) committed to master thanks Igor for the contribution > make HIVE_LOCKS.HL_TXNID NOT NULL > - > > Key: HIVE-18675 > URL: https://issues.apache.org/jira/browse/HIVE-18675 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Kryvenko Igor >Priority: Major > Fix For: 3.0.0 > > Attachments: HIVE-18675.01.patch, HIVE-18675.02.patch > > > In Hive 3.0 all statements that may need locks run in a transaction -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18675) make HIVE_LOCKS.HL_TXNID NOT NULL
[ https://issues.apache.org/jira/browse/HIVE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395669#comment-16395669 ] Eugene Koifman commented on HIVE-18675: --- [~vbeshka], makes sense. +1 > make HIVE_LOCKS.HL_TXNID NOT NULL > - > > Key: HIVE-18675 > URL: https://issues.apache.org/jira/browse/HIVE-18675 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Kryvenko Igor >Priority: Major > Attachments: HIVE-18675.01.patch, HIVE-18675.02.patch > > > In Hive 3.0 all statements that may need locks run in a transaction -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18675) make HIVE_LOCKS.HL_TXNID NOT NULL
[ https://issues.apache.org/jira/browse/HIVE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395256#comment-16395256 ] Eugene Koifman commented on HIVE-18675: --- why are there both metastore/scripts/upgrade/derby/upgrade-2.3.0-to-3.0.0.derby.sql and standalone-metastore/src/main/sql/derby/upgrade-2.3.0-to-3.0.0.derby.sql? should one of these be removed? cc [~alangates] > make HIVE_LOCKS.HL_TXNID NOT NULL > - > > Key: HIVE-18675 > URL: https://issues.apache.org/jira/browse/HIVE-18675 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Kryvenko Igor >Priority: Major > Attachments: HIVE-18675.01.patch, HIVE-18675.02.patch > > > In Hive 3.0 all statements that may need locks run in a transaction -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (HIVE-18662) hive.acid.key.index is missing entries
[ https://issues.apache.org/jira/browse/HIVE-18662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman resolved HIVE-18662. --- Resolution: Fixed Assignee: Eugene Koifman Fix Version/s: 3.0.0 fixed in HIVE-18817 > hive.acid.key.index is missing entries > -- > > Key: HIVE-18662 > URL: https://issues.apache.org/jira/browse/HIVE-18662 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Fix For: 3.0.0 > > > OrcRecordUpdater.KeyIndexBuilder stores an index in the ORC footer where each > entry is the last ROW__ID of each stripe. In acid1 this is used to filter > the events from the delta file when merging with part of the base. > > As can be seen in {{TestTxnCommands.testVersioning()}} (added in HIVE-18659), > the {{hive.acid.key.index}} is empty. > > This is because very little data is written and WriterImpl.flushStripe() is > not called except when {{WriterImpl.close()}} > is called. In the latter, {{WriterCallback.preFooterWrite()}} is called > before {{preStripeWrite}} and so KeyIndexBuilder.preFooterWrite() records > nothing in {{hive.acid.key.index}} > > Need to investigate whether this is an issue, in particular for acid 2 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table
[ https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18739: -- Attachment: HIVE-18739.06.patch > Add support for Export from unpartitioned Acid table > > > Key: HIVE-18739 > URL: https://issues.apache.org/jira/browse/HIVE-18739 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, > HIVE-18739.04.patch, HIVE-18739.06.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-17206) make a version of Compactor specific to unbucketed tables
[ https://issues.apache.org/jira/browse/HIVE-17206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17206: -- Description: current Compactor will work but is not optimized/flexible enough The current compactor is designed to generate the number of splits equal to the number of buckets in the table. That is the degree of parallelism. For unbucketed tables, the same is used but the "number of buckets" is derived from the files found in the deltas. For small writes, there will likely be just 1 bucket_0 file. For large writes, the parallelism of the write determines the number of output files. Need to make sure Compactor can control parallelism for unbucketed tables as it wishes. For example, hash partition all records (by ROW__ID?) into N disjoint sets. was:current Compactor will work but is not optimized/flexible enough > make a version of Compactor specific to unbucketed tables > - > > Key: HIVE-17206 > URL: https://issues.apache.org/jira/browse/HIVE-17206 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > > current Compactor will work but is not optimized/flexible enough > The current compactor is designed to generate the number of splits equal to > the number of buckets in the table. That is the degree of parallelism. > For unbucketed tables, the same is used but the "number of buckets" is > derived from the files found in the deltas. For small writes, there will > likely be just 1 bucket_0 file. For large writes, the parallelism of the > write determines the number of output files. > Need to make sure Compactor can control parallelism for unbucketed tables as > it wishes. For example, hash partition all records (by ROW__ID?) into N > disjoint sets. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
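The last idea in the description above (hash partitioning all records, keyed by ROW__ID, into N disjoint sets) can be sketched as follows. The {{RowId}} class and the choice of N are illustrative assumptions, not the compactor's actual types:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class CompactorSplitSketch {

    // Simplified stand-in for Hive's ROW__ID: (original txn id, bucket, row id).
    static final class RowId {
        final long txnId; final int bucket; final long rowId;
        RowId(long txnId, int bucket, long rowId) {
            this.txnId = txnId; this.bucket = bucket; this.rowId = rowId;
        }
    }

    // Deterministically assign each record to one of n disjoint splits,
    // independent of how many bucket files the original writes produced.
    static int splitFor(RowId id, int n) {
        return Math.floorMod(Objects.hash(id.txnId, id.bucket, id.rowId), n);
    }

    public static void main(String[] args) {
        int n = 4;  // degree of parallelism chosen by the compactor, not by file layout
        List<RowId> rows = List.of(
            new RowId(7, 0, 0), new RowId(7, 0, 1), new RowId(8, 1, 0));
        Map<Integer, List<RowId>> splits = new HashMap<>();
        for (RowId r : rows) {
            splits.computeIfAbsent(splitFor(r, n), k -> new ArrayList<>()).add(r);
        }
        // Every row lands in exactly one split, so the total row count is preserved.
        System.out.println(splits.values().stream().mapToInt(List::size).sum()); // 3
    }
}
```

Because the split is a pure function of the row key rather than of the input files, a single bucket_0 file from a small write no longer forces the compaction down to one task.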
[jira] [Assigned] (HIVE-18773) Support multiple instances of Cleaner
[ https://issues.apache.org/jira/browse/HIVE-18773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-18773: - Assignee: Eugene Koifman > Support multiple instances of Cleaner > - > > Key: HIVE-18773 > URL: https://issues.apache.org/jira/browse/HIVE-18773 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > > We support multiple Workers by making each Worker update the status of the > entry in COMPACTION_QUEUE to make sure only one Worker grabs it. Once we have > HIVE-18772, the Cleaner should not need any state, so we can easily have more than one Cleaner > instance by introducing one more status type, "being cleaned". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
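The Worker-style claiming described above is an optimistic compare-and-set on the queue entry's status; a second "being cleaned" state would let Cleaners use the same trick. A minimal in-memory sketch, where the map and state names are illustrative stand-ins for the real COMPACTION_QUEUE table, not Hive's actual code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class CleanerClaimSketch {

    // Stand-in states; in the metastore these would live in COMPACTION_QUEUE.
    static final String READY_FOR_CLEANING = "r";
    static final String BEING_CLEANED = "c";   // the proposed new status

    // Atomically claim an entry: only one Cleaner's replace() succeeds,
    // mirroring an UPDATE ... WHERE cq_state = 'r' in the database.
    static boolean tryClaim(ConcurrentMap<Long, String> queue, long id) {
        return queue.replace(id, READY_FOR_CLEANING, BEING_CLEANED);
    }

    public static void main(String[] args) {
        ConcurrentMap<Long, String> queue = new ConcurrentHashMap<>();
        queue.put(42L, READY_FOR_CLEANING);
        System.out.println(tryClaim(queue, 42L)); // true: the first cleaner wins
        System.out.println(tryClaim(queue, 42L)); // false: entry is already being cleaned
    }
}
```

The key property is that the claim and the status check happen in one atomic step, so two Cleaners can never both believe they own the same entry.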
[jira] [Commented] (HIVE-18693) Snapshot Isolation does not work for Micromanaged table when a insert transaction is aborted
[ https://issues.apache.org/jira/browse/HIVE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393381#comment-16393381 ] Eugene Koifman commented on HIVE-18693: --- [~steveyeom2017], I think the code changes are fine in general but the test is missing a few checks. Left some comments in RB. > Snapshot Isolation does not work for Micromanaged table when a insert > transaction is aborted > > > Key: HIVE-18693 > URL: https://issues.apache.org/jira/browse/HIVE-18693 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Steve Yeom >Assignee: Steve Yeom >Priority: Major > Attachments: HIVE-18693.01.patch, HIVE-18693.02.patch, > HIVE-18693.03.patch, HIVE-18693.04.patch > > > TestTxnCommands2#writeBetweenWorkerAndCleaner with minor > changes (changing delete command to insert command) fails on MM table. > Specifically the last SELECT commands returns wrong results. > But this test works fine with full ACID table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18693) Snapshot Isolation does not work for Micromanaged table when a insert transaction is aborted
[ https://issues.apache.org/jira/browse/HIVE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393189#comment-16393189 ] Eugene Koifman commented on HIVE-18693: --- [~steveyeom2017] could you update RB with latest patch > Snapshot Isolation does not work for Micromanaged table when a insert > transaction is aborted > > > Key: HIVE-18693 > URL: https://issues.apache.org/jira/browse/HIVE-18693 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Steve Yeom >Assignee: Steve Yeom >Priority: Major > Attachments: HIVE-18693.01.patch, HIVE-18693.02.patch, > HIVE-18693.03.patch, HIVE-18693.04.patch > > > TestTxnCommands2#writeBetweenWorkerAndCleaner with minor > changes (changing delete command to insert command) fails on MM table. > Specifically the last SELECT commands returns wrong results. > But this test works fine with full ACID table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()
[ https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18918: -- Resolution: Fixed Fix Version/s: 3.0.0 Target Version/s: 3.0.0 Status: Resolved (was: Patch Available) committed to master thanks Jason for the review > Bad error message in CompactorMR.lanuchCompactionJob() > -- > > Key: HIVE-18918 > URL: https://issues.apache.org/jira/browse/HIVE-18918 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.2.2 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Fix For: 3.0.0 > > Attachments: HIVE-18918.01.patch > > > {noformat} > rj.waitForCompletion(); > if (!rj.isSuccessful()) { > throw new IOException(compactionType == CompactionType.MAJOR ? > "Major" : "Minor" + >" compactor job failed for " + jobName + "! Hadoop JobId: " + > rj.getID()); > } > {noformat} > produces no useful info in case of Major compaction > {noformat} > 2018-02-28 00:59:16,416 ERROR [gdpr1-61]: compactor.Worker > (Worker.java:run(191)) - Caught exception while trying to compact > id:38602,dbname:audit,tableName:COMP_ENTRY_AF_A,partN\ > ame:partition_dt=2017-04-11,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. > Marking failed to avoid repeated failures, java.io.IOException: Ma\ > jor > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269) > at > org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) > at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
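The root cause in the snippet quoted above is Java operator precedence: string concatenation with {{+}} binds tighter than the conditional operator {{?:}}, so the detail suffix attaches only to the "Minor" branch and a failed major compaction reports just "Major". A minimal sketch of the bug and the fix (the method names are illustrative, not Hive's actual code):

```java
public class CompactorMessage {

    // Mirrors the buggy expression: '+' binds tighter than '?:', so the whole
    // detail string is concatenated onto the "Minor" branch only.
    static String buggy(boolean major, String detail) {
        return major ? "Major" : "Minor" + detail;
    }

    // The fix: parenthesize the ternary so the detail applies to both branches.
    static String fixed(boolean major, String detail) {
        return (major ? "Major" : "Minor") + detail;
    }

    public static void main(String[] args) {
        String detail = " compactor job failed! Hadoop JobId: job_42";
        System.out.println(buggy(true, detail));  // prints just "Major" -- the detail is lost
        System.out.println(fixed(true, detail));  // prints the full message
    }
}
```

This matches the stack trace in the ticket, where the IOException message for a major compaction is the bare string "Major".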
[jira] [Commented] (HIVE-18723) CompactorOutputCommitter.commitJob() - check rename() ret val
[ https://issues.apache.org/jira/browse/HIVE-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393134#comment-16393134 ] Eugene Koifman commented on HIVE-18723: --- [~vbeshka], I don't understand the logic. If B exists, rename will create B/A. The delete you added will delete A (in B/A), so B will not have the results of the compaction. > CompactorOutputCommitter.commitJob() - check rename() ret val > - > > Key: HIVE-18723 > URL: https://issues.apache.org/jira/browse/HIVE-18723 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Kryvenko Igor >Priority: Major > Attachments: HIVE-18723.1.patch, HIVE-18723.2.patch, > HIVE-18723.3.patch, HIVE-18723.patch > > > Right now the return value is ignored: {{fs.rename(fileStatus.getPath(), newPath);}} > Should this use {{FileUtils.rename(FileSystem fs, Path sourcePath, Path > destPath, Configuration conf)}}? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
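Hadoop's {{FileSystem.rename()}} reports most failures through its boolean return value rather than by throwing, which is why ignoring it can silently lose compaction output. The same pattern can be shown with plain {{java.io.File}}; the helper name here is made up for illustration:

```java
import java.io.File;
import java.io.IOException;

public class RenameCheck {

    // Like Hadoop's FileSystem#rename, File#renameTo signals failure through a
    // boolean return value that is easy to ignore; wrap it so failures throw.
    static void renameOrThrow(File src, File dst) throws IOException {
        if (!src.renameTo(dst)) {
            throw new IOException("rename failed: " + src + " -> " + dst);
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("compactor", ".tmp");
        File dst = new File(src.getParentFile(),
            "compactor-renamed-" + System.nanoTime() + ".tmp");
        renameOrThrow(src, dst);          // the move is verified, not assumed
        System.out.println(dst.exists()); // true
        dst.delete();                     // clean up the temp file
    }
}
```

A wrapper like this turns a silently-dropped result into a visible failure, which is exactly what the ticket asks for in CompactorOutputCommitter.commitJob().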
[jira] [Updated] (HIVE-18571) stats issues for MM tables; ACID doesn't check state for CTAS
[ https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18571: -- Component/s: Transactions > stats issues for MM tables; ACID doesn't check state for CTAS > - > > Key: HIVE-18571 > URL: https://issues.apache.org/jira/browse/HIVE-18571 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Fix For: 3.0.0 > > Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, > HIVE-18571.03.patch, HIVE-18571.04.patch, HIVE-18571.05.patch, > HIVE-18571.06.patch, HIVE-18571.07.patch, HIVE-18571.patch > > > There are multiple stats aggregation issues with MM tables. > Some simple stats are double counted and some stats (simple stats) are > invalid for ACID table dirs altogether. > I have a patch almost ready, need to fix some more stuff and clean up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18911) LOAD.. code for MM has some suspect/dead code
[ https://issues.apache.org/jira/browse/HIVE-18911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18911: -- Component/s: Transactions > LOAD.. code for MM has some suspect/dead code > - > > Key: HIVE-18911 > URL: https://issues.apache.org/jira/browse/HIVE-18911 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Sergey Shelukhin >Priority: Major > > Discovered in HIVE-18571 and added TODO-s that need to be addressed. > E.g. {noformat} > if (isMmTableWrite) { >// We will load into MM directory, and delete from the parent if > needed. > // TODO: this looks invalid after ACID integration. What about base > dirs? >destPath = new Path(destPath, AcidUtils.deltaSubdir(writeId, > writeId, stmtId)); > ... > // TODO: loadFileType for MM table will no longer be REPLACE_ALL >filter = (loadFileType == LoadFileType.REPLACE_ALL) > {noformat} > 2 places like that > Also replaceFiles has isMmTableWrite flag that should no longer be needed > (since for a transactional table we should never replace files). Either > there's some invalid code path that relies on it (load table?), or it is just > unused and needs to be removed. > Also used in 2 places, "TODO: this should never run for MM tables anymore. > Remove the flag, and maybe the filter?" -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()
[ https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392219#comment-16392219 ] Eugene Koifman commented on HIVE-18918: --- [~jdere] could you review please > Bad error message in CompactorMR.lanuchCompactionJob() > -- > > Key: HIVE-18918 > URL: https://issues.apache.org/jira/browse/HIVE-18918 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.2.2 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18918.01.patch > > > {noformat} > rj.waitForCompletion(); > if (!rj.isSuccessful()) { > throw new IOException(compactionType == CompactionType.MAJOR ? > "Major" : "Minor" + >" compactor job failed for " + jobName + "! Hadoop JobId: " + > rj.getID()); > } > {noformat} > produces no useful info in case of Major compaction > {noformat} > 2018-02-28 00:59:16,416 ERROR [gdpr1-61]: compactor.Worker > (Worker.java:run(191)) - Caught exception while trying to compact > id:38602,dbname:audit,tableName:COMP_ENTRY_AF_A,partN\ > ame:partition_dt=2017-04-11,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. > Marking failed to avoid repeated failures, java.io.IOException: Ma\ > jor > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269) > at > org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) > at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()
[ https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18918: -- Description: {noformat} rj.waitForCompletion(); if (!rj.isSuccessful()) { throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" + " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID()); } {noformat} produces no useful info in case of Major compaction {noformat} 2018-02-28 00:59:16,416 ERROR [gdpr1-61]: compactor.Worker (Worker.java:run(191)) - Caught exception while trying to compact id:38602,dbname:audit,tableName:COMP_ENTRY_AF_A,partN\ ame:partition_dt=2017-04-11,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. Marking failed to avoid repeated failures, java.io.IOException: Ma\ jor at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314) at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269) at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172) {noformat} was: {noformat} rj.waitForCompletion(); if (!rj.isSuccessful()) { throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" + " compactor job failed for " + jobName + "! 
Hadoop JobId: " + rj.getID()); } {noformat} produces no useful info in case of Major compaction > Bad error message in CompactorMR.lanuchCompactionJob() > -- > > Key: HIVE-18918 > URL: https://issues.apache.org/jira/browse/HIVE-18918 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.2.2 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18918.01.patch > > > {noformat} > rj.waitForCompletion(); > if (!rj.isSuccessful()) { > throw new IOException(compactionType == CompactionType.MAJOR ? > "Major" : "Minor" + >" compactor job failed for " + jobName + "! Hadoop JobId: " + > rj.getID()); > } > {noformat} > produces no useful info in case of Major compaction > {noformat} > 2018-02-28 00:59:16,416 ERROR [gdpr1-61]: compactor.Worker > (Worker.java:run(191)) - Caught exception while trying to compact > id:38602,dbname:audit,tableName:COMP_ENTRY_AF_A,partN\ > ame:partition_dt=2017-04-11,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. > Marking failed to avoid repeated failures, java.io.IOException: Ma\ > jor > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314) > at > org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269) > at > org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869) > at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()
[ https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18918: -- Status: Patch Available (was: Open) > Bad error message in CompactorMR.lanuchCompactionJob() > -- > > Key: HIVE-18918 > URL: https://issues.apache.org/jira/browse/HIVE-18918 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.2.2 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18918.01.patch > > > {noformat} > rj.waitForCompletion(); > if (!rj.isSuccessful()) { > throw new IOException(compactionType == CompactionType.MAJOR ? > "Major" : "Minor" + >" compactor job failed for " + jobName + "! Hadoop JobId: " + > rj.getID()); > } > {noformat} > produces no useful info in case of Major compaction -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()
[ https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18918: -- Attachment: HIVE-18918.01.patch > Bad error message in CompactorMR.lanuchCompactionJob() > -- > > Key: HIVE-18918 > URL: https://issues.apache.org/jira/browse/HIVE-18918 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.2.2 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18918.01.patch > > > {noformat} > rj.waitForCompletion(); > if (!rj.isSuccessful()) { > throw new IOException(compactionType == CompactionType.MAJOR ? > "Major" : "Minor" + >" compactor job failed for " + jobName + "! Hadoop JobId: " + > rj.getID()); > } > {noformat} > produces no useful info in case of Major compaction -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()
[ https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-18918: - > Bad error message in CompactorMR.lanuchCompactionJob() > -- > > Key: HIVE-18918 > URL: https://issues.apache.org/jira/browse/HIVE-18918 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.2.2 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > > {noformat} > rj.waitForCompletion(); > if (!rj.isSuccessful()) { > throw new IOException(compactionType == CompactionType.MAJOR ? > "Major" : "Minor" + >" compactor job failed for " + jobName + "! Hadoop JobId: " + > rj.getID()); > } > {noformat} > produces no useful info in case of Major compaction -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization
[ https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391880#comment-16391880 ] Eugene Koifman commented on HIVE-18825: --- The lock manager is entirely based on the Read/WriteEntities created by the compiler. If you move lock acquisition to some place before they are available, you basically need to rewrite the entire logic in the LM that is used to figure out what to lock. It may be possible, but it's an unpredictably large amount of work, which makes pessimistic locking an impractically expensive feature. Incremental refresh is not prevented by this, though I understand that there are some possible performance optimizations that are difficult w/o this. > Define ValidTxnList before starting query optimization > -- > > Key: HIVE-18825 > URL: https://issues.apache.org/jira/browse/HIVE-18825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, > HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch > > > Consider a set of tables used by a materialized view where inserts happened > after the materialization was created. To compute incremental view > maintenance, we need to be able to filter only new rows from those base > tables. That can be done by inserting a filter operator with condition e.g. > {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT > IN()}} on top of the MVs query definition and triggering the > rewriting (which should in turn produce a partial rewriting). However, to do > that, we need to have a value for {{ValidTxnList}} during query compilation > so we know the snapshot that we are querying. > This patch aims to generate {{ValidTxnList}} before query optimization. There > should not be any visible changes for end user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table
[ https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18739: -- Status: Patch Available (was: Open) > Add support for Export from unpartitioned Acid table > > > Key: HIVE-18739 > URL: https://issues.apache.org/jira/browse/HIVE-18739 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, > HIVE-18739.04.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table
[ https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18739: -- Attachment: HIVE-18739.04.patch > Add support for Export from unpartitioned Acid table > > > Key: HIVE-18739 > URL: https://issues.apache.org/jira/browse/HIVE-18739 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, > HIVE-18739.04.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table
[ https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18739: -- Attachment: HIVE-18739.04.patch > Add support for Export from unpartitioned Acid table > > > Key: HIVE-18739 > URL: https://issues.apache.org/jira/browse/HIVE-18739 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables
[ https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18814: -- Description: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] The Add Partition command creates a {{Partition}} metadata object and sets the location to the directory containing data files. In current master (Hive 3.0), Add Partition on an acid table doesn't fail, and at read time the data is decorated with row__id but the original transaction is 0. I suspect in earlier Hive versions this will throw or return no data. Since this new partition didn't have data before, assigning txnid:0 isn't going to generate duplicate IDs, but it could violate Snapshot Isolation in multi-stmt txns. Suppose txnid:7 runs {{select * from T}}. Then txnid:8 adds a partition to T. Now if txnid:7 runs the same query again, it will see the data in the new partition. This can't be released like this, since a delete on this data (added via Add Partition) will use row_ids with txnid:0, so a later upgrade that sees un-compacted data may generate row_ids with a different txnid (assuming this is fixed by then). One option is to follow the Load Data approach and create a new delta_x_x/ and move/copy the data there. Another is to allocate a new writeid and save it in Partition metadata. This could then be used to decorate data with ROW__IDs. This avoids the move/copy but retains data "outside" of the table tree, which makes it more likely that this data will be modified in some way, which can really break things if done after an SQL update/delete on this data has happened. It performs no validations on add (except for the partition spec), so any file with any format can be added. It allows add to bucketed tables as well. Seems like a very dangerous command. Maybe a better option is to block it and advise using Load Data. Alternatively, make this do the Add Partition metadata op followed by Load Data. 
was: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] Add Partition command creates a {{Partition}} metadata object and sets the location to the directory containing data files. In current master (Hive 3.0), Add partition on an acid table doesn't fail and at read time the data is decorated with row__id but the original transaction is 0. I suspect in earlier Hive versions this will throw or return no data. Since this new partition didn't have data before, assigning txnid:0 isn't going to generate duplicate IDs but it could violate Snapshot Isolation in multi stmt txns. Suppose txnid:7 runs {{select * from T}}. Then txnid:8 adds a partition to T. Now if txnid:7 runs the same query again, it will see the data in the new partition. One option is follow Load Data approach and create a new delta_x_x/ and move/copy the data there. Another is to allocate a new writeid and save it in Partition metadata. This could then be used to decorate data with ROW__IDs. This avoids move/copy but retains data "outside" of the table tree which make it more likely that this data will be modified in some way which can really break things if done after and SQL update/delete on this data have happened. It performs no validations on add (except for partition spec) so any file with any format can be added. It allows add to bucketed tables as well. Seems like a very dangerous command. Maybe a better option is to block it and advise using Load Data. Alternatively, make this do Add partition metadata op followed by Load Data. 
> Support Add Partition For Acid tables > - > > Key: HIVE-18814 > URL: https://issues.apache.org/jira/browse/HIVE-18814 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18814.wip.patch > > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] > Add Partition command creates a {{Partition}} metadata object and sets the > location to the directory containing data files. > In current master (Hive 3.0), Add partition on an acid table doesn't fail and > at read time the data is decorated with row__id but the original transaction > is 0. I suspect in earlier Hive versions this will throw or return no data. > Since this new partition didn't have data before, assigning txnid:0 isn't > going to generate duplicate IDs but it could violate Snapshot Isolation in > multi stmt txns. Suppose txnid:7 runs {{select * from T}}. Then txnid:8 > adds a partition to T. Now if txnid:7 runs the same query again, it will see > the data in the new partition. > This can't be release like this since a
[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables
[ https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18814: -- Description: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] The Add Partition command creates a {{Partition}} metadata object and sets the location to the directory containing data files. In current master (Hive 3.0), Add Partition on an acid table doesn't fail, and at read time the data is decorated with row__id but the original transaction is 0. I suspect in earlier Hive versions this will throw or return no data. Since this new partition didn't have data before, assigning txnid:0 isn't going to generate duplicate IDs, but it could violate Snapshot Isolation in multi-stmt txns. Suppose txnid:7 runs {{select * from T}}. Then txnid:8 adds a partition to T. Now if txnid:7 runs the same query again, it will see the data in the new partition. One option is to follow the Load Data approach: create a new delta_x_x/ and move/copy the data there. Another is to allocate a new writeid and save it in the Partition metadata. This could then be used to decorate data with ROW__IDs. This avoids the move/copy but retains data "outside" of the table tree, which makes it more likely that this data will be modified in some way; that can really break things if done after SQL update/delete operations on this data have happened. The command performs no validation on add (except for the partition spec), so a file of any format can be added. It allows adds to bucketed tables as well. This seems like a very dangerous command. Maybe a better option is to block it and advise using Load Data. Alternatively, make it perform the Add Partition metadata op followed by Load Data. was: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] The Add Partition command creates a {{Partition}} metadata object and sets the location to the directory containing data files. 
In current master (Hive 3.0), Add Partition on an acid table doesn't fail, and at read time the data is decorated with row__id but the original transaction is 0. I suspect in earlier Hive versions this will throw or return no data. One option is to follow the Load Data approach: create a new delta_x_x/ and move/copy the data there. Another is to allocate a new writeid and save it in the Partition metadata. This could then be used to decorate data with ROW__IDs. This avoids the move/copy but retains data "outside" of the table tree, which makes it more likely that this data will be modified in some way; that can really break things if done after SQL update/delete operations on this data have happened. The command performs no validation on add (except for the partition spec), so a file of any format can be added. It allows adds to bucketed tables as well. This seems like a very dangerous command. Maybe a better option is to block it and advise using Load Data. Alternatively, make it perform the Add Partition metadata op followed by Load Data. > Support Add Partition For Acid tables > - > > Key: HIVE-18814 > URL: https://issues.apache.org/jira/browse/HIVE-18814 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18814.wip.patch > > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] > Add Partition command creates a {{Partition}} metadata object and sets the > location to the directory containing data files. > In current master (Hive 3.0), Add partition on an acid table doesn't fail and > at read time the data is decorated with row__id but the original transaction > is 0. I suspect in earlier Hive versions this will throw or return no data. > Since this new partition didn't have data before, assigning txnid:0 isn't > going to generate duplicate IDs but it could violate Snapshot Isolation in > multi stmt txns. Suppose txnid:7 runs {{select * from T}}. 
Then txnid:8 > adds a partition to T. Now if txnid:7 runs the same query again, it will see > the data in the new partition. > > One option is follow Load Data approach and create a new delta_x_x/ and > move/copy the data there. > > Another is to allocate a new writeid and save it in Partition metadata. This > could then be used to decorate data with ROW__IDs. This avoids move/copy but > retains data "outside" of the table tree which make it more likely that this > data will be modified in some way which can really break things if done after > and SQL update/delete on this data have happened. > > It performs no validations on add (except for partition spec) so any file > with any format can be added. It allows
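To make the Snapshot Isolation concern in HIVE-18814 concrete, here is a toy sketch in plain Java (not Hive code; all class and method names are invented for illustration) of why data stamped with txnid/writeId 0 leaks into an already-open reader's snapshot, while a freshly allocated writeId would not:

```java
// Illustrative sketch (not Hive code): Add Partition data stamped with
// writeId 0 always passes a snapshot's high-water-mark filter, so an
// already-open txn sees it on re-read; a fresh writeId above the HWM
// would stay hidden. All names here are hypothetical.
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AddPartitionVisibility {
    // Partitions visible to a snapshot that admits writeIds <= hwm.
    static List<String> visible(Map<String, Long> partitionWriteIds, long hwm) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : partitionWriteIds.entrySet())
            if (e.getValue() <= hwm) out.add(e.getKey());
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> parts = new LinkedHashMap<>();
        parts.put("p=1", 0L);        // pre-existing data, stamped writeId 0
        long txn7Hwm = 5L;           // txn 7's snapshot: writeIds <= 5 visible
        System.out.println(visible(parts, txn7Hwm)); // [p=1]

        parts.put("p=2", 0L);        // txn 8 adds a partition, also stamped 0
        // txn 7 re-runs "select * from T" and now sees the new partition:
        System.out.println(visible(parts, txn7Hwm)); // [p=1, p=2] - SI violated

        parts.put("p=3", 6L);        // with a freshly allocated writeId instead,
        // p=3 fails the <= 5 check and stays invisible to txn 7:
        System.out.println(visible(parts, txn7Hwm)); // [p=1, p=2]
    }
}
```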
[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization
[ https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389911#comment-16389911 ] Eugene Koifman commented on HIVE-18825: --- Just because some feature doesn't exist yet, it doesn't mean we should make changes that will make that feature impossible in the future. For example, we don't have multi-statement transactions fully supported, but I constantly pay attention to it to make sure it will be possible to finish it. > Define ValidTxnList before starting query optimization > -- > > Key: HIVE-18825 > URL: https://issues.apache.org/jira/browse/HIVE-18825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, > HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch > > > Consider a set of tables used by a materialized view where inserts happened > after the materialization was created. To compute incremental view > maintenance, we need to be able to filter only new rows from those base > tables. That can be done by inserting a filter operator with condition e.g. > {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT > IN()}} on top of the MVs query definition and triggering the > rewriting (which should in turn produce a partial rewriting). However, to do > that, we need to have a value for {{ValidTxnList}} during query compilation > so we know the snapshot that we are querying. > This patch aims to generate {{ValidTxnList}} before query optimization. There > should not be any visible changes for end user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18571) stats issues for MM tables; ACID doesn't check state for CTAS
[ https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388930#comment-16388930 ] Eugene Koifman commented on HIVE-18571: --- +1 pending tests > stats issues for MM tables; ACID doesn't check state for CTAS > - > > Key: HIVE-18571 > URL: https://issues.apache.org/jira/browse/HIVE-18571 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, > HIVE-18571.03.patch, HIVE-18571.04.patch, HIVE-18571.05.patch, > HIVE-18571.06.patch, HIVE-18571.patch > > > There are multiple stats aggregation issues with MM tables. > Some simple stats are double counted and some stats (simple stats) are > invalid for ACID table dirs altogether. > I have a patch almost ready, need to fix some more stuff and clean up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization
[ https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388925#comment-16388925 ] Eugene Koifman commented on HIVE-18825: --- I'm strongly against this. > Define ValidTxnList before starting query optimization > -- > > Key: HIVE-18825 > URL: https://issues.apache.org/jira/browse/HIVE-18825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, > HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch > > > Consider a set of tables used by a materialized view where inserts happened > after the materialization was created. To compute incremental view > maintenance, we need to be able to filter only new rows from those base > tables. That can be done by inserting a filter operator with condition e.g. > {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT > IN()}} on top of the MVs query definition and triggering the > rewriting (which should in turn produce a partial rewriting). However, to do > that, we need to have a value for {{ValidTxnList}} during query compilation > so we know the snapshot that we are querying. > This patch aims to generate {{ValidTxnList}} before query optimization. There > should not be any visible changes for end user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18886) ACID: NPE on unexplained mysql exceptions
[ https://issues.apache.org/jira/browse/HIVE-18886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388770#comment-16388770 ] Eugene Koifman commented on HIVE-18886: --- +1 > ACID: NPE on unexplained mysql exceptions > -- > > Key: HIVE-18886 > URL: https://issues.apache.org/jira/browse/HIVE-18886 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Major > Attachments: HIVE-18886.1.patch > > > At 200+ sessions on a single HS2, the DbLock impl fails to propagate mysql > exceptions > {code} > 2018-03-06T22:55:16,197 ERROR [HiveServer2-Background-Pool: Thread-12867]: > ql.Driver (:()) - FAILED: Error in acquiring locks: null > java.lang.NullPointerException > at > org.apache.hadoop.hive.metastore.DatabaseProduct.isDeadlock(DatabaseProduct.java:56) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkRetryable(TxnHandler.java:2459) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.getOpenTxns(TxnHandler.java:499) > {code} > {code} > return e instanceof SQLTransactionRollbackException > || ((dbProduct == MYSQL || dbProduct == POSTGRES || dbProduct == > SQLSERVER) > && e.getSQLState().equals("40001")) > || (dbProduct == POSTGRES && e.getSQLState().equals("40P01")) > || (dbProduct == ORACLE && (e.getMessage().contains("deadlock > detected") > || e.getMessage().contains("can't serialize access for this > transaction"))); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
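The NPE in HIVE-18886 comes from dereferencing {{e.getSQLState()}}, which JDBC allows to return null (e.g. on a dropped connection). A null-safe variant of the retry check can be sketched as follows; this is an illustration only, with an invented helper name, not the actual TxnHandler code:

```java
// Illustrative sketch: null-safe version of the SQLState comparison from
// the quoted snippet. Writing the constant first ("40001".equals(state))
// avoids the NPE when getSQLState() is null. Class/method names invented.
import java.sql.SQLException;

public class DeadlockCheck {
    static boolean isRetryableState(SQLException e) {
        String state = e.getSQLState();   // may legitimately be null
        return "40001".equals(state)      // serialization failure (MySQL/Postgres/SQLServer)
            || "40P01".equals(state);     // PostgreSQL deadlock_detected
    }

    public static void main(String[] args) {
        SQLException noState = new SQLException("connection reset"); // SQLState == null
        SQLException serial = new SQLException("deadlock", "40001");
        System.out.println(isRetryableState(noState)); // false, no NPE
        System.out.println(isRetryableState(serial));  // true
    }
}
```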
[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization
[ https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388481#comment-16388481 ] Eugene Koifman commented on HIVE-18825: --- The lock manager uses Read/WriteEntity in the QueryPlan to know what to lock. Those are not there after parsing, so I don't see how that can work w/o rewriting the LM. > Define ValidTxnList before starting query optimization > -- > > Key: HIVE-18825 > URL: https://issues.apache.org/jira/browse/HIVE-18825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, > HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch > > > Consider a set of tables used by a materialized view where inserts happened > after the materialization was created. To compute incremental view > maintenance, we need to be able to filter only new rows from those base > tables. That can be done by inserting a filter operator with condition e.g. > {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT > IN()}} on top of the MVs query definition and triggering the > rewriting (which should in turn produce a partial rewriting). However, to do > that, we need to have a value for {{ValidTxnList}} during query compilation > so we know the snapshot that we are querying. > This patch aims to generate {{ValidTxnList}} before query optimization. There > should not be any visible changes for end user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18864) WriteId high water mark (HWM) is incorrect if ValidWriteIdList is obtained after allocating writeId by current transaction.
[ https://issues.apache.org/jira/browse/HIVE-18864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388106#comment-16388106 ] Eugene Koifman commented on HIVE-18864: --- Yes, you are right. If you "fix" writeID_HWM=5 as I was suggesting, txn=10 won't be able to read its own write. > WriteId high water mark (HWM) is incorrect if ValidWriteIdList is obtained > after allocating writeId by current transaction. > --- > > Key: HIVE-18864 > URL: https://issues.apache.org/jira/browse/HIVE-18864 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID > Fix For: 3.0.0 > > > For multi-statement txns, it is possible that write on a table happens after > a read. Let's see the below scenario. > # Committed txn=9 writes on table T1 with writeId=5. > # Open txn=10. ValidTxnList(open:null, txn_HWM=10), > # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5). > # Open txn=11, writes on table T1 with writeid=6. > # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5). > # Write table T1 from txn=10 with writeId=7. > # Read table T1 from txn=10. {color:#d04437}*ValidWriteIdList(open:null, > write_HWM=7)*. – This read will be able to see rows added by txn=11 which is > still open.{color} > {color:#d04437}So, it is needed to rebuild the open/aborted list of > ValidWriteIdList based on txn_HWM. Any writeId allocated by txnId > txn_HWM > should be marked as open. In this example, *ValidWriteIdList(open:6, > write_HWM=7)* should be generated.{color} > {color:#33}cc{color} [~ekoifman], [~thejas] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (HIVE-18825) Define ValidTxnList before starting query optimization
[ https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387145#comment-16387145 ] Eugene Koifman edited comment on HIVE-18825 at 3/6/18 2:02 AM: --- I think there is a problem here that I should've thought of earlier. Right now we lock in the snapshot after lock acquisition. This ordering is important if we ever want to support lock based concurrency control. (Something I think we should do) Suppose you have 2 concurrent transactions both running "update T1 set x = x + 1". If we acquire the update lock first, then record the snapshot, then 2nd txn to get the lock will see the result of write of the previous one. If we lock the snapshot before acquiring the lock, both transactions may lock in exactly the same snapshot and locking becomes useless as the 2nd will still read an "old" snapshot. Could the predicates you want be inserted at compile time, but bound to actual values as some post processing after (or at the end of) {{Driver.acquireLocks()}} as currently implemented? was (Author: ekoifman): I think there is a problem here that I should've thought of earlier. Right now we lock in the snapshot after lock acquisition. This ordering is important if we ever want to support lock based concurrency control. Suppose you have 2 concurrent transactions both running "update T1 set x = x + 1". If we acquire the update lock first, then record the snapshot, then 2nd txn to get the lock will see the result of write of the previous one. If we lock the snapshot before acquiring the lock, both transactions may lock in exactly the same snapshot and locking becomes useless as the 2nd will still read an "old" snapshot. Could the predicates you want be inserted at compile time, but bound to actual values as some post processing after (or at the end of) {{Driver.acquireLocks()}} as currently implemented? 
> Define ValidTxnList before starting query optimization > -- > > Key: HIVE-18825 > URL: https://issues.apache.org/jira/browse/HIVE-18825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, > HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch > > > Consider a set of tables used by a materialized view where inserts happened > after the materialization was created. To compute incremental view > maintenance, we need to be able to filter only new rows from those base > tables. That can be done by inserting a filter operator with condition e.g. > {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT > IN()}} on top of the MVs query definition and triggering the > rewriting (which should in turn produce a partial rewriting). However, to do > that, we need to have a value for {{ValidTxnList}} during query compilation > so we know the snapshot that we are querying. > This patch aims to generate {{ValidTxnList}} before query optimization. There > should not be any visible changes for end user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization
[ https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387145#comment-16387145 ] Eugene Koifman commented on HIVE-18825: --- I think there is a problem here that I should've thought of earlier. Right now we lock in the snapshot after lock acquisition. This ordering is important if we ever want to support lock based concurrency control. Suppose you have 2 concurrent transactions both running "update T1 set x = x + 1". If we acquire the update lock first, then record the snapshot, then 2nd txn to get the lock will see the result of write of the previous one. If we lock the snapshot before acquiring the lock, both transactions may lock in exactly the same snapshot and locking becomes useless as the 2nd will still read an "old" snapshot. Could the predicates you want be inserted at compile time, but bound to actual values as some post processing after (or at the end of) {{Driver.acquireLocks()}} as currently implemented? > Define ValidTxnList before starting query optimization > -- > > Key: HIVE-18825 > URL: https://issues.apache.org/jira/browse/HIVE-18825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, > HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch > > > Consider a set of tables used by a materialized view where inserts happened > after the materialization was created. To compute incremental view > maintenance, we need to be able to filter only new rows from those base > tables. That can be done by inserting a filter operator with condition e.g. > {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT > IN()}} on top of the MVs query definition and triggering the > rewriting (which should in turn produce a partial rewriting). 
However, to do > that, we need to have a value for {{ValidTxnList}} during query compilation > so we know the snapshot that we are querying. > This patch aims to generate {{ValidTxnList}} before query optimization. There > should not be any visible changes for end user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
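The lost-update argument in the HIVE-18825 comment above can be sketched as a toy simulation; this is plain Java with invented names, not Hive's lock manager, and the two "transactions" are just sequential method calls:

```java
// Illustrative sketch: why the snapshot must be locked in AFTER lock
// acquisition for two concurrent "update T1 set x = x + 1" statements.
// If both txns capture the snapshot first, the second overwrites the
// first's commit (lost update); snapshot-after-lock yields x = 2.
public class SnapshotOrdering {
    static int committed = 0; // current committed value of x

    // snapshot-after-lock: acquire the write lock, then read the snapshot
    static void updateLockFirst() {
        synchronized (SnapshotOrdering.class) {
            int snapshot = committed;   // snapshot taken under the lock
            committed = snapshot + 1;   // x = x + 1
        }
    }

    public static void main(String[] args) {
        // snapshot-before-lock: both txns lock in x = 0, then serialize
        int snapA = committed, snapB = committed;
        committed = snapA + 1;          // txn A commits x = 1
        committed = snapB + 1;          // txn B overwrites: still x = 1
        System.out.println(committed);  // 1 - one increment lost

        committed = 0;
        updateLockFirst();              // txn A
        updateLockFirst();              // txn B sees A's write
        System.out.println(committed);  // 2 - both increments applied
    }
}
```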
[jira] [Commented] (HIVE-18864) WriteId high water mark (HWM) is incorrect if ValidWriteIdList is obtained after allocating writeId by current transaction.
[ https://issues.apache.org/jira/browse/HIVE-18864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387081#comment-16387081 ] Eugene Koifman commented on HIVE-18864: --- Alternatively, write_HWM should always be set to the one that corresponds to txn_HWM, rather than explicitly marking writeIds 'open'. > WriteId high water mark (HWM) is incorrect if ValidWriteIdList is obtained > after allocating writeId by current transaction. > --- > > Key: HIVE-18864 > URL: https://issues.apache.org/jira/browse/HIVE-18864 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Blocker > Labels: ACID > Fix For: 3.0.0 > > > For multi-statement txns, it is possible that write on a table happens after > a read. Let's see the below scenario. > # Committed txn=9 writes on table T1 with writeId=5. > # Open txn=10. ValidTxnList(open:null, txn_HWM=10), > # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5). > # Open txn=11, writes on table T1 with writeid=6. > # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5). > # Write table T1 from txn=10 with writeId=7. > # Read table T1 from txn=10. {color:#d04437}*ValidWriteIdList(open:null, > write_HWM=7)*. – This read will be able to see rows added by txn=11 which is > still open.{color} > {color:#d04437}So, it is needed to rebuild the open/aborted list of > ValidWriteIdList based on txn_HWM. Any writeId allocated by txnId > txn_HWM > should be marked as open.{color} > {color:#33}cc{color} > [~ekoifman], [~thejas] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
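The fix direction settled on in HIVE-18864 (treat any writeId allocated by a txn above txn_HWM as open, except writeIds allocated by the reader's own txn, so it can still read its own writes) can be sketched like this; the names and data layout are illustrative, not the actual metastore API:

```java
// Illustrative sketch: rebuild the "open" set of a ValidWriteIdList from
// txn_HWM. A writeId is open to this snapshot iff its allocating txn is
// above txn_HWM and is not the reader's own txn. Hypothetical names.
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class WriteIdSnapshot {
    // allocations: writeId -> allocating txnId (illustrative representation)
    static SortedSet<Long> openWriteIds(Map<Long, Long> allocations,
                                        long txnHwm, long currentTxn) {
        SortedSet<Long> open = new TreeSet<>();
        for (Map.Entry<Long, Long> e : allocations.entrySet()) {
            long txn = e.getValue();
            if (txn > txnHwm && txn != currentTxn) {
                open.add(e.getKey()); // invisible to this snapshot
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // Scenario from the issue: committed txn 9 wrote writeId 5,
        // open txn 11 (> txn_HWM of 10) wrote writeId 6, and the reader
        // txn 10 itself wrote writeId 7.
        Map<Long, Long> alloc = Map.of(5L, 9L, 6L, 11L, 7L, 10L);
        // Only writeId 6 is open: ValidWriteIdList(open:6, write_HWM=7).
        System.out.println(openWriteIds(alloc, 10L, 10L)); // [6]
    }
}
```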
[jira] [Commented] (HIVE-18749) Need to replace transactionId with writeId in RecordIdentifier and other relevant contexts.
[ https://issues.apache.org/jira/browse/HIVE-18749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387067#comment-16387067 ] Eugene Koifman commented on HIVE-18749: --- +1 > Need to replace transactionId with writeId in RecordIdentifier and other > relevant contexts. > --- > > Key: HIVE-18749 > URL: https://issues.apache.org/jira/browse/HIVE-18749 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Minor > Labels: ACID, pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18749.01.patch, HIVE-18749.02.patch > > > Per table write ID implementation (HIVE-18192) have replaced global > transaction ID with write ID for the primary key for a row marked by > RecordIdentifier.Field..transactionId. > Need to replace the same with writeId and update all test results file. > Also, need to update other references (methods/variables) which currently > uses transactionId instead of writeId. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18860) fix TestAcidOnTez#testGetSplitsLocks
[ https://issues.apache.org/jira/browse/HIVE-18860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386630#comment-16386630 ] Eugene Koifman commented on HIVE-18860: --- cc [~sankarh] > fix TestAcidOnTez#testGetSplitsLocks > > > Key: HIVE-18860 > URL: https://issues.apache.org/jira/browse/HIVE-18860 > Project: Hive > Issue Type: Bug > Components: Test, Transactions >Reporter: Zoltan Haindrich >Priority: Major > > it seems to me that the HIVE-18665 patch has broken this test > https://travis-ci.org/kgyrtkirk/hive/builds/345287889 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18860) fix TestAcidOnTez#testGetSplitsLocks
[ https://issues.apache.org/jira/browse/HIVE-18860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18860: -- Component/s: Transactions > fix TestAcidOnTez#testGetSplitsLocks > > > Key: HIVE-18860 > URL: https://issues.apache.org/jira/browse/HIVE-18860 > Project: Hive > Issue Type: Bug > Components: Test, Transactions >Reporter: Zoltan Haindrich >Priority: Major > > it seems to me that the HIVE-18665 patch has broken this test > https://travis-ci.org/kgyrtkirk/hive/builds/345287889 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-18712) Design HMS Api v2
[ https://issues.apache.org/jira/browse/HIVE-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-18712: - Assignee: Eugene Koifman > Design HMS Api v2 > - > > Key: HIVE-18712 > URL: https://issues.apache.org/jira/browse/HIVE-18712 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 3.0.0 >Reporter: Alexander Kolbasov >Assignee: Eugene Koifman >Priority: Major > > This is an umbrella Jira covering the design of Hive Metastore API v2. > It is supposed to be a placeholder for discussion and design documents. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-18712) Design HMS Api v2
[ https://issues.apache.org/jira/browse/HIVE-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-18712: - Assignee: Alexander Kolbasov (was: Eugene Koifman) > Design HMS Api v2 > - > > Key: HIVE-18712 > URL: https://issues.apache.org/jira/browse/HIVE-18712 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 3.0.0 >Reporter: Alexander Kolbasov >Assignee: Alexander Kolbasov >Priority: Major > > This is an umbrella Jira covering the design of Hive Metastore API v2. > It is supposed to be a placeholder for discussion and design documents. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18851) make Hive basic stats valid for ACID; clean up and refactor the code
[ https://issues.apache.org/jira/browse/HIVE-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18851: -- Component/s: Transactions > make Hive basic stats valid for ACID; clean up and refactor the code > > > Key: HIVE-18851 > URL: https://issues.apache.org/jira/browse/HIVE-18851 > Project: Hive > Issue Type: Bug > Components: Statistics, Transactions >Reporter: Sergey Shelukhin >Priority: Major > Labels: ACID > > HIVE-18571 started as a couple small fixes for MM tables, but ended up as a > somewhat major cleanup of stats for ACID tables; however it doesn't do that > rigorously and not for all cases. > This is a follow-up JIRA to implement stats for ACID properly (potentially > also with ACID semantics similar to those of queries, but that could be > another follow-up - for now, at least they should be based on the correct set > of files). > Overall I've discovered that Hive stats code is spread all over in random > places in code base and is brittle and inconsistent, esp. for any complex > scenario like ACID tables. > So, instead of making ad-hoc fixes everywhere, I think at the minimum it > should be moved to a single spot (so that e.g. BasicStatsTask, > BasicStatsTaskNoJob, metastore "quick" stats generation, etc all use the same > code with the same logic) and made valid for ACID. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18851) make Hive basic stats valid for ACID; clean up and refactor the code
[ https://issues.apache.org/jira/browse/HIVE-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18851: -- Component/s: Statistics > make Hive basic stats valid for ACID; clean up and refactor the code > > > Key: HIVE-18851 > URL: https://issues.apache.org/jira/browse/HIVE-18851 > Project: Hive > Issue Type: Bug > Components: Statistics, Transactions >Reporter: Sergey Shelukhin >Priority: Major > Labels: ACID > > HIVE-18571 started as a couple small fixes for MM tables, but ended up as a > somewhat major cleanup of stats for ACID tables; however it doesn't do that > rigorously and not for all cases. > This is a follow-up JIRA to implement stats for ACID properly (potentially > also with ACID semantics similar to those of queries, but that could be > another follow-up - for now, at least they should be based on the correct set > of files). > Overall I've discovered that Hive stats code is spread all over in random > places in code base and is brittle and inconsistent, esp. for any complex > scenario like ACID tables. > So, instead of making ad-hoc fixes everywhere, I think at the minimum it > should be moved to a single spot (so that e.g. BasicStatsTask, > BasicStatsTaskNoJob, metastore "quick" stats generation, etc all use the same > code with the same logic) and made valid for ACID. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18723) CompactorOutputCommitter.commitJob() - check rename() ret val
[ https://issues.apache.org/jira/browse/HIVE-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384118#comment-16384118 ] Eugene Koifman commented on HIVE-18723: --- Unfortunately the Jenkins output is gone. Are you sure org.apache.hadoop.hive.ql.txn.compactor.TestWorker.minorWithOpenInMiddle (batchId=268) org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.minorWithOpenInMiddle (batchId=268) are not related? > CompactorOutputCommitter.commitJob() - check rename() ret val > - > > Key: HIVE-18723 > URL: https://issues.apache.org/jira/browse/HIVE-18723 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Kryvenko Igor >Priority: Major > Attachments: HIVE-18723.1.patch, HIVE-18723.2.patch, HIVE-18723.patch > > > right now ret val is ignored {{fs.rename(fileStatus.getPath(), newPath); }} > Should this use {{FileUtils.rename(FileSystem fs, Path sourcePath, Path > destPath, Configuration conf) }} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.
[ https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18817: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) committed to master thanks Jason for the review > ArrayIndexOutOfBounds exception during read of ACID table. > -- > > Key: HIVE-18817 > URL: https://issues.apache.org/jira/browse/HIVE-18817 > Project: Hive > Issue Type: Bug > Components: ORC, Transactions >Reporter: Jason Dere >Assignee: Eugene Koifman >Priority: Major > Fix For: 3.0.0 > > Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, > HIVE-18817.03.patch, HIVE-18817.04.patch, repro.patch > > > Seeing some users hitting the following stack trace: > {noformat} > 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193) > ... 19 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:457) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255) > ... 
20 more > {noformat} > Have a JUnit test that appears to produce a similar stack trace - looks like > this occurs if there is an OrcSplit of an ACID table where the split offset > is beyond the starting offset of the last stripe in the ORC file. > cc [~ekoifman] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
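The failure mode described above can be sketched in a few lines. This is a simplified model (the helper name is hypothetical; the real logic lives in OrcRawRecordMerger.discoverKeyBounds), assuming stripes are identified only by their start offsets: when the split's offset lies beyond the start of the last stripe, a naive index computation walks off the end of the stripe array.

```python
def stripes_in_split(stripe_offsets, split_offset, split_length):
    """Return (first, last) indices of stripes whose start offsets fall
    inside the split, or None when the split starts beyond every stripe
    start -- the case that triggers the ArrayIndexOutOfBoundsException
    in an implementation that assumes at least one stripe matches."""
    first = last = None
    for i, off in enumerate(stripe_offsets):
        if split_offset <= off < split_offset + split_length:
            if first is None:
                first = i
            last = i
    return None if first is None else (first, last)
```

A guard like the None return above is the kind of bounds check the patch would need; code that unconditionally indexes the stripe array by a computed position instead indexes past the end.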
[jira] [Assigned] (HIVE-18845) SHOW COMPACTIONS should show host name
[ https://issues.apache.org/jira/browse/HIVE-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-18845: - > SHOW COMPACTIONS should show host name > -- > > Key: HIVE-18845 > URL: https://issues.apache.org/jira/browse/HIVE-18845 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > > Once the job starts, the WorkerId includes the hostname submitting the job, but > before that there is no way to tell which of the Metastores in an HA setup > has picked up a given item to compact. This should make it obvious which log > to look at. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
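The gap this improvement targets can be sketched as follows, assuming for illustration that the WorkerId takes a 'hostname-threadId' form once a worker picks up the item (the exact format is an assumption, not confirmed by the source); before that point there is simply no host to report:

```python
def worker_host(worker_id):
    """Extract the host from a compaction WorkerId, assumed here to look
    like 'hostname-threadId'. Returns None while no worker has picked up
    the compaction -- the window SHOW COMPACTIONS cannot explain today."""
    if not worker_id:
        return None
    return worker_id.rsplit("-", 1)[0]
```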
[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables
[ https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18814: -- Attachment: HIVE-18814.wip.patch > Support Add Partition For Acid tables > - > > Key: HIVE-18814 > URL: https://issues.apache.org/jira/browse/HIVE-18814 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18814.wip.patch > > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] > The Add Partition command creates a {{Partition}} metadata object and sets the > location to the directory containing data files. > In current master (Hive 3.0), Add Partition on an acid table doesn't fail, and > at read time the data is decorated with row__id but the original transaction > is 0. I suspect in earlier Hive versions this will throw or return no data. > > One option is to follow the Load Data approach and create a new delta_x_x/ and > move/copy the data there. > > Another is to allocate a new writeid and save it in Partition metadata. This > could then be used to decorate data with ROW__IDs. This avoids move/copy but > retains data "outside" of the table tree, which makes it more likely that this > data will be modified in some way, which can really break things if done after > SQL update/delete operations on this data have happened. > > It performs no validations on add (except for the partition spec), so any file > with any format can be added. It allows add to bucketed tables as well. > Seems like a very dangerous command. Maybe a better option is to block it > and advise using Load Data. Alternatively, make this do the Add Partition > metadata op followed by Load Data. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables
[ https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18814: -- Description: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] The Add Partition command creates a {{Partition}} metadata object and sets the location to the directory containing data files. In current master (Hive 3.0), Add Partition on an acid table doesn't fail, and at read time the data is decorated with row__id but the original transaction is 0. I suspect in earlier Hive versions this will throw or return no data. One option is to follow the Load Data approach and create a new delta_x_x/ and move/copy the data there. Another is to allocate a new writeid and save it in Partition metadata. This could then be used to decorate data with ROW__IDs. This avoids move/copy but retains data "outside" of the table tree, which makes it more likely that this data will be modified in some way, which can really break things if done after SQL update/delete operations on this data have happened. It performs no validations on add (except for the partition spec), so any file with any format can be added. It allows add to bucketed tables as well. Seems like a very dangerous command. Maybe a better option is to block it and advise using Load Data. Alternatively, make this do the Add Partition metadata op followed by Load Data. was: [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] Add Partition command creates a \{{Partition}} metadata object and set the location to the directory containing data files. In current master (Hive 3.0), Add partition on an acid table doesn't fail and at read time the data is decorated with row__id but the original transaction is 0. I suspect in earlier Hive versions this will throw or return no data. One option is follow Load Data approach and create a new delta_x_x/ and move/copy the data there.
Another is to allocate a new writeid and save it in Partition metadata. This could then be used to decorate data with ROW__IDs. This avoids move/copy but retains data "outside" of the table tree which make it more likely that this data will be modified in some way which can really break things if done after and SQL update/delete on this data have happened. > Support Add Partition For Acid tables > - > > Key: HIVE-18814 > URL: https://issues.apache.org/jira/browse/HIVE-18814 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] > The Add Partition command creates a {{Partition}} metadata object and sets the > location to the directory containing data files. > In current master (Hive 3.0), Add Partition on an acid table doesn't fail, and > at read time the data is decorated with row__id but the original transaction > is 0. I suspect in earlier Hive versions this will throw or return no data. > > One option is to follow the Load Data approach and create a new delta_x_x/ and > move/copy the data there. > > Another is to allocate a new writeid and save it in Partition metadata. This > could then be used to decorate data with ROW__IDs. This avoids move/copy but > retains data "outside" of the table tree, which makes it more likely that this > data will be modified in some way, which can really break things if done after > SQL update/delete operations on this data have happened. > > It performs no validations on add (except for the partition spec), so any file > with any format can be added. It allows add to bucketed tables as well. > Seems like a very dangerous command. Maybe a better option is to block it > and advise using Load Data. Alternatively, make this do the Add Partition > metadata op followed by Load Data. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
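The "Load Data approach" option discussed above can be sketched as follows -- a minimal model, assuming Hive's delta_x_y directory naming with zero-padded write ids (the helper names are hypothetical; the real naming logic lives in AcidUtils):

```python
def delta_dir(write_id):
    """Name a single-write-id delta directory, delta_x_y style."""
    return f"delta_{write_id:07d}_{write_id:07d}"

def add_partition_via_load(files, allocated_write_id):
    """Option 1 from the discussion: instead of pointing the Partition
    location at the external directory, allocate a write id and move/copy
    the files under a fresh delta inside the table's own directory tree."""
    return {delta_dir(allocated_write_id): list(files)}
```

The design trade-off mirrors the text: this copies data but keeps everything under the table tree, whereas saving a write id in Partition metadata avoids the copy at the cost of data living outside the tree.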
[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization
[ https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382614#comment-16382614 ] Eugene Koifman commented on HIVE-18825: --- I think so - if we have no txn, we should not need ValidTxnList for query processing > Define ValidTxnList before starting query optimization > -- > > Key: HIVE-18825 > URL: https://issues.apache.org/jira/browse/HIVE-18825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18825.01.patch, HIVE-18825.patch > > > Consider a set of tables used by a materialized view where inserts happened > after the materialization was created. To compute incremental view > maintenance, we need to be able to filter only new rows from those base > tables. That can be done by inserting a filter operator with condition e.g. > {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT > IN()}} on top of the MV's query definition and triggering the > rewriting (which should in turn produce a partial rewriting). However, to do > that, we need to have a value for {{ValidTxnList}} during query compilation > so we know the snapshot that we are querying. > This patch aims to generate {{ValidTxnList}} before query optimization. There > should not be any visible changes for the end user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization
[ https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382548#comment-16382548 ] Eugene Koifman commented on HIVE-18825: --- this assumption is not valid. For one thing, if you open the txn after you get the list, you won't be able to read your own writes, i.e. your own txn will be above the HWM that is set for your txn. The txn is currently opened right after parsing - do you need ValidTxnList earlier than that? > Define ValidTxnList before starting query optimization > -- > > Key: HIVE-18825 > URL: https://issues.apache.org/jira/browse/HIVE-18825 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez >Priority: Major > Attachments: HIVE-18825.01.patch, HIVE-18825.patch > > > Consider a set of tables used by a materialized view where inserts happened > after the materialization was created. To compute incremental view > maintenance, we need to be able to filter only new rows from those base > tables. That can be done by inserting a filter operator with condition e.g. > {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT > IN()}} on top of the MV's query definition and triggering the > rewriting (which should in turn produce a partial rewriting). However, to do > that, we need to have a value for {{ValidTxnList}} during query compilation > so we know the snapshot that we are querying. > This patch aims to generate {{ValidTxnList}} before query optimization. There > should not be any visible changes for the end user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
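The read-your-own-writes objection in the comment can be made concrete with a small visibility sketch (a simplified model; the real logic lives in Hive's ValidTxnList implementations): if the snapshot is captured before the txn is opened, the txn's own id ends up above the high-water mark, so its writes would be invisible to itself unless special-cased.

```python
def visible(write_txn_id, high_watermark, open_txns, current_txn):
    """Snapshot visibility sketch: a write is visible if it comes from
    the reader's own txn, or if it is at/below the high-water mark and
    not from a txn that was still open when the snapshot was taken."""
    if write_txn_id == current_txn:
        return True
    return write_txn_id <= high_watermark and write_txn_id not in open_txns
```

Opening the txn only after capturing the list would leave current_txn above high_watermark with no self-check possible, which is exactly the broken case the comment describes.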
[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.
[ https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18817: -- Attachment: HIVE-18817.04.patch > ArrayIndexOutOfBounds exception during read of ACID table. > -- > > Key: HIVE-18817 > URL: https://issues.apache.org/jira/browse/HIVE-18817 > Project: Hive > Issue Type: Bug > Components: ORC, Transactions >Reporter: Jason Dere >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, > HIVE-18817.03.patch, HIVE-18817.04.patch, repro.patch > > > Seeing some users hitting the following stack trace: > {noformat} > 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193) > ... 19 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:457) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255) > ... 
20 more > {noformat} > Have a JUnit test that appears to produce a similar stack trace - looks like > this occurs if there is an OrcSplit of an ACID table where the split offset > is beyond the starting offset of the last stripe in the ORC file. > cc [~ekoifman] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18824) ValidWriteIdList config should be defined on tables which have to collect stats after insert.
[ https://issues.apache.org/jira/browse/HIVE-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381384#comment-16381384 ] Eugene Koifman commented on HIVE-18824: --- Seems reasonable. [~sankarh]? > ValidWriteIdList config should be defined on tables which have to collect > stats after insert. > > > Key: HIVE-18824 > URL: https://issues.apache.org/jira/browse/HIVE-18824 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sergey Shelukhin >Priority: Major > Labels: ACID, isolation > Fix For: 3.0.0 > > Attachments: HIVE-18824.patch > > > In HIVE-18192, per-table write ID was introduced, where snapshot isolation is > built using ValidWriteIdList on tables which are read within a txn. > The ReadEntity list is consulted to decide which table is read within a txn. > For an insert operation, the table will be found only in WriteEntity, but the > table is read to collect stats. > So the ValidWriteIdList needs to be built for tables/partitions that are part > of WriteEntity as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
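The fix direction described above amounts to widening the set of tables for which a ValidWriteIdList is built -- a one-line sketch, with the entity lists reduced to plain table names for illustration:

```python
def tables_needing_write_ids(read_entities, write_entities):
    """Stats collection after INSERT reads the target table even though
    it appears only in the WriteEntity list, so the snapshot must cover
    tables from both the read and write entity lists."""
    return sorted(set(read_entities) | set(write_entities))
```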
[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.
[ https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18817: -- Attachment: HIVE-18817.03.patch > ArrayIndexOutOfBounds exception during read of ACID table. > -- > > Key: HIVE-18817 > URL: https://issues.apache.org/jira/browse/HIVE-18817 > Project: Hive > Issue Type: Bug > Components: ORC, Transactions >Reporter: Jason Dere >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, > HIVE-18817.03.patch, repro.patch > > > Seeing some users hitting the following stack trace: > {noformat} > 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193) > ... 19 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:457) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255) > ... 
20 more > {noformat} > Have a JUnit test that appears to produce a similar stack trace - looks like > this occurs if there is an OrcSplit of an ACID table where the split offset > is beyond the starting offset of the last stripe in the ORC file. > cc [~ekoifman] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.
[ https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381383#comment-16381383 ] Eugene Koifman commented on HIVE-18817: --- yes, you are right, we now write an index with 1 entry in a lot of cases where we used to write an index with no entries. Patch 3 updates the tests > ArrayIndexOutOfBounds exception during read of ACID table. > -- > > Key: HIVE-18817 > URL: https://issues.apache.org/jira/browse/HIVE-18817 > Project: Hive > Issue Type: Bug > Components: ORC, Transactions >Reporter: Jason Dere >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, > HIVE-18817.03.patch, repro.patch > > > Seeing some users hitting the following stack trace: > {noformat} > 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) > at 
java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193) > ... 19 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:457) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255) > ... 
20 more > {noformat} > Have a JUnit test that appears to produce a similar stack trace - looks like > this occurs if there is an OrcSplit of an ACID table where the split offset > is beyond the starting offset of the last stripe in the ORC file. > cc [~ekoifman] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18824) ValidWriteIdList config should be defined on tables which have to collect stats after insert.
[ https://issues.apache.org/jira/browse/HIVE-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381330#comment-16381330 ] Eugene Koifman commented on HIVE-18824: --- yes, that is basically the problem. > ValidWriteIdList config should be defined on tables which have to collect > stats after insert. > > > Key: HIVE-18824 > URL: https://issues.apache.org/jira/browse/HIVE-18824 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, isolation > Fix For: 3.0.0 > > > In HIVE-18192, per-table write ID was introduced, where snapshot isolation is > built using ValidWriteIdList on tables which are read within a txn. > The ReadEntity list is consulted to decide which table is read within a txn. > For an insert operation, the table will be found only in WriteEntity, but the > table is read to collect stats. > So the ValidWriteIdList needs to be built for tables/partitions that are part > of WriteEntity as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18571) stats issues for MM tables
[ https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381315#comment-16381315 ] Eugene Koifman commented on HIVE-18571: --- I left some comments on RB. I think it makes sense to wait for the fix to HIVE-18824 and clean up related code in this patch. Also, this patch is full of TODOs that seem like they should be JIRAs. > stats issues for MM tables > -- > > Key: HIVE-18571 > URL: https://issues.apache.org/jira/browse/HIVE-18571 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, > HIVE-18571.03.patch, HIVE-18571.patch > > > There are multiple stats aggregation issues with MM tables. > Some simple stats are double counted, and some simple stats are > invalid for ACID table dirs altogether. > I have a patch almost ready; need to fix some more stuff and clean up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18750) Exchange partition should be disabled on ACID/Insert-only tables with per table write ID.
[ https://issues.apache.org/jira/browse/HIVE-18750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381107#comment-16381107 ] Eugene Koifman commented on HIVE-18750: --- +1 assuming tests are good > Exchange partition should be disabled on ACID/Insert-only tables with per > table write ID. > - > > Key: HIVE-18750 > URL: https://issues.apache.org/jira/browse/HIVE-18750 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, DDL, TODOC3.0, pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18750.01.patch, HIVE-18750.02.patch > > > The per-table write ID implementation (HIVE-18192) has introduced a write ID > per table and uses the write ID to name the delta/base files and also as the > primary key for each row. > Now, exchange partition has to move delta/base files across tables without > changing the write ID, which causes incorrect results. > Also, the exchange partition feature exists to support the use case of atomic > updates. But ACID tables already support atomic updates, and hence it makes > sense to not support exchange partition for ACID and MM tables. > The qtest file mm_exchangepartition.q test results are to be updated after > this change. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
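The rule being approved above reduces to a single predicate -- a sketch with table types reduced to booleans: since delta/base file names and row ROW__IDs embed per-table write ids, moving the files between tables unchanged yields incorrect results, so the command must be rejected whenever either side is transactional.

```python
def exchange_partition_allowed(src_is_acid_or_mm, dst_is_acid_or_mm):
    """EXCHANGE PARTITION must be rejected when either table is ACID or
    insert-only (MM), because moved delta/base files would carry write
    ids allocated for the other table."""
    return not (src_is_acid_or_mm or dst_is_acid_or_mm)
```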
[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId
[ https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18158: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) committed to master; thanks Gopal for the review > Remove OrcRawRecordMerger.ReaderPairAcid.statementId > > > Key: HIVE-18158 > URL: https://issues.apache.org/jira/browse/HIVE-18158 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-18158.01.patch, HIVE-18158.02.patch, > HIVE-18158.03.patch > > > * Need to get rid of this since we can always get the statementId from the > row itself in Acid 2.0. > * For Acid 1.0, statementId == 0 in all deltas because both multi-statement > txns and Split Update are only available in test mode, so nothing can create > a delta_x_x_M with M > 0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
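The redundancy being removed is easy to see from the delta directory grammar -- a parsing sketch, assuming names of the form delta_x_y or delta_x_y_M, where per the description the statement id M is only ever 0 outside test mode in Acid 1.0:

```python
def parse_delta(name):
    """Parse delta_x_y[_M] into (min_write_id, max_write_id, statement_id);
    statement_id is None when absent. In Acid 2.0 the statement id can be
    read from the row itself, which is why ReaderPairAcid no longer needs
    to carry it as a separate field."""
    parts = name.split("_")
    min_wid, max_wid = int(parts[1]), int(parts[2])
    stmt = int(parts[3]) if len(parts) > 3 else None
    return min_wid, max_wid, stmt
```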
[jira] [Commented] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.
[ https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380842#comment-16380842 ] Eugene Koifman commented on HIVE-18817: --- I don't think any of the failures are related but attaching the same patch again since ptest is not very stable lately > ArrayIndexOutOfBounds exception during read of ACID table. > -- > > Key: HIVE-18817 > URL: https://issues.apache.org/jira/browse/HIVE-18817 > Project: Hive > Issue Type: Bug > Components: ORC, Transactions >Reporter: Jason Dere >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, repro.patch > > > Seeing some users hitting the following stack trace: > {noformat} > 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: > java.lang.RuntimeException: java.io.IOException: > java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142) > at > org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66) > at > org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150) > at > org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185) > at java.security.AccessController.doPrivileged(Native Method) 
> at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185) > at > org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) > at > org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258) > at > org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193) > ... 19 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 7 > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388) > at > org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:457) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255) > ... 
20 more > {noformat} > Have a JUnit test that appears to produce a similar stack trace - looks like > this occurs if there is an OrcSplit of an ACID table where the split offset > is beyond the starting offset of the last stripe in the ORC file. > cc [~ekoifman] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
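The reporter's diagnosis above — an OrcSplit whose offset lies beyond the starting offset of the last stripe in the ORC file — can be illustrated with a small, self-contained sketch. This is hypothetical illustration code, not Hive's actual OrcRawRecordMerger logic: a naive "first stripe at or after the split offset" search skips every stripe and returns an index past the end of the stripe list, which is exactly the shape of the ArrayIndexOutOfBoundsException thrown in discoverKeyBounds.

```python
# Hypothetical sketch of the failure mode, not Hive's actual code.
def first_stripe_at_or_after(stripe_offsets, split_offset):
    """Return the index of the first stripe whose start offset is >= split_offset.

    If split_offset lies beyond the last stripe's start, every stripe is
    skipped and the returned index points past the end of the list.
    """
    i = 0
    while i < len(stripe_offsets) and stripe_offsets[i] < split_offset:
        i += 1
    return i

stripe_offsets = [3, 1000, 5000]  # stripe start offsets within the file
idx = first_stripe_at_or_after(stripe_offsets, 7000)  # split starts past the last stripe
# stripe_offsets[idx] would now raise IndexError -- the Python analogue of
# the ArrayIndexOutOfBoundsException above.
```

A split that starts inside the file (e.g. offset 500) resolves to a valid stripe index; only the "beyond the last stripe" case walks off the end.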
[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.
[ https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18817: -- Attachment: HIVE-18817.02.patch
[jira] [Comment Edited] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.
[ https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379801#comment-16379801 ] Eugene Koifman edited comment on HIVE-18817 at 2/28/18 5:24 AM: patch1 fixes the issue. The test case generates a file with 2 stripes. Without the fix to OrcRecordUpdater.KeyIndexBuilder, the hive.acid.key.index in the file has 1 entry; with the fix it has 2, and no AIOOBException happens. todo: file an ORC ticket to fix this in ORC. [~jdere], [~prasanth_j] please review was (Author: ekoifman): patch1 fixes the issue todo: file an ORC ticket to fix this in ORC.
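The edited comment describes a two-stripe file whose hive.acid.key.index ends up with only one entry. A minimal sketch (hypothetical, not the real OrcRecordUpdater.KeyIndexBuilder) of why a short key index produces the out-of-bounds read:

```python
# Hypothetical sketch, not Hive's KeyIndexBuilder.
def build_key_index(num_stripes, flush_last_entry):
    """One key-index entry should be recorded per stripe.

    The bug is a writer that never flushes the final stripe's entry,
    leaving the index one entry short.
    """
    entries = []
    for stripe in range(num_stripes):
        if stripe < num_stripes - 1 or flush_last_entry:
            entries.append(("last-key-of-stripe", stripe))
    return entries

broken = build_key_index(2, flush_last_entry=False)  # 1 entry for 2 stripes
fixed = build_key_index(2, flush_last_entry=True)    # 2 entries
# A reader that assumes one entry per stripe does index[stripe_number];
# broken[1] raises IndexError, mirroring the AIOOBException in the report.
```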
[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.
[ https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18817: -- Status: Patch Available (was: Open) patch1 fixes the issue todo: file an ORC ticket to fix this in ORC.
[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.
[ https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18817: -- Attachment: HIVE-18817.01.patch
[jira] [Updated] (HIVE-18816) CREATE TABLE (ACID) doesn't work with TIMESTAMPLOCALTZ column type
[ https://issues.apache.org/jira/browse/HIVE-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18816: -- Component/s: Transactions > CREATE TABLE (ACID) doesn't work with TIMESTAMPLOCALTZ column type > -- > > Key: HIVE-18816 > URL: https://issues.apache.org/jira/browse/HIVE-18816 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Vineet Garg >Assignee: Jesus Camacho Rodriguez >Priority: Major > > *Reproducer* > {code:sql} > set hive.support.concurrency=true; > set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; > CREATE TABLE table_acid(d int, tz timestamp with local time zone) > clustered by (d) into 2 buckets stored as orc TBLPROPERTIES > ('transactional'='true'); > {code} > *Error* > {code:sql} > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: > Unknown primitive type TIMESTAMPLOCALTZ > {code} > *Error stack* > {noformat} > org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.IllegalArgumentException: Unknown primitive type TIMESTAMPLOCALTZ > at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:906) > ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4788) > [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:389) > [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) > [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) > [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2314) > [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1985) > [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at 
org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1687) > [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1438) > [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1427) > [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) > [hive-cli-3.0.0-SNAPSHOT.jar:?] > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) > [hive-cli-3.0.0-SNAPSHOT.jar:?] > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) > [hive-cli-3.0.0-SNAPSHOT.jar:?] > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) > [hive-cli-3.0.0-SNAPSHOT.jar:?] > at > org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1345) > [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1319) > [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173) > [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) > [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at > org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:59) > [test-classes/:?] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_101] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_101] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_101] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101] > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > [junit-4.11.jar:?] 
> at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > [junit-4.11.jar:?] > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > [junit-4.11.jar:?] > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > [junit-4.11.jar:?] > at > org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92) > [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT] > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > [junit-4.11.jar:?] > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > [junit-4.11.jar:?] > at
[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18659: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) committed to master thanks Prasanth for the review > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Fix For: 3.0.0 > > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, > HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, > HIVE-18659.11.patch, HIVE-18659.12.patch, HIVE-18659.13.patch, > HIVE-18659.14.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-18814) Support Add Partition For Acid tables
[ https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-18814: - > Support Add Partition For Acid tables > - > > Key: HIVE-18814 > URL: https://issues.apache.org/jira/browse/HIVE-18814 > Project: Hive > Issue Type: New Feature > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > > [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions] > The Add Partition command creates a {{Partition}} metadata object and sets the > location to the directory containing data files. > In current master (Hive 3.0), Add Partition on an acid table doesn't fail, and > at read time the data is decorated with row__id but the original transaction > is 0. I suspect in earlier Hive versions this will throw or return no data. > > One option is to follow the Load Data approach and create a new delta_x_x/ and > move/copy the data there. > > Another is to allocate a new writeid and save it in Partition metadata. This > could then be used to decorate data with ROW__IDs. This avoids move/copy but > retains data "outside" of the table tree, which makes it more likely that this > data will be modified in some way, which can really break things if done after > an SQL update/delete on this data has happened. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18808) Make compaction more robust when stats update fails
[ https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18808: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) committed to master thanks Alan for the review > Make compaction more robust when stats update fails > --- > > Key: HIVE-18808 > URL: https://issues.apache.org/jira/browse/HIVE-18808 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Fix For: 3.0.0 > > Attachments: HIVE-18808.01.patch > > > > {{Worker.gatherStats()}} runs an "analyze table..." command to update stats, > which requires SessionState. SessionState objects are cached in a ThreadLocal. > If for some reason session init fails, the session may still get attached to the > thread, which then causes a subsequent request that uses the same thread to > gather stats to fail because it has a bad session object. HIVE-15658 describes > the same issue in a different context. > There is currently no way to recycle a session from outside HMS. > Failure to gather stats should not kill a compaction job, which then prevents > Cleaner from running. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
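The ThreadLocal pitfall described in this issue — a half-initialized session left attached to a pooled thread — is a general pattern. A small sketch of it, in hypothetical Python rather than Hive's actual SessionState code:

```python
import threading

# Hypothetical sketch of the thread-local session pitfall, not Hive code.
_session = threading.local()

class SessionInitError(Exception):
    pass

def get_session(fail_init=False):
    """The pitfall: state is attached to the thread *before* init
    completes, so a failed init leaves a broken session cached for the
    next request served by the same pooled thread."""
    if not hasattr(_session, "state"):
        _session.state = {"initialized": False}  # attached too early
        if fail_init:
            raise SessionInitError("session init failed")
        _session.state["initialized"] = True
    return _session.state

def reset_session():
    """The remedy: clear the thread-local on failure so the next request
    re-initializes instead of reusing the bad session."""
    if hasattr(_session, "state"):
        del _session.state
```

Without the reset, the next call on the same thread happily returns the broken session; recycling the thread-local restores correctness, which is what "no way to recycle a session from outside HMS" makes impossible here.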
[jira] [Updated] (HIVE-18808) Make compaction more robust when stats update fails
[ https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18808: -- Description: {{Worker.gatherStats()}} runs a "analyze table..." command to update stats which requires SessionState. SessionState objects are cached in ThreadLocal. If for some reason Session init fails, it may still get attached to the thread which then causes a subsequent request that uses the same thread to gather stats fail because it has a bad session object. HIVE-15658 describes the same issue in a different context. There is currently no way to recycle a session from outside HMS. Failure to gather stats should not kill a compaction job which then prevents Cleaner from running. was: Worker.gatherStats() runs a "analyze table..." command to update stats which requires SessionState. SessionState objects are cached in ThreadLocal. If for some reason Session init fails, it may still get attached to the thread which then causes a subsequent request that uses the same tread to gather stats fail because it has a bad session object. HIVE-15658 describes the same issue in a different context. There is currently no way to recycle a session from outside HMS. Failure to gather stats should not kill a compaction job which then prevents Cleaner from running. > Make compaction more robust when stats update fails > --- > > Key: HIVE-18808 > URL: https://issues.apache.org/jira/browse/HIVE-18808 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18808.01.patch > > > > {{Worker.gatherStats()}} runs a "analyze table..." command to update stats > which requires SessionState. SessionState objects are cached in ThreadLocal. 
> If for some reason Session init fails, it may still get attached to the > thread which then causes a subsequent request that uses the same thread to > gather stats fail because it has a bad session object. HIVE-15658 describes > the same issue in a different context. > There is currently no way to recycle a session from outside HMS. > Failure to gather stats should not kill a compaction job which then prevents > Cleaner from running. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId
[ https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379033#comment-16379033 ] Eugene Koifman commented on HIVE-18158: --- no related failures [~gopalv] could you review please > Remove OrcRawRecordMerger.ReaderPairAcid.statementId > > > Key: HIVE-18158 > URL: https://issues.apache.org/jira/browse/HIVE-18158 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > Attachments: HIVE-18158.01.patch, HIVE-18158.02.patch, > HIVE-18158.03.patch > > > * Need to get rid of this since we can always get this from the row > itself in Acid 2.0. > * For Acid 1.0, statementId == 0 in all deltas because both multi-statement > txns and Split Update are only available in test mode, so there is nothing > that can create a deltas_x_x_M with M > 0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18659: -- Attachment: HIVE-18659.14.patch > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, > HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, > HIVE-18659.11.patch, HIVE-18659.12.patch, HIVE-18659.13.patch, > HIVE-18659.14.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18750) Exchange partition should be disabled on ACID/Insert-only tables with per table write ID.
[ https://issues.apache.org/jira/browse/HIVE-18750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378983#comment-16378983 ] Eugene Koifman commented on HIVE-18750: --- I would make the error message tell the user what to do next, for example mention Load Data > Exchange partition should be disabled on ACID/Insert-only tables with per > table write ID. > - > > Key: HIVE-18750 > URL: https://issues.apache.org/jira/browse/HIVE-18750 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, DDL, TODOC3.0, pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18750.01.patch > > > Per table write ID implementation (HIVE-18192) has introduced a write ID per > table and used the write ID to name the delta/base files and also as a primary key > for each row. > Now, exchange partition has to move delta/base files across tables without > changing the write ID, which causes incorrect results. > Also, this exchange partition feature exists to support the use case of > atomic updates. But ACID updates already provide atomicity, and > hence it makes sense to not support exchange partition for ACID and MM tables. > The qtest file mm_exchangepartition.q test results are to be updated after this > change. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18659: -- Attachment: HIVE-18659.13.patch > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, > HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, > HIVE-18659.11.patch, HIVE-18659.12.patch, HIVE-18659.13.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18808) Make compaction more robust when stats update fails
[ https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377981#comment-16377981 ] Eugene Koifman commented on HIVE-18808: --- no related failures. [~alangates], could you review please? > Make compaction more robust when stats update fails > --- > > Key: HIVE-18808 > URL: https://issues.apache.org/jira/browse/HIVE-18808 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18808.01.patch > > > > Worker.gatherStats() runs an "analyze table..." command to update stats, which > requires SessionState. SessionState objects are cached in ThreadLocal. If > for some reason Session init fails, it may still get attached to the thread, > which then causes a subsequent request that uses the same thread to gather > stats to fail because it has a bad session object. HIVE-15658 describes the > same issue in a different context. > There is currently no way to recycle a session from outside HMS. > Failure to gather stats should not kill a compaction job, which then prevents > Cleaner from running. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
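The robustness idea in HIVE-18808 — a stats failure must not fail the compaction itself — can be sketched by isolating the stats call behind a best-effort wrapper. This is an illustrative sketch only; `StatsSafeCompactor`, `StatsGatherer`, and `gatherStatsBestEffort` are hypothetical names, not Hive's actual Worker API.

```java
// Hypothetical sketch: isolate stats gathering so its failure cannot
// fail the compaction. Names here are illustrative, not Hive's API.
public class StatsSafeCompactor {
    /** Stand-in for the "analyze table ..." stats-gathering step. */
    public interface StatsGatherer { void gatherStats(); }

    /** Returns true if stats were updated, false if gathering failed. */
    public static boolean gatherStatsBestEffort(StatsGatherer gatherer) {
        try {
            gatherer.gatherStats();
            return true;
        } catch (Exception e) {
            // Log and continue: compaction already succeeded, and a stats
            // failure must not prevent the Cleaner from running.
            System.err.println("Stats gathering failed, continuing: " + e);
            return false;
        }
    }
}
```

The key design choice is that the wrapper reports the failure (so it can be logged or surfaced) but never propagates it into the compaction's success path.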
[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID
[ https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377977#comment-16377977 ] Eugene Koifman commented on HIVE-18192: --- what property are you looking up? There are hive.txn.valid.txns and hive.txn.tables.valid.writeids > Introduce WriteID per table rather than using global transaction ID > --- > > Key: HIVE-18192 > URL: https://issues.apache.org/jira/browse/HIVE-18192 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, DR, pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, > HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch, > HIVE-18192.06.patch, HIVE-18192.07.patch, HIVE-18192.08.patch, > HIVE-18192.09.patch, HIVE-18192.10.patch, HIVE-18192.11.patch, > HIVE-18192.12.patch, HIVE-18192.13.patch, HIVE-18192.14.patch, > HIVE-18192.15.patch, HIVE-18192.16.patch, HIVE-18192.17.patch > > > To support ACID replication, we will be introducing a per table write Id > which will replace the transaction id in the primary key for each row in an > ACID table. > The current primary key is determined via > > which will move to > > Each table modified by the given transaction will have a table-level > write ID allocated, and a persisted map of global txn id -> table -> write > id has to be maintained to allow Snapshot isolation. > Readers should use the combination of ValidTxnList and > ValidWriteIdList(Table) for snapshot isolation. > > [Hive Replication - ACID > Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf] > has a section "Per Table Sequences (Write-Id)" with more details -- This message was sent by Atlassian JIRA (v7.6.3#76005)
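The per-table write ID scheme described in HIVE-18192 — each table keeps its own write-id sequence, and a txn-to-write-id mapping is recorded so readers can translate a ValidTxnList into a per-table ValidWriteIdList — can be sketched with an in-memory stand-in. The class and method names below are hypothetical; the real implementation persists the map in the metastore's TXN_TO_WRITE_ID table.

```java
// Illustrative sketch of per-table write-id allocation (HIVE-18192 idea):
// every table has its own monotonically increasing write-id sequence, and
// each txn that writes the table gets exactly one write id recorded for it.
import java.util.HashMap;
import java.util.Map;

public class WriteIdAllocator {
    // table -> last allocated write id (the per-table sequence)
    private final Map<String, Long> lastWriteId = new HashMap<>();
    // txnId -> (table -> writeId): in-memory analog of TXN_TO_WRITE_ID
    private final Map<Long, Map<String, Long>> txnToWriteId = new HashMap<>();

    /** Allocate (or return the already-allocated) write ID for txn+table. */
    public synchronized long allocate(long txnId, String table) {
        Map<String, Long> perTxn =
            txnToWriteId.computeIfAbsent(txnId, t -> new HashMap<>());
        // Idempotent: a txn keeps the same write id for a table once assigned.
        return perTxn.computeIfAbsent(table,
            tbl -> lastWriteId.merge(tbl, 1L, Long::sum));
    }
}
```

Note the two invariants the sketch enforces: write ids are dense per table (independent of global txn ids), and re-asking within the same txn returns the same id.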
[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId
[ https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18158: -- Attachment: HIVE-18158.03.patch > Remove OrcRawRecordMerger.ReaderPairAcid.statementId > > > Key: HIVE-18158 > URL: https://issues.apache.org/jira/browse/HIVE-18158 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > Attachments: HIVE-18158.01.patch, HIVE-18158.02.patch, > HIVE-18158.03.patch > > > * Need to get rid of this since we can always get this from the row > itself in Acid 2.0. > * For Acid 1.0, statementId == 0 in all deltas because both > multi-statement txns and > * Split Update are only available in test mode so there is nothing that can > create a > * deltas_x_x_M with M > 0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18808) Make compaction more robust when stats update fails
[ https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18808: -- Status: Patch Available (was: Open) > Make compaction more robust when stats update fails > --- > > Key: HIVE-18808 > URL: https://issues.apache.org/jira/browse/HIVE-18808 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18808.01.patch > > > > Worker.gatherStats() runs an "analyze table..." command to update stats, which > requires SessionState. SessionState objects are cached in ThreadLocal. If > for some reason Session init fails, it may still get attached to the thread, > which then causes a subsequent request that uses the same thread to gather > stats to fail because it has a bad session object. HIVE-15658 describes the > same issue in a different context. > There is currently no way to recycle a session from outside HMS. > Failure to gather stats should not kill a compaction job, which then prevents > Cleaner from running. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18808) Make compaction more robust when stats update fails
[ https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18808: -- Attachment: HIVE-18808.01.patch > Make compaction more robust when stats update fails > --- > > Key: HIVE-18808 > URL: https://issues.apache.org/jira/browse/HIVE-18808 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18808.01.patch > > > > Worker.gatherStats() runs an "analyze table..." command to update stats, which > requires SessionState. SessionState objects are cached in ThreadLocal. If > for some reason Session init fails, it may still get attached to the thread, > which then causes a subsequent request that uses the same thread to gather > stats to fail because it has a bad session object. HIVE-15658 describes the > same issue in a different context. > There is currently no way to recycle a session from outside HMS. > Failure to gather stats should not kill a compaction job, which then prevents > Cleaner from running. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (HIVE-18808) Make compaction more robust when stats update fails
[ https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-18808: - > Make compaction more robust when stats update fails > --- > > Key: HIVE-18808 > URL: https://issues.apache.org/jira/browse/HIVE-18808 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > > > Worker.gatherStats() runs an "analyze table..." command to update stats, which > requires SessionState. SessionState objects are cached in ThreadLocal. If > for some reason Session init fails, it may still get attached to the thread, > which then causes a subsequent request that uses the same thread to gather > stats to fail because it has a bad session object. HIVE-15658 describes the > same issue in a different context. > There is currently no way to recycle a session from outside HMS. > Failure to gather stats should not kill a compaction job, which then prevents > Cleaner from running. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18773) Support multiple instances of Cleaner
[ https://issues.apache.org/jira/browse/HIVE-18773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377278#comment-16377278 ] Eugene Koifman commented on HIVE-18773: --- once HIVE-18772 is in place, why not have the Worker itself do a clean right after successful compaction and before stats gather? It improves parallelism and perhaps makes stats gathering more accurate, since the set of files on disk is more accurate with respect to the current state of the table. > Support multiple instances of Cleaner > - > > Key: HIVE-18773 > URL: https://issues.apache.org/jira/browse/HIVE-18773 > Project: Hive > Issue Type: Improvement > Components: Transactions >Reporter: Eugene Koifman >Priority: Major > > We support multiple Workers by making each Worker update the status of the > entry in COMPACTION_QUEUE to make sure only 1 worker grabs it. Once we have > HIVE-18772, Cleaner should not need any state; we can easily have more than 1 Cleaner > instance by introducing 1 more status type "being cleaned". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
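The claim-by-status-transition idea in HIVE-18773 — only one Cleaner instance wins an entry by atomically flipping its status to "being cleaned", mirroring how Workers already claim compactions — can be illustrated with an in-memory stand-in. In Hive the claim would be a conditional UPDATE on the COMPACTION_QUEUE table, not an AtomicReference, and the status names below are illustrative.

```java
// Hypothetical sketch: several Cleaner instances race to claim a queue
// entry; compareAndSet guarantees exactly one wins the transition
// READY_FOR_CLEANING -> BEING_CLEANED.
import java.util.concurrent.atomic.AtomicReference;

public class QueueEntry {
    public enum Status { READY_FOR_CLEANING, BEING_CLEANED, CLEANED }

    private final AtomicReference<Status> status =
        new AtomicReference<>(Status.READY_FOR_CLEANING);

    /** Only one Cleaner wins the claim; the rest see false and move on. */
    public boolean tryClaim() {
        return status.compareAndSet(Status.READY_FOR_CLEANING,
                                    Status.BEING_CLEANED);
    }

    public void markCleaned() { status.set(Status.CLEANED); }
    public Status getStatus() { return status.get(); }
}
```

The SQL analog would be `UPDATE ... SET status = 'being cleaned' WHERE id = ? AND status = 'ready for cleaning'` and checking the update count, which gives the same exactly-one-winner property across processes.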
[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18659: -- Attachment: HIVE-18659.12.patch > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, > HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, > HIVE-18659.11.patch, HIVE-18659.12.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-15077) Acid LockManager is unfair
[ https://issues.apache.org/jira/browse/HIVE-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-15077: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) committed to master thanks Alan for the review > Acid LockManager is unfair > -- > > Key: HIVE-15077 > URL: https://issues.apache.org/jira/browse/HIVE-15077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Fix For: 3.0.0 > > Attachments: HIVE-15077.02.patch > > > HIVE-10242 made the acid LM unfair. > In TxnHandler.checkLock(), suppose we are trying to acquire SR5 (the number > is extLockId). > Then > LockInfo[] locks = lockSet.toArray(new LockInfo[lockSet.size()]); > may look like this (all explicitly listed locks are in Waiting state) > {, SR5 SW3 X4} > So the algorithm will find SR5 in the list and start looking backwards (to > the left). > According to IDs, SR5 should wait for X4 to be granted but X4 won't even be > examined and so SR5 may be granted. > Theoretically, this could cause starvation. > The query that generates the list already has > query.append(" and hl_lock_ext_id <= ").append(extLockId); > but it should use "<" rather than "<=" to exclude the locks being checked > from "locks" list which will make the algorithm look at all locks "in front" > of a given lock. 
> Here is an example (add to TestDbTxnManager2) > {noformat} > @Test > public void testFairness2() throws Exception { > dropTable(new String[]{"T7"}); > CommandProcessorResponse cpr = driver.run("create table if not exists T7 > (a int) partitioned by (p int) stored as orc TBLPROPERTIES > ('transactional'='true')"); > checkCmdOnDriver(cpr); > checkCmdOnDriver(driver.run("insert into T7 partition(p) > values(1,1),(1,2)"));//create 2 partitions > cpr = driver.compileAndRespond("select a from T7 "); > checkCmdOnDriver(cpr); > txnMgr.acquireLocks(driver.getPlan(), ctx, "Fifer");//gets S lock on T7 > HiveTxnManager txnMgr2 = > TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf); > swapTxnManager(txnMgr2); > cpr = driver.compileAndRespond("alter table T7 drop partition (p=1)"); > checkCmdOnDriver(cpr); > //tries to get X lock on T7.p=1 and gets Waiting state > LockState lockState = ((DbTxnManager) > txnMgr2).acquireLocks(driver.getPlan(), ctx, "Fiddler", false); > List locks = getLocks(); > Assert.assertEquals("Unexpected lock count", 4, locks.size()); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > null, locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > "p=1", locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > "p=2", locks); > checkLock(LockType.EXCLUSIVE, LockState.WAITING, "default", "T7", "p=1", > locks); > HiveTxnManager txnMgr3 = > TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf); > swapTxnManager(txnMgr3); > //this should block behind the X lock on T7.p=1 > cpr = driver.compileAndRespond("select a from T7"); > checkCmdOnDriver(cpr); > txnMgr3.acquireLocks(driver.getPlan(), ctx, "Fifer");//gets S lock on T6 > locks = getLocks(); > Assert.assertEquals("Unexpected lock count", 7, locks.size()); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > null, locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > "p=1", 
locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > "p=2", locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > null, locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > "p=1", locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > "p=2", locks); > checkLock(LockType.EXCLUSIVE, LockState.WAITING, "default", "T7", "p=1", > locks); > } > {noformat} > The 2nd {{locks = getLocks();}} output shows that all locks for the 2nd > {{select * from T7}} are all acquired while they should block behind the X > lock to be fair. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
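The fix described in HIVE-15077 (using "<" rather than "<=" so the candidate itself is excluded and every lock "in front" of it is examined) amounts to the following rule, shown here for a single resource. All names are illustrative, not Hive's actual TxnHandler code, and the compatibility matrix is deliberately simplified.

```java
// Sketch of the fairness rule: a lock may be granted only if every lock
// with a smaller extLockId (granted OR waiting) is compatible with it.
import java.util.List;

public class FairLockCheck {
    public enum Type { SHARED_READ, SHARED_WRITE, EXCLUSIVE }

    public static final class Lock {
        final long extLockId;
        final Type type;
        public Lock(long extLockId, Type type) {
            this.extLockId = extLockId;
            this.type = type;
        }
    }

    // Toy model: only SHARED_READ + SHARED_READ coexist. Hive's real
    // compatibility matrix (the jumpTable) is richer.
    static boolean compatible(Type a, Type b) {
        return a == Type.SHARED_READ && b == Type.SHARED_READ;
    }

    /** True if `candidate` can be granted given all locks on the resource. */
    public static boolean canGrant(Lock candidate, List<Lock> all) {
        for (Lock l : all) {
            // Strict "<" excludes the candidate itself from the scan,
            // so waiting locks ahead of it (e.g. X4 before SR5) block it.
            if (l.extLockId < candidate.extLockId
                    && !compatible(l.type, candidate.type)) {
                return false;
            }
        }
        return true;
    }
}
```

With the list from the description, SR5 sees the waiting X4 in front of it and must wait, which is exactly the fairness property the testFairness2() example demonstrates was broken.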
[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId
[ https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18158: -- Attachment: HIVE-18158.02.patch > Remove OrcRawRecordMerger.ReaderPairAcid.statementId > > > Key: HIVE-18158 > URL: https://issues.apache.org/jira/browse/HIVE-18158 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > Attachments: HIVE-18158.01.patch, HIVE-18158.02.patch > > > * Need to get rid of this since we can always get this from the row > itself in Acid 2.0. > * For Acid 1.0, statementId == 0 in all deltas because both > multi-statement txns and > * Split Update are only available in test mode so there is nothing that can > create a > * deltas_x_x_M with M > 0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18659: -- Attachment: HIVE-18659.11.patch > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, > HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, > HIVE-18659.11.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18659: -- Attachment: HIVE-18659.10.patch > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, > HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId
[ https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18158: -- Status: Patch Available (was: Open) > Remove OrcRawRecordMerger.ReaderPairAcid.statementId > > > Key: HIVE-18158 > URL: https://issues.apache.org/jira/browse/HIVE-18158 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > Attachments: HIVE-18158.01.patch > > > * Need to get rid of this since we can always get this from the row > itself in Acid 2.0. > * For Acid 1.0, statementId == 0 in all deltas because both > multi-statement txns and > * Split Update are only available in test mode so there is nothing that can > create a > * deltas_x_x_M with M > 0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId
[ https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18158: -- Attachment: HIVE-18158.01.patch > Remove OrcRawRecordMerger.ReaderPairAcid.statementId > > > Key: HIVE-18158 > URL: https://issues.apache.org/jira/browse/HIVE-18158 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Minor > Attachments: HIVE-18158.01.patch > > > * Need to get rid of this since we can always get this from the row > itself in Acid 2.0. > * For Acid 1.0, statementId == 0 in all deltas because both > multi-statement txns and > * Split Update are only available in test mode so there is nothing that can > create a > * deltas_x_x_M with M > 0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID
[ https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373704#comment-16373704 ] Eugene Koifman commented on HIVE-18192: --- I added a couple of nits on the pull request for patch 16 - can be done in a follow-up. +1 for patch 16 pending tests > Introduce WriteID per table rather than using global transaction ID > --- > > Key: HIVE-18192 > URL: https://issues.apache.org/jira/browse/HIVE-18192 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, DR, pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, > HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch, > HIVE-18192.06.patch, HIVE-18192.07.patch, HIVE-18192.08.patch, > HIVE-18192.09.patch, HIVE-18192.10.patch, HIVE-18192.11.patch, > HIVE-18192.12.patch, HIVE-18192.13.patch, HIVE-18192.14.patch, > HIVE-18192.15.patch, HIVE-18192.16.patch > > > To support ACID replication, we will be introducing a per table write Id > which will replace the transaction id in the primary key for each row in an > ACID table. > The current primary key is determined via > > which will move to > > Each table modified by the given transaction will have a table-level > write ID allocated, and a persisted map of global txn id -> table -> write > id has to be maintained to allow Snapshot isolation. > Readers should use the combination of ValidTxnList and > ValidWriteIdList(Table) for snapshot isolation. > > [Hive Replication - ACID > Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf] > has a section "Per Table Sequences (Write-Id)" with more details -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-15077) Acid LockManager is unfair
[ https://issues.apache.org/jira/browse/HIVE-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373307#comment-16373307 ] Eugene Koifman commented on HIVE-15077: --- I did make it unfair in HIVE-10242 but that was not intentional. (To ensure fairness the requested lock is only checked against locks with smaller extLockId. The problem was that HIVE-10242 made it such that not all locks with smaller extLockId that should have been checked, were checked.) I had to sort locks by weight in HIVE-10242 because the jumpTable doesn't have any info about the resource and so the logic that says "if looking to get S lock and see acquired S lock in front, acquire" doesn't work because the S lock in front may be on a different resource. The issue this caused is demonstrated in {{testFairness2()}} in the Description of this ticket. So I'm aiming for a lock manager that is fair, correct and not more strict than necessary. The last part is a work in progress. What I think we really need is the use of Intention locks. That way what you are suggesting is possible. Right now we just "infer" a lock (which is not physically there) up/down the resource hierarchy based on the lock that is actually asked for (and of the same type). This way you'd only have to compare locks on resources with the same path. This is a bigger change. > Acid LockManager is unfair > -- > > Key: HIVE-15077 > URL: https://issues.apache.org/jira/browse/HIVE-15077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-15077.02.patch > > > HIVE-10242 made the acid LM unfair. > In TxnHandler.checkLock(), suppose we are trying to acquire SR5 (the number > is extLockId). 
> Then > LockInfo[] locks = lockSet.toArray(new LockInfo[lockSet.size()]); > may look like this (all explicitly listed locks are in Waiting state) > {, SR5 SW3 X4} > So the algorithm will find SR5 in the list and start looking backwards (to > the left). > According to IDs, SR5 should wait for X4 to be granted but X4 won't even be > examined and so SR5 may be granted. > Theoretically, this could cause starvation. > The query that generates the list already has > query.append(" and hl_lock_ext_id <= ").append(extLockId); > but it should use "<" rather than "<=" to exclude the locks being checked > from "locks" list which will make the algorithm look at all locks "in front" > of a given lock. > Here is an example (add to TestDbTxnManager2) > {noformat} > @Test > public void testFairness2() throws Exception { > dropTable(new String[]{"T7"}); > CommandProcessorResponse cpr = driver.run("create table if not exists T7 > (a int) partitioned by (p int) stored as orc TBLPROPERTIES > ('transactional'='true')"); > checkCmdOnDriver(cpr); > checkCmdOnDriver(driver.run("insert into T7 partition(p) > values(1,1),(1,2)"));//create 2 partitions > cpr = driver.compileAndRespond("select a from T7 "); > checkCmdOnDriver(cpr); > txnMgr.acquireLocks(driver.getPlan(), ctx, "Fifer");//gets S lock on T7 > HiveTxnManager txnMgr2 = > TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf); > swapTxnManager(txnMgr2); > cpr = driver.compileAndRespond("alter table T7 drop partition (p=1)"); > checkCmdOnDriver(cpr); > //tries to get X lock on T7.p=1 and gets Waiting state > LockState lockState = ((DbTxnManager) > txnMgr2).acquireLocks(driver.getPlan(), ctx, "Fiddler", false); > List locks = getLocks(); > Assert.assertEquals("Unexpected lock count", 4, locks.size()); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > null, locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > "p=1", locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, 
"default", "T7", > "p=2", locks); > checkLock(LockType.EXCLUSIVE, LockState.WAITING, "default", "T7", "p=1", > locks); > HiveTxnManager txnMgr3 = > TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf); > swapTxnManager(txnMgr3); > //this should block behind the X lock on T7.p=1 > cpr = driver.compileAndRespond("select a from T7"); > checkCmdOnDriver(cpr); > txnMgr3.acquireLocks(driver.getPlan(), ctx, "Fifer");//gets S lock on T6 > locks = getLocks(); > Assert.assertEquals("Unexpected lock count", 7, locks.size()); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > null, locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > "p=1", locks); > checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", > "p=2", locks); > checkLock(LockType.SHARED_READ,
[jira] [Assigned] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL
[ https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman reassigned HIVE-18772: - > Make Acid Cleaner use MIN_HISTORY_LEVEL > --- > > Key: HIVE-18772 > URL: https://issues.apache.org/jira/browse/HIVE-18772 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > > Instead of using Lock Manager state as it currently does. > This will eliminate possible race conditions -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries/MIN_HISTORY_LEVEL.
[ https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18747: -- Summary: Cleaner for TXN_TO_WRITE_ID table entries/MIN_HISTORY_LEVEL. (was: Cleaner for TXN_TO_WRITE_ID table entries.) > Cleaner for TXN_TO_WRITE_ID table entries/MIN_HISTORY_LEVEL. > > > Key: HIVE-18747 > URL: https://issues.apache.org/jira/browse/HIVE-18747 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Minor > Labels: ACID > Fix For: 3.0.0 > > > Per table write ID implementation (HIVE-18192) maintains a map between txn ID > and table write ID in the TXN_TO_WRITE_ID meta table. > The entries in this table are used to generate a ValidWriteIdList for the given > ValidTxnList to ensure snapshot isolation. > When a table or database is dropped, these entries are cleaned up. But, it > is necessary to clean up for active tables too for better performance. > Need to have another table MIN_HISTORY_LEVEL to maintain the smallest txn which > is referred to by any active ValidTxnList snapshot as an open/aborted txn. If no > references are found in this table for any txn, then it is eligible for cleanup. > After clean-up, need to maintain just one entry per table to mark the LWM (low > water mark). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
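The cleanup rule proposed in HIVE-18747 can be sketched as follows: an entry in the txn-to-write-id map becomes removable once its txn id falls below the smallest txn still referenced by any live snapshot (the MIN_HISTORY_LEVEL idea, i.e. the low water mark). The class below is a hypothetical in-memory stand-in; the real data lives in metastore tables.

```java
// Illustrative sketch: purge txn->writeId entries no active snapshot
// can still reference. `minOpenTxnPerSnapshot` stands in for the
// MIN_HISTORY_LEVEL entries recorded by live readers.
import java.util.Collection;
import java.util.TreeMap;

public class TxnToWriteIdCleaner {
    /** txnToWriteId: txnId -> writeId for one table, sorted by txnId. */
    public static void clean(TreeMap<Long, Long> txnToWriteId,
                             Collection<Long> minOpenTxnPerSnapshot) {
        // The low water mark: the smallest txn any snapshot still needs.
        long lwm = minOpenTxnPerSnapshot.stream()
            .mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
        // Everything strictly below the LWM is invisible to all readers
        // and can be dropped.
        txnToWriteId.headMap(lwm).clear();
    }
}
```

The description's final refinement — keeping one residual entry per table to record the LWM itself — is omitted here for brevity; the sketch only shows the visibility test that decides eligibility.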
[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18659: -- Attachment: HIVE-18659.09.patch > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, > HIVE-18659.09.patch, HIVE-18659.09.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372145#comment-16372145 ] Eugene Koifman commented on HIVE-18659: --- patch 9 fixes test output and checkstyle > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, > HIVE-18659.09.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18659: -- Attachment: HIVE-18659.09.patch > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, > HIVE-18659.09.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370826#comment-16370826 ] Eugene Koifman commented on HIVE-18659: --- {{createdDeltaDirs.add(deltaDest)}} is how it was before this patch. I'm not sure what the original intent was. Patch 7 to see checkstyle/tests - the above links got recycled > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories
[ https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18659: -- Attachment: HIVE-18659.07.patch > add acid version marker to acid files/directories > - > > Key: HIVE-18659 > URL: https://issues.apache.org/jira/browse/HIVE-18659 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, > HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch > > > add acid version marker to acid files so that we know which version of acid > wrote the file -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18742) Vectorization acid/inputformat check should allow NullRowsInputFormat/OneNullRowInputFormat
[ https://issues.apache.org/jira/browse/HIVE-18742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18742: -- Affects Version/s: 2.0.0 > Vectorization acid/inputformat check should allow > NullRowsInputFormat/OneNullRowInputFormat > --- > > Key: HIVE-18742 > URL: https://issues.apache.org/jira/browse/HIVE-18742 > Project: Hive > Issue Type: Bug > Components: Transactions, Vectorization >Affects Versions: 2.0.0 >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18742.1.patch, HIVE-18742.2.patch > > > Vectorizer.verifyAndSetVectorPartDesc() has a Preconditions check to ensure the InputFormat is ORC only. However, metadata-only or empty-result optimizations on Acid tables can change the input format to NullRows/OneNullRowInputFormat, which trips up this check. > Relaxing this check to allow the nullrows and onenullrow input formats. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
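The relaxed precondition from HIVE-18742 amounts to widening a whitelist of input formats. A minimal sketch, assuming the three class names below (the real check is a Java Preconditions check inside Vectorizer.verifyAndSetVectorPartDesc(), and the exact accepted names are an assumption here):

```python
# Input formats accepted for a vectorized ACID scan in this sketch.
ALLOWED_ACID_INPUT_FORMATS = {
    "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
    # Metadata-only / empty-result optimizations substitute these formats:
    "org.apache.hadoop.hive.ql.io.NullRowsInputFormat",
    "org.apache.hadoop.hive.ql.io.OneNullRowInputFormat",
}

def verify_acid_input_format(input_format):
    # Previously only ORC passed; the null-rows formats now pass as well.
    if input_format not in ALLOWED_ACID_INPUT_FORMATS:
        raise ValueError("unexpected input format for ACID scan: " + input_format)
```

With the old ORC-only set, a metadata-only optimized plan would hit the exception path; with the widened set it verifies cleanly.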
[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID
[ https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370291#comment-16370291 ] Eugene Koifman commented on HIVE-18192: --- is {{org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_exchangepartition]}} a related failure? > Introduce WriteID per table rather than using global transaction ID > --- > > Key: HIVE-18192 > URL: https://issues.apache.org/jira/browse/HIVE-18192 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: anishek >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, DR, pull-request-available > Fix For: 3.0.0 > > Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, > HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch, > HIVE-18192.06.patch, HIVE-18192.07.patch, HIVE-18192.08.patch, > HIVE-18192.09.patch, HIVE-18192.10.patch, HIVE-18192.11.patch, > HIVE-18192.12.patch, HIVE-18192.13.patch, HIVE-18192.14.patch > > > To support ACID replication, we will be introducing a per-table write ID which will replace the transaction ID in the primary key for each row in an ACID table. > The current primary key is determined via > > which will move to > Each table modified by the given transaction will have a table-level write ID allocated, and a persisted map of global txn id -> table -> write id for that table has to be maintained to allow snapshot isolation. > Readers should use the combination of ValidTxnList and ValidWriteIdList(Table) for snapshot isolation. > > [Hive Replication - ACID > Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf] > has a section "Per Table Sequences (Write-Id)" with more details -- This message was sent by Atlassian JIRA (v7.6.3#76005)
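The txn-to-write-ID mapping in HIVE-18192 lets a reader translate its global ValidTxnList snapshot into a per-table ValidWriteIdList. A hedged sketch of that translation (the function, dict layout, and snapshot shape below are illustrative inventions, not Hive's API):

```python
def valid_write_ids(table, open_or_aborted_txns, txn_to_write_id, high_water_mark):
    """Derive a per-table valid-write-id snapshot from a global txn snapshot.

    open_or_aborted_txns: txn IDs this snapshot must NOT see.
    txn_to_write_id: dict mapping (txn_id, table) -> write_id.
    high_water_mark: highest write ID allocated for the table.
    """
    # Write IDs produced by invisible txns are themselves invisible.
    invalid = sorted(
        write_id
        for (txn, tbl), write_id in txn_to_write_id.items()
        if tbl == table and txn in open_or_aborted_txns
    )
    # A reader sees every write ID up to the high-water mark except these.
    return {"table": table, "hwm": high_water_mark, "invalid": invalid}
```

For example, if txn 11 is open when the snapshot is taken and it allocated write ID 2 on the table, the reader's snapshot marks write ID 2 invalid while write IDs 1 and 3 remain visible.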
[jira] [Commented] (HIVE-18751) ACID table scan through get_splits UDF doesn't receive ValidWriteIdList configuration.
[ https://issues.apache.org/jira/browse/HIVE-18751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370279#comment-16370279 ] Eugene Koifman commented on HIVE-18751: --- [~jdere] do you have any input here? [~sankarh] should this be a blocker for HIVE-18192? > ACID table scan through get_splits UDF doesn't receive ValidWriteIdList > configuration. > -- > > Key: HIVE-18751 > URL: https://issues.apache.org/jira/browse/HIVE-18751 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, UDF > Fix For: 3.0.0 > > > Per-table write IDs (HIVE-18192) have replaced the global transaction ID with a write ID to version data files in ACID/MM tables. > To ensure snapshot isolation, a ValidWriteIdList needs to be generated for the given txn/table and used when scanning the ACID/MM tables. > The get_splits UDF, when run on an ACID table scan query, doesn't receive it properly through configuration (hive.txn.tables.valid.writeids) and hence throws an exception. > TestAcidOnTez.testGetSplitsLocks fails because of this and needs to be fixed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18750) Exchange partition should not be supported with per table write ID.
[ https://issues.apache.org/jira/browse/HIVE-18750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370272#comment-16370272 ] Eugene Koifman commented on HIVE-18750: --- Also, Load Data is supported for Acid tables - this can be used as an instant batch upload. > Exchange partition should not be supported with per table write ID. > --- > > Key: HIVE-18750 > URL: https://issues.apache.org/jira/browse/HIVE-18750 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, DDL > Fix For: 3.0.0 > > > The per-table write ID implementation (HIVE-18192) introduced a write ID per table, used to name the delta/base files and as the primary key for each row. > Now, exchange partition has to move delta/base files across tables without changing the write ID, which causes incorrect results. > Also, the exchange partition feature exists to support the use case of atomic updates; ACID tables already support atomic updates, so it makes sense not to support exchange partition for ACID and MM tables. > The test results of the qtest file mm_exchangepartition.q are to be updated after this change. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HIVE-18748) Rename table should update the table names in NEXT_WRITE_ID and TXN_TO_WRITE_ID tables.
[ https://issues.apache.org/jira/browse/HIVE-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370268#comment-16370268 ] Eugene Koifman commented on HIVE-18748: --- How does "rename" get surfaced to the end user? Via Alter Table? I don't think there is anything else anywhere in the acid system that handles rename of db.table value. This probably needs to be a comprehensive change. (Or we can explore using table ID of some sort) > Rename table should update the table names in NEXT_WRITE_ID and > TXN_TO_WRITE_ID tables. > > > Key: HIVE-18748 > URL: https://issues.apache.org/jira/browse/HIVE-18748 > Project: Hive > Issue Type: Sub-task > Components: HiveServer2, Transactions >Affects Versions: 3.0.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan >Priority: Major > Labels: ACID, DDL > Fix For: 3.0.0 > > > The per-table write ID implementation (HIVE-18192) introduces a couple of meta tables, NEXT_WRITE_ID and TXN_TO_WRITE_ID, to manage the write IDs allocated per table. > Now, when we rename any table, it is necessary to update the corresponding table names in these tables as well. Otherwise, ACID table operations won't work properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
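Because both meta tables are keyed by table name, a rename has to rewrite that key everywhere it appears. A minimal sketch of the required update, using illustrative in-memory structures rather than the real metastore SQL:

```python
def rename_table(meta, old_name, new_name):
    """Rewrite a table name in NEXT_WRITE_ID- and TXN_TO_WRITE_ID-style maps.

    meta["next_write_id"]: table name -> next write ID to allocate.
    meta["txn_to_write_id"]: (txn_id, table name) -> allocated write ID.
    """
    # Move the allocation counter to the new name; write IDs themselves
    # are unchanged, only the key that addresses them.
    if old_name in meta["next_write_id"]:
        meta["next_write_id"][new_name] = meta["next_write_id"].pop(old_name)
    # Re-key every txn -> write-id mapping that referenced the old name.
    meta["txn_to_write_id"] = {
        (txn, new_name if tbl == old_name else tbl): wid
        for (txn, tbl), wid in meta["txn_to_write_id"].items()
    }
```

The key point the issue makes is that skipping this step leaves stale keys behind, so subsequent write-ID allocation and snapshot lookups for the renamed table would miss their history.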
[jira] [Updated] (HIVE-18742) Vectorization acid/inputformat check should allow NullRowsInputFormat/OneNullRowInputFormat
[ https://issues.apache.org/jira/browse/HIVE-18742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18742: -- Component/s: Transactions > Vectorization acid/inputformat check should allow > NullRowsInputFormat/OneNullRowInputFormat > --- > > Key: HIVE-18742 > URL: https://issues.apache.org/jira/browse/HIVE-18742 > Project: Hive > Issue Type: Bug > Components: Transactions, Vectorization >Reporter: Jason Dere >Assignee: Jason Dere >Priority: Major > Attachments: HIVE-18742.1.patch, HIVE-18742.2.patch > > > Vectorizer.verifyAndSetVectorPartDesc() has a Preconditions check to ensure the InputFormat is ORC only. However, metadata-only or empty-result optimizations on Acid tables can change the input format to NullRows/OneNullRowInputFormat, which trips up this check. > Relaxing this check to allow the nullrows and onenullrow input formats. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table
[ https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-18739: -- Attachment: HIVE-18739.01.patch > Add support for Export from unpartitioned Acid table > > > Key: HIVE-18739 > URL: https://issues.apache.org/jira/browse/HIVE-18739 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Attachments: HIVE-18739.01.patch > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)