[jira] [Updated] (HIVE-18675) make HIVE_LOCKS.HL_TXNID NOT NULL

2018-03-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18675:
--
Component/s: Metastore

> make HIVE_LOCKS.HL_TXNID NOT NULL
> -
>
> Key: HIVE-18675
> URL: https://issues.apache.org/jira/browse/HIVE-18675
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Kryvenko Igor
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18675.01.patch, HIVE-18675.02.patch
>
>
> In Hive 3.0 all statements that may need locks run in a transaction



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-18675) make HIVE_LOCKS.HL_TXNID NOT NULL

2018-03-12 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18675:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master
thanks Igor for the contribution

> make HIVE_LOCKS.HL_TXNID NOT NULL
> -
>
> Key: HIVE-18675
> URL: https://issues.apache.org/jira/browse/HIVE-18675
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Kryvenko Igor
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18675.01.patch, HIVE-18675.02.patch
>
>
> In Hive 3.0 all statements that may need locks run in a transaction





[jira] [Commented] (HIVE-18675) make HIVE_LOCKS.HL_TXNID NOT NULL

2018-03-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395669#comment-16395669
 ] 

Eugene Koifman commented on HIVE-18675:
---

[~vbeshka], makes sense.

+1

> make HIVE_LOCKS.HL_TXNID NOT NULL
> -
>
> Key: HIVE-18675
> URL: https://issues.apache.org/jira/browse/HIVE-18675
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Kryvenko Igor
>Priority: Major
> Attachments: HIVE-18675.01.patch, HIVE-18675.02.patch
>
>
> In Hive 3.0 all statements that may need locks run in a transaction





[jira] [Commented] (HIVE-18675) make HIVE_LOCKS.HL_TXNID NOT NULL

2018-03-12 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16395256#comment-16395256
 ] 

Eugene Koifman commented on HIVE-18675:
---

why are there both 
metastore/scripts/upgrade/derby/upgrade-2.3.0-to-3.0.0.derby.sql and 
standalone-metastore/src/main/sql/derby/upgrade-2.3.0-to-3.0.0.derby.sql?  
should one of these be removed?
cc [~alangates]

> make HIVE_LOCKS.HL_TXNID NOT NULL
> -
>
> Key: HIVE-18675
> URL: https://issues.apache.org/jira/browse/HIVE-18675
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Kryvenko Igor
>Priority: Major
> Attachments: HIVE-18675.01.patch, HIVE-18675.02.patch
>
>
> In Hive 3.0 all statements that may need locks run in a transaction





[jira] [Resolved] (HIVE-18662) hive.acid.key.index is missing entries

2018-03-09 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman resolved HIVE-18662.
---
   Resolution: Fixed
 Assignee: Eugene Koifman
Fix Version/s: 3.0.0

fixed in HIVE-18817

> hive.acid.key.index is missing entries
> --
>
> Key: HIVE-18662
> URL: https://issues.apache.org/jira/browse/HIVE-18662
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
>
> OrcRecordUpdater.KeyIndexBuilder stores an index in ORC footer where each 
> entry is the last ROW__ID of each stripe.  In acid1 this is used to filter 
> the events from delta file when merging with part of the base.
>  
> as can be seen in {{TestTxnCommands.testVersioning()}} (added in HIVE-18659) 
> the {{hive.acid.key.index}} is empty.  
>  
> This is because very little data is written and WriterImpl.flushStripe() is 
> not called except when {{WriterImpl.close()}} is called.  In the latter, 
> {{WriterCallback.preFooterWrite()}} is called before {{preStripeWrite}} and 
> so KeyIndexBuilder.preFooterWrite() records nothing in 
> {{hive.acid.key.index}}.
>  
> Need to investigate whether this is an issue, in particular for acid 2.





[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-09 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.06.patch

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch, HIVE-18739.06.patch
>
>






[jira] [Updated] (HIVE-17206) make a version of Compactor specific to unbucketed tables

2018-03-09 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17206:
--
Description: 
current Compactor will work but is not optimized/flexible enough

The current compactor is designed to generate the number of splits equal to the 
number of buckets in the table.   That is the degree of parallelism.

For unbucketed tables, the same is used but the "number of buckets" is derived 
from the files found in the deltas.  For small writes, there will likely be 
just 1 bucket_0 file.  For large writes, the parallelism of the write 
determines the number of output files.

Need to make sure Compactor can control parallelism for unbucketed tables as it 
wishes.  For example, hash partition all records (by ROW__ID?) into N disjoint 
sets.


  was:current Compactor will work but is not optimized/flexible enough


> make a version of Compactor specific to unbucketed tables
> -
>
> Key: HIVE-17206
> URL: https://issues.apache.org/jira/browse/HIVE-17206
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> current Compactor will work but is not optimized/flexible enough
> The current compactor is designed to generate the number of splits equal to 
> the number of buckets in the table.   That is the degree of parallelism.
> For unbucketed tables, the same is used but the "number of buckets" is 
> derived from the files found in the deltas.  For small writes, there will 
> likely be just 1 bucket_0 file.  For large writes, the parallelism of the 
> write determines the number of output files.
> Need to make sure Compactor can control parallelism for unbucketed tables as 
> it wishes.  For example, hash partition all records (by ROW__ID?) into N 
> disjoint sets.
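
The hash-partitioning idea in the description above can be sketched in a few lines. This is only an illustration under stated assumptions: `RowIdPartitioner` and its `RowId` class are hypothetical stand-ins (not Hive's ROW__ID type or the Compactor's code), and the hash function is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch; RowId is a stand-in for Hive's ROW__ID, not its real type.
class RowIdPartitioner {
    static final class RowId {
        final long writeId;
        final int bucket;
        final long rowId;

        RowId(long writeId, int bucket, long rowId) {
            this.writeId = writeId;
            this.bucket = bucket;
            this.rowId = rowId;
        }
    }

    // Maps a row to one of n sets; floorMod keeps the result in [0, n)
    // even when the hash is negative.
    static int partitionOf(RowId id, int n) {
        return Math.floorMod(Objects.hash(id.writeId, id.bucket, id.rowId), n);
    }

    // Splits the rows into n disjoint lists; every row lands in exactly one.
    static List<List<RowId>> partition(List<RowId> rows, int n) {
        List<List<RowId>> sets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            sets.add(new ArrayList<>());
        }
        for (RowId row : rows) {
            sets.get(partitionOf(row, n)).add(row);
        }
        return sets;
    }

    public static void main(String[] args) {
        List<RowId> rows = new ArrayList<>();
        for (long r = 0; r < 10; r++) {
            rows.add(new RowId(1L, 0, r));
        }
        List<List<RowId>> sets = partition(rows, 3);
        for (int i = 0; i < sets.size(); i++) {
            System.out.println("set " + i + ": " + sets.get(i).size() + " rows");
        }
    }
}
```

With this shape the degree of parallelism is simply the chosen N, independent of how many bucket files the original writes produced.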





[jira] [Assigned] (HIVE-18773) Support multiple instances of Cleaner

2018-03-09 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18773:
-

Assignee: Eugene Koifman

> Support multiple instances of Cleaner
> -
>
> Key: HIVE-18773
> URL: https://issues.apache.org/jira/browse/HIVE-18773
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> We support multiple Workers by making each Worker update the status of the 
> entry in COMPACTION_QUEUE to make sure only 1 worker grabs it.  Once we have 
> HIVE-18772, Cleaner should not need any state, so we can easily have more 
> than 1 Cleaner instance by introducing 1 more status type, "being cleaned".
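
The claim-by-status-update idea above can be modelled in memory with a compare-and-set. This is a rough sketch, not the metastore implementation: `CleanerClaimDemo`, the map standing in for COMPACTION_QUEUE, and the "ready for cleaning" status name are all assumptions made for illustration; only "being cleaned" comes from the description.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for the COMPACTION_QUEUE status column.
class CleanerClaimDemo {
    static final String READY = "ready for cleaning"; // assumed status name
    static final String BEING_CLEANED = "being cleaned";

    private final ConcurrentMap<Long, String> queue = new ConcurrentHashMap<>();

    void enqueue(long id) {
        queue.put(id, READY);
    }

    // Atomically moves the entry from READY to BEING_CLEANED.
    // Only one caller can win; everyone else gets false and skips the entry.
    boolean tryClaim(long id) {
        return queue.replace(id, READY, BEING_CLEANED);
    }

    public static void main(String[] args) {
        CleanerClaimDemo demo = new CleanerClaimDemo();
        demo.enqueue(42L);
        System.out.println("cleaner A claims: " + demo.tryClaim(42L));
        System.out.println("cleaner B claims: " + demo.tryClaim(42L));
    }
}
```

In a database-backed queue the same effect would presumably come from a conditional update on the status column, which is what the Worker description above suggests.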





[jira] [Commented] (HIVE-18693) Snapshot Isolation does not work for Micromanaged table when a insert transaction is aborted

2018-03-09 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393381#comment-16393381
 ] 

Eugene Koifman commented on HIVE-18693:
---

[~steveyeom2017], I think the code changes are fine in general but the test is 
missing a few checks.  Left some comments in RB.

> Snapshot Isolation does not work for Micromanaged table when a insert 
> transaction is aborted
> 
>
> Key: HIVE-18693
> URL: https://issues.apache.org/jira/browse/HIVE-18693
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Attachments: HIVE-18693.01.patch, HIVE-18693.02.patch, 
> HIVE-18693.03.patch, HIVE-18693.04.patch
>
>
> TestTxnCommands2#writeBetweenWorkerAndCleaner with minor 
> changes (changing delete command to insert command) fails on MM table.
> Specifically, the last SELECT command returns wrong results. 
> But this test works fine with full ACID table. 





[jira] [Commented] (HIVE-18693) Snapshot Isolation does not work for Micromanaged table when a insert transaction is aborted

2018-03-09 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393189#comment-16393189
 ] 

Eugene Koifman commented on HIVE-18693:
---

[~steveyeom2017] could you update RB with latest patch

> Snapshot Isolation does not work for Micromanaged table when a insert 
> transaction is aborted
> 
>
> Key: HIVE-18693
> URL: https://issues.apache.org/jira/browse/HIVE-18693
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Major
> Attachments: HIVE-18693.01.patch, HIVE-18693.02.patch, 
> HIVE-18693.03.patch, HIVE-18693.04.patch
>
>
> TestTxnCommands2#writeBetweenWorkerAndCleaner with minor 
> changes (changing delete command to insert command) fails on MM table.
> Specifically, the last SELECT command returns wrong results. 
> But this test works fine with full ACID table. 





[jira] [Updated] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()

2018-03-09 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18918:
--
  Resolution: Fixed
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0
  Status: Resolved  (was: Patch Available)

committed to master
thanks Jason for the review

> Bad error message in CompactorMR.lanuchCompactionJob()
> --
>
> Key: HIVE-18918
> URL: https://issues.apache.org/jira/browse/HIVE-18918
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.2
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18918.01.patch
>
>
> {noformat}
>   rj.waitForCompletion();
>   if (!rj.isSuccessful()) {
>     throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
>       " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
>   }
> {noformat}
> produces no useful info in case of Major compaction
> {noformat}
> 2018-02-28 00:59:16,416 ERROR [gdpr1-61]: compactor.Worker (Worker.java:run(191)) - Caught exception while trying to compact id:38602,dbname:audit,tableName:COMP_ENTRY_AF_A,partName:partition_dt=2017-04-11,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.  Marking failed to avoid repeated failures, java.io.IOException: Major
>     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314)
>     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269)
>     at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>     at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172)
> {noformat}
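
The message above is truncated to "Major" because, in Java, `+` binds more tightly than `?:`, so the whole concatenation becomes part of the "Minor" branch only. A minimal stand-alone sketch of the problem and one possible fix follows; `CompactionMessageDemo`, its method names, and the sample job values are hypothetical, and the parenthesized version is not necessarily what the committed patch does.

```java
// Hypothetical stand-alone demo; not taken from CompactorMR.
class CompactionMessageDemo {
    enum CompactionType { MAJOR, MINOR }

    // Mirrors the quoted expression: "+" binds tighter than "?:", so the
    // concatenation belongs only to the "Minor" branch.
    static String buggyMessage(CompactionType type, String jobName, String jobId) {
        return type == CompactionType.MAJOR ? "Major" : "Minor" +
            " compactor job failed for " + jobName + "! Hadoop JobId: " + jobId;
    }

    // Parenthesizing the conditional applies the suffix to both branches.
    static String fixedMessage(CompactionType type, String jobName, String jobId) {
        return (type == CompactionType.MAJOR ? "Major" : "Minor") +
            " compactor job failed for " + jobName + "! Hadoop JobId: " + jobId;
    }

    public static void main(String[] args) {
        System.out.println(buggyMessage(CompactionType.MAJOR, "demo-job", "job_0001"));
        System.out.println(fixedMessage(CompactionType.MAJOR, "demo-job", "job_0001"));
    }
}
```

Minor compactions are unaffected by the original expression, which may be why the problem only showed up in a Major compaction log.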





[jira] [Commented] (HIVE-18723) CompactorOutputCommitter.commitJob() - check rename() ret val

2018-03-09 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393134#comment-16393134
 ] 

Eugene Koifman commented on HIVE-18723:
---

[~vbeshka], I don't understand the logic.  If B exists, rename will create B/A. 
 The delete you added will delete A (in B/A) so B will not have the results of 
the compaction.

> CompactorOutputCommitter.commitJob() - check rename() ret val
> -
>
> Key: HIVE-18723
> URL: https://issues.apache.org/jira/browse/HIVE-18723
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Kryvenko Igor
>Priority: Major
> Attachments: HIVE-18723.1.patch, HIVE-18723.2.patch, 
> HIVE-18723.3.patch, HIVE-18723.patch
>
>
> Right now the return value is ignored: {{fs.rename(fileStatus.getPath(), newPath); }}
> Should this use {{FileUtils.rename(FileSystem fs, Path sourcePath, Path 
> destPath, Configuration conf) }}?
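
The point about the ignored return value can be illustrated without Hadoop on the classpath. The sketch below uses `java.io.File.renameTo` as a stand-in, since it also reports failure through a boolean rather than an exception; `RenameCheckDemo` and `renameOrThrow` are hypothetical names, and this is not the code from the attached patches.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical stand-in: java.io.File.renameTo, like the quoted fs.rename call,
// signals failure with a boolean that is easy to ignore.
class RenameCheckDemo {
    // Turns a silent "false" into an exception the caller cannot miss.
    static void renameOrThrow(File source, File dest) throws IOException {
        if (!source.renameTo(dest)) {
            throw new IOException("Failed to rename " + source + " to " + dest);
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("compacted", ".tmp");
        File dst = new File(src.getParentFile(), src.getName() + ".final");
        renameOrThrow(src, dst);
        System.out.println("renamed to " + dst.getName());
        dst.delete();
    }
}
```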





[jira] [Updated] (HIVE-18571) stats issues for MM tables; ACID doesn't check state for CTAS

2018-03-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18571:
--
Component/s: Transactions

> stats issues for MM tables; ACID doesn't check state for CTAS
> -
>
> Key: HIVE-18571
> URL: https://issues.apache.org/jira/browse/HIVE-18571
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, 
> HIVE-18571.03.patch, HIVE-18571.04.patch, HIVE-18571.05.patch, 
> HIVE-18571.06.patch, HIVE-18571.07.patch, HIVE-18571.patch
>
>
> There are multiple stats aggregation issues with MM tables.
> Some simple stats are double counted and some stats (simple stats) are 
> invalid for ACID table dirs altogether. 
> I have a patch almost ready, need to fix some more stuff and clean up.





[jira] [Updated] (HIVE-18911) LOAD.. code for MM has some suspect/dead code

2018-03-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18911:
--
Component/s: Transactions

> LOAD.. code for MM has some suspect/dead code
> -
>
> Key: HIVE-18911
> URL: https://issues.apache.org/jira/browse/HIVE-18911
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Sergey Shelukhin
>Priority: Major
>
> Discovered in HIVE-18571 and added TODO-s that need to be addressed.
> E.g. {noformat}
> if (isMmTableWrite) {
>   // We will load into MM directory, and delete from the parent if needed.
>   // TODO: this looks invalid after ACID integration. What about base dirs?
>   destPath = new Path(destPath, AcidUtils.deltaSubdir(writeId, writeId, stmtId));
> ...
>   // TODO: loadFileType for MM table will no longer be REPLACE_ALL
>   filter = (loadFileType == LoadFileType.REPLACE_ALL)
> {noformat}
> 2 places like that
> Also replaceFiles has isMmTableWrite flag that should no longer be needed 
> (since for a transactional table we should never replace files). Either 
> there's some invalid code path that relies on it (load table?), or it is just 
> unused and needs to be removed.
> Also used in 2 places, "TODO: this should never run for MM tables anymore. 
> Remove the flag, and maybe the filter?"





[jira] [Commented] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()

2018-03-08 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16392219#comment-16392219
 ] 

Eugene Koifman commented on HIVE-18918:
---

[~jdere] could you review please

> Bad error message in CompactorMR.lanuchCompactionJob()
> --
>
> Key: HIVE-18918
> URL: https://issues.apache.org/jira/browse/HIVE-18918
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.2
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18918.01.patch
>
>
> {noformat}
>   rj.waitForCompletion();
>   if (!rj.isSuccessful()) {
>     throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
>       " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
>   }
> {noformat}
> produces no useful info in case of Major compaction
> {noformat}
> 2018-02-28 00:59:16,416 ERROR [gdpr1-61]: compactor.Worker (Worker.java:run(191)) - Caught exception while trying to compact id:38602,dbname:audit,tableName:COMP_ENTRY_AF_A,partName:partition_dt=2017-04-11,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.  Marking failed to avoid repeated failures, java.io.IOException: Major
>     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314)
>     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269)
>     at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>     at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172)
> {noformat}





[jira] [Updated] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()

2018-03-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18918:
--
Description: 
{noformat}
  rj.waitForCompletion();
  if (!rj.isSuccessful()) {
    throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
      " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
  }
{noformat}

produces no useful info in case of Major compaction

{noformat}
2018-02-28 00:59:16,416 ERROR [gdpr1-61]: compactor.Worker (Worker.java:run(191)) - Caught exception while trying to compact id:38602,dbname:audit,tableName:COMP_ENTRY_AF_A,partName:partition_dt=2017-04-11,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.  Marking failed to avoid repeated failures, java.io.IOException: Major
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314)
    at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269)
    at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172)

{noformat}


  was:
{noformat}
  rj.waitForCompletion();
  if (!rj.isSuccessful()) {
    throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
      " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
  }
{noformat}

produces no useful info in case of Major compaction


> Bad error message in CompactorMR.lanuchCompactionJob()
> --
>
> Key: HIVE-18918
> URL: https://issues.apache.org/jira/browse/HIVE-18918
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.2
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18918.01.patch
>
>
> {noformat}
>   rj.waitForCompletion();
>   if (!rj.isSuccessful()) {
>     throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
>       " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
>   }
> {noformat}
> produces no useful info in case of Major compaction
> {noformat}
> 2018-02-28 00:59:16,416 ERROR [gdpr1-61]: compactor.Worker (Worker.java:run(191)) - Caught exception while trying to compact id:38602,dbname:audit,tableName:COMP_ENTRY_AF_A,partName:partition_dt=2017-04-11,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.  Marking failed to avoid repeated failures, java.io.IOException: Major
>     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314)
>     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269)
>     at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
>     at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172)
> {noformat}





[jira] [Updated] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()

2018-03-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18918:
--
Status: Patch Available  (was: Open)

> Bad error message in CompactorMR.lanuchCompactionJob()
> --
>
> Key: HIVE-18918
> URL: https://issues.apache.org/jira/browse/HIVE-18918
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.2
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18918.01.patch
>
>
> {noformat}
>   rj.waitForCompletion();
>   if (!rj.isSuccessful()) {
>     throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
>       " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
>   }
> {noformat}
> produces no useful info in case of Major compaction





[jira] [Updated] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()

2018-03-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18918:
--
Attachment: HIVE-18918.01.patch

> Bad error message in CompactorMR.lanuchCompactionJob()
> --
>
> Key: HIVE-18918
> URL: https://issues.apache.org/jira/browse/HIVE-18918
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.2
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18918.01.patch
>
>
> {noformat}
>   rj.waitForCompletion();
>   if (!rj.isSuccessful()) {
>     throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
>       " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
>   }
> {noformat}
> produces no useful info in case of Major compaction





[jira] [Assigned] (HIVE-18918) Bad error message in CompactorMR.lanuchCompactionJob()

2018-03-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18918:
-


> Bad error message in CompactorMR.lanuchCompactionJob()
> --
>
> Key: HIVE-18918
> URL: https://issues.apache.org/jira/browse/HIVE-18918
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.2.2
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> {noformat}
>   rj.waitForCompletion();
>   if (!rj.isSuccessful()) {
>     throw new IOException(compactionType == CompactionType.MAJOR ? "Major" : "Minor" +
>       " compactor job failed for " + jobName + "! Hadoop JobId: " + rj.getID());
>   }
> {noformat}
> produces no useful info in case of Major compaction





[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-08 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16391880#comment-16391880
 ] 

Eugene Koifman commented on HIVE-18825:
---

The lock manager is entirely based on Read/WriteEntities created by the compiler.  
If you move lock acquisition to some place before they are available, you 
basically need to rewrite the entire logic in the LM that is used to figure out 
what to lock.  It may be possible, but it's an unpredictably large amount of 
work, which makes pessimistic locking an impractically expensive feature.

Incremental refresh is not prevented by this, though I understand that there 
are some possible performance optimizations that are difficult w/o this.

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, 
> HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with a condition, e.g. 
> {{ROW__ID.transactionId < highwatermark and ROW__ID.transactionId NOT 
> IN()}}, on top of the MV's query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.
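
The shape of the filter condition quoted above can be sketched as a plain predicate. This is an illustration only: `SnapshotFilterDemo`, `passesFilter`, and the sample ids are hypothetical, and the contents of the excluded set are left to the caller because the quoted condition does not show them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a "below the high watermark and not excluded" check.
class SnapshotFilterDemo {
    // True when the id is strictly below the high watermark and is not in
    // the excluded set supplied by the caller.
    static boolean passesFilter(long txnId, long highWatermark, Set<Long> excludedTxnIds) {
        return txnId < highWatermark && !excludedTxnIds.contains(txnId);
    }

    public static void main(String[] args) {
        Set<Long> excluded = new HashSet<>(Arrays.asList(7L)); // sample value only
        for (long txnId : new long[] {5L, 7L, 10L}) {
            System.out.println("txn " + txnId + " passes: " + passesFilter(txnId, 10L, excluded));
        }
    }
}
```

Both inputs, the high watermark and the excluded ids, have to be known before the predicate can be built, which is the reason the description asks for a {{ValidTxnList}} value at compilation time.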





[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Status: Patch Available  (was: Open)

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch
>
>






[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.04.patch

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch, 
> HIVE-18739.04.patch
>
>






[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-03-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.04.patch

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch, HIVE-18739.04.patch
>
>






[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Description: 
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]

Add Partition command creates a {{Partition}} metadata object and sets the 
location to the directory containing data files.

In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
at read time the data is decorated with row__id but the original transaction is 
0.  I suspect in earlier Hive versions this will throw or return no data.
Since this new partition didn't have data before, assigning txnid:0 isn't going 
to generate duplicate IDs but it could violate Snapshot Isolation in multi stmt 
txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 adds a partition 
to T.  Now if txnid:7 runs the same query again, it will see the data in the 
new partition.
This can't be released like this, since a delete on this data (added via Add 
partition) will use row_ids with txnid:0, so a later upgrade that sees 
un-compacted data may generate row_ids with a different txnid (assuming this is 
fixed by then).

 

One option is to follow the Load Data approach: create a new delta_x_x/ and 
move/copy the data there.

 

Another is to allocate a new writeid and save it in Partition metadata.  This 
could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
retains data "outside" of the table tree, which makes it more likely that this 
data will be modified in some way, which can really break things if done after 
any SQL update/delete on this data has happened. 

 

It performs no validations on add (except for partition spec) so any file with 
any format can be added.  It allows add to bucketed tables as well.

Seems like a very dangerous command.  Maybe a better option is to block it and 
advise using Load Data.  Alternatively, make this do Add partition metadata op 
followed by Load Data. 
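
The snapshot-isolation concern described above (txnid:7 reading, txnid:8 adding a partition) can be shown with a deliberately simplified visibility rule. Everything here is an assumption made for illustration: `AddPartitionVisibilityDemo` is hypothetical, and the rule "a reader sees rows stamped with an id no greater than its own" is a toy model, not Hive's actual snapshot logic.

```java
// Toy model only; real snapshot visibility in Hive is more involved.
class AddPartitionVisibilityDemo {
    // Assumed rule: a reader sees a row when the id stamped on the row
    // is at or below the reader's own transaction id.
    static boolean visibleTo(long readerTxnId, long rowTxnId) {
        return rowTxnId <= readerTxnId;
    }

    public static void main(String[] args) {
        long reader = 7L; // the transaction that ran the query twice
        // Partition added by txnid:8 but its rows are stamped with txnid:0.
        System.out.println("stamped 0, visible to txn 7: " + visibleTo(reader, 0L));
        // Same rows, had they been stamped with the id of the adding transaction.
        System.out.println("stamped 8, visible to txn 7: " + visibleTo(reader, 8L));
    }
}
```

Under this model, stamping the added data with an id allocated at add time keeps it out of the older reader's snapshot, which is what both options listed above aim for.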

 

 

  was:
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]

Add Partition command creates a {{Partition}} metadata object and sets the 
location to the directory containing data files.

In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
at read time the data is decorated with row__id but the original transaction is 
0.  I suspect in earlier Hive versions this will throw or return no data.
Since this new partition didn't have data before, assigning txnid:0 isn't going 
to generate duplicate IDs but it could violate Snapshot Isolation in multi stmt 
txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 adds a partition 
to T.  Now if txnid:7 runs the same query again, it will see the data in the 
new partition.

 

One option is to follow the Load Data approach and create a new delta_x_x/ and 
move/copy the data there.

 

Another is to allocate a new writeid and save it in Partition metadata.  This 
could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
retains data "outside" of the table tree which makes it more likely that this 
data will be modified in some way which can really break things if done after 
an SQL update/delete on this data has happened. 

 

It performs no validations on add (except for partition spec) so any file with 
any format can be added.  It allows add to bucketed tables as well.

Seems like a very dangerous command.  Maybe a better option is to block it and 
advise using Load Data.  Alternatively, make this do Add partition metadata op 
followed by Load Data. 

 

 


> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.wip.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
> This can't be released like this since a 

[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-07 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Description: 
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]

Add Partition command creates a {{Partition}} metadata object and sets the 
location to the directory containing data files.

In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
at read time the data is decorated with row__id but the original transaction is 
0.  I suspect in earlier Hive versions this will throw or return no data.
Since this new partition didn't have data before, assigning txnid:0 isn't going 
to generate duplicate IDs but it could violate Snapshot Isolation in multi stmt 
txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 adds a partition 
to T.  Now if txnid:7 runs the same query again, it will see the data in the 
new partition.

 

One option is to follow the Load Data approach and create a new delta_x_x/ and 
move/copy the data there.

 

Another is to allocate a new writeid and save it in Partition metadata.  This 
could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
retains data "outside" of the table tree which makes it more likely that this 
data will be modified in some way which can really break things if done after 
an SQL update/delete on this data has happened. 

 

It performs no validations on add (except for partition spec) so any file with 
any format can be added.  It allows add to bucketed tables as well.

Seems like a very dangerous command.  Maybe a better option is to block it and 
advise using Load Data.  Alternatively, make this do Add partition metadata op 
followed by Load Data. 

 

 

  was:
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]

Add Partition command creates a {{Partition}} metadata object and sets the 
location to the directory containing data files.

In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
at read time the data is decorated with row__id but the original transaction is 
0.  I suspect in earlier Hive versions this will throw or return no data.

 

One option is to follow the Load Data approach and create a new delta_x_x/ and 
move/copy the data there.

 

Another is to allocate a new writeid and save it in Partition metadata.  This 
could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
retains data "outside" of the table tree which makes it more likely that this 
data will be modified in some way which can really break things if done after 
an SQL update/delete on this data has happened. 

 

It performs no validations on add (except for partition spec) so any file with 
any format can be added.  It allows add to bucketed tables as well.

Seems like a very dangerous command.  Maybe a better option is to block it and 
advise using Load Data.  Alternatively, make this do Add partition metadata op 
followed by Load Data. 

 

 


> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.wip.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
> Since this new partition didn't have data before, assigning txnid:0 isn't 
> going to generate duplicate IDs but it could violate Snapshot Isolation in 
> multi stmt txns.  Suppose txnid:7 runs {{select * from T}}.  Then txnid:8 
> adds a partition to T.  Now if txnid:7 runs the same query again, it will see 
> the data in the new partition.
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ and 
> move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
> retains data "outside" of the table tree which makes it more likely that this 
> data will be modified in some way which can really break things if done after 
> an SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for partition spec) so any file 
> with any format can be added.  It allows 

[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-07 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16389911#comment-16389911
 ] 

Eugene Koifman commented on HIVE-18825:
---

Just because some feature doesn't exist yet, it doesn't mean we should make 
changes that will make that feature impossible in the future.  For example, we 
don't have multi statement transactions fully supported, but I constantly pay 
attention to it to make sure it will be possible to finish it.

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, 
> HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.





[jira] [Commented] (HIVE-18571) stats issues for MM tables; ACID doesn't check state for CTAS

2018-03-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388930#comment-16388930
 ] 

Eugene Koifman commented on HIVE-18571:
---

+1 pending tests

> stats issues for MM tables; ACID doesn't check state for CTAS
> -
>
> Key: HIVE-18571
> URL: https://issues.apache.org/jira/browse/HIVE-18571
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, 
> HIVE-18571.03.patch, HIVE-18571.04.patch, HIVE-18571.05.patch, 
> HIVE-18571.06.patch, HIVE-18571.patch
>
>
> There are multiple stats aggregation issues with MM tables.
> Some simple stats are double counted and some stats (simple stats) are 
> invalid for ACID table dirs altogether. 
> I have a patch almost ready, need to fix some more stuff and clean up.





[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388925#comment-16388925
 ] 

Eugene Koifman commented on HIVE-18825:
---

I'm strongly against this.

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, 
> HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.





[jira] [Commented] (HIVE-18886) ACID: NPE on unexplained mysql exceptions

2018-03-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388770#comment-16388770
 ] 

Eugene Koifman commented on HIVE-18886:
---

+1

> ACID: NPE on unexplained mysql exceptions 
> --
>
> Key: HIVE-18886
> URL: https://issues.apache.org/jira/browse/HIVE-18886
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
>Priority: Major
> Attachments: HIVE-18886.1.patch
>
>
> At 200+ sessions on a single HS2, the DbLock impl fails to propagate mysql 
> exceptions
> {code}
> 2018-03-06T22:55:16,197 ERROR [HiveServer2-Background-Pool: Thread-12867]: ql.Driver (:()) - FAILED: Error in acquiring locks: null
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.metastore.DatabaseProduct.isDeadlock(DatabaseProduct.java:56)
>     at org.apache.hadoop.hive.metastore.txn.TxnHandler.checkRetryable(TxnHandler.java:2459)
>     at org.apache.hadoop.hive.metastore.txn.TxnHandler.getOpenTxns(TxnHandler.java:499)
> {code}
> {code}
> return e instanceof SQLTransactionRollbackException
>     || ((dbProduct == MYSQL || dbProduct == POSTGRES || dbProduct == SQLSERVER)
>         && e.getSQLState().equals("40001"))
>     || (dbProduct == POSTGRES && e.getSQLState().equals("40P01"))
>     || (dbProduct == ORACLE && (e.getMessage().contains("deadlock detected")
>         || e.getMessage().contains("can't serialize access for this transaction")));
> {code}
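
One possible way to make the check above null-safe is sketched below. It assumes the NPE comes from getSQLState() or getMessage() returning null, which the stack trace suggests but does not prove. The Product enum and the class name are hypothetical stand-ins, not the metastore's own types, and this is not necessarily what the attached patch does.

```java
import java.sql.SQLException;
import java.sql.SQLTransactionRollbackException;

class NullSafeDeadlockCheck {
    // Hypothetical stand-in for the database product values used in the snippet above.
    enum Product { MYSQL, POSTGRES, SQLSERVER, ORACLE, OTHER }

    static boolean isDeadlock(Product dbProduct, SQLException e) {
        String state = e.getSQLState();   // may be null
        String message = e.getMessage();  // may be null
        return e instanceof SQLTransactionRollbackException
            // Constant-first equals() is false for a null state instead of throwing.
            || ((dbProduct == Product.MYSQL || dbProduct == Product.POSTGRES
                    || dbProduct == Product.SQLSERVER) && "40001".equals(state))
            || (dbProduct == Product.POSTGRES && "40P01".equals(state))
            // Guard the message before calling contains() on it.
            || (dbProduct == Product.ORACLE && message != null
                    && (message.contains("deadlock detected")
                        || message.contains("can't serialize access for this transaction")));
    }

    public static void main(String[] args) {
        // An exception with no SQLState and no message no longer triggers an NPE.
        System.out.println(isDeadlock(Product.MYSQL, new SQLException()));
        System.out.println(isDeadlock(Product.MYSQL, new SQLException("deadlock", "40001")));
    }
}
```

With this shape, an exception that carries no SQLState is simply treated as "not a deadlock", so the original error can still be propagated by the caller.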





[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388481#comment-16388481
 ] 

Eugene Koifman commented on HIVE-18825:
---

The lock manager uses Read/WriteEntity in the QueryPlan to know what to lock.  
Those are not there after parsing, so I don't see how that can work w/o 
rewriting the LM.

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, 
> HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.





[jira] [Commented] (HIVE-18864) WriteId high water mark (HWM) is incorrect if ValidWriteIdList is obtained after allocating writeId by current transaction.

2018-03-06 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388106#comment-16388106
 ] 

Eugene Koifman commented on HIVE-18864:
---

Yes, you are right.  If you "fix" writeID_HWM=5 as I was suggesting, txn=10 
won't be able to read its own write.

> WriteId high water mark (HWM) is incorrect if ValidWriteIdList is obtained 
> after allocating writeId by current transaction.
> ---
>
> Key: HIVE-18864
> URL: https://issues.apache.org/jira/browse/HIVE-18864
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID
> Fix For: 3.0.0
>
>
> For multi-statement txns, it is possible that write on a table happens after 
> a read. Let's see the below scenario.
>  # Committed txn=9 writes on table T1 with writeId=5.
>  # Open txn=10. ValidTxnList(open:null, txn_HWM=10),
>  # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5).
>  # Open txn=11, writes on table T1 with writeid=6.
>  # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5).
>  # Write table T1 from txn=10 with writeId=7.
>  # Read table T1 from txn=10. {color:#d04437}*ValidWriteIdList(open:null, 
> write_HWM=7)*. – This read will be able to see rows added by txn=11 which is 
> still open.{color}
> {color:#d04437}So, it is needed to rebuild the open/aborted list of 
> ValidWriteIdList based on txn_HWM. Any writeId allocated by txnId > txn_HWM 
> should be marked as open. In this example, *ValidWriteIdList(open:6, 
> write_HWM=7)* should be generated.{color}
> {color:#33}cc{color} [~ekoifman], [~thejas]
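
The scenario above can be walked through with a small model. This is only a sketch: the class and the isVisible() helper are hypothetical and are not Hive's ValidWriteIdList API. It assumes a write ID is readable when it is at or below the high water mark and not in the open list.

```java
import java.util.Collections;
import java.util.Set;

class WriteIdVisibilitySketch {
    // Hypothetical model of a per-table snapshot: a high water mark plus open write IDs.
    static boolean isVisible(long writeId, long writeHwm, Set<Long> openWriteIds) {
        return writeId <= writeHwm && !openWriteIds.contains(writeId);
    }

    public static void main(String[] args) {
        Set<Long> none = Collections.emptySet();
        Set<Long> open6 = Collections.singleton(6L);
        // Step 7 as reported: (open:null, write_HWM=7) exposes writeId=6 from open txn=11.
        System.out.println("HWM=7, open:null -> sees 6: " + isVisible(6, 7, none));
        // Proposed list: (open:6, write_HWM=7) hides 6 but keeps txn=10's own writeId=7.
        System.out.println("HWM=7, open:6    -> sees 6: " + isVisible(6, 7, open6));
        System.out.println("HWM=7, open:6    -> sees 7: " + isVisible(7, 7, open6));
        // Pinning write_HWM=5 hides 6, but also hides txn=10's own writeId=7.
        System.out.println("HWM=5, open:null -> sees 7: " + isVisible(7, 5, none));
    }
}
```

The last case is the trade-off raised in the comment: capping the high water mark at 5 keeps txn=11's rows out, but txn=10 then cannot read its own write.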





[jira] [Comment Edited] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387145#comment-16387145
 ] 

Eugene Koifman edited comment on HIVE-18825 at 3/6/18 2:02 AM:
---

I think there is a problem here that I should've thought of earlier.
Right now we lock in the snapshot after lock acquisition.  This ordering is 
important if we ever want to support lock based concurrency control.  
(Something I think we should do)

Suppose you have 2 concurrent transactions both running "update T1 set x = x + 
1".  If we acquire the update lock first, then record the snapshot, then the 2nd 
txn to get the lock will see the result of the previous one's write.  If we 
lock the snapshot before acquiring the lock, both transactions may lock in 
exactly the same snapshot and locking becomes useless as the 2nd will still 
read an "old" snapshot.

Could the predicates you want be inserted at compile time, but bound to actual 
values as some post processing after (or at the end of) 
{{Driver.acquireLocks()}} as currently implemented?


was (Author: ekoifman):
I think there is a problem here that I should've thought of earlier.
Right now we lock in the snapshot after lock acquisition.  This ordering is 
important if we ever want to support lock based concurrency control.

Suppose you have 2 concurrent transactions both running "update T1 set x = x + 
1".  If we acquire the update lock first, then record the snapshot, then the 2nd 
txn to get the lock will see the result of the previous one's write.  If we 
lock the snapshot before acquiring the lock, both transactions may lock in 
exactly the same snapshot and locking becomes useless as the 2nd will still 
read an "old" snapshot.

Could the predicates you want be inserted at compile time, but bound to actual 
values as some post processing after (or at the end of) 
{{Driver.acquireLocks()}} as currently implemented?

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, 
> HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.





[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387145#comment-16387145
 ] 

Eugene Koifman commented on HIVE-18825:
---

I think there is a problem here that I should've thought of earlier.
Right now we lock in the snapshot after lock acquisition.  This ordering is 
important if we ever want to support lock based concurrency control.

Suppose you have 2 concurrent transactions both running "update T1 set x = x + 
1".  If we acquire the update lock first, then record the snapshot, then the 2nd 
txn to get the lock will see the result of the previous one's write.  If we 
lock the snapshot before acquiring the lock, both transactions may lock in 
exactly the same snapshot and locking becomes useless as the 2nd will still 
read an "old" snapshot.

Could the predicates you want be inserted at compile time, but bound to actual 
values as some post processing after (or at the end of) 
{{Driver.acquireLocks()}} as currently implemented?
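
The two orderings can be compared with a toy model. This is a minimal single-threaded sketch, not Hive code: the class, the committedX field and the ReentrantLock are hypothetical stand-ins for table T1, its committed state and the update lock, and the interleaving is fixed by hand so the outcome is deterministic.

```java
import java.util.concurrent.locks.ReentrantLock;

class SnapshotOrderingSketch {
    // Hypothetical stand-in for table T1: a single committed value of column x.
    private int committedX = 0;
    // Hypothetical stand-in for the update lock on T1.
    private final ReentrantLock updateLock = new ReentrantLock();

    // A "snapshot" here is just the committed value visible when it is taken.
    private int takeSnapshot() {
        return committedX;
    }

    // Applies "x = x + 1" against the given snapshot while holding the lock.
    private void updateUnderLock(int snapshotX) {
        updateLock.lock();
        try {
            committedX = snapshotX + 1;
        } finally {
            updateLock.unlock();
        }
    }

    // Both transactions fix their snapshot first and only then queue for the lock.
    static int snapshotBeforeLock() {
        SnapshotOrderingSketch t1 = new SnapshotOrderingSketch();
        int snapA = t1.takeSnapshot();
        int snapB = t1.takeSnapshot();
        t1.updateUnderLock(snapA);
        t1.updateUnderLock(snapB); // still based on the old snapshot
        return t1.committedX;
    }

    // Each transaction takes the lock first and records its snapshot while holding it.
    static int lockBeforeSnapshot() {
        SnapshotOrderingSketch t1 = new SnapshotOrderingSketch();
        for (int txn = 0; txn < 2; txn++) {
            t1.updateLock.lock();
            try {
                int snap = t1.takeSnapshot();
                t1.committedX = snap + 1;
            } finally {
                t1.updateLock.unlock();
            }
        }
        return t1.committedX;
    }

    public static void main(String[] args) {
        System.out.println("snapshot before lock: x = " + snapshotBeforeLock());
        System.out.println("lock before snapshot: x = " + lockBeforeSnapshot());
    }
}
```

In the first ordering the second update overwrites the first and one increment is lost, even though the lock serialized the writes. In the second ordering the lock also serializes the reads, so both increments survive.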

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.02.patch, 
> HIVE-18825.03.patch, HIVE-18825.04.patch, HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.





[jira] [Commented] (HIVE-18864) WriteId high water mark (HWM) is incorrect if ValidWriteIdList is obtained after allocating writeId by current transaction.

2018-03-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387081#comment-16387081
 ] 

Eugene Koifman commented on HIVE-18864:
---

Alternatively, write_HWM should always be set to that which corresponds to 
txn_HWM, rather than explicitly marking it 'open'.

> WriteId high water mark (HWM) is incorrect if ValidWriteIdList is obtained 
> after allocating writeId by current transaction.
> ---
>
> Key: HIVE-18864
> URL: https://issues.apache.org/jira/browse/HIVE-18864
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Blocker
>  Labels: ACID
> Fix For: 3.0.0
>
>
> For multi-statement txns, it is possible that write on a table happens after 
> a read. Let's see the below scenario.
>  # Committed txn=9 writes on table T1 with writeId=5.
>  # Open txn=10. ValidTxnList(open:null, txn_HWM=10),
>  # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5).
>  # Open txn=11, writes on table T1 with writeid=6.
>  # Read table T1 from txn=10. ValidWriteIdList(open:null, write_HWM=5).
>  # Write table T1 from txn=10 with writeId=7.
>  # Read table T1 from txn=10. {color:#d04437}*ValidWriteIdList(open:null, 
> write_HWM=7)*. – This read will be able to see rows added by txn=11 which is 
> still open.{color}
> {color:#d04437}So, it is needed to rebuild the open/aborted list of 
> ValidWriteIdList based on txn_HWM. Any writeId allocated by txnId > txn_HWM 
> should be marked as open.{color}
> {color:#33}{color:#d04437}cc{color}{color} 
> [~ekoifman]{color:#33},{color} [~thejas]





[jira] [Commented] (HIVE-18749) Need to replace transactionId with writeId in RecordIdentifier and other relevant contexts.

2018-03-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387067#comment-16387067
 ] 

Eugene Koifman commented on HIVE-18749:
---

+1

> Need to replace transactionId with writeId in RecordIdentifier and other 
> relevant contexts.
> ---
>
> Key: HIVE-18749
> URL: https://issues.apache.org/jira/browse/HIVE-18749
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: ACID, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18749.01.patch, HIVE-18749.02.patch
>
>
> Per table write ID implementation (HIVE-18192) has replaced global 
> transaction ID with write ID for the primary key for a row marked by 
> RecordIdentifier.Field.transactionId.
> Need to replace the same with writeId and update all test result files.
> Also, need to update other references (methods/variables) which currently 
> use transactionId instead of writeId.





[jira] [Commented] (HIVE-18860) fix TestAcidOnTez#testGetSplitsLocks

2018-03-05 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386630#comment-16386630
 ] 

Eugene Koifman commented on HIVE-18860:
---

cc [~sankarh]

> fix TestAcidOnTez#testGetSplitsLocks
> 
>
> Key: HIVE-18860
> URL: https://issues.apache.org/jira/browse/HIVE-18860
> Project: Hive
>  Issue Type: Bug
>  Components: Test, Transactions
>Reporter: Zoltan Haindrich
>Priority: Major
>
> it seems to me that HIVE-18665 patch has broken this test 
> https://travis-ci.org/kgyrtkirk/hive/builds/345287889





[jira] [Updated] (HIVE-18860) fix TestAcidOnTez#testGetSplitsLocks

2018-03-05 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18860:
--
Component/s: Transactions

> fix TestAcidOnTez#testGetSplitsLocks
> 
>
> Key: HIVE-18860
> URL: https://issues.apache.org/jira/browse/HIVE-18860
> Project: Hive
>  Issue Type: Bug
>  Components: Test, Transactions
>Reporter: Zoltan Haindrich
>Priority: Major
>
> it seems to me that HIVE-18665 patch has broken this test 
> https://travis-ci.org/kgyrtkirk/hive/builds/345287889





[jira] [Assigned] (HIVE-18712) Design HMS Api v2

2018-03-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18712:
-

Assignee: Eugene Koifman

> Design HMS Api v2
> -
>
> Key: HIVE-18712
> URL: https://issues.apache.org/jira/browse/HIVE-18712
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Alexander Kolbasov
>Assignee: Eugene Koifman
>Priority: Major
>
> This is an umbrella Jira covering the design of Hive Metastore API v2.
> It is supposed to be a placeholder for discussion and design documents.
>  





[jira] [Assigned] (HIVE-18712) Design HMS Api v2

2018-03-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18712:
-

Assignee: Alexander Kolbasov  (was: Eugene Koifman)

> Design HMS Api v2
> -
>
> Key: HIVE-18712
> URL: https://issues.apache.org/jira/browse/HIVE-18712
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Alexander Kolbasov
>Assignee: Alexander Kolbasov
>Priority: Major
>
> This is an umbrella Jira covering the design of Hive Metastore API v2.
> It is supposed to be a placeholder for discussion and design documents.
>  





[jira] [Updated] (HIVE-18851) make Hive basic stats valid for ACID; clean up and refactor the code

2018-03-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18851:
--
Component/s: Transactions

> make Hive basic stats valid for ACID; clean up and refactor the code
> 
>
> Key: HIVE-18851
> URL: https://issues.apache.org/jira/browse/HIVE-18851
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics, Transactions
>Reporter: Sergey Shelukhin
>Priority: Major
>  Labels: ACID
>
> HIVE-18571 started as a couple small fixes for MM tables, but ended up as a 
> somewhat major cleanup of stats for ACID tables; however it doesn't do that 
> rigorously and not for all cases.
> This is a follow-up JIRA to implement stats for ACID properly (potentially 
> also with ACID semantics similar to those of queries, but that could be 
> another follow-up - for now, at least they should be based on the correct set 
> of files).
> Overall I've discovered that Hive stats code is spread all over in random 
> places in code base and is brittle and inconsistent, esp. for any complex 
> scenario like ACID tables. 
> So, instead of making ad-hoc fixes everywhere, I think at the minimum it 
> should be moved to a single spot (so that e.g. BasicStatsTask, 
> BasicStatsTaskNoJob, metastore "quick" stats generation, etc all use the same 
> code with the same logic) and made valid for ACID.





[jira] [Updated] (HIVE-18851) make Hive basic stats valid for ACID; clean up and refactor the code

2018-03-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18851:
--
Component/s: Statistics

> make Hive basic stats valid for ACID; clean up and refactor the code
> 
>
> Key: HIVE-18851
> URL: https://issues.apache.org/jira/browse/HIVE-18851
> Project: Hive
>  Issue Type: Bug
>  Components: Statistics, Transactions
>Reporter: Sergey Shelukhin
>Priority: Major
>  Labels: ACID
>
> HIVE-18571 started as a couple small fixes for MM tables, but ended up as a 
> somewhat major cleanup of stats for ACID tables; however it doesn't do that 
> rigorously and not for all cases.
> This is a follow-up JIRA to implement stats for ACID properly (potentially 
> also with ACID semantics similar to those of queries, but that could be 
> another follow-up - for now, at least they should be based on the correct set 
> of files).
> Overall I've discovered that Hive stats code is spread all over in random 
> places in code base and is brittle and inconsistent, esp. for any complex 
> scenario like ACID tables. 
> So, instead of making ad-hoc fixes everywhere, I think at the minimum it 
> should be moved to a single spot (so that e.g. BasicStatsTask, 
> BasicStatsTaskNoJob, metastore "quick" stats generation, etc all use the same 
> code with the same logic) and made valid for ACID.





[jira] [Commented] (HIVE-18723) CompactorOutputCommitter.commitJob() - check rename() ret val

2018-03-02 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16384118#comment-16384118
 ] 

Eugene Koifman commented on HIVE-18723:
---

unfortunately Jenkins output is gone.  Are you sure
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.minorWithOpenInMiddle 
(batchId=268)
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.minorWithOpenInMiddle 
(batchId=268)
are not related?

> CompactorOutputCommitter.commitJob() - check rename() ret val
> -
>
> Key: HIVE-18723
> URL: https://issues.apache.org/jira/browse/HIVE-18723
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Kryvenko Igor
>Priority: Major
> Attachments: HIVE-18723.1.patch, HIVE-18723.2.patch, HIVE-18723.patch
>
>
> Right now the return value is ignored: {{fs.rename(fileStatus.getPath(), newPath); }}
> Should this use {{FileUtils.rename(FileSystem fs, Path sourcePath, Path 
> destPath, Configuration conf) }}?
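
A minimal sketch of the kind of check being asked for, not the actual patch. `Renamer` is a hypothetical stand-in for a file system whose rename call reports failure through a `false` return value (as Hadoop's `FileSystem.rename(Path, Path)` can); the paths are made up for illustration.

```java
import java.io.IOException;

/** Hypothetical stand-in for a file system whose rename signals failure via its return value. */
interface Renamer {
    boolean rename(String src, String dst) throws IOException;
}

final class RenameCheck {
    /** Renames src to dst and turns a false return into an exception instead of ignoring it. */
    static void renameOrFail(Renamer fs, String src, String dst) throws IOException {
        if (!fs.rename(src, dst)) {
            throw new IOException("Unable to rename " + src + " to " + dst);
        }
    }

    public static void main(String[] args) throws IOException {
        // Succeeds silently when the rename reports success.
        renameOrFail((s, d) -> true, "tmp/bucket_00000", "delta_1_1/bucket_00000");
        try {
            // A false return is surfaced to the caller rather than dropped.
            renameOrFail((s, d) -> false, "tmp/bucket_00001", "delta_1_1/bucket_00001");
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Whether a failed rename should abort the whole commit or be retried is a separate design question that this sketch does not try to answer.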





[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.

2018-03-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18817:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master
thanks Jason for the review

> ArrayIndexOutOfBounds exception during read of ACID table.
> --
>
> Key: HIVE-18817
> URL: https://issues.apache.org/jira/browse/HIVE-18817
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, 
> HIVE-18817.03.patch, HIVE-18817.04.patch, repro.patch
>
>
> Seeing some users hitting the following stack trace:
> {noformat}
> 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:457)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
> ... 20 more
> {noformat}
> Have a JUnit test that appears to produce a similar stack trace - looks like 
> this occurs if there is an OrcSplit of an ACID table where the split offset 
> is beyond the starting offset of the last stripe in the ORC file.
> cc [~ekoifman]
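
The condition described above (a split whose offset lies beyond the start of the last stripe) can be illustrated with a small, self-contained sketch. This is not Hive's `OrcRawRecordMerger` code and does not claim to reproduce the actual root cause; `StripeLookup`, its methods, and all offsets and keys are hypothetical. It only shows how a stripe index computed from such a split can land one past the end of a per-stripe array, and how a bounds guard avoids that.

```java
import java.util.OptionalLong;

final class StripeLookup {
    /** Index of the first stripe starting at or after splitStart; equals stripeStarts.length if none. */
    static int firstStripeInSplit(long[] stripeStarts, long splitStart) {
        int first = 0;
        while (first < stripeStarts.length && stripeStarts[first] < splitStart) {
            first++;
        }
        return first;
    }

    /** Per-stripe lookup that returns empty instead of indexing past the end of the array. */
    static OptionalLong firstKey(long[] firstKeyPerStripe, int stripeIndex) {
        if (stripeIndex >= firstKeyPerStripe.length) {
            return OptionalLong.empty(); // the split covers no stripe start
        }
        return OptionalLong.of(firstKeyPerStripe[stripeIndex]);
    }

    public static void main(String[] args) {
        long[] starts = {3, 100, 200, 300, 400, 500, 600}; // 7 stripes, made-up offsets
        long[] keys = {0, 10, 20, 30, 40, 50, 60};         // made-up first key per stripe
        int first = firstStripeInSplit(starts, 650);       // split begins after the last stripe start
        System.out.println(first);                         // one past the last valid index
        System.out.println(firstKey(keys, first).isPresent());
    }
}
```

An unguarded `keys[first]` in the last case would throw `ArrayIndexOutOfBoundsException`, which is the general shape of failure the stack trace reports.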





[jira] [Assigned] (HIVE-18845) SHOW COMPACTIONS should show host name

2018-03-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18845:
-


> SHOW COMPACTIONS should show host name
> --
>
> Key: HIVE-18845
> URL: https://issues.apache.org/jira/browse/HIVE-18845
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
>
> Once the job starts, the WorkerId includes the hostname submitting the job, 
> but before that there is no way to tell which of the Metastores in an HA setup 
> has picked up a given item to compact.  SHOW COMPACTIONS should make it 
> obvious which log to look at.





[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Attachment: HIVE-18814.wip.patch

> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18814.wip.patch
>
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ and 
> move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
> retains data "outside" of the table tree, which makes it more likely that this 
> data will be modified in some way, which can really break things if done after 
> any SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for partition spec) so any file 
> with any format can be added.  It allows add to bucketed tables as well.
> Seems like a very dangerous command.  Maybe a better option is to block it 
> and advise using Load Data.  Alternatively, make this do Add partition 
> metadata op followed by Load Data. 
>  
>  





[jira] [Updated] (HIVE-18814) Support Add Partition For Acid tables

2018-03-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18814:
--
Description: 
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]

Add Partition command creates a {{Partition}} metadata object and sets the 
location to the directory containing data files.

In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
at read time the data is decorated with row__id but the original transaction is 
0.  I suspect in earlier Hive versions this will throw or return no data.

 

One option is to follow the Load Data approach and create a new delta_x_x/ and 
move/copy the data there.

 

Another is to allocate a new writeid and save it in Partition metadata.  This 
could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
retains data "outside" of the table tree, which makes it more likely that this 
data will be modified in some way, which can really break things if done after 
any SQL update/delete on this data has happened. 

 

It performs no validations on add (except for partition spec) so any file with 
any format can be added.  It allows add to bucketed tables as well.

Seems like a very dangerous command.  Maybe a better option is to block it and 
advise using Load Data.  Alternatively, make this do Add partition metadata op 
followed by Load Data. 

 

 

  was:
[https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]

Add Partition command creates a \{{Partition}} metadata object and set the 
location to the directory containing data files.

In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
at read time the data is decorated with row__id but the original transaction is 
0.  I suspect in earlier Hive versions this will throw or return no data.

 

One option is follow Load Data approach and create a new delta_x_x/ and 
move/copy the data there.

 

Another is to allocate a new writeid and save it in Partition metadata.  This 
could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
retains data "outside" of the table tree which make it more likely that this 
data will be modified in some way which can really break things if done after 
and SQL update/delete on this data have happened. 

 

 

 

 


> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ and 
> move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
> retains data "outside" of the table tree, which makes it more likely that this 
> data will be modified in some way, which can really break things if done after 
> any SQL update/delete on this data has happened. 
>  
> It performs no validations on add (except for partition spec) so any file 
> with any format can be added.  It allows add to bucketed tables as well.
> Seems like a very dangerous command.  Maybe a better option is to block it 
> and advise using Load Data.  Alternatively, make this do Add partition 
> metadata op followed by Load Data. 
>  
>  





[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-01 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382614#comment-16382614
 ] 

Eugene Koifman commented on HIVE-18825:
---

I think so - if we have no txn, we should not need ValidTxnList for query 
processing.

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.





[jira] [Commented] (HIVE-18825) Define ValidTxnList before starting query optimization

2018-03-01 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16382548#comment-16382548
 ] 

Eugene Koifman commented on HIVE-18825:
---

This assumption is not valid. For one thing, if you open the txn after you get 
the list, you won't be able to read your writes, i.e. your own txn will be 
above the HWM that is set for your txn.  The txn is currently opened right after 
parsing - do you need ValidTxnList earlier than that?
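
The ordering problem in this comment can be sketched as follows. `TxnSnapshot` is a hypothetical, simplified model (a high watermark plus a set of excluded ids), not Hive's `ValidTxnList` implementation, and the transaction ids are made up. It assumes a txn is visible when its id is at or below the high watermark and not in the excluded set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical snapshot: a high watermark plus ids at or below it that must not be read. */
final class TxnSnapshot {
    private final long highWatermark;
    private final Set<Long> excluded;

    TxnSnapshot(long highWatermark, Set<Long> excluded) {
        this.highWatermark = highWatermark;
        this.excluded = excluded;
    }

    /** A txn is visible if it is at or below the high watermark and not excluded. */
    boolean isVisible(long txnId) {
        return txnId <= highWatermark && !excluded.contains(txnId);
    }

    public static void main(String[] args) {
        Set<Long> open = new HashSet<>(Arrays.asList(4L)); // txn 4 is still open

        // Snapshot taken first (HWM = 5), own txn opened afterwards and assigned id 6.
        TxnSnapshot takenBeforeOpen = new TxnSnapshot(5L, open);
        System.out.println(takenBeforeOpen.isVisible(6L)); // own writes are above the HWM

        // Txn opened first (id 6), snapshot taken afterwards (HWM = 6).
        TxnSnapshot takenAfterOpen = new TxnSnapshot(6L, open);
        System.out.println(takenAfterOpen.isVisible(6L));  // own writes are readable
    }
}
```

Under these assumptions, only the second ordering lets the transaction read its own writes, which is the point being made about opening the txn before obtaining the list.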

> Define ValidTxnList before starting query optimization
> --
>
> Key: HIVE-18825
> URL: https://issues.apache.org/jira/browse/HIVE-18825
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
> Attachments: HIVE-18825.01.patch, HIVE-18825.patch
>
>
> Consider a set of tables used by a materialized view where inserts happened 
> after the materialization was created. To compute incremental view 
> maintenance, we need to be able to filter only new rows from those base 
> tables. That can be done by inserting a filter operator with condition e.g. 
> {{ROW\_\_ID.transactionId < highwatermark and ROW\_\_ID.transactionId NOT 
> IN()}} on top of the MVs query definition and triggering the 
> rewriting (which should in turn produce a partial rewriting). However, to do 
> that, we need to have a value for {{ValidTxnList}} during query compilation 
> so we know the snapshot that we are querying.
> This patch aims to generate {{ValidTxnList}} before query optimization. There 
> should not be any visible changes for end user.





[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.

2018-03-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18817:
--
Attachment: HIVE-18817.04.patch

> ArrayIndexOutOfBounds exception during read of ACID table.
> --
>
> Key: HIVE-18817
> URL: https://issues.apache.org/jira/browse/HIVE-18817
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, 
> HIVE-18817.03.patch, HIVE-18817.04.patch, repro.patch
>
>
> Seeing some users hitting the following stack trace:
> {noformat}
> 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:457)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
> ... 20 more
> {noformat}
> Have a JUnit test that appears to produce a similar stack trace - looks like 
> this occurs if there is an OrcSplit of an ACID table where the split offset 
> is beyond the starting offset of the last stripe in the ORC file.
> cc [~ekoifman]





[jira] [Commented] (HIVE-18824) ValidWriteIdList config should be defined on tables which has to collect stats after insert.

2018-02-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381384#comment-16381384
 ] 

Eugene Koifman commented on HIVE-18824:
---

seems reasonable.  [~sankarh]?

> ValidWriteIdList config should be defined on tables which has to collect 
> stats after insert.
> 
>
> Key: HIVE-18824
> URL: https://issues.apache.org/jira/browse/HIVE-18824
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sergey Shelukhin
>Priority: Major
>  Labels: ACID, isolation
> Fix For: 3.0.0
>
> Attachments: HIVE-18824.patch
>
>
> In HIVE-18192, per-table write IDs were introduced, where snapshot isolation is 
> built using ValidWriteIdList on tables which are read within a txn. 
> The ReadEntity list is consulted to decide which tables are read within a txn.
> For an insert operation, the table will be found only in WriteEntity, but the 
> table is read to collect stats.
> So the ValidWriteIdList must also be built for tables/partitions that are part 
> of WriteEntity.




[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.

2018-02-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18817:
--
Attachment: HIVE-18817.03.patch

> ArrayIndexOutOfBounds exception during read of ACID table.
> --
>
> Key: HIVE-18817
> URL: https://issues.apache.org/jira/browse/HIVE-18817
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, 
> HIVE-18817.03.patch, repro.patch
>
>
> Seeing some users hitting the following stack trace:
> {noformat}
> 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:457)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
> ... 20 more
> {noformat}
> Have a JUnit test that appears to produce a similar stack trace - looks like 
> this occurs if there is an OrcSplit of an ACID table where the split offset 
> is beyond the starting offset of the last stripe in the ORC file.
> cc [~ekoifman]





[jira] [Commented] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.

2018-02-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381383#comment-16381383
 ] 

Eugene Koifman commented on HIVE-18817:
---

Yes, you are right: we now write an index with 1 entry in a lot of cases where 
we used to write an index with no entries.
patch3 updates the tests

> ArrayIndexOutOfBounds exception during read of ACID table.
> --
>
> Key: HIVE-18817
> URL: https://issues.apache.org/jira/browse/HIVE-18817
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, 
> HIVE-18817.03.patch, repro.patch
>
>
> Seeing some users hitting the following stack trace:
> {noformat}
> 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:457)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
> ... 20 more
> {noformat}
> Have a JUnit test that appears to produce a similar stack trace - looks like 
> this occurs if there is an OrcSplit of an ACID table where the split offset 
> is beyond the starting offset of the last stripe in the ORC file.
> cc [~ekoifman]





[jira] [Commented] (HIVE-18824) ValidWriteIdList config should be defined on tables which has to collect stats after insert.

2018-02-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381330#comment-16381330
 ] 

Eugene Koifman commented on HIVE-18824:
---

yes, that is basically the problem.

> ValidWriteIdList config should be defined on tables which has to collect 
> stats after insert.
> 
>
> Key: HIVE-18824
> URL: https://issues.apache.org/jira/browse/HIVE-18824
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, isolation
> Fix For: 3.0.0
>
>
> In HIVE-18192, a per-table write ID was introduced, and snapshot isolation is 
> built using a ValidWriteIdList for the tables that are read within a txn. 
> The ReadEntity list is consulted to decide which tables are read within a txn.
> For an insert operation, the table is found only in WriteEntity, but the table 
> is still read to collect stats.
> So the ValidWriteIdList also needs to be built for the tables/partitions that 
> are part of WriteEntity.
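
The requirement in the description can be pictured as taking the union of the tables a statement reads and the tables it writes. The following is only a minimal Java sketch under stated assumptions: the class and the helper `tablesNeedingWriteIdList` are hypothetical, and plain table-name strings stand in for Hive's ReadEntity and WriteEntity objects.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class WriteIdListTables {
    // Hypothetical helper: every table that is read OR written needs a
    // ValidWriteIdList, because written tables are also read for stats.
    static Set<String> tablesNeedingWriteIdList(List<String> readTables,
                                                List<String> writeTables) {
        Set<String> result = new LinkedHashSet<>(readTables);
        result.addAll(writeTables);
        return result;
    }

    public static void main(String[] args) {
        // INSERT INTO db.t2 SELECT * FROM db.t1:
        // db.t2 appears only as a write target but must still be covered.
        List<String> reads = Arrays.asList("db.t1");
        List<String> writes = Arrays.asList("db.t2");
        System.out.println(tablesNeedingWriteIdList(reads, writes));
    }
}
```

Using a set keeps a table that is both read and written from being processed twice.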





[jira] [Commented] (HIVE-18571) stats issues for MM tables

2018-02-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381315#comment-16381315
 ] 

Eugene Koifman commented on HIVE-18571:
---

I left some comments on RB.  I think it makes sense to wait for the fix to 
HIVE-18824 and then clean up the related code in this patch.

Also, this patch is full of TODOs that seem like they should be Jiras.

> stats issues for MM tables
> --
>
> Key: HIVE-18571
> URL: https://issues.apache.org/jira/browse/HIVE-18571
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: HIVE-18571.01.patch, HIVE-18571.02.patch, 
> HIVE-18571.03.patch, HIVE-18571.patch
>
>
> There are multiple stats aggregation issues with MM tables.
> Some simple stats are double counted, and some simple stats are 
> invalid for ACID table dirs altogether. 
> I have a patch almost ready; I need to fix some more stuff and clean up.





[jira] [Commented] (HIVE-18750) Exchange partition should be disabled on ACID/Insert-only tables with per table write ID.

2018-02-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16381107#comment-16381107
 ] 

Eugene Koifman commented on HIVE-18750:
---

+1 assuming tests are good

> Exchange partition should be disabled on ACID/Insert-only tables with per 
> table write ID.
> -
>
> Key: HIVE-18750
> URL: https://issues.apache.org/jira/browse/HIVE-18750
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DDL, TODOC3.0, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18750.01.patch, HIVE-18750.02.patch
>
>
> The per-table write ID implementation (HIVE-18192) introduced a write ID per 
> table and uses the write ID to name the delta/base files and also as the primary key 
> for each row.
> Exchange partition would now have to move delta/base files across tables without 
> changing the write ID, which causes incorrect results. 
> Also, the exchange partition feature exists to support the use case of 
> atomic updates. ACID updates already provide atomic updates, so 
> it makes sense not to support exchange partition for ACID and MM tables.
> The results of the qtest file mm_exchangepartition.q need to be updated after this 
> change.
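
To see why carrying write IDs across tables is unsafe, consider a simplified model in which each table tracks only the highest write ID it has allocated and readers ignore anything above it. The class and method names below are hypothetical, and the visibility rule is a deliberate simplification of snapshot isolation, not Hive's actual implementation.

```java
public class ExchangePartitionSketch {
    // Simplified visibility rule: a delta is readable only if its write ID
    // does not exceed the highest write ID the containing table has allocated.
    static boolean isVisible(long deltaWriteId, long tableHighWatermark) {
        return deltaWriteId <= tableHighWatermark;
    }

    public static void main(String[] args) {
        long sourceDeltaWriteId = 12; // allocated by the source table
        long targetHighWatermark = 3; // target table has allocated write IDs 1..3

        // Moved as-is, the delta lies beyond the target's high watermark.
        System.out.println(isVisible(sourceDeltaWriteId, targetHighWatermark));

        // A smaller ID passes the check, but it then coincides with an ID
        // the target table already handed out for unrelated rows.
        System.out.println(isVisible(2, targetHighWatermark));
    }
}
```

Either way the write ID no longer means what it meant in the source table, which is the kind of incorrect result the description refers to.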





[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId

2018-02-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18158:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master

thanks Gopal for the review

> Remove OrcRawRecordMerger.ReaderPairAcid.statementId
> 
>
> Key: HIVE-18158
> URL: https://issues.apache.org/jira/browse/HIVE-18158
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: HIVE-18158.01.patch, HIVE-18158.02.patch, 
> HIVE-18158.03.patch
>
>
>  * Need to get rid of this since we can always get this from the row 
> itself in Acid 2.0.
>  * For Acid 1.0, statementId == 0 in all deltas because both 
> multi-statement txns and Split Update are only available in test mode, 
> so there is nothing that can create a deltas_x_x_M with M > 0.
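
For illustration, the statement ID is the optional trailing numeric component of a delta directory name (the M in the deltas_x_x_M pattern mentioned in the description). The sketch below assumes directory names of the form `delta_<min>_<max>[_<stmt>]`; the class, the helper `statementId`, and the choice to treat a missing suffix as 0 are all hypothetical, and this is not Hive's own parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeltaNameSketch {
    // delta_<minWriteId>_<maxWriteId> with an optional _<statementId> suffix.
    private static final Pattern DELTA =
        Pattern.compile("delta_(\\d+)_(\\d+)(?:_(\\d+))?");

    // Returns the statement ID encoded in the name, or 0 when there is none.
    static int statementId(String dirName) {
        Matcher m = DELTA.matcher(dirName);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a delta directory: " + dirName);
        }
        return m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
    }

    public static void main(String[] args) {
        System.out.println(statementId("delta_0000005_0000005"));
        System.out.println(statementId("delta_0000005_0000005_0001"));
    }
}
```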





[jira] [Commented] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.

2018-02-28 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16380842#comment-16380842
 ] 

Eugene Koifman commented on HIVE-18817:
---

I don't think any of the failures are related but attaching the same patch 
again since ptest is not very stable lately

> ArrayIndexOutOfBounds exception during read of ACID table.
> --
>
> Key: HIVE-18817
> URL: https://issues.apache.org/jira/browse/HIVE-18817
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, repro.patch
>
>
> Seeing some users hitting the following stack trace:
> {noformat}
> 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:457)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
> ... 20 more
> {noformat}
> Have a JUnit test that appears to produce a similar stack trace - looks like 
> this occurs if there is an OrcSplit of an ACID table where the split offset 
> is beyond the starting offset of the last stripe in the ORC file.
> cc [~ekoifman]





[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.

2018-02-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18817:
--
Attachment: HIVE-18817.02.patch

> ArrayIndexOutOfBounds exception during read of ACID table.
> --
>
> Key: HIVE-18817
> URL: https://issues.apache.org/jira/browse/HIVE-18817
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18817.01.patch, HIVE-18817.02.patch, repro.patch
>
>
> Seeing some users hitting the following stack trace:
> {noformat}
> 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:457)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
> ... 20 more
> {noformat}
> Have a JUnit test that appears to produce a similar stack trace - looks like 
> this occurs if there is an OrcSplit of an ACID table where the split offset 
> is beyond the starting offset of the last stripe in the ORC file.
> cc [~ekoifman]





[jira] [Comment Edited] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.

2018-02-27 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379801#comment-16379801
 ] 

Eugene Koifman edited comment on HIVE-18817 at 2/28/18 5:24 AM:


patch1 fixes the issue

The test case generates a file with 2 stripes.  Without the fix to 
OrcRecordUpdater.KeyIndexBuilder, the hive.acid.key.index in the file has 1 
entry; with the fix it has 2 and no ArrayIndexOutOfBoundsException occurs.
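
The failure mode can be sketched with a simplified, hypothetical model of the lookup: the reader counts how many stripes start before the split offset and uses that count to index into the key index, which only works if the index holds one entry per stripe. The class and method names below are illustrative assumptions and this is not the code in OrcRawRecordMerger.discoverKeyBounds.

```java
public class KeyIndexSketch {
    // Counts the stripes that start strictly before the split offset.
    static int stripesBefore(long[] stripeOffsets, long splitOffset) {
        int count = 0;
        for (long stripeOffset : stripeOffsets) {
            if (splitOffset > stripeOffset) {
                count++;
            }
        }
        return count;
    }

    // Returns the last key of the stripe preceding the split, or null when
    // the split starts at the first stripe. Assumes one index entry per stripe.
    static String minKey(String[] keyIndex, long[] stripeOffsets, long splitOffset) {
        int firstStripe = stripesBefore(stripeOffsets, splitOffset);
        return firstStripe == 0 ? null : keyIndex[firstStripe - 1];
    }

    public static void main(String[] args) {
        long[] stripeOffsets = {3L, 1000L}; // a file with 2 stripes
        long splitOffset = 1500L;           // beyond the start of the last stripe

        String[] completeIndex = {"key-of-stripe-0", "key-of-stripe-1"};
        System.out.println(minKey(completeIndex, stripeOffsets, splitOffset));

        String[] shortIndex = {"key-of-stripe-0"}; // only 1 entry for 2 stripes
        try {
            minKey(shortIndex, stripeOffsets, splitOffset);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("lookup failed: " + e);
        }
    }
}
```

With a complete index the lookup succeeds; with an index that is one entry short, the same split offset indexes past the end of the array.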

todo: file an ORC ticket to fix this in ORC.

 

[~jdere], [~prasanth_j] please review


was (Author: ekoifman):
patch1 fixes the issue

todo: file an ORC ticket to fix this in ORC.

> ArrayIndexOutOfBounds exception during read of ACID table.
> --
>
> Key: HIVE-18817
> URL: https://issues.apache.org/jira/browse/HIVE-18817
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18817.01.patch, repro.patch
>
>
> Seeing some users hitting the following stack trace:
> {noformat}
> 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:457)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
> ... 20 more
> {noformat}
> Have a JUnit test that appears to produce a similar stack trace - looks like 
> this occurs if there is an OrcSplit of an ACID table where the split offset 
> is beyond the starting offset of the last stripe in the ORC file.
> cc [~ekoifman]





[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.

2018-02-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18817:
--
Status: Patch Available  (was: Open)

patch1 fixes the issue

todo: file an ORC ticket to fix this in ORC.

> ArrayIndexOutOfBounds exception during read of ACID table.
> --
>
> Key: HIVE-18817
> URL: https://issues.apache.org/jira/browse/HIVE-18817
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18817.01.patch, repro.patch
>
>
> Seeing some users hitting the following stack trace:
> {noformat}
> 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:457)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
> ... 20 more
> {noformat}
> Have a JUnit test that appears to produce a similar stack trace - looks like 
> this occurs if there is an OrcSplit of an ACID table where the split offset 
> is beyond the starting offset of the last stripe in the ORC file.
> cc [~ekoifman]





[jira] [Updated] (HIVE-18817) ArrayIndexOutOfBounds exception during read of ACID table.

2018-02-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18817:
--
Attachment: HIVE-18817.01.patch

> ArrayIndexOutOfBounds exception during read of ACID table.
> --
>
> Key: HIVE-18817
> URL: https://issues.apache.org/jira/browse/HIVE-18817
> Project: Hive
>  Issue Type: Bug
>  Components: ORC, Transactions
>Reporter: Jason Dere
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18817.01.patch, repro.patch
>
>
> Seeing some users hitting the following stack trace:
> {noformat}
> 2018-02-26 05:49:45,876 [ERROR] [TezChild] |tez.TezProcessor|: 
> java.lang.RuntimeException: java.io.IOException: 
> java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:196)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:142)
> at 
> org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:113)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:66)
> at 
> org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:325)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
> at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
> at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
> at 
> org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
> at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
> at 
> org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:193)
> ... 19 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 7
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:388)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.(OrcRawRecordMerger.java:457)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1456)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1342)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:255)
> ... 20 more
> {noformat}
> Have a JUnit test that appears to produce a similar stack trace - looks like 
> this occurs if there is an OrcSplit of an ACID table where the split offset 
> is beyond the starting offset of the last stripe in the ORC file.
> cc [~ekoifman]





[jira] [Updated] (HIVE-18816) CREATE TABLE (ACID) doesn't work with TIMESTAMPLOCALTZ column type

2018-02-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18816:
--
Component/s: Transactions

> CREATE TABLE (ACID) doesn't work with TIMESTAMPLOCALTZ column type
> --
>
> Key: HIVE-18816
> URL: https://issues.apache.org/jira/browse/HIVE-18816
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Vineet Garg
>Assignee: Jesus Camacho Rodriguez
>Priority: Major
>
> *Reproducer*
> {code:sql}
> set hive.support.concurrency=true;
> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> CREATE TABLE table_acid(d int, tz timestamp with local time zone)
> clustered by (d) into 2 buckets stored as orc TBLPROPERTIES 
> ('transactional'='true');
> {code}
> *Error*
> {code:sql}
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.IllegalArgumentException: 
> Unknown primitive type TIMESTAMPLOCALTZ
> {code}
> *Error stack*
> {noformat}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IllegalArgumentException: Unknown primitive type TIMESTAMPLOCALTZ
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:906) 
> ~[hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4788) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:389) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2314) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1985) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1687) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1438) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1427) 
> [hive-exec-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:239) 
> [hive-cli-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:188) 
> [hive-cli-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:402) 
> [hive-cli-3.0.0-SNAPSHOT.jar:?]
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:335) 
> [hive-cli-3.0.0-SNAPSHOT.jar:?]
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:1345)
>  [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:1319) 
> [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:173)
>  [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104) 
> [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at 
> org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:59)
>  [test-classes/:?]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> ~[?:1.8.0_101]
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> ~[?:1.8.0_101]
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_101]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_101]
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
>  [junit-4.11.jar:?]
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  [junit-4.11.jar:?]
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
>  [junit-4.11.jar:?]
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  [junit-4.11.jar:?]
>   at 
> org.apache.hadoop.hive.cli.control.CliAdapter$2$1.evaluate(CliAdapter.java:92)
>  [hive-it-util-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20) 
> [junit-4.11.jar:?]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) 
> [junit-4.11.jar:?]
>   at 

[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories

2018-02-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18659:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master

thanks Prasanth for the review

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, 
> HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, 
> HIVE-18659.11.patch, HIVE-18659.12.patch, HIVE-18659.13.patch, 
> HIVE-18659.14.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Assigned] (HIVE-18814) Support Add Partition For Acid tables

2018-02-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18814:
-


> Support Add Partition For Acid tables
> -
>
> Key: HIVE-18814
> URL: https://issues.apache.org/jira/browse/HIVE-18814
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> [https://cwiki.apache.org/confluence/display/Hive/LanguageManual%2BDDL#LanguageManualDDL-AddPartitions]
> Add Partition command creates a {{Partition}} metadata object and sets the 
> location to the directory containing data files.
> In current master (Hive 3.0), Add partition on an acid table doesn't fail and 
> at read time the data is decorated with row__id but the original transaction 
> is 0.  I suspect in earlier Hive versions this will throw or return no data.
>  
> One option is to follow the Load Data approach and create a new delta_x_x/ 
> and move/copy the data there.
>  
> Another is to allocate a new writeid and save it in Partition metadata.  This 
> could then be used to decorate data with ROW__IDs.  This avoids move/copy but 
> retains data "outside" of the table tree, which makes it more likely that 
> this data will be modified in some way, which can really break things if done 
> after any SQL update/delete on this data has happened. 





[jira] [Updated] (HIVE-18808) Make compaction more robust when stats update fails

2018-02-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18808:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master

thanks Alan for the review

> Make compaction more robust when stats update fails
> ---
>
> Key: HIVE-18808
> URL: https://issues.apache.org/jira/browse/HIVE-18808
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HIVE-18808.01.patch
>
>
>  
> {{Worker.gatherStats()}} runs an "analyze table..." command to update stats, 
> which requires SessionState.  SessionState objects are cached in ThreadLocal. 
>  If for some reason Session init fails, it may still get attached to the 
> thread, which then causes a subsequent request that uses the same thread to 
> gather stats to fail because it has a bad session object.  HIVE-15658 describes 
> the same issue in a different context.  
> There is currently no way to recycle a session from outside HMS.
> Failure to gather stats should not kill a compaction job which then prevents 
> Cleaner from running.
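
The failure mode described above can be illustrated with a small sketch. This is not the HIVE-18808 patch; CompactionSketch, FakeSession and the method names are hypothetical stand-ins rather than Hive classes. It assumes only what the description states: a session is cached per thread, a failed init can leave a bad session attached, and stats gathering should be best-effort.

```java
import java.util.function.Supplier;

class CompactionSketch {
    /** Hypothetical stand-in for a per-thread session object. */
    static final class FakeSession {
        final boolean healthy;
        FakeSession(boolean healthy) { this.healthy = healthy; }
    }

    // Per-thread cache, mirroring the ThreadLocal caching in the description.
    static final ThreadLocal<FakeSession> SESSION = new ThreadLocal<>();

    /** Attaches a session to the current thread; throws if init failed. */
    static FakeSession initSession(Supplier<FakeSession> factory) {
        FakeSession s = factory.get();
        SESSION.set(s);                 // attached before we know it is usable
        if (!s.healthy) {
            throw new IllegalStateException("session init failed");
        }
        return s;
    }

    /** Returns true if stats were gathered; never propagates a failure. */
    static boolean gatherStatsBestEffort(Supplier<FakeSession> factory) {
        try {
            initSession(factory);
            return true;                // the "analyze table ..." call would go here
        } catch (RuntimeException e) {
            SESSION.remove();           // do not leave a bad session on the thread
            return false;
        }
    }

    /** The compaction outcome does not depend on whether stats succeeded. */
    static String runCompaction(Supplier<FakeSession> factory) {
        boolean stats = gatherStatsBestEffort(factory);
        return stats ? "compacted, stats updated" : "compacted, stats skipped";
    }

    public static void main(String[] args) {
        System.out.println(runCompaction(() -> new FakeSession(false)));
        System.out.println(runCompaction(() -> new FakeSession(true)));
    }
}
```

The point of the sketch is the catch block: the stats failure is swallowed so the compaction still completes, and the thread-local is cleared so the next request on the same thread starts from a clean state.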





[jira] [Updated] (HIVE-18808) Make compaction more robust when stats update fails

2018-02-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18808:
--
Description: 
 

{{Worker.gatherStats()}} runs an "analyze table..." command to update stats, 
which requires SessionState.  SessionState objects are cached in ThreadLocal.  
If for some reason Session init fails, it may still get attached to the thread, 
which then causes a subsequent request that uses the same thread to gather 
stats to fail because it has a bad session object.  HIVE-15658 describes the same 
issue in a different context.  

There is currently no way to recycle a session from outside HMS.

Failure to gather stats should not kill a compaction job which then prevents 
Cleaner from running.

  was:
 

Worker.gatherStats() runs a "analyze table..." command to update stats which 
requires SessionState.  SessionState objects are cached in ThreadLocal.  If for 
some reason Session init fails, it may still get attached to the thread which 
then causes a subsequent request that uses the same tread to gather stats fail 
because it has a bad session object.  HIVE-15658 describes the same issue in a 
different context.  

There is currently no way to recycle a session from outside HMS.

Failure to gather stats should not kill a compaction job which then prevents 
Cleaner from running.


> Make compaction more robust when stats update fails
> ---
>
> Key: HIVE-18808
> URL: https://issues.apache.org/jira/browse/HIVE-18808
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18808.01.patch
>
>
>  
> {{Worker.gatherStats()}} runs an "analyze table..." command to update stats, 
> which requires SessionState.  SessionState objects are cached in ThreadLocal. 
>  If for some reason Session init fails, it may still get attached to the 
> thread, which then causes a subsequent request that uses the same thread to 
> gather stats to fail because it has a bad session object.  HIVE-15658 describes 
> the same issue in a different context.  
> There is currently no way to recycle a session from outside HMS.
> Failure to gather stats should not kill a compaction job which then prevents 
> Cleaner from running.





[jira] [Commented] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId

2018-02-27 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16379033#comment-16379033
 ] 

Eugene Koifman commented on HIVE-18158:
---

no related failures

[~gopalv] could you review please

> Remove OrcRawRecordMerger.ReaderPairAcid.statementId
> 
>
> Key: HIVE-18158
> URL: https://issues.apache.org/jira/browse/HIVE-18158
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-18158.01.patch, HIVE-18158.02.patch, 
> HIVE-18158.03.patch
>
>
>  * Need to get rid of this since we can always get this from the row itself 
> in Acid 2.0.
>  * For Acid 1.0, statementId == 0 in all deltas because both multi-statement 
> txns and Split Update are only available in test mode, so there is nothing 
> that can create a deltas_x_x_M with M > 0.





[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories

2018-02-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18659:
--
Attachment: HIVE-18659.14.patch

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, 
> HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, 
> HIVE-18659.11.patch, HIVE-18659.12.patch, HIVE-18659.13.patch, 
> HIVE-18659.14.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Commented] (HIVE-18750) Exchange partition should be disabled on ACID/Insert-only tables with per table write ID.

2018-02-27 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378983#comment-16378983
 ] 

Eugene Koifman commented on HIVE-18750:
---

I would make the error message tell the user what to do next, for example 
mention Load Data

> Exchange partition should be disabled on ACID/Insert-only tables with per 
> table write ID.
> -
>
> Key: HIVE-18750
> URL: https://issues.apache.org/jira/browse/HIVE-18750
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DDL, TODOC3.0, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18750.01.patch
>
>
> Per table write id implementation (HIVE-18192) has introduced write ID per 
> table and used write ID to name the delta/base files and also as primary key 
> for each row.
> Now, exchange partition has to move delta/base files across tables without 
> changing the write ID, which causes incorrect results. 
> Also, this exchange partition feature is there to support the use-case of 
> atomic updates. But with ACID updates, we shall support atomic-updates and 
> hence it makes sense to not support exchange partition for ACID and MM tables.
> The test results for the qtest file mm_exchangepartition.q need to be updated 
> after this change.





[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories

2018-02-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18659:
--
Attachment: HIVE-18659.13.patch

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, 
> HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, 
> HIVE-18659.11.patch, HIVE-18659.12.patch, HIVE-18659.13.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Commented] (HIVE-18808) Make compaction more robust when stats update fails

2018-02-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377981#comment-16377981
 ] 

Eugene Koifman commented on HIVE-18808:
---

no related failures

[~alangates] could you review please

> Make compaction more robust when stats update fails
> ---
>
> Key: HIVE-18808
> URL: https://issues.apache.org/jira/browse/HIVE-18808
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18808.01.patch
>
>
>  
> Worker.gatherStats() runs an "analyze table..." command to update stats, which 
> requires SessionState.  SessionState objects are cached in ThreadLocal.  If 
> for some reason Session init fails, it may still get attached to the thread, 
> which then causes a subsequent request that uses the same thread to gather 
> stats to fail because it has a bad session object.  HIVE-15658 describes the 
> same issue in a different context.  
> There is currently no way to recycle a session from outside HMS.
> Failure to gather stats should not kill a compaction job which then prevents 
> Cleaner from running.





[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID

2018-02-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377977#comment-16377977
 ] 

Eugene Koifman commented on HIVE-18192:
---

what property are you looking up?  There is 

hive.txn.valid.txns and 

hive.txn.tables.valid.writeids

> Introduce WriteID per table rather than using global transaction ID
> ---
>
> Key: HIVE-18192
> URL: https://issues.apache.org/jira/browse/HIVE-18192
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DR, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, 
> HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch, 
> HIVE-18192.06.patch, HIVE-18192.07.patch, HIVE-18192.08.patch, 
> HIVE-18192.09.patch, HIVE-18192.10.patch, HIVE-18192.11.patch, 
> HIVE-18192.12.patch, HIVE-18192.13.patch, HIVE-18192.14.patch, 
> HIVE-18192.15.patch, HIVE-18192.16.patch, HIVE-18192.17.patch
>
>
> To support ACID replication, we will be introducing a per table write Id 
> which will replace the transaction id in the primary key for each row in an 
> ACID table.
> The current primary key is determined via 
>  
> which will move to 
>  
> Each table modified by the given transaction will have a table level 
> write ID allocated, and a persisted map of global txn id -> table -> write 
> id for that table has to be maintained to allow Snapshot isolation.
> Readers should use the combination of ValidTxnList and 
> ValidWriteIdList(Table) for snapshot isolation.
>  
>  [Hive Replication - ACID 
> Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf]
>  has a section "Per Table Sequences (Write-Id)" with more details
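
The mapping described above (global txn id -> table -> write id) can be sketched as follows. This is a rough in-memory illustration, not the HIVE-18192 implementation: WriteIdSketch and allocate() are hypothetical names, the maps stand in for what the description says must be persisted, and starting write IDs at 1 is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

class WriteIdSketch {
    // Next write ID to hand out for each table (assumed to start at 1).
    private final Map<String, Long> nextWriteId = new HashMap<>();
    // In-memory stand-in for the persisted map: txn id -> (table -> write id).
    private final Map<Long, Map<String, Long>> txnToWriteIds = new HashMap<>();

    /** Returns the write ID this txn uses for the table, allocating it on first use. */
    long allocate(long txnId, String table) {
        Map<String, Long> perTxn = txnToWriteIds.computeIfAbsent(txnId, k -> new HashMap<>());
        Long existing = perTxn.get(table);
        if (existing != null) {
            return existing;            // same txn + table always maps to the same write ID
        }
        long id = nextWriteId.getOrDefault(table, 1L);
        nextWriteId.put(table, id + 1); // sequences advance independently per table
        perTxn.put(table, id);
        return id;
    }

    public static void main(String[] args) {
        WriteIdSketch ids = new WriteIdSketch();
        System.out.println(ids.allocate(100L, "t1"));
        System.out.println(ids.allocate(101L, "t1"));
        System.out.println(ids.allocate(101L, "t2"));
    }
}
```

The sketch is single-threaded on purpose; a real allocator would have to make the lookup and increment atomic.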





[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId

2018-02-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18158:
--
Attachment: HIVE-18158.03.patch

> Remove OrcRawRecordMerger.ReaderPairAcid.statementId
> 
>
> Key: HIVE-18158
> URL: https://issues.apache.org/jira/browse/HIVE-18158
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-18158.01.patch, HIVE-18158.02.patch, 
> HIVE-18158.03.patch
>
>
>  * Need to get rid of this since we can always get this from the row itself 
> in Acid 2.0.
>  * For Acid 1.0, statementId == 0 in all deltas because both multi-statement 
> txns and Split Update are only available in test mode, so there is nothing 
> that can create a deltas_x_x_M with M > 0.





[jira] [Updated] (HIVE-18808) Make compaction more robust when stats update fails

2018-02-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18808:
--
Status: Patch Available  (was: Open)

> Make compaction more robust when stats update fails
> ---
>
> Key: HIVE-18808
> URL: https://issues.apache.org/jira/browse/HIVE-18808
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18808.01.patch
>
>
>  
> Worker.gatherStats() runs an "analyze table..." command to update stats, which 
> requires SessionState.  SessionState objects are cached in ThreadLocal.  If 
> for some reason Session init fails, it may still get attached to the thread, 
> which then causes a subsequent request that uses the same thread to gather 
> stats to fail because it has a bad session object.  HIVE-15658 describes the 
> same issue in a different context.  
> There is currently no way to recycle a session from outside HMS.
> Failure to gather stats should not kill a compaction job which then prevents 
> Cleaner from running.





[jira] [Updated] (HIVE-18808) Make compaction more robust when stats update fails

2018-02-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18808:
--
Attachment: HIVE-18808.01.patch

> Make compaction more robust when stats update fails
> ---
>
> Key: HIVE-18808
> URL: https://issues.apache.org/jira/browse/HIVE-18808
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18808.01.patch
>
>
>  
> Worker.gatherStats() runs an "analyze table..." command to update stats, which 
> requires SessionState.  SessionState objects are cached in ThreadLocal.  If 
> for some reason Session init fails, it may still get attached to the thread, 
> which then causes a subsequent request that uses the same thread to gather 
> stats to fail because it has a bad session object.  HIVE-15658 describes the 
> same issue in a different context.  
> There is currently no way to recycle a session from outside HMS.
> Failure to gather stats should not kill a compaction job which then prevents 
> Cleaner from running.





[jira] [Assigned] (HIVE-18808) Make compaction more robust when stats update fails

2018-02-26 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18808:
-


> Make compaction more robust when stats update fails
> ---
>
> Key: HIVE-18808
> URL: https://issues.apache.org/jira/browse/HIVE-18808
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
>  
> Worker.gatherStats() runs an "analyze table..." command to update stats, which 
> requires SessionState.  SessionState objects are cached in ThreadLocal.  If 
> for some reason Session init fails, it may still get attached to the thread, 
> which then causes a subsequent request that uses the same thread to gather 
> stats to fail because it has a bad session object.  HIVE-15658 describes the 
> same issue in a different context.  
> There is currently no way to recycle a session from outside HMS.
> Failure to gather stats should not kill a compaction job which then prevents 
> Cleaner from running.





[jira] [Commented] (HIVE-18773) Support multiple instances of Cleaner

2018-02-26 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377278#comment-16377278
 ] 

Eugene Koifman commented on HIVE-18773:
---

once HIVE-18772 is in place, why not have the Worker itself do a clean right 
after successful compaction and before stats gather?  It improves parallelism 
and perhaps makes stats gather more accurate since the set of files on disk is 
more accurate wrt current state of the table.

 

 

> Support multiple instances of Cleaner
> -
>
> Key: HIVE-18773
> URL: https://issues.apache.org/jira/browse/HIVE-18773
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Reporter: Eugene Koifman
>Priority: Major
>
> We support multiple Workers by making each Worker update the status of the 
> entry in COMPACTION_QUEUE to make sure only 1 worker grabs it.  Once we have 
> HIVE-18772, Cleaner should not need any state, so we can easily have > 1 
> Cleaner instance by introducing 1 more status type "being cleaned".
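
A minimal sketch of the claim step described above, assuming an in-memory map in place of the COMPACTION_QUEUE table. CleanerClaimSketch, the Status names and tryClaim() are hypothetical; in the metastore the same effect would presumably come from a conditional update on the status column.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CleanerClaimSketch {
    /** Illustrative states only; the real queue has more of them. */
    enum Status { READY_FOR_CLEANING, BEING_CLEANED }

    // Hypothetical stand-in for COMPACTION_QUEUE: entry id -> status.
    final ConcurrentMap<Long, Status> queue = new ConcurrentHashMap<>();

    /**
     * Atomically moves an entry from READY_FOR_CLEANING to BEING_CLEANED.
     * Only one caller can win, so several Cleaner instances can poll the
     * same queue without cleaning the same entry twice.
     */
    boolean tryClaim(long entryId) {
        return queue.replace(entryId, Status.READY_FOR_CLEANING, Status.BEING_CLEANED);
    }

    public static void main(String[] args) {
        CleanerClaimSketch q = new CleanerClaimSketch();
        q.queue.put(1L, Status.READY_FOR_CLEANING);
        System.out.println("cleaner A claims: " + q.tryClaim(1L));
        System.out.println("cleaner B claims: " + q.tryClaim(1L));
    }
}
```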





[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories

2018-02-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18659:
--
Attachment: HIVE-18659.12.patch

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, 
> HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, 
> HIVE-18659.11.patch, HIVE-18659.12.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Updated] (HIVE-15077) Acid LockManager is unfair

2018-02-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15077:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master

thanks Alan for the review

> Acid LockManager is unfair
> --
>
> Key: HIVE-15077
> URL: https://issues.apache.org/jira/browse/HIVE-15077
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-15077.02.patch
>
>
> HIVE-10242 made the acid LM unfair.
> In TxnHandler.checkLock(), suppose we are trying to acquire SR5  (the number 
> is extLockId).  
> Then 
> LockInfo[] locks = lockSet.toArray(new LockInfo[lockSet.size()]);
> may look like this (all explicitly listed locks are in Waiting state)
> {, SR5 SW3 X4}
> So the algorithm will find SR5 in the list and start looking backwards (to 
> the left).
> According to IDs, SR5 should wait for X4 to be granted but X4 won't even be 
> examined and so SR5 may be granted.
> Theoretically, this could cause starvation.
> The query that generates the list already has
> query.append(" and hl_lock_ext_id <= ").append(extLockId);
> but it should use "<" rather than "<=" to exclude the locks being checked 
> from "locks" list which will make the algorithm look at all locks "in front" 
> of a given lock.
> Here is an example (add to TestDbTxnManager2)
> {noformat}
>   @Test
>   public void testFairness2() throws Exception {
>     dropTable(new String[]{"T7"});
>     CommandProcessorResponse cpr = driver.run("create table if not exists T7 (a int) partitioned by (p int) stored as orc TBLPROPERTIES ('transactional'='true')");
>     checkCmdOnDriver(cpr);
>     checkCmdOnDriver(driver.run("insert into T7 partition(p) values(1,1),(1,2)"));//create 2 partitions
>     cpr = driver.compileAndRespond("select a from T7 ");
>     checkCmdOnDriver(cpr);
>     txnMgr.acquireLocks(driver.getPlan(), ctx, "Fifer");//gets S lock on T7
>     HiveTxnManager txnMgr2 = TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf);
>     swapTxnManager(txnMgr2);
>     cpr = driver.compileAndRespond("alter table T7 drop partition (p=1)");
>     checkCmdOnDriver(cpr);
>     //tries to get X lock on T7.p=1 and gets Waiting state
>     LockState lockState = ((DbTxnManager) txnMgr2).acquireLocks(driver.getPlan(), ctx, "Fiddler", false);
>     List locks = getLocks();
>     Assert.assertEquals("Unexpected lock count", 4, locks.size());
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", null, locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=1", locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=2", locks);
>     checkLock(LockType.EXCLUSIVE, LockState.WAITING, "default", "T7", "p=1", locks);
>     HiveTxnManager txnMgr3 = TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf);
>     swapTxnManager(txnMgr3);
>     //this should block behind the X lock on T7.p=1
>     cpr = driver.compileAndRespond("select a from T7");
>     checkCmdOnDriver(cpr);
>     txnMgr3.acquireLocks(driver.getPlan(), ctx, "Fifer");//gets S lock on T7
>     locks = getLocks();
>     Assert.assertEquals("Unexpected lock count", 7, locks.size());
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", null, locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=1", locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=2", locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", null, locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=1", locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=2", locks);
>     checkLock(LockType.EXCLUSIVE, LockState.WAITING, "default", "T7", "p=1", locks);
>   }
> {noformat}
> The 2nd {{locks = getLocks();}} output shows that the locks for the 2nd 
> {{select a from T7}} are all acquired while they should block behind the X 
> lock to be fair.
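
The fairness rule argued for above (a requested lock must be compared against every lock with a smaller extLockId, whether that lock is acquired or still waiting) can be sketched as below. This is a simplified model, not TxnHandler.checkLock(): FairLockSketch and its members are hypothetical, all locks are assumed to be on the same resource, and only conflicts involving EXCLUSIVE are modelled.

```java
import java.util.Arrays;
import java.util.List;

class FairLockSketch {
    enum Type { SHARED_READ, SHARED_WRITE, EXCLUSIVE }

    static final class Lock {
        final long extLockId;
        final Type type;
        Lock(long extLockId, Type type) {
            this.extLockId = extLockId;
            this.type = type;
        }
    }

    /** Simplification: two locks conflict only if at least one is EXCLUSIVE. */
    static boolean conflicts(Lock a, Lock b) {
        return a.type == Type.EXCLUSIVE || b.type == Type.EXCLUSIVE;
    }

    /**
     * Fair check: examine every lock "in front" (strictly smaller extLockId),
     * regardless of where it sits in the list or whether it is still waiting.
     */
    static boolean canAcquire(Lock requested, List<Lock> locksOnResource) {
        for (Lock other : locksOnResource) {
            if (other.extLockId < requested.extLockId && conflicts(requested, other)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The ordering from the description: SR5 listed before SW3 and X4.
        Lock sr5 = new Lock(5, Type.SHARED_READ);
        Lock sw3 = new Lock(3, Type.SHARED_WRITE);
        Lock x4 = new Lock(4, Type.EXCLUSIVE);
        List<Lock> locks = Arrays.asList(sr5, sw3, x4);
        System.out.println("SR5 may be granted: " + canAcquire(sr5, locks));
    }
}
```

Because the scan filters by ID rather than by list position, X4 is seen even though it is listed after SR5, so SR5 waits.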





[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId

2018-02-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18158:
--
Attachment: HIVE-18158.02.patch

> Remove OrcRawRecordMerger.ReaderPairAcid.statementId
> 
>
> Key: HIVE-18158
> URL: https://issues.apache.org/jira/browse/HIVE-18158
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-18158.01.patch, HIVE-18158.02.patch
>
>
>  * Need to get rid of this since we can always get this from the row itself 
> in Acid 2.0.
>  * For Acid 1.0, statementId == 0 in all deltas because both multi-statement 
> txns and Split Update are only available in test mode, so there is nothing 
> that can create a deltas_x_x_M with M > 0.





[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories

2018-02-23 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18659:
--
Attachment: HIVE-18659.11.patch

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, 
> HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch, 
> HIVE-18659.11.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories

2018-02-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18659:
--
Attachment: HIVE-18659.10.patch

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, 
> HIVE-18659.09.patch, HIVE-18659.09.patch, HIVE-18659.10.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId

2018-02-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18158:
--
Status: Patch Available  (was: Open)

> Remove OrcRawRecordMerger.ReaderPairAcid.statementId
> 
>
> Key: HIVE-18158
> URL: https://issues.apache.org/jira/browse/HIVE-18158
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-18158.01.patch
>
>
>  * Need to get rid of this since we can always get this from the row itself 
> in Acid 2.0.
>  * For Acid 1.0, statementId == 0 in all deltas because both multi-statement 
> txns and Split Update are only available in test mode, so there is nothing 
> that can create a deltas_x_x_M with M > 0.





[jira] [Updated] (HIVE-18158) Remove OrcRawRecordMerger.ReaderPairAcid.statementId

2018-02-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18158:
--
Attachment: HIVE-18158.01.patch

> Remove OrcRawRecordMerger.ReaderPairAcid.statementId
> 
>
> Key: HIVE-18158
> URL: https://issues.apache.org/jira/browse/HIVE-18158
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-18158.01.patch
>
>
>  * Need to get rid of this since we can always get this from the row itself 
> in Acid 2.0.
>  * For Acid 1.0, statementId == 0 in all deltas because both multi-statement 
> txns and Split Update are only available in test mode, so there is nothing 
> that can create a deltas_x_x_M with M > 0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID

2018-02-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373704#comment-16373704
 ] 

Eugene Koifman commented on HIVE-18192:
---

I added a couple of nits on pull request for patch 16 - can be done in a 
followup.

+1 for patch 16 pending tests

> Introduce WriteID per table rather than using global transaction ID
> ---
>
> Key: HIVE-18192
> URL: https://issues.apache.org/jira/browse/HIVE-18192
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DR, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, 
> HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch, 
> HIVE-18192.06.patch, HIVE-18192.07.patch, HIVE-18192.08.patch, 
> HIVE-18192.09.patch, HIVE-18192.10.patch, HIVE-18192.11.patch, 
> HIVE-18192.12.patch, HIVE-18192.13.patch, HIVE-18192.14.patch, 
> HIVE-18192.15.patch, HIVE-18192.16.patch
>
>
> To support ACID replication, we will be introducing a per table write Id 
> which will replace the transaction id in the primary key for each row in an 
> ACID table.
> The current primary key is determined via 
>  
> which will move to 
>  
> Each table modified by the given transaction will have a table level 
> write ID allocated, and a persisted map of global txn id -> table -> write 
> id for that table has to be maintained to allow Snapshot isolation.
> Readers should use the combination of ValidTxnList and 
> ValidWriteIdList(Table) for snapshot isolation.
>  
>  [Hive Replication - ACID 
> Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf]
>  has a section "Per Table Sequences (Write-Id)" with more details
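
The txn id -> table -> write id mapping described above can be sketched as follows. This is a minimal in-memory illustration in Java, not Hive's metastore implementation: the class name WriteIdSketch, the method allocate, and the choice to start each table's sequence at 1 are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class WriteIdSketch {
    // table name -> next write ID to hand out (a per-table sequence)
    private final Map<String, Long> nextWriteId = new HashMap<>();
    // global txn id -> (table name -> write ID allocated to that txn for the table)
    private final Map<Long, Map<String, Long>> txnToWriteId = new HashMap<>();

    /** Returns the write ID of txnId for table, allocating one on first use. */
    long allocate(long txnId, String table) {
        Map<String, Long> perTable =
            txnToWriteId.computeIfAbsent(txnId, k -> new HashMap<>());
        Long existing = perTable.get(table);
        if (existing != null) {
            return existing; // a txn keeps one write ID per table
        }
        // Assumption for this sketch: each table's sequence starts at 1.
        long writeId = nextWriteId.getOrDefault(table, 1L);
        nextWriteId.put(table, writeId + 1);
        perTable.put(table, writeId);
        return writeId;
    }

    public static void main(String[] args) {
        WriteIdSketch ids = new WriteIdSketch();
        System.out.println(ids.allocate(100L, "db.t1")); // first writer on db.t1
        System.out.println(ids.allocate(101L, "db.t1")); // second writer on db.t1
        System.out.println(ids.allocate(101L, "db.t2")); // same txn, other table
    }
}
```

Because the sequence is kept per table, two tables can hand out the same write ID to different transactions, which is why a reader needs the persisted map in addition to the ValidTxnList.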





[jira] [Commented] (HIVE-15077) Acid LockManager is unfair

2018-02-22 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373307#comment-16373307
 ] 

Eugene Koifman commented on HIVE-15077:
---

I did make it unfair in HIVE-10242 but that was not intentional.  (To ensure 
fairness the requested lock is only checked against locks with smaller 
extLockId.  The problem was that HIVE-10242 made it such that not all locks 
with smaller extLockId that should have been checked, were checked.)  I had to 
sort locks by weight in HIVE-10242 because the jumpTable doesn't have any info 
about the resource and so the logic that says "if looking to get S lock and see 
acquired S lock in front, acquire" doesn't work because the S lock in front may 
be on a different resource.  The issue this caused is demonstrated in 
{{testFairness2()}} in the Description of this ticket.

So I'm aiming for a lock manager that is fair, correct and not more strict than 
necessary.  The last part is a work in progress.

What I think we really need is the use of Intention locks.  That way what you 
are suggesting is possible.  Right now we just "infer" a lock (which is not 
physically there) up/down the resource hierarchy based on the lock that is 
actually asked for (and of the same type).  This way you'd only have to compare 
locks on resources with the same path.  This is a bigger change.
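
A minimal sketch of the fairness rule from the first paragraph (the requested lock is checked against every lock with a smaller extLockId) is shown below. It is not the TxnHandler.checkLock() code: the names FairLockSketch, Lock and mustWait are hypothetical, the compatibility rule is simplified, and resources are compared by exact name only, so the table/partition hierarchy is ignored.

```java
import java.util.ArrayList;
import java.util.List;

final class FairLockSketch {
    enum LockType { SHARED_READ, SHARED_WRITE, EXCLUSIVE }

    /** One lock request; extLockId orders requests by arrival. */
    static final class Lock {
        final long extLockId;
        final LockType type;
        final String resource;

        Lock(long extLockId, LockType type, String resource) {
            this.extLockId = extLockId;
            this.type = type;
            this.resource = resource;
        }
    }

    /**
     * Simplified rule (an assumption of this sketch): EXCLUSIVE conflicts with
     * everything, two SHARED_WRITE locks conflict, all other pairs are compatible.
     */
    static boolean conflicts(LockType a, LockType b) {
        if (a == LockType.EXCLUSIVE || b == LockType.EXCLUSIVE) {
            return true;
        }
        return a == LockType.SHARED_WRITE && b == LockType.SHARED_WRITE;
    }

    /**
     * Fair check: look at every lock "in front" (strictly smaller extLockId)
     * on the same resource, wherever it happens to sit in the list.
     */
    static boolean mustWait(Lock requested, List<Lock> allLocks) {
        for (Lock other : allLocks) {
            if (other.extLockId < requested.extLockId
                    && other.resource.equals(requested.resource)
                    && conflicts(requested.type, other.type)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same ordering as the example in the ticket: SR5 SW3 X4.
        List<Lock> locks = new ArrayList<>();
        locks.add(new Lock(5, LockType.SHARED_READ, "T"));
        locks.add(new Lock(3, LockType.SHARED_WRITE, "T"));
        locks.add(new Lock(4, LockType.EXCLUSIVE, "T"));
        // SR5 must wait: X4 is in front of it and conflicts.
        System.out.println(mustWait(locks.get(0), locks));
    }
}
```

Filtering on extLockId instead of list position means a weight-based sort of the list cannot hide X4 from SR5.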

 

 

> Acid LockManager is unfair
> --
>
> Key: HIVE-15077
> URL: https://issues.apache.org/jira/browse/HIVE-15077
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.3.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-15077.02.patch
>
>
> HIVE-10242 made the acid LM unfair.
> In TxnHandler.checkLock(), suppose we are trying to acquire SR5  (the number 
> is extLockId).  
> Then 
> LockInfo[] locks = lockSet.toArray(new LockInfo[lockSet.size()]);
> may look like this (all explicitly listed locks are in Waiting state)
> {, SR5 SW3 X4}
> So the algorithm will find SR5 in the list and start looking backwards (to 
> the left).
> According to IDs, SR5 should wait for X4 to be granted but X4 won't even be 
> examined and so SR5 may be granted.
> Theoretically, this could cause starvation.
> The query that generates the list already has
> query.append(" and hl_lock_ext_id <= ").append(extLockId);
> but it should use "<" rather than "<=" to exclude the locks being checked 
> from "locks" list which will make the algorithm look at all locks "in front" 
> of a given lock.
> Here is an example (add to TestDbTxnManager2)
> {noformat}
>   @Test
>   public void testFairness2() throws Exception {
>     dropTable(new String[]{"T7"});
>     CommandProcessorResponse cpr = driver.run("create table if not exists T7 (a int) partitioned by (p int) stored as orc TBLPROPERTIES ('transactional'='true')");
>     checkCmdOnDriver(cpr);
>     checkCmdOnDriver(driver.run("insert into T7 partition(p) values(1,1),(1,2)"));//create 2 partitions
>     cpr = driver.compileAndRespond("select a from T7 ");
>     checkCmdOnDriver(cpr);
>     txnMgr.acquireLocks(driver.getPlan(), ctx, "Fifer");//gets S lock on T7
>     HiveTxnManager txnMgr2 = TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf);
>     swapTxnManager(txnMgr2);
>     cpr = driver.compileAndRespond("alter table T7 drop partition (p=1)");
>     checkCmdOnDriver(cpr);
>     //tries to get X lock on T7.p=1 and gets Waiting state
>     LockState lockState = ((DbTxnManager) txnMgr2).acquireLocks(driver.getPlan(), ctx, "Fiddler", false);
>     List locks = getLocks();
>     Assert.assertEquals("Unexpected lock count", 4, locks.size());
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", null, locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=1", locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=2", locks);
>     checkLock(LockType.EXCLUSIVE, LockState.WAITING, "default", "T7", "p=1", locks);
>     HiveTxnManager txnMgr3 = TxnManagerFactory.getTxnManagerFactory().getTxnManager(conf);
>     swapTxnManager(txnMgr3);
>     //this should block behind the X lock on T7.p=1
>     cpr = driver.compileAndRespond("select a from T7");
>     checkCmdOnDriver(cpr);
>     txnMgr3.acquireLocks(driver.getPlan(), ctx, "Fifer");//gets S lock on T7
>     locks = getLocks();
>     Assert.assertEquals("Unexpected lock count", 7, locks.size());
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", null, locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=1", locks);
>     checkLock(LockType.SHARED_READ, LockState.ACQUIRED, "default", "T7", "p=2", locks);
>     checkLock(LockType.SHARED_READ, 

[jira] [Assigned] (HIVE-18772) Make Acid Cleaner use MIN_HISTORY_LEVEL

2018-02-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-18772:
-


> Make Acid Cleaner use MIN_HISTORY_LEVEL
> ---
>
> Key: HIVE-18772
> URL: https://issues.apache.org/jira/browse/HIVE-18772
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
>
> Instead of using Lock Manager state as it currently does.
> This will eliminate possible race conditions





[jira] [Updated] (HIVE-18747) Cleaner for TXN_TO_WRITE_ID table entries/MIN_HISTORY_LEVEL.

2018-02-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18747:
--
Summary: Cleaner for TXN_TO_WRITE_ID table entries/MIN_HISTORY_LEVEL.  
(was: Cleaner for TXN_TO_WRITE_ID table entries.)

> Cleaner for TXN_TO_WRITE_ID table entries/MIN_HISTORY_LEVEL.
> 
>
> Key: HIVE-18747
> URL: https://issues.apache.org/jira/browse/HIVE-18747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Minor
>  Labels: ACID
> Fix For: 3.0.0
>
>
> The per-table write ID implementation (HIVE-18192) maintains a map between txn ID 
> and table write ID in the TXN_TO_WRITE_ID meta table. 
> The entries in this table are used to generate the ValidWriteIdList for a given 
> ValidTxnList to ensure snapshot isolation. 
> When a table or database is dropped, these entries are cleaned up. However, they 
> also need to be cleaned up for active tables for better performance.
> This needs another table, MIN_HISTORY_LEVEL, that tracks the smallest txn that 
> any active ValidTxnList snapshot refers to as an open/aborted txn. If no 
> references are found in this table for a txn, then it is eligible for cleanup.
> After cleanup, just one entry per table needs to be kept to mark the LWM (low 
> water mark).
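
One possible reading of the cleanup rule above, sketched in Java over in-memory data. This is an illustration under stated assumptions rather than the planned implementation: WriteIdCleanerSketch, Entry and cleanup are hypothetical names, and keeping the entry with the highest write ID per table as the LWM marker is an interpretation of "just one entry per table".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class WriteIdCleanerSketch {
    /** One row of a TXN_TO_WRITE_ID-like mapping. */
    static final class Entry {
        final long txnId;
        final String table;
        final long writeId;

        Entry(long txnId, String table, long writeId) {
            this.txnId = txnId;
            this.table = table;
            this.writeId = writeId;
        }
    }

    /**
     * Returns the entries that survive cleanup. minHistoryLevels holds, per
     * active snapshot, the smallest txn ID it still sees as open/aborted.
     */
    static List<Entry> cleanup(List<Entry> entries, Collection<Long> minHistoryLevels) {
        // With no active snapshots, every entry is below the mark.
        long lowWaterMark = Long.MAX_VALUE;
        for (long level : minHistoryLevels) {
            lowWaterMark = Math.min(lowWaterMark, level);
        }
        // Newest unreferenced entry per table, kept as that table's LWM marker.
        Map<String, Entry> marker = new HashMap<>();
        List<Entry> kept = new ArrayList<>();
        for (Entry e : entries) {
            if (e.txnId >= lowWaterMark) {
                kept.add(e); // still referenced by some snapshot
            } else {
                Entry current = marker.get(e.table);
                if (current == null || e.writeId > current.writeId) {
                    marker.put(e.table, e);
                }
            }
        }
        kept.addAll(marker.values());
        return kept;
    }

    public static void main(String[] args) {
        List<Entry> entries = Arrays.asList(
            new Entry(1, "t", 1), new Entry(2, "t", 2), new Entry(5, "t", 3));
        // Two active snapshots whose oldest open txns are 5 and 7.
        List<Entry> kept = cleanup(entries, Arrays.asList(5L, 7L));
        for (Entry e : kept) {
            System.out.println(e.txnId + " " + e.table + " " + e.writeId);
        }
    }
}
```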





[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories

2018-02-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18659:
--
Attachment: HIVE-18659.09.patch

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, 
> HIVE-18659.09.patch, HIVE-18659.09.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Commented] (HIVE-18659) add acid version marker to acid files/directories

2018-02-21 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16372145#comment-16372145
 ] 

Eugene Koifman commented on HIVE-18659:
---

patch 9 fixes tests output and checkstyle

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, 
> HIVE-18659.09.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories

2018-02-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18659:
--
Attachment: HIVE-18659.09.patch

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch, 
> HIVE-18659.09.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Commented] (HIVE-18659) add acid version marker to acid files/directories

2018-02-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370826#comment-16370826
 ] 

Eugene Koifman commented on HIVE-18659:
---

{{createdDeltaDirs.add(deltaDest)}} is how it was before this patch.  I'm not 
sure what the original intent was.

patch 7 to see check style/tests - the above links got recycled

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Updated] (HIVE-18659) add acid version marker to acid files/directories

2018-02-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18659:
--
Attachment: HIVE-18659.07.patch

> add acid version marker to acid files/directories
> -
>
> Key: HIVE-18659
> URL: https://issues.apache.org/jira/browse/HIVE-18659
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18659.01.patch, HIVE-18659.04.patch, 
> HIVE-18659.05.patch, HIVE-18659.06.patch, HIVE-18659.07.patch
>
>
> add acid version marker to acid files so that we know which version of acid 
> wrote the file





[jira] [Updated] (HIVE-18742) Vectorization acid/inputformat check should allow NullRowsInputFormat/OneNullRowInputFormat

2018-02-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18742:
--
Affects Version/s: 2.0.0

> Vectorization acid/inputformat check should allow 
> NullRowsInputFormat/OneNullRowInputFormat
> ---
>
> Key: HIVE-18742
> URL: https://issues.apache.org/jira/browse/HIVE-18742
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions, Vectorization
>Affects Versions: 2.0.0
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18742.1.patch, HIVE-18742.2.patch
>
>
> Vectorizer.verifyAndSetVectorPartDesc() has a Preconditions check to ensure 
> the InputFormat is ORC only. However there can be metadataonly or empty 
> result optimizations on Acid tables, which change the input format to 
> NullRows/OneNullRowInputFormat, which gets tripped up on this check.
> Relaxing this check to allow nullrows and onenullrow input formats.
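
The relaxed check could look roughly like the sketch below. It is a self-contained stand-in, not the patch itself: the class and method names are hypothetical, input formats are compared by class name, and a plain IllegalStateException takes the place of the Preconditions check in Vectorizer.verifyAndSetVectorPartDesc().

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class AcidInputFormatCheckSketch {
    // ORC plus the two formats substituted by metadata-only / empty-result optimizations.
    private static final Set<String> ALLOWED = new HashSet<>(Arrays.asList(
        "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
        "org.apache.hadoop.hive.ql.io.NullRowsInputFormat",
        "org.apache.hadoop.hive.ql.io.OneNullRowInputFormat"));

    /** Throws if an Acid table scan uses an input format outside the allowed set. */
    static void checkAcidInputFormat(String inputFormatClassName) {
        if (!ALLOWED.contains(inputFormatClassName)) {
            throw new IllegalStateException(
                "Unexpected input format for an Acid table: " + inputFormatClassName);
        }
    }

    public static void main(String[] args) {
        // Passes: the optimizer replaced ORC with a null-row format.
        checkAcidInputFormat("org.apache.hadoop.hive.ql.io.OneNullRowInputFormat");
        System.out.println("check passed");
    }
}
```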





[jira] [Commented] (HIVE-18192) Introduce WriteID per table rather than using global transaction ID

2018-02-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370291#comment-16370291
 ] 

Eugene Koifman commented on HIVE-18192:
---

is 
{{org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mm_exchangepartition]}}
 a related failure?

> Introduce WriteID per table rather than using global transaction ID
> ---
>
> Key: HIVE-18192
> URL: https://issues.apache.org/jira/browse/HIVE-18192
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DR, pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18192.01.patch, HIVE-18192.02.patch, 
> HIVE-18192.03.patch, HIVE-18192.04.patch, HIVE-18192.05.patch, 
> HIVE-18192.06.patch, HIVE-18192.07.patch, HIVE-18192.08.patch, 
> HIVE-18192.09.patch, HIVE-18192.10.patch, HIVE-18192.11.patch, 
> HIVE-18192.12.patch, HIVE-18192.13.patch, HIVE-18192.14.patch
>
>
> To support ACID replication, we will be introducing a per-table write ID 
> which will replace the transaction ID in the primary key for each row in an 
> ACID table.
> The current primary key is determined via 
>  
> which will move to 
>  
> Each table modified by a given transaction will have a table-level write ID 
> allocated, and a persisted map of global txn id -> table -> write id has to 
> be maintained to allow snapshot isolation.
> Readers should use the combination of ValidTxnList and 
> ValidWriteIdList(Table) for snapshot isolation.
>  
>  [Hive Replication - ACID 
> Tables.pdf|https://issues.apache.org/jira/secure/attachment/12903157/Hive%20Replication-%20ACID%20Tables.pdf]
>  has a section "Per Table Sequences (Write-Id)" with more details





[jira] [Commented] (HIVE-18751) ACID table scan through get_splits UDF doesn't receive ValidWriteIdList configuration.

2018-02-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370279#comment-16370279
 ] 

Eugene Koifman commented on HIVE-18751:
---

[~jdere] do you have any input here?

[~sankarh] should this be a blocker for HIVE-18192?

 

> ACID table scan through get_splits UDF doesn't receive ValidWriteIdList 
> configuration.
> --
>
> Key: HIVE-18751
> URL: https://issues.apache.org/jira/browse/HIVE-18751
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, UDF
> Fix For: 3.0.0
>
>
> Per-table write ID (HIVE-18192) has replaced the global transaction ID with 
> the write ID to version data files in ACID/MM tables.
> To ensure snapshot isolation, a ValidWriteIdList has to be generated for the 
> given txn/table and used when scanning the ACID/MM tables.
> The get_splits UDF, when run on an ACID table scan query, does not receive 
> it properly through configuration (hive.txn.tables.valid.writeids) and hence 
> throws an exception. 
> TestAcidOnTez.testGetSplitsLocks is the test that fails because of this. It 
> needs to be fixed.
>  





[jira] [Commented] (HIVE-18750) Exchange partition should not be supported with per table write ID.

2018-02-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370272#comment-16370272
 ] 

Eugene Koifman commented on HIVE-18750:
---

Also, Load Data is supported for Acid tables - this can be used as an instant 
batch upload.

 

 

> Exchange partition should not be supported with per table write ID.
> ---
>
> Key: HIVE-18750
> URL: https://issues.apache.org/jira/browse/HIVE-18750
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DDL
> Fix For: 3.0.0
>
>
> The per-table write ID implementation (HIVE-18192) introduced a write ID per 
> table and uses it to name the delta/base files and also as part of the 
> primary key for each row.
> Now, exchange partition has to move delta/base files across tables without 
> changing the write ID, which causes incorrect results. 
> Also, the exchange partition feature exists to support the use case of 
> atomic updates. With ACID updates, atomic updates are supported anyway, so 
> it makes sense not to support exchange partition for ACID and MM tables.
> The test results for the qtest file mm_exchangepartition.q are to be updated 
> after this change.





[jira] [Commented] (HIVE-18748) Rename table should update the table names in NEXT_WRITE_ID and TXN_TO_WRITE_ID tables. 

2018-02-20 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370268#comment-16370268
 ] 

Eugene Koifman commented on HIVE-18748:
---

How does "rename" get surfaced to the end user?  Via Alter Table?

I don't think there is anything else anywhere in the acid system that handles 
rename of db.table value.  This probably needs to be a comprehensive change.  
(Or we can explore using table ID of some sort)

> Rename table should update the table names in NEXT_WRITE_ID and 
> TXN_TO_WRITE_ID tables. 
> 
>
> Key: HIVE-18748
> URL: https://issues.apache.org/jira/browse/HIVE-18748
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, Transactions
>Affects Versions: 3.0.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>Priority: Major
>  Labels: ACID, DDL
> Fix For: 3.0.0
>
>
> The per-table write ID implementation (HIVE-18192) introduces a couple of 
> metatables, such as NEXT_WRITE_ID and TXN_TO_WRITE_ID, to manage the write IDs 
> allocated per table.
> Now, when a table is renamed, the corresponding table names in these 
> metatables must be updated as well. Otherwise, ACID table operations won't 
> work properly.





[jira] [Updated] (HIVE-18742) Vectorization acid/inputformat check should allow NullRowsInputFormat/OneNullRowInputFormat

2018-02-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18742:
--
Component/s: Transactions

> Vectorization acid/inputformat check should allow 
> NullRowsInputFormat/OneNullRowInputFormat
> ---
>
> Key: HIVE-18742
> URL: https://issues.apache.org/jira/browse/HIVE-18742
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions, Vectorization
>Reporter: Jason Dere
>Assignee: Jason Dere
>Priority: Major
> Attachments: HIVE-18742.1.patch, HIVE-18742.2.patch
>
>
> Vectorizer.verifyAndSetVectorPartDesc() has a Preconditions check to ensure 
> the InputFormat is ORC only. However there can be metadataonly or empty 
> result optimizations on Acid tables, which change the input format to 
> NullRows/OneNullRowInputFormat, which gets tripped up on this check.
> Relaxing this check to allow nullrows and onenullrow input formats.





[jira] [Updated] (HIVE-18739) Add support for Export from unpartitioned Acid table

2018-02-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18739:
--
Attachment: HIVE-18739.01.patch

> Add support for Export from unpartitioned Acid table
> 
>
> Key: HIVE-18739
> URL: https://issues.apache.org/jira/browse/HIVE-18739
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Major
> Attachments: HIVE-18739.01.patch
>
>





