JvmPauseMonitor

2019-04-09 Thread Eugene Koifman
Hi,
Hive has two JvmPauseMonitor classes:
https://github.com/apache/hive/blob/master/common/src/java/org/apache/hadoop/hive/common/JvmPauseMonitor.java
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/metrics/JvmPauseMonitor.java

both of which are close copies of Hadoop's JvmPauseMonitor:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Is there a reason not to use the one from Hadoop?

Thanks,
Eugene
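Some background on what these classes do: a monitor thread repeatedly sleeps for a short, fixed interval, and if a sleep takes significantly longer than requested, the JVM or host is assumed to have paused (typically a long GC), and a message is logged. Below is a minimal Java sketch of that general technique. It is not a copy of the Hadoop or Hive code; the class name and the interval and threshold values are assumptions chosen for illustration.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of sleep-based pause detection (not the Hadoop or Hive code).
 * A monitor thread sleeps for a fixed interval; wall-clock time beyond that
 * interval is treated as a JVM or host pause, e.g. a stop-the-world GC.
 */
final class PauseDetectorSketch {
  // Illustrative values assumed for this sketch.
  static final long SLEEP_INTERVAL_MS = 500;
  static final long INFO_THRESHOLD_MS = 1_000;
  static final long WARN_THRESHOLD_MS = 10_000;

  enum Level { NONE, INFO, WARN }

  /** Maps the extra time observed beyond the requested sleep to a log level. */
  static Level classify(long extraSleepMs) {
    if (extraSleepMs > WARN_THRESHOLD_MS) {
      return Level.WARN;
    }
    if (extraSleepMs > INFO_THRESHOLD_MS) {
      return Level.INFO;
    }
    return Level.NONE;
  }

  /** Sleeps once and returns how much longer than requested the sleep took. */
  static long measureExtraSleepMs() throws InterruptedException {
    long start = System.nanoTime();
    Thread.sleep(SLEEP_INTERVAL_MS);
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    return Math.max(0, elapsedMs - SLEEP_INTERVAL_MS);
  }
}
```

A daemon thread would call measureExtraSleepMs() in a loop and log whenever classify() returns INFO or WARN.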


[jira] [Created] (HIVE-21266) Issue with single delta file

2019-02-13 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21266:
-

 Summary: Issue with single delta file
 Key: HIVE-21266
 URL: https://issues.apache.org/jira/browse/HIVE-21266
 Project: Hive
  Issue Type: Sub-task
Affects Versions: 4.0.0
Reporter: Eugene Koifman
Assignee: Vaibhav Gumashta


[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java#L353-L357]

 
{noformat}
if ((deltaCount + (dir.getBaseDirectory() == null ? 0 : 1)) + origCount <= 1) {
  LOG.debug("Not compacting {}; current base is {} and there are {} deltas and {} originals",
      sd.getLocation(), dir.getBaseDirectory(), deltaCount, origCount);
  return;
}
 {noformat}

This check is problematic.
Suppose there is a single delta from streaming ingest, {{delta_11_20}}, in which
{{txnid:13}} was aborted. The code above does not rewrite the delta (a rewrite
would drop everything that belongs to the aborted txn), yet the compaction still
transitions to "ready_for_cleaning", which drops the metadata about the aborted
txn. The aborted data then comes back as committed.
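One way to express the missing condition, as a minimal Java sketch: assuming the caller can count aborted txns whose ids fall within the single delta's range (abortedTxnsInRange is a hypothetical input), the early return is taken only when that count is zero. The class and method are illustrative and not necessarily how HIVE-21266 was resolved.

```java
/**
 * Hypothetical sketch for HIVE-21266; the class, method, and abortedTxnsInRange
 * parameter are illustrative assumptions, not Hive code.
 */
final class CompactionSkipCheckSketch {
  /**
   * Compaction may be skipped only if there is at most one base/delta/original
   * AND no aborted txn has data in it.
   */
  static boolean canSkip(int deltaCount, boolean hasBase, int origCount, int abortedTxnsInRange) {
    int dirCount = deltaCount + (hasBase ? 1 : 0) + origCount;
    if (dirCount > 1) {
      return false; // several directories: there is real merging to do
    }
    // A single delta such as delta_11_20 may still hold rows written by an aborted
    // txn (e.g. txnid:13); it must be rewritten before the aborted-txn metadata is dropped.
    return abortedTxnsInRange == 0;
  }
}
```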





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-21226) Exclude read-only transactions from ValidTxnList

2019-02-06 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21226:
-

 Summary: Exclude read-only transactions from ValidTxnList
 Key: HIVE-21226
 URL: https://issues.apache.org/jira/browse/HIVE-21226
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


Once HIVE-21114 is done, we should make sure that ValidTxnList doesn't contain 
any read-only txns in the exceptions list since by definition there is no data 
tagged with such txnid.
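A minimal sketch of the intended filtering, assuming the set of read-only txn ids is known when the snapshot is built. The class, the method, and that input are hypothetical, and Hive's actual ValidTxnList encoding is not shown.

```java
import java.util.Arrays;
import java.util.Set;

/** Hypothetical sketch for HIVE-21226; not Hive's ValidTxnList implementation. */
final class ReadOnlyTxnFilterSketch {
  /**
   * Removes read-only txn ids from the exception list (open/aborted txns).
   * Read-only txns never tag any data with their txnid, so readers never
   * need to filter rows for them. Order of the remaining ids is preserved.
   */
  static long[] dropReadOnly(long[] exceptions, Set<Long> readOnlyTxnIds) {
    return Arrays.stream(exceptions)
        .filter(txnId -> !readOnlyTxnIds.contains(txnId))
        .toArray();
  }
}
```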





[jira] [Created] (HIVE-21177) Optimize AcidUtils.getLogicalLength()

2019-01-29 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21177:
-

 Summary: Optimize AcidUtils.getLogicalLength()
 Key: HIVE-21177
 URL: https://issues.apache.org/jira/browse/HIVE-21177
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


{{AcidUtils.getLogicalLength()}} tries to look for the side file
({{OrcAcidUtils.getSideFile()}}) on the file system even when the file can't
possibly be there, e.g. when the path is delta_x_x or base_x. A side file can
only exist in delta_x_y where x != y.
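The cheap pre-check described above could look like the following minimal sketch. The class and method names are hypothetical; it assumes directory names of the form delta_x_y with an optional trailing suffix such as a statement id (e.g. delta_0000011_0000020_0000), and, following the text above, treats every other directory as having no side file.

```java
/** Hypothetical helper for HIVE-21177; not Hive's AcidUtils code. */
final class SideFileCheckSketch {
  /** Returns true only if a side file could exist, i.e. the name is delta_x_y with x != y. */
  static boolean mayHaveSideFile(String dirName) {
    if (!dirName.startsWith("delta_")) {
      return false; // base_x and everything else: skip the file-system lookup
    }
    String[] parts = dirName.substring("delta_".length()).split("_");
    if (parts.length < 2) {
      return false;
    }
    // Long.parseLong accepts zero-padded ids such as "0000011".
    long minId = Long.parseLong(parts[0]);
    long maxId = Long.parseLong(parts[1]);
    return minId != maxId;
  }
}
```

Only when mayHaveSideFile() returns true would getLogicalLength() need to ask the file system for the side file.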





Re: Review Request 69367: Query based compactor for full CRUD Acid tables

2019-01-28 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review212399
---




itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 299 (patched)
<https://reviews.apache.org/r/69367/#comment298161>

testMoreBucketsThanReducers/testMoreBucketsThanReducers2 in TestTxnCommands 
force a specific number of reducers



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 185 (patched)
<https://reviews.apache.org/r/69367/#comment298162>

nit: since this is filtering for 'base' it's not checking if it 'only' 
contains base...



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 195 (patched)
<https://reviews.apache.org/r/69367/#comment298163>

I still don't understand what this comment is conveying.  This is just a 
normal read, so I would assume TezSplitGrouper is not running in compactor mode


- Eugene Koifman


On Jan. 28, 2019, 11:49 a.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> ---
> 
> (Updated Jan. 28, 2019, 11:49 a.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
> https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b3a475478d 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> d6a41919bf 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e7aa041c25 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 
> 15c14c9be5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> fbb931cbcd 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> 6d4578e7a0 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> db3b427adc 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> dc05e1990e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> a0df82cb20 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out c9716e904c 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/9/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



[jira] [Created] (HIVE-21172) DEFAULT keyword handling in MERGE UPDATE clause issues

2019-01-25 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21172:
-

 Summary: DEFAULT keyword handling in MERGE UPDATE clause issues
 Key: HIVE-21172
 URL: https://issues.apache.org/jira/browse/HIVE-21172
 Project: Hive
  Issue Type: Sub-task
  Components: SQL, Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman


Once HIVE-21159 lands, enable {{HiveConf.MERGE_SPLIT_UPDATE}} and run these
tests:

TestMiniLlapLocalCliDriver.testCliDriver[sqlmerge_stats]
mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=insert_into_default_keyword.q

Merge is rewritten as a multi-insert. When the Update clause has DEFAULT, it is not
properly replaced with a value in the multi-insert; it's treated as a literal:
{noformat}
INSERT INTO `default`.`acidTable`-- update clause(insert part)
 SELECT `t`.`key`, `DEFAULT`, `t`.`value`
   WHERE `t`.`key` = `s`.`key` AND `s`.`key` > 3 AND NOT(`s`.`key` < 3)
{noformat}

See {{LOG.info("Going to reparse <" + originalQuery + "> as \n<" + 
rewrittenQueryStr.toString() + ">");}} in hive.log

{{MergeSemanticAnalyzer.replaceDefaultKeywordForMerge()}} is only called in
{{handleInsert()}}, not in {{handleUpdate()}}. Why does the issue only show up with
{{MERGE_SPLIT_UPDATE}}?
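A minimal sketch of the substitution that appears to be missing on the update path: each DEFAULT in the SET list is replaced with the column's default expression, or with NULL if the column has none (assuming standard SQL semantics), before the multi-insert text is generated. The class, method, and inputs are hypothetical, and the sketch assumes it sees unquoted tokens, i.e. it runs before identifiers are backtick-quoted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch for HIVE-21172; not MergeSemanticAnalyzer code. */
final class DefaultKeywordSketch {
  /**
   * @param setValues value expression per target column, in column order
   * @param columns   target column names, in the same order as setValues
   * @param defaults  column name -> default expression; absent means no default
   */
  static List<String> replaceDefault(List<String> setValues, List<String> columns,
                                     Map<String, String> defaults) {
    List<String> out = new ArrayList<>(setValues.size());
    for (int i = 0; i < setValues.size(); i++) {
      String value = setValues.get(i);
      if ("DEFAULT".equalsIgnoreCase(value.trim())) {
        // Substitute the column's default, falling back to NULL.
        value = defaults.getOrDefault(columns.get(i), "NULL");
      }
      out.add(value);
    }
    return out;
  }
}
```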






[jira] [Created] (HIVE-21161) Remove checks that disallow updating bucketing and partitioning columns

2019-01-23 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21161:
-

 Summary: Remove checks that disallow updating bucketing and 
partitioning columns
 Key: HIVE-21161
 URL: https://issues.apache.org/jira/browse/HIVE-21161
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


Once both Update and Merge do the Update split early, we can remove the checks (in
SemanticAnalyzer?) that prevent updating partition/bucketing columns.





[jira] [Created] (HIVE-21160) Rewrite Update statement as Multi-insert and do Update split early

2019-01-23 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21160:
-

 Summary: Rewrite Update statement as Multi-insert and do Update 
split early
 Key: HIVE-21160
 URL: https://issues.apache.org/jira/browse/HIVE-21160
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman








[jira] [Created] (HIVE-21159) Modify Merge statement logic to perform Update split early

2019-01-23 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21159:
-

 Summary: Modify Merge statement logic to perform Update split early
 Key: HIVE-21159
 URL: https://issues.apache.org/jira/browse/HIVE-21159
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman








[jira] [Created] (HIVE-21158) Perform update split early

2019-01-23 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21158:
-

 Summary: Perform update split early
 Key: HIVE-21158
 URL: https://issues.apache.org/jira/browse/HIVE-21158
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Currently Acid 2.0 does U=D+I in the OrcRecordUpdater. This means that all
Updates (wide rows) are shuffled AND sorted.
We could modify the multi-insert statement that results from a Merge
statement so that instead of having one leg represent the Update, we create
2 legs: 1 representing the Delete of the original row and 1 representing the
Insert of the new version.
Delete events are very small, so sorting them is cheap. The Inserts are written
to disk in sorted order by virtue of how ROW__IDs are generated.

Exactly the same idea applies to a regular Update statement.
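To make the split concrete, here is a minimal in-memory Java sketch of `update T set B = 7 where A=1` done as D+I. The types are hypothetical, not Hive's planner or OrcRecordUpdater code, and ROW__ID is simplified to (writeId, bucketProperty, rowId). Each matching row yields a narrow delete event carrying only its ROW__ID plus an insert event carrying the new row, and only the delete leg is sorted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the early Update split (HIVE-21158); not Hive code. */
final class UpdateSplitSketch {
  /** Simplified ROW__ID. */
  record RowId(long writeId, int bucketProperty, long rowId) {}

  record Row(RowId rowId, int a, int b) {}

  /** Delete events carry only the ROW__ID, so they are small and cheap to sort. */
  record DeleteEvent(RowId rowId) {}

  /** Insert events carry the full new row; new ROW__IDs are assigned at write time. */
  record InsertEvent(int a, int b) {}

  static final Comparator<RowId> ROW_ID_ORDER = Comparator
      .comparingLong(RowId::writeId)
      .thenComparingInt(RowId::bucketProperty)
      .thenComparingLong(RowId::rowId);

  /** Splits "update T set b = newB where a = matchA" into a delete leg and an insert leg. */
  static void split(List<Row> table, int matchA, int newB,
                    List<DeleteEvent> deletes, List<InsertEvent> inserts) {
    for (Row r : table) {
      if (r.a() == matchA) {
        deletes.add(new DeleteEvent(r.rowId()));
        inserts.add(new InsertEvent(r.a(), newB));
      }
    }
    // Only the narrow delete leg is sorted by ROW__ID (the "SORT BY ROW__ID" leg).
    deletes.sort(Comparator.comparing(DeleteEvent::rowId, ROW_ID_ORDER));
  }
}
```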

Note that the U=D+I in OrcRecordUpdater must be kept so that the
[Streaming Mutate API|https://cwiki.apache.org/confluence/display/Hive/HCatalog+Streaming+Mutation+API]
keeps working on 2.0.

*This requires that TxnHandler flags 2 Deletes as a conflict - it doesn't 
currently*

Incidentally, 2.0 + early split allows updating all columns including bucketing 
and partition columns

What is lock acquisition based on?  Need to make sure that conflict detection 
(write set tracking) still works

So we want to transform
{noformat}
update T set B = 7 where A=1
{noformat}
into 
{noformat}
from T
insert into T select ROW__ID where a = 1 SORT BY ROW__ID
insert into T select a, 7 where a = 1
{noformat}

even better to
{noformat}
from T where a = 1
insert into T select ROW__ID SORT BY ROW__ID
insert into T select a, 7
{noformat}
but this won't parse currently.

This is very similar to how MERGE stmt is handled.

Need some thought on how WriteSet tracking works. If we don't allow updating
partition columns, then even with dynamic partitions
TxnHandler.addDynamicPartitions() should see 1 entry (of Update type) for each
partition, since both the insert and the delete land in the same partition. If
partition columns can be updated, then we may insert a Delete event into P1 and
the corresponding Insert event into P2, so addDynamicPartitions() should see both
partitions. I guess both need to be recorded in Write_Set, but with different types:
the delete as 'update' and the insert as 'insert', so that it can conflict with an
IOW on the 'new' partition.





[jira] [Created] (HIVE-21154) Investigate using object IDs in Acid HMS schema instead of names

2019-01-23 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21154:
-

 Summary: Investigate using object IDs in Acid HMS schema instead 
of names
 Key: HIVE-21154
 URL: https://issues.apache.org/jira/browse/HIVE-21154
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman


Currently all Acid-related tables in the HMS DB (HIVE_LOCKS, TXN_COMPONENTS, etc.)
use db_name/table_name/partition_name to identify the metastore object that is
being tracked (these are potentially long strings, especially partition names). It
would improve performance to use an object ID such as TBLS.TBL_ID, which has been
exposed in Thrift since HIVE-20556. It would also make object rename operations a
no-op (currently handled in {{TxnHandler.onRename()}} from {{AcidEventListener
extends MetaStoreEventListener}}). This would require significant HMS schema
changes and surfacing the ID of Database/Partition objects.

Need to think how this affects replication.





Re: Review Request 69367: Query based compactor for full CRUD Acid tables

2019-01-22 Thread Eugene Koifman
#comment297891>

What throws the IAE?  Above I see
if (!reader.hasMetadataValue(OrcRecordUpdater.ACID_KEY_INDEX_NAME)) {

shouldn't it bail out there if there is no index?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
Lines 638 (patched)
<https://reviews.apache.org/r/69367/#comment297892>

is there a followup Jira for this?



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
Lines 2201 (patched)
<https://reviews.apache.org/r/69367/#comment297893>

it would be helpful to add COMPACTOR_CRUD_QUERY_BASED property name to the 
error msg



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 248 (patched)
<https://reviews.apache.org/r/69367/#comment297919>

What does this do for MM table?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 371 (patched)
<https://reviews.apache.org/r/69367/#comment297922>

should 'conf' be cloned?  will this affect 'conf' for something else?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 537 (patched)
<https://reviews.apache.org/r/69367/#comment297923>

why does it need "0+ ..."



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
Lines 101 (patched)
<https://reviews.apache.org/r/69367/#comment297895>

Useful to include table/part name in the msg



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 17 (patched)
<https://reviews.apache.org/r/69367/#comment297912>

ROW__ID.bucket_column - you mean bucketId?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 57 (patched)
<https://reviews.apache.org/r/69367/#comment297915>

This doesn't compare statementId anywhere, but it should.

I think the easiest is to compare bucketProperty, or you could extract
statementId from it and compare it explicitly.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 60 (patched)
<https://reviews.apache.org/r/69367/#comment297917>

I don't think equals makes sense



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 61 (patched)
<https://reviews.apache.org/r/69367/#comment297918>

it may be useful to include both ROW__IDs in the message.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 74 (patched)
<https://reviews.apache.org/r/69367/#comment297916>

nit: make class and fields final to make sure compareTo is inlined?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 91 (patched)
<https://reviews.apache.org/r/69367/#comment297914>

when is it ok for 2 consecutive ROW__IDs to be equal?
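Taken together, these comments suggest a check along the lines of the following minimal sketch. The types are hypothetical, not the UDF under review: ROW__ID is simplified to (writeId, bucketProperty, rowId), statementId is assumed to be encoded inside bucketProperty so comparing bucketProperty covers it, consecutive ROW__IDs must be strictly increasing, and both ROW__IDs appear in the error message.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a ROW__ID sort-order check; not GenericUDFValidateAcidSortOrder. */
final class AcidSortOrderCheckSketch {
  /** Simplified ROW__ID; statementId is assumed to live inside bucketProperty. */
  record RowId(long writeId, int bucketProperty, long rowId) {}

  static final Comparator<RowId> ORDER = Comparator
      .comparingLong(RowId::writeId)
      .thenComparingInt(RowId::bucketProperty)
      .thenComparingLong(RowId::rowId);

  /** Throws if any ROW__ID is not strictly greater than its predecessor. */
  static void validate(List<RowId> rowIds) {
    for (int i = 1; i < rowIds.size(); i++) {
      RowId prev = rowIds.get(i - 1);
      RowId cur = rowIds.get(i);
      if (ORDER.compare(prev, cur) >= 0) { // equal counts as an error too
        throw new IllegalStateException("Wrong sort order: " + prev + " followed by " + cur);
      }
    }
  }
}
```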


- Eugene Koifman


On Jan. 21, 2019, 11:04 p.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> ---
> 
> (Updated Jan. 21, 2019, 11:04 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
> https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> d6a41919bf 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 
> 15c14c9be5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> fbb931cbcd 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 
> 6d4578e7a0 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 0e5b3e5473 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> dc05e1990e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> a0df82cb20 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
>  PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/7/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



[jira] [Created] (HIVE-21146) Enforce TransactionBatch size=1 for blob stores

2019-01-22 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21146:
-

 Summary: Enforce TransactionBatch size=1 for blob stores
 Key: HIVE-21146
 URL: https://issues.apache.org/jira/browse/HIVE-21146
 Project: Hive
  Issue Type: Bug
  Components: Streaming, Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


Streaming Ingest API supports the concept of a {{TransactionBatch}}, where N
transactions can be opened at once and the data in all of them is written
to the same delta_x_y directory, while each transaction in the batch can be
committed/aborted independently. The implementation relies on
{{FSDataOutputStream.hflush()}} (called from {{OrcRecordUpdater}}), which is
available on HDFS but is often implemented as a no-op in Blob store backed
{{FileSystem}} objects.

Need to add a check to the {{HiveStreamingConnection()}} constructor to raise an
error if {{builder.transactionBatchSize > 1}} and the target table/partitions
are backed by something that doesn't support {{hflush()}}.
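A minimal sketch of the proposed guard. The class and method are hypothetical, and how hflush support is detected is left to the caller; on Hadoop 2.9+/3.x one option may be calling StreamCapabilities#hasCapability("hflush") on an open output stream, but that is an assumption to verify against the target FileSystem.

```java
/** Hypothetical guard for the HiveStreamingConnection constructor (HIVE-21146). */
final class TxnBatchSizeGuardSketch {
  /**
   * Rejects batches of more than one txn when the target cannot hflush(), since
   * per-txn commit/abort within a shared delta_x_y relies on hflush().
   */
  static void checkBatchSize(int transactionBatchSize, boolean targetSupportsHflush) {
    if (transactionBatchSize > 1 && !targetSupportsHflush) {
      throw new IllegalArgumentException("transactionBatchSize=" + transactionBatchSize
          + " requires a file system that supports hflush(); use a batch size of 1");
    }
  }
}
```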





Re: Review Request 69704: HIVE-21052

2019-01-17 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69704/#review212120
---




ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
Line 129 (original), 129 (patched)
<https://reviews.apache.org/r/69704/#comment297738>

This doesn't check 'p' type compactions so you could enqueue multiple ones 
for the same table, but see my Jira comments.



standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java
Lines 53 (patched)
<https://reviews.apache.org/r/69704/#comment297739>

why is this needed?  when does the writeId list ever get passed over the wire?


- Eugene Koifman


On Jan. 16, 2019, 10:08 a.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69704/
> ---
> 
> (Updated Jan. 16, 2019, 10:08 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Make sure transactions get cleaned if they are aborted before addPartitions is 
> called
> 
> 
> Diffs
> -
> 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  dc7b2877bf 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 5dbf634825 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java 3482cfce36 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 06b0209aa0 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> a0df82cb20 
>   ql/src/test/org/apache/hadoop/hive/ql/lockmgr/TestDbTxnManager2.java 
> 5e085f84af 
>   shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java 
> b6f70ebe63 
>   shims/common/src/main/java/org/apache/hadoop/hive/shims/HadoopShims.java 
> c569b242ae 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AddDynamicPartitions.java
>  9c33229270 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AlterPartitionsRequest.java
>  f7d9ed2e2e 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClearFileMetadataRequest.java
>  f4e3d6bd71 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClientCapabilities.java
>  2b394449a3 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java
>  4aee45ce5f 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionType.java
>  7450b27cf3 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CreationMetadata.java
>  9595a5dc10 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FindSchemasByColsResp.java
>  42073db544 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FireEventRequest.java
>  dd6658d636 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse.java
>  68146e4561 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprRequest.java
>  ee535a0c80 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprResult.java
>  71e92b6c03 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataRequest.java
>  0ea6ef5fb3 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataResult.java
>  759b495bf6 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsFilterSpec.java
>  b5a2b68efd 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsProjectionSpec.java
>  e6c9c06beb 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsRequest.java
>  7ec107ea6c 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/o

Re: Review Request 69704: HIVE-21052

2019-01-17 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69704/#review212117
---




ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java
Lines 2539 (patched)
<https://reviews.apache.org/r/69704/#comment297735>

JavaDoc



ql/src/java/org/apache/hadoop/hive/ql/io/HdfsUtils.java
Lines 97 (patched)
<https://reviews.apache.org/r/69704/#comment297736>

JavaDoc



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
Lines 83 (patched)
<https://reviews.apache.org/r/69704/#comment297728>

Since you only have a single HMS connection (I assume this is what this
lock is protecting), wouldn't it be better to get the table/partition path
before parallelizing the work that can actually be parallelized? As it is, you
fork threads and then synchronize them immediately.
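The restructuring suggested above can be sketched as follows, assuming the location lookup goes through a single, non-thread-safe client. The class, method, and function parameters are hypothetical stand-ins for the Cleaner's metastore and file-system calls: locations are resolved serially on the calling thread, and only the file-system work is submitted to the pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch; not the Cleaner code under review. */
final class CleanerParallelismSketch {
  static List<String> cleanAll(List<String> partitions,
                               Function<String, String> resolveLocation,
                               Function<String, String> cleanLocation,
                               int threads) throws InterruptedException, ExecutionException {
    // Serial phase: the shared client is only used here, so no lock is needed.
    List<String> locations = new ArrayList<>();
    for (String p : partitions) {
      locations.add(resolveLocation.apply(p));
    }
    // Parallel phase: only the work that is actually independent.
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<String>> futures = new ArrayList<>();
      for (String loc : locations) {
        futures.add(pool.submit(() -> cleanLocation.apply(loc)));
      }
      List<String> results = new ArrayList<>();
      for (Future<String> f : futures) {
        results.add(f.get()); // preserves input order
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```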



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java
Lines 140 (patched)
<https://reviews.apache.org/r/69704/#comment297730>

I'm not sure this achieves what the comment says. For a normal clean (as we
had before) you may have more than 1 compaction_queue entry in ready-for-cleaning.
You should not have more than 1 entry in Working state for the same partition, but
you may have more than 1 entry in ready-for-cleaning since you have more Workers
than Cleaners.

It's perhaps made even worse by the new "table level" clean. I think you
are right to worry about this, though. I'll make a more detailed comment on the
Jira.



shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java
Lines 797 (patched)
<https://reviews.apache.org/r/69704/#comment297734>

Why is this needed?  It should have some JavaDoc



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java
Line 91 (original), 91 (patched)
<https://reviews.apache.org/r/69704/#comment297733>

I don't think this is right. You are now counting aborted txns by type, so
that you need > maxAborted aborted Inserts, or > maxAborted aborted Updates,
etc., to trigger compaction, rather than > maxAborted of (aborted Inserts +
Updates + Deletes).
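The difference between the two counting rules, as a minimal sketch with hypothetical types (not CompactionTxnHandler's SQL): aborted txns are grouped by partition only, with the operation type deliberately left out of the key, so 2 aborted Inserts plus 2 aborted Updates exceed a threshold of 3 even though neither type does on its own. Counting distinct txn ids per partition is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch of the aborted-txn threshold check; not Hive code. */
final class AbortedTxnThresholdSketch {
  enum OpType { INSERT, UPDATE, DELETE }

  record AbortedComponent(String partition, long txnId, OpType type) {}

  /** Returns partitions whose total number of aborted txns, across all op types, exceeds maxAborted. */
  static Set<String> partitionsOverThreshold(List<AbortedComponent> aborted, int maxAborted) {
    Map<String, Set<Long>> txnsByPartition = new HashMap<>();
    for (AbortedComponent c : aborted) {
      // Key is the partition only; the op type is NOT part of the grouping.
      txnsByPartition.computeIfAbsent(c.partition(), k -> new HashSet<>()).add(c.txnId());
    }
    Set<String> result = new TreeSet<>();
    for (Map.Entry<String, Set<Long>> e : txnsByPartition.entrySet()) {
      if (e.getValue().size() > maxAborted) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```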



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Line 1228 (original), 1231 (patched)
<https://reviews.apache.org/r/69704/#comment297737>

exclude 'p' type here


- Eugene Koifman



[jira] [Created] (HIVE-21114) Create read-only transactions

2019-01-10 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21114:
-

 Summary: Create read-only transactions
 Key: HIVE-21114
 URL: https://issues.apache.org/jira/browse/HIVE-21114
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman


With HIVE-21036 we have a way to indicate that a txn is read only.
We should (at least in auto-commit mode) determine if the single stmt is read-only
and mark the txn accordingly.
Then we can optimize {{TxnHandler.commitTxn()}} so that it doesn't do any
checks in write_set etc.
HiveOperation only has QUERY, which covers both Insert and Select, so this
requires figuring out how to determine if a query is a SELECT. By the time
{{Driver.openTransaction()}} is called, we have already parsed the query, so
there should be a way to know if the statement only reads.

For multi-stmt txns (once these are supported) we should allow the user to indicate
that a txn is read-only and then disallow any statements that can make
modifications in this txn. This should be a different Jira.

cc [~ikryvenko]





[jira] [Created] (HIVE-21106) Potential NEP in VectorizedOrcAcidRowBatchReader.ColumnizedDeleteEventRegistry

2019-01-08 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21106:
-

 Summary: Potential NEP in 
VectorizedOrcAcidRowBatchReader.ColumnizedDeleteEventRegistry
 Key: HIVE-21106
 URL: https://issues.apache.org/jira/browse/HIVE-21106
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


{{VectorizedOrcAcidRowBatchReader.ColumnizedDeleteEventRegistry()}}

{noformat}
AcidStats acidStats = OrcAcidUtils.parseAcidStats(deleteDeltaReader);
if (acidStats.deletes == 0) {
  continue; // just a safe check to ensure that we are not reading empty delete files.
}
{noformat}

If the {{delete_delta../bucket_x}} is empty, it may not have a
{{hive.acid.index}}, in which case {{OrcAcidUtils.parseAcidStats()}} returns null,
which causes an NPE.
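A minimal null-safe variant of the check above, as a sketch. AcidStats here is a simplified, hypothetical stand-in, and the sketch assumes, as the Jira describes, that missing ACID stats only occur for empty delete_delta bucket files, so a null result is treated the same as zero deletes.

```java
/** Hypothetical sketch for HIVE-21106; AcidStats is a simplified stand-in. */
final class DeleteDeltaSkipSketch {
  static final class AcidStats {
    long inserts;
    long updates;
    long deletes;
  }

  /** True if this delete_delta bucket file can be skipped without reading it. */
  static boolean shouldSkip(AcidStats acidStats) {
    // parseAcidStats() may return null for an empty file with no ACID metadata;
    // check for null before dereferencing to avoid the NPE.
    return acidStats == null || acidStats.deletes == 0;
  }
}
```

In the loop above, `if (shouldSkip(acidStats)) { continue; }` would then replace the direct `acidStats.deletes == 0` test.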









Re: Review Request 69462: HIVE-20936

2018-12-21 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69462/#review211519
---


Ship it!




Ship It!

- Eugene Koifman


On Dec. 21, 2018, 4:30 p.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69462/
> ---
> 
> (Updated Dec. 21, 2018, 4:30 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Allow the Worker thread in the metastore to run outside of it
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  b290a40734 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  d3800cdf2a 
>   jdbc/src/java/org/apache/hive/jdbc/Utils.java 852942e6a2 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 18253c9bab 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 42ce1746fd 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java 
> f5b901d6e8 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> cdcc0e9548 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 49662cd68b 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java a3034fb195 
>   ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 
> 287aeaecb0 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 0c55654475 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AlterPartitionsRequest.java
>  d85dda5acd 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClearFileMetadataRequest.java
>  3eb55b1b59 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClientCapabilities.java
>  17f8b7730a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FindSchemasByColsResp.java
>  f2f8fb475e 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FireEventRequest.java
>  f7e188dfda 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse.java
>  bd38bbe45d 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprRequest.java
>  fb591dcec5 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprResult.java
>  e8dfba523d 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataRequest.java
>  3d32f372d6 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataResult.java
>  2b176efee4 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsFilterSpec.java
>  c0fe726f8a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsProjectionSpec.java
>  db91e0bf89 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsRequest.java
>  d26cde23fc 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsResponse.java
>  3db9095b5c 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetTablesRequest.java
>  c3f71fe13e 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetTablesResult.java
>  5716922bd3 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/InsertEventRequestData.java
>  3ef24310b2 

Re: Review Request 69462: HIVE-20936

2018-12-21 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69462/#review211508
---



It looks like it has merge conflicts.


standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionInfo.java
Line 34 (original), 42 (patched)
<https://reviews.apache.org/r/69462/#comment296731>

Why does this have more state fields than CompactionInfoStruct? Perhaps that 
can be done in HIVE-21056.



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 586 (patched)
<https://reviews.apache.org/r/69462/#comment296732>

"rj.getID().toString()" shouldn't be inside the quotes



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java
Lines 216 (patched)
<https://reviews.apache.org/r/69462/#comment296733>

This doesn't seem to be used anywhere.


- Eugene Koifman


On Dec. 20, 2018, 5:02 p.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69462/
> ---
> 
> (Updated Dec. 20, 2018, 5:02 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Allow the Worker thread in the metastore to run outside of it
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  b290a40734 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  5af047f465 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 18253c9bab 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 42ce1746fd 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java 
> f5b901d6e8 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> cdcc0e9548 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 49662cd68b 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java a3034fb195 
>   ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 
> 287aeaecb0 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 0c55654475 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/AlterPartitionsRequest.java
>  d85dda5acd 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClearFileMetadataRequest.java
>  3eb55b1b59 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ClientCapabilities.java
>  17f8b7730a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FindSchemasByColsResp.java
>  f2f8fb475e 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/FireEventRequest.java
>  f7e188dfda 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetAllFunctionsResponse.java
>  bd38bbe45d 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprRequest.java
>  fb591dcec5 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataByExprResult.java
>  e8dfba523d 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataRequest.java
>  3d32f372d6 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetFileMetadataResult.java
>  2b176efee4 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsFilterSpec.java
>  c0fe726f8a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPartitionsProjectionSpec.java
>  db91e0bf89 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/GetPart

[jira] [Created] (HIVE-21058) Make Compactor run in a transaction (Umbrella)

2018-12-18 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21058:
-

 Summary: Make Compactor run in a transaction (Umbrella)
 Key: HIVE-21058
 URL: https://issues.apache.org/jira/browse/HIVE-21058
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 4.0.0


Ensure that files produced by the compactor have their visibility controlled 
via Hive transaction commit like any other write to an ACID table.





Re: Review Request 69462: HIVE-20936

2018-12-17 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69462/#review211381
---




ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
Line 184 (original), 184 (patched)
<https://reviews.apache.org/r/69462/#comment296389>

Just realized this needs a new metastore connection. Thrift connections 
are not thread safe: when you multiplex calls on a single connection, response 
messages sometimes get lost or matched to the wrong request.

If you look at how heartbeating is done in DbTxnHandler, it does something 
similar, except that it relies on the ThreadLocal in Hive.get(conf).getMSC().
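A minimal Java sketch of the one-connection-per-thread idea, assuming a client is cheap enough to create once per thread. FakeMetastoreClient and PerThreadClient are hypothetical stand-ins, not Hive or Thrift APIs; the point is only that each thread gets its own client instead of multiplexing calls over a shared one.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a metastore client. A real Thrift client wraps a
// single connection and must not be used by two threads at the same time.
class FakeMetastoreClient {
    private static final AtomicInteger CREATED = new AtomicInteger();
    final int id = CREATED.incrementAndGet();
}

// Hands each thread its own client, created lazily on first use; similar in
// spirit to the per-thread caching behind Hive.get(conf).getMSC().
class PerThreadClient<T> {
    private final ThreadLocal<T> local;

    PerThreadClient(Supplier<? extends T> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    T get() {
        return local.get();
    }
}
```

A real version would also need to close each thread's client when that thread is done, since ThreadLocal does not do that on its own.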



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
Line 187 (original), 187 (patched)
<https://reviews.apache.org/r/69462/#comment296388>

Why is this added here? The CompactionHeartbeater should do this.


- Eugene Koifman


On Dec. 17, 2018, 9:59 a.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69462/
> ---
> 
> (Updated Dec. 17, 2018, 9:59 a.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Allow the Worker thread in the metastore to run outside of it
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  b290a40734 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  5af047f465 
>   jdbc/src/java/org/apache/hive/jdbc/Utils.java 852942e6a2 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 18253c9bab 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 42ce1746fd 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java 
> f5b901d6e8 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> cdcc0e9548 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 21043415d3 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java 546ff955b7 
>   ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 
> 52453a2ec4 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 0c55654475 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/OptionalCompactionInfoStruct.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
>  b6a0893524 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php
>  3170798663 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  39f8b1f05a 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
>  d57de353c6 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
>  a896849989 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  4ef4aadfee 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  97dc0696b7 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/thrift_hive_metastore.rb
>  a5f976bc5c 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  9eb1193a27 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
>  fa19440ba2 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
>  e25a8cf9a1 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> cb899d791f 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  598847df03 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreThread.java
>  6ef2e3560d 
>   
> standalone-metastor

Re: Review Request 69462: HIVE-20936

2018-12-17 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69462/#review211379
---




standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
Lines 1019 (patched)
<https://reviews.apache.org/r/69462/#comment296383>

You can have the Conf validate the values for you; see, for example, 
MATERIALIZATIONS_INVALIDATION_CACHE_IMPL.



standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
Lines 1079 (patched)
<https://reviews.apache.org/r/69462/#comment296384>

nit: it seems like returning a (possibly empty) list of CompactionInfoStruct 
would be simpler and easier to understand.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/CompactionInfo.java
Lines 172 (patched)
<https://reviews.apache.org/r/69462/#comment296385>

Could you add a comment at the top, next to the member variables, indicating 
that these methods must be modified to stay in sync with them?


- Eugene Koifman



Re: Review Request 69462: HIVE-20936

2018-12-12 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69462/#review211256
---




ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java
Lines 60 (patched)
<https://reviews.apache.org/r/69462/#comment296188>

@Override



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java
Line 80 (original), 79 (patched)
<https://reviews.apache.org/r/69462/#comment296189>

@Override



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java
Lines 35 (patched)
<https://reviews.apache.org/r/69462/#comment296199>

Perhaps add that this can run inside HMS as well.



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
Line 97 (original), 94 (patched)
<https://reviews.apache.org/r/69462/#comment296190>

This is also initialized in init(). Should it throw here instead? It seems 
that the contract is that init() must be called first.
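A sketch of the fail-fast contract being suggested, using a hypothetical InitOnceWorker in which a plain String stands in for the configuration object. This is not the actual Worker code.

```java
// Hypothetical worker that enforces "init() must be called before run()".
class InitOnceWorker {
    private String conf; // stands in for the real configuration object

    void init(String conf) {
        if (conf == null) {
            throw new IllegalArgumentException("conf must not be null");
        }
        this.conf = conf;
    }

    String run() {
        // Fail loudly instead of silently initializing with defaults.
        if (conf == null) {
            throw new IllegalStateException("init() must be called before run()");
        }
        return "running with " + conf;
    }
}
```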



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java
Lines 241 (patched)
<https://reviews.apache.org/r/69462/#comment296193>

What is the purpose of this?  Why doesn't the existing catch(Throwable) 
with the same log msg work?



service/src/java/org/apache/hive/service/server/HiveServer2.java
Lines 1016 (patched)
<https://reviews.apache.org/r/69462/#comment296200>

Are there any tests that actually enable HIVE_MAPREDUCE_AVAILABLE?



service/src/java/org/apache/hive/service/server/HiveServer2.java
Lines 1019 (patched)
<https://reviews.apache.org/r/69462/#comment296198>

Why do you need reflection for this? Why not just do 
Worker w = new Worker(); etc.?



standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
Lines 1020 (patched)
<https://reviews.apache.org/r/69462/#comment296194>

Perhaps this should be called hive.metastore.runworker.remotely, since you can 
run a 'remote' worker even with MR. Or call it hive.metastore.runworker.in and 
support the values "metastore" and "hs2"; this is probably more extensible.
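A rough sketch of validating such a setting against a fixed set of values, using a hypothetical StringSetConfVar class. MetastoreConf has its own validation mechanism (see MATERIALIZATIONS_INVALIDATION_CACHE_IMPL), which this does not reproduce; the key name and the values "metastore" and "hs2" come from the suggestion above.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical config entry that only accepts values from a fixed set.
class StringSetConfVar {
    final String key;
    final String defaultValue;
    private final Set<String> allowed;

    StringSetConfVar(String key, String defaultValue, String... allowedValues) {
        this.key = key;
        Set<String> normalized = new LinkedHashSet<>();
        for (String v : allowedValues) {
            normalized.add(v.toLowerCase(Locale.ROOT));
        }
        this.allowed = Collections.unmodifiableSet(normalized);
        this.defaultValue = validate(defaultValue); // fail fast on a bad default
    }

    // Returns the normalized value, or throws with a message listing the legal values.
    String validate(String value) {
        String v = value == null ? "" : value.trim().toLowerCase(Locale.ROOT);
        if (!allowed.contains(v)) {
            throw new IllegalArgumentException("Invalid value '" + value + "' for " + key
                + "; expected one of " + allowed);
        }
        return v;
    }
}
```

Validating when the value is read means a typo such as "hs3" fails at startup with the list of legal values, rather than silently falling through to some default branch.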



standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift
Lines 2448 (patched)
<https://reviews.apache.org/r/69462/#comment296187>

nit: could you move these to around line 2312, where the rest of the TxnStore 
methods are?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreServerUtils.java
Lines 1476 (patched)
<https://reviews.apache.org/r/69462/#comment296191>

I would put both of these in CompactionInfo.  If someone adds fields to 
CompactionInfo, they are unlikely to ever find these methods and so some info 
will be lost in the marshalling back and forth. 
    
Alternatively, could CompactionInfo be a subclass of CompactionInfoStruct?
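A sketch of the first option, with hypothetical CompactionInfoSketch and CompactionInfoStructSketch classes and only a few assumed fields: both conversions live next to the fields they copy, so adding a field without updating them is harder to miss.

```java
import java.util.Objects;

// Stand-in for the Thrift-generated struct (hypothetical subset of fields).
class CompactionInfoStructSketch {
    long id;
    String dbname;
    String tablename;
    String partitionname; // null for unpartitioned tables
}

final class CompactionInfoSketch {
    // NOTE: keep toStruct() and fromStruct() in sync with these fields.
    final long id;
    final String dbname;
    final String tableName;
    final String partName;

    CompactionInfoSketch(long id, String dbname, String tableName, String partName) {
        this.id = id;
        this.dbname = dbname;
        this.tableName = tableName;
        this.partName = partName;
    }

    CompactionInfoStructSketch toStruct() {
        CompactionInfoStructSketch s = new CompactionInfoStructSketch();
        s.id = id;
        s.dbname = dbname;
        s.tablename = tableName;
        s.partitionname = partName;
        return s;
    }

    static CompactionInfoSketch fromStruct(CompactionInfoStructSketch s) {
        return new CompactionInfoSketch(s.id, s.dbname, s.tablename, s.partitionname);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CompactionInfoSketch)) return false;
        CompactionInfoSketch other = (CompactionInfoSketch) o;
        return id == other.id && Objects.equals(dbname, other.dbname)
            && Objects.equals(tableName, other.tableName) && Objects.equals(partName, other.partName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, dbname, tableName, partName);
    }
}
```

The subclass alternative could reduce this conversion code, at the cost of tying the domain class to generated Thrift code.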


- Eugene Koifman


On Dec. 11, 2018, 3:45 p.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69462/
> ---
> 
> (Updated Dec. 11, 2018, 3:45 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Allow the Worker thread in the metastore to run outside of it
> 
> 
> Diffs
> -
> 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  b290a40734 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  beb36d7674 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Cleaner.java 18253c9bab 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> c6cb7c5254 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java 
> f5b901d6e8 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java 
> cdcc0e9548 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/MetaStoreCompactorThread.java
>  PRE-CREATION 
>   
> ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/RemoteCompactorThread.java
>  PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Worker.java 4a1cac123c 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java dc39f5ef61 
>   ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 
> 52453a2ec4 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 0c55654475 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CompactionInfoStruct.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/ha

[jira] [Created] (HIVE-21036) extend OpenTxnRequest with transaction type

2018-12-12 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21036:
-

 Summary: extend OpenTxnRequest with transaction type
 Key: HIVE-21036
 URL: https://issues.apache.org/jira/browse/HIVE-21036
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman


There is a {{TXN_TYPE}} field in {{TXNS}} table.

There is {{TxnHandler.TxnType}} with legal values. It would be useful to make 
TxnType a {{Thrift}} object, add a new {{COMPACTION}} type and allow setting it 
in {{OpenTxnRequest}}.

Since HIVE-20823, the compactor starts a txn and should set this.

Down the road, we may want to set READ_ONLY, based either on parsing the query 
or on user input, which can make {{TxnHandler.commitTxn}} faster.
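A minimal sketch of the proposed request shape, under stated assumptions: TxnTypeSketch and OpenTxnRequestSketch are hypothetical, the enum lists only DEFAULT plus the READ_ONLY and COMPACTION types mentioned in this ticket (the real TxnHandler.TxnType values may differ), and the optional field follows the isSetXxx() convention of Thrift-generated Java classes.

```java
// Hypothetical transaction types; COMPACTION is the proposed addition.
enum TxnTypeSketch { DEFAULT, READ_ONLY, COMPACTION }

// Hypothetical request with an optional type: callers that never set it keep the old behavior.
class OpenTxnRequestSketch {
    final int numTxns;
    final String user;
    private TxnTypeSketch txnType; // null means "not set"

    OpenTxnRequestSketch(int numTxns, String user) {
        this.numTxns = numTxns;
        this.user = user;
    }

    OpenTxnRequestSketch setTxnType(TxnTypeSketch txnType) {
        this.txnType = txnType;
        return this;
    }

    boolean isSetTxnType() {
        return txnType != null;
    }

    // The type the server would record in TXNS.TXN_TYPE.
    TxnTypeSketch effectiveTxnType() {
        return txnType == null ? TxnTypeSketch.DEFAULT : txnType;
    }
}
```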





[jira] [Created] (HIVE-21025) LLAP IO fails on read if partition column is included in the table and the query has a predicate on the partition column

2018-12-10 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21025:
-

 Summary: LLAP IO fails on read if partition column is included in 
the table and the query has a predicate on the partition column
 Key: HIVE-21025
 URL: https://issues.apache.org/jira/browse/HIVE-21025
 Project: Hive
  Issue Type: Bug
  Components: llap
Affects Versions: 2.3.4
Reporter: Eugene Koifman


Hive doesn't officially support the case when a partitioning column is also 
included in the data itself, though it works in some cases. Hive would never 
write a data file with the partition column in it, but this can happen for 
external tables where data is added by the end user.

Consider improving validation on read (at least for schema-aware files) to 
produce a better error than {{ArrayIndexOutOfBoundsException}}.

{code:java}
Caused by: java.lang.ArrayIndexOutOfBoundsException
], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1539023000868_24675_3_01_07_3:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:80)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:419)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189)
    ... 15 more
Caused by: java.io.IOException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:355)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:116)
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:151)
    at org.apache.tez.mapreduce.lib.MRReaderMapred.next(MRReaderMapred.java:116)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    ... 17 more
Caused by: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.rethrowErrorIfAny(LlapRecordReader.java:355)
    at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.nextCvb(LlapRecordReader.java:310)
    at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:250)
    at org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:67)
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
    ... 23 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
{code}
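A sketch of the kind of read-time check being proposed, assuming the reader knows both the file's column names and the table's partition column names. PartitionColumnCheck is hypothetical; names are compared case-insensitively because Hive column names are case-insensitive.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class PartitionColumnCheck {
    private PartitionColumnCheck() {}

    // Throws a descriptive error if any partition column also appears in the file's own schema.
    static void validate(String path, List<String> fileColumns, List<String> partitionColumns) {
        Set<String> fileCols = new HashSet<>();
        for (String c : fileColumns) {
            fileCols.add(c.toLowerCase(Locale.ROOT));
        }
        List<String> overlap = new ArrayList<>();
        for (String p : partitionColumns) {
            if (fileCols.contains(p.toLowerCase(Locale.ROOT))) {
                overlap.add(p);
            }
        }
        if (!overlap.isEmpty()) {
            throw new IllegalStateException("Data file " + path + " contains partition column(s) "
                + overlap + "; partition columns are not expected inside data files");
        }
    }
}
```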





[jira] [Created] (HIVE-21020) log which table/partition is being processed by a txn in Worker

2018-12-07 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-21020:
-

 Summary: log which table/partition is being processed by a txn in 
Worker
 Key: HIVE-21020
 URL: https://issues.apache.org/jira/browse/HIVE-21020
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman


Make sure the log contains information that ties cat.table.part to the txnid 
of the compactor.
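A trivial sketch of such a log line, with a hypothetical helper CompactionLogLine; the database name is included on the assumption that a table is fully identified as catalog.database.table, and the partition is omitted for unpartitioned tables.

```java
final class CompactionLogLine {
    private CompactionLogLine() {}

    // partition may be null for an unpartitioned table.
    static String describe(long txnId, String catalog, String db, String table, String partition) {
        String target = catalog + "." + db + "." + table + (partition == null ? "" : "." + partition);
        return "Compactor txnid:" + txnId + " is processing " + target;
    }
}
```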





[jira] [Created] (HIVE-20960) remove CompactorMR.createCompactorMarker()

2018-11-21 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20960:
-

 Summary: remove CompactorMR.createCompactorMarker()
 Key: HIVE-20960
 URL: https://issues.apache.org/jira/browse/HIVE-20960
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Now that we have HIVE-20941, we can tell from its name whether a dir was 
produced by the compactor, so {{CompactorMR.createCompactorMarker()}} can be removed.
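A sketch of recognizing compactor output from the directory name alone, assuming the base_x_cZ and delta_x_y_cZ naming used in these tickets, where Z is the compactor's txn id. CompactorDirName is hypothetical, and the pattern deliberately ignores other variants (for example statement-id suffixes or delete_delta directories).

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CompactorDirName {
    // base_<x>_c<Z> or delta_<x>_<y>_c<Z>; group 1 captures Z.
    private static final Pattern COMPACTED =
        Pattern.compile("^(?:base_\\d+|delta_\\d+_\\d+)_c(\\d+)$");

    private CompactorDirName() {}

    // Returns the compactor txn id if the name carries a _cZ suffix.
    static OptionalLong compactorTxnId(String dirName) {
        Matcher m = COMPACTED.matcher(dirName);
        return m.matches() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    static boolean isProducedByCompactor(String dirName) {
        return compactorTxnId(dirName).isPresent();
    }
}
```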

 

 





Re: Review Request 69367: Query based compactor for full CRUD Acid tables

2018-11-20 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review210740
---




itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
Line 165 (original), 165 (patched)
<https://reviews.apache.org/r/69367/#comment295490>

?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
Line 170 (original), 170 (patched)
<https://reviews.apache.org/r/69367/#comment295491>

Why are all these tests made non-tests?
Or does this do something else?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 533 (patched)
<https://reviews.apache.org/r/69367/#comment295492>

Were you going to do "0+validate_acid_sort_order(...)" instead?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 54 (patched)
<https://reviews.apache.org/r/69367/#comment295494>

I'm guessing that if compareTo returns 0, that's bad: we should have unique 
row ids.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 61 (patched)
<https://reviews.apache.org/r/69367/#comment295493>

should this return 0?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 80 (patched)
<https://reviews.apache.org/r/69367/#comment295489>

I think the comparison should include 'bucketProperty', since we sort on 
'bucketProperty', not just bucketId.
In particular, if you have more than 1 statement per txn, we expect rows from 
the 2nd stmt to follow those from the 1st.
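A sketch of the ordering these comments describe, assuming a row id is the triple (writeId, bucketProperty, rowId) and that bucketProperty encodes the statement id along with the bucket id, so comparing the whole bucketProperty orders a later statement's rows after an earlier one's. AcidRowIdSketch and SortOrderValidator are hypothetical; the check treats compareTo() == 0 as an error because row ids should be unique.

```java
import java.util.List;

// Hypothetical ACID row id: (writeId, bucketProperty, rowId).
final class AcidRowIdSketch implements Comparable<AcidRowIdSketch> {
    final long writeId;
    final int bucketProperty; // assumed to encode both bucket id and statement id
    final long rowId;

    AcidRowIdSketch(long writeId, int bucketProperty, long rowId) {
        this.writeId = writeId;
        this.bucketProperty = bucketProperty;
        this.rowId = rowId;
    }

    @Override
    public int compareTo(AcidRowIdSketch o) {
        int c = Long.compare(writeId, o.writeId);
        if (c != 0) {
            return c;
        }
        // Compare the whole bucketProperty, not just the bucket id.
        c = Integer.compare(bucketProperty, o.bucketProperty);
        if (c != 0) {
            return c;
        }
        return Long.compare(rowId, o.rowId);
    }
}

final class SortOrderValidator {
    private SortOrderValidator() {}

    // Throws unless the ids are strictly increasing; an equal pair means a duplicate row id.
    static void validate(List<AcidRowIdSketch> ids) {
        for (int i = 1; i < ids.size(); i++) {
            int c = ids.get(i - 1).compareTo(ids.get(i));
            if (c == 0) {
                throw new IllegalStateException("Duplicate row id at position " + i);
            }
            if (c > 0) {
                throw new IllegalStateException("Row id out of order at position " + i);
            }
        }
    }
}
```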


- Eugene Koifman


On Nov. 19, 2018, 3:49 a.m., Vaibhav Gumashta wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> ---
> 
> (Updated Nov. 19, 2018, 3:49 a.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
> https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 
> 40dd992455 
>   pom.xml 26b662e4c3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 578b16cc7c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 
> 8cabf960db 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 
> 6e7c78bd17 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 
> 92c74e1d06 
>   
> ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>



[jira] [Created] (HIVE-20948) Eliminate file rename in compactor

2018-11-20 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20948:
-

 Summary: Eliminate file rename in compactor
 Key: HIVE-20948
 URL: https://issues.apache.org/jira/browse/HIVE-20948
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman


Once HIVE-20823 is committed, we should investigate whether it's possible to 
have the compactor write directly to base_x_cZ or delta_x_y_cZ.





[jira] [Created] (HIVE-20943) Handle Compactor transaction abort properly

2018-11-19 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20943:
-

 Summary: Handle Compactor transaction abort properly
 Key: HIVE-20943
 URL: https://issues.apache.org/jira/browse/HIVE-20943
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman


A transaction in which the Worker runs may fail after base_x_cZ (delta_x_y_cZ) 
is created but before its files are fully written.  We need to make sure to write 
an entry corresponding to Z to TXN_COMPONENTS so that "_cZ" directories are not 
read by anyone and are cleaned up by the Cleaner.





[jira] [Created] (HIVE-20942) Worker should heartbeat its own txn

2018-11-19 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20942:
-

 Summary: Worker should heartbeat its own txn
 Key: HIVE-20942
 URL: https://issues.apache.org/jira/browse/HIVE-20942
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman


Since HIVE-20823, {{Worker.java}} starts a txn, so we should either add a heartbeat 
thread to it or use HiveTxnManager to start the txn, which will set up heartbeating 
automatically.  In the latter case, make sure it's properly cancelled on failures.
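One way to read the first option (a dedicated heartbeat thread owned by the Worker) is sketched below in plain Java. The `heartbeat` callback stands in for whatever metastore call keeps txn `txnId` alive; it, the class name, and the interval are hypothetical, not Hive API. The point is the lifecycle: start before the job, cancel on every exit path.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongConsumer;

/** Hypothetical helper: keeps one txn alive while a compaction job runs. */
final class TxnHeartbeater implements AutoCloseable {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "compactor-txn-heartbeat");
        t.setDaemon(true); // never keep the JVM alive just to heartbeat
        return t;
      });

  /** Starts calling heartbeat.accept(txnId) every intervalMs milliseconds. */
  TxnHeartbeater(long txnId, long intervalMs, LongConsumer heartbeat) {
    scheduler.scheduleAtFixedRate(() -> {
      try {
        heartbeat.accept(txnId);
      } catch (RuntimeException e) {
        // Swallowed so later heartbeats still run; a real version would log this.
      }
    }, 0, intervalMs, TimeUnit.MILLISECONDS);
  }

  /** Stops heartbeating; safe to call on both success and failure paths. */
  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Used in a try-with-resources block around the compaction job, cancellation is tied to the job's scope, which covers the "properly cancelled on failures" concern for the thread-based option.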





[jira] [Created] (HIVE-20941) Compactor produces a delete_delta_x_y even if there are no input delete events

2018-11-19 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20941:
-

 Summary: Compactor produces a delete_delta_x_y even if there are 
no input delete events
 Key: HIVE-20941
 URL: https://issues.apache.org/jira/browse/HIVE-20941
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman








Re: Review Request 69367: Query based compactor for full CRUD Acid tables

2018-11-15 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review210589
---




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2685 (patched)
<https://reviews.apache.org/r/69367/#comment295290>

"And minor compaction will be disabled." - should make sure Initiator 
doesn't start minor compactions and that Alter Table commands requesting Minor are 
no-ops or throw, so that these don't get into the compactor queue.  We should also 
perhaps think about how the Initiator triggers Major compactions: are the current 
config params adequate?  We should do at least the 2nd part in a follow-up jira, 
maybe both.
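The first part of the comment (keep MINOR requests out of the queue) amounts to a guard applied before a request is enqueued. A minimal sketch, assuming a boolean that says whether the query-based compactor is enabled; the enum, class, and method names are hypothetical, and whether to throw or silently skip is the open question in the comment (this sketch throws).

```java
import java.util.Locale;

/** Hypothetical request-time guard; not Hive's actual API. */
final class CompactionRequestGuard {
  enum CompactionType { MINOR, MAJOR }

  /**
   * Validates a compaction request before it is enqueued.
   * @throws IllegalArgumentException for unknown types, or MINOR while only major is supported
   */
  static CompactionType validate(String requested, boolean queryBasedEnabled) {
    CompactionType type = CompactionType.valueOf(requested.trim().toUpperCase(Locale.ROOT));
    if (queryBasedEnabled && type == CompactionType.MINOR) {
      throw new IllegalArgumentException(
          "MINOR compaction is disabled while query-based compaction is enabled");
    }
    return type;
  }
}
```

Applying the same check on both the Initiator path and the Alter Table path is what would keep such requests out of the queue in the first place.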



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Line 180 (original), 183 (patched)
<https://reviews.apache.org/r/69367/#comment295291>

I guess all this should be a no-op for the compactor since it only looks at 1 
partition at a time, and for acid the serde and IF/OF don't change.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 197 (patched)
<https://reviews.apache.org/r/69367/#comment295292>

bucketSplitMultiMap?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 206 (patched)
<https://reviews.apache.org/r/69367/#comment295293>

The error should include the table name if it is easily available here; if not, 
maybe a file path from one of the splits...



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 214 (patched)
<https://reviews.apache.org/r/69367/#comment295294>

should we assert that schemaSplitMultiMap has size=1 since that is what we 
expect for compactor?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 276 (patched)
<https://reviews.apache.org/r/69367/#comment295295>

Add a comment that this is truly a bucketId (rather than a bucket property - see 
BucketCodec.java, since 3.0) that is derived from the file name.

WriteId is also derived from the containing file name, and for files that have a 
min/max writeId, it's the starting one.  Now that I look at the code in 
TransactionMetadata.findWriteIDForSynthetcRowIDs() - the assert there will 
throw.  It should be removed since here we have to handle files that come from 
compacted dirs, so min <> max for all deltas.

maybe these comments should be on OrcSplit where getter methods are defined.
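To illustrate the bucketId vs. bucket property distinction: since 3.0 the `bucket` field of `ROW__ID` holds an encoded property, not the bare bucket id. The sketch below decodes it, assuming the V1 layout as I understand it from `BucketCodec`: top 3 bits version, 1 reserved bit, 12 bits bucket id, 4 reserved bits, 12 bits statement id. It is illustrative only; `BucketCodec` itself is the authority.

```java
/** Illustrative decoder for the assumed BucketCodec V1 bit layout. */
final class BucketPropertySketch {
  static int version(int bucketProperty) {
    return bucketProperty >>> 29;            // top 3 bits
  }

  static int bucketId(int bucketProperty) {
    return (bucketProperty >>> 16) & 0xFFF;  // 12 bits below the reserved bit
  }

  static int statementId(int bucketProperty) {
    return bucketProperty & 0xFFF;           // low 12 bits
  }

  /** Inverse of the three getters above, with reserved bits left as zero. */
  static int encode(int version, int bucketId, int statementId) {
    return (version << 29) | ((bucketId & 0xFFF) << 16) | (statementId & 0xFFF);
  }
}
```

Under this layout a value such as 536870912 (2^29) decodes to version 1, bucket 0, statement 0, which is why it cannot be used where a plain bucket id derived from a `bucket_N` file name is expected.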



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java
Lines 68 (patched)
<https://reviews.apache.org/r/69367/#comment295296>

mark these transient for clarity since we don't serialize them



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 245 (patched)
<https://reviews.apache.org/r/69367/#comment295297>

Ideally this should be prevented before it gets into the COMPACTION_QUEUE.  
Throwing here will cause failed compactions to accumulate in SHOW COMPACTIONS 
and prevent auto-scheduling of new ones.



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 399 (patched)
<https://reviews.apache.org/r/69367/#comment295298>

should this be in a finally{}?  SessionState is threadLocal so it may get 
reused... or do we shutdown the session each time?
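The concern is the usual ThreadLocal one: if the per-thread session is set up but not torn down on every exit path, the next compaction on the same Worker thread may see stale state. A minimal sketch of the `finally{}` pattern, using a plain `ThreadLocal<String>` as a hypothetical stand-in for `SessionState`:

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a thread-local session such as SessionState. */
final class ScopedSession {
  private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

  static String current() {
    return CURRENT.get();
  }

  /** Runs body with a session bound to this thread and always unbinds it. */
  static <T> T withSession(String sessionId, Supplier<T> body) {
    CURRENT.set(sessionId);
    try {
      return body.get();
    } finally {
      CURRENT.remove(); // runs on success and on exception, so nothing leaks to the next task
    }
  }
}
```

Whether the real code should merely unbind or fully close the session is the open question in the comment; either call belongs in the `finally`.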



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 481 (patched)
<https://reviews.apache.org/r/69367/#comment295299>

The current write id should always be the same as the original.  Only delete events 
can have these differ, but major compaction absorbs delete events.



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 503 (patched)
<https://reviews.apache.org/r/69367/#comment295300>

What's the value of specifying a location for the tmp table?  I'm surprised it's 
even legal.  Could this potentially be a security hole?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 510 (patched)
<https://reviews.apache.org/r/69367/#comment295302>

why overwrite?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 513 (patched)
<https://reviews.apache.org/r/69367/#comment295301>

Why do you need partition key/values in the query?  We are always reading a 
single partition.  This is achieved by getAcidState(), which takes the partition dir 
as input (i.e. all the files it returns are within a given partition).



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 542 (patched)
<https://reviews.apache.org/r/69367/#comment295303>

need to think about this.  maybe it's ok...



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 565 (patched)
<https://reviews.apache.org/r/69367/#comment295304>

there should be something in AcidUtils to parse original bucket file name
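If no such helper exists, the parsing itself is small. A sketch, assuming "original" (pre-ACID) files follow the `000001_0` pattern, optionally with a `_copy_N` suffix; the regex and class name are illustrative, not taken from `AcidUtils`.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for original (non-ACID) bucket file names. */
final class OriginalBucketName {
  // Assumed shape: <bucket digits>_<digits>, optionally followed by _copy_<digits>.
  // At most 9 bucket digits so Integer.parseInt cannot overflow.
  private static final Pattern ORIGINAL =
      Pattern.compile("^(\\d{1,9})_\\d+(?:_copy_\\d+)?$");

  /** Returns the bucket id encoded in the file name, or empty if it does not match. */
  static OptionalInt bucketId(String fileName) {
    Matcher m = ORIGINAL.matcher(fileName);
    if (!m.matches()) {
      return OptionalInt.empty();
    }
    return OptionalInt.of(Integer.parseInt(m.group(1)));
  }
}
```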


- Eugene Koifman


On Nov. 15, 2018, 4:59 p.m., Vaibhav Gumashta wrote:
> 
> -

[jira] [Created] (HIVE-20901) running compactor when there is nothing to do produces duplicate data

2018-11-09 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20901:
-

 Summary: running compactor when there is nothing to do produces 
duplicate data
 Key: HIVE-20901
 URL: https://issues.apache.org/jira/browse/HIVE-20901
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Suppose we run minor compaction twice via ALTER TABLE.

The 2nd compaction request should have nothing to do, but I don't think there 
is a check for that.  It's visible in the context of HIVE-20823, where each 
compactor run produces a delta with a new visibility suffix, so we end up with 
something like
{noformat}
target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/

├── delete_delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delete_delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_001_
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v019
│   ├── _orc_acid_version
│   └── bucket_0
├── delta_001_002_v021
│   ├── _orc_acid_version
│   └── bucket_0
└── delta_002_002_
    ├── _orc_acid_version
    └── bucket_0{noformat}
i.e. 2 deltas with the same write ID range

This is bad.  It probably happens today as well, except that the new run produces a 
delta with the same name and clobbers the previous one, which may interfere with writers.

 

need to investigate
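A sketch of a check that could detect the bad end state shown above: group delta directories by (kind, min write ID, max write ID), ignoring the visibility suffix, and report ranges that occur more than once. The directory-name regex is an assumption based on the names in the listing, not `AcidUtils` parsing logic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical detector for deltas that cover the same write ID range. */
final class DuplicateDeltaCheck {
  // Assumed shape: [delete_]delta_<min>_<max>[_v<visibilityTxnId>]
  private static final Pattern DELTA =
      Pattern.compile("^(delete_delta|delta)_(\\d+)_(\\d+)(?:_v\\d+)?$");

  /** Returns each "kind:min-max" key that appears in more than one directory. */
  static List<String> duplicateRanges(List<String> dirNames) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String name : dirNames) {
      Matcher m = DELTA.matcher(name);
      if (!m.matches()) {
        continue; // base_*, files, or names this sketch does not understand
      }
      String key = m.group(1) + ":" + Long.parseLong(m.group(2))
          + "-" + Long.parseLong(m.group(3));
      counts.merge(key, 1, Integer::sum);
    }
    List<String> dups = new ArrayList<>();
    counts.forEach((k, v) -> {
      if (v > 1) {
        dups.add(k);
      }
    });
    return dups;
  }
}
```

Run before the Worker starts, a non-empty result for the target range could make the second request a no-op instead of producing another delta.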





[jira] [Created] (HIVE-20885) ql.txn.compactor.TestCompactor runs most tests 2 times

2018-11-07 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20885:
-

 Summary: ql.txn.compactor.TestCompactor runs most tests 2 times
 Key: HIVE-20885
 URL: https://issues.apache.org/jira/browse/HIVE-20885
 Project: Hive
  Issue Type: Improvement
  Components: Streaming, Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


HIVE-19211 added {{@RunWith(Parameterized.class)}} so that it runs once with 
{{newStreamingAPI=true}} and once with {{newStreamingAPI=false}}, but only 
about 5 tests out of 23 make use of this variable.  All other tests are 
executed twice for no reason.

 

cc [~prasanth_j]





[jira] [Created] (HIVE-20874) Add ability to run high priority compaction

2018-11-06 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20874:
-

 Summary: Add ability to run high priority compaction
 Key: HIVE-20874
 URL: https://issues.apache.org/jira/browse/HIVE-20874
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman


Currently all compaction requests (whether issued via an Alter Table command or 
auto-initiated by {{Initiator.java}}) land in a queue (the {{COMPACTION_QUEUE}} 
metastore DB table) and are executed in order.

If the queue is long and some table/partition needs to be compacted urgently, 
there is no way to send it to the beginning of the queue.

Need a way to address this.
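One possible shape, sketched in memory only: give each request a priority and order by (priority descending, enqueue sequence ascending), so urgent requests jump ahead while normal ones stay FIFO. The priority field and names are hypothetical; since the real queue is the `COMPACTION_QUEUE` table, an actual change would more likely add a column and an ORDER BY than use `java.util.PriorityQueue`.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical in-memory model of a compaction queue with priorities. */
final class CompactionQueueSketch {
  static final class Request {
    final String partition;
    final int priority;   // higher runs first (assumed convention)
    final long sequence;  // enqueue order, used as the FIFO tie-breaker

    Request(String partition, int priority, long sequence) {
      this.partition = partition;
      this.priority = priority;
      this.sequence = sequence;
    }
  }

  private final PriorityQueue<Request> queue = new PriorityQueue<>(
      Comparator.comparingInt((Request r) -> r.priority).reversed()
          .thenComparingLong(r -> r.sequence));
  private long nextSequence = 0;

  void submit(String partition, int priority) {
    queue.add(new Request(partition, priority, nextSequence++));
  }

  /** Returns the next partition to compact, or null if the queue is empty. */
  String next() {
    Request r = queue.poll();
    return r == null ? null : r.partition;
  }
}
```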

 





[jira] [Created] (HIVE-20862) QueryId no longer shows up in the logs

2018-11-02 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20862:
-

 Summary: QueryId no longer shows up in the logs
 Key: HIVE-20862
 URL: https://issues.apache.org/jira/browse/HIVE-20862
 Project: Hive
  Issue Type: Bug
  Components: Logging
Affects Versions: 4.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman








[jira] [Created] (HIVE-20863) remove dead code

2018-11-02 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20863:
-

 Summary: remove dead code
 Key: HIVE-20863
 URL: https://issues.apache.org/jira/browse/HIVE-20863
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman








[jira] [Created] (HIVE-20859) clean up invocation of Worker/Cleaner/Initiator in test code

2018-11-01 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20859:
-

 Summary: clean up invocation of Worker/Cleaner/Initiator in test 
code
 Key: HIVE-20859
 URL: https://issues.apache.org/jira/browse/HIVE-20859
 Project: Hive
  Issue Type: Improvement
  Components: Test, Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


there are many places like {{CompactorTest}} that use
{code:java|title=CompactorTest.java}
AtomicBoolean stop = new AtomicBoolean(true);
Worker t = new Worker();
t.setThreadId((int) t.getId());
t.setConf(hiveConf);
AtomicBoolean looped = new AtomicBoolean();
t.init(stop, looped);
t.run();
{code}
We should instead standardize on {{TestTxnCommands2.runWorker()}}, and do 
the same for {{Cleaner}} and {{Initiator}}.
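The standardization could take the form of one helper that hides the `stop`/`looped` boilerplate. The sketch abstracts over the three thread types with a hypothetical `OneShotTask` interface whose `init` mirrors the `init(stop, looped)` call in the snippet above; it is not `TestTxnCommands2.runWorker()`, whose signature is not shown here.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical test helper that runs a compactor-style thread exactly once. */
final class RunOnce {
  /** Mirrors the init(stop, looped) + run() pattern from the snippet. */
  interface OneShotTask extends Runnable {
    void init(AtomicBoolean stop, AtomicBoolean looped);
  }

  static void runOnce(OneShotTask task) {
    // stop=true is how the snippet requests a single pass (assumed semantics).
    AtomicBoolean stop = new AtomicBoolean(true);
    AtomicBoolean looped = new AtomicBoolean();
    task.init(stop, looped);
    task.run(); // synchronous, on the calling thread, as in the snippet
  }
}
```

Per-type setup such as `setConf(hiveConf)` would be done by the caller before `runOnce`, or by an overload that takes the conf.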





[jira] [Created] (HIVE-20856) ValidReaderWriteIdList() is not valid in most places

2018-11-01 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20856:
-

 Summary: ValidReaderWriteIdList() is not valid in most places
 Key: HIVE-20856
 URL: https://issues.apache.org/jira/browse/HIVE-20856
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


Most of the time it's something like this:
{code:java}
String txnString = conf.get(ValidWriteIdList.VALID_WRITEIDS_KEY);
this.validWriteIdList = (txnString == null) ? 
   new ValidReaderWriteIdList() : new ValidReaderWriteIdList(txnString);
{code}

but ValidReaderWriteIdList() (the no-arg c'tor) creates a write ID list that 
considers every base/delta valid, which is unlikely to be correct for a 
general read of acid data.
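A stricter alternative to the pattern above is to treat a missing value as a bug rather than silently falling back to the "everything is valid" list. A sketch over a plain `Map` standing in for the job configuration; the helper name is hypothetical and the key is passed in (in Hive it would be `ValidWriteIdList.VALID_WRITEIDS_KEY`).

```java
import java.util.Map;

/** Hypothetical helper: refuse to read ACID data without a write ID snapshot. */
final class WriteIdSnapshot {
  static String require(Map<String, String> conf, String key, String context) {
    String value = conf.get(key);
    if (value == null || value.isEmpty()) {
      // Falling back to the no-arg ValidReaderWriteIdList() would make every
      // base/delta look valid, which is the problem described above.
      throw new IllegalStateException("No " + key + " set while reading " + context
          + "; refusing to assume all write IDs are valid");
    }
    return value;
  }
}
```

Call sites that genuinely need an unfiltered view would then have to opt in explicitly, which makes that assumption visible.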





[jira] [Created] (HIVE-20823) Make Compactor run in a transaction

2018-10-26 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20823:
-

 Summary: Make Compactor run in a transaction
 Key: HIVE-20823
 URL: https://issues.apache.org/jira/browse/HIVE-20823
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Have compactor open a transaction and run the job in that transaction.
# make compactor produced base/delta include this txn id in the folder name, 
e.g. base_7_c17 where 17 is the txnid.
# add {{CQ_TXN_ID bigint}} to COMPACTION_QUEUE and COMPLETED_COMPACTIONS to 
record this txn id
# make sure {{AcidUtils.getAcidState()}} pays attention to this transaction on 
read and ignores this dir if this txn id is not committed in the current 
snapshot
## this means not only validWriteIdList but ValidTxnIdList should be passed 
along in config (if it isn't yet)
# once this is done, {{CompactorMR.createCompactorMarker()}} can be eliminated 
and {{AcidUtils.isValidBase}} modified accordingly
# modify Cleaner so that it doesn't clean old files until new file is visible 
to all readers
# 
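Item 3 can be illustrated with a small sketch: parse the compactor txn id out of a name like `base_7_c17` and hide the directory unless that txn is committed in the reader's snapshot. The snapshot is modeled as a high-water mark plus a set of open/aborted txn ids, loosely in the spirit of a valid-txn list; all names are hypothetical, and directories without a `_c` suffix are assumed visible as before.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical visibility check for compactor-produced directories. */
final class CompactedDirVisibility {
  // Assumed naming from the proposal: base_<writeId>_c<txnId> or delta_<min>_<max>_c<txnId>.
  private static final Pattern COMPACTOR_TXN = Pattern.compile("_c(\\d+)$");

  /**
   * @param highWatermark txn ids above this are not yet visible to the reader
   * @param invalidTxns   open or aborted txn ids at or below the high-water mark
   */
  static boolean isVisible(String dirName, long highWatermark, Set<Long> invalidTxns) {
    Matcher m = COMPACTOR_TXN.matcher(dirName);
    if (!m.find()) {
      return true; // not produced by a transactional compactor run
    }
    long txnId = Long.parseLong(m.group(1));
    return txnId <= highWatermark && !invalidTxns.contains(txnId);
  }
}
```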





[jira] [Created] (HIVE-20769) TxnHandler.checkLock() will re-acquire the same lock

2018-10-17 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20769:
-

 Summary: TxnHandler.checkLock() will re-acquire the same lock
 Key: HIVE-20769
 URL: https://issues.apache.org/jira/browse/HIVE-20769
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 3.1.0
Reporter: Eugene Koifman


as currently implemented, this will acquire the same type of lock on the same 
resource if requested by another stmt in the same txn.  Need to fix that.
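The intended rule can be stated as: within one txn, a request for a lock type already held on the same resource is a no-op. A minimal in-memory sketch with hypothetical names; `TxnHandler.checkLock()` works against the metastore's lock tables, so this only illustrates the check. A fuller version would also treat a weaker request as covered by a stronger lock already held, which this sketch does not attempt.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical per-txn lock registry that skips duplicate requests. */
final class TxnLockRegistry {
  enum LockType { SHARED_READ, SHARED_WRITE, EXCLUSIVE }

  private final Set<String> held = new HashSet<>();

  /** Returns true if a new lock was recorded, false if the txn already holds it. */
  boolean acquire(long txnId, String resource, LockType type) {
    return held.add(txnId + "|" + resource + "|" + type);
  }
}
```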





Re: Review Request 68805: HIVE-20538

2018-10-15 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68805/#review209558
---


Ship it!




Ship It!

- Eugene Koifman


On Sept. 21, 2018, 3:51 p.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68805/
> ---
> 
> (Updated Sept. 21, 2018, 3:51 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20538: Allow to store a key value together with a transaction.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnKeyValue.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnRequest.java
>  db47f9db8b 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  936f7c5a40 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  958f13c18e 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  a3dddf54e4 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  d226db50a5 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
>  54e7eda0da 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> ad83162ec3 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  1df1ebce49 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
>  080cc5284b 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
>  ce590d0f55 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
>  db4dd9ec42 
> 
> 
> Diff: https://reviews.apache.org/r/68805/diff/3/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jaume Marhuenda
> 
>



[jira] [Created] (HIVE-20738) Enable Delete Event filtering in VectorizedOrcAcidRowBatchReader

2018-10-12 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20738:
-

 Summary: Enable Delete Event filtering in 
VectorizedOrcAcidRowBatchReader
 Key: HIVE-20738
 URL: https://issues.apache.org/jira/browse/HIVE-20738
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Currently DeleteEventRegistry loads all delete events which can take time and 
use a lot of memory.  Should minimize the number of deletes loaded based on the 
insert events included in the Split.
This is an umbrella jira for several tasks that make up the work.  See 
individual tasks for details.
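The core idea, stripped of ORC details: if the split knows the smallest and largest `ROW__ID` among its insert events, only delete events targeting a `ROW__ID` in that range can affect it. A sketch with a hypothetical `RowId` ordered by (writeId, bucketProperty, rowId); in the reader the check would ideally be applied while loading (so the events are never materialized) rather than to an in-memory list as here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of range-based delete event filtering. */
final class DeleteEventFilter {
  static final class RowId {
    final long writeId;
    final int bucketProperty;
    final long rowId;

    RowId(long writeId, int bucketProperty, long rowId) {
      this.writeId = writeId;
      this.bucketProperty = bucketProperty;
      this.rowId = rowId;
    }
  }

  // Assumed sort order: writeId, then bucket property, then row id.
  static final Comparator<RowId> ORDER = Comparator
      .comparingLong((RowId r) -> r.writeId)
      .thenComparingInt(r -> r.bucketProperty)
      .thenComparingLong(r -> r.rowId);

  /** Keeps only delete events whose target falls in [min, max] of the split. */
  static List<RowId> relevant(List<RowId> deleteEvents, RowId min, RowId max) {
    List<RowId> kept = new ArrayList<>();
    for (RowId d : deleteEvents) {
      if (ORDER.compare(d, min) >= 0 && ORDER.compare(d, max) <= 0) {
        kept.add(d);
      }
    }
    return kept;
  }
}
```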





[jira] [Created] (HIVE-20730) Do delete event filtering even if hive.acid.index is not there

2018-10-11 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20730:
-

 Summary: Do delete event filtering even if hive.acid.index is not 
there
 Key: HIVE-20730
 URL: https://issues.apache.org/jira/browse/HIVE-20730
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman


Since HIVE-16812, {{VectorizedOrcAcidRowBatchReader}} filters delete events 
based on the min/max ROW__ID in the split, which relies on {{hive.acid.index}} being 
in the ORC footer.

There is no way to generate {{hive.acid.index}} from a plain query as in 
HIVE-20699, so we need to make sure that we generate a SARG into 
delete_delta/bucket_x based on stripe stats even if the index is missing.
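Without `hive.acid.index`, a coarser but still safe bound can come from per-column statistics: the min and max of the write-id column over the stripes a split covers bound the write IDs of every insert event in it. A sketch of that reduction, with stripe stats modeled as hypothetical `long[]{min, max}` pairs; building an actual ORC SARG is not shown. Because column min/max are independent, this bounds only the write ID, not the full `ROW__ID` triple.

```java
import java.util.List;

/** Hypothetical: derive a conservative writeId range from stripe-level stats. */
final class StripeStatsBound {
  /** Each element is {minWriteId, maxWriteId} for one stripe in the split. */
  static long[] writeIdRange(List<long[]> stripeMinMax) {
    if (stripeMinMax.isEmpty()) {
      throw new IllegalArgumentException("split covers no stripes");
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long[] s : stripeMinMax) {
      min = Math.min(min, s[0]);
      max = Math.max(max, s[1]);
    }
    return new long[] {min, max};
  }

  /** A delete event can only matter if its target writeId lies in the range. */
  static boolean mayApply(long deleteTargetWriteId, long[] range) {
    return deleteTargetWriteId >= range[0] && deleteTargetWriteId <= range[1];
  }
}
```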





[jira] [Created] (HIVE-20723) Allow per table specification of compaction yarn queue

2018-10-10 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20723:
-

 Summary: Allow per table specification of compaction yarn queue
 Key: HIVE-20723
 URL: https://issues.apache.org/jira/browse/HIVE-20723
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Affects Versions: 2.0.0
Reporter: Eugene Koifman


Currently compactions of full CRUD transactional tables are Map-Reduce jobs 
submitted to a yarn queue defined by the hive.compactor.job.queue property.

It would be useful to be able to override this on a per-table basis by putting it 
into table properties so that compactions for different tables can use 
different queues.

 

There is already the ability to override other compaction-related configs via table 
props, though this will need additional handling.

[https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-TableProperties]

 

See {{CompactorMR.COMPACTOR_PREFIX}} and {{Initiator.COMPACTORTHRESHOLD_PREFIX}}
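A sketch of the lookup order this would need: a per-table override from table properties first, then the global `hive.compactor.job.queue`. The table-property key (a `compactor.` prefix plus the global name) is an assumption meant to mirror how other compactor overrides are prefixed; the real key would follow `CompactorMR.COMPACTOR_PREFIX`. The resolved name would ultimately become the MapReduce job's `mapreduce.job.queuename`.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical resolution of the YARN queue for one table's compaction job. */
final class CompactionQueueResolver {
  static final String GLOBAL_KEY = "hive.compactor.job.queue";
  // Assumption: per-table overrides reuse a "compactor." prefix in TBLPROPERTIES.
  static final String TABLE_KEY = "compactor." + GLOBAL_KEY;

  static Optional<String> resolve(Map<String, String> tableProps, Map<String, String> conf) {
    String perTable = tableProps.get(TABLE_KEY);
    if (perTable != null && !perTable.isEmpty()) {
      return Optional.of(perTable);
    }
    String global = conf.get(GLOBAL_KEY);
    return (global == null || global.isEmpty()) ? Optional.empty() : Optional.of(global);
  }
}
```

An empty result means leaving the job's queue unset so the cluster default applies.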

 

 

 





Re: Review Request 68805: HIVE-20538

2018-10-09 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68805/#review209387
---




ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsWithSplitUpdateAndVectorization.java
Line 25 (original), 30 (patched)
<https://reviews.apache.org/r/68805/#comment293781>

What is this change for?  TestTxnCommands is a subclass of 
TxnCommandsBaseForTests.  I think this means none of the TestTxnCommands tests 
run in vectorized mode any more.

More generally, what is the point of the other changes in this class?


- Eugene Koifman


On Sept. 21, 2018, 3:51 p.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68805/
> ---
> 
> (Updated Sept. 21, 2018, 3:51 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20538: Allow to store a key value together with a transaction.
> 
> 
> Diffs
> -
> 
>   
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommandsWithSplitUpdateAndVectorization.java
>  a013230025 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnKeyValue.java
>  PRE-CREATION 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnRequest.java
>  db47f9db8b 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  936f7c5a40 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  958f13c18e 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  a3dddf54e4 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  d226db50a5 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
>  54e7eda0da 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> ad83162ec3 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  1df1ebce49 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnStore.java
>  080cc5284b 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
>  ce590d0f55 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
>  db4dd9ec42 
> 
> 
> Diff: https://reviews.apache.org/r/68805/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jaume Marhuenda
> 
>



[jira] [Created] (HIVE-20699) Query based compactor for full CRUD Acid tables

2018-10-05 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20699:
-

 Summary: Query based compactor for full CRUD Acid tables
 Key: HIVE-20699
 URL: https://issues.apache.org/jira/browse/HIVE-20699
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Affects Versions: 3.1.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman








Re: Review Request 68834: HIVE-20556

2018-10-04 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68834/#review209243
---


Ship it!




Ship It!

- Eugene Koifman


On Sept. 24, 2018, 8:04 p.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68834/
> ---
> 
> (Updated Sept. 24, 2018, 8:04 p.m.)
> 
> 
> Review request for hive, Daniel Dai and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Expose an API to retrieve the TBL_ID from TBLS in the metastore tables
> 
> 
> Diffs
> -
> 
>   data/files/exported_table/_metadata 81fbf63a54 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestAuthorizationPreEventListener.java
>  05c00094d6 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestMetastoreAuthorizationProvider.java
>  767321332c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java f72e08c14f 
>   ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java ca4d36f30d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 
> ff411f62d5 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java
>  78ac909f72 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  22deffe1d3 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  38fac465d7 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  0192c6da31 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> 85a5c601e0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  ba82a9327c 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  64945060f7 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MTable.java
>  deeb97133d 
>   standalone-metastore/metastore-server/src/main/resources/package.jdo 
> 2a5f016b1f 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
>  4937d9d861 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStorePartitionSpecs.java
>  df83171648 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestTablesCreateDropAlterTruncate.java
>  bf302ed491 
> 
> 
> Diff: https://reviews.apache.org/r/68834/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jaume Marhuenda
> 
>



Re: Review Request 68834: HIVE-20556

2018-09-28 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68834/#review209103
---




standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
Lines 1808 (patched)
<https://reviews.apache.org/r/68834/#comment293363>

It would help debugging if the msg included cat.db.table + tblid that was 
set.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MTable.java
Lines 296 (patched)
<https://reviews.apache.org/r/68834/#comment293364>

I don't see anyone calling this - is this needed?



standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
Lines 1715 (patched)
<https://reviews.apache.org/r/68834/#comment293365>

it would make sense to check in the catch that you are getting the expected 
error msg


- Eugene Koifman


On Sept. 24, 2018, 8:04 p.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68834/
> ---
> 
> (Updated Sept. 24, 2018, 8:04 p.m.)
> 
> 
> Review request for hive, Daniel Dai and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Expose an API to retrieve the TBL_ID from TBLS in the metastore tables
> 
> 
> Diffs
> -
> 
>   data/files/exported_table/_metadata 81fbf63a54 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestAuthorizationPreEventListener.java
>  05c00094d6 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/security/TestMetastoreAuthorizationProvider.java
>  767321332c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java f72e08c14f 
>   ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHive.java ca4d36f30d 
>   
> ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 
> ff411f62d5 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java
>  78ac909f72 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  22deffe1d3 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  38fac465d7 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  0192c6da31 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> 85a5c601e0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  ba82a9327c 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  d27224b235 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/model/MTable.java
>  deeb97133d 
>   standalone-metastore/metastore-server/src/main/resources/package.jdo 
> 2a5f016b1f 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
>  4937d9d861 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStorePartitionSpecs.java
>  df83171648 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/client/TestTablesCreateDropAlterTruncate.java
>  bf302ed491 
> 
> 
> Diff: https://reviews.apache.org/r/68834/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jaume Marhuenda
> 
>



[jira] [Created] (HIVE-20655) Optimize arrayCopy in LlapRecordReader

2018-09-28 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20655:
-

 Summary: Optimize arrayCopy in LlapRecordReader
 Key: HIVE-20655
 URL: https://issues.apache.org/jira/browse/HIVE-20655
 Project: Hive
  Issue Type: Improvement
  Components: llap, Transactions
Affects Versions: 4.0.0
 Environment: followup to HIVE-19985
See Gopal's comment on 8/3/2018
https://issues.apache.org/jira/browse/HIVE-19985?focusedCommentId=16568707=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16568707

Reporter: Eugene Koifman








[jira] [Created] (HIVE-20654) remove masking of "Masked writeid"

2018-09-28 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20654:
-

 Summary: remove masking of "Masked writeid"
 Key: HIVE-20654
 URL: https://issues.apache.org/jira/browse/HIVE-20654
 Project: Hive
  Issue Type: Improvement
  Components: Test, Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


{{QOutProcessor}} has
{noformat}
ppm.add(new 
PatternReplacementPair(Pattern.compile("\\{\"writeid\":[1-9][0-9]*,\"bucketid\":"),
"{\"writeid\":### Masked writeid ###,\"bucketid\":"));
{noformat}

which causes something like 
{noformat}
{"writeid":### Masked writeid ###,"bucketid":536870912,"rowid":0}   2
{"writeid":### Masked writeid ###,"bucketid":536870912,"rowid":1}   3
{noformat}

in the {{*.q.out}} file.  For example, {{acid_meta_columns_decode.q}}

This was needed when {{ROW__ID}} contained the global transaction ID, which 
changed depending on which tests were run previously.  Since 3.0, {{ROW__ID}} 
uses a per-table writeId, which is stable and safe to put in the .out file.
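To make the effect of the quoted rule concrete, the same pattern and replacement can be run with plain `java.util.regex`. This only demonstrates what the masking does to a `ROW__ID` line; it is not `QOutProcessor` itself.

```java
import java.util.regex.Pattern;

/** Applies the writeid-masking rule quoted from QOutProcessor to one line. */
final class WriteIdMaskDemo {
  private static final Pattern WRITEID =
      Pattern.compile("\\{\"writeid\":[1-9][0-9]*,\"bucketid\":");
  private static final String MASK = "{\"writeid\":### Masked writeid ###,\"bucketid\":";

  static String mask(String line) {
    // MASK contains no '$' or backslash, so it is safe as a replaceAll replacement.
    return WRITEID.matcher(line).replaceAll(MASK);
  }
}
```

Note that a writeid of 0 is left untouched because the pattern requires a leading digit 1-9.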





[jira] [Created] (HIVE-20640) Upgrade Hive to use ORC 1.5.3

2018-09-26 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20640:
-

 Summary: Upgrade Hive to use ORC 1.5.3
 Key: HIVE-20640
 URL: https://issues.apache.org/jira/browse/HIVE-20640
 Project: Hive
  Issue Type: Improvement
Affects Versions: 4.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman








[jira] [Created] (HIVE-20635) VectorizedOrcAcidRowBatchReader doesn't filter delete events for original files

2018-09-25 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20635:
-

 Summary: VectorizedOrcAcidRowBatchReader doesn't filter delete 
events for original files
 Key: HIVE-20635
 URL: https://issues.apache.org/jira/browse/HIVE-20635
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


this is a followup to HIVE-16812 which adds support for delete event filtering 
for splits from native acid files

need to add the same for {{OrcSplit.isOriginal()}}





Re: Review Request 68805: HIVE-20538

2018-09-24 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68805/#review208955
---




standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
Lines 2929 (patched)
<https://reviews.apache.org/r/68805/#comment293223>

This comment seems confusing to me.  Maybe give Kafka offset as a concrete 
example or point to some wiki where this API is documented.

For example,
"...for example to know if a transaction has
  already been committed"
Which transaction is this talking about?



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 1095 (patched)
<https://reviews.apache.org/r/68805/#comment293218>

I think a MetaException would be better (or IllegalState/Argument).  
SQLException is generally produced by the DB and has sqlstate/sqlcode that 
various handlers try to examine.



standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
Lines 1105 (patched)
<https://reviews.apache.org/r/68805/#comment293219>

MetaException.  Also, it should at least include info to help identify what 
exactly failed, i.e. txnid, tableid, param/value.  Without it, it's impossible to 
correlate this error with the batch id, etc.  I'd also add a LOG.warn() so that it's 
visible in the log file.
It seems you have a requirement that the parameter exist.  Perhaps as part 
of the error code path, you can do another query to see if it does exist - I bet 
that would be a common error.
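A minimal sketch of such an error path. To keep it self-contained it uses IllegalStateException (one of the alternatives suggested above) in place of MetaException, and java.util.logging in place of the metastore's logger. The helper name, its parameters and the message format are hypothetical.

```java
import java.util.logging.Logger;

final class KeyValueErrorSketch {
    private static final Logger LOG = Logger.getLogger(KeyValueErrorSketch.class.getName());

    // Hypothetical helper: builds (and logs) an exception that names everything
    // needed to correlate the failure. IllegalStateException stands in for MetaException.
    static IllegalStateException keyValueUpdateFailed(long txnId, long tableId, String key,
                                                      String value, boolean paramExists) {
        StringBuilder msg = new StringBuilder("Failed to update table parameter on commit: txnid=")
            .append(txnId).append(", tableid=").append(tableId)
            .append(", key=").append(key).append(", value=").append(value);
        if (!paramExists) {
            // Result of the extra "does the parameter exist?" lookup, surfaced in the message.
            msg.append(" (parameter does not exist on the table)");
        }
        LOG.warning(msg.toString());
        return new IllegalStateException(msg.toString());
    }
}
```

The caller would write `throw KeyValueErrorSketch.keyValueUpdateFailed(...)`, so the log line and the exception always carry the same identifiers.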



standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
Lines 2289 (patched)
<https://reviews.apache.org/r/68805/#comment293220>

Why not make (tableid, key, value) its own object?  Then this object in 
CommitTxnRequest can be optional, but all 3 fields in it can be mandatory.  As 
it is, you are checking if they are set here and in TxnHandler.commit...
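A sketch of that shape in plain Java, assuming hypothetical class names (the real change would be a new Thrift struct with required fields, referenced as an optional field of CommitTxnRequest). The whole pair is optional, but once present each of its fields is guaranteed set, so the "are they set?" checks live in one constructor.

```java
import java.util.Objects;
import java.util.Optional;

final class CommitTxnKeyValueSketch {
    // Hypothetical value object: all three fields are mandatory.
    static final class TxnKeyValue {
        final long tableId;
        final String key;
        final String value;

        TxnKeyValue(long tableId, String key, String value) {
            this.tableId = tableId;
            this.key = Objects.requireNonNull(key, "key");
            this.value = Objects.requireNonNull(value, "value");
        }
    }

    // Hypothetical stand-in for CommitTxnRequest: the key/value pair as a whole is optional.
    static final class CommitRequest {
        final long txnId;
        final Optional<TxnKeyValue> keyValue;

        CommitRequest(long txnId, TxnKeyValue keyValue) {
            this.txnId = txnId;
            this.keyValue = Optional.ofNullable(keyValue);
        }
    }
}
```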



standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
Lines 101 (patched)
<https://reviews.apache.org/r/68805/#comment293221>

Nit: what is the advantage of using direct JDBC calls to modify the 
metastore DBMS?  Why not run "create table ...", "alter table ..." through 
Driver and "describe table" to see the value?



standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
Lines 135 (patched)
<https://reviews.apache.org/r/68805/#comment293222>

Should probably check that you got the right exception, not just "any 
exception", i.e. check the message.
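JUnit 4.13+ and JUnit 5 provide assertThrows, which returns the thrown exception so its message can be asserted on. The helper below is a hypothetical plain-Java stand-in that does the same without a test framework: it fails if nothing is thrown, if the type is wrong, or if the message lacks the expected fragment.

```java
final class ExpectMessageSketch {
    // Runs the action and returns what it threw, failing unless it is an instance of
    // `type` whose message contains `messageFragment`.
    static <T extends Throwable> T expectThrows(Class<T> type, String messageFragment,
                                                Runnable action) {
        try {
            action.run();
        } catch (Throwable t) {
            if (!type.isInstance(t)) {
                throw new AssertionError("Expected " + type.getName() + " but got " + t, t);
            }
            String msg = t.getMessage();
            if (msg == null || !msg.contains(messageFragment)) {
                throw new AssertionError("Unexpected message: " + msg, t);
            }
            return type.cast(t);
        }
        throw new AssertionError("Expected " + type.getName() + " to be thrown");
    }
}
```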


- Eugene Koifman


On Sept. 21, 2018, 3:51 p.m., Jaume Marhuenda wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68805/
> ---
> 
> (Updated Sept. 21, 2018, 3:51 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20538: Allow to store a key value together with a transaction.
> 
> 
> Diffs
> -
> 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/CommitTxnRequest.java
>  db47f9db8b 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-php/metastore/Types.php
>  22deffe1d3 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-py/hive_metastore/ttypes.py
>  38fac465d7 
>   
> standalone-metastore/metastore-common/src/gen/thrift/gen-rb/hive_metastore_types.rb
>  0192c6da31 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  df6d56b679 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/IMetaStoreClient.java
>  54e7eda0da 
>   standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift 
> 85a5c601e0 
>   
> standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/txn/TxnHandler.java
>  d76049eda1 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClientPreCatalog.java
>  ce590d0f55 
>   
> standalone-metastore/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStoreTxns.java
>  db4dd9ec42 
> 
> 
> Diff: https://reviews.apache.org/r/68805/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Jaume Marhuenda
> 
>



[jira] [Created] (HIVE-20604) Minor compaction disables ORC column stats

2018-09-19 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20604:
-

 Summary: Minor compaction disables ORC column stats
 Key: HIVE-20604
 URL: https://issues.apache.org/jira/browse/HIVE-20604
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 4.0.0


{noformat}
  @Override
  public org.apache.hadoop.hive.ql.exec.FileSinkOperator.RecordWriter
      getRawRecordWriter(Path path, Options options) throws IOException {
    final Path filename = AcidUtils.createFilename(path, options);
    final OrcFile.WriterOptions opts =
        OrcFile.writerOptions(options.getTableProperties(),
            options.getConfiguration());
    if (!options.isWritingBase()) {
      opts.bufferSize(OrcRecordUpdater.DELTA_BUFFER_SIZE)
          .stripeSize(OrcRecordUpdater.DELTA_STRIPE_SIZE)
          .blockPadding(false)
          .compress(CompressionKind.NONE)
          .rowIndexStride(0)
          ;
    }
{noformat}

{{rowIndexStride(0)}} makes {{StripeStatistics.getColumnStatistics()}} return 
objects, but with meaningless values, e.g. min/max for 
{{IntegerColumnStatistics}} set to MIN_LONG/MAX_LONG.

This interferes with the ability to infer the min ROW__ID for a split and also 
creates inefficient files.





[jira] [Created] (HIVE-20581) Eliminate rename() from full CRUD transactional tables

2018-09-17 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20581:
-

 Summary: Eliminate rename() from full CRUD transactional tables
 Key: HIVE-20581
 URL: https://issues.apache.org/jira/browse/HIVE-20581
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Reporter: Eugene Koifman


The {{MoveTask}} in a query writing to a full CRUD transactional table still 
performs a {{FileSystem.rename()}}.  Full CRUD should follow the insert-only 
transactional table implementation and write directly to delta_x_x in the 
partition dir.  If the txn fails, this delta will be marked aborted and will 
not be read.
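A simplified sketch of the "write directly, let readers skip aborted deltas" model. The 7-digit zero padding is an assumption modeled on names like delta_0000005_0000005, and the visibility check deliberately ignores open txns and the high-water mark; both methods are hypothetical, not AcidUtils.

```java
import java.util.Locale;
import java.util.Set;

final class DirectDeltaWriteSketch {
    // Name of the delta a single write ID would write into directly (no rename).
    // Locale.ROOT keeps the digits ASCII regardless of the default locale.
    static String deltaDirName(long writeId) {
        return String.format(Locale.ROOT, "delta_%07d_%07d", writeId, writeId);
    }

    // A reader ignores a delta_x_x whose write ID belongs to an aborted txn, so a
    // failed writer leaves nothing visible behind. Simplified: no open-txn or
    // high-water-mark handling.
    static boolean isVisible(long writeId, Set<Long> abortedWriteIds) {
        return !abortedWriteIds.contains(writeId);
    }
}
```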

There are several places that rely on this rename, for example, support for 
{{Insert ... Select ... Union All ... Select}}, which creates multiple dirs, one 
for each leg of the union.

Others?





[jira] [Created] (HIVE-20580) OrcInputFormat.isOriginal() should not rely on hive.acid.key.index

2018-09-17 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20580:
-

 Summary: OrcInputFormat.isOriginal() should not rely on 
hive.acid.key.index
 Key: HIVE-20580
 URL: https://issues.apache.org/jira/browse/HIVE-20580
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.1.0
Reporter: Eugene Koifman


{{org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.isOriginal()}} checks 
for the presence of {{hive.acid.key.index}} in the footer.  This is only created 
when the file is written by {{OrcRecordUpdater}}.  It should instead check for 
the presence of Acid metadata columns so that a file can be produced by something 
other than {{OrcRecordUpdater}}.
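A sketch of a schema-based check, assuming the top-level acid event fields are operation, originalTransaction, bucket, rowId, currentTransaction and row (my understanding of OrcRecordUpdater's event schema; worth verifying). With the ORC API the names could come from the reader's schema, e.g. `reader.getSchema().getFieldNames()`; here they are passed in as a list, and the class is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class AcidSchemaCheckSketch {
    // Assumed top-level fields of an acid ORC file, in order.
    static final List<String> ACID_FIELDS = Collections.unmodifiableList(Arrays.asList(
        "operation", "originalTransaction", "bucket", "rowId", "currentTransaction", "row"));

    // "Original" unless the top-level struct is exactly the acid event layout,
    // independent of which writer produced the file or what footer metadata it has.
    static boolean isOriginal(List<String> topLevelFieldNames) {
        return !ACID_FIELDS.equals(topLevelFieldNames);
    }
}
```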

Also, {{hive.acid.key.index}} counts the number of events of each type, which 
is not really useful for Acid V2 (as of Hive 3) since each file only has 1 type 
of event.





[jira] [Created] (HIVE-20579) VectorizedOrcAcidRowBatchReader.checkBucketId() should run for unbucketed tables

2018-09-17 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20579:
-

 Summary: VectorizedOrcAcidRowBatchReader.checkBucketId() should 
run for unbucketed tables
 Key: HIVE-20579
 URL: https://issues.apache.org/jira/browse/HIVE-20579
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.1.0
Reporter: Eugene Koifman


VectorizedOrcAcidRowBatchReader.checkBucketId() currently bails out for unbucketed 
tables.  Since HIVE-19890, BucketCodec.decodeWriterId(ROW__ID.bucketid) of every 
row should match the writer ID in the file name (e.g. bucket_1), so it should 
still perform the check.
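A sketch of the check, assuming the BucketCodec V1 layout as I understand it: version in bits 29-31, writer (bucket) ID in bits 16-27, statement ID in the low 12 bits. The class and methods are hypothetical, not Hive's BucketCodec API. Under that layout the bucketid 536870912 seen in the .q.out sample earlier in this digest decodes to version 1, writer 0.

```java
final class BucketIdCheckSketch {
    static int decodeVersion(int bucketProperty) {
        return bucketProperty >>> 29;
    }

    static int decodeWriterId(int bucketProperty) {
        return (bucketProperty & 0x0FFF0000) >>> 16;
    }

    // Parses the leading digits after "bucket_", e.g. "bucket_00001" -> 1.
    static int writerIdFromFileName(String fileName) {
        if (!fileName.startsWith("bucket_")) {
            throw new IllegalArgumentException("Not a bucket file: " + fileName);
        }
        String rest = fileName.substring("bucket_".length());
        int end = 0;
        while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
            end++;
        }
        return Integer.parseInt(rest.substring(0, end));
    }

    // The check applied to every row, whether or not the table is bucketed.
    static boolean bucketIdMatches(String fileName, int bucketProperty) {
        return decodeWriterId(bucketProperty) == writerIdFromFileName(fileName);
    }
}
```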






[jira] [Created] (HIVE-20553) more acid stats tests

2018-09-13 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20553:
-

 Summary: more acid stats tests
 Key: HIVE-20553
 URL: https://issues.apache.org/jira/browse/HIVE-20553
 Project: Hive
  Issue Type: Improvement
  Components: Statistics, Transactions
Affects Versions: 4.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Attachments: HIVE-20553.01.patch







[jira] [Created] (HIVE-20460) AcidUtils.Directory.getAbortedDirectories() may be missed for full CRUD tables

2018-08-24 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20460:
-

 Summary: AcidUtils.Directory.getAbortedDirectories() may be missed 
for full CRUD tables
 Key: HIVE-20460
 URL: https://issues.apache.org/jira/browse/HIVE-20460
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


{{Directory.getAbortedDirectories()}} lists deltas where all txns in the range 
are aborted.

These are then purged by {{Worker}} ({{CompactorMR}}), but only for insert-only 
tables.
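A sketch of the "every ID in the delta's range is aborted" test that getAbortedDirectories() applies, in plain Java with hypothetical names. It assumes delta_min_max[_stmt] directory names carrying write IDs (transaction IDs before Hive 3) and a set of aborted IDs supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class AbortedDeltaSketch {
    // Parses "delta_<min>_<max>" (optionally followed by "_<stmtId>") into {min, max}.
    static long[] writeIdRange(String dirName) {
        String[] parts = dirName.split("_");
        if (parts.length < 3 || !parts[0].equals("delta")) {
            throw new IllegalArgumentException("Not a delta dir: " + dirName);
        }
        return new long[] {Long.parseLong(parts[1]), Long.parseLong(parts[2])};
    }

    // Returns the deltas in which every ID in [min, max] is aborted.
    static List<String> abortedDirectories(List<String> deltaDirs, Set<Long> abortedIds) {
        List<String> result = new ArrayList<>();
        for (String dir : deltaDirs) {
            long[] range = writeIdRange(dir);
            boolean allAborted = true;
            for (long id = range[0]; id <= range[1]; id++) {
                if (!abortedIds.contains(id)) {
                    allAborted = false;
                    break;
                }
            }
            if (allAborted) {
                result.add(dir);
            }
        }
        return result;
    }
}
```

A delta such as delta_0000011_0000020 with only ID 13 aborted is not in the result: it still holds committed data, so only the aborted rows inside it must be filtered out, not the whole directory.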

Full CRUD tables currently rely on {{FileSystem.rename()}} in {{MoveTask}}, and 
so no reader (or {{Cleaner}}) should ever see a delta where all data is 
aborted.  


Once rename() is eliminated for full CRUD (just like insert-only) transactional 
tables, Cleaner (or Worker) should take care of these.






[jira] [Created] (HIVE-20459) add ThriftHiveMetastore.get_open_txns(long txnid)

2018-08-24 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20459:
-

 Summary: add ThriftHiveMetastore.get_open_txns(long txnid)
 Key: HIVE-20459
 URL: https://issues.apache.org/jira/browse/HIVE-20459
 Project: Hive
  Issue Type: Improvement
  Components: Metastore, Transactions
Reporter: Eugene Koifman


We currently have {{ThriftHiveMetastore.get_open_txns()}}, which maps to 
{{TxnHandler.getOpenTxns()}}.  The usual usage is 
{{TxnUtils.createValidReadTxnList(GetOpenTxnsResponse txns, long currentTxn)}}, 
where the complete list of transactions is obtained from the Metastore and then 
anything above currentTxn is thrown away.  
It would be useful to add {{ThriftHiveMetastore.get_open_txns(long txnid)}} and 
{{TxnHandler.getOpenTxns(long)}} so as not to retrieve things that will be thrown 
away, especially when there are a lot of running transactions.
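A simplified sketch of the trimming step (it ignores aborted bits and the high-water mark handling in the real code). It assumes the open txn list is sorted ascending, so it can stop at the first ID above currentTxn. The proposed get_open_txns(long txnid) would apply the same bound on the server, so the discarded tail is never sent. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class OpenTxnsSketch {
    // Keeps only open txn ids at or below currentTxn; input must be sorted ascending.
    static List<Long> upTo(List<Long> sortedOpenTxns, long currentTxn) {
        List<Long> kept = new ArrayList<>();
        for (long txn : sortedOpenTxns) {
            if (txn > currentTxn) {
                break; // everything after this is also above currentTxn
            }
            kept.add(txn);
        }
        return kept;
    }
}
```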





[jira] [Created] (HIVE-20458) hive-schema-3.1.0.postgres.sql - some tables are not quoted

2018-08-24 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20458:
-

 Summary: hive-schema-3.1.0.postgres.sql - some tables are not 
quoted
 Key: HIVE-20458
 URL: https://issues.apache.org/jira/browse/HIVE-20458
 Project: Hive
  Issue Type: Improvement
  Components: Standalone Metastore, Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


A number of tables related to transactional metadata are not quoted in this 
script:

COMPACTION_QUEUE, HIVE_LOCKS, etc.

This causes Postgres to create the tables in lower case.  The table creation 
scripts should follow the same convention as other tables.

hive-schema...mysql.sql also doesn't quote Create Table for acid meta tables.






[jira] [Created] (HIVE-20454) CLONE - extend inheritPerms to ACID in Hive 1.X

2018-08-23 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20454:
-

 Summary: CLONE - extend inheritPerms to ACID in Hive 1.X
 Key: HIVE-20454
 URL: https://issues.apache.org/jira/browse/HIVE-20454
 Project: Hive
  Issue Type: Bug
Reporter: Eugene Koifman
Assignee: Sergey Shelukhin
 Fix For: 2.4.0








[jira] [Created] (HIVE-20436) Lock Manager scalability - linear

2018-08-21 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20436:
-

 Summary: Lock Manager scalability - linear
 Key: HIVE-20436
 URL: https://issues.apache.org/jira/browse/HIVE-20436
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Hive TransactionManager currently has a mix of lock-based and optimistic 
concurrency management techniques (which at times overlap).
For dynamic partition inserts that represent an update/merge, it acquires 
locks on each existing partition, which can flood the metastore DB.
Need to clean up the logical model and the implementation.

This will be an umbrella Jira for this work.





[jira] [Created] (HIVE-20435) Failed Dynamic Partition Insert into insert only table may lose transaction metadata

2018-08-21 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20435:
-

 Summary: Failed Dynamic Partition Insert into insert only table 
may lose transaction metadata
 Key: HIVE-20435
 URL: https://issues.apache.org/jira/browse/HIVE-20435
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


{{TxnHandler.enqueueLockWithRetry()}} has an optimization where it doesn't write 
to {{TXN_COMPONENTS}} if the write is a dynamic partition insert, because it 
expects to write to this table from {{addDynamicPartitions()}}.  

For insert-only transactional tables, we create the target dir and start 
writing to it before {{addDynamicPartitions()}} is called.  So if a txn is 
aborted, we may have a delta dir in the partition but no corresponding entry in 
{{TXN_COMPONENTS}}.  This means {{TxnStore.cleanEmptyAbortedTxns()}} may clean 
up the {{TXNS}} entry for the aborted transaction before the Compactor removes this 
delta dir, at which point it looks like committed data.

Full CRUD tables are currently immune to this since they rely on the "move" 
operation in MoveTask, but longer term they should follow the same model as 
insert-only tables.





[jira] [Created] (HIVE-20410) aborted Insert Overwrite on transactional table causes "Not enough history available for..." error

2018-08-16 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20410:
-

 Summary: aborted Insert Overwrite on transactional table causes 
"Not enough history available for..." error
 Key: HIVE-20410
 URL: https://issues.apache.org/jira/browse/HIVE-20410
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Suppose 
insert overwrite T values(1)
is aborted.

This creates a base_x directory (for insert-only transactional tables currently, 
and for full CRUD once 'rename' in the MoveTask is eliminated), but a subsequent 
read fails with a "Not enough history available for..." error.

The problem is that the logic that produces this exception finds this base_x but 
treats it as if it was produced by the compactor, in which case the error would 
have been appropriate.






[jira] [Created] (HIVE-20392) make compaction atomic on S3

2018-08-14 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20392:
-

 Summary: make compaction atomic on S3
 Key: HIVE-20392
 URL: https://issues.apache.org/jira/browse/HIVE-20392
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Eugene Koifman
Assignee: Eugene Koifman








[jira] [Created] (HIVE-20369) TestPreUpgradeTool not run by ptest

2018-08-11 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20369:
-

 Summary: TestPreUpgradeTool not run by ptest
 Key: HIVE-20369
 URL: https://issues.apache.org/jira/browse/HIVE-20369
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Eugene Koifman
Assignee: Eugene Koifman


TestPreUpgradeTool is not showing up in ptest runs,
probably because the upgrade-acid module is disconnected from the root pom.

How does standalone-metastore work?  It's also disconnected.

Also, the hive-upgrade jar is not showing up in the tar with mvn package.





Re: Review Request 68281: HIVE-20354

2018-08-09 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68281/#review207051
---




ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java
Lines 972 (patched)
<https://reviews.apache.org/r/68281/#comment290236>

What if some table is named "select_table"?


- Eugene Koifman


On Aug. 9, 2018, 12:19 p.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68281/
> ---
> 
> (Updated Aug. 9, 2018, 12:19 p.m.)
> 
> 
> Review request for hive, Eugene Koifman and Jason Dere.
> 
> 
> Bugs: HIVE-20354
> https://issues.apache.org/jira/browse/HIVE-20354
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Semijoin hints don't work with merge statements.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f4d12ae564 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 463880587e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 
> 8df290435d 
>   ql/src/test/queries/clientpositive/semijoin_hint.q de176affd3 
>   ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 679916de07 
> 
> 
> Diff: https://reviews.apache.org/r/68281/diff/2/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



Re: Review Request 68281: HIVE-20354

2018-08-09 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68281/#review207043
---




ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java
Lines 1000 (patched)
<https://reviews.apache.org/r/68281/#comment290214>

Modifying the parse tree directly is not a good idea - it messes up internal 
ANTLR structures and may cause issues downstream.  You should inject the hint 
into 'rewrittenQueryStr' so that a complete new statement is parsed - that is 
the model for all other parts of Merge reparsing.
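A sketch of injecting the hint at string-building time rather than patching the AST. This is a hypothetical single INSERT ... SELECT leg, not the actual Merge rewrite (which builds a larger multi-insert statement). The hint text, e.g. a semijoin hint such as semi(alias, column, entries), is passed through verbatim and its argument format here is only illustrative. Emitting the hint while the SELECT is constructed also avoids searching the finished string for a keyword, which could misfire on a name like "select_table".

```java
final class MergeHintRewriteSketch {
    // Builds one rewritten leg, placing an optional /*+ ... */ hint directly after
    // SELECT so the re-parsed statement carries it like any user-written hint.
    static String insertLeg(String targetTable, String hint, String selectList, String fromClause) {
        StringBuilder rewrittenQueryStr = new StringBuilder();
        rewrittenQueryStr.append("INSERT INTO ").append(targetTable).append(" SELECT ");
        if (hint != null && !hint.isEmpty()) {
            rewrittenQueryStr.append("/*+ ").append(hint).append(" */ ");
        }
        rewrittenQueryStr.append(selectList).append(" FROM ").append(fromClause);
        return rewrittenQueryStr.toString();
    }
}
```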



ql/src/test/queries/clientpositive/semijoin_hint.q
Lines 116 (patched)
<https://reviews.apache.org/r/68281/#comment290215>

It may be useful to have one statement with the hint and another without it, to 
see clearly the difference in the plan.


- Eugene Koifman


On Aug. 9, 2018, 10:44 a.m., Deepak Jaiswal wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68281/
> ---
> 
> (Updated Aug. 9, 2018, 10:44 a.m.)
> 
> 
> Review request for hive, Eugene Koifman and Jason Dere.
> 
> 
> Bugs: HIVE-20354
> https://issues.apache.org/jira/browse/HIVE-20354
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Semijoin hints don't work with merge statements.
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g f4d12ae564 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 
> 463880587e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/UpdateDeleteSemanticAnalyzer.java 
> 8df290435d 
>   ql/src/test/queries/clientpositive/semijoin_hint.q de176affd3 
>   ql/src/test/results/clientpositive/llap/semijoin_hint.q.out 679916de07 
> 
> 
> Diff: https://reviews.apache.org/r/68281/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Deepak Jaiswal
> 
>



[jira] [Created] (HIVE-20327) Compactor should gracefully handle 0 length files and invalid orc files

2018-08-06 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20327:
-

 Summary: Compactor should gracefully handle 0 length files and 
invalid orc files
 Key: HIVE-20327
 URL: https://issues.apache.org/jira/browse/HIVE-20327
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 2.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Older versions of the Streaming API did not handle interrupts well and could leave 
0-length ORC files behind, which cannot be read.

These should just be skipped.

Other cases where an ORC Reader cannot be created:
1. A regular write (1 txn delta) where the client died and didn't properly close 
the file - this delta should be aborted and never read.
2. A streaming ingest write (delta_x_y, x < y).  There should always be a side 
file if the file was not closed properly (though it may still indicate that the 
length is 0).

If we check these cases and still can't create a reader, we should not silently 
skip the file: the system thinks it contains at least some committed data, 
but the file is corrupted (and the side file doesn't point at a valid footer).
We should never be in this situation, so we should throw so that the end user 
can try manual intervention (where the only option may be deleting the file).
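The cases above can be sketched as one decision function. The inputs (whether a Reader could be created, the length recorded in the side file, and so on) are hypothetical parameters that a real implementation would have to compute from the file system and txn metadata; this is not Compactor code.

```java
final class CompactorFileCheckSketch {
    enum Action { SKIP, READ, READ_UP_TO_SIDE_FILE_LENGTH, FAIL }

    /**
     * @param fileLength         length of the ORC file on disk
     * @param readerCreated      whether an ORC Reader could be created for the whole file
     * @param txnAborted         whether the single-txn delta belongs to an aborted txn
     * @param sideFileLength     length recorded in the side file, or -1 if there is none
     * @param readerAtSideLength whether a Reader could be created at sideFileLength
     */
    static Action decide(long fileLength, boolean readerCreated, boolean txnAborted,
                         long sideFileLength, boolean readerAtSideLength) {
        if (txnAborted) {
            return Action.SKIP;                        // case 1: aborted delta, never read
        }
        if (readerCreated) {
            return Action.READ;
        }
        if (fileLength == 0 || sideFileLength == 0) {
            return Action.SKIP;                        // empty file, or nothing flushed yet
        }
        if (sideFileLength > 0 && readerAtSideLength) {
            return Action.READ_UP_TO_SIDE_FILE_LENGTH; // case 2: unclosed streaming delta
        }
        return Action.FAIL;                            // committed data expected but unreadable
    }
}
```

Returning FAIL, rather than quietly treating the file as empty, is what forces the manual intervention described above.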





[jira] [Created] (HIVE-20324) change hive.compactor.max.num.delta default to 50

2018-08-06 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20324:
-

 Summary: change hive.compactor.max.num.delta default to 50
 Key: HIVE-20324
 URL: https://issues.apache.org/jira/browse/HIVE-20324
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 2.0.0
Reporter: Eugene Koifman


The current default is 500, which is way too high.  OOM is likely at 50 or so.
Need to update the default.






[jira] [Created] (HIVE-20313) consider making ROW__ID a 1st class object

2018-08-03 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20313:
-

 Summary: consider making ROW__ID a 1st class object
 Key: HIVE-20313
 URL: https://issues.apache.org/jira/browse/HIVE-20313
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 0.11.0
Reporter: Eugene Koifman


ROW__ID, which is a struct that represents a unique row ID within a partition 
of a full CRUD transactional table, is currently modeled as a {{VirtualColumn}}.
Acid metadata columns from which ROW__ID is built are actually stored in the 
data file.  

There is no end to the special handling of acid metadata columns in the code 
needed to make this work.

Perhaps a better approach is to add a struct column to an acid table at creation 
time and make it a 1st class citizen visible in the metastore.  'select 
count(*)' would need special handling to remove it.  There may need to be 
a way to make these columns read-only.
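To make the proposed struct concrete, here is a sketch of a ROW__ID value with the three fields seen in .q.out output (writeid, bucketid, rowid). The ordering by (writeId, bucketId, rowId) is an assumption modeled on how acid files are sorted. The class is hypothetical and not Hive's RecordIdentifier.

```java
import java.util.Objects;

final class RowIdSketch implements Comparable<RowIdSketch> {
    final long writeId;  // "writeid": write ID of the txn that created the row
    final int bucketId;  // "bucketid": encoded bucket property, e.g. 536870912
    final long rowId;    // "rowid": row number assigned by the writer

    RowIdSketch(long writeId, int bucketId, long rowId) {
        this.writeId = writeId;
        this.bucketId = bucketId;
        this.rowId = rowId;
    }

    // Orders by (writeId, bucketId, rowId).
    @Override
    public int compareTo(RowIdSketch other) {
        int c = Long.compare(writeId, other.writeId);
        if (c != 0) {
            return c;
        }
        c = Integer.compare(bucketId, other.bucketId);
        if (c != 0) {
            return c;
        }
        return Long.compare(rowId, other.rowId);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof RowIdSketch && compareTo((RowIdSketch) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(writeId, bucketId, rowId);
    }

    // Same shape as the ROW__ID values printed in .q.out files.
    @Override
    public String toString() {
        return "{\"writeid\":" + writeId + ",\"bucketid\":" + bucketId + ",\"rowid\":" + rowId + "}";
    }
}
```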

For data added via Load Data, Add Partition, etc. (i.e. original files in a CRUD 
table), the acid reader would have to fill in the values as it does today.

This would make schema evolution, PPD, and projection pruning work seamlessly.
This should also make adding formats other than ORC to full CRUD tables easy.

This will likely be painful but should be investigated.







[jira] [Created] (HIVE-20305) LlapRecordReader uses OrcInputFormat.getRootColumn(false)

2018-08-03 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20305:
-

 Summary: LlapRecordReader uses OrcInputFormat.getRootColumn(false)
 Key: HIVE-20305
 URL: https://issues.apache.org/jira/browse/HIVE-20305
 Project: Hive
  Issue Type: Bug
  Components: llap, Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


LlapRecordReader uses OrcInputFormat.getRootColumn(false), so it seems to assume 
that if {{AcidUtils.isFullAcidScan(jobConf)}}, then the underlying file has acid meta 
columns in it.  That is not true for data added via Load Data or Add Partition, 
or for a flat table converted to full CRUD acid via Alter Table (by setting the 
transactional=true table property).
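A sketch of why the flag matters, assuming ORC's depth-first column numbering. Column 0 is the top-level struct. In an acid file, columns 1-5 are the metadata fields and column 6 is the nested user "row" struct; in an original file the user columns hang directly off column 0. The methods are hypothetical. rootColumn() reflects my understanding of what getRootColumn(isOriginal) returns.

```java
final class RootColumnSketch {
    // Column id of the nested "row" struct in an acid file (after 5 metadata columns).
    static final int ACID_ROW_STRUCT_COLUMN = 6;

    // Assumed equivalent of OrcInputFormat.getRootColumn(isOriginal).
    static int rootColumn(boolean isOriginal) {
        return isOriginal ? 0 : ACID_ROW_STRUCT_COLUMN;
    }

    // Decide from the file itself: a full-acid scan can still hit original files
    // (Load Data, Add Partition, converted flat tables), which have no acid columns.
    static int rootColumnFor(boolean fileHasAcidColumns) {
        return rootColumn(!fileHasAcidColumns);
    }
}
```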

cc [~teddy.choi]






[jira] [Created] (HIVE-20234) Add an option to disable stats computation from Compactor

2018-07-24 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20234:
-

 Summary: Add an option to disable stats computation from Compactor
 Key: HIVE-20234
 URL: https://issues.apache.org/jira/browse/HIVE-20234
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman


Currently {{Worker.StatsUpdater}} will run {{analyze table ... compute 
statistics for columns ...}} at the end of each Major compaction to update 
stats on columns that already have stats.

It would be useful to add a config option that allows better control over this.  
It could have 3 values: don't update col stats, update existing col stats, 
update all col stats.
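A sketch of how such a three-valued option might be modeled. The enum, its values and the helper are hypothetical (no such setting exists yet). The helper returns the columns the post-compaction "analyze table ... compute statistics for columns" would cover in each mode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class CompactorStatsModeSketch {
    // Hypothetical values for the proposed option; names are illustrative only.
    enum ColStatsMode {
        NONE,     // don't update column stats after Major compaction
        EXISTING, // update only columns that already have stats (today's behavior)
        ALL;      // compute stats for all columns

        static ColStatsMode parse(String value) {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }

    static List<String> columnsToAnalyze(ColStatsMode mode, List<String> allColumns,
                                         List<String> columnsWithStats) {
        switch (mode) {
            case NONE:
                return Collections.emptyList();
            case EXISTING:
                return new ArrayList<>(columnsWithStats);
            case ALL:
            default:
                return new ArrayList<>(allColumns);
        }
    }
}
```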

Should this have the ability to update table-level stats?  Is that needed given 
HIVE-19532?





[jira] [Created] (HIVE-20218) make sure Statement.executeUpdate() returns number of rows affected

2018-07-20 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20218:
-

 Summary: make sure Statement.executeUpdate() returns number of 
rows affected
 Key: HIVE-20218
 URL: https://issues.apache.org/jira/browse/HIVE-20218
 Project: Hive
  Issue Type: Improvement
  Components: JDBC, Transactions
Affects Versions: 1.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


HiveStatement and HivePreparedStatement currently return 0 from executeUpdate() in all cases.





Re: Review Request 67969: HIVE-20115 Acid tables should not use footer scan for analyze

2018-07-19 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67969/#review206245
---




ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java
Line 87 (original), 88 (patched)
<https://reviews.apache.org/r/67969/#comment289150>

This was not introduced in this patch, but maybe this "if" should be a 
method, since the same condition is checked in 3 places, to keep them in sync.



ql/src/test/queries/clientpositive/acid_no_buckets.q
Lines 37 (patched)
<https://reviews.apache.org/r/67969/#comment289149>

I don't understand this comment.  There was an update/insert done (line 25) 
since the last analyze at lines 22-23.  Shouldn't the analyze at lines 34-35 change the stats?
Or are they auto-updated after each statement?


- Eugene Koifman


On July 18, 2018, 4:19 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67969/
> ---
> 
> (Updated July 18, 2018, 4:19 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRTableScan1.java 
> 64f9c70f05 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/ProcessAnalyzeTable.java 
> 03cceace40 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java 49709e596e 
>   
> ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkProcessAnalyzeTable.java
>  28d4de7f7b 
>   ql/src/test/queries/clientpositive/acid_no_buckets.q bcf9e0634b 
>   ql/src/test/results/clientpositive/llap/acid_no_buckets.q.out 36a6a5d5d1 
> 
> 
> Diff: https://reviews.apache.org/r/67969/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Created] (HIVE-20137) Truncate for Transactional tables should use base_x

2018-07-10 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20137:
-

 Summary: Truncate for Transactional tables should use base_x
 Key: HIVE-20137
 URL: https://issues.apache.org/jira/browse/HIVE-20137
 Project: Hive
  Issue Type: Improvement
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


This is a follow up to HIVE-19387.

Once we have a lock that blocks writers but not readers (HIVE-19369), it would 
make sense to make truncate create a new base_x, where x is the writeId of the 
current txn - the same as Insert Overwrite does.
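Why an (empty) base_x is enough: a reader uses the newest base it is allowed to see and ignores everything older. Readers whose snapshot predates x keep using the previous base, while later readers see the empty one. A simplified, hypothetical sketch of that selection, assuming plain base_&lt;digits&gt; names; deltas above the chosen base would still be read. Skipping bases from aborted txns also relates to the aborted Insert Overwrite case in HIVE-20410.

```java
import java.util.List;
import java.util.Set;

final class BaseSelectionSketch {
    // Returns the newest base_<writeId> a reader with this high-water mark may use,
    // skipping bases written by aborted txns; null if there is none.
    static String newestUsableBase(List<String> dirs, long highWatermark, Set<Long> abortedWriteIds) {
        String best = null;
        long bestWriteId = Long.MIN_VALUE;
        for (String dir : dirs) {
            if (!dir.startsWith("base_")) {
                continue; // deltas are handled separately
            }
            long writeId = Long.parseLong(dir.substring("base_".length()));
            boolean usable = writeId <= highWatermark && !abortedWriteIds.contains(writeId);
            if (usable && writeId > bestWriteId) {
                best = dir;
                bestWriteId = writeId;
            }
        }
        return best;
    }
}
```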

This would mean it can work w/o interfering with existing writers.





[jira] [Created] (HIVE-20119) permissions on files in transactional tables

2018-07-07 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-20119:
-

 Summary: permissions on files in transactional tables
 Key: HIVE-20119
 URL: https://issues.apache.org/jira/browse/HIVE-20119
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Reporter: Eugene Koifman
Assignee: Eugene Koifman


What should these be?  With doAs, they end up being owned by the user, and then, 
depending on the umask, the Cleaner may not be able to delete them - thus the 
compaction is marked as failed.





Re: Review Request 67712: HIVE-19820 add ACID stats support to background stats updater

2018-06-27 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67712/#review205443
---




itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
Line 424 (original), 424 (patched)
<https://reviews.apache.org/r/67712/#comment288363>

arg4? arg5? Is this decompiled code?



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
Lines 291 (patched)
<https://reviews.apache.org/r/67712/#comment288365>

There are several read ops in this txn - what semantics is the txn trying 
to achieve here?



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
Line 296 (original), 324 (patched)
<https://reviews.apache.org/r/67712/#comment288366>

0 is not a valid transaction id



ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java
Line 412 (original), 440 (patched)
<https://reviews.apache.org/r/67712/#comment288367>

0 is not a valid txn id


- Eugene Koifman


On June 22, 2018, 5:29 p.m., Sergey Shelukhin wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67712/
> ---
> 
> (Updated June 22, 2018, 5:29 p.m.)
> 
> 
> Review request for hive, Eugene Koifman and Seong (Steve) Yeom.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> see jira
> 
> 
> Diffs
> -
> 
>   
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
>  580bae9c3f1307325842a08275e085a8e31f9351 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUpdaterThread.java 
> ddca70497a3f51c3ec9ea532fac2a42aa36149b3 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorThread.java 
> dd0929f2b9748d83d55ccc271cec6aa07933bde1 
>   ql/src/test/org/apache/hadoop/hive/ql/stats/TestStatsUpdaterThread.java 
> 14f86eabbcf4bfc38c92294cd5d71d4905eb5c30 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  4296084381df1e109248820b96739a4eb5ee0490 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java
>  51e081b22fa27b013715bb6eddf7fbbcf6bbd061 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  9266879ad0134dbf87598af6f9305b73cc8c40ba 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
>  8cc9d2c586a411712d01d599ff2986f6ad5e0cfd 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
>  e4894fa12bfee78f51f3796e0ccaaf51c7ac4136 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  001c3edcff5a4d0ea67b73e83075b1f867342654 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  d6a882e8e98f92eefbdb7900bdf43e3274a21c5d 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/VerifyingObjectStore.java
>  c9a6a471cb7fc28845efb6d774601dba0cef2a85 
> 
> 
> Diff: https://reviews.apache.org/r/67712/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>



[jira] [Created] (HIVE-19965) Make HiveEndPoint use IMetaStoreClient.add_partition

2018-06-21 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19965:
-

 Summary: Make HiveEndPoint use IMetaStoreClient.add_partition
 Key: HIVE-19965
 URL: https://issues.apache.org/jira/browse/HIVE-19965
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


It currently uses "alter table add partition if exists...", which since 
HIVE-18814 requires an X lock on the table.  This blocks other streaming 
writers from making progress.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HIVE-19962) remove hive.txn.operational.properties

2018-06-21 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19962:
-

 Summary: remove hive.txn.operational.properties
 Key: HIVE-19962
 URL: https://issues.apache.org/jira/browse/HIVE-19962
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


hive.txn.operational.properties should be removed and references to it cleaned 
up.  Now that Acid V2 is in, it is no longer needed.





[jira] [Created] (HIVE-19961) Add partition if exists on transactional CRUD table acquires X lock

2018-06-21 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19961:
-

 Summary: Add partition if exists on transactional CRUD table 
acquires X lock
 Key: HIVE-19961
 URL: https://issues.apache.org/jira/browse/HIVE-19961
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


This is necessary for correctness, since each add partition consists of 2 parts:
 # Add Partition metadata object to metastore
 # Create a delta dir and copy data there.  

This means it's neither Atomic nor Isolated.  Isolation is fixed by using an X 
lock (which is currently on the table.  todo: see if it can be made on the 
partition being created; this may block table level locks...)

Atomicity would have to be addressed by adding a write ID to Partition so that 
it's not visible until the Hive transaction has committed.
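A minimal sketch of the write-ID idea above, assuming a reader snapshot made of a high watermark plus a set of open/aborted write IDs (loosely in the spirit of Hive's valid-write-ID lists). The classes PartitionVisibility, Partition and Snapshot are hypothetical illustrations, not Hive APIs: a partition is visible to a reader only if the write ID that created it is committed in that reader's snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only; these are not Hive classes.
final class PartitionVisibility {

    /** A partition tagged with the write ID of the transaction that created it. */
    static final class Partition {
        final String name;
        final long writeId;

        Partition(String name, long writeId) {
            this.name = name;
            this.writeId = writeId;
        }
    }

    /**
     * A reader's snapshot: a write ID counts as committed for this reader only if it
     * is at or below the high watermark and not in the open/aborted set.
     */
    static final class Snapshot {
        final long highWatermark;
        final Set<Long> openOrAbortedWriteIds;

        Snapshot(long highWatermark, Set<Long> openOrAbortedWriteIds) {
            this.highWatermark = highWatermark;
            this.openOrAbortedWriteIds = openOrAbortedWriteIds;
        }

        boolean isCommitted(long writeId) {
            return writeId <= highWatermark && !openOrAbortedWriteIds.contains(writeId);
        }
    }

    /** Returns the names of partitions whose creating transaction is committed in the snapshot. */
    static List<String> visiblePartitions(List<Partition> all, Snapshot snapshot) {
        List<String> visible = new ArrayList<>();
        for (Partition p : all) {
            if (snapshot.isCommitted(p.writeId)) {
                visible.add(p.name);
            }
        }
        return visible;
    }
}
```

With this filter, a partition whose metadata object was already added (step 1) but whose transaction is still open or was aborted stays invisible, so the two-step add behaves atomically from a reader's point of view.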





[jira] [Created] (HIVE-19917) Import of full CRUD transactional table fails if table is not in default database

2018-06-15 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19917:
-

 Summary: Import of full CRUD transactional table fails if table is 
not in default database
 Key: HIVE-19917
 URL: https://issues.apache.org/jira/browse/HIVE-19917
 Project: Hive
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


The actual issue is fixed by HIVE-19861.
This is a follow-up to add a test case.

Issue:
{noformat}
org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException: Can not create a Path from a null string
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:940) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:945) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hadoop.hive.ql.exec.DDLTask.createTableLike(DDLTask.java:5099) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:433) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeAcidExport(UpdateDeleteSemanticAnalyzer.java:195)
 ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hadoop.hive.ql.parse.UpdateDeleteSemanticAnalyzer.analyzeInternal(UpdateDeleteSemanticAnalyzer.java:106)
 ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:288)
 ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:658) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1813) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1760) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1755) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
 ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:194)
 ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:257)
 ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hive.service.cli.operation.Operation.run(Operation.java:243) 
~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:541)
 ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:527)
 ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:312)
 ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:562)
 ~[hive-service-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1557)
 ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1542)
 ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:647)
 ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
 ~[hive-exec-3.0.0.3.0.0.0-1485.jar:3.0.0.3.0.0.0-1485]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
~[?:1.8.0_112]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
~[?:1.8.0_112]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: java.lang.IllegalArgumentException: Can not create a Path from a 
null string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:164) 
~[hadoop-common-3.0.0.3.0.0.0-1485.jar

[jira] [Created] (HIVE-19908) Block Insert Overwrite with Union All on full CRUD ACID tables using HIVE_UNION_SUBDIR_

2018-06-14 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19908:
-

 Summary: Block Insert Overwrite with Union All on full CRUD ACID 
tables using HIVE_UNION_SUBDIR_
 Key: HIVE-19908
 URL: https://issues.apache.org/jira/browse/HIVE-19908
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


This currently results in data loss.  Will block and suggest using truncate + 
insert.





[jira] [Created] (HIVE-19800) Handle rename files post HIVE-19751

2018-06-05 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19800:
-

 Summary: Handle rename files post HIVE-19751
 Key: HIVE-19800
 URL: https://issues.apache.org/jira/browse/HIVE-19800
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


This is a follow-up to HIVE-19751 and includes HIVE-19751, since it hasn't 
landed yet.

It also includes the file rename logic and HIVE-19750, since that hasn't landed 
yet either.

 

cc [~jdere]





[jira] [Created] (HIVE-19751) create submodule of hive-upgrade-acid for preUpgrade and postUpgrade

2018-05-31 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19751:
-

 Summary: create submodule of hive-upgrade-acid for preUpgrade and 
postUpgrade
 Key: HIVE-19751
 URL: https://issues.apache.org/jira/browse/HIVE-19751
 Project: Hive
  Issue Type: Sub-task
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Basically, we need to produce 2 separate jars: 1 for the pre-upgrade step that can 
be compiled/unit tested with 2.x jars, and another for the post-upgrade step that 
can be compiled/tested with 3.x jars. 





[jira] [Created] (HIVE-19750) Initialize NEXT_WRITE_ID. NWI_NEXT on converting an existing table to full acid

2018-05-31 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19750:
-

 Summary: Initialize NEXT_WRITE_ID. NWI_NEXT on converting an 
existing table to full acid
 Key: HIVE-19750
 URL: https://issues.apache.org/jira/browse/HIVE-19750
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman
 Fix For: 3.1.0


Need to set this to a reasonably high value for the table.
This will reserve a range of write IDs that the system will treat as 
committed.
This is needed so that we can assign unique ROW__IDs to each row in files that 
already exist in the table.  For example, if the value is initialized to the 
number of files currently in the table, we can think of each file as written by 
a separate transaction and thus be free to assign the bucketProperty (BucketCodec) 
of ROW__ID in whichever way is convenient.
This guarantees that all rows get unique ROW__IDs.
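A rough sketch of why one reserved write ID per pre-existing file yields unique ROW__IDs. It assumes write IDs start at 1, that NWI_NEXT holds the next unused write ID, and it simplifies ROW__ID to a (writeId, bucketProperty, rowId) triple. LegacyRowIds and its methods are hypothetical names, not Hive's conversion code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the idea above; not Hive's actual conversion code.
final class LegacyRowIds {

    /** Simplified ROW__ID: (writeId, bucketProperty, rowId). */
    static final class RowId {
        final long writeId;
        final int bucketProperty;
        final long rowId;

        RowId(long writeId, int bucketProperty, long rowId) {
            this.writeId = writeId;
            this.bucketProperty = bucketProperty;
            this.rowId = rowId;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof RowId)) {
                return false;
            }
            RowId other = (RowId) o;
            return writeId == other.writeId
                && bucketProperty == other.bucketProperty
                && rowId == other.rowId;
        }

        @Override
        public int hashCode() {
            return Objects.hash(writeId, bucketProperty, rowId);
        }
    }

    /** Next unused write ID once one write ID has been reserved per existing file. */
    static long nextWriteId(long[] rowCountsPerFile) {
        return rowCountsPerFile.length + 1L;
    }

    /** Treats file i as if written by write ID i + 1 and numbers its rows from 0. */
    static List<RowId> assign(long[] rowCountsPerFile) {
        List<RowId> ids = new ArrayList<>();
        for (int file = 0; file < rowCountsPerFile.length; file++) {
            long writeId = file + 1L;
            for (long row = 0; row < rowCountsPerFile[file]; row++) {
                // writeId alone separates files, so bucketProperty can be any value (0 here).
                ids.add(new RowId(writeId, 0, row));
            }
        }
        return ids;
    }
}
```

Because no two files share a write ID, the row ordinal within a file is enough to make each triple unique, and new transactions start above the reserved range.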





[jira] [Created] (HIVE-19749) Acid V1 to V2 upgrade

2018-05-31 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19749:
-

 Summary: Acid V1 to V2 upgrade
 Key: HIVE-19749
 URL: https://issues.apache.org/jira/browse/HIVE-19749
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


umbrella jira





[jira] [Created] (HIVE-19735) Transactional table: rename partition

2018-05-29 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19735:
-

 Summary: Transactional table: rename partition
 Key: HIVE-19735
 URL: https://issues.apache.org/jira/browse/HIVE-19735
 Project: Hive
  Issue Type: Bug
Reporter: Eugene Koifman
Assignee: Eugene Koifman


Hive supports renaming a partition

[https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenamePartition]

 

is this addressed by HIVE-18748?  





[jira] [Created] (HIVE-19714) TransactionalValidationListene.conformToAcid() only checks table level StorageDescriptor

2018-05-25 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19714:
-

 Summary: TransactionalValidationListene.conformToAcid() only 
checks table level StorageDescriptor
 Key: HIVE-19714
 URL: https://issues.apache.org/jira/browse/HIVE-19714
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 2.2.0
Reporter: Eugene Koifman


A table may actually have a different SD for each partition, so a proper check 
for a full CRUD table would check all of them.
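A minimal sketch of checking every storage descriptor rather than only the table-level one. The class AcidConformance and the generic descriptor type are hypothetical, and the predicate is a placeholder for whatever conformToAcid() actually verifies on a single SD.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper; the predicate stands in for the per-SD check done by conformToAcid().
final class AcidConformance {

    /**
     * A full CRUD table conforms only if the table-level descriptor and every
     * partition-level descriptor pass the same check.
     */
    static <D> boolean allDescriptorsConform(D tableSd, List<D> partitionSds, Predicate<D> conforms) {
        if (!conforms.test(tableSd)) {
            return false;
        }
        for (D sd : partitionSds) {
            if (!conforms.test(sd)) {
                return false;
            }
        }
        return true;
    }
}
```

For example, a table whose own SD passes but which has one partition stored in a non-conforming format is rejected here, whereas a table-level-only check would accept it.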





[jira] [Created] (HIVE-19606) Straggler thread in HS2 for rename directory operation stuck in loop causing performance issue and cluster slowdown

2018-05-18 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19606:
-

 Summary: Straggler thread in HS2 for rename directory operation 
stuck in loop causing performance issue and cluster slowdown
 Key: HIVE-19606
 URL: https://issues.apache.org/jira/browse/HIVE-19606
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 1.0.0
Reporter: Eugene Koifman








[jira] [Created] (HIVE-19599) Release Notes : Highlighting backwards incompatible changes

2018-05-17 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19599:
-

 Summary: Release Notes : Highlighting backwards incompatible 
changes
 Key: HIVE-19599
 URL: https://issues.apache.org/jira/browse/HIVE-19599
 Project: Hive
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Vineet Garg


We need to highlight backwards incompatible changes.  A list of Jira titles won't 
be sufficient.

For example, tables with Acid V1 (pre-3.0) data have to be major compacted 
before the upgrade and must not process any update/delete/merge until after the 
upgrade.  Not doing so may result in data corruption/loss.





[jira] [Created] (HIVE-19598) Acid V1 to V2 upgrade

2018-05-17 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19598:
-

 Summary: Acid V1 to V2 upgrade
 Key: HIVE-19598
 URL: https://issues.apache.org/jira/browse/HIVE-19598
 Project: Hive
  Issue Type: Bug
  Components: Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman
Assignee: Eugene Koifman


The on-disk layout for full acid (transactional) tables has changed in 3.0.

Any transactional table that has update/delete events in any deltas that 
have not been major compacted must go through a major compaction before 
upgrading to 3.0.  No update/delete/merge should be run during or after the 
major compaction, until the upgrade is complete.

Not doing so will result in data corruption/loss.

 

Need to create a utility tool to help with this process.  HIVE-19233 started 
this but it needs more work.
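One check such a utility might perform, sketched under explicit assumptions: directories are named base_N and delta_min_max (optionally with a _stmtId suffix), and a delta is considered covered only if its max ID is at or below the newest base. Directory names cannot reveal whether a delta contains update/delete events, so this sketch conservatively flags insert-only deltas too. UpgradeCheck is a hypothetical name, not the HIVE-19233 tool.

```java
import java.util.List;

// Hypothetical helper; not the HIVE-19233 tool and not a Hive API.
final class UpgradeCheck {

    /**
     * Conservatively reports whether a table/partition directory listing still needs a
     * major compaction: true if any delta_<min>_<max>[_<stmt>] directory is not covered
     * by the newest base_<N>, i.e. its max ID is greater than N (or there is no base).
     */
    static boolean needsMajorCompaction(List<String> dirNames) {
        long newestBase = -1L; // -1 means no base directory was found
        for (String dir : dirNames) {
            if (dir.startsWith("base_")) {
                newestBase = Math.max(newestBase, Long.parseLong(dir.substring("base_".length())));
            }
        }
        for (String dir : dirNames) {
            if (dir.startsWith("delta_")) {
                String[] ids = dir.substring("delta_".length()).split("_");
                long maxId = Long.parseLong(ids[1]);
                if (maxId > newestBase) {
                    return true;
                }
            }
        }
        return false;
    }
}
```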





[jira] [Created] (HIVE-19569) alter table db1.t1 rename db2.t2 generates MetaStoreEventListener.onDropTable()

2018-05-15 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19569:
-

 Summary: alter table db1.t1 rename db2.t2 generates 
MetaStoreEventListener.onDropTable()
 Key: HIVE-19569
 URL: https://issues.apache.org/jira/browse/HIVE-19569
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Standalone Metastore, Transactions
Affects Versions: 3.0.0
Reporter: Eugene Koifman


When renaming a table within the same DB, this operation causes 
{{MetaStoreEventListener.onAlterTable()}} to fire but when changing DB name for 
a table it causes {{MetaStoreEventListener.onDropTable()}} + 
{{MetaStoreEventListener.onCreateTable()}}.
The files from the original table are moved to the new table's location.  
This creates confusing semantics, since any logic in {{onDropTable()}} doesn't 
know about the larger context, i.e. that there will be a matching 
{{onCreateTable()}}.

In particular, this causes a problem for Acid tables, since files moved from the 
old table use WriteIDs that are not meaningful in the context of the new table.

Current implementation is due to replication.  This should ideally be changed 
to raise a "not supported" error for tables that are marked for replication.

cc [~sankarh]





[jira] [Created] (HIVE-19387) CLONE - Truncate table for Acid tables

2018-05-02 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19387:
-

 Summary: CLONE - Truncate table for Acid tables
 Key: HIVE-19387
 URL: https://issues.apache.org/jira/browse/HIVE-19387
 Project: Hive
  Issue Type: New Feature
  Components: Transactions
Reporter: Eugene Koifman
Assignee: Eugene Koifman


How should this work?  Should it work like Insert Overwrite T select * from T 
where 1=2?
This should create a new empty base_x/ and thus operate w/o violating Snapshot 
Isolation semantics.

This makes sense for a specific partition or an unpartitioned table.  What about 
"Truncate T" where T is partitioned?  Is the expectation to wipe out all 
partition info or to make each partition empty?
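A small sketch of why writing an empty base_x preserves Snapshot Isolation, assuming a reader picks the newest base whose write ID is committed in its snapshot (deltas are ignored for brevity). BaseSelection and visibleBase are hypothetical names, not Hive's directory-resolution code.

```java
import java.util.Set;

// Hypothetical sketch of base selection under snapshot isolation; not Hive's implementation.
final class BaseSelection {

    /**
     * Returns the newest base write ID that is committed in the reader's snapshot
     * (at or below highWatermark and not open/aborted), or -1 if none is visible.
     */
    static long visibleBase(long[] baseWriteIds, long highWatermark, Set<Long> openOrAborted) {
        long best = -1L;
        for (long id : baseWriteIds) {
            boolean committed = id <= highWatermark && !openOrAborted.contains(id);
            if (committed && id > best) {
                best = id;
            }
        }
        return best;
    }
}
```

If base_5 holds data and a truncate writes an empty base_9, a reader whose snapshot predates write ID 9 (or sees it as still open) keeps reading base_5, while later readers see the empty base_9.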





Re: Review Request 66645: HIVE-19211: New streaming ingest API and support for dynamic partitioning

2018-05-01 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66645/#review202231
---


Ship it!




Ship It!

- Eugene Koifman


On May 1, 2018, 2:53 p.m., Prasanth_J wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66645/
> ---
> 
> (Updated May 1, 2018, 2:53 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Eugene Koifman.
> 
> 
> Bugs: HIVE-19211
> https://issues.apache.org/jira/browse/HIVE-19211
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-19211: New streaming ingest API and support for dynamic partitioning
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 5a13726 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  90dbdac 
>   itests/hive-unit/pom.xml 3ae7f2f 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  8ee033d 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java 
> PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java 
> a66c135 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 09f8802 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 76569d5 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java f6608eb 
>   serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
>  8c159e9 
>   streaming/pom.xml b58ec01 
>   streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java 
> 25998ae 
>   streaming/src/java/org/apache/hive/streaming/ConnectionError.java 668bffb 
>   streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java 
> 898b3f9 
>   streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java b1f9520 
>   streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java b04e137 
>   streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java 
> 23e17e7 
>   streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 0011b14 
>   streaming/src/java/org/apache/hive/streaming/InvalidPartition.java f1f9804 
>   streaming/src/java/org/apache/hive/streaming/InvalidTable.java ef1c91d 
>   streaming/src/java/org/apache/hive/streaming/InvalidTransactionState.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java 
> 762f5f8 
>   streaming/src/java/org/apache/hive/streaming/PartitionCreationFailed.java 
> 5f9aca6 
>   streaming/src/java/org/apache/hive/streaming/PartitionHandler.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/PartitionInfo.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/QueryFailedException.java 
> ccd3ae0 
>   streaming/src/java/org/apache/hive/streaming/RecordWriter.java dc6d70e 
>   streaming/src/java/org/apache/hive/streaming/SerializationError.java 
> a57ba00 
>   streaming/src/java/org/apache/hive/streaming/StreamingConnection.java 
> 2f760ea 
>   streaming/src/java/org/apache/hive/streaming/StreamingException.java 
> a7f84c1 
>   streaming/src/java/org/apache/hive/streaming/StreamingIOFailure.java 
> 0dfbfa7 
>   
> streaming/src/java/org/apache/hive/streaming/StrictDelimitedInputWriter.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/StrictJsonWriter.java 0077913 
>   streaming/src/java/org/apache/hive/streaming/StrictRegexWriter.java c0b7324 
>   streaming/src/java/org/apache/hive/streaming/TransactionBatch.java 2b05771 
>   
> streaming/src/java/org/apache/hive/streaming/TransactionBatchUnAvailable.java 
> a8c8cd4 
>   streaming/src/java/org/apache/hive/streaming/TransactionError.java a331b20 
>   streaming/src/test/org/apache/hive/streaming/TestDelimitedInputWriter.java 
> f0843a1 
>   streaming/src/test/org/apache/hive/streaming/TestStreaming.java 0ec3048 
>   
> streaming/src/test/org/apache/hive/streaming/TestStreamingDynamicPartitioning.java
>  PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/66645/diff/12/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Prasanth_J
> 
>



Re: Review Request 66645: HIVE-19211: New streaming ingest API and support for dynamic partitioning

2018-05-01 Thread Eugene Koifman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/66645/#review202201
---




streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Line 490 (original), 414 (patched)
<https://reviews.apache.org/r/66645/#comment283952>

should these be final?



streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java
Lines 442 (patched)
<https://reviews.apache.org/r/66645/#comment283953>

what does this achieve?
you said that this API doesn't support concurrency and 
begin/commit/close/etc have to be sequential.  That means that minTxnId of the 
batch can only change linearly.  So what does AtomicReference buy you over 
'volatile', for example?

commitImpl is not atomic: it calls msClient.commitTxn() and then adjusts 
minTxn.  But a heartbeat between these 2 will end up heartbeating a committed 
txn...

It seems that maxTxn never changes, and the only thing that needs to be 
updated in the HeartbeatRunnable is the minTxn which needs to be volatile.  

Is there something else that this is trying to solve?
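A minimal sketch of the alternative suggested above, assuming a single client thread advances minTxnId after each commit while a separate heartbeat thread only reads it. HeartbeatRange and its methods are hypothetical names, not the actual HiveStreamingConnection code.

```java
// Hypothetical class illustrating the reviewer's suggestion; not HiveStreamingConnection code.
final class HeartbeatRange {

    private final long maxTxnId;     // fixed when the batch is opened
    private volatile long minTxnId;  // written only by the single client thread

    HeartbeatRange(long minTxnId, long maxTxnId) {
        this.minTxnId = minTxnId;
        this.maxTxnId = maxTxnId;
    }

    /** Client thread: called after commitTxn() succeeds; minTxnId only moves forward. */
    void advancePast(long committedTxnId) {
        // With a single writer, a plain volatile write publishes the new value;
        // no compare-and-set (and hence no AtomicReference) is needed.
        minTxnId = committedTxnId + 1;
    }

    /** Heartbeat thread: returns {min, max} to heartbeat, or an empty array if none remain. */
    long[] current() {
        long min = minTxnId; // read the volatile once so the check and the result agree
        return min > maxTxnId ? new long[0] : new long[] {min, maxTxnId};
    }
}
```

Note that volatile alone does not close the window described above: a heartbeat that runs after commitTxn() returns but before advancePast() still includes the just-committed txn.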



streaming/src/test/org/apache/hive/streaming/TestStreaming.java
Lines 589 (patched)
<https://reviews.apache.org/r/66645/#comment283950>

followup jira?



hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
Line 442 (original), 442 (patched)
<https://reviews.apache.org/r/66645/#comment283943>

is there a follow up jira for this?



itests/hive-unit/pom.xml
Lines 79 (patched)
<https://reviews.apache.org/r/66645/#comment283944>

what change requires this?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
Lines 471 (patched)
<https://reviews.apache.org/r/66645/#comment283945>

if you have only 1 txn in a batch, why call flush at all?  (this flush() is 
called when commit() is called.)  Won't closing the file do the right thing?


- Eugene Koifman


On April 30, 2018, 4:10 p.m., Prasanth_J wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/66645/
> ---
> 
> (Updated April 30, 2018, 4:10 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Eugene Koifman.
> 
> 
> Bugs: HIVE-19211
> https://issues.apache.org/jira/browse/HIVE-19211
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-19211: New streaming ingest API and support for dynamic partitioning
> 
> 
> Diffs
> -
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 6e35653 
>   
> hcatalog/streaming/src/test/org/apache/hive/hcatalog/streaming/TestStreaming.java
>  90dbdac 
>   itests/hive-unit/pom.xml 3ae7f2f 
>   
> itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java
>  8ee033d 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveClientCache.java 
> PRE-CREATION 
>   metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreUtils.java 
> a66c135 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 09f8802 
>   ql/src/java/org/apache/hadoop/hive/ql/lockmgr/DbTxnManager.java 76569d5 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 4661881 
>   serde/src/java/org/apache/hadoop/hive/serde2/JsonSerDe.java PRE-CREATION 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
>  8c159e9 
>   streaming/pom.xml b58ec01 
>   streaming/src/java/org/apache/hive/streaming/AbstractRecordWriter.java 
> 25998ae 
>   streaming/src/java/org/apache/hive/streaming/ConnectionError.java 668bffb 
>   streaming/src/java/org/apache/hive/streaming/ConnectionInfo.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/DelimitedInputWriter.java 
> 898b3f9 
>   streaming/src/java/org/apache/hive/streaming/HeartBeatFailure.java b1f9520 
>   streaming/src/java/org/apache/hive/streaming/HiveEndPoint.java b04e137 
>   streaming/src/java/org/apache/hive/streaming/HiveStreamingConnection.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/ImpersonationFailed.java 
> 23e17e7 
>   streaming/src/java/org/apache/hive/streaming/InvalidColumn.java 0011b14 
>   streaming/src/java/org/apache/hive/streaming/InvalidPartition.java f1f9804 
>   streaming/src/java/org/apache/hive/streaming/InvalidTable.java ef1c91d 
>   streaming/src/java/org/apache/hive/streaming/InvalidTransactionState.java 
> PRE-CREATION 
>   streaming/src/java/org/apache/hive/streaming/InvalidTrasactionState.java 
> 762f5f8 
>   streaming/src/

[jira] [Created] (HIVE-19377) TestTxnExIm - did not produce a TEST-*.xml file (likely timed out) (batchId=286)

2018-05-01 Thread Eugene Koifman (JIRA)
Eugene Koifman created HIVE-19377:
-

 Summary: TestTxnExIm - did not produce a TEST-*.xml file (likely 
timed out) (batchId=286)
 Key: HIVE-19377
 URL: https://issues.apache.org/jira/browse/HIVE-19377
 Project: Hive
  Issue Type: Sub-task
Reporter: Eugene Koifman


{{TestTxnExIm - did not produce a TEST-*.xml file (likely timed out) 
(batchId=286)}}

appears routinely in runs.  

{{mvn test -Dtest=TestExIm}} fails locally with 
{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:2.21.0:test (default-test) on 
project hive-exec: There are test failures.
[ERROR] 
[ERROR] Please refer to 
/Users/ekoifman/IdeaProjects/hive/ql/target/surefire-reports for the individual 
test results.
[ERROR] Please refer to dump files (if any exist) [date]-jvmRun[N].dump, 
[date].dumpstream and [date]-jvmRun[N].dumpstream.
[ERROR] ExecutionException The forked VM terminated without properly saying 
goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/ekoifman/IdeaProjects/hive/ql && 
/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/bin/java 
-Xmx2048m -jar 
/Users/ekoifman/IdeaProjects/hive/ql/target/surefire/surefirebooter4071469472847953044.jar
 /Users/ekoifman/IdeaProjects/hive/ql/target/surefire 
2018-05-01T10-56-27_610-jvmRun1 surefire8633599521000249236tmp 
surefire_02958205604336140780tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hive.ql.TestTxnExIm
[ERROR] org.apache.maven.surefire.booter.SurefireBooterForkException: 
ExecutionException The forked VM terminated without properly saying goodbye. VM 
crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /Users/ekoifman/IdeaProjects/hive/ql && 
/Library/Java/JavaVirtualMachines/jdk1.8.0_162.jdk/Contents/Home/jre/bin/java 
-Xmx2048m -jar 
/Users/ekoifman/IdeaProjects/hive/ql/target/surefire/surefirebooter4071469472847953044.jar
 /Users/ekoifman/IdeaProjects/hive/ql/target/surefire 
2018-05-01T10-56-27_610-jvmRun1 surefire8633599521000249236tmp 
surefire_02958205604336140780tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] Crashed tests:
[ERROR] org.apache.hadoop.hive.ql.TestTxnExIm
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:494)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:441)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:293)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:245)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1149)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:978)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:854)
[ERROR] at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:134)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:309)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:194)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:107)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:955)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:290)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:194)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.lang.reflect.Method.invoke(Method.
