Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

2018-06-13 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204697
---


Ship it!




Ship It!

- Vihang Karajgaonkar


On June 6, 2018, 11:05 a.m., Peter Vary wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> ---
> 
> (Updated June 6, 2018, 11:05 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19718
> https://issues.apache.org/jira/browse/HIVE-19718
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> 
> getMPartition, so it does not have to query the table object for every time 
> if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a 
> new version of getMPartition, which can use the provided partitionKeys 
> instead of querying it again.
> 
> 
> Diffs
> -
> 
>   
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
>  0cc0ae5 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  77ed2b4 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  b15d89d 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
>  283798c 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
>  9da8d72 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  0461c4e 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  b71eda4 
> 
> 
> Diff: https://reviews.apache.org/r/67351/diff/3/
> 
> 
> Testing
> ---
> 
> Run several performance tests with Sasha's performance tool. These 
> optimisations shave of ~10% of the runtime
> 
> 
> Thanks,
> 
> Peter Vary
> 
>



Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

2018-06-06 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/
---

(Updated June 6, 2018, 11:05 a.m.)


Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.


Changes
---

Updated the diff based on Sasha's comments


Bugs: HIVE-19718
https://issues.apache.org/jira/browse/HIVE-19718


Repository: hive-git


Description
---

Various optimization for addPartitions call:
- Push down table object to convertToMPart
- Push down partitionKeys to startAddPartition -> doesPartitionExist -> 
getMPartition, so it does not have to query the table object for every time if 
we add multiple partitions for the same table
- The original getMPartition used to query the table every time. Created a new 
version of getMPartition, which can use the provided partitionKeys instead of 
querying it again.


Diffs (updated)
-

  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
 0cc0ae5 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 77ed2b4 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 b15d89d 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
 283798c 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
 9da8d72 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 0461c4e 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 b71eda4 


Diff: https://reviews.apache.org/r/67351/diff/3/

Changes: https://reviews.apache.org/r/67351/diff/2-3/


Testing
---

Run several performance tests with Sasha's performance tool. These 
optimisations shave of ~10% of the runtime


Thanks,

Peter Vary



Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

2018-06-06 Thread Peter Vary via Review Board


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> >

Thanks for the review Sasha!

Dropped the ones I think are should not be part of this change, but feel free 
to reopen them if you think otherwise.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2417 (original), 2428 (patched)
> > 
> >
> > Do we need a version of this that can be called when we already have 
> > the MTable object?

Yes. There are several public methods, which call the original version and the 
table object is not available at hand in the caller method.
To keep the change manageable changed only where the addPartitions benefited 
from the it.
We can file a follow-up jira to push down the partition keys on other places 
too.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2428 (original), 2434 (patched)
> > 
> >
> > Note that this may throw an exception which you need to catch and 
> > rollback your transaction. SO immediately after openTransaction you need a 
> > try-finally block.

Is it good practice to handle RunTimeException-s as well? Looking at the 
getMTable definition, it should not throw any exception by design. It will 
return null, if the table is not found.
Anyway it feels more correct if we handle open/commit/rollback in one place, so 
I added it.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2430 (original)
> > 
> >
> > missing commitTransaction()

It is there after getting the partitions. The goal is here to make sure that 
getting the table info, and getting the partition info is done in one 
transaction


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2467 (patched)
> > 
> >
> > Strictly speaking we do not need openTransaction() inside try block

I am not sure what is the general usage pattern for transactions and exceptions 
in ObjectStore.
If the query fails, I think it is a good idea to roll back the whole 
transaction, including the enclosing ones. At least this is the general 
behaviour I have seen.
Feel free to reopen if you disagree.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2446 (original), 2477 (patched)
> > 
> >
> > The rest can be done outside of try block - we already committed the 
> > transaction.

Keeping this part of the code unchanged helps with backports a lot - As a 
general rule I try to avoid changes which does not have immediate impact.
Refactoring the code outside the try/catch makes the code would be more 
readeable, but if it does not have other serious impact, I would be compelled 
to keep it as-is.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2449 (original), 2480 (patched)
> > 
> >
> > Can we do this with setUnique(true) instead?

I think this part of the code has to do with the comment:
  // We need to compare partition name with requested name since some DBs
  // (like MySQL, Derby) considers 'a' = 'a ' whereas others like (Postgres,
  // Oracle) doesn't exhibit this problem.
  
So if we want to do this change then it should be in a separate jira.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2456 (original), 2487 (patched)
> > 
> >
> > Can we just return mpart here?

Keeping this part of the code unchanged helps with backports a lot - As a 
general rule I try to avoid changes which does not have immediate impact.


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2535 (patched)
> > 
> >
> > Can you document this behavior?

Sure! Thanks! :)


> On June 5, 2018, 7:37 p.m., Alexander Kolbasov wrote:
> > 

Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

2018-06-05 Thread Alexander Kolbasov

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204338
---




standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2417 (original), 2428 (patched)


Do we need a version of this that can be called when we already have the 
MTable object?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2428 (original), 2434 (patched)


Note that this may throw an exception which you need to catch and rollback 
your transaction. SO immediately after openTransaction you need a try-finally 
block.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2430 (original)


missing commitTransaction()



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2467 (patched)


Strictly speaking we do not need openTransaction() inside try block



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2446 (original), 2477 (patched)


The rest can be done outside of try block - we already committed the 
transaction.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2449 (original), 2480 (patched)


Can we do this with setUnique(true) instead?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2456 (original), 2487 (patched)


Can we just return mpart here?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2535 (patched)


Can you document this behavior?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2487 (original), 2538 (patched)


Can you document this behavior?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 9120 (original), 9171 (patched)


I'm wondering whether we can make things faster a bit by using the fact 
that we only need to know whether partition exists and don't care about 
fetching any actual values from partitions - we can have a potentially faster 
query that just checks for partition existence.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
Line 331 (original), 331 (patched)


Please update Javadoc as well


- Alexander Kolbasov


On June 5, 2018, 7:54 a.m., Peter Vary wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> ---
> 
> (Updated June 5, 2018, 7:54 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19718
> https://issues.apache.org/jira/browse/HIVE-19718
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> 
> getMPartition, so it does not have to query the table object for every time 
> if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a 
> new version of getMPartition, which can use the provided partitionKeys 
> instead of querying it again.
> 
> 
> Diffs
> -
> 
>   
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
>  0cc0ae5 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  d8b8414 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  b15d89d 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
>  283798c 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
>  9da8d72 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  0461c4e 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  b71eda4 
> 
> 
> Diff: https://reviews.apache.org/r/67351/diff/2/
> 
> 
> Testing
> ---
> 
> Run several 

Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

2018-06-05 Thread Peter Vary via Review Board


> On June 4, 2018, 6:21 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2411-2416 (original)
> > 
> >
> > Why do we need to remove these lines?

Good point.
Added openTransaction back.
Thanks for noticing


> On June 4, 2018, 6:21 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Line 2422 (original)
> > 
> >
> > Why do we need to remove this line?

Good point.
Added commitTransaction back.
Thanks for noticing


> On June 4, 2018, 6:21 p.m., Vihang Karajgaonkar wrote:
> > standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
> > Lines 2426 (patched)
> > 
> >
> > Adding a javadoc would be great. esp. mentioning that the advantage of 
> > using this method and when its better to use it.

Added javadoc


- Peter


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204262
---


On June 5, 2018, 7:54 a.m., Peter Vary wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> ---
> 
> (Updated June 5, 2018, 7:54 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19718
> https://issues.apache.org/jira/browse/HIVE-19718
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> 
> getMPartition, so it does not have to query the table object for every time 
> if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a 
> new version of getMPartition, which can use the provided partitionKeys 
> instead of querying it again.
> 
> 
> Diffs
> -
> 
>   
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
>  0cc0ae5 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  d8b8414 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  b15d89d 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
>  283798c 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
>  9da8d72 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  0461c4e 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  b71eda4 
> 
> 
> Diff: https://reviews.apache.org/r/67351/diff/2/
> 
> 
> Testing
> ---
> 
> Run several performance tests with Sasha's performance tool. These 
> optimisations shave of ~10% of the runtime
> 
> 
> Thanks,
> 
> Peter Vary
> 
>



Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

2018-06-05 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/
---

(Updated June 5, 2018, 7:54 a.m.)


Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.


Changes
---

Addressed Vihang's comment, and fixed test failures:
- Added transaction to the original getMPartition method
- Added javadoc
- Removed accidentally added extra line from HiveMetaStore.java


Bugs: HIVE-19718
https://issues.apache.org/jira/browse/HIVE-19718


Repository: hive-git


Description
---

Various optimization for addPartitions call:
- Push down table object to convertToMPart
- Push down partitionKeys to startAddPartition -> doesPartitionExist -> 
getMPartition, so it does not have to query the table object for every time if 
we add multiple partitions for the same table
- The original getMPartition used to query the table every time. Created a new 
version of getMPartition, which can use the provided partitionKeys instead of 
querying it again.


Diffs (updated)
-

  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
 0cc0ae5 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 d8b8414 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 b15d89d 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
 283798c 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
 9da8d72 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 0461c4e 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 b71eda4 


Diff: https://reviews.apache.org/r/67351/diff/2/

Changes: https://reviews.apache.org/r/67351/diff/1-2/


Testing
---

Run several performance tests with Sasha's performance tool. These 
optimisations shave of ~10% of the runtime


Thanks,

Peter Vary



Re: Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

2018-06-04 Thread Vihang Karajgaonkar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/#review204262
---




standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2411-2416 (original)


Why do we need to remove these lines?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Line 2422 (original)


Why do we need to remove this line?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
Lines 2426 (patched)


Adding a javadoc would be great. esp. mentioning that the advantage of 
using this method and when its better to use it.


- Vihang Karajgaonkar


On May 29, 2018, 10:53 a.m., Peter Vary wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67351/
> ---
> 
> (Updated May 29, 2018, 10:53 a.m.)
> 
> 
> Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.
> 
> 
> Bugs: HIVE-19718
> https://issues.apache.org/jira/browse/HIVE-19718
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> Various optimization for addPartitions call:
> - Push down table object to convertToMPart
> - Push down partitionKeys to startAddPartition -> doesPartitionExist -> 
> getMPartition, so it does not have to query the table object for every time 
> if we add multiple partitions for the same table
> - The original getMPartition used to query the table every time. Created a 
> new version of getMPartition, which can use the provided partitionKeys 
> instead of querying it again.
> 
> 
> Diffs
> -
> 
>   
> itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
>  3d6fda6 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
>  c1d25db 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
>  13ccdb1 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
>  ce7d286 
>   
> standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
>  b223920 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
>  f6899be 
>   
> standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
>  98a85cc 
> 
> 
> Diff: https://reviews.apache.org/r/67351/diff/1/
> 
> 
> Testing
> ---
> 
> Run several performance tests with Sasha's performance tool. These 
> optimisations shave of ~10% of the runtime
> 
> 
> Thanks,
> 
> Peter Vary
> 
>



Review Request 67351: HIVE-19718 Adding partitions in bulk also fetches table for each partition

2018-05-29 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67351/
---

Review request for hive, Alexander Kolbasov and Vihang Karajgaonkar.


Bugs: HIVE-19718
https://issues.apache.org/jira/browse/HIVE-19718


Repository: hive-git


Description
---

Various optimization for addPartitions call:
- Push down table object to convertToMPart
- Push down partitionKeys to startAddPartition -> doesPartitionExist -> 
getMPartition, so it does not have to query the table object for every time if 
we add multiple partitions for the same table
- The original getMPartition used to query the table every time. Created a new 
version of getMPartition, which can use the provided partitionKeys instead of 
querying it again.


Diffs
-

  
itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
 3d6fda6 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 c1d25db 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
 13ccdb1 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java
 ce7d286 
  
standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
 b223920 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
 f6899be 
  
standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
 98a85cc 


Diff: https://reviews.apache.org/r/67351/diff/1/


Testing
---

Run several performance tests with Sasha's performance tool. These 
optimisations shave of ~10% of the runtime


Thanks,

Peter Vary