Re: Review Request 68975: HIVE-20661: Dynamic partitions loading calls add partition for every partition 1-by-1

2018-10-19 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68975/
---

(Updated Oct. 19, 2018, 8:30 a.m.)


Review request for hive and Peter Vary.


Repository: hive-git


Description
---

HIVE-20661: Dynamic partitions loading calls add partition for every partition 
1-by-1


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/metastore/SynchronizedMetaStoreClient.java 
0ab77e84c6222d35bcec9ce95ed02014911ef144 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
4de038913a5c9a2c199f71702b8f70ca84d0856b 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 
dd23d7db3e70c9540e48c42eb7b9a33ed775cea6 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 aba63f050b5b98a2aeeb0df6ff2de5e6e06761f2 


Diff: https://reviews.apache.org/r/68975/diff/6/

Changes: https://reviews.apache.org/r/68975/diff/5-6/


Testing
---


Thanks,

Laszlo Pinter



Re: Review Request 68975: HIVE-20661: Dynamic partitions loading calls add partition for every partition 1-by-1

2018-10-18 Thread Laszlo Pinter via Review Board


> On Oct. 18, 2018, 9:13 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 1921 (patched)
> > 
> >
> > nit: one extra space

done.


> On Oct. 18, 2018, 9:13 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Line 1876 (original)
> > 
> >
> > This perf logging is removed... Let's think through if we need other 
> > places to add some instead considering the new calling structure

It was not removed, but moved to another place. Now the tracking is started and 
finished in loadPartition().


> On Oct. 18, 2018, 9:13 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 2171 (patched)
> > 
> >
> > What happens when only 1 partiton is already exists from multiple ones?
> > 
> > Also, do we need SynchronizedMSC?

Yes, we need this one in every case. The SynchronizedMetaStoreClient is just a 
facede for the SessionHiveMetaStoreClient, which in the other hand, does some 
pre-processing (eg. adding the partition to the temp table), before calling the 
HiveMetaStoreClient.


> On Oct. 18, 2018, 9:13 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 2570 (patched)
> > 
> >
> > Do we need to call this every time in the for loop? I kinda remember 
> > that we allow only partitions for a single table only... Or I might be 
> > mistaken, but still might be a good idea to not generate this every time...

I think we do need. The fetching of the snapshot is done inside a callable task 
which is executed on thread pool in parallel. Altough we use only one table, 
the snapshot of it can change from time to time, depending on how many tasks 
were already finished. It's better to be safe, and get a new snapshot every 
time, than have an outdated one.


- Laszlo


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68975/#review209735
---


On Oct. 18, 2018, 7:01 a.m., Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68975/
> ---
> 
> (Updated Oct. 18, 2018, 7:01 a.m.)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20661: Dynamic partitions loading calls add partition for every 
> partition 1-by-1
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/metastore/SynchronizedMetaStoreClient.java 
> 0ab77e84c6222d35bcec9ce95ed02014911ef144 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 4de038913a5c9a2c199f71702b8f70ca84d0856b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  dd23d7db3e70c9540e48c42eb7b9a33ed775cea6 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  aba63f050b5b98a2aeeb0df6ff2de5e6e06761f2 
> 
> 
> Diff: https://reviews.apache.org/r/68975/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 68975: HIVE-20661: Dynamic partitions loading calls add partition for every partition 1-by-1

2018-10-18 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68975/#review209735
---



Thanks Laci!
1 serions question
1 mild one
and several annoying nits :)
Thanks,
Peter


ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1921 (patched)


nit: one extra space



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Line 1876 (original)


This perf logging is removed... Let's think through if we need other places 
to add some instead considering the new calling structure



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2171 (patched)


What happens when only 1 partiton is already exists from multiple ones?

Also, do we need SynchronizedMSC?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2570 (patched)


Do we need to call this every time in the for loop? I kinda remember that 
we allow only partitions for a single table only... Or I might be mistaken, but 
still might be a good idea to not generate this every time...



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2687 (patched)


I am not really a big fun of printing a stacktrace and rethrowing an error, 
unless I am quite sure that the exception rethrown is not printed later. 
Otherwise this could be quite confusing.


- Peter Vary


On okt. 18, 2018, 7:01 de, Laszlo Pinter wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/68975/
> ---
> 
> (Updated okt. 18, 2018, 7:01 de)
> 
> 
> Review request for hive and Peter Vary.
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> HIVE-20661: Dynamic partitions loading calls add partition for every 
> partition 1-by-1
> 
> 
> Diffs
> -
> 
>   
> ql/src/java/org/apache/hadoop/hive/metastore/SynchronizedMetaStoreClient.java 
> 0ab77e84c6222d35bcec9ce95ed02014911ef144 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
> 4de038913a5c9a2c199f71702b8f70ca84d0856b 
>   
> ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
>  dd23d7db3e70c9540e48c42eb7b9a33ed775cea6 
>   
> standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
>  aba63f050b5b98a2aeeb0df6ff2de5e6e06761f2 
> 
> 
> Diff: https://reviews.apache.org/r/68975/diff/5/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Laszlo Pinter
> 
>



Re: Review Request 68975: HIVE-20661: Dynamic partitions loading calls add partition for every partition 1-by-1

2018-10-18 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68975/
---

(Updated Oct. 18, 2018, 7:01 a.m.)


Review request for hive and Peter Vary.


Repository: hive-git


Description
---

HIVE-20661: Dynamic partitions loading calls add partition for every partition 
1-by-1


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/metastore/SynchronizedMetaStoreClient.java 
0ab77e84c6222d35bcec9ce95ed02014911ef144 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
4de038913a5c9a2c199f71702b8f70ca84d0856b 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 
dd23d7db3e70c9540e48c42eb7b9a33ed775cea6 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 aba63f050b5b98a2aeeb0df6ff2de5e6e06761f2 


Diff: https://reviews.apache.org/r/68975/diff/5/

Changes: https://reviews.apache.org/r/68975/diff/4-5/


Testing
---


Thanks,

Laszlo Pinter



Re: Review Request 68975: HIVE-20661: Dynamic partitions loading calls add partition for every partition 1-by-1

2018-10-17 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68975/
---

(Updated Oct. 17, 2018, 1:44 p.m.)


Review request for hive and Peter Vary.


Repository: hive-git


Description
---

HIVE-20661: Dynamic partitions loading calls add partition for every partition 
1-by-1


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/metastore/SynchronizedMetaStoreClient.java 
0ab77e84c6222d35bcec9ce95ed02014911ef144 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
4de038913a5c9a2c199f71702b8f70ca84d0856b 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 
dd23d7db3e70c9540e48c42eb7b9a33ed775cea6 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 aba63f050b5b98a2aeeb0df6ff2de5e6e06761f2 


Diff: https://reviews.apache.org/r/68975/diff/4/

Changes: https://reviews.apache.org/r/68975/diff/3-4/


Testing
---


Thanks,

Laszlo Pinter



Re: Review Request 68975: HIVE-20661: Dynamic partitions loading calls add partition for every partition 1-by-1

2018-10-13 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68975/
---

(Updated Oct. 13, 2018, 9:55 a.m.)


Review request for hive and Peter Vary.


Repository: hive-git


Description
---

HIVE-20661: Dynamic partitions loading calls add partition for every partition 
1-by-1


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/metastore/SynchronizedMetaStoreClient.java 
0ab77e84c6222d35bcec9ce95ed02014911ef144 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
4de038913a5c9a2c199f71702b8f70ca84d0856b 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 
a2b57fb646899c54b63be14a8cde9b8644a973aa 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 a2ec09f7157140fda751ab777649c698b3319a16 


Diff: https://reviews.apache.org/r/68975/diff/3/

Changes: https://reviews.apache.org/r/68975/diff/2-3/


Testing
---


Thanks,

Laszlo Pinter



Re: Review Request 68975: HIVE-20661: Dynamic partitions loading calls add partition for every partition 1-by-1

2018-10-13 Thread Laszlo Pinter via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68975/
---

(Updated Oct. 13, 2018, 7:56 a.m.)


Review request for hive and Peter Vary.


Repository: hive-git


Description
---

HIVE-20661: Dynamic partitions loading calls add partition for every partition 
1-by-1


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/metastore/SynchronizedMetaStoreClient.java 
0ab77e84c6222d35bcec9ce95ed02014911ef144 
  ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 
4de038913a5c9a2c199f71702b8f70ca84d0856b 
  
ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 
a2b57fb646899c54b63be14a8cde9b8644a973aa 
  
standalone-metastore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
 a2ec09f7157140fda751ab777649c698b3319a16 


Diff: https://reviews.apache.org/r/68975/diff/2/

Changes: https://reviews.apache.org/r/68975/diff/1-2/


Testing
---


Thanks,

Laszlo Pinter



Re: Review Request 68975: HIVE-20661: Dynamic partitions loading calls add partition for every partition 1-by-1

2018-10-13 Thread Laszlo Pinter via Review Board


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 1896 (patched)
> > 
> >
> > What does this method do?
> > Moving files and updating partition data if the partition is already 
> > exists?
> > Javadoc would be good anyway.
> > Follow-up idea: update partition data with alter_partitions (multiple 
> > updates with 1 HMS call)?

Good point, javadoc added.

I noticed that as well, just didn't want to make this change even more 
complicated. There are other points also wich could be be further optimized.  I 
made a note to myself to open a new jira after this is merged in.


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Line 1909 (original), 1939 (patched)
> > 
> >
> > nit: maybe do not reformat these lines if they are not needed

done.


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Line 1939 (original), 1961 (patched)
> > 
> >
> > nit: maybe do not reformat these lines if they are not needed

done.


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Line 1948 (original), 1970 (patched)
> > 
> >
> > nit: maybe do not reformat these lines if they are not needed

done.


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Line 1966 (original), 1988 (patched)
> > 
> >
> > nit: maybe do not reformat these lines if they are not needed

done.


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 1970-1971 (original), 1992-1993 (patched)
> > 
> >
> > nit: maybe do not reformat these lines if they are not needed

done.


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 1978-1979 (original), 2000-2001 (patched)
> > 
> >
> > nit: maybe do not reformat these lines if they are not needed

done.


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 2102-2103 (patched)
> > 
> >
> > Concerned about this.
> > When we call Hive.loadPartition we call 2 methods there:
> > - loadPartitionInternal - 1 snapshot
> > - addPartitionToMetastore - 1 snapshot
> > Are we sure that these calls are:
> > - Lightweight - so we can happily call them twice
> > - Return the same value in both ocassions even if some other transacion 
> > is finished during this time?

Yes, I see your point here. 
By checking the implementation of AcidUtils.getTableSnapshot, it is not 
guaranteed that the two calls will have the same result. 
I made a modification to use the same instance of snapshot until the loading of 
partition is done.


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 2083-2084 (original), 2129-2130 (patched)
> > 
> >
> > We do not use batching for adding partitions?

What should be the size of the batch? I used BATCH_RETRIEVE_MAX configuration 
parameter for fetching data from metastore. Can I use this one as well, or 
there is another parameter?


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 2492-2506 (patched)
> > 
> >
> > Do we need this? Can't we use equals method of maps to compare instead?

Java arrays do not override Object.equals(). Hence if you compare them using 
equals() (which is what the equals methods of all the collection classes do), 
you get "instance equality", instead of "value equality".


> On Oct. 11, 2018, 8:54 a.m., Peter Vary wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
> > Lines 2511 (patched)
> > 
> >
> > Note: We still might fail with OOM if too many new partitions are 
> > there. We store in memory all of the new and old partition specifications. 
> > Was this the same before?

Yes, in the previous 

Re: Review Request 68975: HIVE-20661: Dynamic partitions loading calls add partition for every partition 1-by-1

2018-10-11 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68975/#review209443
---



Thanks Laszlo!
This is a big patch indeed.
Comments below.
Could you check that the test cases are covering all possible scenarios?
Thanks,
Peter


ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1896 (patched)


What does this method do?
Moving files and updating partition data if the partition is already exists?
Javadoc would be good anyway.
Follow-up idea: update partition data with alter_partitions (multiple 
updates with 1 HMS call)?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Line 1909 (original), 1939 (patched)


nit: maybe do not reformat these lines if they are not needed



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Line 1939 (original), 1961 (patched)


nit: maybe do not reformat these lines if they are not needed



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Line 1948 (original), 1970 (patched)


nit: maybe do not reformat these lines if they are not needed



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Line 1966 (original), 1988 (patched)


nit: maybe do not reformat these lines if they are not needed



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1970-1971 (original), 1992-1993 (patched)


nit: maybe do not reformat these lines if they are not needed



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 1978-1979 (original), 2000-2001 (patched)


nit: maybe do not reformat these lines if they are not needed



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2102-2103 (patched)


Concerned about this.
When we call Hive.loadPartition we call 2 methods there:
- loadPartitionInternal - 1 snapshot
- addPartitionToMetastore - 1 snapshot
Are we sure that these calls are:
- Lightweight - so we can happily call them twice
- Return the same value in both ocassions even if some other transacion is 
finished during this time?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2080-2082 (original), 2126-2128 (patched)


Maybe surround this with isDebugEnabled?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2083-2084 (original), 2129-2130 (patched)


We do not use batching for adding partitions?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2492-2506 (patched)


Do we need this? Can't we use equals method of maps to compare instead?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2511 (patched)


Note: We still might fail with OOM if too many new partitions are there. We 
store in memory all of the new and old partition specifications. Was this the 
same before?



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2604-2607 (patched)


Shouldn't we do this when uploading the files?
I think the original intent was to show the progress of the loading of the 
partitions. We might want to keep this functionality.



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Lines 2491-2492 (original), 2615 (patched)


nit: Please-please-please no unneccessary formatting changes... :)



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Line 2496 (original), 2619 (patched)


We need the PerfLogEnd for LOAD_DYNAMIC_PARTITIONS



ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java
Line 2500 (original), 2621 (patched)


Why do we shallow the original exception. We should add as a root cause



ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
Lines 910 (patched)


Do we need this as public?



ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
Lines 952 (patched)


Do we need this as public?



ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
Lines 1015 (patched)