Re: Review Request 69410: HIVE-20330: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-23 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69410/#review210823
---


Ship it!




Ship It!

- Peter Vary


On nov. 20, 2018, 12:53 du, Adam Szita wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69410/
> ---
> 
> (Updated nov. 20, 2018, 12:53 du)
> 
> 
> Review request for hive, Nandor Kollar and Peter Vary.
> 
> 
> Bugs: HIVE-20330
> https://issues.apache.org/jira/browse/HIVE-20330
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The change in this patch is that we're not just serializing and putting one 
> InputJobInfo into JobConf, but rather always append to a list (or create it 
> on the first occurrence) of InputJobInfo instances in it.
> This ensures that if multiple tables serve as inputs in a job, Pig can 
> retrieve information for each of the tables, not just the last one added.
> 
> I've also discovered a bug in InputJobInfo.writeObject() where the 
> ObjectOutputStream was closed by mistake after writing partition information 
> in a compressed manner. Closing the compressed writer inevitably closed the 
> OOS on the context and prevented any other objects to be written into OOS - I 
> had to fix that because it prevented serializing InputJobInfo instances 
> inside a list.
> 
> 
> Diffs
> -
> 
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
> 8e72a1275a5cdcc2d778080fff6bb82198395f5f 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
>  195eaa367933990e3ef0ef879f34049c65822aee 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
>  8d7a8f9df9412105ec7d77fad9af0d7dd18f4323 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatInputFormat.java
>  ad6f3eb9f93338023863c6239d6af0449b20ff9c 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
>  364382d9ccf6eb9fc29689b0eb5f973f422051b4 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InputJobInfo.java
>  ac1dd54be821d32aa008d41514df05a41f16223c 
>   
> hcatalog/core/src/test/java/org/apache/hive/hcatalog/common/TestHCatUtil.java 
> 91aa4fa2693e0b0bd65c1667210af340619f552d 
>   
> hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatLoader.java
>  c3bde2d2a3cbd09fb0b1ed758bf4f2b1041a23cb 
>   
> hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/AbstractHCatLoaderTest.java
>  58981f88ef6abfbf7a4b7ffc3116c53d47e86fde 
> 
> 
> Diff: https://reviews.apache.org/r/69410/diff/1/
> 
> 
> Testing
> ---
> 
> Added (true) unit tests to verify my method of adding/retrieving InputJobInfo 
> instances to/from config instances.
> Added (integration-like) unit tests to mock Pig calling HCatLoader for 
> multiple input tables, and checking the reported input sizes.
> 
> 
> Thanks,
> 
> Adam Szita
> 
>



Re: Review Request 69410: HIVE-20330: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-23 Thread Adam Szita via Review Board


> On Nov. 22, 2018, 1:25 p.m., Peter Vary wrote:
> > My only concen is that some other components might use HCAT_KEY_JOB_INFO 
> > property values as well? Was this a public property key?
> > 
> > Otherwise nicely done!

Thanks for looking into this change, AFAIK HCAT_KEY_JOB_INFO is only used in 
HCatalog code internally, i.e. Pig is using the proper methods provided by 
LoadMetadata interface, and receives the input size numbers through that. It 
does not depend on job conf objects populated with value for HCAT_KEY_JOB_INFO.


- Adam


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69410/#review210792
---


On Nov. 20, 2018, 12:53 p.m., Adam Szita wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69410/
> ---
> 
> (Updated Nov. 20, 2018, 12:53 p.m.)
> 
> 
> Review request for hive, Nandor Kollar and Peter Vary.
> 
> 
> Bugs: HIVE-20330
> https://issues.apache.org/jira/browse/HIVE-20330
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The change in this patch is that we're not just serializing and putting one 
> InputJobInfo into JobConf, but rather always append to a list (or create it 
> on the first occurrence) of InputJobInfo instances in it.
> This ensures that if multiple tables serve as inputs in a job, Pig can 
> retrieve information for each of the tables, not just the last one added.
> 
> I've also discovered a bug in InputJobInfo.writeObject() where the 
> ObjectOutputStream was closed by mistake after writing partition information 
> in a compressed manner. Closing the compressed writer inevitably closed the 
> OOS on the context and prevented any other objects to be written into OOS - I 
> had to fix that because it prevented serializing InputJobInfo instances 
> inside a list.
> 
> 
> Diffs
> -
> 
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
> 8e72a1275a5cdcc2d778080fff6bb82198395f5f 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
>  195eaa367933990e3ef0ef879f34049c65822aee 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
>  8d7a8f9df9412105ec7d77fad9af0d7dd18f4323 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatInputFormat.java
>  ad6f3eb9f93338023863c6239d6af0449b20ff9c 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
>  364382d9ccf6eb9fc29689b0eb5f973f422051b4 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InputJobInfo.java
>  ac1dd54be821d32aa008d41514df05a41f16223c 
>   
> hcatalog/core/src/test/java/org/apache/hive/hcatalog/common/TestHCatUtil.java 
> 91aa4fa2693e0b0bd65c1667210af340619f552d 
>   
> hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatLoader.java
>  c3bde2d2a3cbd09fb0b1ed758bf4f2b1041a23cb 
>   
> hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/AbstractHCatLoaderTest.java
>  58981f88ef6abfbf7a4b7ffc3116c53d47e86fde 
> 
> 
> Diff: https://reviews.apache.org/r/69410/diff/1/
> 
> 
> Testing
> ---
> 
> Added (true) unit tests to verify my method of adding/retrieving InputJobInfo 
> instances to/from config instances.
> Added (integration-like) unit tests to mock Pig calling HCatLoader for 
> multiple input tables, and checking the reported input sizes.
> 
> 
> Thanks,
> 
> Adam Szita
> 
>



Re: Review Request 69410: HIVE-20330: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-22 Thread Peter Vary via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69410/#review210792
---



My only concen is that some other components might use HCAT_KEY_JOB_INFO 
property values as well? Was this a public property key?

Otherwise nicely done!

- Peter Vary


On nov. 20, 2018, 12:53 du, Adam Szita wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69410/
> ---
> 
> (Updated nov. 20, 2018, 12:53 du)
> 
> 
> Review request for hive, Nandor Kollar and Peter Vary.
> 
> 
> Bugs: HIVE-20330
> https://issues.apache.org/jira/browse/HIVE-20330
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The change in this patch is that we're not just serializing and putting one 
> InputJobInfo into JobConf, but rather always append to a list (or create it 
> on the first occurrence) of InputJobInfo instances in it.
> This ensures that if multiple tables serve as inputs in a job, Pig can 
> retrieve information for each of the tables, not just the last one added.
> 
> I've also discovered a bug in InputJobInfo.writeObject() where the 
> ObjectOutputStream was closed by mistake after writing partition information 
> in a compressed manner. Closing the compressed writer inevitably closed the 
> OOS on the context and prevented any other objects to be written into OOS - I 
> had to fix that because it prevented serializing InputJobInfo instances 
> inside a list.
> 
> 
> Diffs
> -
> 
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
> 8e72a1275a5cdcc2d778080fff6bb82198395f5f 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
>  195eaa367933990e3ef0ef879f34049c65822aee 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
>  8d7a8f9df9412105ec7d77fad9af0d7dd18f4323 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatInputFormat.java
>  ad6f3eb9f93338023863c6239d6af0449b20ff9c 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
>  364382d9ccf6eb9fc29689b0eb5f973f422051b4 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InputJobInfo.java
>  ac1dd54be821d32aa008d41514df05a41f16223c 
>   
> hcatalog/core/src/test/java/org/apache/hive/hcatalog/common/TestHCatUtil.java 
> 91aa4fa2693e0b0bd65c1667210af340619f552d 
>   
> hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatLoader.java
>  c3bde2d2a3cbd09fb0b1ed758bf4f2b1041a23cb 
>   
> hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/AbstractHCatLoaderTest.java
>  58981f88ef6abfbf7a4b7ffc3116c53d47e86fde 
> 
> 
> Diff: https://reviews.apache.org/r/69410/diff/1/
> 
> 
> Testing
> ---
> 
> Added (true) unit tests to verify my method of adding/retrieving InputJobInfo 
> instances to/from config instances.
> Added (integration-like) unit tests to mock Pig calling HCatLoader for 
> multiple input tables, and checking the reported input sizes.
> 
> 
> Thanks,
> 
> Adam Szita
> 
>



Re: Review Request 69410: HIVE-20330: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-20 Thread Nandor Kollar via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69410/#review210718
---


Ship it!




Ship It!

- Nandor Kollar


On Nov. 20, 2018, 12:53 p.m., Adam Szita wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69410/
> ---
> 
> (Updated Nov. 20, 2018, 12:53 p.m.)
> 
> 
> Review request for hive, Nandor Kollar and Peter Vary.
> 
> 
> Bugs: HIVE-20330
> https://issues.apache.org/jira/browse/HIVE-20330
> 
> 
> Repository: hive-git
> 
> 
> Description
> ---
> 
> The change in this patch is that we're not just serializing and putting one 
> InputJobInfo into JobConf, but rather always append to a list (or create it 
> on the first occurrence) of InputJobInfo instances in it.
> This ensures that if multiple tables serve as inputs in a job, Pig can 
> retrieve information for each of the tables, not just the last one added.
> 
> I've also discovered a bug in InputJobInfo.writeObject() where the 
> ObjectOutputStream was closed by mistake after writing partition information 
> in a compressed manner. Closing the compressed writer inevitably closed the 
> OOS on the context and prevented any other objects to be written into OOS - I 
> had to fix that because it prevented serializing InputJobInfo instances 
> inside a list.
> 
> 
> Diffs
> -
> 
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
> 8e72a1275a5cdcc2d778080fff6bb82198395f5f 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
>  195eaa367933990e3ef0ef879f34049c65822aee 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
>  8d7a8f9df9412105ec7d77fad9af0d7dd18f4323 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatInputFormat.java
>  ad6f3eb9f93338023863c6239d6af0449b20ff9c 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
>  364382d9ccf6eb9fc29689b0eb5f973f422051b4 
>   
> hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InputJobInfo.java
>  ac1dd54be821d32aa008d41514df05a41f16223c 
>   
> hcatalog/core/src/test/java/org/apache/hive/hcatalog/common/TestHCatUtil.java 
> 91aa4fa2693e0b0bd65c1667210af340619f552d 
>   
> hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatLoader.java
>  c3bde2d2a3cbd09fb0b1ed758bf4f2b1041a23cb 
>   
> hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/AbstractHCatLoaderTest.java
>  58981f88ef6abfbf7a4b7ffc3116c53d47e86fde 
> 
> 
> Diff: https://reviews.apache.org/r/69410/diff/1/
> 
> 
> Testing
> ---
> 
> Added (true) unit tests to verify my method of adding/retrieving InputJobInfo 
> instances to/from config instances.
> Added (integration-like) unit tests to mock Pig calling HCatLoader for 
> multiple input tables, and checking the reported input sizes.
> 
> 
> Thanks,
> 
> Adam Szita
> 
>



Review Request 69410: HIVE-20330: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs

2018-11-20 Thread Adam Szita via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69410/
---

Review request for hive, Nandor Kollar and Peter Vary.


Bugs: HIVE-20330
https://issues.apache.org/jira/browse/HIVE-20330


Repository: hive-git


Description
---

The change in this patch is that we're not just serializing and putting one 
InputJobInfo into JobConf, but rather always append to a list (or create it on 
the first occurrence) of InputJobInfo instances in it.
This ensures that if multiple tables serve as inputs in a job, Pig can retrieve 
information for each of the tables, not just the last one added.

I've also discovered a bug in InputJobInfo.writeObject() where the 
ObjectOutputStream was closed by mistake after writing partition information in 
a compressed manner. Closing the compressed writer inevitably closed the OOS on 
the context and prevented any other objects to be written into OOS - I had to 
fix that because it prevented serializing InputJobInfo instances inside a list.


Diffs
-

  hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 
8e72a1275a5cdcc2d778080fff6bb82198395f5f 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java
 195eaa367933990e3ef0ef879f34049c65822aee 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java
 8d7a8f9df9412105ec7d77fad9af0d7dd18f4323 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatInputFormat.java
 ad6f3eb9f93338023863c6239d6af0449b20ff9c 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java
 364382d9ccf6eb9fc29689b0eb5f973f422051b4 
  
hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InputJobInfo.java
 ac1dd54be821d32aa008d41514df05a41f16223c 
  hcatalog/core/src/test/java/org/apache/hive/hcatalog/common/TestHCatUtil.java 
91aa4fa2693e0b0bd65c1667210af340619f552d 
  
hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatLoader.java
 c3bde2d2a3cbd09fb0b1ed758bf4f2b1041a23cb 
  
hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/AbstractHCatLoaderTest.java
 58981f88ef6abfbf7a4b7ffc3116c53d47e86fde 


Diff: https://reviews.apache.org/r/69410/diff/1/


Testing
---

Added (true) unit tests to verify my method of adding/retrieving InputJobInfo 
instances to/from config instances.
Added (integration-like) unit tests to mock Pig calling HCatLoader for multiple 
input tables, and checking the reported input sizes.


Thanks,

Adam Szita