When do the deltas of a transaction become observable?
Hi All,

I'm trying to figure out where in the Hive codebase the deltas that are the side effect of a Hive 3.x transaction become observable. (My current investigation is for HDFS.) For example:

from table1
insert into table2 select x
insert into table3 select x;

This transaction generates two delta files: one that will appear under the location for table2 and another under the location for table3. I'm expecting that there's some logic that makes the deltas of this transaction appear in their respective HDFS locations upon commit (or release of the locks), but I can't seem to find it. As it's a transactional system, I'd expect to observe both deltas or none at all at the point of successful commit.

The only reference to a location I've managed to stumble across is that of the Hive scratch space: conceptually, I had thought that the intermediate result of a transaction would be located there and that a rename would then occur to make the content visible to other readers.

I have done some basic tests to determine whether the observation semantics were tied to the metadata in the database backing the transactional system, but I could only determine that write IDs were influencing this, e.g. if write ID = 7 for a given table, then a read would consist of all deltas with a write ID < 7.

If someone could point me in the right direction, or correct my understanding, then I would greatly appreciate it.

Thanks,
Granville
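The write-ID rule described above (a read sees only deltas at or below the table's visible write ID) can be sketched as follows. This is an illustrative sketch, not Hive's actual code: the directory names follow Hive's delta_<minWriteId>_<maxWriteId> naming convention, but the class and method names here are invented; in the real codebase the comparable snapshot logic is built around AcidUtils and ValidWriteIdList.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaVisibility {
    // Hive names delta directories delta_<minWriteId>_<maxWriteId>; under a
    // simple high-watermark rule, a delta is readable when its max write ID
    // does not exceed the reader's watermark.
    static List<String> visibleDeltas(List<String> dirs, long highWatermark) {
        List<String> visible = new ArrayList<>();
        for (String dir : dirs) {
            if (!dir.startsWith("delta_")) {
                continue; // skip base_* and any non-delta directories
            }
            String[] parts = dir.split("_");
            long maxWriteId = Long.parseLong(parts[2]);
            if (maxWriteId <= highWatermark) {
                visible.add(dir);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of(
            "delta_0000005_0000005", "delta_0000007_0000007", "base_0000004");
        // With a high watermark of 6 only the write-ID-5 delta is readable.
        System.out.println(visibleDeltas(dirs, 6)); // [delta_0000005_0000005]
    }
}
```

The point of the sketch is that visibility is a property of which directories a reader's snapshot admits, not of any rename performed at commit time.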
[jira] [Created] (HIVE-20965) Drilldown estimates are highly off after HIVE-18908
Zoltan Haindrich created HIVE-20965:
---------------------------------------

             Summary: Drilldown estimates are highly off after HIVE-18908
                 Key: HIVE-20965
                 URL: https://issues.apache.org/jira/browse/HIVE-20965
             Project: Hive
          Issue Type: Bug
            Reporter: Zoltan Haindrich
            Assignee: Zoltan Haindrich

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
Re: Review Request 69410: HIVE-20330: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69410/#review210823
---

Ship it!

Ship It!

- Peter Vary

On Nov. 20, 2018, 12:53 p.m., Adam Szita wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69410/
> ---
>
> (Updated Nov. 20, 2018, 12:53 p.m.)
>
> Review request for hive, Nandor Kollar and Peter Vary.
>
> Bugs: HIVE-20330
>     https://issues.apache.org/jira/browse/HIVE-20330
>
> Repository: hive-git
>
> Description
> ---
>
> The change in this patch is that we're no longer serializing and putting just
> one InputJobInfo into the JobConf, but rather always appending to a list of
> InputJobInfo instances in it (creating the list on first occurrence).
> This ensures that if multiple tables serve as inputs in a job, Pig can
> retrieve information for each of the tables, not just the last one added.
>
> I've also discovered a bug in InputJobInfo.writeObject() where the
> ObjectOutputStream was closed by mistake after writing partition information
> in a compressed manner. Closing the compressed writer inevitably closed the
> underlying OOS and prevented any other objects from being written into it - I
> had to fix that because it prevented serializing InputJobInfo instances
> inside a list.
>
> Diffs
> -----
>
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/common/HCatUtil.java 8e72a1275a5cdcc2d778080fff6bb82198395f5f
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 195eaa367933990e3ef0ef879f34049c65822aee
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatBaseInputFormat.java 8d7a8f9df9412105ec7d77fad9af0d7dd18f4323
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/HCatInputFormat.java ad6f3eb9f93338023863c6239d6af0449b20ff9c
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InitializeInput.java 364382d9ccf6eb9fc29689b0eb5f973f422051b4
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/InputJobInfo.java ac1dd54be821d32aa008d41514df05a41f16223c
>   hcatalog/core/src/test/java/org/apache/hive/hcatalog/common/TestHCatUtil.java 91aa4fa2693e0b0bd65c1667210af340619f552d
>   hcatalog/hcatalog-pig-adapter/src/main/java/org/apache/hive/hcatalog/pig/HCatLoader.java c3bde2d2a3cbd09fb0b1ed758bf4f2b1041a23cb
>   hcatalog/hcatalog-pig-adapter/src/test/java/org/apache/hive/hcatalog/pig/AbstractHCatLoaderTest.java 58981f88ef6abfbf7a4b7ffc3116c53d47e86fde
>
> Diff: https://reviews.apache.org/r/69410/diff/1/
>
> Testing
> -----
>
> Added (true) unit tests to verify my method of adding/retrieving InputJobInfo
> instances to/from config instances.
> Added (integration-like) unit tests to mock Pig calling HCatLoader for
> multiple input tables, and checking the reported input sizes.
>
> Thanks,
>
> Adam Szita
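The append-to-a-list behaviour the patch describes can be sketched like this. The names are hypothetical (a plain Map stands in for the Hadoop JobConf, and the property key is a placeholder, not HCatalog's real HCAT_KEY_JOB_INFO handling); the point is only the create-on-first-use-then-append pattern that keeps every input's info instead of overwriting a single value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InputInfoConf {
    // Placeholder property key; the real key used by HCatalog differs.
    static final String JOB_INFO_KEY = "hcat.job.info.list";

    // Append an input-info object to the list stored under the key,
    // creating the list on first use instead of overwriting a single value.
    @SuppressWarnings("unchecked")
    static void addInputJobInfo(Map<String, Object> conf, Object info) {
        List<Object> infos = (List<Object>) conf.get(JOB_INFO_KEY);
        if (infos == null) {
            infos = new ArrayList<>();
            conf.put(JOB_INFO_KEY, infos);
        }
        infos.add(info);
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        addInputJobInfo(conf, "table_a_info");
        addInputJobInfo(conf, "table_b_info");
        // Both inputs survive, not just the last one added.
        System.out.println(conf.get(JOB_INFO_KEY)); // [table_a_info, table_b_info]
    }
}
```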
Re: Review Request 69410: HIVE-20330: HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs
> On Nov. 22, 2018, 1:25 p.m., Peter Vary wrote:
> > My only concern is that some other components might use the
> > HCAT_KEY_JOB_INFO property values as well? Was this a public property key?
> >
> > Otherwise nicely done!

Thanks for looking into this change. AFAIK HCAT_KEY_JOB_INFO is only used
internally in HCatalog code, i.e. Pig uses the proper methods provided by the
LoadMetadata interface and receives the input size numbers through that. It
does not depend on job conf objects populated with a value for
HCAT_KEY_JOB_INFO.

- Adam

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69410/#review210792
---

On Nov. 20, 2018, 12:53 p.m., Adam Szita wrote:
> [...]
Re: Review Request 69432: HIVE-20964 Create a test that checks the level of the parallel compilation
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69432/#review210821
---

Ship it!

Ship It!

- Marta Kuczora

On Nov. 22, 2018, 3:19 p.m., Peter Vary wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69432/
> ---
>
> (Updated Nov. 22, 2018, 3:19 p.m.)
>
> Review request for hive, Denys Kuzmenko, Marta Kuczora, and Adam Szita.
>
> Bugs: HIVE-20964
>     https://issues.apache.org/jira/browse/HIVE-20964
>
> Repository: hive-git
>
> Description
> ---
>
> * Created 2 query types in the TestCompileLock mock driver. The original
>   SHORT_QUERY finishes in 0.5s as before, but the new LONG_QUERY finishes
>   only after 5s.
> * Using the new 5s query, I created a new test where the compile quota is 4
>   and the number of parallel requests is 10, so the test expects that 6
>   queries will fail with a timeout.
> * Added a new verifyThatTimedOutCompileOpsCount method to validate the
>   number of timed-out queries.
> * The other changes just push down the query string so the compileAndRespond
>   method can decide which query to run.
>
> Diffs
> -----
>
>   ql/src/test/org/apache/hadoop/hive/ql/TestCompileLock.java 8dc05ff480
>
> Diff: https://reviews.apache.org/r/69432/diff/1/
>
> Testing
> -----
>
> Ran the new test and all the old tests in TestCompileLock.
>
> Thanks,
>
> Peter Vary
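The quota-with-timeout behaviour the test exercises can be sketched with a plain java.util.concurrent.Semaphore. This is an illustrative sketch, not Hive's compile-lock implementation (the class and method names are invented): with a quota of 4 and 10 concurrent long-running requests, 6 callers would fail to acquire a permit within the timeout, which is the outcome the new test asserts.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class CompileQuota {
    private final Semaphore permits;

    CompileQuota(int quota) {
        // One permit per allowed concurrent compilation.
        this.permits = new Semaphore(quota);
    }

    // Try to take a compile slot; give up after the timeout instead of
    // queueing forever, so excess requests fail fast.
    boolean tryCompile(long timeout, TimeUnit unit) throws InterruptedException {
        if (!permits.tryAcquire(timeout, unit)) {
            return false; // would surface as a compile-lock timeout error
        }
        try {
            // ... run query compilation here ...
            return true;
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A zero quota means every attempt times out; a positive quota admits one.
        System.out.println(new CompileQuota(0).tryCompile(10, TimeUnit.MILLISECONDS)); // false
        System.out.println(new CompileQuota(1).tryCompile(10, TimeUnit.MILLISECONDS)); // true
    }
}
```

Using tryAcquire with a timeout rather than a blocking acquire is what turns an over-quota request into a bounded-latency failure the test can count.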