Re: Review Request 68474: HIVE-20440

2018-11-07 Thread Antal Sinkovits via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
---

(Updated Nov. 7, 2018, 2:38 p.m.)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.


Repository: hive-git


Description
---

I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the
interned string of the path.
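A minimal sketch of the two ideas in this description, under stated assumptions: the patch itself uses Guava's cache, while the block below is a JDK-only stand-in in which `java.lang.ref.SoftReference` plays the role of soft values and `ConcurrentHashMap.compute` plays the role of a per-key value loader, replacing a lock taken on `path.intern()`. The class name `SoftValueCache` is hypothetical and not part of Hive.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical JDK-only sketch (not the patch's code): values are softly
 * referenced, so the GC may reclaim them under memory pressure, and loading
 * is coordinated per key by the map instead of by locking on an interned path.
 */
final class SoftValueCache<K, V> {
  private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();

  V get(K key, Function<? super K, ? extends V> loader) {
    // Strong reference to the result, so it cannot be collected between
    // compute() returning and this method returning.
    Object[] result = new Object[1];
    // compute() is atomic per key: concurrent callers for the same key are
    // serialized by the map rather than by a global String.intern() monitor.
    map.compute(key, (k, ref) -> {
      V cached = (ref == null) ? null : ref.get();
      if (cached != null) {
        result[0] = cached;
        return ref;
      }
      // Missing, or already cleared by the GC: load again and re-wrap softly.
      V loaded = loader.apply(k);
      result[0] = loaded;
      return new SoftReference<>(loaded);
    });
    @SuppressWarnings("unchecked")
    V value = (V) result[0];
    return value;
  }
}
```

As long as the caller still holds the first value, its soft reference cannot be cleared, so a second lookup for the same key returns the same instance without running the loader. One difference from Guava worth keeping in mind: `compute` holds the map's internal lock for a bin that other keys may share while the loader runs, whereas Guava's `Cache.get(key, Callable)` makes only callers of the same key wait for the load.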


Diffs (updated)
-

  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCacheEviction.java PRE-CREATION
  ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION


Diff: https://reviews.apache.org/r/68474/diff/5/

Changes: https://reviews.apache.org/r/68474/diff/4-5/


Testing
---


Thanks,

Antal Sinkovits



Re: Review Request 68474: HIVE-20440

2018-11-07 Thread Antal Sinkovits via Review Board


> On Oct. 16, 2018, 2:56 p.m., Sahil Takiar wrote:
> > Could we add some more E2E integration tests? I'm thinking they could be at
> > the granularity of a `MapJoinOperator`. For example, confirm that starting
> > a new query actually evicts everything from the cache. We want to make sure
> > we aren't accidentally leaking small tables.
> 
> Antal Sinkovits wrote:
> MapJoinOperator cannot be tested easily. There is a TestMapJoinOperator,
> but its test code is really complex. Also, the eviction happens at the
> HivePairFlatMapFunction level: the cache is reinitialized for every
> Map/Reduce, and if we are in a new query, the cache gets evicted.

I've added a new test to check this.


- Antal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
---





Re: Review Request 68474: HIVE-20440

2018-11-06 Thread Antal Sinkovits via Review Board


> On Oct. 16, 2018, 2:50 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
> > Lines 131 (patched)
> > 
> >
> > why do we run the action just for the l2 cache?

L2 contains all the elements from L1, so running through L2 is enough.
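To see why iterating over L2 alone is enough, consider a hypothetical JDK-only sketch of the arrangement from the 2018-10-01 review (`cacheL1.get(key, valueLoader)` with a `valueLoader` that loads from `cacheL2`). The class `TwoLevelCache` and its methods are assumptions; the patch builds both levels with Guava, while here L1 is a plain map of strong references and L2 a map of `SoftReference`s. Because every value reaches L1 only through the loader that first records it in L2, L2 is a superset of L1.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

/**
 * Hypothetical two-level cache: L1 holds strong references, L2 holds soft
 * references. Every value enters L1 only through L2, so L2 is a superset of L1.
 */
final class TwoLevelCache<K, V> {
  private final Map<K, V> l1 = new ConcurrentHashMap<>();
  private final Map<K, SoftReference<V>> l2 = new ConcurrentHashMap<>();

  V get(K key, Function<? super K, ? extends V> loader) {
    // L1 miss: the value loader consults L2 first, and only loads from the
    // source (recording the result in L2) when L2 has nothing live.
    return l1.computeIfAbsent(key, k -> {
      SoftReference<V> ref = l2.get(k);
      V value = (ref == null) ? null : ref.get();
      if (value == null) {
        value = loader.apply(k);
        l2.put(k, new SoftReference<>(value));
      }
      return value;
    });
  }

  // Entries removed from L1 stay in L2 until the GC clears them.
  void evictL1(K key) {
    l1.remove(key);
  }

  // Visiting L2 alone reaches every live value: anything still in L1 is
  // strongly reachable, so its soft reference in L2 cannot have been cleared.
  void forEach(BiConsumer<? super K, ? super V> action) {
    l2.forEach((k, ref) -> {
      V value = ref.get();
      if (value != null) {
        action.accept(k, value);
      }
    });
  }
}
```

The only values `forEach` can miss are ones that are no longer in L1 and were already reclaimed, which is exactly the set a cleanup action would not need to visit.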


- Antal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209626
---





Re: Review Request 68474: HIVE-20440

2018-11-06 Thread Antal Sinkovits via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
---

(Updated Nov. 6, 2018, 12:28 p.m.)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.


Repository: hive-git


Description
---

I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the
interned string of the path.


Diffs (updated)
-

  ql/pom.xml 8c3e55eaf4d0234a280b0936f6153d2f563bbe46
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION


Diff: https://reviews.apache.org/r/68474/diff/4/

Changes: https://reviews.apache.org/r/68474/diff/3-4/


Testing
---


Thanks,

Antal Sinkovits



Re: Review Request 68474: HIVE-20440

2018-11-06 Thread Antal Sinkovits via Review Board


> On Oct. 16, 2018, 2:56 p.m., Sahil Takiar wrote:
> > Could we add some more E2E integration tests? I'm thinking they could be at
> > the granularity of a `MapJoinOperator`. For example, confirm that starting
> > a new query actually evicts everything from the cache. We want to make sure
> > we aren't accidentally leaking small tables.

MapJoinOperator cannot be tested easily. There is a TestMapJoinOperator, but
its test code is really complex. Also, the eviction happens at the
HivePairFlatMapFunction level: the cache is reinitialized for every Map/Reduce,
and if we are in a new query, the cache gets evicted.
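The eviction rule in this reply can be sketched as follows. This is a hypothetical illustration, not the patch's code: `QueryScopedCache`, `initialize(String queryId)`, and the use of a string query ID are assumptions, and the sketch assumes `initialize` is called at the start of every map/reduce function, which the reply places at the HivePairFlatMapFunction level.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of query-scoped eviction: initialize() is assumed to be
 * called at the start of every map/reduce function, and it clears the cache
 * when the query ID differs from the one that populated it.
 */
final class QueryScopedCache<V> {
  private final Map<String, V> entries = new ConcurrentHashMap<>();
  private String currentQueryId;

  synchronized void initialize(String queryId) {
    if (!queryId.equals(currentQueryId)) {
      // A new query started: nothing cached by the previous query survives.
      entries.clear();
      currentQueryId = queryId;
    }
  }

  void put(String key, V value) {
    entries.put(key, value);
  }

  V get(String key) {
    return entries.get(key);
  }

  int size() {
    return entries.size();
  }
}
```

Tasks of the same query reuse each other's small tables, while the first task of a new query starts from an empty cache, which is the "no leaked small tables" property the review asks to test.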


- Antal


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
---





Re: Review Request 68474: HIVE-20440

2018-10-16 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209628
---



Could we add some more E2E integration tests? I'm thinking they could be at the
granularity of a `MapJoinOperator`. For example, confirm that starting a new
query actually evicts everything from the cache. We want to make sure we aren't
accidentally leaking small tables.

- Sahil Takiar





Re: Review Request 68474: HIVE-20440

2018-10-16 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209626
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 117 (patched)


nit: if you want to leave the `@return` section empty, then just remove it 
entirely



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 127 (patched)


nit: same as above



ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java
Lines 178-190 (patched)


what about changing this to something like `getKey()` that just returns a
`String`? I don't think the interface needs to be tied to reading data from a
folder on HDFS.
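A minimal sketch of what this suggestion could look like. The interface and class names (`CacheableTableContainer`, `PathKeyedContainer`) are hypothetical; the thread does not show the actual method on `MapJoinTableContainer`, only the proposed `getKey()` returning a `String`.

```java
/**
 * Hypothetical shape of the suggestion: the container exposes an opaque cache
 * key instead of a method tied to the HDFS folder it was loaded from.
 */
interface CacheableTableContainer {
  String getKey();
}

/** Hypothetical implementation that happens to derive its key from a load path. */
final class PathKeyedContainer implements CacheableTableContainer {
  private final String loadPath;

  PathKeyedContainer(String loadPath) {
    this.loadPath = loadPath;
  }

  @Override
  public String getKey() {
    // Callers only see a String; how it is derived stays an implementation detail.
    return loadPath;
  }
}
```

Keeping the key opaque means a container that is not loaded from HDFS can still be cached, since callers never rely on the key being a path.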



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 131 (patched)


why do we run the action just for the l2 cache?


- Sahil Takiar





Re: Review Request 68474: HIVE-20440

2018-10-10 Thread Antal Sinkovits via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
---

(Updated Oct. 10, 2018, 1:20 p.m.)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.


Summary (updated)
-

HIVE-20440


Repository: hive-git


Description
---

I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the
interned string of the path.


Diffs (updated)
-

  ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java da1dd426c9155290e30fd1e3ae7f19a5479a8967
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/AbstractMapJoinTableContainer.java 9e65fd98d6e4451421641b1429ccf334fe9a9586
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/HybridHashTableContainer.java 54377428eafdb79e1bbdc8a182eafb46f8febd23
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinBytesTableContainer.java 0e4b8df036724bd83e85fc3cc70f534272dab4c4
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainer.java 74e0b120ea3560a6a2a0074e6c0026b4874b3d5e
  ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/MapJoinTableContainerSerDe.java 24b8fea33815867ce544fd284437c4d02a21f1a3
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/mapjoin/fast/VectorMapJoinFastTableContainer.java e8dcbf18cb09b190536f920a53d6e9fa870ce33b
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION


Diff: https://reviews.apache.org/r/68474/diff/3/

Changes: https://reviews.apache.org/r/68474/diff/2-3/


Testing
---


Thanks,

Antal Sinkovits



Re: Review Request 68474: HIVE-20440: Create better cache eviction policy for SmallTableCache

2018-10-01 Thread Sahil Takiar

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review209130
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Line 60 (original), 69 (patched)


keep the explicit cache method and call it in `MapJoinOperator#closeOp`. 
This way when a task finishes, we still keep the small table around for at 
least 30 seconds, which gives any tasks scheduled in the future a chance to 
re-use the small table.
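One way to get the "at least 30 seconds after the task finishes" behavior is an explicit `cache(...)` call that stamps the entry, plus a periodic cleanup that only drops entries idle longer than the retention window. The sketch is hypothetical: `RetainingCache` and its methods are assumptions, and time is passed in as a parameter so the behavior is easy to check. With Guava, `CacheBuilder.expireAfterAccess(30, TimeUnit.SECONDS)` gives a similar idle-based expiry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: entries stay cached for at least retentionMillis after
 * their last cache()/get() call; cleanup() drops only entries idle longer than that.
 */
final class RetainingCache<K, V> {
  private static final class Entry<T> {
    final T value;
    volatile long lastAccessMillis;

    Entry(T value, long now) {
      this.value = value;
      this.lastAccessMillis = now;
    }
  }

  private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
  private final long retentionMillis;

  RetainingCache(long retentionMillis) {
    this.retentionMillis = retentionMillis;
  }

  // Explicit cache method, e.g. called when an operator closes (closeOp).
  void cache(K key, V value, long nowMillis) {
    entries.put(key, new Entry<>(value, nowMillis));
  }

  V get(K key, long nowMillis) {
    Entry<V> e = entries.get(key);
    if (e == null) {
      return null;
    }
    e.lastAccessMillis = nowMillis; // an access restarts the retention window
    return e.value;
  }

  // Intended to be run periodically by a background cleanup thread.
  void cleanup(long nowMillis) {
    entries.values().removeIf(e -> nowMillis - e.lastAccessMillis > retentionMillis);
  }

  int size() {
    return entries.size();
  }
}
```

A task scheduled on the same executor shortly after another one finishes can therefore still find the small table, while idle tables disappear once the window has passed.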



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 75 (patched)


can you add some Javadoc to this class explaining what it does?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 82 (patched)


rename to something like `cleanupService`



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 90 (patched)


nit: make `INTEGER_ONE` a static import



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 91 (patched)


"SmallTableCache maintenance thread" -> "SmallTableCache Cleanup Thread"
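The naming nits above (the `cleanupService` field name and the cleanup thread's name) fit together in a small scheduled-executor setup. This is a hypothetical sketch: `CleanupScheduling`, `newCleanupService()`, and `schedule(...)` are assumptions, and it takes the period as a parameter instead of using the patch's `INTEGER_ONE` constant.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of the naming suggestions: a single daemon thread named
 * "SmallTableCache Cleanup Thread", held by a variable called cleanupService.
 */
final class CleanupScheduling {
  static final String THREAD_NAME = "SmallTableCache Cleanup Thread";

  static ScheduledExecutorService newCleanupService() {
    return Executors.newSingleThreadScheduledExecutor(runnable -> {
      Thread thread = new Thread(runnable, THREAD_NAME);
      // Daemon, so a lingering cleanup thread never keeps the JVM alive.
      thread.setDaemon(true);
      return thread;
    });
  }

  // Runs the cleanup task repeatedly, first after one period, then every period.
  static void schedule(ScheduledExecutorService cleanupService, Runnable cleanup, long periodSeconds) {
    cleanupService.scheduleAtFixedRate(cleanup, periodSeconds, periodSeconds, TimeUnit.SECONDS);
  }
}
```

A descriptive thread name mainly pays off in thread dumps, where a stuck or busy cleanup thread is then easy to attribute to the cache.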



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 117 (patched)


replace with `cacheL1.get(key, valueLoader)` where `valueLoader` loads from 
`cacheL2`


- Sahil Takiar





Re: Review Request 68474: HIVE-20440: Create better cache eviction policy for SmallTableCache

2018-09-20 Thread denys kuzmenko via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/#review208793
---




ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java
Lines 94 (patched)


Just a remark, a note from the Google Guava documentation:
"Because of the performance implications of using soft references, we
generally recommend using the more predictable maximum cache size instead."
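For comparison, the option that quote recommends is Guava's `CacheBuilder.maximumSize(...)`. A JDK-only stand-in for a size-bounded cache is an access-ordered `LinkedHashMap` that evicts its least recently used entry; the class name `BoundedLruCache` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical size-bounded alternative to soft values: at most maxEntries
 * are kept, and the least recently accessed entry is evicted first.
 * Not thread-safe; wrap it with Collections.synchronizedMap for shared use.
 */
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
  private final int maxEntries;

  BoundedLruCache(int maxEntries) {
    super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Called after each put(); evicting here keeps size() <= maxEntries.
    return size() > maxEntries;
  }
}
```

The trade-off cuts both ways for small tables: a size bound is predictable, but it counts entries rather than bytes, so one large hash table counts the same as a tiny one (Guava's `maximumWeight` with a `Weigher` addresses that), whereas soft references react to actual heap pressure at times chosen by the garbage collector.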


- denys kuzmenko





Re: Review Request 68474: HIVE-20440: Create better cache eviction policy for SmallTableCache

2018-09-19 Thread Antal Sinkovits via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/68474/
---

(Updated Sept. 19, 2018, 11:14 p.m.)


Review request for hive, Naveen Gangam, Sahil Takiar, Adam Szita, and Xuefu Zhang.


Repository: hive-git


Description (updated)
---

I've modified the SmallTableCache to use a Guava cache with soft references.
By using a value loader, I've also eliminated the synchronization on the
interned string of the path.


Diffs (updated)
-

  ql/pom.xml d73deba440702ec39fc5610df28e0fe54baef025
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java cf27e92bafdc63096ec0fa8c3106657bab52f370
  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SmallTableCache.java 3293100af96dc60408c53065fa89143ead98f818
  ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestSmallTableCache.java PRE-CREATION


Diff: https://reviews.apache.org/r/68474/diff/2/

Changes: https://reviews.apache.org/r/68474/diff/1-2/


Testing
---


Thanks,

Antal Sinkovits