[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-03-02 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-6429:
---

   Resolution: Fixed
Fix Version/s: 0.13.0
   Status: Resolved  (was: Patch Available)

Committed to trunk.

 MapJoinKey has large memory overhead in typical cases
 -

 Key: HIVE-6429
 URL: https://issues.apache.org/jira/browse/HIVE-6429
 Project: Hive
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.13.0

 Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, 
 HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.05.patch, 
 HIVE-6429.06.patch, HIVE-6429.07.patch, HIVE-6429.08.patch, 
 HIVE-6429.09.patch, HIVE-6429.10.patch, HIVE-6429.WIP.patch, HIVE-6429.patch


 The only thing that MJK (MapJoinKey) really needs is hashCode and equals (well, and 
 construction), so there's no need to have an array of Writables in there. 
 Assuming all the keys for a table have the same structure, for the common 
 case where keys are primitive types, we can store the keys combined into a 
 single byte array to reduce memory usage. This will probably speed up 
 comparisons too.
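A minimal Java sketch of the idea in the description follows. It is not the committed HIVE-6429 implementation: the class `CompactJoinKey`, the assumption that every key column is a nullable BIGINT, and the encoding (one null-marker byte per column, followed by 8 bytes when the value is non-null) are all hypothetical. It only shows that a key which needs nothing beyond construction, hashCode and equals can keep all of its columns in one byte[] instead of an array of Writables with one wrapper object per column.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical compact map-join key: all key columns (assumed here to be
 * nullable BIGINTs) are packed into a single byte[] rather than held as an
 * array of Writable objects. Only construction, hashCode and equals exist.
 */
final class CompactJoinKey {
    private final byte[] bytes; // packed key columns
    private final int hash;     // computed once; the key is immutable

    private CompactJoinKey(byte[] bytes) {
        this.bytes = bytes;
        this.hash = Arrays.hashCode(bytes);
    }

    /** Packs each column as a 1-byte null marker, followed by 8 bytes if non-null. */
    static CompactJoinKey of(Long... columns) {
        int size = 0;
        for (Long c : columns) {
            size += (c == null) ? 1 : 9;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (Long c : columns) {
            if (c == null) {
                buf.put((byte) 0);
            } else {
                buf.put((byte) 1).putLong(c);
            }
        }
        return new CompactJoinKey(buf.array());
    }

    /** Size of the packed representation, for comparing against a Writable[]. */
    int sizeInBytes() {
        return bytes.length;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CompactJoinKey)) return false;
        CompactJoinKey other = (CompactJoinKey) o;
        // Cheap hash check first, then a single contiguous byte comparison.
        return hash == other.hash && Arrays.equals(bytes, other.bytes);
    }
}
```

Because the packed form is a plain byte[], equality reduces to one Arrays.equals call and the hash is computed once at construction, which is presumably where any speedup in comparisons would come from.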



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Status: Open  (was: Patch Available)



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.09.patch

Removed leftover BinarySortableSerDe (BSS) changes and filed HIVE-6526.
Should be ready to go... it would be nice to have a HiveQA run too.



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-28 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.10.patch

Addressed RB (Review Board) feedback plus some internal discussion: mostly moving 
some key-specific logic into the key itself, and changing the vectorization path to 
go through the full writer/Writable/ObjectInspector path rather than using raw values. 
A few Tez tests appear to pass; I'll run the rest.



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-27 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.08.patch

After some more discussion, we decided to rewrite it once again using 
LazyBinarySerDe. I preserved some of the refactoring done to BinarySortableSerDe.

So here's a 3rd way to do this. There are many more untapped serdes out 
there... :)

I've run a few tests that failed previously and a couple of Tez tests; they all 
pass. I will run all Tez tests now, and all tests overnight if I don't forget.



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-26 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.07.patch

Fixed the vectorization test.



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-24 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.06.patch

Fixed bugs



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-23 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.05.patch

Patch 05 contains changes to move to the BinarySortableSerDe encoding... IMHO it's 
not such a good idea.



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-21 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.03.patch



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-21 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.04.patch

For now, this patch addresses the other feedback. I will have a separate patch that 
uses BinarySortableSerDe (it just needs a hack around the vectorized path), but I 
don't think it's worth it: it's convoluted, and it still has to keep the type array 
and a separate path for vectorization. There are also additional changes because, for 
example, hasAnyNulls would be complicated and expensive to compute with the BSSD 
format, so it has to be retrieved separately at key creation time for the big table 
key in MJO (MapJoinOperator).
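The hasAnyNulls point above can be illustrated with a small Java sketch. It is a hypothetical illustration, not the HIVE-6429 code: `SerializedKeyWithNullFlag`, its `probe` helper and its fixed-width encoding are made up for this example, and it models neither BinarySortableSerDe's actual format nor null-safe joins. It only shows the design choice described in the comment: compute hasAnyNulls once, while the big-table key is being built from raw values, and keep it next to the bytes instead of recovering it later by parsing the serialized form.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Map;

/**
 * Hypothetical sketch (not Hive's MapJoinKey): a serialized join key that
 * records, at construction time, whether any key column was NULL, so later
 * code can ask hasAnyNulls() without decoding the serialized bytes.
 */
final class SerializedKeyWithNullFlag {
    private final byte[] bytes;        // serialized key columns
    private final boolean hasAnyNulls; // captured while the raw values were at hand

    private SerializedKeyWithNullFlag(byte[] bytes, boolean hasAnyNulls) {
        this.bytes = bytes;
        this.hasAnyNulls = hasAnyNulls;
    }

    /** Serializes each (assumed BIGINT) column as a null-marker byte plus an 8-byte slot. */
    static SerializedKeyWithNullFlag of(Long... columns) {
        boolean anyNull = false;
        ByteBuffer buf = ByteBuffer.allocate(columns.length * 9);
        for (Long c : columns) {
            if (c == null) {
                anyNull = true;
                buf.put((byte) 0).putLong(0L); // fixed-width slot keeps the layout simple
            } else {
                buf.put((byte) 1).putLong(c);
            }
        }
        return new SerializedKeyWithNullFlag(buf.array(), anyNull);
    }

    /** O(1) check; no parsing of the serialized form is needed. */
    boolean hasAnyNulls() {
        return hasAnyNulls;
    }

    /**
     * Probes a small-table hash map with a big-table key. Under ordinary
     * (non-null-safe) equi-join semantics a NULL key never matches, so the
     * lookup is skipped entirely when the flag is set.
     */
    static <V> V probe(Map<SerializedKeyWithNullFlag, V> table, SerializedKeyWithNullFlag key) {
        return key.hasAnyNulls() ? null : table.get(key);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SerializedKeyWithNullFlag
                && Arrays.equals(bytes, ((SerializedKeyWithNullFlag) o).bytes);
    }
}
```

The extra boolean presumably costs little here because big-table keys are probe keys built per row, not entries kept in the in-memory small-table hash map.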



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-20 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.02.patch

one fix and one small change



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-18 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.patch

Tez tests are halfway done and passing so far.
I still need to add a config setting.



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-18 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-18 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.01.patch

Added the config setting, plus other minor fixes.



[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases

2014-02-17 Thread Sergey Shelukhin (JIRA)

Sergey Shelukhin updated HIVE-6429:
---

Attachment: HIVE-6429.WIP.patch

WIP patch. Some tests appear to pass, but, as I have just discovered, it cannot 
deal with lazy primitive serdes. I will address this tomorrow.
A safety config to disable this (enabled by default) is probably needed.
[~hagleitn] [~jnp] fyi
