[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
[ https://issues.apache.org/jira/browse/HIVE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-6429:
---
Resolution: Fixed
Fix Version/s: 0.13.0
Status: Resolved (was: Patch Available)

Committed to trunk.

MapJoinKey has large memory overhead in typical cases
---
Key: HIVE-6429
URL: https://issues.apache.org/jira/browse/HIVE-6429
Project: Hive
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Fix For: 0.13.0
Attachments: HIVE-6429.01.patch, HIVE-6429.02.patch, HIVE-6429.03.patch, HIVE-6429.04.patch, HIVE-6429.05.patch, HIVE-6429.06.patch, HIVE-6429.07.patch, HIVE-6429.08.patch, HIVE-6429.09.patch, HIVE-6429.10.patch, HIVE-6429.WIP.patch, HIVE-6429.patch

The only thing that MJK (MapJoinKey) really needs is hashCode and equals (well, and construction), so there's no need to keep an array of writables in there. Assuming all the keys for a table have the same structure, for the common case where keys are primitive types, we can store something like a byte array combination of the keys to reduce memory usage. This will probably speed up compares too.

--
This message was sent by Atlassian JIRA (v6.2#6252)
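To illustrate the idea in the description, here is a minimal sketch of a join key that keeps only a packed byte array and implements hashCode and equals over it. It is not the code from any of the attached patches: the class name `ByteArrayKey`, the factory method `of`, and the fixed two-column layout (one int, one string) are all hypothetical, chosen only to keep the example small.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration only; this is not Hive's MapJoinKey. */
final class ByteArrayKey {
    // The only per-key state: all key columns packed into one array,
    // instead of one Writable object per column.
    private final byte[] bytes;

    private ByteArrayKey(byte[] bytes) {
        this.bytes = bytes;
    }

    /**
     * Packs an (int, String) key. The string is length-prefixed so that
     * two different column tuples cannot produce the same byte sequence.
     */
    static ByteArrayKey of(int id, String name) {
        byte[] s = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + s.length);
        buf.putInt(id).putInt(s.length).put(s);
        return new ByteArrayKey(buf.array());
    }

    @Override
    public int hashCode() {
        // Hash over the array contents, not the array identity.
        return Arrays.hashCode(bytes);
    }

    @Override
    public boolean equals(Object o) {
        // A single array comparison replaces per-column comparisons.
        return o instanceof ByteArrayKey
                && Arrays.equals(bytes, ((ByteArrayKey) o).bytes);
    }

    int sizeInBytes() {
        return bytes.length;
    }
}
```

Such a key can be used directly in a `java.util.HashMap`. The sketch assumes every key for a given table has the same column structure, as the description does; it does not handle nulls or non-primitive key types.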
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.09.patch

Removed leftover BSS changes; filed HIVE-6526. Should be ready to go... it would be nice to have a HiveQA run too.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.10.patch

RB feedback plus some internal discussion: mostly moving some key-specific logic into the key, and changing the vectorization path to go through the elaborate writer/writable/OI path rather than raw values. A few Tez tests appear to pass; I'll run the rest.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.08.patch

After some more discussion, we decided to rewrite once again using LazyBinarySerde. I preserved some of the refactoring done to BinarySortableSerde. So here's a 3rd way to do this. There are many more untapped serdes out there... :) I've run a few tests that failed previously and a couple of Tez tests; they all pass. I will run all Tez tests now, and all tests overnight if I don't forget.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.07.patch

Fixed the vectorization test.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.06.patch

Fixed bugs.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.05.patch

05 contains the changes to move to BinarySortableSerde encoding... IMHO it's not such a good idea.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.03.patch
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.04.patch

For now, this addresses the other feedback... I will have a separate patch to use BinarySortableSerDe; I just need to hack around the vectorized path. But I don't think it's worth it: it's convoluted and still has to keep the type array and a separate path for vectorization. There are also additional changes because, for example, hasAnyNulls would be complicated and expensive with the BSSD format, so it has to be retrieved separately at key creation time for the big table key in MJO.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.02.patch

One fix and one small change.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.patch

Tez tests are halfway through and passing so far. I still need to add a config setting.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.01.patch

Added the config setting, plus other minor fixes.
[jira] [Updated] (HIVE-6429) MapJoinKey has large memory overhead in typical cases
Sergey Shelukhin updated HIVE-6429:
---
Attachment: HIVE-6429.WIP.patch

WIP patch. Some tests appear to pass, but, as I have just discovered, it cannot deal with lazy primitive serdes. I will address this tomorrow. A safety config to disable this (with the optimization enabled by default) is probably needed.

[~hagleitn] [~jnp] FYI