[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lefty Leverenz updated HIVE-7832:
- Labels: (was: TODOC14)

Do ORC dictionary check at a finer level and preserve encoding across stripes
Key: HIVE-7832
URL: https://issues.apache.org/jira/browse/HIVE-7832
Project: Hive
Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
Fix For: 0.14.0
Attachments: HIVE-7832.1.patch, HIVE-7832.2.patch, HIVE-7832.3.patch, HIVE-7832.4.patch, HIVE-7832.5.patch, HIVE-7832.6.patch, HIVE-7832.7.patch, HIVE-7832.8.patch, HIVE-7832.9.patch

Currently, the ORC dictionary check happens when the stripe is written. Just before a stripe is written, if the ratio of dictionary entries to total non-null rows is greater than the threshold, the dictionary is discarded. Also, the decision of whether to use a dictionary is not preserved across stripes. When there are too many distinct keys, this sometimes leads to a costly O(log n) dictionary insertion for every value, repeated in each stripe.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
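To make the cost in the description concrete, here is a minimal Java sketch of a dictionary check that runs only at stripe end and whose outcome is not carried into the next stripe (the behaviour the issue title sets out to change). It is an illustrative model, not Hive's ORC writer: the class name `StripeEndDictionaryWriter`, the 0.8 threshold used below, and the use of `java.util.TreeMap` as the sorted dictionary are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model: the dictionary check runs only when a stripe is written,
// and its outcome is forgotten afterwards.
class StripeEndDictionaryWriter {
    private final double threshold;                  // assumed ratio threshold
    // TreeMap (a red-black tree) stands in for the sorted dictionary: O(log n) per lookup/insert.
    private final TreeMap<String, Integer> dictionary = new TreeMap<>();
    private long nonNullRows = 0;
    long treeOperations = 0;                         // number of O(log n) dictionary operations
    final List<String> stripeEncodings = new ArrayList<>();

    StripeEndDictionaryWriter(double threshold) {
        this.threshold = threshold;
    }

    void write(String value) {
        if (value == null) {
            return;                                  // nulls are not part of the ratio
        }
        nonNullRows++;
        treeOperations++;                            // every non-null value touches the tree
        dictionary.putIfAbsent(value, dictionary.size());
    }

    void writeStripe() {
        double ratio = nonNullRows == 0 ? 0.0 : (double) dictionary.size() / nonNullRows;
        // Too many distinct keys: discard the dictionary for this stripe.
        stripeEncodings.add(ratio > threshold ? "DIRECT" : "DICTIONARY");
        // Nothing is remembered, so the next stripe builds a new dictionary from scratch.
        dictionary.clear();
        nonNullRows = 0;
    }
}
```

With all-distinct keys and a 0.8 threshold, three stripes of 1,000 values each perform 3,000 tree operations, and every stripe still ends up DIRECT.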
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Release Note: Added the new configuration to https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Attachment: HIVE-7832.9.patch

Addressed Gopal's review comments.
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Resolution: Fixed
- Fix Version/s: 0.14.0
- Status: Resolved (was: Patch Available)

Committed to trunk.
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Lefty Leverenz updated HIVE-7832:
- Labels: TODOC14 (was: )
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Attachment: HIVE-7832.8.patch

Addressed [~gopalv]'s review comments.
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Attachment: HIVE-7832.7.patch

vectorization_part_project.q seems to pass on the latest trunk. Uploaded the .6 patch under a different name just to make sure vectorization_part_project.q now passes.
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Attachment: HIVE-7832.4.patch
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Attachment: HIVE-7832.5.patch

The earlier patch introduced some complications related to clearing the dictionary entries. Fixed them in this patch.
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Attachment: HIVE-7832.6.patch

Minor fixes to variables.
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Attachment: HIVE-7832.2.patch

Addressed Gopal's review comment.
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Attachment: HIVE-7832.3.patch

The following changes should fix the failing tests:
1) flushDictionary() is updated to add row index entries for the flushed rows.
2) The dictionary check is also done just before writing the stripe when the number of rows in the stripe is less than the value of the "dictionary check after rows" config.
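The two changes above, together with the issue's goal of preserving the encoding across stripes, can be modeled with the following hypothetical Java sketch. It is not the code from the patch: `EarlyCheckDictionaryWriter`, `checkAfterRows` (a stand-in for the "dictionary check after rows" config), and the 0.8 threshold are illustrative assumptions, and `flushDictionary()` here only moves buffered values to a direct stream; the real method, per the comment above, also adds row index entries, which the sketch omits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model: check early (after checkAfterRows non-null values), fall back to a
// check at stripe end for short stripes, and keep the decision for all later stripes.
class EarlyCheckDictionaryWriter {
    private final double threshold;                     // assumed ratio threshold
    private final long checkAfterRows;                  // hypothetical "check after rows" setting
    private final TreeMap<String, Integer> dictionary = new TreeMap<>();
    private final List<String> buffered = new ArrayList<>(); // values awaiting dictionary encoding
    private long nonNullRows = 0;
    private Boolean useDictionary = null;               // null until the first check runs
    long treeOperations = 0;                            // number of O(log n) dictionary operations
    final List<String> directValues = new ArrayList<>(); // direct stream, accumulated across stripes
    final List<String> stripeEncodings = new ArrayList<>();

    EarlyCheckDictionaryWriter(double threshold, long checkAfterRows) {
        this.threshold = threshold;
        this.checkAfterRows = checkAfterRows;
    }

    void write(String value) {
        if (value == null) {
            return;                                     // nulls are not part of the ratio
        }
        if (Boolean.FALSE.equals(useDictionary)) {
            directValues.add(value);                    // decision retained: no tree work at all
            return;
        }
        nonNullRows++;
        treeOperations++;
        dictionary.putIfAbsent(value, dictionary.size());
        buffered.add(value);
        if (useDictionary == null && nonNullRows >= checkAfterRows) {
            checkDictionary();                          // finer-level check, mid-stripe
        }
    }

    private void checkDictionary() {
        double ratio = nonNullRows == 0 ? 0.0 : (double) dictionary.size() / nonNullRows;
        useDictionary = ratio <= threshold;
        if (!useDictionary) {
            flushDictionary();
        }
    }

    // Re-emit the values seen so far as direct values and drop the dictionary.
    private void flushDictionary() {
        directValues.addAll(buffered);
        buffered.clear();
        dictionary.clear();
    }

    void writeStripe() {
        if (useDictionary == null) {
            checkDictionary();                          // stripe had fewer than checkAfterRows values
        }
        stripeEncodings.add(useDictionary ? "DICTIONARY" : "DIRECT");
        // useDictionary is kept; only the per-stripe state is reset.
        buffered.clear();
        dictionary.clear();
        nonNullRows = 0;
    }
}
```

With all-distinct keys, a 0.8 threshold, and `checkAfterRows = 100`, three stripes of 1,000 values perform only 100 tree operations in total, because the DIRECT decision made in the first stripe is reused afterwards. The design choice being illustrated is that a bad dictionary is detected early and never rebuilt, trading the chance to re-adapt in later stripes for predictable write cost.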
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Attachment: HIVE-7832.1.patch
[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes
[ https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prasanth J updated HIVE-7832:
- Status: Patch Available (was: Open)