[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-10-12 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-7832:
-
Labels:   (was: TODOC14)

 Do ORC dictionary check at a finer level and preserve encoding across stripes
 -

 Key: HIVE-7832
 URL: https://issues.apache.org/jira/browse/HIVE-7832
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.14.0
Reporter: Prasanth J
Assignee: Prasanth J
 Fix For: 0.14.0

 Attachments: HIVE-7832.1.patch, HIVE-7832.2.patch, HIVE-7832.3.patch, 
 HIVE-7832.4.patch, HIVE-7832.5.patch, HIVE-7832.6.patch, HIVE-7832.7.patch, 
 HIVE-7832.8.patch, HIVE-7832.9.patch


 Currently the ORC dictionary check happens while writing the stripe. Just before 
 a stripe is written, if the ratio of dictionary entries to total non-null rows is 
 greater than the threshold, the dictionary is discarded. Also, the decision 
 whether to use the dictionary is not preserved across stripes. This sometimes 
 means paying the O(log n) dictionary insertion cost again in every stripe when 
 there are too many distinct keys.
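The pre-patch check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Hive's ORC writer: the class and method names are hypothetical, java.util.TreeSet (a red-black tree) stands in for the writer's sorted dictionary, and the threshold of 0.8 is an assumed value.

```java
import java.util.TreeSet;

/** Hypothetical sketch of the per-stripe dictionary check described in the issue. */
public class StripeDictionaryCheck {
    // Assumed threshold: max ratio of distinct keys to non-null rows that keeps the dictionary.
    static final double THRESHOLD = 0.8;

    /** Returns true if the stripe keeps dictionary encoding. */
    public static boolean useDictionary(String[] stripeValues) {
        TreeSet<String> dictionary = new TreeSet<>(); // every insert costs O(log n)
        int nonNullRows = 0;
        for (String v : stripeValues) {
            if (v == null) {
                continue; // nulls are not counted in the ratio
            }
            nonNullRows++;
            dictionary.add(v);
        }
        if (nonNullRows == 0) {
            return true; // empty stripe: keeping the dictionary is an arbitrary choice here
        }
        // The check only runs here, after the whole stripe has been inserted.
        double ratio = (double) dictionary.size() / nonNullRows;
        return ratio <= THRESHOLD;
    }

    public static void main(String[] args) {
        String[] lowCardinality = {"a", "b", "a", "b", "a", null};
        String[] highCardinality = {"a", "b", "c", "d", "e"};
        System.out.println(useDictionary(lowCardinality));  // true: 2 / 5 = 0.4
        System.out.println(useDictionary(highCardinality)); // false: 5 / 5 = 1.0
    }
}
```

Because every value is inserted into the tree before the ratio is known, a high-cardinality column pays the sorted insert for every row of every stripe, even when the dictionary is thrown away at the end.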



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-10-07 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-
Release Note: Added the new configuration to 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-ORCFileFormat


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-09-02 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-
Attachment: HIVE-7832.9.patch

Addressed Gopal's review comments.


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-09-02 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-
   Resolution: Fixed
Fix Version/s: 0.14.0
   Status: Resolved  (was: Patch Available)

Committed to trunk


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-09-02 Thread Lefty Leverenz (JIRA)

Lefty Leverenz updated HIVE-7832:
-
Labels: TODOC14  (was: )


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-29 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-

Attachment: HIVE-7832.8.patch

Addressed [~gopalv]'s review comments.


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-27 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-

Attachment: HIVE-7832.7.patch

vectorization_part_project.q seems to pass on the latest trunk. Re-uploaded the .6 
patch under a different name just to confirm that vectorization_part_project.q 
now passes.


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-26 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-

Attachment: HIVE-7832.4.patch


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-26 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-

Attachment: HIVE-7832.5.patch

The earlier patch introduced some complications when clearing the dictionary 
entries. This patch fixes them.


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-26 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-

Attachment: HIVE-7832.6.patch

Minor fixes to variables.


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-22 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-

Attachment: HIVE-7832.2.patch

Addressed Gopal's review comment.


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-22 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-

Attachment: HIVE-7832.3.patch

The following changes should fix the failing tests:
1) flushDictionary() now adds row index entries for the flushed rows.
2) The dictionary check is also done before writing a stripe when the number of 
rows in the stripe is less than the "dictionary check after rows" config.
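Together with the issue title, the two changes above suggest a check of roughly the following shape. This is a hedged sketch with hypothetical names (EarlyDictionaryCheck, checkAfterRows, flushStripe), not the committed patch: the ratio is evaluated once, after checkAfterRows non-null rows or before the stripe is written if it holds fewer rows, and the decision is then kept for all later stripes. The row index entries that flushDictionary() adds are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch: early dictionary check whose decision is kept across stripes. */
public class EarlyDictionaryCheck {
    private final double threshold;    // max distinct/non-null ratio that keeps the dictionary
    private final int checkAfterRows;  // non-null rows to see before deciding
    private final TreeSet<String> dictionary = new TreeSet<>(); // sorted keys, O(log n) insert
    private final List<String> stripeRows = new ArrayList<>();  // stands in for buffered column data
    private int rowsSeen = 0;             // non-null rows counted before the decision
    private Boolean useDictionary = null; // null = undecided; then fixed for all stripes

    public EarlyDictionaryCheck(double threshold, int checkAfterRows) {
        this.threshold = threshold;
        this.checkAfterRows = checkAfterRows;
    }

    public void write(String value) {
        if (value == null) {
            return; // nulls do not count toward the ratio
        }
        stripeRows.add(value);
        if (Boolean.FALSE.equals(useDictionary)) {
            return; // direct encoding already chosen: skip the sorted insert
        }
        dictionary.add(value);
        // Short-circuit: rowsSeen only advances while the decision is still open.
        if (useDictionary == null && ++rowsSeen >= checkAfterRows) {
            decide();
        }
    }

    private void decide() {
        if (rowsSeen == 0) {
            return; // nothing to judge yet; leave the decision open
        }
        double ratio = (double) dictionary.size() / rowsSeen;
        useDictionary = ratio <= threshold;
        if (!useDictionary) {
            dictionary.clear(); // buffered rows will be written with direct encoding
        }
    }

    /** Writes the current stripe and returns the encoding it used. */
    public String flushStripe() {
        if (useDictionary == null) {
            decide(); // stripe has fewer than checkAfterRows rows: check before writing it
        }
        String encoding = Boolean.FALSE.equals(useDictionary) ? "DIRECT" : "DICTIONARY";
        dictionary.clear(); // each ORC stripe carries its own dictionary
        stripeRows.clear();
        return encoding;
    }

    public static void main(String[] args) {
        EarlyDictionaryCheck w = new EarlyDictionaryCheck(0.8, 4);
        for (int i = 0; i < 6; i++) {
            w.write("key" + i); // all distinct: ratio 4 / 4 = 1.0 after four rows
        }
        System.out.println(w.flushStripe()); // DIRECT
        w.write("a");
        w.write("a");
        System.out.println(w.flushStripe()); // DIRECT: decision kept for later stripes
    }
}
```

The nullable Boolean keeps "not yet decided" distinct from the two final states; once it is FALSE, write() skips the tree entirely, which is the per-stripe insertion cost the issue sets out to avoid.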


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-21 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-

Attachment: HIVE-7832.1.patch


[jira] [Updated] (HIVE-7832) Do ORC dictionary check at a finer level and preserve encoding across stripes

2014-08-21 Thread Prasanth J (JIRA)

Prasanth J updated HIVE-7832:
-

Status: Patch Available  (was: Open)
