[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-11 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4324:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Kevin & Owen!

 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.12.0

 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
 HIVE-4324.D12045.2.patch, HIVE-4324.D12045.2.patch, HIVE-4324.D12045.3.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.
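
A minimal sketch of the check described above, for illustration only. It is
not the code from the attached patches: the class name, method names and the
0.8 threshold used in main are hypothetical, and the real writer may collect
its counts differently. It only shows the rule as stated: dictionary encoding
is kept while the number of distinct values is at most threshold times the
number of non-null values, and turned off otherwise.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; none of these names come from the Hive source.
final class DictionaryThresholdSketch {

    // Returns true if dictionary encoding should stay on.
    // threshold is a fraction in [0, 1] of the non-null value count.
    static boolean useDictionary(long distinctValues, long nonNullValues,
                                 double threshold) {
        // Multiplying instead of dividing avoids a division by zero
        // when the column holds no non-null values.
        return distinctValues <= threshold * nonNullValues;
    }

    // Convenience overload that counts distinct and non-null values
    // of an in-memory string column before applying the rule.
    static boolean useDictionaryFor(List<String> column, double threshold) {
        Set<String> distinct = new HashSet<>();
        long nonNull = 0;
        for (String value : column) {
            if (value != null) {
                nonNull++;
                distinct.add(value);
            }
        }
        return useDictionary(distinct.size(), nonNull, threshold);
    }

    public static void main(String[] args) {
        double threshold = 0.8; // assumed value, chosen only for the demo

        // 2 distinct values out of 4 non-null values: ratio 0.5.
        List<String> repetitive = Arrays.asList("a", "b", "a", "a", null);
        // 4 distinct values out of 4 non-null values: ratio 1.0.
        List<String> unique = Arrays.asList("a", "b", "c", "d");

        System.out.println(useDictionaryFor(repetitive, threshold));
        System.out.println(useDictionaryFor(unique, threshold));
    }
}
```

With a threshold of 1.0 this rule never turns the dictionary off, because the
distinct count can never exceed the non-null count.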

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-10 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4324:
--

Attachment: HIVE-4324.D12045.3.patch

omalley updated the revision HIVE-4324 [jira] ORC Turn off dictionary encoding 
when number of distinct keys is greater than threshold.

  Removed a debugging line from the q file that was making the test pass on my
  machine but fail in Jenkins.

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12045

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12045?vs=37245&id=37521#toc

BRANCH
  h-4324

ARCANIST PROJECT
  hive

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/queries/clientpositive/orc_dictionary_threshold.q
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out
  ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out

To: JIRA, ashutoshc, omalley


 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.12.0

 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
 HIVE-4324.D12045.2.patch, HIVE-4324.D12045.2.patch, HIVE-4324.D12045.3.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.



[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4324:


Status: Open  (was: Patch Available)

 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.12.0

 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
 HIVE-4324.D12045.2.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.



[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4324:


Status: Patch Available  (was: Open)

Toggling Patch Available to try to trigger a rebuild by Jenkins.

 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.12.0

 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
 HIVE-4324.D12045.2.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.



[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-09 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4324:


Attachment: HIVE-4324.D12045.2.patch

Re-uploading the same patch to retry the Jenkins job.

 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.12.0

 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
 HIVE-4324.D12045.2.patch, HIVE-4324.D12045.2.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.



[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-07 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4324:
--

Attachment: HIVE-4324.D12045.1.patch

omalley requested code review of HIVE-4324 [jira] ORC Turn off dictionary 
encoding when number of distinct keys is greater than threshold.

Reviewers: JIRA

Forward port of Kevin's patch.

Add a configurable threshold so that if the number of distinct values in a 
string column is greater than that fraction of non-null values, dictionary 
encoding is turned off.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D12045

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/queries/clientpositive/orc_dictionary_threshold.q
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out
  ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out

To: JIRA, omalley


 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.



[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-07 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4324:
--

Attachment: HIVE-4324.D12045.2.patch

omalley updated the revision HIVE-4324 [jira] ORC Turn off dictionary encoding 
when number of distinct keys is greater than threshold.

  I addressed Ashutosh's feedback.

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D12045

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D12045?vs=37185&id=37245#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/queries/clientpositive/orc_dictionary_threshold.q
  ql/src/test/resources/orc-file-dump-dictionary-threshold.out
  ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out

To: JIRA, ashutoshc, omalley


 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
 HIVE-4324.D12045.2.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.



[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-07 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-4324:


Fix Version/s: 0.12.0
   Status: Patch Available  (was: Open)

 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: File Formats
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Fix For: 0.12.0

 Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
 HIVE-4324.D12045.2.patch


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.



[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-04-24 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4324:
-

Status: Open  (was: Patch Available)

Can you address Owen's comments?

 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4324.1.patch.txt


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.



[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-04-10 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-4324:


Attachment: HIVE-4324.1.patch.txt

 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4324.1.patch.txt


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.



[jira] [Updated] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-04-10 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-4324:


Status: Patch Available  (was: Open)

 ORC Turn off dictionary encoding when number of distinct keys is greater than 
 threshold
 ---

 Key: HIVE-4324
 URL: https://issues.apache.org/jira/browse/HIVE-4324
 Project: Hive
  Issue Type: Sub-task
  Components: Serializers/Deserializers
Affects Versions: 0.11.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-4324.1.patch.txt


 Add a configurable threshold so that if the number of distinct values in a 
 string column is greater than that fraction of non-null values, dictionary 
 encoding is turned off.
