[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2014-08-09 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091873#comment-14091873
 ] 

Lefty Leverenz commented on HIVE-4324:
--

This issue added the configuration parameter 
*hive.exec.orc.dictionary.key.size.threshold* to HiveConf.java in 0.12.0.  It's 
documented in the wiki here:

* [Configuration Properties -- hive.exec.orc.dictionary.key.size.threshold | 
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.exec.orc.dictionary.key.size.threshold]
 

> ORC Turn off dictionary encoding when number of distinct keys is greater than 
> threshold
> ---
>
> Key: HIVE-4324
> URL: https://issues.apache.org/jira/browse/HIVE-4324
> Project: Hive
>  Issue Type: Sub-task
>  Components: File Formats
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.12.0
>
> Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
> HIVE-4324.D12045.2.patch, HIVE-4324.D12045.2.patch, HIVE-4324.D12045.3.patch
>
>
> Add a configurable threshold so that if the number of distinct values in a 
> string column is greater than that fraction of non-null values, dictionary 
> encoding is turned off.
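
The rule in the description above can be sketched as follows. This is an 
illustration only, not the code from the patch: the class and method names are 
hypothetical, and the comparison simply follows the wording of the 
description (dictionary encoding is dropped once the distinct-value count 
exceeds the configured fraction of non-null values).

```java
// Illustrative sketch only; the names here are hypothetical and this is
// not the code from the HIVE-4324 patch.
final class DictionaryThresholdSketch {

    /**
     * Returns true if dictionary encoding should be kept for a string column.
     *
     * @param distinctKeys number of distinct values seen in the column
     * @param nonNullRows  number of non-null values written to the column
     * @param threshold    fraction in [0, 1]; 1.0 keeps the dictionary always
     */
    public static boolean useDictionary(long distinctKeys, long nonNullRows,
                                        double threshold) {
        // Multiplying instead of dividing avoids a division by zero when the
        // column holds no non-null values.
        return distinctKeys <= threshold * nonNullRows;
    }

    public static void main(String[] args) {
        // 10 distinct values in 100 rows: well under a 0.8 threshold.
        System.out.println(useDictionary(10, 100, 0.8));   // true
        // 90 distinct values in 100 rows: over the threshold, so fall back
        // to writing the strings directly.
        System.out.println(useDictionary(90, 100, 0.8));   // false
        // A threshold of 1.0 can never be exceeded.
        System.out.println(useDictionary(100, 100, 1.0));  // true
    }
}
```

The intuition is that a dictionary only pays off when values repeat; when 
nearly every value is distinct, the dictionary is about as large as the data 
itself.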



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736542#comment-13736542
 ] 

Hudson commented on HIVE-4324:
--

SUCCESS: Integrated in Hive-trunk-h0.21 #2261 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/2261/])
HIVE-4324 : ORC Turn off dictionary encoding when number of distinct keys is 
greater than threshold (Kevin Wilfong & Owen Omalley via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1512893)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/queries/clientpositive/orc_dictionary_threshold.q
* /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
* /hive/trunk/ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736444#comment-13736444
 ] 

Hudson commented on HIVE-4324:
--

FAILURE: Integrated in Hive-trunk-hadoop2 #351 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/351/])
HIVE-4324 : ORC Turn off dictionary encoding when number of distinct keys is 
greater than threshold (Kevin Wilfong & Owen Omalley via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1512893)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/queries/clientpositive/orc_dictionary_threshold.q
* /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
* /hive/trunk/ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out




[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736322#comment-13736322
 ] 

Hudson commented on HIVE-4324:
--

SUCCESS: Integrated in Hive-trunk-hadoop1-ptest #123 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/123/])
HIVE-4324 : ORC Turn off dictionary encoding when number of distinct keys is 
greater than threshold (Kevin Wilfong & Owen Omalley via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1512893)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/queries/clientpositive/orc_dictionary_threshold.q
* /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
* /hive/trunk/ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out




[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736279#comment-13736279
 ] 

Hudson commented on HIVE-4324:
--

FAILURE: Integrated in Hive-trunk-hadoop2-ptest #53 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/53/])
HIVE-4324 : ORC Turn off dictionary encoding when number of distinct keys is 
greater than threshold (Kevin Wilfong & Owen Omalley via Ashutosh Chauhan) 
(hashutosh: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1512893)
* /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
* /hive/trunk/conf/hive-default.xml.template
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcFile.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java
* /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
* /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
* /hive/trunk/ql/src/test/queries/clientpositive/orc_dictionary_threshold.q
* /hive/trunk/ql/src/test/resources/orc-file-dump-dictionary-threshold.out
* /hive/trunk/ql/src/test/results/clientpositive/orc_dictionary_threshold.q.out




[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736183#comment-13736183
 ] 

Ashutosh Chauhan commented on HIVE-4324:


+1 LGTM



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736148#comment-13736148
 ] 

Hive QA commented on HIVE-4324:
---



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597316/HIVE-4324.D12045.3.patch

{color:green}SUCCESS:{color} +1 2776 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/386/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/386/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735577#comment-13735577
 ] 

Hive QA commented on HIVE-4324:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597135/HIVE-4324.D12045.2.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2776 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_dictionary_threshold
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/368/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/368/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

> ORC Turn off dictionary encoding when number of distinct keys is greater than 
> threshold
> ---
>
> Key: HIVE-4324
> URL: https://issues.apache.org/jira/browse/HIVE-4324
> Project: Hive
>  Issue Type: Sub-task
>  Components: File Formats
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.12.0
>
> Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
> HIVE-4324.D12045.2.patch, HIVE-4324.D12045.2.patch
>
>
> Add a configurable threshold so that if the number of distinct values in a 
> string column is greater than that fraction of non-null values, dictionary 
> encoding is turned off.



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-08 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13734050#comment-13734050
 ] 

Phabricator commented on HIVE-4324:
---

ashutoshc has accepted the revision "HIVE-4324 [jira] ORC Turn off dictionary 
encoding when number of distinct keys is greater than threshold".

  +1 LGTM

REVISION DETAIL
  https://reviews.facebook.net/D12045

BRANCH
  h-4324

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley


> ORC Turn off dictionary encoding when number of distinct keys is greater than 
> threshold
> ---
>
> Key: HIVE-4324
> URL: https://issues.apache.org/jira/browse/HIVE-4324
> Project: Hive
>  Issue Type: Sub-task
>  Components: File Formats
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Fix For: 0.12.0
>
> Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch, 
> HIVE-4324.D12045.2.patch
>
>
> Add a configurable threshold so that if the number of distinct values in a 
> string column is greater than that fraction of non-null values, dictionary 
> encoding is turned off.



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-07 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732595#comment-13732595
 ] 

Phabricator commented on HIVE-4324:
---

ashutoshc has requested changes to the revision "HIVE-4324 [jira] ORC Turn off 
dictionary encoding when number of distinct keys is greater than threshold".

  Mostly looks good, except for some minor nits.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OutStream.java:249 Would it be 
better to modify clear to accept compress and suppress arguments?
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java:768 It 
would be good to add a javadoc saying that this Reader reads strings that 
don't have an accompanying dictionary.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java:838 
Similarly here, add a javadoc to the effect of: This reader reads 
dictionary-encoded strings.
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringRedBlackTree.java:166 
Could this method be package private?

REVISION DETAIL
  https://reviews.facebook.net/D12045

BRANCH
  h-4324

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, omalley


> ORC Turn off dictionary encoding when number of distinct keys is greater than 
> threshold
> ---
>
> Key: HIVE-4324
> URL: https://issues.apache.org/jira/browse/HIVE-4324
> Project: Hive
>  Issue Type: Sub-task
>  Components: File Formats
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-4324.1.patch.txt, HIVE-4324.D12045.1.patch
>
>
> Add a configurable threshold so that if the number of distinct values in a 
> string column is greater than that fraction of non-null values, dictionary 
> encoding is turned off.



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-08-06 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731277#comment-13731277
 ] 

Owen O'Malley commented on HIVE-4324:
-

This patch is extremely stale. I'm in the middle of updating it for trunk.

> ORC Turn off dictionary encoding when number of distinct keys is greater than 
> threshold
> ---
>
> Key: HIVE-4324
> URL: https://issues.apache.org/jira/browse/HIVE-4324
> Project: Hive
>  Issue Type: Sub-task
>  Components: File Formats
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-4324.1.patch.txt
>
>
> Add a configurable threshold so that if the number of distinct values in a 
> string column is greater than that fraction of non-null values, dictionary 
> encoding is turned off.



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-06-28 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695768#comment-13695768
 ] 

Owen O'Malley commented on HIVE-4324:
-

Kevin,
  Sorry for the delay in getting back to you. Yes, there are external users who 
are using the Java APIs directly.



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-06-05 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676104#comment-13676104
 ] 

Kevin Wilfong commented on HIVE-4324:
-

Sorry for the delay, Owen.  Are you concerned that there will be applications 
outside of Hive calling methods in OrcFile.java?

If so, I can add the backward-compatible method.



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-05-17 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661151#comment-13661151
 ] 

Owen O'Malley commented on HIVE-4324:
-

We should get this committed. Kevin, we can pull the incremental work out to a 
separate jira. One remaining concern is that we should provide a compatible 
OrcFile.createWriter method so that code doesn't break when users upgrade from 
0.11 to 0.12.
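
A common way to keep such a method compatible is to retain the old signature 
as an overload that delegates to the new one with a default value. The sketch 
below shows the pattern with hypothetical names and parameters; it does not 
reproduce the real OrcFile.createWriter signatures, and the 0.8 default is 
assumed here only for illustration.

```java
// Hypothetical stand-in for a factory whose method gained a new parameter.
// None of these names or parameter lists are taken from OrcFile.java.
final class CompatibleFactorySketch {

    // Assumed default, used only for this illustration.
    public static final double DEFAULT_DICTIONARY_THRESHOLD = 0.8;

    /** Minimal value object standing in for a configured writer. */
    public static final class WriterSettings {
        public final String path;
        public final long stripeSize;
        public final double dictionaryThreshold;

        WriterSettings(String path, long stripeSize, double dictionaryThreshold) {
            this.path = path;
            this.stripeSize = stripeSize;
            this.dictionaryThreshold = dictionaryThreshold;
        }
    }

    /** Old-style signature, kept so existing callers still compile. */
    public static WriterSettings createWriter(String path, long stripeSize) {
        return createWriter(path, stripeSize, DEFAULT_DICTIONARY_THRESHOLD);
    }

    /** New signature carrying the additional threshold argument. */
    public static WriterSettings createWriter(String path, long stripeSize,
                                              double dictionaryThreshold) {
        if (dictionaryThreshold < 0.0 || dictionaryThreshold > 1.0) {
            throw new IllegalArgumentException(
                "threshold must be in [0, 1]: " + dictionaryThreshold);
        }
        return new WriterSettings(path, stripeSize, dictionaryThreshold);
    }

    public static void main(String[] args) {
        // A caller written against the old signature is unaffected.
        WriterSettings legacy = createWriter("/tmp/example.orc", 256L * 1024 * 1024);
        System.out.println(legacy.dictionaryThreshold);
    }
}
```

With this shape, callers that never pass a threshold need no source changes 
after upgrading.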



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-04-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632701#comment-13632701
 ] 

Namit Jain commented on HIVE-4324:
--

+1

> ORC Turn off dictionary encoding when number of distinct keys is greater than 
> threshold
> ---
>
> Key: HIVE-4324
> URL: https://issues.apache.org/jira/browse/HIVE-4324
> Project: Hive
>  Issue Type: Sub-task
>  Components: Serializers/Deserializers
>Affects Versions: 0.11.0
>Reporter: Kevin Wilfong
>Assignee: Kevin Wilfong
> Attachments: HIVE-4324.1.patch.txt
>
>
> Add a configurable threshold so that if the number of distinct values in a 
> string column is greater than that fraction of non-null values, dictionary 
> encoding is turned off.



[jira] [Commented] (HIVE-4324) ORC Turn off dictionary encoding when number of distinct keys is greater than threshold

2013-04-10 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627978#comment-13627978
 ] 

Kevin Wilfong commented on HIVE-4324:
-

https://reviews.facebook.net/D10113
