[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-02-07 Thread Lefty Leverenz (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311024#comment-14311024 ]

Lefty Leverenz commented on HIVE-4639:
--

Doc note:  [~prasanth_j] documented this in the ORC wiki.

* [ORC -- Column Statistics | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC#LanguageManualORC-ColumnStatistics]

But it says the hasNull flag is added in 1.2.0 -- shouldn't that be 1.1.0, 
since this jira's fix version is 0.15 (the release that became 1.1.0)?

 Add has null flag to ORC internal index
 ---

 Key: HIVE-4639
 URL: https://issues.apache.org/jira/browse/HIVE-4639
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Prasanth Jayachandran
 Fix For: 0.15.0

 Attachments: HIVE-4639.1.patch, HIVE-4639.2.patch, HIVE-4639.3.patch


 It would enable more predicate pushdown if we added a flag to the index entry 
 recording if there were any null values in the column for the 10k rows.
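A minimal sketch of the idea in the description, assuming a column is modeled as a Python list where `None` stands for NULL and each index entry covers 10,000 rows. `IndexEntry`, `build_index`, and `row_groups_for_is_null` are hypothetical names for illustration, not ORC's actual classes or protobuf messages. It shows why min/max alone cannot prune row groups for `IS NULL`, while a per-entry null flag can.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence

# Rows covered by one index entry; 10k as in the issue description.
ROW_INDEX_STRIDE = 10_000


@dataclass
class IndexEntry:
    """Hypothetical per-row-group statistics (not ORC's real message)."""
    minimum: Optional[Any]  # None when the group has no non-null values
    maximum: Optional[Any]
    has_null: bool          # the flag proposed in this issue


def build_index(values: Sequence[Optional[Any]],
                stride: int = ROW_INDEX_STRIDE) -> List[IndexEntry]:
    """Compute one IndexEntry per `stride` rows; None represents NULL."""
    entries = []
    for start in range(0, len(values), stride):
        group = values[start:start + stride]
        non_null = [v for v in group if v is not None]
        entries.append(IndexEntry(
            minimum=min(non_null) if non_null else None,
            maximum=max(non_null) if non_null else None,
            has_null=len(non_null) < len(group),
        ))
    return entries


def row_groups_for_is_null(index: List[IndexEntry]) -> List[int]:
    """Row groups that may satisfy `col IS NULL`; the rest can be skipped.

    Without has_null, min/max (computed over non-null values) cannot rule
    out NULLs in a group, so every group would have to be read.
    """
    return [i for i, entry in enumerate(index) if entry.has_null]


values = [1] * 10_000 + [None] + [2] * 9_999 + [3] * 10_000
print(row_groups_for_is_null(build_index(values)))  # [1]
```

Only the middle row group contains a NULL, so a reader evaluating `IS NULL` could skip the other two entirely.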



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-02-07 Thread Prasanth Jayachandran (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311026#comment-14311026 ]

Prasanth Jayachandran commented on HIVE-4639:
-

Good catch! [~leftylev]. Updated the docs!



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-09 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272090#comment-14272090 ]

Hive QA commented on HIVE-4639:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12691023/HIVE-4639.3.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 6747 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_joins
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2311/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2311/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2311/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12691023 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-09 Thread Gopal V (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272122#comment-14272122 ]

Gopal V commented on HIVE-4639:
---

For the sake of documentation, this does not change the ORC format version 
(i.e., ORC files with hasNull flags can be read by Hive 0.14).

[~leftylev]: FYI.



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-09 Thread Lefty Leverenz (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272141#comment-14272141 ]

Lefty Leverenz commented on HIVE-4639:
--

Thanks [~gopalv].  I assume that means no documentation is needed, since this 
is internal and backward-compatible.



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-08 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14270348#comment-14270348 ]

Hive QA commented on HIVE-4639:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12690690/HIVE-4639.2.patch

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 6747 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testColumnsWithNullAndCompression
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithNull
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithoutNull
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplex
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplexOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testSerdeStatsOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testStringAndBinaryStatistics
org.apache.hive.hcatalog.streaming.TestStreaming.testEndpointConnection
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2296/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2296/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2296/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12690690 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-07 Thread Gopal V (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268884#comment-14268884 ]

Gopal V commented on HIVE-4639:
---

Added this patch to my daily TPC-H 1TB ETL and reloaded lineitem with the new 
format.

Testing {{select * from lineitem where l_shipdate is null;}}.

Before: 66.728 seconds (208774320430 bytes read)
After: 7.87 seconds  (539046900 bytes read)
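A quick check of the ratios implied by the before/after figures above. This is plain arithmetic on the reported numbers, not a new measurement.

```python
# Figures reported for `select * from lineitem where l_shipdate is null;`
before_seconds, after_seconds = 66.728, 7.87
before_bytes, after_bytes = 208_774_320_430, 539_046_900

speedup = before_seconds / after_seconds       # wall-clock improvement
io_reduction = before_bytes / after_bytes      # how many times fewer bytes
fraction_read = after_bytes / before_bytes     # share of the original I/O

print(f"speedup: {speedup:.1f}x")                      # speedup: 8.5x
print(f"bytes read reduced: {io_reduction:.0f}x")      # bytes read reduced: 387x
print(f"fraction of bytes read: {fraction_read:.2%}")  # fraction of bytes read: 0.26%
```

In other words, the query went from reading roughly 208.8 GB to roughly 0.54 GB, while elapsed time dropped by about 8.5x.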

LGTM - +1.



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-07 Thread Owen O'Malley (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268053#comment-14268053 ]

Owen O'Malley commented on HIVE-4639:
-

You should encode four values:
  no_values, all_nulls, some_nulls, no_nulls

This will allow you to support a richer set of sargs.
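To illustrate why four states support a richer set of sargs, here is a sketch that evaluates `IS NULL` and `IS NOT NULL` against each state for one row group. `NullState`, `Match`, `eval_is_null`, and `eval_is_not_null` are hypothetical names for this example; Hive's actual SearchArgument API is not modeled here. The extra states let the reader distinguish "skip the group" from "every row matches, no row-level check needed".

```python
from enum import Enum


class NullState(Enum):
    NO_VALUES = "no_values"    # the row group contains no rows
    ALL_NULLS = "all_nulls"
    SOME_NULLS = "some_nulls"
    NO_NULLS = "no_nulls"


class Match(Enum):
    NONE = "none"  # no row can match: skip the row group
    SOME = "some"  # read the group and filter row by row
    ALL = "all"    # every row matches: no row-level check needed


def eval_is_null(state: NullState) -> Match:
    """Evaluate `col IS NULL` for a row group from its null state alone."""
    if state in (NullState.NO_VALUES, NullState.NO_NULLS):
        return Match.NONE
    if state is NullState.ALL_NULLS:
        return Match.ALL
    return Match.SOME


def eval_is_not_null(state: NullState) -> Match:
    """Evaluate `col IS NOT NULL` for a row group from its null state alone."""
    if state in (NullState.NO_VALUES, NullState.ALL_NULLS):
        return Match.NONE
    if state is NullState.NO_NULLS:
        return Match.ALL
    return Match.SOME


for state in NullState:
    print(state.value, eval_is_null(state).value, eval_is_not_null(state).value)
```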



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-07 Thread Gopal V (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268071#comment-14268071 ]

Gopal V commented on HIVE-4639:
---

Yes, we have that granularity locked up in two states (now effectively a 
tri-state: all_nulls, some_nulls, no_nulls).

We actually have all_nulls/no_values encoded as min=null/max=null. This patch 
adds the some_nulls/no_nulls boolean on top of that, though that information 
is somewhat non-obvious.

Another thought: since we already have a whole long stream of IS_PRESENT, I 
suspect storing the actual NULL count would be helpful if we need a heuristic 
for IS_NULL row-level predicate evaluation on wide de-normalized tables (i.e., 
read the filter column first and then avoid creating large vector batches for 
the rest).



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-07 Thread Prasanth Jayachandran (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14268099#comment-14268099 ]

Prasanth Jayachandran commented on HIVE-4639:
-

As Gopal mentioned, we can infer the other stats from the existing information:
* all_nulls: min = null
* no_nulls: hasNull = false
* some_nulls: hasNull = true, min != null
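The mapping above can be written as a small function. It assumes `minimum` is `None` when a row group has no non-null values and that `has_null` is the new flag. The empty-group case (min = null, hasNull = false) is an extrapolation of the same rules to Owen's no_values state. `infer_null_state` is a hypothetical name for illustration.

```python
from typing import Any, Optional


def infer_null_state(minimum: Optional[Any], has_null: bool) -> str:
    """Derive the four null states from min/max stats plus the hasNull flag."""
    if minimum is None:
        # No non-null values: either every row is NULL or there are no rows.
        return "all_nulls" if has_null else "no_values"
    # At least one non-null value exists.
    return "some_nulls" if has_null else "no_nulls"


for minimum, has_null in [(None, True), (None, False), (5, True), (5, False)]:
    print(minimum, has_null, infer_null_state(minimum, has_null))
```

Checking `minimum is None` rather than truthiness keeps a legitimate minimum of `0` or `""` from being mistaken for "no values".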



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2015-01-06 Thread Hive QA (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14267182#comment-14267182 ]

Hive QA commented on HIVE-4639:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12690444/HIVE-4639.1.patch

{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 6731 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_full
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_extrapolate_part_stats_partial
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_predicate_pushdown
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_alter_merge_stats_orc
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_optimization2
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_optimize_nullscan
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_analyze
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_ptf
org.apache.hadoop.hive.ql.io.orc.TestInputOutputFormat.testCombinationInputFormatWithAcid
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.test1[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.test1[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testReadFormat_0_11[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testReadFormat_0_11[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStringAndBinaryStatistics[0]
org.apache.hadoop.hive.ql.io.orc.TestOrcFile.testStringAndBinaryStatistics[1]
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testColumnsWithNullAndCompression
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithNull
org.apache.hadoop.hive.ql.io.orc.TestOrcNullOptimization.testMultiStripeWithoutNull
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplex
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testOrcSerDeStatsComplexOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testSerdeStatsOldFormat
org.apache.hadoop.hive.ql.io.orc.TestOrcSerDeStats.testStringAndBinaryStatistics
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2274/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/2274/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-2274/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12690444 - PreCommit-HIVE-TRUNK-Build



[jira] [Commented] (HIVE-4639) Add has null flag to ORC internal index

2013-06-04 Thread Prasanth J (JIRA)

[ https://issues.apache.org/jira/browse/HIVE-4639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13675338#comment-13675338 ]

Prasanth J commented on HIVE-4639:
--

[~owen.omalley] are you working on this issue? If not, I can take it over.

 Add has null flag to ORC internal index
 ---

 Key: HIVE-4639
 URL: https://issues.apache.org/jira/browse/HIVE-4639
 Project: Hive
  Issue Type: Improvement
  Components: File Formats
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 It would enable more predicate pushdown if we added a flag to the index entry 
 recording if there were any null values in the column for the 10k rows.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira