[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-11-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811189#comment-13811189
 ] 

Hive QA commented on HIVE-3959:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12611523/HIVE-3959.6.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4547 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_smb_mapjoin_8
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/105/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/105/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

> Update Partition Statistics in Metastore Layer
> --
>
> Key: HIVE-3959
> URL: https://issues.apache.org/jira/browse/HIVE-3959
> Project: Hive
> Issue Type: Improvement
> Components: Metastore, Statistics
> Reporter: Bhushan Mandhani
> Assignee: Ashutosh Chauhan
> Priority: Minor
> Attachments: HIVE-3959.1.patch, HIVE-3959.2.patch, HIVE-3959.3.patch, 
> HIVE-3959.3.patch, HIVE-3959.4.patch, HIVE-3959.4.patch, HIVE-3959.5.patch, 
> HIVE-3959.6.patch, HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
> HIVE-3959.patch.12.txt, HIVE-3959.patch.2
>
>
> When partitions are created using queries ("insert overwrite" and "insert 
> into"), the StatsTask updates all stats. However, when partitions are 
> added directly through metadata-only operations (either the CLI or direct 
> calls to the Thrift Metastore), no stats are populated even if 
> hive.stats.reliable is set to true. This puts us in a situation where we 
> can't decide whether stats are truly reliable.
> We propose that the "fast stats" (numFiles and totalSize), which don't require 
> a scan of the data, should always be populated and be completely reliable. For 
> now we are still excluding rowCount and rawDataSize because computing them 
> would make these operations very expensive. Currently they are quick 
> metadata-only ops.
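
To illustrate why the "fast stats" are cheap, here is a minimal sketch that derives numFiles and totalSize from file-system metadata alone. It is not the code in the attached patches: the helper name fast_stats is hypothetical, and a local directory stands in for a partition location on HDFS.

```python
import os
import tempfile


def fast_stats(partition_dir):
    """Return numFiles and totalSize for a directory using only file metadata.

    Hypothetical helper for illustration. No file is opened or read, so the
    cost grows with the number of files, not with the amount of data.
    """
    num_files = 0
    total_size = 0
    for root, _dirs, files in os.walk(partition_dir):
        for name in files:
            num_files += 1
            # getsize() asks the file system for the length; it does not scan
            # the file contents.
            total_size += os.path.getsize(os.path.join(root, name))
    return {"numFiles": num_files, "totalSize": total_size}


if __name__ == "__main__":
    # Build a throwaway "partition" with two data files of 10 and 32 bytes.
    with tempfile.TemporaryDirectory() as part:
        for name, payload in (("000000_0", b"a" * 10), ("000001_0", b"b" * 32)):
            with open(os.path.join(part, name), "wb") as f:
                f.write(payload)
        print(fast_stats(part))  # {'numFiles': 2, 'totalSize': 42}
```

By contrast, rowCount and rawDataSize would require opening and reading every file, which is why the description leaves them out of the metadata-only path.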



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-11-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811172#comment-13811172
 ] 

Hive QA commented on HIVE-3959:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12611523/HIVE-3959.6.patch

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4547 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/104/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/104/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.



[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-10-31 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811015#comment-13811015
 ] 

Hive QA commented on HIVE-3959:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12611523/HIVE-3959.6.patch

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4547 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucketmapjoin7
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_parallel_orderby
{noformat}

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/95/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/95/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.



[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-10-31 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810886#comment-13810886
 ] 

Thejas M Nair commented on HIVE-3959:
-

+1



[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-10-31 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13810730#comment-13810730
 ] 

Thejas M Nair commented on HIVE-3959:
-

Can you please rebase the patch (as the HIVE-5610 changes are in)?


> Update Partition Statistics in Metastore Layer
> --
>
> Key: HIVE-3959
> URL: https://issues.apache.org/jira/browse/HIVE-3959
> Project: Hive
> Issue Type: Improvement
> Components: Metastore, Statistics
> Reporter: Bhushan Mandhani
> Assignee: Ashutosh Chauhan
> Priority: Minor
> Attachments: HIVE-3959.1.patch, HIVE-3959.2.patch, HIVE-3959.3.patch, 
> HIVE-3959.3.patch, HIVE-3959.4.patch, HIVE-3959.4.patch, HIVE-3959.patch.1, 
> HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2


[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-10-30 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13809854#comment-13809854
 ] 

Thejas M Nair commented on HIVE-3959:
-

Updated the review board with comments.

> Update Partition Statistics in Metastore Layer
> --
>
> Key: HIVE-3959
> URL: https://issues.apache.org/jira/browse/HIVE-3959
> Project: Hive
> Issue Type: Improvement
> Components: Metastore, Statistics
> Reporter: Bhushan Mandhani
> Assignee: Ashutosh Chauhan
> Priority: Minor
> Attachments: HIVE-3959.1.patch, HIVE-3959.2.patch, HIVE-3959.3.patch, 
> HIVE-3959.4.patch, HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
> HIVE-3959.patch.12.txt, HIVE-3959.patch.2


[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-10-25 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805775#comment-13805775
 ] 

Hive QA commented on HIVE-3959:
---



{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12610142/HIVE-3959.4.patch

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 4481 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_like_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_join_star
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_pcr
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_union_view
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_create_table_serde
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/1237/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/1237/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.



[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-10-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803456#comment-13803456
 ] 

Ashutosh Chauhan commented on HIVE-3959:


[~gangtimliu] Would you like to review the rebased patch? 

> Update Partition Statistics in Metastore Layer
> --
>
> Key: HIVE-3959
> URL: https://issues.apache.org/jira/browse/HIVE-3959
> Project: Hive
> Issue Type: Improvement
> Components: Metastore, Statistics
> Reporter: Bhushan Mandhani
> Assignee: Ashutosh Chauhan
> Priority: Minor
> Attachments: HIVE-3959.1.patch, HIVE-3959.patch.1, 
> HIVE-3959.patch.11.txt, HIVE-3959.patch.12.txt, HIVE-3959.patch.2


[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-09-29 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781438#comment-13781438
 ] 

Gang Tim Liu commented on HIVE-3959:


Yes, assign it to Dilip.

> Update Partition Statistics in Metastore Layer
> --
>
> Key: HIVE-3959
> URL: https://issues.apache.org/jira/browse/HIVE-3959
> Project: Hive
> Issue Type: Improvement
> Components: Metastore, Statistics
> Reporter: Bhushan Mandhani
> Assignee: Dilip Joseph
> Priority: Minor
> Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
> HIVE-3959.patch.12.txt, HIVE-3959.patch.2


[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-09-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780918#comment-13780918
 ] 

Ashutosh Chauhan commented on HIVE-3959:


This is useful work. [~bmadhvani] / [~gangtimliu] Do you guys want to refresh 
this patch?

> Update Partition Statistics in Metastore Layer
> --
>
> Key: HIVE-3959
> URL: https://issues.apache.org/jira/browse/HIVE-3959
> Project: Hive
> Issue Type: Improvement
> Components: Metastore, Statistics
> Reporter: Bhushan Mandhani
> Assignee: Gang Tim Liu
> Priority: Minor
> Attachments: HIVE-3959.patch.1, HIVE-3959.patch.11.txt, 
> HIVE-3959.patch.12.txt, HIVE-3959.patch.2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-04-02 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620364#comment-13620364
 ] 

Gang Tim Liu commented on HIVE-3959:


rebase https://reviews.facebook.net/D9885

> Update Partition Statistics in Metastore Layer
> --
>
> Key: HIVE-3959
> URL: https://issues.apache.org/jira/browse/HIVE-3959
> Project: Hive
> Issue Type: Improvement
> Components: Metastore, Statistics
> Reporter: Bhushan Mandhani
> Assignee: Gang Tim Liu
> Priority: Minor


[jira] [Commented] (HIVE-3959) Update Partition Statistics in Metastore Layer

2013-02-04 Thread Bhushan Mandhani (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570910#comment-13570910
 ] 

Bhushan Mandhani commented on HIVE-3959:


The diff is out at https://reviews.facebook.net/D8271. I still need to make 
some minor updates before I can submit the patch.

> Update Partition Statistics in Metastore Layer
> --
>
> Key: HIVE-3959
> URL: https://issues.apache.org/jira/browse/HIVE-3959
> Project: Hive
> Issue Type: Improvement
> Components: Metastore, Statistics
> Reporter: Bhushan Mandhani
> Assignee: Bhushan Mandhani
> Priority: Minor