[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2020-12-08 Thread Tristan Stevens (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246083#comment-17246083
 ] 

Tristan Stevens commented on HIVE-11266:


[~findepi] this is true however with managed (i.e. non external) tables then 
modifying the underlying data without performing a REFRESH is not supported. 
With external tables however it is expected behaviour. This is essentially the 
definition of MANAGED vs. EXTERNAL.

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-11266.01.patch, HIVE-11266.patch
>
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2020-12-08 Thread Piotr Findeisen (Jira)


[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245953#comment-17245953
 ] 

Piotr Findeisen commented on HIVE-11266:


{quote}This is not just external tables - any tables where users are directly 
modifying the underlying data can be impacted by this.
{quote}
 
{quote}Yes, I agree with you, external table is just my personal use 
case.{quote}
 
[~tmgstev] [~simobatt] was there a follow-up issue to this?
>From the attached patch (same as 
>[https://github.com/apache/hive/commit/a2dff9e13acc62ecc0388b3b2e221f26c9184dbb)]
> i see this was fixed for external tables only.
 

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HIVE-11266.01.patch, HIVE-11266.patch
>
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2017-10-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197551#comment-16197551
 ] 

Hive QA commented on HIVE-11266:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12891109/HIVE-11266.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 11191 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[stats_noscan_2] 
(batchId=117)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7197/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7197/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7197/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12891109 - PreCommit-HIVE-Build

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Attachments: HIVE-11266.01.patch, HIVE-11266.patch
>
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2017-10-09 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197401#comment-16197401
 ] 

Jesus Camacho Rodriguez commented on HIVE-11266:


Adding test to the patch.

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Attachments: HIVE-11266.01.patch, HIVE-11266.patch
>
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2017-10-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16197143#comment-16197143
 ] 

Ashutosh Chauhan commented on HIVE-11266:
-

+1

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Attachments: HIVE-11266.patch
>
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics for external tables

2017-10-07 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195776#comment-16195776
 ] 

Hive QA commented on HIVE-11266:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12890844/HIVE-11266.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 11190 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_predicate_pushdown]
 (batchId=231)
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_single_sourced_multi_insert]
 (batchId=231)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[optimize_nullscan]
 (batchId=162)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_explainuser_1]
 (batchId=171)
org.apache.hadoop.hive.cli.TestTezPerfCliDriver.testCliDriver[query23] 
(batchId=239)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/7181/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/7181/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-7181/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12890844 - PreCommit-HIVE-Build

> count(*) wrong result based on table statistics for external tables
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Jesus Camacho Rodriguez
>Priority: Blocker
> Attachments: HIVE-11266.patch
>
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics

2017-07-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16102369#comment-16102369
 ] 

Sergey Shelukhin commented on HIVE-11266:
-

[~pxiong] the issue is still there for external tables. The semantics for 
external tables in Hive are not well defined, but many people assume (and I 
agree) that it's ok to manually manage these using file operations, which 
invalidates the stats without Hive knowing about it.
I don't think this setting should be used for external tables.
Thoughts?
cc [~ashutoshc] [~hagleitn]


> count(*) wrong result based on table statistics
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics

2017-03-07 Thread Tristan Stevens (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900854#comment-15900854
 ] 

Tristan Stevens commented on HIVE-11266:


If Hive is still serving results directly from the stats then with external 
tables it cannot guarantee their accuracy.

> count(*) wrong result based on table statistics
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics

2017-03-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900851#comment-15900851
 ] 

Pengcheng Xiong commented on HIVE-11266:


I see. We changed a lot since then. This should be already fixed in the recent 
Hive versions.

> count(*) wrong result based on table statistics
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics

2017-03-07 Thread Simone (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900832#comment-15900832
 ] 

Simone commented on HIVE-11266:
---

It was Hive 1.1.0 in CDH distribution

> count(*) wrong result based on table statistics
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics

2017-03-07 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900364#comment-15900364
 ] 

Pengcheng Xiong commented on HIVE-11266:


Hello there, which version of hive are you using? I saw you put a lable 1.1.0 
as affected versions. Does that mean that you are using Hive 1.1? Thanks.

> count(*) wrong result based on table statistics
> ---
>
> Key: HIVE-11266
> URL: https://issues.apache.org/jira/browse/HIVE-11266
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Simone Battaglia
>Assignee: Pengcheng Xiong
>Priority: Critical
>
> Hive returns wrong count result on an external table with table statistics if 
> I change table data files.
> This is the scenario in details:
> 1) create external table my_table (...) location 'my_location';
> 2) analyze table my_table compute statistics;
> 3) change/add/delete one or more files in 'my_location' directory;
> 4) select count(\*) from my_table;
> In this case the count query doesn't generate a MR job and returns the result 
> based on table statistics. This result is wrong because is based on 
> statistics stored in the Hive metastore and doesn't take into account 
> modifications introduced on data files.
> Obviously setting "hive.compute.query.using.stats" to FALSE this problem 
> doesn't occur but the default value of this property is TRUE.
> I thinks that also this post on stackoverflow, that shows another type of bug 
> in case of multiple insert, is related to the one that I reported:
> http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics

2015-07-23 Thread Tristan Stevens (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638644#comment-14638644
 ] 

Tristan Stevens commented on HIVE-11266:


This is not just external tables - any tables where users are directly 
modifying the underlying data can be impacted by this.

 count(*) wrong result based on table statistics
 ---

 Key: HIVE-11266
 URL: https://issues.apache.org/jira/browse/HIVE-11266
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Simone
Priority: Critical

 Hive returns wrong count result on an external table with table statistics if 
 I change table data files.
 This is the scenario in details:
 1) create external table my_table (...) location 'my_location';
 2) analyze table my_table compute statistics;
 3) change/add/delete one or more files in 'my_location' directory;
 4) select count(\*) from my_table;
 In this case the count query doesn't generate a MR job and returns the result 
 based on table statistics. This result is wrong because is based on 
 statistics stored in the Hive metastore and doesn't take into account 
 modifications introduced on data files.
 Obviously setting hive.compute.query.using.stats to FALSE this problem 
 doesn't occur but the default value of this property is TRUE.
 I thinks that also this post on stackoverflow, that shows another type of bug 
 in case of multiple insert, is related to the one that I reported:
 http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-11266) count(*) wrong result based on table statistics

2015-07-23 Thread Simone (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638659#comment-14638659
 ] 

Simone commented on HIVE-11266:
---

Yes, I agree with you, external table is just my personal use case.

 count(*) wrong result based on table statistics
 ---

 Key: HIVE-11266
 URL: https://issues.apache.org/jira/browse/HIVE-11266
 Project: Hive
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Simone
Priority: Critical

 Hive returns wrong count result on an external table with table statistics if 
 I change table data files.
 This is the scenario in details:
 1) create external table my_table (...) location 'my_location';
 2) analyze table my_table compute statistics;
 3) change/add/delete one or more files in 'my_location' directory;
 4) select count(\*) from my_table;
 In this case the count query doesn't generate a MR job and returns the result 
 based on table statistics. This result is wrong because is based on 
 statistics stored in the Hive metastore and doesn't take into account 
 modifications introduced on data files.
 Obviously setting hive.compute.query.using.stats to FALSE this problem 
 doesn't occur but the default value of this property is TRUE.
 I thinks that also this post on stackoverflow, that shows another type of bug 
 in case of multiple insert, is related to the one that I reported:
 http://stackoverflow.com/questions/24080276/wrong-result-for-count-in-hive-table



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)