[jira] [Comment Edited] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2018-02-15 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366496#comment-16366496
 ] 

James Taylor edited comment on PHOENIX-4333 at 2/16/18 6:18 AM:


Attached WIP patch with all the above implemented. Still need to fix 
ExplainPlanWithStatsEnabledIT. Here are the current failures:
{code}
[ERROR] Tests run: 24, Failures: 3, Errors: 2, Skipped: 0, Time elapsed: 64.285 s <<< FAILURE! - in org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT
[ERROR] testSelectQueriesWithStatsForParallelizationOn(org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT)  Time elapsed: 2.387 s  <<< FAILURE!
java.lang.AssertionError: expected:<10> but was:<9>
    at org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT.testSelectQueriesWithFilters(ExplainPlanWithStatsEnabledIT.java:669)
    at org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT.testSelectQueriesWithStatsForParallelizationOn(ExplainPlanWithStatsEnabledIT.java:629)

[ERROR] testBytesRowsForSelectWhenKeyOutOfRange(org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT)  Time elapsed: 0.012 s  <<< ERROR!
java.lang.NullPointerException
    at org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT.testBytesRowsForSelectWhenKeyOutOfRange(ExplainPlanWithStatsEnabledIT.java:116)

[ERROR] testBytesRowsForSelectOnTenantViews(org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT)  Time elapsed: 4.654 s  <<< FAILURE!
java.lang.AssertionError: expected:<2000> but was:
    at org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT.testBytesRowsForSelectOnTenantViews(ExplainPlanWithStatsEnabledIT.java:426)

[ERROR] testSelectQueriesWithStatsForParallelizationOff(org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT)  Time elapsed: 2.322 s  <<< FAILURE!
java.lang.AssertionError: expected:<10> but was:<9>
    at org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT.testSelectQueriesWithFilters(ExplainPlanWithStatsEnabledIT.java:669)
    at org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT.testSelectQueriesWithStatsForParallelizationOff(ExplainPlanWithStatsEnabledIT.java:624)

[ERROR] testEstimatesForAggregateQueries(org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT)  Time elapsed: 2.324 s  <<< ERROR!
java.lang.NullPointerException
    at org.apache.phoenix.end2end.ExplainPlanWithStatsEnabledIT.testEstimatesForAggregateQueries(ExplainPlanWithStatsEnabledIT.java:560)

[INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 155.149 s - in org.apache.phoenix.end2end.TenantSpecificTablesDDLIT
[INFO] Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 149.108 s - in org.apache.phoenix.end2end.TenantSpecificTablesDMLIT
[INFO] Tests run: 52, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 280.504 s - in org.apache.phoenix.end2end.ViewIT
[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   ExplainPlanWithStatsEnabledIT.testBytesRowsForSelectOnTenantViews:426 expected:<2000> but was:
[ERROR]   ExplainPlanWithStatsEnabledIT.testSelectQueriesWithStatsForParallelizationOff:624->testSelectQueriesWithFilters:669 expected:<10> but was:<9>
[ERROR]   ExplainPlanWithStatsEnabledIT.testSelectQueriesWithStatsForParallelizationOn:629->testSelectQueriesWithFilters:669 expected:<10> but was:<9>
[ERROR] Errors: 
[ERROR]   ExplainPlanWithStatsEnabledIT.testBytesRowsForSelectWhenKeyOutOfRange:116 NullPointer
[ERROR]   ExplainPlanWithStatsEnabledIT.testEstimatesForAggregateQueries:560 NullPointer
[INFO] 
{code}


was (Author: jamestaylor):
Attached WIP patch with all the above implemented. Still need to fix 
ExplainPlanWithStatsEnabledIT.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.12.0
> Reporter: Mujtaba Chohan
> Assignee: James Taylor
> Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch, PHOENIX-4333_wip1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B 
> view yields partial results (only stats for B,1), which are incorrect even 
> though the updated timestamp is shown as current.
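
As a rough illustration of the scenario above, the following sketch models stats as a sorted set of row keys and estimates a tenant's row count by counting the entries that carry that tenant's prefix. This is a toy model, not how Phoenix stores guideposts: TenantEstimateSketch and estimatedRows are hypothetical names, and the sketch assumes that updating stats on the tenant A view only visits region 1 but records every key it finds there.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class TenantEstimateSketch {

    /** Counts the collected stats entries whose key starts with the tenant prefix. */
    static long estimatedRows(TreeSet<String> statsKeys, String tenantPrefix) {
        // Every key with the prefix sorts inside [prefix, prefix + '\uffff').
        return statsKeys.subSet(tenantPrefix, tenantPrefix + Character.MAX_VALUE).size();
    }

    public static void main(String[] args) {
        List<String> region1 = Arrays.asList("A,1", "A,2", "B,1");
        List<String> region2 = Arrays.asList("B,2", "B,3");

        // Assumption: the stats update for tenant A touches region 1 only.
        TreeSet<String> stats = new TreeSet<>(region1);
        System.out.println(estimatedRows(stats, "B,")); // 1, although tenant B has 3 rows

        // Once region 2 is covered as well, the estimate matches the data.
        stats.addAll(region2);
        System.out.println(estimatedRows(stats, "B,")); // 3
    }
}
```

Under these assumptions the tenant B estimate is based on B,1 alone until stats for region 2 are collected too.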



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236115#comment-16236115
 ] 

James Taylor edited comment on PHOENIX-4333 at 11/2/17 6:25 PM:


We really want to answer the question "Is there a guidepost within every 
region?". Whether a guidepost then intersects the scan is not the check we 
need. For example, you may have a query doing a skip scan which would fail the 
intersection test, but still have a guidepost in the region.

I think if you always set the endRegionKey (instead of only when it's a local 
index) here before the inner loop:
{code}
endRegionKey = regionInfo.getEndKey();
if (isLocalIndex) {
{code}
and then, after the inner loop, check either that we set currentKeyBytes (which 
means we entered the loop) or that the currentGuidePost is less than the region 
end key, then that's enough, since we know that the currentGuidePost is already 
bigger than the region start key. The check for endKey == stopKey is a small 
optimization: when that's not the case, we've already done the key comparison 
as we entered the loop, so we don't need to do it again (see comment below).
{code}
// We have a guide post in the region if the above loop was entered
// or if the current key is less than the region end key (since the loop
// may not have been entered if our scan end key is smaller than the
// first guide post in that region).
gpsAvailableForAllRegions &= 
currentKeyBytes != initialKeyBytes || 
( endKey == stopKey && // If not comparing against region boundary
  ( endRegionKey.length == 0 || // then check if gp is in the region
currentGuidePost.compareTo(endRegionKey) < 0) );
{code}

Does this not pass all of your tests?
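
To make the question "Is there a guidepost within every region?" concrete, here is a minimal, self-contained sketch of that check. It is not the Phoenix code: the names GuidePostCoverage and hasGuidePostInEveryRegion are made up for illustration, scan boundaries are ignored, and regions are assumed to be contiguous and described only by their sorted end keys, with an empty end key marking the last region.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

final class GuidePostCoverage {

    /** Unsigned lexicographic comparison of two row keys. */
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /**
     * Returns true only if every region contains at least one guide post.
     * Both lists must be sorted; an empty end key means "no upper bound".
     */
    static boolean hasGuidePostInEveryRegion(List<byte[]> regionEndKeys, List<byte[]> guidePosts) {
        int gp = 0; // index of the first guide post not yet assigned to a region
        for (byte[] endKey : regionEndKeys) {
            boolean unbounded = endKey.length == 0;
            boolean found = false;
            // Consume the guide posts that sort before this region's end key.
            // The previous iteration stopped at the previous end key, so each
            // one is already known to be at or after this region's start key.
            while (gp < guidePosts.size()
                    && (unbounded || compare(guidePosts.get(gp), endKey) < 0)) {
                found = true;
                gp++;
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    static byte[] key(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Region 1 ends where region 2 starts (B,2); region 2 is unbounded.
        List<byte[]> regionEndKeys = Arrays.asList(key("B,2"), key(""));
        // Guide posts only in region 1: prints false.
        System.out.println(hasGuidePostInEveryRegion(
                regionEndKeys, Arrays.asList(key("A,2"), key("B,1"))));
        // One guide post in each region: prints true.
        System.out.println(hasGuidePostInEveryRegion(
                regionEndKeys, Arrays.asList(key("A,2"), key("B,3"))));
    }
}
```

The sketch only needs the end-key comparison because the guide posts are walked in order, which is the same reason the start key never has to be rechecked in the condition above.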


was (Author: jamestaylor):
We really want to answer the question "Is there a guidepost within every 
region?". Whether a guidepost then intersects the scan is not the check we 
need. For example, you may have a query doing a skip scan which would fail the 
intersection test, but still have a guidepost in the region.

I think if you always set the endRegionKey (instead of only when it's a local 
index) here before the inner loop:
{code}
endRegionKey = regionInfo.getEndKey();
if (isLocalIndex) {
{code}
and then, after the inner loop, check either that we set currentKeyBytes (which 
means we entered the loop) or that the currentGuidePost is less than the region 
end key, then that's enough, since we know that the currentGuidePost is already 
bigger than the region start key. The check for endKey == stopKey is a small 
optimization: when that's not the case, we've already done the key comparison 
as we entered the loop, so we don't need to do it again (see comment below).
{code}
// We have a guide post in previous region if the above loop was entered
// or if the current key is less than the region end key (since the loop
// may not have been entered if our scan end key is smaller than the first
// guide post in that region
gpsAvailableForAllRegions &= currentKeyBytes != initialKeyBytes || 
    (endKey == stopKey && (endRegionKey.length == 0 || 
        currentGuidePost.compareTo(endRegionKey) < 0));
{code}

Does this not pass all of your tests?

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.12.0
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
> Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B 
> view yields partial results (only stats for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Comment Edited] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236115#comment-16236115
 ] 

James Taylor edited comment on PHOENIX-4333 at 11/2/17 6:19 PM:


We really want to answer the question "Is there a guidepost within every 
region?". Whether a guidepost then intersects the scan is not the check we 
need. For example, you may have a query doing a skip scan which would fail the 
intersection test, but still have a guidepost in the region.

I think if you always set the endRegionKey (instead of only when it's a local 
index) here before the inner loop:
{code}
endRegionKey = regionInfo.getEndKey();
if (isLocalIndex) {
{code}
and then, after the inner loop, check either that we set currentKeyBytes (which 
means we entered the loop) or that the currentGuidePost is less than the region 
end key, then that's enough, since we know that the currentGuidePost is already 
bigger than the region start key. The check for endKey == stopKey is a small 
optimization: when that's not the case, we've already done the key comparison 
as we entered the loop, so we don't need to do it again (see comment below).
{code}
// We have a guide post in previous region if the above loop was entered
// or if the current key is less than the region end key (since the loop
// may not have been entered if our scan end key is smaller than the first
// guide post in that region
gpsAvailableForAllRegions &= currentKeyBytes != initialKeyBytes || 
    (endKey == stopKey && (endRegionKey.length == 0 || 
        currentGuidePost.compareTo(endRegionKey) < 0));
{code}

Does this not pass all of your tests?


was (Author: jamestaylor):
We really want to answer the question "Is there a guidepost within every 
region?". Whether a guidepost then intersects the scan is not the check we 
need. For example, you may have a query doing a skip scan which would fail the 
intersection test, but still have a guidepost in the region.

I think if you always set the endRegionKey (instead of only when it's a local 
index) here before the inner loop:
{code}
endRegionKey = regionInfo.getEndKey();
if (isLocalIndex) {
{code}
and then, after the inner loop, check either that we set currentKeyBytes (which 
means we entered the loop) or that the currentGuidePost is less than the region 
end key, then that's enough, since we know that the currentGuidePost is already 
bigger than the region start key. The check for endKey == stopKey is a small 
optimization: when that's not the case, we've already done the key comparison 
as we entered the loop, so we don't need to do it again (see comment below).
{code}
// We have a guide post in previous region if the above loop was entered
// or if the current key is less than the region end key (since the loop
// may not have been entered if our scan end key is smaller than the first
// guide post in that region
hasGuidePostInAllRegions &= currentKeyBytes != initialKeyBytes 
    || (endKey == stopKey && currentGuidePost.compareTo(endRegionKey) < 0);
{code}

Does this not pass all of your tests?

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.12.0
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
> Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B 
> view yields partial results (only stats for B,1), which are incorrect even 
> though the updated timestamp is shown as current.





[jira] [Comment Edited] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227597#comment-16227597
 ] 

Samarth Jain edited comment on PHOENIX-4333 at 10/31/17 9:12 PM:

Test which demonstrates the issue that [~mujtabachohan] brought up. I would say 
it is working as designed. We call these estimates for a reason :). If the user 
desires more accuracy, he/she should call UPDATE STATISTICS on the tenant view.

FYI, [~cody.mar...@gmail.com] 


was (Author: samarthjain):
Test which demonstrates the issue that [~mujtabachohan] brought up. I would say 
it is working fine. We call these estimates for a reason :). If the user 
desires more accuracy, he/she should call UPDATE STATISTICS on the tenant view.

FYI, [~cody.mar...@gmail.com] 

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 4.12.0
> Reporter: Mujtaba Chohan
> Assignee: Samarth Jain
> Attachments: PHOENIX-4333_test.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B 
> view yields partial results (only stats for B,1), which are incorrect even 
> though the updated timestamp is shown as current.


