[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2018-02-15 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16366496#comment-16366496
 ] 

James Taylor commented on PHOENIX-4333:
---

Attached WIP patch with all the above implemented. Still need to fix 
ExplainPlanWithStatsEnabledIT.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: James Taylor
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch, PHOENIX-4333_wip1.patch
>
>
> Consider two tenants A, B with tenant specific view on 2 separate 
> regions/region servers.
> {noformat}
> Region 1 keys:
> A,1
> A,2
> B,1
> Region 2 keys:
> B,2
> B,3
> {noformat}
> When stats are updated on the tenant A view, querying stats on the tenant B 
> view yields partial results (only stats for B,1), which are incorrect even 
> though the updated timestamp shows as current.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236638#comment-16236638
 ] 

James Taylor commented on PHOENIX-4333:
---

Two other corner cases:
# Handle the case where there's a single region. In that case, we can use the 
time estimate from the single row we have in the gps table.
# Handle the case where there's a guidepost in the first region, but it's 
*before* the startKey. We'll need to tweak this loop to first stop slightly 
sooner (when we're past the start key of the first region) so we know if 
there's a guidepost in the first region. If we enter the loop, then we have a 
gps for that region. Note too that there are a couple of minor changes here 
that make sense to make, such as setting intersectWithGuidePosts and not 
checking the key length each time through the loop, since it's not changing.
{code}
int startRegionIndex = regionIndex;
boolean gpsForFirstRegion = false;
try {
    if (gpsSize > 0) {
        stream = new ByteArrayInputStream(guidePosts.get(),
                guidePosts.getOffset(), guidePosts.getLength());
        input = new DataInputStream(stream);
        decoder = new PrefixByteDecoder(gps.getMaxLength());
        try {
            byte[] firstRegionStartKey =
                    regionLocations.get(regionIndex).getRegionInfo().getStartKey();
            if (firstRegionStartKey.getLength() > 0) {
                // Walk guideposts until we're past the first region start key
                while (firstRegionStartKey.compareTo(currentGuidePost =
                        PrefixByteCodec.decode(decoder, input)) >= 0) {
                    gpsForFirstRegion = true;
                    minGuidePostTimestamp = Math.min(estimateTs,
                            gps.getGuidePostTimestamps()[guideIndex]);
                    guideIndex++;
                }
                // Continue walking guideposts until we get past the currentKey
                while (currentKey.compareTo(currentGuidePost =
                        PrefixByteCodec.decode(decoder, input)) >= 0) {
                    minGuidePostTimestamp = Math.min(estimateTs,
                            gps.getGuidePostTimestamps()[guideIndex]);
                    guideIndex++;
                }
            }
        } catch (EOFException e) {
            // expected. Thrown when we have decoded all guide posts.
            intersectWithGuidePosts = false;
        }
    }
{code}
# Then we'll want to consider {{gpsForFirstRegion}} in our setting of 
{{gpsAvailableForAllRegions}}. This would be necessary if the currentKey (i.e. 
the start key) is after the gps, but before the endKey.
{code}
// We have a guide post in the region if the above loop was entered
// or if the current key is less than the region end key (since the loop
// may not have been entered if our scan end key is smaller than the
// first guide post in that region).
gpsAvailableForAllRegions &= 
currentKeyBytes != initialKeyBytes || 
( gpsForFirstRegion && regionIndex == startRegionIndex ) ||
( endKey == stopKey && // If not comparing against region boundary
  ( endRegionKey.length == 0 || // then check if gp is in the region
currentGuidePost.compareTo(endRegionKey) < 0) );
{code}

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236313#comment-16236313
 ] 

James Taylor commented on PHOENIX-4333:
---

Also, looking at ExplainPlanWithStatsEnabledIT.testSelectQueriesWithFilters(), 
the region boundaries are not going to intersect as expected with the 
guideposts, since the split points are using raw bytes which won't have the 
sign bit flipped. Below is what you want to do, as Phoenix will do the right 
thing in that case with respect to data types. Some other tests need to be 
changed as well - I'd recommend always having the SPLIT clause in the CREATE 
TABLE statement, as it's clearer.
{code}
private void testSelectQueriesWithFilters(boolean useStatsForParallelization)
        throws Exception {
    String tableName = generateUniqueName();
    try (Connection conn = DriverManager.getConnection(getUrl())) {
        int guidePostWidth = 20;
        String ddl =
                "CREATE TABLE " + tableName
                        + " (k INTEGER PRIMARY KEY, a bigint, b bigint) "
                        + " GUIDE_POSTS_WIDTH=" + guidePostWidth
                        + ", USE_STATS_FOR_PARALLELIZATION="
                        + useStatsForParallelization + " SPLIT ON (102,105,108)";
        conn.createStatement().execute(ddl);
        conn.createStatement().execute(
                "upsert into " + tableName + " values (100,100,3)");
{code}
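
To illustrate why raw-byte split points miss the data, here is a minimal standalone sketch. It assumes, as the comment above describes, that an INTEGER key is stored big-endian with the sign bit flipped; the class and method names are hypothetical and are not Phoenix APIs.

```java
public class SignBitSplitSketch {
    // Plain big-endian bytes, as a split point built from raw bytes would look.
    static byte[] rawBytes(int v) {
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    // Assumed key encoding: big-endian with the sign bit flipped, so that
    // negative values sort before positive ones under unsigned byte comparison.
    static byte[] signFlippedBytes(int v) {
        return rawBytes(v ^ Integer.MIN_VALUE);
    }

    // Unsigned lexicographic comparison, the order in which row keys are sorted.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] rowKey = signFlippedBytes(100);
        // Against a raw split point of 108, the row for k=100 sorts AFTER it.
        System.out.println(compareUnsigned(rowKey, rawBytes(108)) > 0);
        // Against an encoded split point of 108, it sorts before it as intended.
        System.out.println(compareUnsigned(rowKey, signFlippedBytes(108)) < 0);
    }
}
```

Under that assumption, every row with a small positive key sorts after all three raw split points, so the rows end up in the last region instead of being spread across the regions the test expects.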

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236115#comment-16236115
 ] 

James Taylor commented on PHOENIX-4333:
---

We really want to answer the question "Is there a guidepost within every 
region?". Whether a guidepost then intersects the scan is not the check we 
need. For example, you may have a query doing a skip scan which would fail the 
intersection test, but still have a guidepost in the region.

I think if you always set the endRegionKey (instead of only when it's a local 
index) here before the inner loop:
{code}
endRegionKey = regionInfo.getEndKey();
if (isLocalIndex) {
{code}
and then after the inner loop, check that we set currentKeyBytes (which means 
we entered the loop) or that the currentGuidePost is less than the region end 
key, then that's enough, since we know that the currentGuidePost is already 
bigger than the start region key. The check for endKey == stopKey is a small 
optimization, since we don't need to do the key comparison again if that's not 
the case since we've already done it as we entered the loop (see comment below).
{code}
// We have a guide post in the previous region if the above loop was entered
// or if the current key is less than the region end key (since the loop
// may not have been entered if our scan end key is smaller than the first
// guide post in that region)
hasGuidePostInAllRegions &= currentKeyBytes != initialKeyBytes
        || (endKey == stopKey && currentGuidePost.compareTo(endRegionKey) < 0);
{code}

Does this not pass all of your tests?
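
The question "Is there a guidepost within every region?" can be sketched in isolation as follows. This is a simplified model, not the actual loop being patched: keys are plain strings, regions are assumed sorted and contiguous, an empty end key means the last region, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class GuidePostCoverageSketch {
    /** A region covering [startKey, endKey); an empty endKey means unbounded. */
    static final class Region {
        final String startKey;
        final String endKey;

        Region(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        boolean contains(String key) {
            return key.compareTo(startKey) >= 0
                    && (endKey.isEmpty() || key.compareTo(endKey) < 0);
        }
    }

    /** Returns true only if each region holds at least one guidepost. */
    static boolean hasGuidePostInAllRegions(List<Region> regions, List<String> sortedGuidePosts) {
        int gp = 0;
        for (Region region : regions) {
            // Skip guideposts that fall before this region's start key.
            while (gp < sortedGuidePosts.size()
                    && sortedGuidePosts.get(gp).compareTo(region.startKey) < 0) {
                gp++;
            }
            // Either we ran out of guideposts or the next one is past this region.
            if (gp == sortedGuidePosts.size() || !region.contains(sortedGuidePosts.get(gp))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(new Region("", "B,2"), new Region("B,2", ""));
        // Only region 1 has a guidepost, as when stats were collected for tenant A alone.
        System.out.println(hasGuidePostInAllRegions(regions, Arrays.asList("A,2")));
        // Both regions have a guidepost.
        System.out.println(hasGuidePostInAllRegions(regions, Arrays.asList("A,2", "B,3")));
    }
}
```

The check is independent of whether a guidepost intersects the scan, which is the distinction drawn above for skip scans.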

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235303#comment-16235303
 ] 

Samarth Jain commented on PHOENIX-4333:
---

Is it a safe assumption to make that if intersectScan is returning a non-null 
value, then we have an intersection? 

{code}
Scan newScan = scanRanges.intersectScan(scan, currentKeyBytes,
        currentGuidePostBytes, keyOffset, false);
if (newScan != null) {
 // guide post was available in the 
}
{code}

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235294#comment-16235294
 ] 

Samarth Jain commented on PHOENIX-4333:
---

Good point, [~jamestaylor]. I don't think my check would work in the below case:

REGION 1 - VIEW1 and VIEW2
REGION2 - VIEW2 and VIEW3

If we collect stats for VIEW1 and VIEW3, then even though both regions have 
stats, they don't have stats for VIEW2. I think I would also need to check 
whether any guidepost intersected the region.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch, 
> PHOENIX-4333_v2.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235275#comment-16235275
 ] 

James Taylor commented on PHOENIX-4333:
---

Does your check handle the case in which multiple regions are scanned and one 
in the middle has no guide posts? Not sure I understand why the check needs to 
be in the catch, but not a big deal.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235262#comment-16235262
 ] 

Samarth Jain commented on PHOENIX-4333:
---

Actually, the check needs to be done inside this catch block:

{code}
catch (EOFException e) {
// We have read all guide posts

}
{code}

And if we are doing it there, I think the check I had makes it easier to 
understand what's going on, IMHO.

{code}
+if (regionIndex < stopIndex) {
+    /*
+     * We don't have guide posts available for all regions. So in this case we
+     * conservatively say that we cannot provide estimates
+     */
+    gpsAvailableForAllRegions = false;
+}
 }
{code}



> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235253#comment-16235253
 ] 

Samarth Jain commented on PHOENIX-4333:
---

Ah, I see. Yes, that's true. Let me update the patch.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235250#comment-16235250
 ] 

James Taylor commented on PHOENIX-4333:
---

Haven’t tested it, but if currentKeyBytes gets set during the inner loop, then 
that means we’ve found at least one gp, no? Just a bit simpler way to detect 
that. If that doesn’t work, the way you have it is fine too.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-02 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235245#comment-16235245
 ] 

Samarth Jain commented on PHOENIX-4333:
---

It might be the late night and lack of coffee, but I am not sure I see the 
correlation here.
{code}
gpsAvailableForAllRegions &= initialKeyBytes != currentKeyBytes;
{code}

We reset currentKeyBytes to initialKeyBytes when we know we are not using stats 
for parallelisation.
{code}
if (!useStatsForParallelization) {
    /*
     * If we are not using stats for generating parallel scans, we need to
     * reset the currentKey back to what it was at the beginning of the loop.
     */
    currentKeyBytes = initialKeyBytes;
}
{code}

bq. I also think we should set the estimatedRows and estimatedSize to what 
we've found, but only set estimateInfoTimestamp to null if 
!gpsAvailableForAllRegions. That way callers can choose to use or not use the 
partial estimates based on estimateInfoTimestamp.

Makes sense.


> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-01 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235217#comment-16235217
 ] 

James Taylor commented on PHOENIX-4333:
---

The concept is good, but how about outside of the loop if you just have this 
check:
{code}
} catch (EOFException e) {
// We have read all guide posts
intersectWithGuidePosts = false;
}
}
+  gpsAvailableForAllRegions &= initialKeyBytes != currentKeyBytes;
if (!useStatsForParallelization) {
{code}
I also think we should set the estimatedRows and estimatedSize to what we've 
found, but only set estimateInfoTimestamp to null if 
!gpsAvailableForAllRegions. That way callers can choose to use or not use the 
partial estimates based on estimateInfoTimestamp.
{code}
this.estimatedRows = estimates.rowsEstimate;
this.estimatedSize = estimates.bytesEstimate;
this.estimateInfoTimestamp = gpsAvailableForAllRegions ? estimateTs : null;
{code}
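
As a rough sketch of how a caller could use that contract, the following standalone example always carries the partial row and byte estimates and uses a null timestamp to signal that they are incomplete. The class, field, and method names are hypothetical and only mirror the fields named above.

```java
public class PartialEstimateSketch {
    /** Holds the estimates; a null infoTimestamp marks them as partial. */
    static final class Estimate {
        final long rows;
        final long bytes;
        final Long infoTimestamp;

        Estimate(long rows, long bytes, Long infoTimestamp) {
            this.rows = rows;
            this.bytes = bytes;
            this.infoTimestamp = infoTimestamp;
        }

        boolean isComplete() {
            return infoTimestamp != null;
        }
    }

    // Always keep what was found; drop only the timestamp when some region
    // had no guideposts.
    static Estimate build(long rows, long bytes, long estimateTs,
            boolean gpsAvailableForAllRegions) {
        Long ts = gpsAvailableForAllRegions ? Long.valueOf(estimateTs) : null;
        return new Estimate(rows, bytes, ts);
    }

    public static void main(String[] args) {
        Estimate partial = build(10, 2048, 1509580800000L, false);
        // The caller decides: here partial numbers are reported with a caveat.
        if (partial.isComplete()) {
            System.out.println("rows=" + partial.rows);
        } else {
            System.out.println("rows>=" + partial.rows + " (partial)");
        }
    }
}
```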

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch, PHOENIX-4333_v1.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-11-01 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233745#comment-16233745
 ] 

Hudson commented on PHOENIX-4333:
-

SUCCESS: Integrated in Jenkins build Phoenix-master #1856 (See 
[https://builds.apache.org/job/Phoenix-master/1856/])
PHOENIX-4333 Test to demonstrate partial stats information for tenant (samarth: 
rev a39633169b75ceef340782379dd6c3c51d960142)
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/ExplainPlanWithStatsEnabledIT.java


> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>Priority: Major
> Attachments: PHOENIX-4333_test.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16227639#comment-16227639
 ] 

Samarth Jain commented on PHOENIX-4333:
---

I have committed the test to the master, 4.x-HBase-0.98 and 4.12* branches.

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
> Attachments: PHOENIX-4333_test.patch
>
>


[jira] [Commented] (PHOENIX-4333) Stats - Incorrect estimate when stats are updated on a tenant specific view

2017-10-31 Thread Samarth Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226388#comment-16226388
 ] 

Samarth Jain commented on PHOENIX-4333:
---

I am not sure what the best option is here. We possibly shouldn't be relying on 
the EST_INFO_TS for tenant views since, in situations like these overlaps, we 
may have incomplete guide post info for a view. The user can possibly call 
update stats on the view after the first data load, and then subsequently rely 
on major compaction to collect stats for it.

[~jamestaylor], WDYT?

> Stats - Incorrect estimate when stats are updated on a tenant specific view
> ---
>
> Key: PHOENIX-4333
> URL: https://issues.apache.org/jira/browse/PHOENIX-4333
> Project: Phoenix
>  Issue Type: Bug
>Affects Versions: 4.12.0
>Reporter: Mujtaba Chohan
>Assignee: Samarth Jain
>