[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2018-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393896#comment-16393896
 ] 

Hudson commented on PHOENIX-4148:
-

SUCCESS: Integrated in Jenkins build PreCommit-PHOENIX-Build #1797 (See 
[https://builds.apache.org/job/PreCommit-PHOENIX-Build/1797/])
PHOENIX-4148 COUNT(DISTINCT(...)) should have a memory size limit (Lars 
(jtaylor: rev 45f856af776a29eaf6ae503d6d3062b277893396)
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/SpillableGroupByIT.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/exception/SQLExceptionCode.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/aggregator/DistinctValueWithCountServerAggregator.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/memory/GlobalMemoryManager.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/aggregator/ClientAggregators.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/aggregator/ServerAggregators.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/memory/ChildMemoryManager.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/aggregator/Aggregators.java


> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 4.14.0, 5.0.0
>
> Attachments: 4148.txt, PHOENIX-4148_v2.patch
>
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2018-03-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16393884#comment-16393884
 ] 

Hudson commented on PHOENIX-4148:
-

FAILURE: Integrated in Jenkins build Phoenix-4.x-HBase-1.3 #55 (See 
[https://builds.apache.org/job/Phoenix-4.x-HBase-1.3/55/])
PHOENIX-4148 COUNT(DISTINCT(...)) should have a memory size limit (Lars 
(jtaylor: rev e24b29d282266c0146e1e66dee274416e1921dae)
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/aggregator/ClientAggregators.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/coprocessor/UngroupedAggregateRegionObserver.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/memory/GlobalMemoryManager.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/aggregator/Aggregators.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/aggregator/DistinctValueWithCountServerAggregator.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/memory/ChildMemoryManager.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/exception/SQLExceptionCode.java
* (edit) phoenix-core/src/test/java/org/apache/phoenix/util/TestUtil.java
* (edit) 
phoenix-core/src/main/java/org/apache/phoenix/expression/aggregator/ServerAggregators.java
* (edit) 
phoenix-core/src/it/java/org/apache/phoenix/end2end/SpillableGroupByIT.java


> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 4.14.0, 5.0.0
>
> Attachments: 4148.txt, PHOENIX-4148_v2.patch
>
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2018-03-07 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16390289#comment-16390289
 ] 

James Taylor commented on PHOENIX-4148:
---

Attached final patch with test and specific error code for insufficient memory 
exceptions.

> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: 4148.txt, PHOENIX-4148_v2.patch
>
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2018-02-22 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16373707#comment-16373707
 ] 

James Taylor commented on PHOENIX-4148:
---

I'll go ahead and commit your patch unless I hear objections, [~lhofhansl]. I'd 
like to get this into 4.14 as it prevents a potential RS crash. The 
code/approach can always be changed/improved down the road.

> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
>Priority: Major
> Fix For: 4.14.0
>
> Attachments: 4148.txt
>
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2017-12-23 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302570#comment-16302570
 ] 

James Taylor commented on PHOENIX-4148:
---

Just commit what you have - lgtm.

> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 4.14.0
>
> Attachments: 4148.txt
>
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2017-12-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302537#comment-16302537
 ] 

Lars Hofhansl commented on PHOENIX-4148:


Dropped the ball on this. I think it's important to fix this as a 
count(distinct) query could otherwise easily kill a region server.

> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 4.14.0
>
> Attachments: 4148.txt
>
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2017-10-20 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213321#comment-16213321
 ] 

James Taylor commented on PHOENIX-4148:
---

[~lhofhansl] - I think the patch is fine. I'm fine if you want to commit it.

> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Attachments: 4148.txt
>
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2017-09-13 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164979#comment-16164979
 ] 

Lars Hofhansl commented on PHOENIX-4148:


Hey [~jamestaylor], this is the issue I mentioned. I do not like the patch. If 
you can think of something better I'd love to know.

> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
> Attachments: 4148.txt
>
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2017-09-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16161698#comment-16161698
 ] 

Lars Hofhansl commented on PHOENIX-4148:


I have a clumsy patch. (one can check for the size of an aggregator, but 
getSize() turns out to be very expensive for the count-distinct aggregator. I 
changed that to be incremental instead. The patch is clumsy (hence not posting 
it). 

Also I still managed to kill the RS by using up all its heap, so perhaps the 
default max memory to use for Phoenix is too big.

I'll continue to look, but it does seem like this needs a bigger discussion. 
Nothing Phoenix does should ever lead to a RegionServer to exhaust its heap... 
This would fail the Phoenix query anyway, so it's far better to fail the 
Phoenix query before that. The memory management in Phoenix is pretty smart, I 
think, it just seems that it needs to be adjusted.


> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2017-09-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151366#comment-16151366
 ] 

Lars Hofhansl commented on PHOENIX-4148:


Hmm... How do I find the tenant id all the way down in 
DistinctValueWithCountServerAggregator?

Also it seems we're double allocating a bit (unless I am missing something):
* In GroupedAggregateRegionObserver groupByCache we group by the key
* DistinctValueWithCountServerAggregator we group again to get the count



> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2017-09-01 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151241#comment-16151241
 ] 

James Taylor commented on PHOENIX-4148:
---

Yes, that sounds right. The MemoryManager is really just tracking how much 
memory is used, not doing the actual allocation. So you'd essentially reserve 
each 1MB chunk and then allocate it yourself (or allocate lots of small chunks 
that add up to 1MB) - it doesn't have to match up exactly with what you've 
reserved in the MemoryManager.

> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2017-09-01 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151238#comment-16151238
 ] 

Lars Hofhansl commented on PHOENIX-4148:


My idea was to reserve a chunk of the HEAP (maybe in increments of 1MB), then 
allocate stuff onto the hashmap until the chunk is used up, at which point I'd 
request a new chunk. When finished all chunks would be released. Does that 
sound about right?

> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (PHOENIX-4148) COUNT(DISTINCT(...)) should have a memory size limit

2017-09-01 Thread James Taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/PHOENIX-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150119#comment-16150119
 ] 

James Taylor commented on PHOENIX-4148:
---

Use null as the tenantId. See example in GroupedAggregateRegionObserver of 
{{GlobalCache.getTenantCache(env, tenantId)}} and this javadoc:
{code}
/**
 * Get the tenant cache associated with the tenantId. If tenantId is not 
applicable, null may be
 * used in which case a global tenant cache is returned.
 * @param env the HBase configuration
 * @param tenantId the tenant ID or null if not applicable.
 * @return TenantCache
 */
public static TenantCache getTenantCache(RegionCoprocessorEnvironment env, 
ImmutableBytesPtr tenantId) {
{code}

> COUNT(DISTINCT(...)) should have a memory size limit
> 
>
> Key: PHOENIX-4148
> URL: https://issues.apache.org/jira/browse/PHOENIX-4148
> Project: Phoenix
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>
> I just managed to kill (hang) a region server by issuing a 
> COUNT(DISTINCT(...)) query over a column with very high cardinality (20m in 
> this case).
> This is perhaps not a useful thing to do, but Phoenix should nonetheless not 
> allow to have a server fail because of a query.
> [~jamestaylor], I see there GlobalMemoryManager, but I do not quite see how 
> I'd get a reference to one, once needs a tenant id, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)