Hi Alex,

Could you please try this new patch?


Thanks,
Maryann

On Wed, Jul 8, 2015 at 3:53 PM, Maryann Xue <maryann....@gmail.com> wrote:

> Thanks again for all this information! Would you mind checking a couple
> more things for me? Does test.table1 have regions on every region server
> in your cluster? And for the region servers whose logs show that error
> message, do they host any of table1's regions, and what are the start keys
> of those regions?
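>
> If it helps, a sketch like the following (against the HBase 0.94 client
> API) should print each region's start key and its hosting server. It
> assumes the underlying HBase table is the upper-cased TEST.TABLE1, since
> Phoenix upper-cases unquoted names:
>
> import java.util.Map;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HRegionInfo;
> import org.apache.hadoop.hbase.ServerName;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.util.Bytes;
>
> public class RegionDump {
>     public static void main(String[] args) throws Exception {
>         Configuration conf = HBaseConfiguration.create();
>         HTable table = new HTable(conf, "TEST.TABLE1");
>         // One entry per region: region metadata -> server hosting it.
>         for (Map.Entry<HRegionInfo, ServerName> e
>                 : table.getRegionLocations().entrySet()) {
>             System.out.println(Bytes.toStringBinary(e.getKey().getStartKey())
>                 + " -> " + e.getValue().getHostname());
>         }
>         table.close();
>     }
> }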
>
>
> Thanks,
> Maryann
>
> On Wed, Jul 8, 2015 at 3:05 PM, Alex Kamil <alex.ka...@gmail.com> wrote:
>
>> Maryann,
>>
>>
>> - the patch didn't help when applied to the client (we haven't put it on
>> the server yet)
>> - starting another client instance in a separate JVM and running the
>> query there after it fails on the first client returns the same error
>> - the counts are: table1: 68834 rows, table2: 2138 rows
>> - to support multitenancy we currently set MULTI_TENANT=true in the
>> CREATE TABLE statement (see the sketch after this list)
>> - we create tenant-specific connections with an Apache DBCP connection
>> pool using this code:
>>
>> import javax.sql.DataSource;
>> import org.apache.commons.dbcp.BasicDataSource;
>>
>> // Tenant-scoped DBCP pool over the Phoenix JDBC driver.
>> public DataSource createDataSource(String url, String tenant) {
>>     BasicDataSource ds = new BasicDataSource();
>>     ds.setDriverClassName("org.apache.phoenix.jdbc.PhoenixDriver");
>>     ds.setUrl("jdbc:phoenix:" + url);
>>     ds.setInitialSize(50);
>>     // Every pooled connection carries the tenant id.
>>     if (tenant != null) ds.setConnectionProperties("TenantId=" + tenant);
>>     return ds;
>> }
>> - when we don't use a tenant-specific connection there is no error
>> - verified that the tenant_id used in the tenant connection has access to
>> the records (they were created with the same tenant_id)
>> - the problem occurs only on the cluster; it works in stand-alone mode
>>
>> - are there any settings to be set on the server or client side, in code
>> or in hbase-site.xml, to enable multitenancy?
>> - were there any bug fixes related to multitenancy or cache management in
>> joins since 3.3.0?
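>>
>> For reference, this is roughly the shape of our DDL and tenant connection
>> (a sketch: the column names are taken from the query further down the
>> thread, and "acme" stands in for a real tenant id):
>>
>> import java.sql.Connection;
>> import java.sql.DriverManager;
>> import java.util.Properties;
>>
>> public class MultiTenantSetup {
>>     public static void main(String[] args) throws Exception {
>>         String url = args[0];  // ZK quorum, e.g. "localhost:2181"
>>
>>         // A global (non-tenant) connection creates the multi-tenant base
>>         // table. The leading PK column carries the tenant id, and
>>         // MULTI_TENANT=true makes Phoenix bind tenant-specific
>>         // connections to that column.
>>         Connection global = DriverManager.getConnection("jdbc:phoenix:" + url);
>>         global.createStatement().execute(
>>             "CREATE TABLE test.table1 ("
>>             + " tenant_id VARCHAR NOT NULL, rowkey VARCHAR NOT NULL, vs VARCHAR"
>>             + " CONSTRAINT pk PRIMARY KEY (tenant_id, rowkey)) MULTI_TENANT=true");
>>
>>         // A tenant-specific connection sees only that tenant's rows.
>>         Properties props = new Properties();
>>         props.setProperty("TenantId", "acme");  // hypothetical tenant id
>>         Connection tenantConn =
>>             DriverManager.getConnection("jdbc:phoenix:" + url, props);
>>     }
>> }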
>>
>> thanks
>> Alex
>>
>> On Tue, Jul 7, 2015 at 2:22 PM, Maryann Xue <maryann....@gmail.com> wrote:
>>
>>> It might not be real cache expiration (which would not be considered a
>>> bug), since increasing the cache time-to-live didn't solve the problem.
>>> So the problem might be that the cache had never been sent over to that
>>> server at all, which would be a bug, and most likely it would be because
>>> the client didn't do it right.
>>>
>>> So starting a new client after the problem happens should be a good test
>>> of the above theory.
>>>
>>> Anyway, approximately how long does running a count(*) on your
>>> test.table2 take?
>>>
>>>
>>> Thanks,
>>> Maryann
>>>
>>> On Tue, Jul 7, 2015 at 1:53 PM, Alex Kamil <alex.ka...@gmail.com> wrote:
>>>
>>>> Maryann,
>>>>
>>>> is this patch only for the client? We saw the error in the region
>>>> server logs, and it seems the server-side cache had expired.
>>>>
>>>> also, by "start a new process doing the same query" do you mean start
>>>> two client instances and run the query from one and then from the other?
>>>>
>>>> thanks
>>>> Alex
>>>>
>>>> On Tue, Jul 7, 2015 at 1:20 PM, Maryann Xue <maryann....@gmail.com> wrote:
>>>>
>>>>> My question was actually: once the problem appears on your cluster,
>>>>> does it go away if you just start a new process running the same query?
>>>>> I do have a patch, but it only fixes the problem I'm assuming here, and
>>>>> it might be something else.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Tue, Jul 7, 2015 at 12:59 PM, Alex Kamil <alex.ka...@gmail.com> wrote:
>>>>>
>>>>>> a patch would be great; we saw that this problem goes away in
>>>>>> standalone mode but reappears on the cluster
>>>>>>
>>>>>> On Tue, Jul 7, 2015 at 12:56 PM, Alex Kamil <alex.ka...@gmail.com> wrote:
>>>>>>
>>>>>>> sure, sounds good
>>>>>>>
>>>>>>> On Tue, Jul 7, 2015 at 10:57 AM, Maryann Xue <maryann....@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Alex,
>>>>>>>>
>>>>>>>> I suspect it's related to using cached region locations that might
>>>>>>>> have become invalid. A simple way to verify this is to start a new
>>>>>>>> Java process running this query and see if the problem goes away.
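>>>>>>>>
>>>>>>>> Something along these lines would do as a test (a minimal sketch: pass
>>>>>>>> your quorum and tenant id as arguments, and substitute your real query):
>>>>>>>>
>>>>>>>> import java.sql.Connection;
>>>>>>>> import java.sql.DriverManager;
>>>>>>>> import java.sql.ResultSet;
>>>>>>>> import java.util.Properties;
>>>>>>>>
>>>>>>>> // Run the join from a brand-new JVM, so nothing cached by the old
>>>>>>>> // client process (region locations included) can be reused.
>>>>>>>> public class FreshClientTest {
>>>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>>>         Properties props = new Properties();
>>>>>>>>         props.setProperty("TenantId", args[1]); // same tenant as before
>>>>>>>>         Connection conn =
>>>>>>>>             DriverManager.getConnection("jdbc:phoenix:" + args[0], props);
>>>>>>>>         ResultSet rs = conn.createStatement().executeQuery(
>>>>>>>>             "SELECT C.ROWKEY, C.VS FROM test.table1 AS C"
>>>>>>>>             + " JOIN (SELECT DISTINCT B.ROWKEY, B.VS FROM test.table2 AS B) B"
>>>>>>>>             + " ON (C.ROWKEY = B.ROWKEY AND C.VS = B.VS) LIMIT 10");
>>>>>>>>         while (rs.next()) System.out.println(rs.getString(1));
>>>>>>>>         conn.close();
>>>>>>>>     }
>>>>>>>> }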
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Maryann
>>>>>>>>
>>>>>>>> On Mon, Jul 6, 2015 at 10:56 PM, Maryann Xue <maryann....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Thanks a lot for the details, Alex! That might be a bug, if it
>>>>>>>>> failed only on the cluster and increasing the cache time-to-live
>>>>>>>>> didn't help. Would you mind testing it out for me if I provide a
>>>>>>>>> simple patch tomorrow?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Maryann
>>>>>>>>>
>>>>>>>>> On Mon, Jul 6, 2015 at 9:09 PM, Alex Kamil <alex.ka...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> one more thing - the same query (via a tenant connection) works in
>>>>>>>>>> standalone mode but fails on the cluster.
>>>>>>>>>> I've tried raising
>>>>>>>>>> phoenix.coprocessor.maxServerCacheTimeToLiveMs
>>>>>>>>>> <https://phoenix.apache.org/tuning.html> from the default
>>>>>>>>>> 30000 ms to 300000 ms, with no effect
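>>>>>>>>>>
>>>>>>>>>> (for reference, the change was roughly this in hbase-site.xml on
>>>>>>>>>> the region servers, followed by a restart)
>>>>>>>>>>
>>>>>>>>>> <!-- Keep the server-side hash-join cache alive for 5 minutes
>>>>>>>>>>      instead of the default 30 seconds. -->
>>>>>>>>>> <property>
>>>>>>>>>>   <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
>>>>>>>>>>   <value>300000</value>
>>>>>>>>>> </property>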
>>>>>>>>>>
>>>>>>>>>> On Mon, Jul 6, 2015 at 7:35 PM, Alex Kamil <alex.ka...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> also, please note that it only fails with tenant-specific connections
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jul 6, 2015 at 7:17 PM, Alex Kamil <alex.ka...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Maryann,
>>>>>>>>>>>>
>>>>>>>>>>>> here is the query; I don't see any warnings:
>>>>>>>>>>>> SELECT '\''||C.ROWKEY||'\'' AS RK, C.VS FROM  test.table1 AS C
>>>>>>>>>>>> JOIN (SELECT DISTINCT B.ROWKEY, B.VS FROM test.table2 AS B) B
>>>>>>>>>>>> ON (C.ROWKEY=B.ROWKEY AND C.VS=B.VS) LIMIT 2147483647;
>>>>>>>>>>>>
>>>>>>>>>>>> thanks
>>>>>>>>>>>> Alex
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 3, 2015 at 10:36 PM, Maryann Xue <maryann....@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Alex,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Most likely what happened is what the error message suggests:
>>>>>>>>>>>>> the cache might have expired. Could you please check whether
>>>>>>>>>>>>> there are any Phoenix warnings in the client log, and share your
>>>>>>>>>>>>> query?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Maryann
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 3, 2015 at 4:01 PM, Alex Kamil <alex.ka...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> getting this error with phoenix 3.3.0/hbase 0.94.15, any
>>>>>>>>>>>>>> ideas?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> org.apache.phoenix.exception.PhoenixIOException: org.apache.phoenix.exception.PhoenixIOException: org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash cache for joinId: ???Z^XI??. The cache might have expired and have been removed.
>>>>>>>>>>>>>>         at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:96)
>>>>>>>>>>>>>>         at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:511)
>>>>>>>>>>>>>>         at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
>>>>>>>>>>>>>>         at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84)
>>>>>>>>>>>>>>         at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111)
>>>>>>>>>>>>>>         at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
>>>>>>>>>>>>>>         at org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)
>>>>>>>>>>>>>>         at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
>>>>>>>>>>>>>>         at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)
>>>>>>>>>>>>>>         at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks
>>>>>>>>>>>>>> Alex
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Attachment: could_not_find_hash_cache.patch
Description: Binary data
