It might not be a real cache expiration (which would not be considered a bug), since increasing the cache time-to-live didn't solve the problem. So the problem might be that the cache was never sent to that server at all, which would be a bug, and most likely it would be because the client didn't do it right.
So starting a new client after the problem happens should be a good test of the above theory.

Anyway, what's the approximate time of running a count(*) on your test.table2?

Thanks,
Maryann

On Tue, Jul 7, 2015 at 1:53 PM, Alex Kamil <[email protected]> wrote:

> Maryann,
>
> is this patch only for the client? as we saw the error in regionserver
> logs and it seems that server side cache has expired
>
> also by "start a new process doing the same query" do you mean start two
> client instances and run the query from one then from the other client?
>
> thanks
> Alex
>
> On Tue, Jul 7, 2015 at 1:20 PM, Maryann Xue <[email protected]> wrote:
>
>> My question was actually if the problem appears on your cluster, will it
>> go away if you just start a new process doing the same query? I do have a
>> patch, but it only fixes the problem I assume here, and it might be
>> something else.
>>
>>
>> Thanks,
>> Maryann
>>
>> On Tue, Jul 7, 2015 at 12:59 PM, Alex Kamil <[email protected]> wrote:
>>
>>> a patch would be great, we saw that this problem goes away in standalone
>>> mode but reappears on the cluster
>>>
>>> On Tue, Jul 7, 2015 at 12:56 PM, Alex Kamil <[email protected]> wrote:
>>>
>>>> sure, sounds good
>>>>
>>>> On Tue, Jul 7, 2015 at 10:57 AM, Maryann Xue <[email protected]> wrote:
>>>>
>>>>> Hi Alex,
>>>>>
>>>>> I suspect it's related to using cached region locations that might
>>>>> have been invalid. A simple way to verify this is try starting a new java
>>>>> process doing this query and see if the problem goes away.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Maryann
>>>>>
>>>>> On Mon, Jul 6, 2015 at 10:56 PM, Maryann Xue <[email protected]> wrote:
>>>>>
>>>>>> Thanks a lot for the details, Alex! That might be a bug if it failed
>>>>>> only on cluster and increasing cache alive time didn't help. Would you
>>>>>> mind testing it out for me if I provide a simple patch tomorrow?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Maryann
>>>>>>
>>>>>> On Mon, Jul 6, 2015 at 9:09 PM, Alex Kamil <[email protected]> wrote:
>>>>>>
>>>>>>> one more thing - the same query (via tenant connection) works in
>>>>>>> standalone mode but fails on a cluster.
>>>>>>> I've tried modifying phoenix.coprocessor.maxServerCacheTimeToLiveMs
>>>>>>> <https://phoenix.apache.org/tuning.html> from the default 30000(ms)
>>>>>>> to 300000 with no effect
>>>>>>>
>>>>>>> On Mon, Jul 6, 2015 at 7:35 PM, Alex Kamil <[email protected]> wrote:
>>>>>>>
>>>>>>>> also pls note that it only fails with tenant-specific connections
>>>>>>>>
>>>>>>>> On Mon, Jul 6, 2015 at 7:17 PM, Alex Kamil <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Maryann,
>>>>>>>>>
>>>>>>>>> here is the query, I don't see warnings
>>>>>>>>> SELECT '\''||C.ROWKEY||'\'' AS RK, C.VS FROM test.table1 AS C
>>>>>>>>> JOIN (SELECT DISTINCT B.ROWKEY, B.VS FROM test.table2 AS B) B ON
>>>>>>>>> (C.ROWKEY=B.ROWKEY AND C.VS=B.VS) LIMIT 2147483647;
>>>>>>>>>
>>>>>>>>> thanks
>>>>>>>>> Alex
>>>>>>>>>
>>>>>>>>> On Fri, Jul 3, 2015 at 10:36 PM, Maryann Xue <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Alex,
>>>>>>>>>>
>>>>>>>>>> Most likely what happened was as suggested by the error message:
>>>>>>>>>> the cache might have expired. Could you please check if there are any
>>>>>>>>>> Phoenix warnings in the client log and share your query?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Maryann
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 3, 2015 at 4:01 PM, Alex Kamil <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> getting this error with phoenix 3.3.0/hbase 0.94.15, any ideas?
>>>>>>>>>>>
>>>>>>>>>>> org.apache.phoenix.exception.PhoenixIOException:
>>>>>>>>>>> org.apache.phoenix.exception.PhoenixIOException:
>>>>>>>>>>> org.apache.hadoop.hbase.DoNotRetryIOException: Could not find hash
>>>>>>>>>>> cache for joinId: ???Z
>>>>>>>>>>> ^XI??.
The cache might have expired and have been removed.
>>>>>>>>>>>
>>>>>>>>>>> at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:96)
>>>>>>>>>>> at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:511)
>>>>>>>>>>> at org.apache.phoenix.iterate.MergeSortResultIterator.getIterators(MergeSortResultIterator.java:48)
>>>>>>>>>>> at org.apache.phoenix.iterate.MergeSortResultIterator.minIterator(MergeSortResultIterator.java:84)
>>>>>>>>>>> at org.apache.phoenix.iterate.MergeSortResultIterator.next(MergeSortResultIterator.java:111)
>>>>>>>>>>> at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
>>>>>>>>>>> at org.apache.phoenix.iterate.LimitingResultIterator.next(LimitingResultIterator.java:47)
>>>>>>>>>>> at org.apache.phoenix.iterate.DelegateResultIterator.next(DelegateResultIterator.java:44)
>>>>>>>>>>> at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:739)
>>>>>>>>>>> at org.apache.commons.dbcp.DelegatingResultSet.next(DelegatingResultSet.java:207)
>>>>>>>>>>>
>>>>>>>>>>> thanks
>>>>>>>>>>> Alex
