Hi thanks for the quick response

I'm going to appear thick now, brace yourself.   By client do you mean the 
queryserver running in EMR which does have an hbase-site.xml or my Java API as 
a property, we don't have an hbase-site.xml (unless it's embedded in the thin 
client jar.

Cheers

S
________________________________
From: Kadir Ozdemir <kozde...@salesforce.com>
Sent: 28 September 2021 7:45 PM
To: user <user@phoenix.apache.org>
Subject: Re: Urgent problem with Phoenix function LAST_VALUE and joins

In Phoenix 5.0 we introduced a new server side paging feature to eliminate 
timeouts due long running server side computations such aggregation and joins. 
I wonder if this new feature caused your issue. If so, the work around is to 
disable it by setting phoenix.server.paging.enabled to false in hbase-site.xml. 
This is a client side config param so you just need to restart your client. 
Hope this will fix your issue.

On Mon, Sep 27, 2021 at 9:23 PM Simon Mottram 
<simon.mott...@cucumber.co.nz<mailto:simon.mott...@cucumber.co.nz>> wrote:
Hi

Got my fingers crossed that there's a work around for this as this really is a 
big problem for us

We are using:

Amazon EMR

Release label:emr-6.1.0
Hadoop distribution:Amazon
Applications:Hbase 2.2.5, Hive 3.1.2, Phoenix 5.0.0, Pig 0.17.0

Thin Client version:
phoenix-5.0.0-HBase-2.0-thin-client.jar

We get the following error when doing an LAST_VALUE aggregation where

  1.  A JOIN is empty
  2.  The column is INTEGER or DATETIME

Remote driver error: IllegalArgumentException: offset (25) + length (4) exceed 
the capacity of the array: 25

The query that breaks is:

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
JOIN VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
ON DOCID = OBSERVATION_VALUE_ID
  AND TAGNAME = 'TRIAL_ID'
AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe'
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
 OFFSET 0;

I can refactor this using EXIST but get same error, presumably the driver knows 
to treat them the same:

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
EXISTS (
SELECT
DOCID
FROM
VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
WHERE
DOCID = OBSERVATION_VALUE_ID
AND TAGNAME = 'TRIAL_ID'
AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
 OFFSET 0;

If we remove the external reference we get no error, regardless of whether 
there are any hits or not

-- these all work
There are no hits for this query

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
BIOMATERIAL_TYPE = 'aardvark'
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
 OFFSET 0;

Lots of hits for this query:

SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10  OFFSET 0;

I've tried weird things like:

Comparing exists to TRUE to try and force it into a normal BOOLEAN value, same 
IllegalArgumentException.

 SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
(EXISTS (
SELECT
DOCID
FROM
VARIABLE_VALUES_QA.OBSERVATION_TAG_INDEX
WHERE
DOCID = OBSERVATION_VALUE_ID
AND TAGNAME = 'TRIAL_ID'
AND TAGVALUE = 'f62dd8e0-d2ea-4d9a-9ab6-2049601bb9fe')) = TRUE
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
 OFFSET 0;

And are you prepared for this one, which throws exact same error:

 SELECT
"BIOMATERIAL_NAME",
FIRST_VALUE("PLANT_FRUIT_COUNT") WITHIN GROUP (
ORDER BY OBSERVATION_DATE DESC) AS "Plant Fruit Count"
FROM
VARIABLE_VALUES_QA.OBSERVATION
WHERE
TRUE
GROUP BY
"BIOMATERIAL_NAME"
LIMIT 10
 OFFSET 0;

Change FIRST_VALUE to AVG and it works fine.

Reply via email to