Phoenix Version: 4.14.2-HBase-1.4
HBase Version: AWS EMR
Release
label:emr-5.24.1
Hadoop distribution:Amazon
Applications:Phoenix 4.14.1,
Hue 4.4.0, HBase 1.4.9
Having an issue where ANY(ARRAY[]) stops the query returning any
results when used in a 'AND' conjunction
e.g (fielda = 'a') AND (fieldb = any(array['a','b','c]))
Always returns zero results, if i change to disjunction (OR) it works
fine but obviously isn't what's wanted here. Excuse the long post
but
I wanted to be as clear as possible.
It's quite possible I have misunderstood the way ANY() works but...
Here's a simple query that returns a correct number of results:
SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
FROM DEV_OAPI.OBSERVATION
WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
'TRIAL00015')
ORDER BY OBSERVATIONDATE DESC
LIMIT 10
OFFSET 0
So there's definitely records where biomaterial name and trialname
have
these values
If I change it to
SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
FROM DEV_OAPI.OBSERVATION
WHERE (TRIALNAME = ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))
ORDER BY OBSERVATIONDATE DESC
LIMIT 10
OFFSET 0
I get valid results
So the ANY(ARRAY[]) function works
Here's the explain which looks very odd to more, but it works
PLAN
CLIENT 1-CHUNK 50574 ROWS 314572800 BYTES PARALLEL 1-WAY FULL SCAN
OVER
DEV_OAPI.OBSERVATION
SERVER FILTER BY
org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
[children=[ARRAY['TRIAL00015'], TRIALNAME =
org.apache.phoenix.expression.function.ArrayElemRefExpression
[children=[ARRAY['TRIAL00015'], 1]]]]
SERVER TOP 10 ROWS SORTED BY [OBSERVATIONDATE
DESC]
C
CLIENT MERGE
SORT
CLIENT LIMIT 10
So far so good, BUT.
However if I combine the ARRAY expression with any expression using
AND
I get zero results, even tho as above both sides of the conjunction
return true.
e.g.
SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
FROM DEV_OAPI.OBSERVATION
WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME =
ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE']))
ORDER BY OBSERVATIONDATE DESC
LIMIT 10
OFFSET 0
Explain (newlines added):
SERVER FILTER BY
(BIOMATERIALNAME = 'SCION00424'
AND
org.apache.phoenix.expression.function.ArrayAnyComparisonExpression
[children=[ARRAY['TRIAL00015','SOMETHING ELSE'],
TRIALNAME =
org.apache.phoenix.expression.function.ArrayElemRefExpression
[children=[ARRAY['TRIAL00015','SOMETHING ELSE'], 1]]]])
Just out of interest I tried with strings only in the array check
SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME
FROM DEV_OAPI.OBSERVATION
WHERE (BIOMATERIALNAME = 'ROOT00386') AND ('TRIAL00015' =
ANY(ARRAY['TRIAL00015']))
ORDER BY OBSERVATIONDATE DESC
LIMIT 10
OFFSET 0
This works fine (in a kind of unhelpful way)
I have tested using the thick client:
<!-- HBASE THICK CLIENT DEPS -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-core</artifactId>
<version>4.14.2-HBase-1.4</version>
</dependency>
and the thin client
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-queryserver-client</artifactId>
<version>4.14.2-HBase-1.4</version>
</dependency>
I've tried using braces and re-ordering but any query of the form:
<anyclause> AND field = ANY(ARRAY['value1'...])
Returns zero results regardless of values
We can't change version as we are using the Amazon AWS EMR managed
stack and no other phoenix libraries work.
Thanks for taking the time to read this far!
Cheers
Simon