Update: Just in case anyone hits this issue in future with the AWS managed HBase, the fix is to use a very specific version of the driver
For thick client: <dependency> <groupId>org.apache.phoenix</groupId> <artifactId>phoenix-core</artifactId> <version>4.14.1-HBase-1.4</version> </dependency> For thin client: <dependency> <groupId>org.apache.phoenix</groupId> <artifactId>phoenix-queryserver-client</artifactId> <version>4.14.1-HBase-1.4</version> </dependency> This required support help from AWS as this driver version is not mentioned on the official Apache Phoenix download page Regards Simon On Sun, 2019-11-17 at 21:03 +0000, Simon Mottram wrote: > Phoenix Version: 4.14.2-HBase-1.4 > HBase Version: AWS EMR > Release > label:emr-5.24.1 > Hadoop distribution:Amazon > Applications:Phoenix 4.14.1, > Hue 4.4.0, HBase 1.4.9 > > > Having an issue where ANY(ARRAY[]) stops the query returning any > results when used in a 'AND' conjunction > > e.g (fielda = 'a') AND (fieldb = any(array['a','b','c])) > > Always returns zero results, if i change to disjunction (OR) it works > fine but obviously isn't what's wanted here. Excuse the long post > but > I wanted to be as clear as possible. > > It's quite possible I have misunderstood the way ANY() works but... > > Here's a simple query that returns a correct number of results: > > SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME > FROM DEV_OAPI.OBSERVATION > WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME = > 'TRIAL00015') > ORDER BY OBSERVATIONDATE DESC > LIMIT 10 > OFFSET 0 > > So there's definitely records where biomaterial name and trialname > have > these values > > If I change it to > > SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME > FROM DEV_OAPI.OBSERVATION > WHERE (TRIALNAME = ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE'])) > ORDER BY OBSERVATIONDATE DESC > LIMIT 10 > OFFSET 0 > > I get valid results > > So the ANY(ARRAY[]) function works > > Here's the explain which looks very odd to more, but it works > > PLAN > > > > CLIENT 1-CHUNK 50574 ROWS 314572800 BYTES PARALLEL 1-WAY FULL SCAN > OVER > DEV_OAPI.OBSERVATION > > > > SERVER FILTER BY > org.apache.phoenix.expression.function.ArrayAnyComparisonExpression > [children=[ARRAY['TRIAL00015'], TRIALNAME = > org.apache.phoenix.expression.function.ArrayElemRefExpression > [children=[ARRAY['TRIAL00015'], 1]]]] > SERVER TOP 10 ROWS SORTED BY [OBSERVATIONDATE > DESC] > C > CLIENT MERGE > SORT > > CLIENT LIMIT 10 > > So far so good, BUT. > > However if I combine the ARRAY expression with any expression using > AND > I get zero results, even tho as above both sides of the conjunction > return true. > > e.g. > > SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME > FROM DEV_OAPI.OBSERVATION > WHERE (BIOMATERIALNAME = 'ROOT00386') AND (TRIALNAME = > ANY(ARRAY['TRIAL00015', 'SOMETHING ELSE'])) > ORDER BY OBSERVATIONDATE DESC > LIMIT 10 > OFFSET 0 > > Explain (newlines added): > SERVER FILTER BY > (BIOMATERIALNAME = 'SCION00424' > AND > org.apache.phoenix.expression.function.ArrayAnyComparisonExpression > [children=[ARRAY['TRIAL00015','SOMETHING ELSE'], > TRIALNAME = > org.apache.phoenix.expression.function.ArrayElemRefExpression > [children=[ARRAY['TRIAL00015','SOMETHING ELSE'], 1]]]]) > > Just out of interest I tried with strings only in the array check > > SELECT BIOMATERIALNAME, OBSERVATIONDATE, TRIALNAME > FROM DEV_OAPI.OBSERVATION > WHERE (BIOMATERIALNAME = 'ROOT00386') AND ('TRIAL00015' = > ANY(ARRAY['TRIAL00015'])) > ORDER BY OBSERVATIONDATE DESC > LIMIT 10 > OFFSET 0 > > This works fine (in a kind of unhelpful way) > > I have tested using the thick client: > <!-- HBASE THICK CLIENT DEPS --> > <dependency> > <groupId>org.apache.phoenix</groupId> > <artifactId>phoenix-core</artifactId> > <version>4.14.2-HBase-1.4</version> > </dependency> > > and the thin client > > <dependency> > <groupId>org.apache.phoenix</groupId> > <artifactId>phoenix-queryserver-client</artifactId> > <version>4.14.2-HBase-1.4</version> > </dependency> > > I've tried using braces and re-ordering but any query of the form: > <anyclause> AND field = ANY(ARRAY['value1'...]) > Returns zero results regardless of values > > We can't change version as we are using the Amazon AWS EMR managed > stack and no other phoenix libraries work. > > Thanks for taking the time to read this far! > > Cheers > > Simon