repeated_count does work on nested objects; please take a look at the fix for DRILL-2851 <https://issues.apache.org/jira/browse/DRILL-2851>
Here is a working example. I am on Drill 1.7.0 (git.commit.id=09b262776).
The file used in the query below is also available on that JIRA.

0: jdbc:drill:schema=dfs.tmp> select repeated_count(tmp.num_list) from `Jsn_Arry100.json` tmp;
+---------+
| EXPR$0  |
+---------+
| 99999   |
+---------+
1 row selected (0.444 seconds)

On Thu, May 19, 2016 at 7:09 PM, Sunil B wrote:

> Hi,
>
> I am experimenting with drill 1.6 to see if it fits our SQL on hadoop
> needs.
> As repeated_count doesn't work on nested objects (
> https://issues.apache.org/jira/browse/DRILL-1650), I decided to implement
> my own UDF to do that using a FieldReader. I was hoping that using
> FieldReader would be a generic way to count the no. of elements in an
> array. However, during that process I discovered a couple of inconsistencies
> with some of the FieldReaders.
>
> Here is my test UDF implementation. This is just created to illustrate the
> issue:
>
> @FunctionTemplate(name="arrayCount",
>     scope=FunctionTemplate.FunctionScope.SIMPLE)
> public class ArrayCount implements DrillSimpleFunc {
>
>   @Param FieldReader prArray;
>   @Output VarCharHolder out;
>
>   @Inject DrillBuf buffer;
>
>   /** Builds a string as an output. The string is in the following format:
>    *  Size:<result of FieldReader.size() function>,Iterating Count:<result from counting the no. of Iterations>,<Simple FieldReader class name>
>    **/
>   public void eval() {
>
>     int count = 0;
>     StringBuilder sb = new StringBuilder().append("Size:").append(prArray.size()).append(",");
>     while(prArray.next()) count++;
>     sb.append("Iterating Count:").append(count).append(",").append(prArray.getClass().getSimpleName());
>
>     byte[] d = sb.toString().getBytes();
>     out.buffer = buffer;
>     out.start = 0;
>     out.end = d.length;
>     buffer.setBytes(0, d);
>   }
>   public void setup() {}
> }
>
>
> Here's the output from a sample file:
>
> 0: jdbc:drill:zk=local> select t1.b, arrayCount(t1.b) from dfs.`/s/tmp/delete/btmpoc/data/a.json` t1;
> +------------+----------------------------------------------------------+
> |     b      |                          EXPR$1                          |
> +------------+----------------------------------------------------------+
> | [1,2]      | Size:2,Iterating Count:2,RepeatedBigIntHolderReaderImpl  |
> | [1,2,3]    | Size:3,Iterating Count:5,RepeatedBigIntHolderReaderImpl  |
> | [1,2,3,4]  | Size:4,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> | []         | Size:0,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> | []         | Size:0,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> +------------+----------------------------------------------------------+
> 5 rows selected (0.179 seconds)
>
>
> 0: jdbc:drill:zk=local> select t1.c, arrayCount(t1.c) from dfs.`/s/tmp/delete/btmpoc/data/a.json` t1;
> +-----------------------------------------------+-------------------------------------------------+
> |                       c                       |                     EXPR$1                      |
> +-----------------------------------------------+-------------------------------------------------+
> | [{"ca":"11cav"}]                              | Size:1,Iterating Count:1,RepeatedMapReaderImpl  |
> | [{"ca":"21cav","cb":"21cbv"},{"ca":"22cav"}]  | Size:3,Iterating Count:2,RepeatedMapReaderImpl  |
> | [{"ca":"31cav"},{"ca":"32cav"}]               | Size:3,Iterating Count:2,RepeatedMapReaderImpl  |
> | [{"ca":"3"}]                                  | Size:2,Iterating Count:1,RepeatedMapReaderImpl  |
> | []                                            | Size:0,Iterating Count:0,RepeatedMapReaderImpl  |
> +-----------------------------------------------+-------------------------------------------------+
> 5 rows selected (0.115 seconds)
>
> ================================================
> RepeatedBigIntHolderReaderImpl is generated from HolderReaderImpl.java. I
> think the following line in HolderReaderImpl.java has the issue:
>
> https://github.com/apache/drill/blob/245da9790813569c5da9404e0fc5e45cc88e22bb/exec/vector/src/main/codegen/templates/HolderReaderImpl.java#L80
> Maybe we should change it to: if(repeatedHolder.start + index + 1 < repeatedHolder.end)
>
> Not sure if the size function of RepeatedMapReaderImpl is implemented
> correctly.
>
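
A note on the RepeatedBigIntHolderReaderImpl numbers in the quoted output: the iterating counts (2, 5, 9, 9, 9) equal the running end offsets of the rows if all values sit in one shared data vector. That is what one would see if the bounds check in next() compared index + 1 against repeatedHolder.end without adding repeatedHolder.start. The standalone Java sketch below models only that arithmetic. It is not Drill code: the Slice class, the row offsets, and the guess about what the current check looks like are hypothetical, inferred from the quoted output and the change proposed in the quoted mail.

```java
import java.util.Arrays;

public class RepeatedCursorDemo {

    /** Hypothetical stand-in for a repeated holder: one row's [start, end) slice of a shared data vector. */
    static final class Slice {
        final int start;
        final int end;

        Slice(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Counts how many times a next()-style check succeeds for one row.
     * addStart = false models the assumed current check: index + 1 < end.
     * addStart = true models the proposed check: start + index + 1 < end.
     */
    public static int iterate(Slice s, boolean addStart) {
        int index = -1; // cursor is relative to the row, positioned before the first element
        int count = 0;
        while (true) {
            int candidate = addStart ? s.start + index + 1 : index + 1;
            if (candidate >= s.end) {
                break;
            }
            index++;
            count++;
        }
        return count;
    }

    /** Runs the check over offsets matching the rows [1,2], [1,2,3], [1,2,3,4], [], []. */
    public static int[] countAll(boolean addStart) {
        Slice[] rows = {
            new Slice(0, 2), new Slice(2, 5), new Slice(5, 9), new Slice(9, 9), new Slice(9, 9)
        };
        int[] counts = new int[rows.length];
        for (int i = 0; i < rows.length; i++) {
            counts[i] = iterate(rows[i], addStart);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println("start ignored: " + Arrays.toString(countAll(false))); // [2, 5, 9, 9, 9]
        System.out.println("start added:   " + Arrays.toString(countAll(true)));  // [2, 3, 4, 0, 0]
    }
}
```

With the start offset added, the per-row counts match the Size values in the first table (2, 3, 4, 0, 0), which is consistent with the change suggested in the quoted mail. The sketch says nothing about the RepeatedMapReaderImpl size() question.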
