Hi Khurram,

Has 1.7 been released yet? If so, that is great. However, the FieldReader bugs described in my original message below still exist in the code (assuming my understanding of the functions is correct). I need these functions to iterate through arrays in other, more complex use cases.

On Thu, May 19, 2016 at 11:46 AM, Khurram Faraaz <[email protected]> wrote:

> repeated_count does work on nested objects, please take a look at the fix
> for DRILL-2851 <https://issues.apache.org/jira/browse/DRILL-2851>
>
> Here is a working example, I am on Drill 1.7.0 git commit ID:
> git.commit.id
> =09b262776
> I have used a file in my query below, that is also available on that JIRA
>
> 0: jdbc:drill:schema=dfs.tmp> select repeated_count(tmp.num_list) from `Jsn_Arry100.json` tmp;
> +---------+
> | EXPR$0  |
> +---------+
> | 99999   |
> +---------+
> 1 row selected (0.444 seconds)
>
> On Thu, May 19, 2016 at 7:09 PM, Sunil B <[email protected]> wrote:
>
> > Hi,
> >
> > I am experimenting with drill 1.6 to see if it fits our SQL on hadoop
> > needs.
> > As repeated_count doesn't work on nested objects (
> > https://issues.apache.org/jira/browse/DRILL-1650), I decided to implement
> > my own UDF to do that using a FieldReader. I was hoping that using
> > FieldReader would be a generic way to count the no. of elements in an
> > array. However, during that process I discovered couple of inconsistencies
> > with some of the FieldReaders.
> >
> > Here is my test UDF implementation. This is just created to illustrate the
> > issue:
> >
> > @FunctionTemplate(name="arrayCount",
> >     scope=FunctionTemplate.FunctionScope.SIMPLE)
> > public class ArrayCount implements DrillSimpleFunc {
> >
> >     @Param FieldReader prArray;
> >     @Output VarCharHolder out;
> >
> >     @Inject DrillBuf buffer;
> >
> >     /** Builds a string as an output. The string contains is in the
> >      * following format:
> >      * Size:<result of FieldReader.size() function>,Iterating
> >      * Count:<result from counting the no. of Iterations>,<Simple FieldReader
> >      * class name>
> >      **/
> >     public void eval() {
> >
> >         int count = 0;
> >         StringBuilder sb = new StringBuilder().append("Size:").append(prArray.size()).append(",");
> >         while(prArray.next()) count++;
> >         sb.append("Iterating Count:").append(count).append(",").append(prArray.getClass().getSimpleName());
> >
> >         byte[] d = sb.toString().getBytes();
> >         out.buffer = buffer;
> >         out.start = 0;
> >         out.end = d.length;
> >         buffer.setBytes(0, d);
> >     }
> >     public void setup() {}
> > }
> >
> >
> > Here's the output from a sample File:
> >
> > 0: jdbc:drill:zk=local> select t1.b, arrayCount(t1.b) from dfs.`/s/tmp/delete/btmpoc/data/a.json` t1;
> > +------------+----------------------------------------------------------+
> > | b          | EXPR$1                                                   |
> > +------------+----------------------------------------------------------+
> > | [1,2]      | Size:2,Iterating Count:2,RepeatedBigIntHolderReaderImpl  |
> > | [1,2,3]    | Size:3,Iterating Count:5,RepeatedBigIntHolderReaderImpl  |
> > | [1,2,3,4]  | Size:4,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> > | []         | Size:0,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> > | []         | Size:0,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> > +------------+----------------------------------------------------------+
> > 5 rows selected (0.179 seconds)
> >
> >
> > 0: jdbc:drill:zk=local> select t1.c, arrayCount(t1.c) from dfs.`/s/tmp/delete/btmpoc/data/a.json` t1;
> > +-----------------------------------------------+-------------------------------------------------+
> > | c                                             | EXPR$1                                          |
> > +-----------------------------------------------+-------------------------------------------------+
> > | [{"ca":"11cav"}]                              | Size:1,Iterating Count:1,RepeatedMapReaderImpl  |
> > | [{"ca":"21cav","cb":"21cbv"},{"ca":"22cav"}]  | Size:3,Iterating Count:2,RepeatedMapReaderImpl  |
> > | [{"ca":"31cav"},{"ca":"32cav"}]               | Size:3,Iterating Count:2,RepeatedMapReaderImpl  |
> > | [{"ca":"3"}]                                  | Size:2,Iterating Count:1,RepeatedMapReaderImpl  |
> > | []                                            | Size:0,Iterating Count:0,RepeatedMapReaderImpl  |
> > +-----------------------------------------------+-------------------------------------------------+
> > 5 rows selected (0.115 seconds)
> >
> > ================================================
> > RepeatedBigIntHolderReaderImpl is generated from HolderReaderImpl.java. I
> > think the following line in HolderReaderImpl.java has the issue:
> >
> > https://github.com/apache/drill/blob/245da9790813569c5da9404e0fc5e45cc88e22bb/exec/vector/src/main/codegen/templates/HolderReaderImpl.java#L80
> > Maybe we should change it to: if(repeatedHolder.start + index + 1 < repeatedHolder.end)
> >
> > Not sure if the size function of RepeatedMapReaderImpl is implemented
> > correctly.
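To make the suspected next() problem concrete, here is a small standalone sketch. It is not Drill code: the class and method names are hypothetical, and it assumes that (a) the rows of column b are stored back to back in one vector, so each row's start and end are absolute offsets, and (b) the reader's index starts at -1 for every row. Under those assumptions, comparing the relative index against the absolute end reproduces the counts in the first table, while the proposed condition yields the array sizes.

```java
// Simplified model of the iteration check discussed above. NOT Drill's actual
// reader; OffsetIterationSketch and both method names are hypothetical.
class OffsetIterationSketch {

    /** Counts successful next() calls when the relative index is compared to the absolute end. */
    static int countWithRelativeCheck(int start, int end) {
        int index = -1;               // assumption: reader starts before the first element
        int count = 0;
        while (index + 1 < end) {     // suspected current condition; start is ignored
            index++;
            count++;
        }
        return count;
    }

    /** Counts successful next() calls with the condition proposed in the email. */
    static int countWithOffsetCheck(int start, int end) {
        int index = -1;               // same assumption as above
        int count = 0;
        while (start + index + 1 < end) {
            index++;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] sizes = {2, 3, 4, 0, 0};   // array sizes of column b in the sample file
        int start = 0;                    // assumption: rows are laid out consecutively
        for (int size : sizes) {
            int end = start + size;
            System.out.println("size=" + size
                    + " relative=" + countWithRelativeCheck(start, end)
                    + " offset=" + countWithOffsetCheck(start, end));
            start = end;
        }
        // Prints relative counts 2, 5, 9, 9, 9 and offset counts 2, 3, 4, 0, 0.
    }
}
```

The relative counts match the "Iterating Count" column for RepeatedBigIntHolderReaderImpl. That is consistent with the proposed change, though it does not confirm how the generated reader actually behaves. The sketch says nothing about the size() mismatch seen with RepeatedMapReaderImpl.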
