Hi Khurram, has 1.7 been released yet? If so, that is great.

However, the bugs in the code still exist (assuming my understanding of
the functions is correct). I need to use these functions for other complex
use cases that iterate through the array.
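
To make the behaviour I expect concrete, here is a minimal standalone sketch in plain Java. It is not Drill code: ToyRepeatedReader, its start/end offsets, and countByIterating are hypothetical names I made up for illustration, and the bounds check simply mirrors the condition suggested in my original mail below. The point is only that, for every row, counting the next() calls should give the same number as size().

```java
public class ReaderContractSketch {

    /** Hypothetical stand-in for a repeated-value reader; NOT a Drill class. */
    static final class ToyRepeatedReader {
        private final long[] values;
        private final int start; // inclusive offset of this row's first element
        private final int end;   // exclusive offset just past this row's last element
        private int index = -1;  // position relative to start; -1 means "before the first element"

        ToyRepeatedReader(long[] values, int start, int end) {
            this.values = values;
            this.start = start;
            this.end = end;
        }

        /** Number of elements in this row. */
        int size() {
            return end - start;
        }

        /** Advances to the next element of this row, if there is one. */
        boolean next() {
            if (start + index + 1 < end) {
                index++;
                return true;
            }
            return false;
        }

        /** Value at the current position; only valid after next() returned true. */
        long current() {
            return values[start + index];
        }
    }

    /** Counts elements by iterating, the same way the test UDF does. */
    static int countByIterating(ToyRepeatedReader reader) {
        int count = 0;
        while (reader.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Rows [1,2], [1,2,3], [1,2,3,4], [], [] stored back to back with row offsets.
        long[] flat = {1, 2, 1, 2, 3, 1, 2, 3, 4};
        int[] offsets = {0, 2, 5, 9, 9, 9};
        for (int row = 0; row + 1 < offsets.length; row++) {
            ToyRepeatedReader reader =
                    new ToyRepeatedReader(flat, offsets[row], offsets[row + 1]);
            System.out.println("Size:" + reader.size()
                    + ",Iterating Count:" + countByIterating(reader));
        }
    }
}
```

With a fresh reader per row, both numbers agree on every row, including the empty ones. That is the result I was hoping to get from the FieldReaders.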



On Thu, May 19, 2016 at 11:46 AM, Khurram Faraaz <[email protected]>
wrote:

> repeated_count does work on nested objects, please take a look at the fix
> for DRILL-2851 <https://issues.apache.org/jira/browse/DRILL-2851>
>
> Here is a working example, I am on Drill 1.7.0 git commit ID:
> git.commit.id
> =09b262776
> I have used a file in my query below, that is also available on that JIRA
>
> 0: jdbc:drill:schema=dfs.tmp> select repeated_count(tmp.num_list) from
> `Jsn_Arry100.json` tmp;
> +---------+
> | EXPR$0  |
> +---------+
> | 99999   |
> +---------+
> 1 row selected (0.444 seconds)
>
> On Thu, May 19, 2016 at 7:09 PM, Sunil B <[email protected]> wrote:
>
> > Hi,
> >
> > I am experimenting with drill 1.6 to see if it fits our SQL on hadoop
> > needs.
> > As repeated_count doesn't work on nested objects (
> > https://issues.apache.org/jira/browse/DRILL-1650), I decided to implement
> > my own UDF to do that using a FieldReader. I was hoping that using
> > FieldReader would be a generic way to count the number of elements in an
> > array. However, during that process I discovered a couple of
> > inconsistencies with some of the FieldReaders.
> >
> > Here is my test UDF implementation. This is just created to illustrate
> > the issue:
> >
> > @FunctionTemplate(name="arrayCount",
> > scope=FunctionTemplate.FunctionScope.SIMPLE)
> > public class ArrayCount implements DrillSimpleFunc {
> >
> >     @Param FieldReader prArray;
> >     @Output VarCharHolder out;
> >
> >     @Inject DrillBuf buffer;
> >
> >     /**
> >      * Builds the output string in the following format:
> >      *   Size:<result of FieldReader.size()>,Iterating Count:<number of
> >      *   next() iterations>,<simple class name of the FieldReader>
> >      */
> >     public void eval() {
> >
> >         int count = 0;
> >         StringBuilder sb = new StringBuilder()
> >                 .append("Size:").append(prArray.size()).append(",");
> >         while(prArray.next()) count++;
> >         sb.append("Iterating Count:").append(count).append(",")
> >                 .append(prArray.getClass().getSimpleName());
> >
> >         byte[] d = sb.toString().getBytes();
> >         out.buffer = buffer;
> >         out.start = 0;
> >         out.end = d.length;
> >         buffer.setBytes(0, d);
> >     }
> >     public void setup() {}
> > }
> >
> >
> > Here's the output from a sample File:
> >
> > 0: jdbc:drill:zk=local> select t1.b, arrayCount(t1.b)  from
> > dfs.`/s/tmp/delete/btmpoc/data/a.json` t1;
> > +------------+----------------------------------------------------------+
> > |     b      |                          EXPR$1                          |
> > +------------+----------------------------------------------------------+
> > | [1,2]      | Size:2,Iterating Count:2,RepeatedBigIntHolderReaderImpl  |
> > | [1,2,3]    | Size:3,Iterating Count:5,RepeatedBigIntHolderReaderImpl  |
> > | [1,2,3,4]  | Size:4,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> > | []         | Size:0,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> > | []         | Size:0,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> > +------------+----------------------------------------------------------+
> > 5 rows selected (0.179 seconds)
> >
> >
> > 0: jdbc:drill:zk=local> select t1.c, arrayCount(t1.c)  from
> > dfs.`/s/tmp/delete/btmpoc/data/a.json` t1;
> > +-----------------------------------------------+-------------------------------------------------+
> > |                       c                       |                     EXPR$1                      |
> > +-----------------------------------------------+-------------------------------------------------+
> > | [{"ca":"11cav"}]                              | Size:1,Iterating Count:1,RepeatedMapReaderImpl  |
> > | [{"ca":"21cav","cb":"21cbv"},{"ca":"22cav"}]  | Size:3,Iterating Count:2,RepeatedMapReaderImpl  |
> > | [{"ca":"31cav"},{"ca":"32cav"}]               | Size:3,Iterating Count:2,RepeatedMapReaderImpl  |
> > | [{"ca":"3"}]                                  | Size:2,Iterating Count:1,RepeatedMapReaderImpl  |
> > | []                                            | Size:0,Iterating Count:0,RepeatedMapReaderImpl  |
> > +-----------------------------------------------+-------------------------------------------------+
> > 5 rows selected (0.115 seconds)
> >
> > ================================================
> > RepeatedBigIntHolderReaderImpl is generated from HolderReaderImpl.java. I
> > think the following line in HolderReaderImpl.java has the issue:
> >
> >
> > https://github.com/apache/drill/blob/245da9790813569c5da9404e0fc5e45cc88e22bb/exec/vector/src/main/codegen/templates/HolderReaderImpl.java#L80
> >
> > Maybe we should change it to:
> >     if(repeatedHolder.start + index + 1 < repeatedHolder.end)
> >
> > I am not sure whether the size() function of RepeatedMapReaderImpl is
> > implemented correctly.
> >
>
