repeated_count does work on nested objects. Please take a look at the fix
for DRILL-2851 <https://issues.apache.org/jira/browse/DRILL-2851>.

Here is a working example. I am on Drill 1.7.0, git commit ID:
git.commit.id=09b262776
The file used in the query below is also available on that JIRA.

0: jdbc:drill:schema=dfs.tmp> select repeated_count(tmp.num_list) from `Jsn_Arry100.json` tmp;
+---------+
| EXPR$0  |
+---------+
| 99999   |
+---------+
1 row selected (0.444 seconds)

On Thu, May 19, 2016 at 7:09 PM, Sunil B <[email protected]> wrote:

> Hi,
>
> I am experimenting with drill 1.6 to see if it fits our SQL on hadoop
> needs.
> As repeated_count doesn't work on nested objects (
> https://issues.apache.org/jira/browse/DRILL-1650), I decided to implement
> my own UDF to do that using a FieldReader. I was hoping that using
> FieldReader would be a generic way to count the no. of elements in an
> array. However, during that process I discovered a couple of
> inconsistencies with some of the FieldReaders.
>
> Here is my test UDF implementation. This is just created to illustrate the
> issue:
>
> @FunctionTemplate(name="arrayCount",
> scope=FunctionTemplate.FunctionScope.SIMPLE)
> public class ArrayCount implements DrillSimpleFunc {
>
>     @Param FieldReader prArray;
>     @Output VarCharHolder out;
>
>     @Inject DrillBuf buffer;
>
>     /**
>      * Builds a string as output, in the following format:
>      * Size:<result of FieldReader.size()>,Iterating Count:<number of
>      * iterations counted>,<simple class name of the FieldReader>
>      */
>     public void eval() {
>
>         int count = 0;
>         StringBuilder sb = new StringBuilder()
>             .append("Size:").append(prArray.size()).append(",");
>         while (prArray.next()) count++;
>         sb.append("Iterating Count:").append(count).append(",")
>             .append(prArray.getClass().getSimpleName());
>
>         byte[] d = sb.toString().getBytes();
>         out.buffer = buffer;
>         out.start = 0;
>         out.end = d.length;
>         buffer.setBytes(0, d);
>     }
>     public void setup() {}
> }
>
>
> Here's the output from a sample File:
>
> 0: jdbc:drill:zk=local> select t1.b, arrayCount(t1.b)  from dfs.`/s/tmp/delete/btmpoc/data/a.json` t1;
> +------------+----------------------------------------------------------+
> |     b      |                          EXPR$1                          |
> +------------+----------------------------------------------------------+
> | [1,2]      | Size:2,Iterating Count:2,RepeatedBigIntHolderReaderImpl  |
> | [1,2,3]    | Size:3,Iterating Count:5,RepeatedBigIntHolderReaderImpl  |
> | [1,2,3,4]  | Size:4,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> | []         | Size:0,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> | []         | Size:0,Iterating Count:9,RepeatedBigIntHolderReaderImpl  |
> +------------+----------------------------------------------------------+
> 5 rows selected (0.179 seconds)
>
>
> 0: jdbc:drill:zk=local> select t1.c, arrayCount(t1.c)  from dfs.`/s/tmp/delete/btmpoc/data/a.json` t1;
> +-----------------------------------------------+-------------------------------------------------+
> |                       c                       |                     EXPR$1                      |
> +-----------------------------------------------+-------------------------------------------------+
> | [{"ca":"11cav"}]                              | Size:1,Iterating Count:1,RepeatedMapReaderImpl  |
> | [{"ca":"21cav","cb":"21cbv"},{"ca":"22cav"}]  | Size:3,Iterating Count:2,RepeatedMapReaderImpl  |
> | [{"ca":"31cav"},{"ca":"32cav"}]               | Size:3,Iterating Count:2,RepeatedMapReaderImpl  |
> | [{"ca":"3"}]                                  | Size:2,Iterating Count:1,RepeatedMapReaderImpl  |
> | []                                            | Size:0,Iterating Count:0,RepeatedMapReaderImpl  |
> +-----------------------------------------------+-------------------------------------------------+
> 5 rows selected (0.115 seconds)
>
> ================================================
> RepeatedBigIntHolderReaderImpl is generated from HolderReaderImpl.java. I
> think the following line in HolderReaderImpl.java has the issue:
>
> https://github.com/apache/drill/blob/245da9790813569c5da9404e0fc5e45cc88e22bb/exec/vector/src/main/codegen/templates/HolderReaderImpl.java#L80
> Maybe we should change it to: if(repeatedHolder.start + index + 1 <
> repeatedHolder.end)
>
> Not sure if the size function of RepeatedMapReaderImpl is implemented
> correctly.
>
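
The bound check proposed in the quoted message can be sanity-checked outside
Drill with a small standalone model. This is only a sketch, not Drill source:
the class name ProposedBoundModel, the countIterations helper, and the
assumption that the index starts at -1 over a half-open [start, end) range are
all hypothetical. It shows only that, under those assumptions, the proposed
condition yields exactly end - start iterations.

```java
// Standalone model, NOT Drill source. All names here are hypothetical.
// Assumption: the reader covers the half-open range [start, end) and its
// index starts at -1, i.e. positioned before the first element.
final class ProposedBoundModel {
    private final int start;
    private final int end;
    private int index = -1;

    ProposedBoundModel(int start, int end) {
        this.start = start;
        this.end = end;
    }

    // Advance to the next element using the bound proposed in the message:
    // start + index + 1 < end
    boolean next() {
        if (start + index + 1 < end) {
            index++;
            return true;
        }
        return false;
    }

    // Number of elements in the modelled range.
    int size() {
        return end - start;
    }

    // Count how many times next() succeeds on a fresh reader.
    static int countIterations(int start, int end) {
        ProposedBoundModel reader = new ProposedBoundModel(start, end);
        int count = 0;
        while (reader.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countIterations(0, 2)); // prints 2
        System.out.println(countIterations(3, 7)); // prints 4
        System.out.println(countIterations(5, 5)); // prints 0
    }
}
```

Because each call builds a fresh reader, this model cannot reproduce counts
that carry over from one row to the next, so it says nothing about whether
that part of the reported output comes from the reader or from elsewhere.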
