[jira] [Commented] (DRILL-6963) create/aggregate/work with array

2019-11-29 Thread benj (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985067#comment-16985067
 ] 

benj commented on DRILL-6963:
-

For the second point (arry_agg), in attempt of an eventual official function, 
here is a simple implementation that can do that (without possibility to 
_DISTINCT_ or _ORDER BY_) 
{code:java}
package org.apache.drill.contrib.function;

import io.netty.buffer.DrillBuf;
import org.apache.drill.exec.expr.DrillAggFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope;
import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.annotations.Workspace;
import org.apache.drill.exec.expr.holders.*;

import javax.inject.Inject;

// If dataset is too large, need : ALTER SESSION SET `planner.enable_hashagg` = 
false
public class ArrayAgg {

// STRING NULLABLE //   
@FunctionTemplate(
name = "array_agg",
scope = FunctionScope.POINT_AGGREGATE,
nulls = NullHandling.INTERNAL)
public static class NullableVarChar_ArrayAgg implements DrillAggFunc {
  @Param NullableVarCharHolder input;
  @Workspace ObjectHolder agg;
  @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter 
out;
  @Inject DrillBuf buffer;

  @Override public void setup() { 
agg = new ObjectHolder();
  }

  @Override public void reset() {
agg = new ObjectHolder();
  }

  @Override public void add() {
org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter 
listWriter;
if (agg.obj == null) {
  agg.obj = out.rootAsList();
}

if ( input.isSet == 0 )
return;

org.apache.drill.exec.expr.holders.VarCharHolder rowHolder = new 
org.apache.drill.exec.expr.holders.VarCharHolder();
byte[] inputBytes = 
org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder(
 input ).getBytes( com.google.common.base.Charsets.UTF_8 );
buffer.reallocIfNeeded(inputBytes.length); 
buffer.setBytes(0, inputBytes);
rowHolder.start = 0; 
rowHolder.end = inputBytes.length; 
rowHolder.buffer = buffer;

listWriter = 
(org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
listWriter.varChar().write( rowHolder );  
  }

  @Override public void output() {
  ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) 
agg.obj).endList();
  }
}

// INTEGER NULLABLE //
@FunctionTemplate(
 name = "array_agg",
 scope = FunctionScope.POINT_AGGREGATE,
 nulls = NullHandling.INTERNAL)
public static class NullableInt_ArrayAgg implements DrillAggFunc {
@Param NullableIntHolder input;
@Workspace ObjectHolder agg;
@Output 
org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out;
@Inject DrillBuf buffer;

@Override public void setup() {
  
 agg = new ObjectHolder();
}

@Override public void reset() {
 agg = new ObjectHolder();
}

@Override public void add() {
 org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter 
listWriter;
 if (agg.obj == null) {
   agg.obj = out.rootAsList();
 }
 
if ( input.isSet == 0 )
return;
 
 listWriter = 
(org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj;
 listWriter.integer().writeInt( input.value );
}

@Override public void output() {
  
((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) 
agg.obj).endList();
}
}

// ...
}
{code}
 

> create/aggregate/work with array
> 
>
> Key: DRILL-6963
> URL: https://issues.apache.org/jira/browse/DRILL-6963
> Project: Apache Drill
>  Issue Type: Wish
>  Components: Functions - Drill
>Reporter: benj
>Priority: Major
>
> * Add the possibility to build array (like : SELECT array[a1,a2,a3...]) - 
> ideally work with all types
>  * Add a default array_agg (like : SELECT col1, array_agg(col2), 
> array_agg(DISTINCT col2) FROM ... GROUP BY col1) ;  - ideally work with all 
> types
>  * Add function/facilities/operator to work with array



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6963) create/aggregate/work with array

2019-01-21 Thread benj (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747905#comment-16747905
 ] 

benj commented on DRILL-6963:
-

Sure, for example, PostgreSQL has these kind of functionality
 * Work with array : [https://www.postgresql.org/docs/11/arrays.html]
 * Aggregate in array => array_agg(exp) : 
[https://www.postgresql.org/docs/11/functions-aggregate.html]
 * Some array function and operators : 
[https://www.postgresql.org/docs/9.1/functions-array.html]
 * Some other functions like :
 ** {{regexp_split_to_array}}(text, pattern, flags) : 
[https://www.postgresql.org/docs/current/functions-string.html]

array_agg() is useful, particularly when you group by an item and would like to 
keep some information from another that has variable length / number of line 
concerned in the original table is different for each group.

Example :
{code:java}
With table records
alice;1
alice;2
bob;1
bob;2
bob;3
charles;5

SELECT firstname, array_agg(DISTINCT score ORDER BY score) AS dist_scores
FROM records
GROUP BY firstname;

firstname | dist_score
- | --
alice | {1,2}
bob   | {1,2,3}
charles   | {5}

{code}
 

 

> create/aggregate/work with array
> 
>
> Key: DRILL-6963
> URL: https://issues.apache.org/jira/browse/DRILL-6963
> Project: Apache Drill
>  Issue Type: Wish
>  Components: Functions - Drill
>Reporter: benj
>Priority: Major
>
> * Add the possibility to build array (like : SELECT array[a1,a2,a3...]) - 
> ideally work with all types
>  * Add a default array_agg (like : SELECT col1, array_agg(col2), 
> array_agg(DISTINCT col2) FROM ... GROUP BY col1) ;  - ideally work with all 
> types
>  * Add function/facilities/operator to work with array



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6963) create/aggregate/work with array

2019-01-18 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746629#comment-16746629
 ] 

Kunal Khatua commented on DRILL-6963:
-

[~benj641] do you have a reference of an existing system that does this? 
An example with a sample data in the description would help other committers 
provide feedback on the feasibility of the feature.

> create/aggregate/work with array
> 
>
> Key: DRILL-6963
> URL: https://issues.apache.org/jira/browse/DRILL-6963
> Project: Apache Drill
>  Issue Type: Wish
>  Components: Functions - Drill
>Reporter: benj
>Priority: Major
>
> * Add the possibility to build array (like : SELECT array[a1,a2,a3...]) - 
> ideally work with all types
>  * Add a default array_agg (like : SELECT col1, array_agg(col2), 
> array_agg(DISTINCT col2) FROM ... GROUP BY col1) ;  - ideally work with all 
> types
>  * Add function/facilities/operator to work with array



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)