[jira] [Commented] (DRILL-6963) create/aggregate/work with array
[ https://issues.apache.org/jira/browse/DRILL-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985067#comment-16985067 ] benj commented on DRILL-6963: - For the second point (arry_agg), in attempt of an eventual official function, here is a simple implementation that can do that (without possibility to _DISTINCT_ or _ORDER BY_) {code:java} package org.apache.drill.contrib.function; import io.netty.buffer.DrillBuf; import org.apache.drill.exec.expr.DrillAggFunc; import org.apache.drill.exec.expr.annotations.FunctionTemplate; import org.apache.drill.exec.expr.annotations.FunctionTemplate.FunctionScope; import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling; import org.apache.drill.exec.expr.annotations.Output; import org.apache.drill.exec.expr.annotations.Param; import org.apache.drill.exec.expr.annotations.Workspace; import org.apache.drill.exec.expr.holders.*; import javax.inject.Inject; // If dataset is too large, need : ALTER SESSION SET `planner.enable_hashagg` = false public class ArrayAgg { // STRING NULLABLE // @FunctionTemplate( name = "array_agg", scope = FunctionScope.POINT_AGGREGATE, nulls = NullHandling.INTERNAL) public static class NullableVarChar_ArrayAgg implements DrillAggFunc { @Param NullableVarCharHolder input; @Workspace ObjectHolder agg; @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out; @Inject DrillBuf buffer; @Override public void setup() { agg = new ObjectHolder(); } @Override public void reset() { agg = new ObjectHolder(); } @Override public void add() { org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter; if (agg.obj == null) { agg.obj = out.rootAsList(); } if ( input.isSet == 0 ) return; org.apache.drill.exec.expr.holders.VarCharHolder rowHolder = new org.apache.drill.exec.expr.holders.VarCharHolder(); byte[] inputBytes = org.apache.drill.exec.expr.fn.impl.StringFunctionHelpers.getStringFromVarCharHolder( input ).getBytes( com.google.common.base.Charsets.UTF_8 ); buffer.reallocIfNeeded(inputBytes.length); buffer.setBytes(0, inputBytes); rowHolder.start = 0; rowHolder.end = inputBytes.length; rowHolder.buffer = buffer; listWriter = (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj; listWriter.varChar().write( rowHolder ); } @Override public void output() { ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList(); } } // INTEGER NULLABLE // @FunctionTemplate( name = "array_agg", scope = FunctionScope.POINT_AGGREGATE, nulls = NullHandling.INTERNAL) public static class NullableInt_ArrayAgg implements DrillAggFunc { @Param NullableIntHolder input; @Workspace ObjectHolder agg; @Output org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter out; @Inject DrillBuf buffer; @Override public void setup() { agg = new ObjectHolder(); } @Override public void reset() { agg = new ObjectHolder(); } @Override public void add() { org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter listWriter; if (agg.obj == null) { agg.obj = out.rootAsList(); } if ( input.isSet == 0 ) return; listWriter = (org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj; listWriter.integer().writeInt( input.value ); } @Override public void output() { ((org.apache.drill.exec.vector.complex.writer.BaseWriter.ListWriter) agg.obj).endList(); } } // ... } {code} > create/aggregate/work with array > > > Key: DRILL-6963 > URL: https://issues.apache.org/jira/browse/DRILL-6963 > Project: Apache Drill > Issue Type: Wish > Components: Functions - Drill >Reporter: benj >Priority: Major > > * Add the possibility to build array (like : SELECT array[a1,a2,a3...]) - > ideally work with all types > * Add a default array_agg (like : SELECT col1, array_agg(col2), > array_agg(DISTINCT col2) FROM ... GROUP BY col1) ; - ideally work with all > types > * Add function/facilities/operator to work with array -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (DRILL-6963) create/aggregate/work with array
[ https://issues.apache.org/jira/browse/DRILL-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747905#comment-16747905 ] benj commented on DRILL-6963: - Sure, for example, PostgreSQL has these kind of functionality * Work with array : [https://www.postgresql.org/docs/11/arrays.html] * Aggregate in array => array_agg(exp) : [https://www.postgresql.org/docs/11/functions-aggregate.html] * Some array function and operators : [https://www.postgresql.org/docs/9.1/functions-array.html] * Some other functions like : ** {{regexp_split_to_array}}(text, pattern, flags) : [https://www.postgresql.org/docs/current/functions-string.html] array_agg() is useful, particularly when you group by an item and would like to keep some information from another that has variable length / number of line concerned in the original table is different for each group. Example : {code:java} With table records alice;1 alice;2 bob;1 bob;2 bob;3 charles;5 SELECT firstname, array_agg(DISTINCT score ORDER BY score) AS dist_scores FROM records GROUP BY firstname; firstname | dist_score - | -- alice | {1,2} bob | {1,2,3} charles | {5} {code} > create/aggregate/work with array > > > Key: DRILL-6963 > URL: https://issues.apache.org/jira/browse/DRILL-6963 > Project: Apache Drill > Issue Type: Wish > Components: Functions - Drill >Reporter: benj >Priority: Major > > * Add the possibility to build array (like : SELECT array[a1,a2,a3...]) - > ideally work with all types > * Add a default array_agg (like : SELECT col1, array_agg(col2), > array_agg(DISTINCT col2) FROM ... GROUP BY col1) ; - ideally work with all > types > * Add function/facilities/operator to work with array -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (DRILL-6963) create/aggregate/work with array
[ https://issues.apache.org/jira/browse/DRILL-6963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746629#comment-16746629 ] Kunal Khatua commented on DRILL-6963: - [~benj641] do you have a reference of an existing system that does this? An example with a sample data in the description would help other committers provide feedback on the feasibility of the feature. > create/aggregate/work with array > > > Key: DRILL-6963 > URL: https://issues.apache.org/jira/browse/DRILL-6963 > Project: Apache Drill > Issue Type: Wish > Components: Functions - Drill >Reporter: benj >Priority: Major > > * Add the possibility to build array (like : SELECT array[a1,a2,a3...]) - > ideally work with all types > * Add a default array_agg (like : SELECT col1, array_agg(col2), > array_agg(DISTINCT col2) FROM ... GROUP BY col1) ; - ideally work with all > types > * Add function/facilities/operator to work with array -- This message was sent by Atlassian JIRA (v7.6.3#76005)