[jira] [Commented] (DRILL-7494) Unable to connect to Drill using JDBC driver when using custom authenticator

2019-12-20 Thread Vova Vysotskyi (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001224#comment-17001224
 ] 

Vova Vysotskyi commented on DRILL-7494:
---

Another issue caused by the Hadoop upgrade is the problem with a Guava version 
mismatch. Moving back {{woodstox-core}} is not enough, the application hangs 
due to {{MethodNotFoundException}} in {{Preconditions}} class. Patching may be 
done in {{org.apache.drill.jdbc.Driver}} class, but it doesn't work as expected 
for the case of some class loaders, for example, it wouldn't work for 
SquirreLSql.

Some time is required to find a way how to fix this limitation.

> Unable to connect to Drill using JDBC driver when using custom authenticator
> 
>
> Key: DRILL-7494
> URL: https://issues.apache.org/jira/browse/DRILL-7494
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.17.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Blocker
> Fix For: 1.17.0
>
>
> Steps to reproduce:
> Set up Drill with a custom authenticator as described here: 
> https://drill.apache.org/docs/creating-custom-authenticators/
> Try to connect using simple java client.
> Connection hangs without any errors. It is caused by the fix for DRILL-6540 
> where was excluded {{com.fasterxml.woodstox:woodstox-core}} but its classes 
> are used in {{org.apache.hadoop.conf.Configuration}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7490) limit is not pushed to JDBC storage plugin

2019-12-20 Thread Arina Ielchiieva (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7490:

Fix Version/s: (was: 1.18.0)
   1.17.0

> limit is not pushed to JDBC storage plugin
> --
>
> Key: DRILL-7490
> URL: https://issues.apache.org/jira/browse/DRILL-7490
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning  Optimization, Storage - JDBC
>Affects Versions: 1.11.0
>Reporter: Vova Vysotskyi
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> The limit is not pushed to JDBC storage plugin. 
> The following query:
> {code:sql}
> select person_id from mysql.`drill_mysql_test`.person limit 10
> {code}
> Returns plan:
> {noformat}
> 00-00Screen
> 00-01  Project(person_id=[$0])
> 00-02SelectionVectorRemover
> 00-03  Limit(fetch=[10])
> 00-04Limit(fetch=[10])
> 00-05  Jdbc(sql=[SELECT `person_id` FROM 
> `drill_mysql_test`.`person` ])
> {noformat}
> This issue happens since incorrect row count is calculated for {{JdbcSort}}.
> Drill enforces to use {{RelNode.estimateRowCount(mq)}} for calculating row 
> count for all inheritants of {{Sort}} rel node (see {{DrillRelMdRowCount}}), 
> but {{JdbcSort}} doesn't override this method and uses parent one which 
> returns row count of input. So planner choses {{DrillLimitRel}} since it has 
> less row count.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001127#comment-17001127
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360518128
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/PriorityQueueCopierTemplate.java
 ##
 @@ -17,94 +17,95 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import io.netty.buffer.DrillBuf;
-
 import java.io.IOException;
 import java.util.List;
 
 import javax.inject.Named;
 
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorAccessibleUtilities;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
+import io.netty.buffer.DrillBuf;
 
 public abstract class PriorityQueueCopierTemplate implements 
PriorityQueueCopier {
+//  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(PriorityQueueCopierTemplate.class);
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7494) Unable to connect to Drill using JDBC driver when using custom authenticator

2019-12-20 Thread Vova Vysotskyi (Jira)
Vova Vysotskyi created DRILL-7494:
-

 Summary: Unable to connect to Drill using JDBC driver when using 
custom authenticator
 Key: DRILL-7494
 URL: https://issues.apache.org/jira/browse/DRILL-7494
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.17.0
Reporter: Vova Vysotskyi
Assignee: Vova Vysotskyi
 Fix For: 1.17.0


Steps to reproduce:
Set up Drill with a custom authenticator as described here: 
https://drill.apache.org/docs/creating-custom-authenticators/

Try to connect using simple java client.
Connection hangs without any errors. It is caused by the fix for DRILL-6540 
where was excluded {{com.fasterxml.woodstox:woodstox-core}} but its classes are 
used in {{org.apache.hadoop.conf.Configuration}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001106#comment-17001106
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360450237
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001116#comment-17001116
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360511648
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/MSortTemplate.java
 ##
 @@ -17,48 +17,59 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import com.typesafe.config.ConfigException;
-import io.netty.buffer.DrillBuf;
-
 import java.util.Queue;
 
 import javax.inject.Named;
 
-import org.apache.drill.exec.ExecConstants;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BaseAllocator;
 import org.apache.drill.exec.memory.BufferAllocator;
 import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.drill.exec.vector.ValueVector;
 import org.apache.hadoop.util.IndexedSortable;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
 import org.apache.drill.shaded.guava.com.google.common.collect.Queues;
 
+import io.netty.buffer.DrillBuf;
+
 public abstract class MSortTemplate implements MSorter, IndexedSortable {
 //  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MSortTemplate.class);
 
   private SelectionVector4 vector4;
   private SelectionVector4 aux;
+  @SuppressWarnings("unused")
+  private long compares;
+
+  /**
+   * Holds offsets into the SV4 of the start of each batch
+   * (sorted run.)
+   */
+
   private Queue runStarts = Queues.newLinkedBlockingQueue();
   private FragmentContext context;
 
   /**
-   * This is only useful for debugging and/or unit testing. Controls the 
maximum size of batches exposed to downstream
+   * Controls the maximum size of batches exposed to downstream
*/
   private int desiredRecordBatchCount;
 
   @Override
-  public void setup(final FragmentContext context, final BufferAllocator 
allocator, final SelectionVector4 vector4, final VectorContainer hyperBatch) 
throws SchemaChangeException{
+  public void setup(final FragmentContext context, final BufferAllocator 
allocator, final SelectionVector4 vector4,
+final VectorContainer hyperBatch, int outputBatchSize, int 
desiredBatchSize) throws SchemaChangeException{
 
 Review comment:
   ```suggestion
VectorContainer hyperBatch, int outputBatchSize, int 
desiredBatchSize) throws SchemaChangeException {
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001107#comment-17001107
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360426272
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -17,250 +17,255 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import java.io.IOException;
-import java.util.Collection;
-import java.util.Iterator;
-import java.util.LinkedList;
-import java.util.List;
-import java.util.Set;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
-import org.apache.calcite.rel.RelFieldCollation.Direction;
-import org.apache.drill.common.AutoCloseables;
-import org.apache.drill.common.config.DrillConfig;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK;
+import static 
org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP;
+
 import org.apache.drill.common.exceptions.UserException;
-import org.apache.drill.common.expression.ErrorCollector;
-import org.apache.drill.common.expression.ErrorCollectorImpl;
-import org.apache.drill.common.expression.LogicalExpression;
-import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.common.logical.data.Order.Ordering;
-import org.apache.drill.exec.ExecConstants;
-import org.apache.drill.exec.compile.sig.GeneratorMapping;
-import org.apache.drill.exec.compile.sig.MappingSet;
-import org.apache.drill.exec.exception.ClassTransformationException;
-import org.apache.drill.exec.exception.OutOfMemoryException;
-import org.apache.drill.exec.exception.SchemaChangeException;
-import org.apache.drill.exec.expr.ClassGenerator;
-import org.apache.drill.exec.expr.ClassGenerator.HoldingContainer;
-import org.apache.drill.exec.expr.CodeGenerator;
-import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
-import org.apache.drill.exec.expr.TypeHelper;
-import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
-import org.apache.drill.exec.memory.BufferAllocator;
 import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.ops.MetricDef;
 import org.apache.drill.exec.physical.config.ExternalSort;
-import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
-import org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder;
-import org.apache.drill.exec.proto.ExecProtos.FragmentHandle;
-import org.apache.drill.exec.proto.helper.QueryIdHelper;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
+import 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator;
+import org.apache.drill.exec.physical.impl.xsort.SortImpl.SortResults;
 import org.apache.drill.exec.record.AbstractRecordBatch;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
-import org.apache.drill.exec.record.MaterializedField;
 import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.SchemaUtil;
-import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 import org.apache.drill.exec.testing.ControlsInjector;
 import org.apache.drill.exec.testing.ControlsInjectorFactory;
-import org.apache.drill.exec.vector.CopyUtil;
 import org.apache.drill.exec.vector.ValueVector;
 import org.apache.drill.exec.vector.complex.AbstractContainerVector;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
-import org.apache.drill.shaded.guava.com.google.common.base.Joiner;
-import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
-import org.apache.drill.shaded.guava.com.google.common.collect.Iterators;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-import com.sun.codemodel.JConditional;
-import com.sun.codemodel.JExpr;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * External sort batch: a sort batch which can spill to disk in
+ * order to operate within a defined memory footprint.
+ * 
+ * Basic Operation
+ * The operator has three key phases:
+ * 
+ * 
+ * The load phase in which batches are read from upstream.
+ * The merge phase in which spilled batches are 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001128#comment-17001128
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r36065
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001141#comment-17001141
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360371678
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
+private final SelectionVector2 sv2;
+private final long dataSize;
+
+public InputBatch(VectorContainer container, SelectionVector2 sv2, 
BufferAllocator allocator, long dataSize) {
+  super(container, allocator);
+  this.sv2 = sv2;
+  this.dataSize = dataSize;
+}
 
-  public SelectionVector2 getSv2() {
-return sv2;
+public SelectionVector2 getSv2() { return sv2; }
+
+public long getDataSize() { return dataSize; }
+
+@Override
+public int getRecordCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else {
+return super.getRecordCount();
+  }
+}
+
+@Override
+public int getNextIndex() {
+  int val = super.getNextIndex();
+  if (val == -1) {
+return val;
+  }
+  return 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001100#comment-17001100
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360365118
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001122#comment-17001122
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360515412
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/MSorter.java
 ##
 @@ -24,11 +24,15 @@
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
-// TODO:  Doc.:  What's an MSorter?  A sorter for merge join?  something else?
-// (What's the "M" part?  Actually, rename interface to clearer.
+/**
+ * In-memory sorter. Takes a list of batches as input, produces a selection
+ * vector 4, with sorted results, as output.
+ */
+
 
 Review comment:
   ```suggestion
   
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001120#comment-17001120
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360485101
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001123#comment-17001123
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360464313
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001099#comment-17001099
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360365258
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
 
 Review comment:
   ```suggestion
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001130#comment-17001130
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360526022
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSimpleExternalSort.java
 ##
 @@ -71,38 +61,22 @@ public void mergeSortWithSv2Legacy() throws Exception {
* @throws Exception
*/
 
-  private void mergeSortWithSv2(boolean testLegacy) throws Exception {
-ClusterFixtureBuilder builder = ClusterFixture.builder(dirTestWatcher)
-.configProperty(ExecConstants.EXTERNAL_SORT_DISABLE_MANAGED, false);
+  @Test
 
 Review comment:
   good catch :+1: 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001136#comment-17001136
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360517824
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/PriorityQueueCopier.java
 ##
 @@ -23,14 +23,10 @@
 import org.apache.drill.exec.compile.TemplateClassDefinition;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.VectorAccessible;
 
 public interface PriorityQueueCopier extends AutoCloseable {
-  public static final long INITIAL_ALLOCATION = 1000;
-  public static final long MAX_ALLOCATION = 2000;
-
-  public void setup(FragmentContext context, BufferAllocator allocator, 
VectorAccessible hyperBatch,
+  public void setup(BufferAllocator allocator, VectorAccessible hyperBatch,
 
 Review comment:
   please remove redundant ```public```, ```final```, ```abstract``` modifiers 
in the interface and move constant above methods. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001118#comment-17001118
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360443648
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001098#comment-17001098
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360365036
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
 
 Review comment:
   ```suggestion
* Spill mode {@link SpilledRun}: Holds a "memento" to a set
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001138#comment-17001138
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360508558
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/MSortTemplate.java
 ##
 @@ -123,14 +145,24 @@ public SelectionVector4 getSV4() {
 return vector4;
   }
 
+  /**
+   * Merge a set of pre-sorted runs to produce a combined
+   * result set. Merging is done in the selection vector, record data does
+   * not move.
+   * 
+   * Runs are merge pairwise in multiple passes, providing performance
+   * of O(n * m * log(n)), where n = number of runs, m = number of records
+   * per run.
+   */
+
 
 Review comment:
   ```suggestion
   
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001104#comment-17001104
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360368665
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
 
 Review comment:
   I really appreciate that you split the responsibilities of ```BatchGroup``` 
class and created ```InputBatch``` & ```SpilledRun```.  Could you please 
extract them into separate source files? 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
> 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1700#comment-1700
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360429440
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001117#comment-17001117
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360391704
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
+private final SelectionVector2 sv2;
+private final long dataSize;
+
+public InputBatch(VectorContainer container, SelectionVector2 sv2, 
BufferAllocator allocator, long dataSize) {
+  super(container, allocator);
+  this.sv2 = sv2;
+  this.dataSize = dataSize;
+}
 
-  public SelectionVector2 getSv2() {
-return sv2;
+public SelectionVector2 getSv2() { return sv2; }
+
+public long getDataSize() { return dataSize; }
+
+@Override
+public int getRecordCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else {
+return super.getRecordCount();
+  }
+}
+
+@Override
+public int getNextIndex() {
+  int val = super.getNextIndex();
+  if (val == -1) {
+return val;
+  }
+  return 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001135#comment-17001135
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360472617
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001119#comment-17001119
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360506932
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001126#comment-17001126
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360515173
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/MSortTemplate.java
 ##
 @@ -123,14 +145,24 @@ public SelectionVector4 getSV4() {
 return vector4;
   }
 
+  /**
+   * Merge a set of pre-sorted runs to produce a combined
+   * result set. Merging is done in the selection vector, record data does
+   * not move.
+   * 
+   * Runs are merge pairwise in multiple passes, providing performance
+   * of O(n * m * log(n)), where n = number of runs, m = number of records
+   * per run.
+   */
+
   @Override
-  public void sort(final VectorContainer container) {
+  public void sort() {
 while (runStarts.size() > 1) {
+  final int totalCount = this.vector4.getTotalCount();
 
-  // check if we're cancelled/failed frequently
+  // check if we're cancelled/failed recently
   if (!context.getExecutorState().shouldContinue()) {
-return;
-  }
+return; }
 
 Review comment:
   ```suggestion
   return; 
 }
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001143#comment-17001143
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360397396
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
+private final SelectionVector2 sv2;
+private final long dataSize;
+
+public InputBatch(VectorContainer container, SelectionVector2 sv2, 
BufferAllocator allocator, long dataSize) {
+  super(container, allocator);
+  this.sv2 = sv2;
+  this.dataSize = dataSize;
+}
 
-  public SelectionVector2 getSv2() {
-return sv2;
+public SelectionVector2 getSv2() { return sv2; }
+
+public long getDataSize() { return dataSize; }
+
+@Override
+public int getRecordCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else {
+return super.getRecordCount();
+  }
+}
+
+@Override
+public int getNextIndex() {
+  int val = super.getNextIndex();
+  if (val == -1) {
+return val;
+  }
+  return 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001101#comment-17001101
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360364870
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
 
 Review comment:
   ```suggestion
* Input mode {@link InputBatch}: Used to buffer in-memory batches
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001109#comment-17001109
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360415588
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -17,250 +17,255 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import java.io.IOException;
-import java.util.Collection;
-import java.util.Iterator;
-import java.util.LinkedList;
-import java.util.List;
-import java.util.Set;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
-import org.apache.calcite.rel.RelFieldCollation.Direction;
-import org.apache.drill.common.AutoCloseables;
-import org.apache.drill.common.config.DrillConfig;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK;
+import static 
org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP;
+
 import org.apache.drill.common.exceptions.UserException;
-import org.apache.drill.common.expression.ErrorCollector;
-import org.apache.drill.common.expression.ErrorCollectorImpl;
-import org.apache.drill.common.expression.LogicalExpression;
-import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.common.logical.data.Order.Ordering;
-import org.apache.drill.exec.ExecConstants;
-import org.apache.drill.exec.compile.sig.GeneratorMapping;
-import org.apache.drill.exec.compile.sig.MappingSet;
-import org.apache.drill.exec.exception.ClassTransformationException;
-import org.apache.drill.exec.exception.OutOfMemoryException;
-import org.apache.drill.exec.exception.SchemaChangeException;
-import org.apache.drill.exec.expr.ClassGenerator;
-import org.apache.drill.exec.expr.ClassGenerator.HoldingContainer;
-import org.apache.drill.exec.expr.CodeGenerator;
-import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
-import org.apache.drill.exec.expr.TypeHelper;
-import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
-import org.apache.drill.exec.memory.BufferAllocator;
 import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.ops.MetricDef;
 import org.apache.drill.exec.physical.config.ExternalSort;
-import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
-import org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder;
-import org.apache.drill.exec.proto.ExecProtos.FragmentHandle;
-import org.apache.drill.exec.proto.helper.QueryIdHelper;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
+import 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator;
+import org.apache.drill.exec.physical.impl.xsort.SortImpl.SortResults;
 import org.apache.drill.exec.record.AbstractRecordBatch;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
-import org.apache.drill.exec.record.MaterializedField;
 import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.SchemaUtil;
-import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 import org.apache.drill.exec.testing.ControlsInjector;
 import org.apache.drill.exec.testing.ControlsInjectorFactory;
-import org.apache.drill.exec.vector.CopyUtil;
 import org.apache.drill.exec.vector.ValueVector;
 import org.apache.drill.exec.vector.complex.AbstractContainerVector;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
-import org.apache.drill.shaded.guava.com.google.common.base.Joiner;
-import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
-import org.apache.drill.shaded.guava.com.google.common.collect.Iterators;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-import com.sun.codemodel.JConditional;
-import com.sun.codemodel.JExpr;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * External sort batch: a sort batch which can spill to disk in
+ * order to operate within a defined memory footprint.
+ * 
+ * Basic Operation
+ * The operator has three key phases:
+ * 
+ * 
+ * The load phase in which batches are read from upstream.
+ * The merge phase in which spilled batches are 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001102#comment-17001102
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360371811
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
+private final SelectionVector2 sv2;
+private final long dataSize;
+
+public InputBatch(VectorContainer container, SelectionVector2 sv2, 
BufferAllocator allocator, long dataSize) {
+  super(container, allocator);
+  this.sv2 = sv2;
+  this.dataSize = dataSize;
+}
 
-  public SelectionVector2 getSv2() {
-return sv2;
+public SelectionVector2 getSv2() { return sv2; }
+
+public long getDataSize() { return dataSize; }
+
+@Override
+public int getRecordCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else {
+return super.getRecordCount();
+  }
+}
+
+@Override
+public int getNextIndex() {
+  int val = super.getNextIndex();
+  if (val == -1) {
+return val;
+  }
+  return 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001125#comment-17001125
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360525635
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSortEmitOutcome.java
 ##
 @@ -15,9 +15,15 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
-package org.apache.drill.exec.physical.impl.xsort.managed;
+package org.apache.drill.exec.physical.impl.xsort;
+
+import static junit.framework.TestCase.assertTrue;
 
 Review comment:
   please avoid imports reordering when not required, this may cause merge 
conflicts for other contributors.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001140#comment-17001140
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360512752
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/MSortTemplate.java
 ##
 @@ -89,15 +98,28 @@ public void setup(final FragmentContext context, final 
BufferAllocator allocator
* ExternalSortBatch to make decisions about whether to spill or not.
*
* @param recordCount
-   * @return The amount of memory MSorter needs for a given record count.
+   * @return
 
 Review comment:
   Please convert the
   
   >// We need 4 bytes (SV4) for each record.
   // The memory allocator will round this to the next
   // power of 2.
   
   into this ```@return``` description.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001133#comment-17001133
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360501525
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001144#comment-17001144
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360521056
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/PriorityQueueCopierTemplate.java
 ##
 @@ -17,94 +17,95 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import io.netty.buffer.DrillBuf;
-
 import java.io.IOException;
 import java.util.List;
 
 import javax.inject.Named;
 
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorAccessibleUtilities;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
+import io.netty.buffer.DrillBuf;
 
 public abstract class PriorityQueueCopierTemplate implements 
PriorityQueueCopier {
+//  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(PriorityQueueCopierTemplate.class);
 
   private SelectionVector4 vector4;
   private List batchGroups;
   private VectorAccessible hyperBatch;
   private VectorAccessible outgoing;
   private int size;
-  private int queueSize;
+  private int queueSize = 0;
 
   @Override
-  public void setup(FragmentContext context, BufferAllocator allocator,
-  VectorAccessible hyperBatch, List batchGroups,
-  VectorAccessible outgoing) throws SchemaChangeException {
+  public void setup(BufferAllocator allocator, VectorAccessible hyperBatch, 
List batchGroups,
+VectorAccessible outgoing) throws SchemaChangeException {
 this.hyperBatch = hyperBatch;
 this.batchGroups = batchGroups;
 this.outgoing = outgoing;
 this.size = batchGroups.size();
 
 final DrillBuf drillBuf = allocator.buffer(4 * size);
 vector4 = new SelectionVector4(drillBuf, size, Character.MAX_VALUE);
-doSetup(context, hyperBatch, outgoing);
+doSetup(hyperBatch, outgoing);
 
 queueSize = 0;
 for (int i = 0; i < size; i++) {
-  vector4.set(i, i, batchGroups.get(i).getNextIndex());
-  siftUp();
-  queueSize++;
+  int index = batchGroups.get(i).getNextIndex();
+  vector4.set(i, i, index);
+  if (index > -1) {
+siftUp();
+queueSize++;
+  }
 }
   }
 
   @Override
   public int next(int targetRecordCount) {
-allocateVectors(targetRecordCount);
 for (int outgoingIndex = 0; outgoingIndex < targetRecordCount; 
outgoingIndex++) {
   if (queueSize == 0) {
 return 0;
   }
   int compoundIndex = vector4.get(0);
   int batch = compoundIndex >>> 16;
   assert batch < batchGroups.size() : String.format("batch: %d 
batchGroups: %d", batch, batchGroups.size());
-  doCopy(compoundIndex, outgoingIndex);
+  try {
 
 Review comment:
   It's better to put whole loop inside try-catch and remove this and 
```siftDown()``` one.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001115#comment-17001115
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360404337
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
+private final SelectionVector2 sv2;
+private final long dataSize;
+
+public InputBatch(VectorContainer container, SelectionVector2 sv2, 
BufferAllocator allocator, long dataSize) {
+  super(container, allocator);
+  this.sv2 = sv2;
+  this.dataSize = dataSize;
+}
 
-  public SelectionVector2 getSv2() {
-return sv2;
+public SelectionVector2 getSv2() { return sv2; }
+
+public long getDataSize() { return dataSize; }
+
+@Override
+public int getRecordCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else {
+return super.getRecordCount();
+  }
+}
+
+@Override
+public int getNextIndex() {
+  int val = super.getNextIndex();
+  if (val == -1) {
+return val;
+  }
+  return 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001108#comment-17001108
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360423445
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -17,250 +17,255 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import java.io.IOException;
-import java.util.Collection;
-import java.util.Iterator;
-import java.util.LinkedList;
-import java.util.List;
-import java.util.Set;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
-import org.apache.calcite.rel.RelFieldCollation.Direction;
-import org.apache.drill.common.AutoCloseables;
-import org.apache.drill.common.config.DrillConfig;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK;
+import static 
org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP;
+
 import org.apache.drill.common.exceptions.UserException;
-import org.apache.drill.common.expression.ErrorCollector;
-import org.apache.drill.common.expression.ErrorCollectorImpl;
-import org.apache.drill.common.expression.LogicalExpression;
-import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.common.logical.data.Order.Ordering;
-import org.apache.drill.exec.ExecConstants;
-import org.apache.drill.exec.compile.sig.GeneratorMapping;
-import org.apache.drill.exec.compile.sig.MappingSet;
-import org.apache.drill.exec.exception.ClassTransformationException;
-import org.apache.drill.exec.exception.OutOfMemoryException;
-import org.apache.drill.exec.exception.SchemaChangeException;
-import org.apache.drill.exec.expr.ClassGenerator;
-import org.apache.drill.exec.expr.ClassGenerator.HoldingContainer;
-import org.apache.drill.exec.expr.CodeGenerator;
-import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
-import org.apache.drill.exec.expr.TypeHelper;
-import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
-import org.apache.drill.exec.memory.BufferAllocator;
 import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.ops.MetricDef;
 import org.apache.drill.exec.physical.config.ExternalSort;
-import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
-import org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder;
-import org.apache.drill.exec.proto.ExecProtos.FragmentHandle;
-import org.apache.drill.exec.proto.helper.QueryIdHelper;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
+import 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator;
+import org.apache.drill.exec.physical.impl.xsort.SortImpl.SortResults;
 import org.apache.drill.exec.record.AbstractRecordBatch;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
-import org.apache.drill.exec.record.MaterializedField;
 import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.SchemaUtil;
-import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 import org.apache.drill.exec.testing.ControlsInjector;
 import org.apache.drill.exec.testing.ControlsInjectorFactory;
-import org.apache.drill.exec.vector.CopyUtil;
 import org.apache.drill.exec.vector.ValueVector;
 import org.apache.drill.exec.vector.complex.AbstractContainerVector;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
-import org.apache.drill.shaded.guava.com.google.common.base.Joiner;
-import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
-import org.apache.drill.shaded.guava.com.google.common.collect.Iterators;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-import com.sun.codemodel.JConditional;
-import com.sun.codemodel.JExpr;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * External sort batch: a sort batch which can spill to disk in
+ * order to operate within a defined memory footprint.
+ * 
+ * Basic Operation
+ * The operator has three key phases:
+ * 
+ * 
+ * The load phase in which batches are read from upstream.
+ * The merge phase in which spilled batches are 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001146#comment-17001146
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360398804
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
+private final SelectionVector2 sv2;
+private final long dataSize;
+
+public InputBatch(VectorContainer container, SelectionVector2 sv2, 
BufferAllocator allocator, long dataSize) {
+  super(container, allocator);
+  this.sv2 = sv2;
+  this.dataSize = dataSize;
+}
 
-  public SelectionVector2 getSv2() {
-return sv2;
+public SelectionVector2 getSv2() { return sv2; }
+
+public long getDataSize() { return dataSize; }
+
+@Override
+public int getRecordCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else {
+return super.getRecordCount();
+  }
+}
+
+@Override
+public int getNextIndex() {
+  int val = super.getNextIndex();
+  if (val == -1) {
+return val;
+  }
+  return 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001134#comment-17001134
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360511541
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/MSortTemplate.java
 ##
 @@ -17,48 +17,59 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import com.typesafe.config.ConfigException;
-import io.netty.buffer.DrillBuf;
-
 import java.util.Queue;
 
 import javax.inject.Named;
 
-import org.apache.drill.exec.ExecConstants;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BaseAllocator;
 import org.apache.drill.exec.memory.BufferAllocator;
 import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.drill.exec.vector.ValueVector;
 import org.apache.hadoop.util.IndexedSortable;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
 import org.apache.drill.shaded.guava.com.google.common.collect.Queues;
 
+import io.netty.buffer.DrillBuf;
+
 public abstract class MSortTemplate implements MSorter, IndexedSortable {
 //  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MSortTemplate.class);
 
   private SelectionVector4 vector4;
   private SelectionVector4 aux;
+  @SuppressWarnings("unused")
+  private long compares;
+
+  /**
+   * Holds offsets into the SV4 of the start of each batch
+   * (sorted run.)
+   */
+
   private Queue runStarts = Queues.newLinkedBlockingQueue();
   private FragmentContext context;
 
   /**
-   * This is only useful for debugging and/or unit testing. Controls the 
maximum size of batches exposed to downstream
+   * Controls the maximum size of batches exposed to downstream
*/
   private int desiredRecordBatchCount;
 
   @Override
-  public void setup(final FragmentContext context, final BufferAllocator 
allocator, final SelectionVector4 vector4, final VectorContainer hyperBatch) 
throws SchemaChangeException{
+  public void setup(final FragmentContext context, final BufferAllocator 
allocator, final SelectionVector4 vector4,
 
 Review comment:
   ```suggestion
 public void setup(FragmentContext context, BufferAllocator allocator, 
SelectionVector4 vector4,
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001112#comment-17001112
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360391462
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
+private final SelectionVector2 sv2;
+private final long dataSize;
+
+public InputBatch(VectorContainer container, SelectionVector2 sv2, 
BufferAllocator allocator, long dataSize) {
+  super(container, allocator);
+  this.sv2 = sv2;
+  this.dataSize = dataSize;
+}
 
-  public SelectionVector2 getSv2() {
-return sv2;
+public SelectionVector2 getSv2() { return sv2; }
+
+public long getDataSize() { return dataSize; }
+
+@Override
+public int getRecordCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else {
+return super.getRecordCount();
+  }
+}
+
+@Override
+public int getNextIndex() {
+  int val = super.getNextIndex();
+  if (val == -1) {
+return val;
+  }
+  return 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001129#comment-17001129
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360525569
 
 

 ##
 File path: 
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/xsort/TestSortImpl.java
 ##
 @@ -33,32 +33,31 @@
 import org.apache.drill.exec.ops.OperatorContext;
 import org.apache.drill.exec.physical.config.Sort;
 import org.apache.drill.exec.physical.impl.spill.SpillSet;
-import org.apache.drill.exec.physical.impl.xsort.managed.SortImpl.SortResults;
+import org.apache.drill.exec.physical.impl.xsort.SortImpl.SortResults;
 
 Review comment:
   please avoid imports reordering when not required, this may cause merge 
conflicts for other contributors.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001131#comment-17001131
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360508494
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/MSortTemplate.java
 ##
 @@ -89,15 +98,28 @@ public void setup(final FragmentContext context, final 
BufferAllocator allocator
* ExternalSortBatch to make decisions about whether to spill or not.
*
* @param recordCount
-   * @return The amount of memory MSorter needs for a given record count.
+   * @return
*/
   public static long memoryNeeded(final int recordCount) {
-// We need 4 bytes (SV4) for each record, power of 2 rounded.
+// We need 4 bytes (SV4) for each record.
+// The memory allocator will round this to the next
+// power of 2.
 
 return BaseAllocator.nextPowerOfTwo(recordCount * 4);
   }
 
-  private int merge(final int leftStart, final int rightStart, final int 
rightEnd, final int outStart) {
+  /**
+   * Given two regions within the selection vector 4 (a left and a right), 
merge
+   * the two regions to produce a combined output region in the auxiliary
+   * selection vector.
+   *
+   * @param leftStart
+   * @param rightStart
+   * @param rightEnd
+   * @param outStart
+   * @return
 
 Review comment:
   please finish the javadoc
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001145#comment-17001145
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360427244
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
 
 Review comment:
   ```suggestion
   
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001103#comment-17001103
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360397128
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -221,4 +363,19 @@ public SelectionVector4 getSelectionVector4() {
 throw new UnsupportedOperationException();
   }
 
+  public static void closeAll(Collection groups) {
 
 Review comment:
   Please remove the method and use one from ```AutoCloseables```. Note the 
util methods in `AutoCloseables` might be extended to not throw checked 
exception but accept function or consumer which will convert to runtime.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001113#comment-17001113
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360394111
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
+private final SelectionVector2 sv2;
+private final long dataSize;
+
+public InputBatch(VectorContainer container, SelectionVector2 sv2, 
BufferAllocator allocator, long dataSize) {
+  super(container, allocator);
+  this.sv2 = sv2;
+  this.dataSize = dataSize;
+}
 
-  public SelectionVector2 getSv2() {
-return sv2;
+public SelectionVector2 getSv2() { return sv2; }
+
+public long getDataSize() { return dataSize; }
+
+@Override
+public int getRecordCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else {
+return super.getRecordCount();
+  }
+}
+
+@Override
+public int getNextIndex() {
+  int val = super.getNextIndex();
+  if (val == -1) {
+return val;
+  }
+  return 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001142#comment-17001142
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360439104
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001121#comment-17001121
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360494754
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001139#comment-17001139
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360516350
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/MSorter.java
 ##
 @@ -24,11 +24,15 @@
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
-// TODO:  Doc.:  What's an MSorter?  A sorter for merge join?  something else?
-// (What's the "M" part?  Actually, rename interface to clearer.
+/**
+ * In-memory sorter. Takes a list of batches as input, produces a selection
+ * vector 4, with sorted results, as output.
+ */
+
 public interface MSorter {
-  public void setup(FragmentContext context, BufferAllocator allocator, 
SelectionVector4 vector4, VectorContainer hyperBatch) throws 
SchemaChangeException;
-  public void sort(VectorContainer container);
+  public void setup(FragmentContext context, BufferAllocator allocator, 
SelectionVector4 vector4,
 
 Review comment:
   Please remove redundant ```public``` and ```static``` modifiers and move 
constant ```TEMPLATE_DEFINITION``` to top, above methods. 
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001105#comment-17001105
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360365959
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/BatchGroup.java
 ##
 @@ -18,140 +18,300 @@
 package org.apache.drill.exec.physical.impl.xsort;
 
 import java.io.IOException;
+import java.io.InputStream;
+import java.util.Collection;
 import java.util.Iterator;
 import java.util.concurrent.TimeUnit;
 
+import org.apache.drill.common.exceptions.UserException;
 import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.exec.cache.VectorAccessibleSerializable;
+import org.apache.drill.exec.cache.VectorSerializer;
+import org.apache.drill.exec.cache.VectorSerializer.Writer;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.OperatorContext;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.SchemaUtil;
 import org.apache.drill.exec.record.TransferPair;
 import org.apache.drill.exec.record.TypedFieldId;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
-import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.hadoop.fs.FSDataInputStream;
-import org.apache.hadoop.fs.FSDataOutputStream;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
 
-public class BatchGroup implements VectorAccessible, AutoCloseable {
+/**
+ * Represents a group of batches spilled to disk.
+ * 
+ * The batches are defined by a schema which can change over time. When the 
schema changes,
+ * all existing and new batches are coerced into the new schema. Provides a
+ * uniform way to iterate over records for one or more batches whether
+ * the batches are in memory or on disk.
+ * 
+ * The BatchGroup operates in two modes as given by the two
+ * subclasses:
+ * 
+ * Input mode (@link InputBatchGroup): Used to buffer in-memory batches
+ * prior to spilling.
+ * Spill mode (@link SpilledBatchGroup): Holds a "memento" to a set
+ * of batches written to disk. Acts as both a reader and writer for
+ * those batches.
+ */
+
+public abstract class BatchGroup implements VectorAccessible, AutoCloseable {
   static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(BatchGroup.class);
 
-  private VectorContainer currentContainer;
-  private SelectionVector2 sv2;
-  private int pointer = 0;
-  private FSDataInputStream inputStream;
-  private FSDataOutputStream outputStream;
-  private Path path;
-  private FileSystem fs;
-  private BufferAllocator allocator;
-  private int spilledBatches = 0;
-  private OperatorContext context;
-  private BatchSchema schema;
-
-  public BatchGroup(VectorContainer container, SelectionVector2 sv2, 
OperatorContext context) {
-this.sv2 = sv2;
-this.currentContainer = container;
-this.context = context;
-  }
+  /**
+   * The input batch group gathers batches buffered in memory before
+   * spilling. The structure of the data is:
+   * 
+   * Contains a single batch received from the upstream (input)
+   * operator.
+   * Associated selection vector that provides a sorted
+   * indirection to the values in the batch.
+   * 
+   */
 
-  public BatchGroup(VectorContainer container, FileSystem fs, String path, 
OperatorContext context) {
-currentContainer = container;
-this.fs = fs;
-this.path = new Path(path);
-this.allocator = context.getAllocator();
-this.context = context;
-  }
+  public static class InputBatch extends BatchGroup {
+private final SelectionVector2 sv2;
+private final long dataSize;
+
+public InputBatch(VectorContainer container, SelectionVector2 sv2, 
BufferAllocator allocator, long dataSize) {
+  super(container, allocator);
+  this.sv2 = sv2;
+  this.dataSize = dataSize;
+}
 
-  public SelectionVector2 getSv2() {
-return sv2;
+public SelectionVector2 getSv2() { return sv2; }
+
+public long getDataSize() { return dataSize; }
+
+@Override
+public int getRecordCount() {
+  if (sv2 != null) {
+return sv2.getCount();
+  } else {
+return super.getRecordCount();
+  }
+}
+
+@Override
+public int getNextIndex() {
+  int val = super.getNextIndex();
+  if (val == -1) {
+return val;
+  }
+  return 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001114#comment-17001114
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360443790
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001132#comment-17001132
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360425824
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -17,250 +17,255 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import java.io.IOException;
-import java.util.Collection;
-import java.util.Iterator;
-import java.util.LinkedList;
-import java.util.List;
-import java.util.Set;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
-import org.apache.calcite.rel.RelFieldCollation.Direction;
-import org.apache.drill.common.AutoCloseables;
-import org.apache.drill.common.config.DrillConfig;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK;
+import static 
org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP;
+
 import org.apache.drill.common.exceptions.UserException;
-import org.apache.drill.common.expression.ErrorCollector;
-import org.apache.drill.common.expression.ErrorCollectorImpl;
-import org.apache.drill.common.expression.LogicalExpression;
-import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.common.logical.data.Order.Ordering;
-import org.apache.drill.exec.ExecConstants;
-import org.apache.drill.exec.compile.sig.GeneratorMapping;
-import org.apache.drill.exec.compile.sig.MappingSet;
-import org.apache.drill.exec.exception.ClassTransformationException;
-import org.apache.drill.exec.exception.OutOfMemoryException;
-import org.apache.drill.exec.exception.SchemaChangeException;
-import org.apache.drill.exec.expr.ClassGenerator;
-import org.apache.drill.exec.expr.ClassGenerator.HoldingContainer;
-import org.apache.drill.exec.expr.CodeGenerator;
-import org.apache.drill.exec.expr.ExpressionTreeMaterializer;
-import org.apache.drill.exec.expr.TypeHelper;
-import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
-import org.apache.drill.exec.memory.BufferAllocator;
 import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.ops.MetricDef;
 import org.apache.drill.exec.physical.config.ExternalSort;
-import org.apache.drill.exec.physical.impl.sort.RecordBatchData;
-import org.apache.drill.exec.physical.impl.sort.SortRecordBatchBuilder;
-import org.apache.drill.exec.proto.ExecProtos.FragmentHandle;
-import org.apache.drill.exec.proto.helper.QueryIdHelper;
+import org.apache.drill.exec.physical.impl.spill.SpillSet;
+import 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator;
+import org.apache.drill.exec.physical.impl.xsort.SortImpl.SortResults;
 import org.apache.drill.exec.record.AbstractRecordBatch;
 import org.apache.drill.exec.record.BatchSchema;
 import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
-import org.apache.drill.exec.record.MaterializedField;
 import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.SchemaUtil;
-import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.VectorWrapper;
 import org.apache.drill.exec.record.WritableBatch;
 import org.apache.drill.exec.record.selection.SelectionVector2;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 import org.apache.drill.exec.testing.ControlsInjector;
 import org.apache.drill.exec.testing.ControlsInjectorFactory;
-import org.apache.drill.exec.vector.CopyUtil;
 import org.apache.drill.exec.vector.ValueVector;
 import org.apache.drill.exec.vector.complex.AbstractContainerVector;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-
-import org.apache.drill.shaded.guava.com.google.common.base.Joiner;
-import org.apache.drill.shaded.guava.com.google.common.base.Stopwatch;
-import org.apache.drill.shaded.guava.com.google.common.collect.Iterators;
-import org.apache.drill.shaded.guava.com.google.common.collect.Lists;
-import com.sun.codemodel.JConditional;
-import com.sun.codemodel.JExpr;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * External sort batch: a sort batch which can spill to disk in
+ * order to operate within a defined memory footprint.
+ * 
+ * Basic Operation
+ * The operator has three key phases:
+ * 
+ * 
+ * The load phase in which batches are read from upstream.
+ * The merge phase in which spilled batches are 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001147#comment-17001147
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360485448
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001110#comment-17001110
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360475113
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 ##
 @@ -286,540 +291,419 @@ public void buildSchema() throws SchemaChangeException 
{
 state = BatchState.DONE;
 break;
   default:
-break;
+throw new IllegalStateException("Unexpected iter outcome: " + outcome);
 }
   }
 
+  /**
+   * Process each request for a batch. The first request retrieves
+   * all the incoming batches and sorts them, optionally spilling to
+   * disk as needed. Subsequent calls retrieve the sorted results in
+   * fixed-size batches.
+   */
+
   @Override
   public IterOutcome innerNext() {
-if (schema != null) {
-  if (spillCount == 0) {
-return (getSelectionVector4().next()) ? IterOutcome.OK : 
IterOutcome.NONE;
-  } else {
-Stopwatch w = Stopwatch.createStarted();
-int count = copier.next(targetRecordCount);
-if (count > 0) {
-  long t = w.elapsed(TimeUnit.MICROSECONDS);
-  logger.debug("Took {} us to merge {} records", t, count);
-  container.setRecordCount(count);
-  return IterOutcome.OK;
-} else {
-  logger.debug("copier returned 0 records");
-  return IterOutcome.NONE;
-}
+switch (sortState) {
+case DONE:
+  return NONE;
+case START:
+  return load();
+case LOAD:
+  if (!this.retainInMemoryBatchesOnNone) {
+resetSortState();
   }
+  return (sortState == SortState.DONE) ? NONE : load();
+case DELIVER:
+  return nextOutputBatch();
+default:
+  throw new IllegalStateException("Unexpected sort state: " + sortState);
 }
+  }
 
-int totalCount = 0;
-int totalBatches = 0; // total number of batches received so far
+  private IterOutcome nextOutputBatch() {
+// Call next on outputSV4 for it's state to progress in parallel to 
resultsIterator state
+outputSV4.next();
 
-try{
-  container.clear();
-  outer: while (true) {
-IterOutcome upstream;
-if (first) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-} else {
-  upstream = next(incoming);
-}
-if (upstream == IterOutcome.OK && sorter == null) {
-  upstream = IterOutcome.OK_NEW_SCHEMA;
-}
-switch (upstream) {
-case NONE:
-  if (first) {
-return upstream;
-  }
-  break outer;
-case NOT_YET:
-  throw new UnsupportedOperationException();
-case STOP:
-  return upstream;
-case OK_NEW_SCHEMA:
-case OK:
-  VectorContainer convertedBatch;
-  // only change in the case that the schema truly changes.  
Artificial schema changes are ignored.
-  if (upstream == IterOutcome.OK_NEW_SCHEMA && 
!incoming.getSchema().equals(schema)) {
-if (schema != null) {
-  if (unionTypeEnabled) {
-this.schema = SchemaUtil.mergeSchemas(schema, 
incoming.getSchema());
-  } else {
-throw new SchemaChangeException("Schema changes not supported 
in External Sort. Please enable Union type");
-  }
-} else {
-  schema = incoming.getSchema();
-}
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-for (BatchGroup b : batchGroups) {
-  b.setSchema(schema);
-}
-for (BatchGroup b : spilledBatchGroups) {
-  b.setSchema(schema);
-}
-this.sorter = createNewSorter(context, convertedBatch);
-  } else {
-convertedBatch = SchemaUtil.coerceContainer(incoming, schema, 
oContext);
-  }
-  if (first) {
-first = false;
-  }
-  if (convertedBatch.getRecordCount() == 0) {
-for (VectorWrapper w : convertedBatch) {
-  w.clear();
-}
-break;
-  }
-  SelectionVector2 sv2;
-  if (incoming.getSchema().getSelectionVectorMode() == 
BatchSchema.SelectionVectorMode.TWO_BYTE) {
-sv2 = incoming.getSelectionVector2().clone();
-  } else {
-try {
-  sv2 = newSV2();
-} catch(InterruptedException e) {
-  return IterOutcome.STOP;
-} catch (OutOfMemoryException e) {
-  throw new OutOfMemoryException(e);
-}
-  }
+// But if results iterator next returns true that means it has 

[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001137#comment-17001137
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360507467
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/MSortTemplate.java
 ##
 @@ -17,48 +17,59 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import com.typesafe.config.ConfigException;
-import io.netty.buffer.DrillBuf;
-
 import java.util.Queue;
 
 import javax.inject.Named;
 
-import org.apache.drill.exec.ExecConstants;
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BaseAllocator;
 import org.apache.drill.exec.memory.BufferAllocator;
 import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.RecordBatch;
 import org.apache.drill.exec.record.VectorContainer;
 import org.apache.drill.exec.record.selection.SelectionVector4;
-import org.apache.drill.exec.vector.ValueVector;
 import org.apache.hadoop.util.IndexedSortable;
 
 import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
 import org.apache.drill.shaded.guava.com.google.common.collect.Queues;
 
+import io.netty.buffer.DrillBuf;
+
 public abstract class MSortTemplate implements MSorter, IndexedSortable {
 //  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MSortTemplate.class);
 
   private SelectionVector4 vector4;
   private SelectionVector4 aux;
+  @SuppressWarnings("unused")
+  private long compares;
+
+  /**
+   * Holds offsets into the SV4 of the start of each batch
+   * (sorted run.)
+   */
+
 
 Review comment:
   ```suggestion
   
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-6832) Remove old "unmanaged" sort implementation

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17001124#comment-17001124
 ] 

ASF GitHub Bot commented on DRILL-6832:
---

ihuzenko commented on pull request #1929: DRILL-6832: Remove the old 
"unmanaged" external sort
URL: https://github.com/apache/drill/pull/1929#discussion_r360518409
 
 

 ##
 File path: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/PriorityQueueCopierTemplate.java
 ##
 @@ -17,94 +17,95 @@
  */
 package org.apache.drill.exec.physical.impl.xsort;
 
-import io.netty.buffer.DrillBuf;
-
 import java.io.IOException;
 import java.util.List;
 
 import javax.inject.Named;
 
 import org.apache.drill.exec.exception.SchemaChangeException;
 import org.apache.drill.exec.memory.BufferAllocator;
-import org.apache.drill.exec.ops.FragmentContext;
 import org.apache.drill.exec.record.VectorAccessible;
 import org.apache.drill.exec.record.VectorAccessibleUtilities;
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
+import io.netty.buffer.DrillBuf;
 
 public abstract class PriorityQueueCopierTemplate implements 
PriorityQueueCopier {
+//  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(PriorityQueueCopierTemplate.class);
 
   private SelectionVector4 vector4;
   private List batchGroups;
   private VectorAccessible hyperBatch;
   private VectorAccessible outgoing;
   private int size;
-  private int queueSize;
+  private int queueSize = 0;
 
 Review comment:
   ```suggestion
 private int queueSize;
   ```
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Remove old "unmanaged" sort implementation
> --
>
> Key: DRILL-6832
> URL: https://issues.apache.org/jira/browse/DRILL-6832
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Minor
>
> Several releases back Drill introduced a new "managed" external sort that 
> enhanced the sort operator's memory management. To be safe, at the time, the 
> new version was controlled by an option, with the ability to revert to the 
> old version.
> The new version has proven to be stable. The time has come to remove the old 
> version.
> * Remove the implementation in {{physical.impl.xsort}}.
> * Move the implementation from {{physical.impl.xsort.managed}} to the parent 
> package.
> * Remove the conditional code in the batch creator.
> * Remove the option that allowed disabling the new version.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (DRILL-7483) Add support for 12 and 13 java versions

2019-12-20 Thread Vova Vysotskyi (Jira)


 [ 
https://issues.apache.org/jira/browse/DRILL-7483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Vysotskyi updated DRILL-7483:
--
Issue Type: Task  (was: Improvement)

> Add support for 12 and 13 java versions
> ---
>
> Key: DRILL-7483
> URL: https://issues.apache.org/jira/browse/DRILL-7483
> Project: Apache Drill
>  Issue Type: Task
>  Components: Tools, Build  Test
>Affects Versions: 1.16.0
>Reporter: Vitalii Diravka
>Assignee: Vova Vysotskyi
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> The latest versions are Java 13, released in October 2019
> {code:java}
> [INFO] --- maven-enforcer-plugin:3.0.0-M2:enforce 
> (validate_java_and_maven_version) @ drill-root ---
> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.RequireJavaVersion failed 
> with message:
> Detected JDK Version: 13.0.1 is not in the allowed range [1.8,12)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7230) Add README.md with instructions for release

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000933#comment-17000933
 ] 

ASF GitHub Bot commented on DRILL-7230:
---

vvysotskyi commented on pull request #1937: DRILL-7230: Add README.md with 
instructions for release and release scripts
URL: https://github.com/apache/drill/pull/1937#discussion_r360362371
 
 

 ##
 File path: docs/dev/Release.md
 ##
 @@ -0,0 +1,365 @@
+# Drill Release Process
+
+1. ## Setup release final cut-off date
+
+1. Start mail thread on Apache dev list with the subject: 
+```
+   [DISCUSS] Drill 1.17.0 release
+```
+
+Body example:
+```
+Hi Drillers,
+ 
+It's been several months since the last release and it is time to do 
the next one. I am volunteering to be the release manager.
+ 
+If there are any issues on which work is in progress, that you feel we 
*must* include in the release, please post in reply to this thread. Based on 
your input we'll define release cut off date.
+ 
+Kind regards
+ 
+Thanks
+```
+2. Gather list of Jiras that need to be included in the release, discuss 
possible blockers.
+3. Identify final cut-off date.
+4. Before starting the release send a reminder to dev list that merge in 
Apache Drill master is forbidden during
+ release.
+
+Example:
+```
+Note for the committers:
+until the release is not over and Drill version is not changed to 
1.17.0-SNAPSHOT, please do not push any
+changes into Drill master.
+```
+ 
+2. ## Setup environment:
+1. ### GPG key:
+You will need a GPG key set up. This key is needed for signing the 
release.
+1. Read [Apache docs for release 
signing](http://www.apache.org/dev/release-signing.html).
+2. Install the `gpg2` package on Ubuntu, or `gnupg` on MacOS.
+3. Generate GPG key using the **Apache email address** if it wasn't 
done before:
+1. Run `gpg --gen-key` and follow the instructions.
+2. Remember your passphrase.
+3. Save your key pair:
+```
+   gpg --output mygpgkey_pub.gpg --armor --export MY_KEY_ID
+   gpg --output mygpgkey_sec.gpg --armor --export-secret-key 
MY_KEY_ID
+   ```
+4. Or import existing GPG key pair if required:
+```
+gpg --import mygpgkey_pub.gpg
+gpg --allow-secret-key-import --import mygpgkey_sec.gpg
+```
+5. Have another committer signed your key (add to the trust chain).
+Apache advises to do it at 
+[key signing 
parties](https://www.apache.org/dev/release-signing.html#key-signing-party).
+6. Make sure default key is the key generated using the Apache email.
+7. Publish your public key to a public server (e.g. 
http://pgp.surfnet.nl or http://pgp.mit.edu)
+using Web-UIs or CLI (for example using the following command:
+`gpg --keyserver hkp://pool.sks-keyservers.net --trust-model 
always --send-keys`)
+8. Add your key signature to the `KEYS` file in the Drill sources repo.
+Instruction on how to create the key signature is embedded at the 
beginning of the KEYS file.
+9. Add your key signature to the `KEYS` file in 
`https://dist.apache.org` (PMC permissions required for this):
+- `svn co https://dist.apache.org/repos/dist/release/drill 
~/src/release/drill-dist`
+- Update `KEYS` file with required changes
+- Commit changes: `svn ci -m "Add FirstName LastName's key to KEYS 
file"`
+2. ### Setting up LDAP credentials
+1. Go to http://id.apache.org and log in using your Apache login and 
password.
+2. Scroll to the bottom to the 'SSH Key' entry and add the SSH key 
from your `~/.ssh/id_rsa.pub` file.
+If you don't have the SSH key, you should generate it by running 
`ssh-keygen`.
+3. Note that you can add more than 1 SSH key corresponding to multiple 
machines.
+4. Enter your Apache password and submit changes.
+5. Verify that you can do an sftp to the apache server by running the 
following: `sftp @home.apache.org`
+
+3. ### SVN
 
 Review comment:
   Moved this section above `GPG key` section. Added instruction on how to 
install the `svn` client. Regarding writable access to `dist.apache.org`, there 
are no additional things to do for PMC.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add README.md with instructions for release

[jira] [Commented] (DRILL-7230) Add README.md with instructions for release

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000935#comment-17000935
 ] 

ASF GitHub Bot commented on DRILL-7230:
---

vvysotskyi commented on pull request #1937: DRILL-7230: Add README.md with 
instructions for release and release scripts
URL: https://github.com/apache/drill/pull/1937#discussion_r360368151
 
 

 ##
 File path: docs/dev/Release.md
 ##
 @@ -0,0 +1,365 @@
+# Drill Release Process
+
+1. ## Setup release final cut-off date
+
+1. Start mail thread on Apache dev list with the subject: 
+```
+   [DISCUSS] Drill 1.17.0 release
+```
+
+Body example:
+```
+Hi Drillers,
+ 
+It's been several months since the last release and it is time to do 
the next one. I am volunteering to be the release manager.
+ 
+If there are any issues on which work is in progress, that you feel we 
*must* include in the release, please post in reply to this thread. Based on 
your input we'll define release cut off date.
+ 
+Kind regards
+ 
+Thanks
+```
+2. Gather list of Jiras that need to be included in the release, discuss 
possible blockers.
+3. Identify final cut-off date.
+4. Before starting the release send a reminder to dev list that merge in 
Apache Drill master is forbidden during
+ release.
+
+Example:
+```
+Note for the committers:
+until the release is not over and Drill version is not changed to 
1.17.0-SNAPSHOT, please do not push any
+changes into Drill master.
+```
+ 
+2. ## Setup environment:
+1. ### GPG key:
+You will need a GPG key set up. This key is needed for signing the 
release.
+1. Read [Apache docs for release 
signing](http://www.apache.org/dev/release-signing.html).
+2. Install the `gpg2` package on Ubuntu, or `gnupg` on MacOS.
+3. Generate GPG key using the **Apache email address** if it wasn't 
done before:
+1. Run `gpg --gen-key` and follow the instructions.
+2. Remember your passphrase.
+3. Save your key pair:
+```
+   gpg --output mygpgkey_pub.gpg --armor --export MY_KEY_ID
+   gpg --output mygpgkey_sec.gpg --armor --export-secret-key 
MY_KEY_ID
+   ```
+4. Or import existing GPG key pair if required:
+```
+gpg --import mygpgkey_pub.gpg
+gpg --allow-secret-key-import --import mygpgkey_sec.gpg
+```
+5. Have another committer signed your key (add to the trust chain).
+Apache advises to do it at 
+[key signing 
parties](https://www.apache.org/dev/release-signing.html#key-signing-party).
+6. Make sure default key is the key generated using the Apache email.
+7. Publish your public key to a public server (e.g. 
http://pgp.surfnet.nl or http://pgp.mit.edu)
+using Web-UIs or CLI (for example using the following command:
+`gpg --keyserver hkp://pool.sks-keyservers.net --trust-model 
always --send-keys`)
+8. Add your key signature to the `KEYS` file in the Drill sources repo.
+Instruction on how to create the key signature is embedded at the 
beginning of the KEYS file.
+9. Add your key signature to the `KEYS` file in 
`https://dist.apache.org` (PMC permissions required for this):
+- `svn co https://dist.apache.org/repos/dist/release/drill 
~/src/release/drill-dist`
+- Update `KEYS` file with required changes
+- Commit changes: `svn ci -m "Add FirstName LastName's key to KEYS 
file"`
+2. ### Setting up LDAP credentials
+1. Go to http://id.apache.org and log in using your Apache login and 
password.
+2. Scroll to the bottom to the 'SSH Key' entry and add the SSH key 
from your `~/.ssh/id_rsa.pub` file.
+If you don't have the SSH key, you should generate it by running 
`ssh-keygen`.
+3. Note that you can add more than 1 SSH key corresponding to multiple 
machines.
+4. Enter your Apache password and submit changes.
+5. Verify that you can do an sftp to the apache server by running the 
following: `sftp @home.apache.org`
+
+3. ### SVN
+You also need access to Apache SVN. (You need to be a PMC member for 
this).
+4. ### Setup Maven
+1. Apache's Maven repository access is documented here:
+http://www.apache.org/dev/publishing-maven-artifacts.html
+http://www.apache.org/dev/publishing-maven-artifacts.html#dev-env.
+2. For deploying maven artifacts, you must add entries to your 
`~/.m2/settings.xml` file as described here:
+http://www.apache.org/dev/publishing-maven-artifacts.html#dev-env.
+3. In order to encrypt the LDAP password, see here:

[jira] [Commented] (DRILL-7230) Add README.md with instructions for release

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000934#comment-17000934
 ] 

ASF GitHub Bot commented on DRILL-7230:
---

vvysotskyi commented on pull request #1937: DRILL-7230: Add README.md with 
instructions for release and release scripts
URL: https://github.com/apache/drill/pull/1937#discussion_r360366168
 
 

 ##
 File path: docs/dev/Release.md
 ##
 @@ -0,0 +1,365 @@
+# Drill Release Process
+
+1. ## Setup release final cut-off date
+
+1. Start mail thread on Apache dev list with the subject: 
+```
+   [DISCUSS] Drill 1.17.0 release
+```
+
+Body example:
+```
+Hi Drillers,
+ 
+It's been several months since the last release and it is time to do 
the next one. I am volunteering to be the release manager.
+ 
+If there are any issues on which work is in progress, that you feel we 
*must* include in the release, please post in reply to this thread. Based on 
your input we'll define release cut off date.
+ 
+Kind regards
+ 
+Thanks
+```
+2. Gather list of Jiras that need to be included in the release, discuss 
possible blockers.
+3. Identify final cut-off date.
+4. Before starting the release send a reminder to dev list that merge in 
Apache Drill master is forbidden during
+ release.
+
+Example:
+```
+Note for the committers:
+until the release is not over and Drill version is not changed to 
1.17.0-SNAPSHOT, please do not push any
+changes into Drill master.
+```
+ 
+2. ## Setup environment:
+1. ### GPG key:
+You will need a GPG key set up. This key is needed for signing the 
release.
+1. Read [Apache docs for release 
signing](http://www.apache.org/dev/release-signing.html).
+2. Install the `gpg2` package on Ubuntu, or `gnupg` on MacOS.
+3. Generate GPG key using the **Apache email address** if it wasn't 
done before:
+1. Run `gpg --gen-key` and follow the instructions.
+2. Remember your passphrase.
+3. Save your key pair:
+```
+   gpg --output mygpgkey_pub.gpg --armor --export MY_KEY_ID
+   gpg --output mygpgkey_sec.gpg --armor --export-secret-key 
MY_KEY_ID
+   ```
+4. Or import existing GPG key pair if required:
+```
+gpg --import mygpgkey_pub.gpg
+gpg --allow-secret-key-import --import mygpgkey_sec.gpg
+```
+5. Have another committer signed your key (add to the trust chain).
+Apache advises to do it at 
+[key signing 
parties](https://www.apache.org/dev/release-signing.html#key-signing-party).
+6. Make sure default key is the key generated using the Apache email.
+7. Publish your public key to a public server (e.g. 
http://pgp.surfnet.nl or http://pgp.mit.edu)
+using Web-UIs or CLI (for example using the following command:
+`gpg --keyserver hkp://pool.sks-keyservers.net --trust-model 
always --send-keys`)
+8. Add your key signature to the `KEYS` file in the Drill sources repo.
+Instruction on how to create the key signature is embedded at the 
beginning of the KEYS file.
+9. Add your key signature to the `KEYS` file in 
`https://dist.apache.org` (PMC permissions required for this):
+- `svn co https://dist.apache.org/repos/dist/release/drill 
~/src/release/drill-dist`
+- Update `KEYS` file with required changes
+- Commit changes: `svn ci -m "Add FirstName LastName's key to KEYS 
file"`
+2. ### Setting up LDAP credentials
+1. Go to http://id.apache.org and log in using your Apache login and 
password.
+2. Scroll to the bottom to the 'SSH Key' entry and add the SSH key 
from your `~/.ssh/id_rsa.pub` file.
+If you don't have the SSH key, you should generate it by running 
`ssh-keygen`.
+3. Note that you can add more than 1 SSH key corresponding to multiple 
machines.
+4. Enter your Apache password and submit changes.
+5. Verify that you can do an sftp to the apache server by running the 
following: `sftp @home.apache.org`
+
+3. ### SVN
+You also need access to Apache SVN. (You need to be a PMC member for 
this).
+4. ### Setup Maven
+1. Apache's Maven repository access is documented here:
+http://www.apache.org/dev/publishing-maven-artifacts.html
+http://www.apache.org/dev/publishing-maven-artifacts.html#dev-env.
+2. For deploying maven artifacts, you must add entries to your 
`~/.m2/settings.xml` file as described here:
+http://www.apache.org/dev/publishing-maven-artifacts.html#dev-env.
+3. In order to encrypt the LDAP password, see here:

[jira] [Commented] (DRILL-7230) Add README.md with instructions for release

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000932#comment-17000932
 ] 

ASF GitHub Bot commented on DRILL-7230:
---

vvysotskyi commented on pull request #1937: DRILL-7230: Add README.md with 
instructions for release and release scripts
URL: https://github.com/apache/drill/pull/1937#discussion_r359998793
 
 

 ##
 File path: docs/dev/Release.md
 ##
 @@ -0,0 +1,365 @@
+# Drill Release Process
+
+1. ## Setup release final cut-off date
+
+1. Start mail thread on Apache dev list with the subject: 
+```
+   [DISCUSS] Drill 1.17.0 release
 
 Review comment:
   Thanks, done.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add README.md with instructions for release
> ---
>
> Key: DRILL-7230
> URL: https://issues.apache.org/jira/browse/DRILL-7230
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build  Test
>Reporter: Sorabh Hamirwasia
>Assignee: Vova Vysotskyi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7230) Add README.md with instructions for release

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000936#comment-17000936
 ] 

ASF GitHub Bot commented on DRILL-7230:
---

vvysotskyi commented on pull request #1937: DRILL-7230: Add README.md with 
instructions for release and release scripts
URL: https://github.com/apache/drill/pull/1937#discussion_r360362502
 
 

 ##
 File path: docs/dev/Release.md
 ##
 @@ -0,0 +1,365 @@
+# Drill Release Process
+
+1. ## Setup release final cut-off date
+
+1. Start mail thread on Apache dev list with the subject: 
+```
+   [DISCUSS] Drill 1.17.0 release
+```
+
+Body example:
+```
+Hi Drillers,
+ 
+It's been several months since the last release and it is time to do 
the next one. I am volunteering to be the release manager.
+ 
+If there are any issues on which work is in progress, that you feel we 
*must* include in the release, please post in reply to this thread. Based on 
your input we'll define release cut off date.
+ 
+Kind regards
+ 
+Thanks
+```
+2. Gather list of Jiras that need to be included in the release, discuss 
possible blockers.
+3. Identify final cut-off date.
+4. Before starting the release send a reminder to dev list that merge in 
Apache Drill master is forbidden during
+ release.
+
+Example:
+```
+Note for the committers:
+until the release is not over and Drill version is not changed to 
1.17.0-SNAPSHOT, please do not push any
+changes into Drill master.
+```
+ 
+2. ## Setup environment:
+1. ### GPG key:
+You will need a GPG key set up. This key is needed for signing the 
release.
+1. Read [Apache docs for release 
signing](http://www.apache.org/dev/release-signing.html).
+2. Install the `gpg2` package on Ubuntu, or `gnupg` on MacOS.
+3. Generate GPG key using the **Apache email address** if it wasn't 
done before:
+1. Run `gpg --gen-key` and follow the instructions.
+2. Remember your passphrase.
+3. Save your key pair:
+```
+   gpg --output mygpgkey_pub.gpg --armor --export MY_KEY_ID
+   gpg --output mygpgkey_sec.gpg --armor --export-secret-key 
MY_KEY_ID
+   ```
+4. Or import existing GPG key pair if required:
+```
+gpg --import mygpgkey_pub.gpg
+gpg --allow-secret-key-import --import mygpgkey_sec.gpg
+```
+5. Have another committer signed your key (add to the trust chain).
+Apache advises to do it at 
+[key signing 
parties](https://www.apache.org/dev/release-signing.html#key-signing-party).
+6. Make sure default key is the key generated using the Apache email.
+7. Publish your public key to a public server (e.g. 
http://pgp.surfnet.nl or http://pgp.mit.edu)
+using Web-UIs or CLI (for example using the following command:
+`gpg --keyserver hkp://pool.sks-keyservers.net --trust-model 
always --send-keys`)
+8. Add your key signature to the `KEYS` file in the Drill sources repo.
+Instruction on how to create the key signature is embedded at the 
beginning of the KEYS file.
+9. Add your key signature to the `KEYS` file in 
`https://dist.apache.org` (PMC permissions required for this):
+- `svn co https://dist.apache.org/repos/dist/release/drill 
~/src/release/drill-dist`
+- Update `KEYS` file with required changes
+- Commit changes: `svn ci -m "Add FirstName LastName's key to KEYS 
file"`
+2. ### Setting up LDAP credentials
+1. Go to http://id.apache.org and log in using your Apache login and 
password.
+2. Scroll to the bottom to the 'SSH Key' entry and add the SSH key 
from your `~/.ssh/id_rsa.pub` file.
+If you don't have the SSH key, you should generate it by running 
`ssh-keygen`.
+3. Note that you can add more than 1 SSH key corresponding to multiple 
machines.
+4. Enter your Apache password and submit changes.
+5. Verify that you can do an sftp to the apache server by running the 
following: `sftp @home.apache.org`
+
+3. ### SVN
+You also need access to Apache SVN. (You need to be a PMC member for 
this).
+4. ### Setup Maven
+1. Apache's Maven repository access is documented here:
+http://www.apache.org/dev/publishing-maven-artifacts.html
+http://www.apache.org/dev/publishing-maven-artifacts.html#dev-env.
+2. For deploying maven artifacts, you must add entries to your 
`~/.m2/settings.xml` file as described here:
+http://www.apache.org/dev/publishing-maven-artifacts.html#dev-env.
+3. In order to encrypt the LDAP password, see here:

[jira] [Commented] (DRILL-7230) Add README.md with instructions for release

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000931#comment-17000931
 ] 

ASF GitHub Bot commented on DRILL-7230:
---

vvysotskyi commented on pull request #1937: DRILL-7230: Add README.md with 
instructions for release and release scripts
URL: https://github.com/apache/drill/pull/1937#discussion_r35029
 
 

 ##
 File path: docs/dev/Release.md
 ##
 @@ -0,0 +1,365 @@
+# Drill Release Process
+
+1. ## Setup release final cut-off date
+
+1. Start mail thread on Apache dev list with the subject: 
+```
+   [DISCUSS] Drill 1.17.0 release
+```
+
+Body example:
+```
+Hi Drillers,
+ 
+It's been several months since the last release and it is time to do 
the next one. I am volunteering to be the release manager.
+ 
+If there are any issues on which work is in progress, that you feel we 
*must* include in the release, please post in reply to this thread. Based on 
your input we'll define release cut off date.
+ 
+Kind regards
+ 
+Thanks
+```
+2. Gather list of Jiras that need to be included in the release, discuss 
possible blockers.
+3. Identify final cut-off date.
+4. Before starting the release send a reminder to dev list that merge in 
Apache Drill master is forbidden during
+ release.
+
+Example:
+```
+Note for the committers:
+until the release is not over and Drill version is not changed to 
1.17.0-SNAPSHOT, please do not push any
+changes into Drill master.
+```
+ 
+2. ## Setup environment:
+1. ### GPG key:
+You will need a GPG key set up. This key is needed for signing the 
release.
+1. Read [Apache docs for release 
signing](http://www.apache.org/dev/release-signing.html).
+2. Install the `gpg2` package on Ubuntu, or `gnupg` on MacOS.
+3. Generate GPG key using the **Apache email address** if it wasn't 
done before:
+1. Run `gpg --gen-key` and follow the instructions.
+2. Remember your passphrase.
+3. Save your key pair:
+```
+   gpg --output mygpgkey_pub.gpg --armor --export MY_KEY_ID
+   gpg --output mygpgkey_sec.gpg --armor --export-secret-key 
MY_KEY_ID
+   ```
+4. Or import existing GPG key pair if required:
+```
+gpg --import mygpgkey_pub.gpg
+gpg --allow-secret-key-import --import mygpgkey_sec.gpg
+```
+5. Have another committer signed your key (add to the trust chain).
+Apache advises to do it at 
+[key signing 
parties](https://www.apache.org/dev/release-signing.html#key-signing-party).
+6. Make sure default key is the key generated using the Apache email.
+7. Publish your public key to a public server (e.g. 
http://pgp.surfnet.nl or http://pgp.mit.edu)
+using Web-UIs or CLI (for example using the following command:
+`gpg --keyserver hkp://pool.sks-keyservers.net --trust-model 
always --send-keys`)
+8. Add your key signature to the `KEYS` file in the Drill sources repo.
 
 Review comment:
   Good point, I have added additional info.
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add README.md with instructions for release
> ---
>
> Key: DRILL-7230
> URL: https://issues.apache.org/jira/browse/DRILL-7230
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Tools, Build  Test
>Reporter: Sorabh Hamirwasia
>Assignee: Vova Vysotskyi
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (DRILL-7230) Add README.md with instructions for release

2019-12-20 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-7230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000937#comment-17000937
 ] 

ASF GitHub Bot commented on DRILL-7230:
---

vvysotskyi commented on pull request #1937: DRILL-7230: Add README.md with 
instructions for release and release scripts
URL: https://github.com/apache/drill/pull/1937#discussion_r360404569
 
 

 ##
 File path: docs/dev/Release.md
 ##
 @@ -0,0 +1,365 @@
+# Drill Release Process
+
+1. ## Setup release final cut-off date
+
+1. Start mail thread on Apache dev list with the subject: 
+```
+   [DISCUSS] Drill 1.17.0 release
+```
+
+Body example:
+```
+Hi Drillers,
+ 
+It's been several months since the last release and it is time to do 
the next one. I am volunteering to be the release manager.
+ 
+If there are any issues on which work is in progress, that you feel we 
*must* include in the release, please post in reply to this thread. Based on 
your input we'll define release cut off date.
+ 
+Kind regards
+ 
+Thanks
+```
+2. Gather list of Jiras that need to be included in the release, discuss 
possible blockers.
+3. Identify final cut-off date.
+4. Before starting the release send a reminder to dev list that merge in 
Apache Drill master is forbidden during
+ release.
+
+Example:
+```
+Note for the committers:
+until the release is not over and Drill version is not changed to 
1.17.0-SNAPSHOT, please do not push any
+changes into Drill master.
+```
+ 
+2. ## Setup environment:
+1. ### GPG key:
+You will need a GPG key set up. This key is needed for signing the 
release.
+1. Read [Apache docs for release 
signing](http://www.apache.org/dev/release-signing.html).
+2. Install the `gpg2` package on Ubuntu, or `gnupg` on MacOS.
+3. Generate GPG key using the **Apache email address** if it wasn't 
done before:
+1. Run `gpg --gen-key` and follow the instructions.
+2. Remember your passphrase.
+3. Save your key pair:
+```
+   gpg --output mygpgkey_pub.gpg --armor --export MY_KEY_ID
+   gpg --output mygpgkey_sec.gpg --armor --export-secret-key 
MY_KEY_ID
+   ```
+4. Or import existing GPG key pair if required:
+```
+gpg --import mygpgkey_pub.gpg
+gpg --allow-secret-key-import --import mygpgkey_sec.gpg
+```
+5. Have another committer signed your key (add to the trust chain).
+Apache advises to do it at 
+[key signing 
parties](https://www.apache.org/dev/release-signing.html#key-signing-party).
+6. Make sure default key is the key generated using the Apache email.
+7. Publish your public key to a public server (e.g. 
http://pgp.surfnet.nl or http://pgp.mit.edu)
+using Web-UIs or CLI (for example using the following command:
+`gpg --keyserver hkp://pool.sks-keyservers.net --trust-model 
always --send-keys`)
+8. Add your key signature to the `KEYS` file in the Drill sources repo.
+Instruction on how to create the key signature is embedded at the 
beginning of the KEYS file.
+9. Add your key signature to the `KEYS` file in 
`https://dist.apache.org` (PMC permissions required for this):
+- `svn co https://dist.apache.org/repos/dist/release/drill 
~/src/release/drill-dist`
+- Update `KEYS` file with required changes
+- Commit changes: `svn ci -m "Add FirstName LastName's key to KEYS 
file"`
+2. ### Setting up LDAP credentials
+1. Go to http://id.apache.org and log in using your Apache login and 
password.
+2. Scroll to the bottom to the 'SSH Key' entry and add the SSH key 
from your `~/.ssh/id_rsa.pub` file.
+If you don't have the SSH key, you should generate it by running 
`ssh-keygen`.
+3. Note that you can add more than 1 SSH key corresponding to multiple 
machines.
+4. Enter your Apache password and submit changes.
+5. Verify that you can do an sftp to the apache server by running the 
following: `sftp @home.apache.org`
+
+3. ### SVN
+You also need access to Apache SVN. (You need to be a PMC member for 
this).
+4. ### Setup Maven
+1. Apache's Maven repository access is documented here:
+http://www.apache.org/dev/publishing-maven-artifacts.html
+http://www.apache.org/dev/publishing-maven-artifacts.html#dev-env.
+2. For deploying maven artifacts, you must add entries to your 
`~/.m2/settings.xml` file as described here:
+http://www.apache.org/dev/publishing-maven-artifacts.html#dev-env.
+3. In order to encrypt the LDAP password, see here:

[jira] [Created] (DRILL-7493) convert_fromJSON and unicode

2019-12-20 Thread benj (Jira)
benj created DRILL-7493:
---

 Summary: convert_fromJSON and unicode
 Key: DRILL-7493
 URL: https://issues.apache.org/jira/browse/DRILL-7493
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.16.0
Reporter: benj


transform a json string (with \u char) into json struct
{code:sql}
apache drill> SELECT x_str, convert_fromJSON(x_str) AS x_array 
FROM (SELECT '["test=\u0014=test"]' x_str);
+--+--+
|x_str |   x_array|
+--+--+
| ["test=\u0014=test"] | ["test=\u0014=test"] |
+--+--+
{code}
Use json struct :
{code:sql}
apache drill> SELECT x_str
, x_array
, x_array[0] AS x_array0 
FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array
FROM (SELECT '["test=\u0014=test"]' x_str));
+--+--+-+
|x_str |   x_array|  x_array0   |
+--+--+-+
| ["test=\u0014=test"] | ["test=\u0014=test"] | test==test |
+--+--+-+
{code}
Note that the char \u0014 is interpreted in x_array0

if using split function on x_array0, an array is built with non interpreted 
\u
{code:sql}
apache drill> SELECT x_str
, x_array
, x_array[0] AS x_array0
, split(x_array[0],',') AS x_array0_split 
FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array 
FROM (SELECT '["test=\u0014=test"]' x_str));
+--+--+-+--+
|x_str |   x_array|  x_array0   |x_array0_split 
   |
+--+--+-+--+
| ["test=\u0014=test"] | ["test=\u0014=test"] | test==test | 
["test=\u0014=test"] |
+--+--+-+--+
{code}
It's not possible to use convert_fromJSON on the interpreted \u
{code:sql}
SELECT x_str
, x_array
, x_array[0] AS x_array0
, split(x_array[0],',') AS x_array0_split
, convert_fromJSON('["' || x_array[0] || '"]') AS convertJSONerror 
FROM(SELECT x_str, convert_fromJSON(x_str) AS x_array 
FROM (SELECT '["test=\u0014=test"]' x_str));
Error: DATA_READ ERROR: Illegal unquoted character ((CTRL-CHAR, code 20)): has 
to be escaped using backslash to be included in string value
 at [Source: (org.apache.drill.exec.vector.complex.fn.DrillBufInputStream); 
line: 1, column: 9]
{code}
don't work although the string is the same as the origin but \u is 
unfortunatly interpreted




--
This message was sent by Atlassian Jira
(v8.3.4#803005)