[GitHub] drill issue #1224: DRILL-6321: Customize Drill's conformance. Allow support ...

2018-05-01 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1224
  
+1 Overall. Note that this is needed for implementing Lateral join and 
Unnest support.


---


[GitHub] drill issue #1224: DRILL-6321: Customize Drill's conformance. Allow support ...

2018-04-30 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1224
  
@vvysotskyi  The lateral join implementation committed in DRILL-6323 
supports cross join for all conditions. The main difficulty in cross join is 
the management of the memory needed to produce a cartesian product (M x N). In 
cross apply though, the cross join is computed on only one row of the left side 
at a time (1 x N), since the left and right sides are correlated. This allows 
cross apply to operate with a very small amount of memory. 
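
To make the memory difference concrete, a hypothetical sketch (not Drill 
code; all names here are made up):

```
// CROSS APPLY: the right side is re-evaluated per left row, so only a
// 1 x N slice of the product is ever held in memory at once.
for (Row left : leftBatch) {
  for (Row right : evaluateRightSideFor(left)) {  // correlated with 'left'
    emit(left, right);
  }
}
// A general CROSS JOIN has no such correlation and must pair every left row
// with every right row, buffering up to M x N combinations.
```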


---


[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...

2018-04-30 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r185063074
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RawFragmentBatch.java ---
@@ -77,4 +83,47 @@ public long getByteCount() {
   public boolean isAckSent() {
     return ackSent.get();
   }
+
+  /**
+   * Transfer ownership of this DrillBuf to the target allocator. This is done for better memory
+   * accounting (that is, the operator should be charged with the body's DrillBuf memory).
+   *
+   * NOTES -
+   * <ul>
+   * <li>This operation is a NOOP when a) the current allocator (associated with the DrillBuf) is not the
+   * owning allocator or b) the target allocator is already the owner</li>
+   * <li>When transfer happens, a new RawFragmentBatch instance is allocated; this is done for proper
+   * DrillBuf reference count accounting</li>
+   * <li>The RPC handling code caches a reference to this RawFragmentBatch object instance; release()
+   * calls should be routed to the previous DrillBuf</li>
+   * <li>Caller is responsible for checking an OOM condition</li>
+   * </ul>
+   *
+   * @param targetAllocator target allocator
+   * @return a new {@link RawFragmentBatch} object instance on success (where the buffer ownership has
+   *         been switched to the target allocator); otherwise this operation is a NOOP (current instance
+   *         returned)
+   */
+  public RawFragmentBatch transferBodyOwnership(BufferAllocator targetAllocator) {
--- End diff --

DrillBuf has a transferOwnership method that could be used instead? See the 
transferTo methods in any of the vector classes -  
[FixedValueVectors.transferTo()](https://github.com/apache/drill/blob/master/exec/vector/src/main/codegen/templates/FixedValueVectors.java#L282)
 
Saves you the trouble of rolling your own and also avoids the need to make 
changes to any of the existing classes.
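
For reference, the pattern those transferTo() methods use looks roughly like 
this (a sketch from memory; verify the DrillBuf.transferOwnership() signature 
against the current source):

```
// Re-own the batch body without copying it.
DrillBuf.TransferResult result = body.transferOwnership(targetAllocator);
// result.buffer is the same memory, now charged to targetAllocator;
// result.allocationFit says whether it fit within the target's limits.
DrillBuf reowned = result.buffer;
```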


---


[GitHub] drill pull request #1237: DRILL-6348: Fixed code so that Unordered Receiver ...

2018-04-30 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1237#discussion_r185064842
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unorderedreceiver/UnorderedReceiverBatch.java ---
@@ -182,13 +189,15 @@ public IterOutcome next() {
         return IterOutcome.OUT_OF_MEMORY;
       }
 
+      // Transfer the ownership of this raw-batch to this operator for proper memory statistics reporting
+      batch = batch.transferBodyOwnership(oContext.getAllocator());
--- End diff --

This should probably be done inside the 
[RecordBatchLoader.load()](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatchLoader.java#L78) 
method.
Note that `MergingRecordBatch` has similar code and so probably suffers 
from the same memory accounting issue. All other uses of 
`RecordBatchLoader.load()` appear to be in the client code or test code so we 
are unlikely to break anything by making the change in `RecordBatchLoader`.
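
A sketch of where the transfer might sit (the shape of load() is assumed from 
the linked source; names are illustrative):

```
// Hypothetical change inside RecordBatchLoader.load(def, buf): re-own the
// incoming buffer before the vectors are rebuilt from it, so that the
// operator owning this loader's allocator is charged for the memory.
if (buf != null) {
  buf = buf.transferOwnership(allocator).buffer;  // assumed DrillBuf API
}
// ... existing vector (re)construction from 'buf' follows ...
```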



---


[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

2018-04-27 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1184
  
```
What do you mean by "Json representation"? 
```
Sorry, my mistake, got all tangled up. 
```
 we may want to further translate the Local [Date|Time|DateTime] objects 
inside the Map|List to java.sql.[Date|Time|Timestamp] upon access. But to do 
that inside the SqlAccessor, you would need to deep copy the Map|List and build 
another version with the date|time translated into java.sql.date|time.
```
That is what I thought you wanted to get to. If the current state is 
something you can work with, then great.  I can review the final changes once 
you're done and merge them as well. 
Let's move the other discussion to another thread or JIRA.
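
For the record, the per-value translation itself is straightforward with the 
standard Java 8 conversions; the hard part is only the deep copy of the 
enclosing Map|List:

```
java.sql.Date d = java.sql.Date.valueOf(localDate);                 // from LocalDate
java.sql.Time t = java.sql.Time.valueOf(localTime);                 // from LocalTime
java.sql.Timestamp ts = java.sql.Timestamp.valueOf(localDateTime);  // from LocalDateTime
```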


---


[GitHub] drill issue #1240: DRILL-6327: Update unary operators to handle IterOutcome....

2018-04-27 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1240
  
+1. Very nicely done.


---


[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

2018-04-26 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1184
  
Putting aside the discussion on date/time/timezone for the moment, 
@jiang-wu let's say getObject returns to you an object that implements 
java.sql.{Struct|Array}. You now use the Struct|Array APIs to get the attribute 
you are interested in. If the attribute is of type date|time, the object 
returned for that attribute should now correspond to java.sql.{Date|Time} 
instead of the Json representation. Will that not address your requirement?
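
A sketch of the consumer-side access pattern (standard JDBC APIs; the column 
layout shown is hypothetical):

```
java.sql.Struct context = (java.sql.Struct) resultSet.getObject(2);
Object[] attrs = context.getAttributes();
java.sql.Date date = (java.sql.Date) attrs[0];  // java.sql.Date, not a JSON string
```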



---


[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...

2018-04-26 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1144
  
Looks good. +1


---


[GitHub] drill pull request #1234: DRILL-5927: Fixed memory leak in TestBsonRecordRea...

2018-04-26 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1234#discussion_r184549758
  
--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/bson/TestBsonRecordReader.java ---
@@ -45,21 +47,24 @@
 import org.bson.BsonTimestamp;
 import org.bson.BsonWriter;
 import org.bson.types.ObjectId;
-import org.junit.AfterClass;
-import org.junit.BeforeClass;
+import org.junit.After;
+import org.junit.Before;
 import org.junit.Test;
 
-public class TestBsonRecordReader extends BaseTestQuery {
-  private static VectorContainerWriter writer;
-  private static TestOutputMutator mutator;
-  private static BsonRecordReader bsonReader;
+public class TestBsonRecordReader {
+  private BufferAllocator allocator;
+  private VectorContainerWriter writer;
+  private TestOutputMutator mutator;
+  private BufferManager bufferManager;
+  private BsonRecordReader bsonReader;
 
-  @BeforeClass
-  public static void setUp() {
-    BufferAllocator bufferAllocator = getDrillbitContext().getAllocator();
-    mutator = new TestOutputMutator(bufferAllocator);
+  @Before
+  public void setUp() {
+    allocator = new RootAllocator(Long.MAX_VALUE);
--- End diff --

Any reason why you had to change to RootAllocator? Should not really be 
necessary.


---


[GitHub] drill issue #1232: DRILL-6094: Decimal data type enhancements

2018-04-26 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1232
  
+1 for the C++ part. Looks really good. 


---


[GitHub] drill issue #1214: DRILL-6331: Revisit Hive Drill native parquet implementat...

2018-04-26 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1214
  
Looks Good. Thanks for making the changes Arina.
+1


---


[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

2018-04-26 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1184
  
>  But, if April 15 is your birthday, it is your birthday in all timezones. 
We don't say your birthday (or order date, or newspaper issue date or...) is 
one day in, say London and another day in Los Angeles.

If it is your birthday in California, it may already be the day after your 
birthday in Japan. :)

IMO, representing dates, times, and timestamps as UTC is not the problem. 
It is, in fact, perfectly correct (since UTC is the timezone). Converting a 
date|time|timestamp without a timezone to/from UTC is the problem. The problem 
is made worse by the java.util and JDBC APIs. java.time gets it right though.

However, as Jiang-wu points out, that still does not address the mismatch 
between the Joda/Java8 representation and JDBC. It also does not address his 
original problem, the issue of how to represent a complex type in JDBC; just 
return an Object, it says, which is no help at all. It is even worse for ODBC, 
which (last I checked) did not even have an API to return an Object type (which 
is why in ODBC we return a JSON string representation). 
For Jiang-wu's use case, since the string representation is not enough, we 
might look at returning java.sql.Struct [1] types for Maps and java.sql.Array 
[2] types for Lists.

[1] https://docs.oracle.com/javase/7/docs/api/java/sql/Struct.html
[2] https://docs.oracle.com/javase/7/docs/api/java/sql/Array.html
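
To illustrate the conversion problem (plain Java 8; the millis value below is 
2018-04-15T00:00:00Z):

```
java.time.LocalDate date = java.time.LocalDate.of(2018, 4, 15);  // no zone attached
System.out.println(date);                                        // 2018-04-15, everywhere

long utcMillis = 1523750400000L;
java.util.TimeZone.setDefault(java.util.TimeZone.getTimeZone("America/Los_Angeles"));
System.out.println(new java.sql.Date(utcMillis));                // 2018-04-14 (!)
```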






---


[GitHub] drill issue #1238: DRILL-6281: Refactor TimedRunnable

2018-04-26 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1238
  
Fair enough. But that *still* does not give me a clue about the problem(s) 
you were trying to fix, or how the refactoring helps. Is the cause of the 
problem in TimedRunnable? 
Not an unreasonable request, is it?



---


[GitHub] drill issue #1238: DRILL-6281: Refactor TimedRunnable

2018-04-25 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1238
  
I had the same question as Arina about the need for this change, and I got no 
info from DRILL-5908 about the cause of the problem or how this change will 
help fix it. Can you elucidate?


---


[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

2018-04-25 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r184243856
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
     public ${friendlyType} getObject(int index) {
       org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
       date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-      return date;
+      return new java.sql.Date(date.getMillis());
--- End diff --

Hmm. That takes us back to the original problem, that of the 
date|time|timestamp field inside a complex object. 
```
select t.context.date, t.context from test t;

will return a java.sql.Date object for column 1, but a java.time.LocalDate 
for the same object inside column 2. This doesn't seem like a good thing.
```
Why should that be a bad thing though? Ultimately, the object returned by 
getObject() is displayed to the end user through the toString method. The string 
representation of Local[Date|Time|Timestamp] should be the same as that of 
java.sql.[Date|Time|Timestamp], shouldn't it?
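
For the date case, at least, this holds (plain Java):

```
System.out.println(java.time.LocalDate.of(2018, 4, 25));  // 2018-04-25
System.out.println(java.sql.Date.valueOf("2018-04-25"));  // 2018-04-25
```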



---


[GitHub] drill pull request #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsExc...

2018-04-25 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1144#discussion_r184242257
  
--- Diff: src/main/resources/checkstyle-config.xml ---
@@ -30,10 +30,15 @@
 
 
 
+
--- End diff --

So what do you tell the user when you get a runtime exception (any 
exception) that is the result of a bug? It is silly to show them a stack trace 
that does not help them. It is better to let the user know that there was an 
internal error and they should log a bug with support or look for help on the 
dev list.
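
Something like the following, say (a sketch; the UserException category and 
the surrounding names are illustrative, not the actual change):

```
try {
  vector.getMutator().set(index, value);
} catch (IndexOutOfBoundsException e) {
  throw UserException.systemError(e)
      .addContext("Internal error: please file a bug or ask on the dev list")
      .build(logger);
}
```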


---


[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...

2018-04-25 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1214#discussion_r184199682
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
@@ -158,25 +159,26 @@ public void close() {
     } catch (RuntimeException e) {
       ex = ex == null ? e : ex;
     }
-    try {
-      if (fs != null) {
+
+    for (DrillFileSystem fs : fileSystems) {
+      try {
         fs.close();
-        fs = null;
-      }
-    } catch (IOException e) {
+      } catch (IOException e) {
         throw UserException.resourceError(e)
-          .addContext("Failed to close the Drill file system for " + getName())
-          .build(logger);
+            .addContext("Failed to close the Drill file system for " + getName())
+            .build(logger);
+      }
     }
+
     if (ex != null) {
       throw ex;
     }
   }
 
   @Override
   public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
-    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
-    fs = new DrillFileSystem(conf, getStats());
+    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
--- End diff --

I'm not suggesting we use the same fs for each split, but the opposite. The 
fs object used per split/rowgroup should be different so that we get the right 
fs wait time for every minor fragment. But this change allows more than one fs 
object per operator context, which we were explicitly preventing earlier. I'm 
not sure I understand why you needed to change that.




---



[GitHub] drill issue #1231: DRILL-6342: Fix schema path unIndexed method to return co...

2018-04-25 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1231
  
+1. LGTM


---


[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...

2018-04-25 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1214#discussion_r183909850
  
--- Diff: common/src/main/java/org/apache/drill/common/Stopwatch.java ---
@@ -0,0 +1,186 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.common;
+
+import com.google.common.base.Ticker;
+
+import java.util.concurrent.TimeUnit;
+
+/**
+ * Helper that creates a stopwatch based on whether the debug level is enabled.
--- End diff --

Do we really need this? In general we have (or should have) used Stopwatch 
to track metrics and or performance bottlenecks in production. In neither case 
do we want to enable debug.
Also, for debugging performance issues (I see that the places you've 
changed to use this Stopwatch are places where we encountered performance 
issues), would it be better to use 
```
Stopwatch timer = null;
if (logger.isDebugEnabled()) {
  timer = Stopwatch.createStarted();
}
```
More verbose, but guaranteed to be optimized away by the JVM.
Not insisting that we change this, BTW.


---


[GitHub] drill pull request #1214: DRILL-6331: Revisit Hive Drill native parquet impl...

2018-04-25 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1214#discussion_r183919198
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ops/BaseOperatorContext.java ---
@@ -158,25 +159,26 @@ public void close() {
     } catch (RuntimeException e) {
       ex = ex == null ? e : ex;
     }
-    try {
-      if (fs != null) {
+
+    for (DrillFileSystem fs : fileSystems) {
+      try {
         fs.close();
-        fs = null;
-      }
-    } catch (IOException e) {
+      } catch (IOException e) {
         throw UserException.resourceError(e)
-          .addContext("Failed to close the Drill file system for " + getName())
-          .build(logger);
+            .addContext("Failed to close the Drill file system for " + getName())
+            .build(logger);
+      }
     }
+
     if (ex != null) {
       throw ex;
     }
   }
 
   @Override
   public DrillFileSystem newFileSystem(Configuration conf) throws IOException {
-    Preconditions.checkState(fs == null, "Tried to create a second FileSystem. Can only be called once per OperatorContext");
-    fs = new DrillFileSystem(conf, getStats());
+    DrillFileSystem fs = new DrillFileSystem(conf, getStats());
--- End diff --

I don't get why you need multiple DrillFileSystems per operator context? 
The reason for the DrillFileSystem abstraction (and the reason for tying it to 
the operator context) is to track the time a (scan) operator was waiting for a 
file system call to return. This is reported in the wait time for the operator 
in the query profile.  For scans this is a critical number as the time spent 
waiting for a disk read determines if the query is disk bound.
Associating multiple file system objects with a single operator context 
will throw the math out of whack, I think.
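
Roughly, the accounting works like this (a sketch; startWait()/stopWait() are 
the OperatorStats hooks, the wrapping shown is illustrative):

```
stats.startWait();
try {
  in = underlyingFs.open(path);  // blocking file system call
} finally {
  stats.stopWait();              // elapsed time lands in the operator's wait time
}
```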



---


[GitHub] drill pull request #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsExc...

2018-04-23 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1144#discussion_r183530901
  
--- Diff: src/main/resources/checkstyle-config.xml ---
@@ -30,10 +30,15 @@
 
 
 
+
--- End diff --

I think IOBE should be caught and converted to a UserException so that the 
end user does not get an incomprehensible error message. 


---


[GitHub] drill issue #1233: Updated with links to previous releases

2018-04-23 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1233
  
For older releases a single link to http://archive.apache.org/dist/drill 
would be better (you won't have to update this every time there is a release).


---


[GitHub] drill pull request #1223: DRILL-6324: Unnest initial implementation

2018-04-20 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1223#discussion_r183105915
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestImpl.java ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.unnest;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.LateralContract;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedValueVector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+
+/**
+ * Contains the actual unnest operation. Unnest is a simple transfer operation in this implementation.
+ * For use as a table function, we will need to change the logic of the unnest method to operate on
+ * more than one row at a time and remove any dependence on Lateral
+ * {@link org.apache.drill.exec.physical.impl.flatten.FlattenTemplate}.
+ * This class follows the pattern of other operators that generate code at runtime. Normally this class
+ * would be abstract and have placeholders for doSetup and doEval. Unnest, however, doesn't require code
+ * generation so we can simply implement the code in a simple class that looks similar to the code gen
+ * templates used by other operators but does not implement the doSetup and doEval methods.
+ */
+public class UnnestImpl implements Unnest {
+  private static final Logger logger = LoggerFactory.getLogger(UnnestImpl.class);
+
+  private static final int OUTPUT_ROW_COUNT = ValueVector.MAX_ROW_COUNT;
+
+  private ImmutableList<TransferPair> transfers;
+  private LateralContract lateral; // corresponding lateral Join (or other operator implementing the Lateral Contract)
+  private SelectionVectorMode svMode;
+  private RepeatedValueVector fieldToUnnest;
+  private RepeatedValueVector.RepeatedAccessor accessor;
+
+  /**
+   * The output batch limit starts at OUTPUT_ROW_COUNT, but may be decreased
+   * if records are found to be large.
+   */
+  private int outputLimit = OUTPUT_ROW_COUNT;
+
+
+  // The index in the unnest column that is being processed. We start at zero and continue until
+  // InnerValueCount is reached or the batch limit is reached.
+  // This allows for groups to be written between batches if we run out of space. For cases where we have finished
+  // a batch on the boundary it will be set to 0.
+  private int innerValueIndex = 0;
+
+  @Override
+  public void setUnnestField(RepeatedValueVector unnestField) {
+    this.fieldToUnnest = unnestField;
+    this.accessor = RepeatedValueVector.RepeatedAccessor.class.cast(unnestField.getAccessor());
+  }
+
+  @Override
+  public RepeatedValueVector getUnnestField() {
+    return fieldToUnnest;
+  }
+
+  @Override
+  public void setOutputCount(int outputCount) {
+    outputLimit = outputCount;
+  }
+
+  @Override
+  public final int unnestRecords(final int recordCount) {
+    switch (svMode) {
+      case FOUR_BYTE:
+        throw new UnsupportedOperationException("Unnest does not support selection vector inputs.");
+
+      case TWO_BYTE:
+        throw new UnsupportedOperationException("Unnest does not support selection vector inputs.");
+
+      case NONE:
+        if (innerValueIndex == -1) {
+          innerValueIndex = 0;
+        }
+
+        // Current record being processed in the incoming record batch. We could keep
+        // track of


---

[GitHub] drill pull request #1223: DRILL-6324: Unnest initial implementation

2018-04-20 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1223#discussion_r183105769
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/Unnest.java ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.unnest;
+
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.LateralContract;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.vector.complex.RepeatedValueVector;
+
+import java.util.List;
+
+/**
+ * Placeholder for future unnest implementation that may require code generation. Current implementation does not
+ * require any
+ * @see UnnestImpl
+ */
+public interface Unnest {
+  // TemplateClassDefinition<Unnest> TEMPLATE_DEFINITION = new TemplateClassDefinition<Unnest>(Unnest.class, UnnestImpl.class);
+
+  void setup(FragmentContext context, RecordBatch incoming, RecordBatch outgoing, List<TransferPair> transfers,
+      LateralContract lateral) throws SchemaChangeException;
+
+  int unnestRecords(int recordCount);
--- End diff --

Done


---


[GitHub] drill pull request #1223: DRILL-6324: Unnest initial implementation

2018-04-20 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1223#discussion_r183106563
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestRecordBatch.java ---
@@ -0,0 +1,451 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.unnest;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Lists;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.expression.FieldReference;
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.logical.data.NamedExpression;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.exception.ClassTransformationException;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.expr.ClassGenerator;
+import org.apache.drill.exec.expr.CodeGenerator;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.ops.MetricDef;
+import org.apache.drill.exec.physical.base.LateralContract;
+import org.apache.drill.exec.physical.config.UnnestPOP;
+import org.apache.drill.exec.record.AbstractSingleRecordBatch;
+import org.apache.drill.exec.record.AbstractTableFunctionRecordBatch;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.CloseableRecordBatch;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.RecordBatchMemoryManager;
+import org.apache.drill.exec.record.RecordBatchSizer;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.record.TypedFieldId;
+import org.apache.drill.exec.record.VectorContainer;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedMapVector;
+import org.apache.drill.exec.vector.complex.RepeatedValueVector;
+
+import java.io.IOException;
+import java.util.List;
+
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA;
+
+// TODO - handle the case where a user tries to unnest a scalar, should just return the column as is
+public class UnnestRecordBatch extends AbstractTableFunctionRecordBatch<UnnestPOP> {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(UnnestRecordBatch.class);
+
+  private Unnest unnest;
+  private boolean hasRemainder = false; // set to true if there is data left over for the current row AND if we want
+                                        // to keep processing it. Kill may be called by a limit in a subquery that
+                                        // requires us to stop processing the current row, but not stop processing
+                                        // the data.
+  // In some cases we need to return a predetermined state from a call to next. These are:
+  // 1) Kill is called due to an error occurring in the processing of the query. IterOutcome should be NONE
+  // 2) Kill is called by LIMIT to stop processing of the current row (This occurs when the LIMIT is part of a subquery
+  //    between UNNEST and LATERAL). IterOutcome should be EMIT
+  // 3) Kill is called by LIMIT downstream from LATERAL. IterOutcome should be NONE
+  private IterOutcome nextState = OK;
+  private int remainderIndex = 0;
+  private int recordCount;
+  private MaterializedField unnestFieldMetadata;
+  private final UnnestMemoryManager memoryManager;
+
+  public enum Metric implements MetricDef {
+    INPUT_BATCH_COUNT,
+    AVG_INPUT_BATCH_BYTES,
+    AVG_INPUT_ROW_BYTES,
+    INPUT_RECORD_COUNT,


---

[GitHub] drill pull request #1223: DRILL-6324: Unnest initial implementation

2018-04-20 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1223#discussion_r183105882
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/unnest/UnnestImpl.java ---
@@ -0,0 +1,137 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.unnest;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.LateralContract;
+import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.TransferPair;
+import org.apache.drill.exec.vector.ValueVector;
+import org.apache.drill.exec.vector.complex.RepeatedValueVector;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.List;
+
+/**
+ * Contains the actual unnest operation. Unnest is a simple transfer operation in this implementation.
+ * For use as a table function, we will need to change the logic of the unnest method to operate on
+ * more than one row at a time and remove any dependence on Lateral
+ * {@link org.apache.drill.exec.physical.impl.flatten.FlattenTemplate}.
+ * This class follows the pattern of other operators that generate code at runtime. Normally this class
+ * would be abstract and have placeholders for doSetup and doEval. Unnest, however, doesn't require code
+ * generation so we can simply implement the code in a simple class that looks similar to the code gen
+ * templates used by other operators but does not implement the doSetup and doEval methods.
+ */
+public class UnnestImpl implements Unnest {
+  private static final Logger logger = LoggerFactory.getLogger(UnnestImpl.class);
+
+  private static final int OUTPUT_ROW_COUNT = ValueVector.MAX_ROW_COUNT;
+
+  private ImmutableList<TransferPair> transfers;
+  private LateralContract lateral; // corresponding lateral Join (or other operator implementing the Lateral Contract)
+  private SelectionVectorMode svMode;
+  private RepeatedValueVector fieldToUnnest;
+  private RepeatedValueVector.RepeatedAccessor accessor;
+
+  /**
+   * The output batch limit starts at OUTPUT_ROW_COUNT, but may be decreased
+   * if records are found to be large.
+   */
+  private int outputLimit = OUTPUT_ROW_COUNT;
+
+
+  // The index in the unnest column that is being processed. We start at zero and continue until
+  // InnerValueCount is reached or the batch limit is reached.
+  // This allows for groups to be written between batches if we run out of space. For cases where we have finished
+  // a batch on the boundary it will be set to 0.
+  private int innerValueIndex = 0;
+
+  @Override
+  public void setUnnestField(RepeatedValueVector unnestField) {
+    this.fieldToUnnest = unnestField;
+    this.accessor = RepeatedValueVector.RepeatedAccessor.class.cast(unnestField.getAccessor());
+  }
+
+  @Override
+  public RepeatedValueVector getUnnestField() {
+    return fieldToUnnest;
+  }
+
+  @Override
+  public void setOutputCount(int outputCount) {
+    outputLimit = outputCount;
+  }
+
+  @Override
+  public final int unnestRecords(final int recordCount) {
+    switch (svMode) {
--- End diff --

It is not. Changed this and put in a precondition check instead.
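
Presumably something like this (a sketch, not the actual commit):

```
Preconditions.checkState(svMode == SelectionVectorMode.NONE,
    "Unnest does not support selection vector inputs.");
```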


---


[GitHub] drill pull request #1223: Drill 6324: Unnest initial implementation

2018-04-18 Thread parthchandra
GitHub user parthchandra opened a pull request:

https://github.com/apache/drill/pull/1223

Drill 6324: Unnest initial implementation

Implementation of the unnest operator that works in sync with the Lateral 
Join operator. The code is based on the Flatten implementation except that it 
does not do the cross join that flatten does. As a result, the operator does 
not need any code generation.
Commit # 2 - adds specific handling for kill that may occur as a result of a 
limit being reached by a downstream operator. 

@sohami, @amansinha100 please review.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-6324

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1223.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1223


commit 7056e66b6e05f3a8f6cb9c4326d4b0a73cd87122
Author: Parth Chandra <parthc@...>
Date:   2018-02-08T05:13:13Z

DRILL-6324: Unnest - Initial Implementation

- Based on Flatten
- Implement unnestRecords in UnnestTemplate
- Remove unnecessary code inherited from Flatten/Project. Add schema change 
handling.
- Fix build failure after rebase since RecordBatchSizer used by UNNEST was 
relocated to a different package
- Add unit tests
- Handling of input row splitting across multiple batches. Also do not kill 
incoming in killIncoming.
- Schema change generated by Unnest

commit b5aaa143089da127396b2abc766cdb14b2843817
Author: Parth Chandra <parthc@...>
Date:   2018-02-26T15:58:31Z

DRILL-6324: Unnest - kill handling, remove codegen, and unit test for non 
array columns

commit 298f90654a487ab340582ce0cd5a0bf665536b9f
Author: Parth Chandra <parthc@...>
Date:   2018-03-07T08:23:14Z

DRILL-6324: Unnest - Add tests with real Unnest and real Lateral.

commit 8d9e02b4f11cda8458288c873bc2a94765569c43
Author: Parth Chandra <parthc@...>
Date:   2018-03-26T11:16:33Z

DRILL-6324: Unnest - code cleanup, more comments, fix license headers,
and more logging.

Refactor Unnest to allow setting the incoming batch after construction
Fix compilation after rebase




---


[GitHub] drill issue #1221: DRILL-6323: Fix license headers

2018-04-18 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1221
  
Closed via cc606db


---


[GitHub] drill pull request #1221: DRILL-6323: Fix license headers

2018-04-18 Thread parthchandra
GitHub user parthchandra opened a pull request:

https://github.com/apache/drill/pull/1221

DRILL-6323: Fix license headers

@ilooner can you please check these? Looks like the latest commit(s) went 
in with some broken license headers.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-6323-l

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1221.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1221


commit cc606dbb9635cee0c5ecae8dba5530a46887d481
Author: Parth Chandra <parthc@...>
Date:   2018-04-18T22:19:10Z

DRILL-6323: Fix license headers




---


[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation

2018-04-17 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1212#discussion_r182283067
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java ---
@@ -0,0 +1,861 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.join;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.LateralContract;
+import org.apache.drill.exec.physical.config.LateralJoinPOP;
+import org.apache.drill.exec.record.AbstractBinaryRecordBatch;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.JoinBatchMemoryManager;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.RecordBatchSizer;
+import org.apache.drill.exec.record.VectorAccessibleUtilities;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.vector.ValueVector;
+
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OUT_OF_MEMORY;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP;
+
+/**
+ * RecordBatch implementation for the lateral join operator. Currently it's expected LATERAL to co-exist with the UNNEST
+ * operator. Both LATERAL and UNNEST will share a contract with each other defined at {@link LateralContract}
+ */
+public class LateralJoinBatch extends AbstractBinaryRecordBatch<LateralJoinPOP> implements LateralContract {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LateralJoinBatch.class);
+
+  // Input indexes to correctly update the stats
+  private static final int LEFT_INPUT = 0;
+
+  private static final int RIGHT_INPUT = 1;
+
+  // Maximum number records in the outgoing batch
+  private int maxOutputRowCount;
+
+  // Schema on the left side
+  private BatchSchema leftSchema;
+
+  // Schema on the right side
+  private BatchSchema rightSchema;
+
+  // Index in output batch to populate next row
+  private int outputIndex;
+
+  // Current index of record in left incoming which is being processed
+  private int leftJoinIndex = -1;
+
+  // Current index of record in right incoming which is being processed
+  private int rightJoinIndex = -1;
+
+  // flag to keep track if current left batch needs to be processed in future next call
+  private boolean processLeftBatchInFuture;
+
+  // Keep track if any matching right record was found for current left index record
+  private boolean matchedRecordFound;
+
+  private boolean useMemoryManager = true;
+
+  /* *************************************************************************
+   * Public Methods
+   * *************************************************************************/
+  public LateralJoinBatch(LateralJoinPOP popConfig, FragmentContext context,
+      RecordBatch left, RecordBatch right) throws OutOfMemoryException {
+    super(popConfig, 


---

[GitHub] drill issue #1208: DRILL-6295: PartitionerDecorator may close partitioners w...

2018-04-17 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1208
  
+1


---


[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation

2018-04-17 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1212#discussion_r182172836
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/LateralJoinBatch.java ---
@@ -0,0 +1,861 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.impl.join;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import org.apache.calcite.rel.core.JoinRelType;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.ExecConstants;
+import org.apache.drill.exec.exception.OutOfMemoryException;
+import org.apache.drill.exec.exception.SchemaChangeException;
+import org.apache.drill.exec.ops.FragmentContext;
+import org.apache.drill.exec.physical.base.LateralContract;
+import org.apache.drill.exec.physical.config.LateralJoinPOP;
+import org.apache.drill.exec.record.AbstractBinaryRecordBatch;
+import org.apache.drill.exec.record.BatchSchema;
+import org.apache.drill.exec.record.JoinBatchMemoryManager;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.record.RecordBatch;
+import org.apache.drill.exec.record.RecordBatchSizer;
+import org.apache.drill.exec.record.VectorAccessibleUtilities;
+import org.apache.drill.exec.record.VectorWrapper;
+import org.apache.drill.exec.vector.ValueVector;
+
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.EMIT;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.NONE;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OK_NEW_SCHEMA;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.OUT_OF_MEMORY;
+import static org.apache.drill.exec.record.RecordBatch.IterOutcome.STOP;
+
+/**
+ * RecordBatch implementation for the lateral join operator. Currently it's expected LATERAL to co-exist with the UNNEST
+ * operator. Both LATERAL and UNNEST will share a contract with each other defined at {@link LateralContract}
+ */
+public class LateralJoinBatch extends AbstractBinaryRecordBatch<LateralJoinPOP> implements LateralContract {
+  private static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(LateralJoinBatch.class);
+
+  // Input indexes to correctly update the stats
+  private static final int LEFT_INPUT = 0;
+
+  private static final int RIGHT_INPUT = 1;
+
+  // Maximum number records in the outgoing batch
+  private int maxOutputRowCount;
+
+  // Schema on the left side
+  private BatchSchema leftSchema;
+
+  // Schema on the right side
+  private BatchSchema rightSchema;
+
+  // Index in output batch to populate next row
+  private int outputIndex;
+
+  // Current index of record in left incoming which is being processed
+  private int leftJoinIndex = -1;
+
+  // Current index of record in right incoming which is being processed
+  private int rightJoinIndex = -1;
+
+  // flag to keep track if current left batch needs to be processed in future next call
+  private boolean processLeftBatchInFuture;
+
+  // Keep track if any matching right record was found for current left index record
+  private boolean matchedRecordFound;
+
+  private boolean useMemoryManager = true;
+
+  /* *************************************************************************
+   * Public Methods
+   * *************************************************************************/
+  public LateralJoinBatch(LateralJoinPOP popConfig, FragmentContext context,
+      RecordBatch left, RecordBatch right) throws OutOfMemoryException {
+    super(popConfig, 


---

[GitHub] drill pull request #1212: DRILL-6323: Lateral Join - Initial implementation

2018-04-17 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1212#discussion_r182172309
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java ---
@@ -186,6 +186,11 @@ private ExecConstants() {
   public static final String BIT_ENCRYPTION_SASL_ENABLED = "drill.exec.security.bit.encryption.sasl.enabled";
   public static final String BIT_ENCRYPTION_SASL_MAX_WRAPPED_SIZE = "drill.exec.security.bit.encryption.sasl.max_wrapped_size";
 
+  /**
--- End diff --

Not needed, as you don't have code gen.


---


[GitHub] drill issue #1211: DRILL-6322: Lateral Join: Common changes - Add new iterOu...

2018-04-16 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1211
  
+1


---


[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

2018-04-13 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r181513279
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
     public ${friendlyType} getObject(int index) {
      org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
      date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-      return date;
+      return new java.sql.Date(date.getMillis());
--- End diff --

Sounds good.


---


[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

2018-04-13 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r181461481
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
     public ${friendlyType} getObject(int index) {
      org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
      date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-      return date;
+      return new java.sql.Date(date.getMillis());
--- End diff --

Either one is fine (since java.time is modeled on Joda-Time). We've switched to 
Java 8, but just for consistency with the rest of the code, we might as well 
use Joda.


---


[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

2018-04-13 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1184
  
I think that would be best. 


---


[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

2018-04-12 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1184
  
Can you check the unit tests after rebasing? I applied the PR to the latest 
master and get errors in the same tests. Thanks.


---


[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

2018-04-12 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r181248969
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
     public ${friendlyType} getObject(int index) {
      org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
      date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-      return date;
+      return new java.sql.Date(date.getMillis());
--- End diff --

Agree that messing with timezones is a Bad Thing. Probably an artifact of 
the way java.util did things.
Anyway, I did mean using org.joda.time.Local[Date|Time|TimeStamp] or the 
corresponding java.time.* classes.



---


[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

2018-04-11 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1184#discussion_r180914676
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
     public ${friendlyType} getObject(int index) {
      org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
      date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
-      return date;
+      return new java.sql.Date(date.getMillis());
--- End diff --

@jiang-wu Thanks for making these changes. Your fix is on the right track.  
However, I'm not sure if we want to introduce a dependency on JDBC classes in 
the vectors. 
Take a look at DateAccessor, TimeAccessor, and TimeStampAccessor. These are 
generated from 
[SqlAccessor](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/codegen/templates/SqlAccessors.java).
 
The get methods in these convert from UTC to a Local{Date|Time|TimeStamp}.  
Subsequently they convert to the JDBC type since they are used by the JDBC 
driver.
The vectors should be able to do the same, just returning the 
Local{Date|Time|TimeStamp} object.
I'm not sure if that might affect tests that depend on timezone though. 
Perhaps @vdiravka can comment. 
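
Roughly, the accessor-level conversion described above (illustrative; the 
actual generated code lives in SqlAccessors.java):

```
// UTC millis -> zone-less java.time value; the JDBC type only at the JDBC layer.
java.time.LocalDateTime local = java.time.LocalDateTime.ofInstant(
    java.time.Instant.ofEpochMilli(utcMillis), java.time.ZoneOffset.UTC);
java.sql.Timestamp jdbcValue = java.sql.Timestamp.valueOf(local);
```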


---


[GitHub] drill pull request #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsExc...

2018-04-10 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1144#discussion_r180598658
  
--- Diff: exec/vector/src/main/codegen/templates/VariableLengthVectors.java ---
@@ -534,15 +534,11 @@ public void setSafe(int index, byte[] bytes) {
       assert index >= 0;
 
       final int currentOffset = offsetVector.getAccessor().get(index);
-      offsetVector.getMutator().setSafe(index + 1, currentOffset + bytes.length);
-      try {
-        data.setBytes(currentOffset, bytes, 0, bytes.length);
-      } catch (IndexOutOfBoundsException e) {
-        while (data.capacity() < currentOffset + bytes.length) {
-          reAlloc();
-        }
-        data.setBytes(currentOffset, bytes, 0, bytes.length);
+      while (data.capacity() < currentOffset + bytes.length) {
--- End diff --

I'm on the fence on this one. The change from 18 months ago resulted in a 
performance improvement of a few percent for variable length columns. This 
change essentially undoes that and takes us back to where we were. It seems 
that with the recent changes to batch sizing, we are on the path to eliminating 
the realloc behaviour that is part of the performance bottleneck for the 
setSafe methods. So, perhaps, it might be ok to let this one stay.  


---


[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...

2018-04-03 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1144
  
I think we need to include a few other folks in this. @paul-rogers and 
@sachouche have also looked into the issue of excessive bounds checking and 
ways to write to direct memory with minimum overhead.

Both Salim and Paul have done work which has tried to eliminate the 
excessive checking and use `PlatformDependent` directly, so it might be the 
right time to agree on the right approach here. At a high level, I believe 
there is agreement that we need to 1) reduce bounds checking to (preferably) 
once per write, and 2) minimise the number of function calls before memory 
is actually written to.

We have three layers where we could potentially check bounds - i) the 
operators, ii) the vectors, iii) DrillBuf. Right now, we do so at each level, 
at multiple times at that. Paul's work on batch sizing provides us with a layer 
that gives us the bounds check guarantees at the operator level. This means we 
could potentially use value-vectors' set methods (as opposed to the setSafe 
methods) and DrillBuf can use PlatformDependent directly. 

Some caveats - 
UDLEs check for and enforce little-endianness. Checking for endianness is 
important for value vectors because they assume little endian, but the 
enforcement is sometimes not so desirable. Drill's Java client uses the same 
DrillBuf backed by a UDLE, and that means that client applications can no longer 
run on big endian machines (and yes, I have heard this request from end users). 
However, the fact is that UDLEs are an intrinsic part of the drill-memory 
design [1] [2]. Eliminating UDLEs could mean redoing large parts of very 
well-tested code.

The caveat to using the vector set methods is that the setSafe methods 
provide resizing capability that operators have come to rely upon. Switching 
from setSafe to set breaks practically every operator. 
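
For reference, the two write paths in question (IntVector shown; both methods 
exist on the vector mutators):

```
vector.getMutator().setSafe(index, value);  // bounds-checked, reallocates as needed
vector.getMutator().set(index, value);      // unchecked; caller guarantees capacity
```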


[1] 
https://github.com/jacques-n/drill/blob/DRILL-4144/exec/memory/base/src/main/java/org/apache/drill/exec/memory/README.md
[2] 
https://docs.google.com/nonceSigner?nonce=nj279efks0ro0=https://doc-0o-as-docs.googleusercontent.com/docs/securesc/gipu3hlcf22l6svruqr71h7qe2k3djum/5v7eb2cm4bghq76nj658ai5hkk9h52ur/152274960/11021365158327727934/11021365158327727934/0B6CmYjIAywyCalVwcURkaFlkc1U?e%3Ddownload=41l8jspccbj1pp63750c5von8ol4ijtl

 






---


[GitHub] drill issue #1182: DRILL-6287: apache-release profile should be disabled by ...

2018-04-02 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1182
  
Sorry, Maven not being a strong point, I didn't understand initially what I 
was looking at.

+1



---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-04-02 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1166
  
@rajrahul thanks for making all the changes (and of course for the fix)!


---


[GitHub] drill issue #1182: DRILL-6287: apache-release profile should be disabled by ...

2018-04-02 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1182
  
I don't understand why the apache-release profile should be disabled by 
default. And I don't see how this change achieves that anyway.

Also, moving -Xdoclint:none to all profiles implies we are no longer 
supporting development using JDK7? I'm OK with that, but not sure if we 
concluded that at the time of the 1.13 release.

If that's what we want to do, I'm fine with this change.



---


[GitHub] drill issue #1060: DRILL-5846: Improve parquet performance for Flat Data Typ...

2018-03-30 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1060
  
I feel putting this PR in without finalizing DRILL-6301 is putting the cart 
before the horse. (BTW, it would help the discussion if the benchmarks were 
published!) My observation, based on profiling I did some time back, is that the 
performance gains seen here are roughly in line with removing bounds checks. 
Paul has seen similar gains in the batch sizing project.
Which takes us back to the question, raised by Paul in his first comment, 
of how we want to reconcile batch sizing and vectorizing of scans; a question 
we have deferred. If removing bounds checks gets us the same performance gains, 
then why not put our efforts into implementing batch sizing with the 
accompanying elimination of bounds checking?
I'm mostly not in favor of having MemoryUtils unless you make a compelling 
argument that it is the only way to save the planet (i.e. get the performance 
you want). I feel operators should not establish the pattern of accessing 
memory directly. So far, I'm -0 on this as my arguments are mostly high level 
(and somewhat philosophical). 
Minor nitpick - the prefix VL is not as informative as, say, VarLen or 
VariableLength.



---


[GitHub] drill issue #1168: DRILL-6246: Reduced the size of the jdbc-all jar file

2018-03-30 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1168
  
I would recommend trying to set up a connection using Spotfire or Squirrel 
and running a couple of metadata queries and a couple of queries on complex 
data. (These have traditionally been areas that were affected when the jar was 
incomplete.) 
Also try on a secure cluster. You might need help from other folks on some 
of this. 


---


[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...

2018-03-29 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1144
  
To be honest, I don't know. There are too many paths available to write to 
direct memory, some of which were developed over time, and so not all vector 
code may be using them.
Ideally, every bit of code should go through DrillBuf, which wraps a UDLE 
(UnsafeDirectLittleEndian). UDLE has one level of bounds checking (which I feel 
is necessary) and uses PlatformDependent directly.
The vector setSafe methods provide vector-level bounds checking, and the 
vector set methods bypass bounds checking. ResultSetLoader methods provide 
guarantees that may make the setSafe methods redundant.
Our goal should be to not depend on the setSafe methods except for 
debugging.
What exactly are you thinking of when proposing to use PlatformDependent 
directly?
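
For reference, the two flavors look roughly like this (a simplified sketch 
of the generated vector code, not the literal template):

    // setSafe: resizes on demand so callers can never overrun the vector.
    public void setSafe(int index, int value) {
      while (index >= getValueCapacity()) {
        reAlloc();                      // allocate a larger buffer and copy
      }
      data.setInt(index * 4, value);    // DrillBuf-level check may still apply
    }

    // set: no resize and no vector-level check; the caller (e.g. code relying
    // on ResultSetLoader guarantees) must ensure capacity beforehand.
    public void set(int index, int value) {
      data.setInt(index * 4, value);
    }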


---


[GitHub] drill issue #1194: DRILL-6300: Refresh protobuf C++ source files

2018-03-29 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1194
  
+1


---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-21 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1166
  
+1. LGTM


---


[GitHub] drill issue #1170: DRILL-6223: Fixed several Drillbit failures due to schema...

2018-03-16 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1170
  
I added a comment in the JIRA - 
[DRILL-6223](https://issues.apache.org/jira/browse/DRILL-6223?focusedCommentId=16402223=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16402223)



---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-14 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1166
  
@rajrahul this link is good. As expected, the int96 column is dictionary 
encoded. 
Is it possible for you to extract just a couple of records from this file 
and then use that for a unit test? 
See [TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange](https://github.com/apache/drill/blob/master/exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/writer/TestParquetWriter.java#L784)

@vdiravka TestParquetWriter.testImpalaParquetBinaryAsTimeStamp_DictChange 
also uses an int96 that is dictionary encoded. Any idea whether (and why) it 
might be going through a different code path?




---


[GitHub] drill issue #1166: DRILL-6016 - Fix for Error reading INT96 created by Apach...

2018-03-14 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1166
  
@rajrahul, thanks for submitting the patch. It looks good. I guess we 
missed dictionary encoded int96 timestamps, even though timestamps with 
nanosecond precision are the one thing that should never, ever, be dictionary 
encoded!

Just to make sure, I tried to use the sample file in DRILL-6016, but I 
could not even unzip it! Can you please check and see if the file is correct? 
We can use that to create the unit test as well.



---


[GitHub] drill issue #1143: DRILL-1491: Support for JDK 8

2018-03-12 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1143
  
+1


---


[GitHub] drill issue #1145: DRILL-6187: Exception in RPC communication between DataCl...

2018-03-12 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1145
  
+1


---


[GitHub] drill issue #1153: DRILL-6044: Fixed shutdown button in Web UI when ssl,auth...

2018-03-12 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1153
  
+1


---


[GitHub] drill issue #1160: DRILL-6224: Publish current memory usage for Drillbits in...

2018-03-12 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1160
  
Looks wrong, doesn't it? Direct memory is (still?) showing as zero?


---


[GitHub] drill pull request #1143: DRILL-1491: Support for JDK 8

2018-03-08 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1143#discussion_r173093078
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/server/TestDrillbitResilience.java
 ---
@@ -650,6 +650,7 @@ public void dataArrived(final QueryDataBatch result, 
final ConnectionThrottle th
 
   @Test // DRILL-2383: Cancellation TC 3: cancel after all result set are 
produced but not all are fetched
   @Repeat(count = NUM_RUNS)
+  @Ignore
--- End diff --

Please log a JIRA and reference it in this ignore statement


---


[GitHub] drill pull request #1156: DRILL-6218: Update release profile to not generate...

2018-03-07 Thread parthchandra
GitHub user parthchandra opened a pull request:

https://github.com/apache/drill/pull/1156

DRILL-6218: Update release profile to not generate MD5 checksum

@arina-ielchiieva, @amansinha100, please review

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-6218

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1156.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1156


commit a9c4cfde7ae41a42d396178b29c451d21a2daca7
Author: Parth Chandra <parthc@...>
Date:   2018-03-07T09:53:42Z

DRILL-6218: Update release profile to not generate MD5 checksum




---


[GitHub] drill issue #1144: DRILL-6202: Deprecate usage of IndexOutOfBoundsException ...

2018-03-06 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1144
  
AFAIK, IndexOutOfBoundsException is thrown by Netty when a write to a 
DrillBuf goes out of bounds (see 
https://github.com/netty/netty/blob/netty-4.0.48.Final/buffer/src/main/java/io/netty/buffer/AbstractByteBuf.java#L1123
  and 
https://github.com/netty/netty/blob/netty-4.0.48.Final/buffer/src/main/java/io/netty/buffer/UnsafeByteBufUtil.java#L283
 ).
This is independent of the bounds check in Drill which may be disabled.
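
A quick standalone way to see Netty's check fire (minimal sketch):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.Unpooled;

    // AbstractByteBuf.checkIndex() rejects the write before it touches memory,
    // regardless of any Drill-level bounds-check setting.
    ByteBuf buf = Unpooled.directBuffer(8);
    buf.setInt(8, 42);  // index 8 + 4 bytes > capacity 8 -> IndexOutOfBoundsException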



---


[GitHub] drill issue #1146: DRILL-6204: Pass tables columns without partition columns...

2018-03-03 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1146
  
+1


---


[GitHub] drill issue #1134: DRILL-6191 - Add acknowledgement sequence number and flag...

2018-02-28 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1134
  
+1


---


[GitHub] drill issue #1133: DRILL-6190 - Fix handling of packets longer than legally ...

2018-02-28 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1133
  
+1


---


[GitHub] drill issue #1132: DRILL-6188: Fix C++ client build on Centos7, OS X

2018-02-27 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1132
  
@pwong-mapr please review 


---


[GitHub] drill pull request #1132: DRILL-6188: Fix C++ client build on Centos7, OS X

2018-02-27 Thread parthchandra
GitHub user parthchandra opened a pull request:

https://github.com/apache/drill/pull/1132

DRILL-6188: Fix C++ client build on Centos7, OS X

Removed unused method that didn't compile. 
Included iostream for declaration of std::cout

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/parthchandra/drill DRILL-6188

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1132.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1132


commit bf1f90138502300649b0945fcfc2b1b109512b52
Author: Parth Chandra <parthc@...>
Date:   2018-02-27T12:01:51Z

DRILL-6188: Fix C++ client build on Centos7, OS X




---


[GitHub] drill issue #1124: DRILL-6172: setValueCount of VariableLengthVectors throws...

2018-02-20 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1124
  
+1



---


[GitHub] drill issue #1122: DRILL-6164: Heap memory leak during parquet scan and OOM

2018-02-15 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1122
  
+1. Very nice catch!


---


[GitHub] drill issue #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout(int)

2018-01-24 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1024
  
+1. Looks like there are no more review comments to be addressed.


---


[GitHub] drill issue #1100: DRILL-6102: Fix ConcurrentModificationException in the Ba...

2018-01-24 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1100
  
+1

BTW, I think the synchronization is needed only for the historical log, but 
this is a debug print statement so it doesn't matter.


---


[GitHub] drill issue #1097: DRILL-6100: Intermittent failure while reading Parquet fi...

2018-01-24 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1097
  
+1. 


---


[GitHub] drill issue #1087: DRILL-6079: Attempt to fix memory leak in Parquet

2018-01-22 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1087
  
+1
Thanks Salim. 


---


[GitHub] drill issue #1086: DRILL-6076: Reduce the default memory from a total of 13G...

2018-01-12 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1086
  
The settings for running the unit tests (from the pom file) are:
  -Xms512m -Xmx4096m
  -XX:MaxPermSize=512M -XX:MaxDirectMemorySize=4096M
  -XX:+CMSClassUnloadingEnabled -ea

At the very least, the default settings for Drill and the unit tests should 
match.


---


[GitHub] drill issue #1076: DRILL-6036: Create sys.connections table

2018-01-04 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1076
  
+1 (binding)


---


[GitHub] drill pull request #1079: DRILL-6063: Set correct ThreadContext ClassLoader ...

2018-01-03 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1079#discussion_r159508398
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserClient.java ---
@@ -344,7 +344,18 @@ private boolean 
clientNeedsAuthExceptPlain(DrillProperties props) {
   mechanismName = factory.getSimpleName();
   logger.trace("Will try to authenticate to server using {} mechanism 
with encryption context {}",
   mechanismName, connection.getEncryptionCtxtString());
+
+  // Update the thread context class loader to current class loader
+  // See DRILL-6063 for detailed description
+  final ClassLoader oldThreadCtxtCL = 
Thread.currentThread().getContextClassLoader();
+  final ClassLoader newThreadCtxtCL = this.getClass().getClassLoader();
+  Thread.currentThread().setContextClassLoader(newThreadCtxtCL);
+
--- End diff --

makes sense


---


[GitHub] drill pull request #1079: DRILL-6063: Set correct ThreadContext ClassLoader ...

2018-01-03 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1079#discussion_r159487621
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserClient.java ---
@@ -344,7 +344,18 @@ private boolean 
clientNeedsAuthExceptPlain(DrillProperties props) {
   mechanismName = factory.getSimpleName();
   logger.trace("Will try to authenticate to server using {} mechanism 
with encryption context {}",
   mechanismName, connection.getEncryptionCtxtString());
+
+  // Update the thread context class loader to current class loader
+  // See DRILL-6063 for detailed description
+  final ClassLoader oldThreadCtxtCL = 
Thread.currentThread().getContextClassLoader();
+  final ClassLoader newThreadCtxtCL = this.getClass().getClassLoader();
+  Thread.currentThread().setContextClassLoader(newThreadCtxtCL);
+
--- End diff --

Is this the only place we need to update? For example, with SASL the 
authentication mechanism plugin would have to be on the class path when the 
authentication method is determined. But you have already restored the thread 
context by then. (Or does the oldThreadContextCL have the SASL jars in the 
classpath?)


---


[GitHub] drill issue #1067: DRILL-3958: Return a valid error message when storage plu...

2017-12-29 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1067
  
+1


---


[GitHub] drill pull request #1049: DRILL-5971: Fix INT64, INT32 logical types in comp...

2017-12-28 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1049#discussion_r158985937
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/ColumnReaderFactory.java
 ---
@@ -138,6 +138,8 @@
  return new ParquetFixedWidthDictionaryReaders.DictionaryBigIntReader(recordReader, allocateSize, descriptor, columnChunkMetaData, fixedLength, (BigIntVector) v, schemaElement);
--- End diff --

Hmm. Looks like I missed INT_32 here. Added it but I cannot seem to force 
the test generator to generate dictionary encoded values for the logical int 
types. I'll make a note to enhance it later. Note that the DATE type is 
(apparently) handled by the code on line 101.


---


[GitHub] drill pull request #1049: DRILL-5971: Fix INT64, INT32 logical types in comp...

2017-12-28 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1049#discussion_r158986798
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetComplex.java
 ---
@@ -193,4 +200,168 @@ public void notxistsField() throws Exception {
 .run();
   }
 
+  @Test //DRILL-5971
+  public void testComplexLogicalIntTypes() throws Exception {
+String query = String.format("select t.complextype as complextype,  " +
+"t.uint_64 as uint_64, t.uint_32 as uint_32, t.uint_16 as uint_16, t.uint_8 as uint_8,  " +
+"t.int_64 as int_64, t.int_32 as int_32, t.int_16 as int_16, t.int_8 as int_8  " +
+"from cp.`store/parquet/complex/logical_int_complex.parquet` t" );
+String[] columns = {"complextype", "uint_64", "uint_32", "uint_16", "uint_8", "int_64", "int_32", "int_16", "int_8" };
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns(columns)
+.baselineValues( mapOf("a","a","b","b"), 0L, 0, 0, 0, 0L, 0, 0, 0 )
+.baselineValues( mapOf("a","a","b","b"), -1L, -1, -1, -1, -1L, -1, -1, -1 )
+.baselineValues( mapOf("a","a","b","b"), 1L, 1, 1, 1, -9223372036854775808L, 1, 1, 1 )
+.baselineValues( mapOf("a","a","b","b"), 9223372036854775807L, 2147483647, 65535, 255, 9223372036854775807L, -2147483648, -32768, -128 )
+.build()
+.run();
+  }
+
+  @Test //DRILL-5971
+  public void testComplexLogicalIntTypes2() throws Exception {
+byte[] bytes12 = {'1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'a', 'b' };
+byte[] bytesOnes = new byte[12]; Arrays.fill(bytesOnes, (byte)1);
+byte[] bytesZeros = new byte[12];
+String query = String.format(
+" select " +
+" t.rowKey as rowKey, " +
+" t.StringTypes._UTF8 as _UTF8, " +
+" t.StringTypes._Enum as _Enum, " +
+" t.NumericTypes.Int32._INT32_RAW as _INT32_RAW, " +
+" t.NumericTypes.Int32._INT_8 as _INT_8, " +
+" t.NumericTypes.Int32._INT_16 as _INT_16, " +
+" t.NumericTypes.Int32._INT_32 as _INT_32, " +
+" t.NumericTypes.Int32._UINT_8 as _UINT_8, " +
+" t.NumericTypes.Int32._UINT_16 as _UINT_16, " +
+" t.NumericTypes.Int32._UINT_32 as _UINT_32, " +
+" t.NumericTypes.Int64._INT64_RAW as _INT64_RAW, " +
+" t.NumericTypes.Int64._INT_64 as _INT_64, " +
+" t.NumericTypes.Int64._UINT_64 as _UINT_64, " +
+" t.NumericTypes.DateTimeTypes._DATE_int32 as _DATE_int32, " +
+" t.NumericTypes.DateTimeTypes._TIME_MILLIS_int32 as 
_TIME_MILLIS_int32, " +
+" t.NumericTypes.DateTimeTypes._TIMESTAMP_MILLIS_int64 as 
_TIMESTAMP_MILLIS_int64, " +
+" t.NumericTypes.DateTimeTypes._INTERVAL_fixed_len_byte_array_12 
as _INTERVAL_fixed_len_byte_array_12, " +
+" t.NumericTypes.Int96._INT96_RAW as _INT96_RAW " +
+" from " +
+" cp.`store/parquet/complex/parquet_logical_types_complex.parquet` 
t " +
+" order by t.rowKey "
+);
+String[] columns = {
+"rowKey " ,
+"_UTF8" ,
+"_Enum" ,
+"_INT32_RAW" ,
+"_INT_8" ,
+"_INT_16" ,
+"_INT_32" ,
+"_UINT_8" ,
+"_UINT_16" ,
+"_UINT_32" ,
+"_INT64_RAW" ,
+"_INT_64" ,
+"_UINT_64" ,
+"_DATE_int32" ,
+"_TIME_MILLIS_int32" ,
+"_TIMESTAMP_MILLIS_int64" ,
+"_INTERVAL_fixed_len_byte_array_12" ,
+"_INT96_RAW"
+
+};
+testBuilder()
+.sqlQuery(query)
+.ordered()
--- End diff --

In these tests the result of the query is always ordered since there is 
only one fragment. Ordered tests run slightly faster, and if/when the test 
fails, ordered tests are _much_ easier to debug. 


---


[GitHub] drill pull request #1049: DRILL-5971: Fix INT64, INT32 logical types in comp...

2017-12-28 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1049#discussion_r158986526
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetComplex.java
 ---
@@ -193,4 +200,168 @@ public void notxistsField() throws Exception {
 .run();
   }
 
+  @Test //DRILL-5971
+  public void testComplexLogicalIntTypes() throws Exception {
+String query = String.format("select t.complextype as complextype,  " +
+"t.uint_64 as uint_64, t.uint_32 as uint_32, t.uint_16 as uint_16, t.uint_8 as uint_8,  " +
+"t.int_64 as int_64, t.int_32 as int_32, t.int_16 as int_16, t.int_8 as int_8  " +
+"from cp.`store/parquet/complex/logical_int_complex.parquet` t" );
+String[] columns = {"complextype", "uint_64", "uint_32", "uint_16", "uint_8", "int_64", "int_32", "int_16", "int_8" };
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns(columns)
+.baselineValues( mapOf("a","a","b","b"), 0L, 0, 0, 0, 0L, 0, 0, 0 )
+.baselineValues( mapOf("a","a","b","b"), -1L, -1, -1, -1, -1L, -1, -1, -1 )
+.baselineValues( mapOf("a","a","b","b"), 1L, 1, 1, 1, -9223372036854775808L, 1, 1, 1 )
+.baselineValues( mapOf("a","a","b","b"), 9223372036854775807L, 2147483647, 65535, 255, 9223372036854775807L, -2147483648, -32768, -128 )
+.build()
+.run();
+  }
+
+  @Test //DRILL-5971
+  public void testComplexLogicalIntTypes2() throws Exception {
+byte[] bytes12 = {'1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'a', 'b' };
+byte[] bytesOnes = new byte[12]; Arrays.fill(bytesOnes, (byte)1);
--- End diff --

Can't see how that makes things more readable. This way, the declaration 
and initialization are together and all the declarations are clearly readable. 
Changed it anyway since you asked :)


---


[GitHub] drill pull request #1049: DRILL-5971: Fix INT64, INT32 logical types in comp...

2017-12-28 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1049#discussion_r159003539
  
--- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
@@ -394,9 +394,11 @@ public void get(int index, 
Nullable${minor.class}Holder holder){
   final int offsetIndex = index * VALUE_WIDTH;
   final int months  = data.getInt(offsetIndex);
   final int days= data.getInt(offsetIndex + ${minor.daysOffset});
-  final int millis = data.getInt(offsetIndex + 
${minor.millisecondsOffset});
+  int millis = data.getInt(offsetIndex + 8);
--- End diff --

Hmm. The only reason for this change was to overcome a Joda problem that 
caused the unit tests to fail. I fixed the unit tests and undid the change. 


---


[GitHub] drill pull request #1060: DRILL-5846: Improve parquet performance for Flat D...

2017-12-22 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1060#discussion_r158549859
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VLAbstractEntryReader.java
 ---
@@ -0,0 +1,215 @@

+/***
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ 
**/
+package org.apache.drill.exec.store.parquet.columnreaders;
+
+import java.nio.ByteBuffer;
+
+import 
org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.ColumnPrecisionInfo;
+import 
org.apache.drill.exec.store.parquet.columnreaders.VLColumnBulkInput.PageDataInfo;
+import org.apache.drill.exec.util.MemoryUtils;
+
+/** Abstract class for sub-classes implementing several algorithms for 
loading a Bulk Entry */
+abstract class VLAbstractEntryReader {
+
+  /** byte buffer used for buffering page data */
+  protected final ByteBuffer buffer;
+  /** Page Data Information */
+  protected final PageDataInfo pageInfo;
+  /** expected precision type: fixed or variable length */
+  protected final ColumnPrecisionInfo columnPrecInfo;
+  /** Bulk entry */
+  protected final VLColumnBulkEntry entry;
+
+  /**
+   * CTOR.
+   * @param _buffer byte buffer for data buffering (within CPU cache)
+   * @param _pageInfo page being processed information
+   * @param _columnPrecInfo column precision information
+   * @param _entry reusable bulk entry object
+   */
+  VLAbstractEntryReader(ByteBuffer _buffer,
+PageDataInfo _pageInfo,
+ColumnPrecisionInfo _columnPrecInfo,
+VLColumnBulkEntry _entry) {
+
+this.buffer = _buffer;
+this.pageInfo   = _pageInfo;
+this.columnPrecInfo = _columnPrecInfo;
+this.entry  = _entry;
+  }
+
+  /**
+   * @param valuesToRead maximum values to read within the current page
+   * @return a bulk entry object
+   */
+  abstract VLColumnBulkEntry getEntry(int valsToReadWithinPage);
+
+  /**
+   * Indicates whether to use bulk processing
+   */
+  protected final boolean bulkProcess() {
+return columnPrecInfo.bulkProcess;
+  }
+
+  /**
+   * Loads new data into the buffer if empty or the force flag is set.
+   *
+   * @param force flag to force loading new data into the buffer
+   */
+  protected final boolean load(boolean force) {
+
+if (!force && buffer.hasRemaining()) {
+  return true; // NOOP
+}
+
+// We're here either because the buffer is empty or we want to force a 
new load operation.
+// In the case of force, there might be unprocessed data (still in the 
buffer) which is fine
+// since the caller updates the page data buffer's offset only for the 
data it has consumed; this
+// means unread data will be loaded again but this time will be 
positioned in the beginning of the
+// buffer. This can happen only for the last entry in the buffer when 
either of its length or value
+// is incomplete.
+buffer.clear();
+
+int remaining = remainingPageData();
+int toCopy= remaining > buffer.capacity() ? buffer.capacity() : 
remaining;
+
+if (toCopy == 0) {
+  return false;
+}
+
+pageInfo.pageData.getBytes(pageInfo.pageDataOff, buffer.array(), 
buffer.position(), toCopy);
--- End diff --

So seriously, this is faster? I would have expected the copy from direct 
memory to Java heap memory to be a big issue. There are HDFS APIs to read into 
a ByteBuffer (not DirectByteBuffer) that we could leverage to reduce the memory 
copies between direct memory and the Java heap.
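
For reference, a minimal sketch of the kind of API I mean (assumes the 
underlying stream implements ByteBufferReadable; otherwise the call throws 
UnsupportedOperationException):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import org.apache.hadoop.fs.FSDataInputStream;

    // Read page bytes straight into a heap ByteBuffer, skipping the extra
    // direct-memory to heap copy done via getBytes().
    static int readIntoHeapBuffer(FSDataInputStream in, ByteBuffer heapBuf) throws IOException {
      heapBuf.clear();
      int n = in.read(heapBuf);  // may read fewer bytes than heapBuf.remaining()
      heapBuf.flip();            // prepare the buffer for consumption
      return n;
    }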


---


[GitHub] drill issue #1076: DRILL-6036: Create sys.connections table

2017-12-20 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1076
  
+1. Nice work.


---


[GitHub] drill issue #1049: DRILL-5971: Fix INT64, INT32 logical types in complex par...

2017-12-18 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1049
  
@vvysotskyi please review - 
added fixes for date/time and interval types, fixed the reading of 
dictionary encoded fixed width array types (including int96), fixed the object 
representation of interval types (used by unit tests and sqlline/jdbc), and 
added a new test file generator that generates data for all (primitive) logical 
types in either flat or hierarchical schemas. Added those files as unit tests 
for the logical types.


---


[GitHub] drill issue #1055: DRILL-5973 : Support injection of time-bound pauses in se...

2017-12-13 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1055
  
+1. 
No need to squash/rebase. 


---


[GitHub] drill pull request #1062: DRILL-6007: Use default refresh timeout (10 second...

2017-12-08 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1062#discussion_r155875821
  
--- Diff: exec/java-exec/src/main/resources/rest/index.ftl ---
@@ -185,11 +185,10 @@

---

[GitHub] drill issue #1063: DRILL-5702: Jdbc Driver Class not found

2017-12-08 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1063
  
+1 LGTM 


---


[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...

2017-12-06 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r155402523
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillStatementImpl.java ---
@@ -156,29 +156,19 @@ public void cleanUp() {
   }
 
   @Override
-  public int getQueryTimeout() throws AlreadyClosedSqlException
+  public int getQueryTimeout() throws AlreadyClosedSqlException, 
SQLException
   {
 throwIfClosed();
-return 0;  // (No no timeout.)
+return super.getQueryTimeout();
   }
 
   @Override
-  public void setQueryTimeout( int milliseconds )
+  public void setQueryTimeout( int seconds )
   throws AlreadyClosedSqlException,
  InvalidParameterSqlException,
- SQLFeatureNotSupportedException {
+ SQLException {
--- End diff --

The parent setQueryTimeout will throw a SQLException if the parameter is 
invalid, so this method no longer throws an InvalidParameterSqlException.


---


[GitHub] drill pull request #1024: DRILL-3640: Support JDBC Statement.setQueryTimeout...

2017-12-06 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1024#discussion_r155409333
  
--- Diff: 
exec/jdbc/src/main/java/org/apache/drill/jdbc/impl/DrillResultSetImpl.java ---
@@ -96,6 +105,14 @@ private void throwIfClosed() throws 
AlreadyClosedSqlException,
 throw new AlreadyClosedSqlException( "ResultSet is already 
closed." );
   }
 }
+
+//Implicit check for whether timeout is set
+if (elapsedTimer != null) {
--- End diff --

Do you want to try statement.cancel() to release the memory?
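
Something along these lines (a hedged sketch, not the actual 
DrillResultSetImpl code; `elapsedTimer` is the Guava Stopwatch from the diff 
above):

    import java.sql.SQLTimeoutException;
    import java.sql.Statement;
    import java.util.concurrent.TimeUnit;
    import com.google.common.base.Stopwatch;

    // On timeout, cancel the statement so queued result batches (and their
    // memory) are released before surfacing the timeout to the caller.
    static void checkTimeout(Stopwatch elapsedTimer, Statement statement) throws Exception {
      int timeoutSecs = statement.getQueryTimeout();   // 0 means no timeout
      if (timeoutSecs > 0 && elapsedTimer != null
          && elapsedTimer.elapsed(TimeUnit.SECONDS) >= timeoutSecs) {
        statement.cancel();                            // releases buffered batches
        throw new SQLTimeoutException("Query timed out");
      }
    }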


---


[GitHub] drill issue #1055: DRILL-5973 : Support injection of time-bound pauses in se...

2017-12-06 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1055
  
This PR seems to be failing a bunch of unit tests that don't seem related. 
Do you need to rebase?


---


[GitHub] drill pull request #1041: DRILL-5961: For long running queries (> 10 min) Dr...

2017-12-06 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1041#discussion_r155397359
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/rpc/control/WorkEventBus.java
 ---
@@ -74,83 +70,42 @@ public void statusUpdate(final FragmentStatus status) {
 
   public void addFragmentManager(final FragmentManager fragmentManager) {
 if (logger.isDebugEnabled()) {
-  logger.debug("Manager created: {}", 
QueryIdHelper.getQueryIdentifier(fragmentManager.getHandle()));
+  logger.debug("Fragment {} manager created: {}", 
QueryIdHelper.getQueryIdentifier(fragmentManager.getHandle()), fragmentManager);
 }
 final FragmentManager old = 
managers.putIfAbsent(fragmentManager.getHandle(), fragmentManager);
-  if (old != null) {
-throw new IllegalStateException(
-"Tried to set fragment manager when has already been set for 
the provided fragment handle.");
-}
-  }
-
-  public FragmentManager getFragmentManagerIfExists(final FragmentHandle 
handle) {
-synchronized (this) {
-  return managers.get(handle);
+if (old != null) {
+  throw new IllegalStateException(
+  String.format("Manager {} for fragment {} already exists.", old, 
QueryIdHelper.getQueryIdentifier(fragmentManager.getHandle(;
 }
   }
 
-  public FragmentManager getFragmentManager(final FragmentHandle handle) 
throws FragmentSetupException {
-synchronized (this) {
-  // Check if this was a recently finished (completed or cancelled) 
fragment.  If so, throw away message.
-  if (recentlyFinishedFragments.asMap().containsKey(handle)) {
-if (logger.isDebugEnabled()) {
-  logger.debug("Fragment: {} was cancelled. Ignoring fragment 
handle", handle);
-}
-return null;
-  }
-
-  // since non-leaf fragments are sent first, it is an error condition 
if the manager is unavailable.
-  final FragmentManager m = managers.get(handle);
-  if (m != null) {
-return m;
-  }
-}
-throw new FragmentSetupException("Failed to receive plan fragment that 
was required for id: "
-+ QueryIdHelper.getQueryIdentifier(handle));
+  public FragmentManager getFragmentManager(final FragmentHandle handle) {
+return managers.get(handle);
   }
 
   /**
-   * Removes fragment manager (for the corresponding the handle) from the 
work event bus. This method can be called
-   * multiple times. The manager will be removed only once (the first 
call).
-   * @param handle the handle to the fragment
-   */
-  public void removeFragmentManager(final FragmentHandle handle) {
-if (logger.isDebugEnabled()) {
-  logger.debug("Removing fragment manager: {}", 
QueryIdHelper.getQueryIdentifier(handle));
-}
-
-synchronized (this) {
-  final FragmentManager manager = managers.get(handle);
-  if (manager != null) {
-recentlyFinishedFragments.put(handle, 1);
-managers.remove(handle);
-  } else {
-logger.warn("Fragment {} not found in the work bus.", 
QueryIdHelper.getQueryIdentifier(handle));
-  }
-}
-  }
-
-  /**
-   * Cancels and removes fragment manager (for the corresponding the 
handle) from the work event bus, Currently, used
-   * for fragments waiting on data (root and intermediate).
+   * Optionally cancels and removes fragment manager (for the 
corresponding the handle) from the work event bus. Currently, used
+   * for fragments waiting on data (root and intermediate). This method 
can be called multiple times. The manager will be removed
+   * only once (the first call).
* @param handle the handle to the fragment
+   * @param cancel
* @return if the fragment was found and removed from the event bus
*/
-  public boolean cancelAndRemoveFragmentManagerIfExists(final 
FragmentHandle handle) {
-if (logger.isDebugEnabled()) {
-  logger.debug("Cancelling and removing fragment manager: {}", 
QueryIdHelper.getQueryIdentifier(handle));
-}
-
-synchronized (this) {
-  final FragmentManager manager = managers.get(handle);
-  if (manager == null) {
-return false;
+  public boolean removeFragmentManager(final FragmentHandle handle, final 
boolean cancel) {
+final FragmentManager manager = managers.remove(handle);
+if (manager != null) {
+  assert !manager.isCancelled() : String.format("Fragment {} manager 
{} is already cancelled.", QueryIdHelper.getQueryIdentifier(handle), manager);
+  if (cancel) {
manager.can

---
[GitHub] drill pull request #1041: DRILL-5961: For long running queries (> 10 min) Dr...

2017-12-06 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1041#discussion_r155396889
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/work/WorkManager.java ---
@@ -277,7 +277,9 @@ public void startFragmentPendingRemote(final 
FragmentManager fragmentManager) {
 @Override
 protected void cleanup() {
   runningFragments.remove(fragmentHandle);
-  workBus.removeFragmentManager(fragmentHandle);
+  if (!fragmentManager.isCancelled()) {
+workBus.removeFragmentManager(fragmentHandle, false);
--- End diff --

Not sure why you don't want to cancel here. 


---


[GitHub] drill pull request #1055: DRILL-5973 : Support injection of time-bound pause...

2017-12-06 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1055#discussion_r155326676
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/testing/CountDownLatchInjectionImpl.java
 ---
@@ -41,8 +41,11 @@
   private CountDownLatchInjectionImpl(@JsonProperty("address") final 
String address,
   @JsonProperty("port") final int port,
   @JsonProperty("siteClass") final 
String siteClass,
-  @JsonProperty("desc") final String 
desc) throws InjectionConfigurationException {
-super(address, port, siteClass, desc, 0, 1);
+  @JsonProperty("desc") final String 
desc,
+  @JsonProperty("nSkip") final int 
nSkip,
+  @JsonProperty("nFire") final int 
nFire,
+  @JsonProperty("msPause") final Long 
msPause) throws InjectionConfigurationException {
--- End diff --

long instead of Long? You will save a lot of unnecessary checking for null.


---


[GitHub] drill issue #1050: DRILL-5964: Do not allow queries to access paths outside ...

2017-11-28 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1050
  
Changed the type of allowAccessOutsideWorkspace to boolean. This changes 
behavior: all existing dfs workspaces will now also disallow access outside the 
workspace. Ideally, we should deprecate this parameter in a future release, but 
for the moment it lets existing users access paths outside the workspace if 
they want to.


---


[GitHub] drill issue #1049: DRILL-5971: Fix INT64, INT32 logical types in complex par...

2017-11-28 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1049
  
Updated the PR to include all the remaining logical int types. While 
generating the test file for the unit test, I found that some of the types were 
missing from the fast parquet reader, so I added those as well.


---


[GitHub] drill issue #1038: DRILL-5972: Slow performance for query on INFORMATION_SCH...

2017-11-28 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/1038
  
+1. LGTM


---


[GitHub] drill pull request #1038: DRILL-5972: Slow performance for query on INFORMAT...

2017-11-28 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1038#discussion_r153658073
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/ischema/InfoSchemaFilter.java
 ---
@@ -202,14 +202,19 @@ private Result evaluateHelperFunction(Map<String, 
String> recordValues, Function
 // If at least one arg returns FALSE, then the AND function value 
is FALSE
 // If at least one arg returns INCONCLUSIVE, then the AND function 
value is INCONCLUSIVE
 // If all args return TRUE, then the AND function value is TRUE
+Result result = Result.TRUE;
+
 for(ExprNode arg : exprNode.args) {
   Result exprResult = evaluateHelper(recordValues, arg);
-  if (exprResult != Result.TRUE) {
+  if (exprResult == Result.FALSE) {
 return exprResult;
   }
+  if (exprResult == Result.INCONCLUSIVE) {
--- End diff --

Just to be clear: you want to return `Result.INCONCLUSIVE` if any one of 
the expressions is inconclusive *and* there is no expression that is false. 
Correct?
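
In other words, something like this (a small sketch of the three-valued AND 
with a standalone Result enum; not the actual InfoSchemaFilter code):

    import java.util.List;

    enum Result { TRUE, FALSE, INCONCLUSIVE }

    // FALSE short-circuits; INCONCLUSIVE is remembered but scanning continues,
    // because a later FALSE must still win over it.
    static Result evaluateAnd(List<Result> args) {
      Result result = Result.TRUE;
      for (Result r : args) {
        if (r == Result.FALSE) {
          return Result.FALSE;
        }
        if (r == Result.INCONCLUSIVE) {
          result = Result.INCONCLUSIVE;
        }
      }
      return result;
    }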


---


[GitHub] drill pull request #1050: DRILL-5964: Do not allow queries to access paths o...

2017-11-27 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1050#discussion_r153388800
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceConfig.java
 ---
@@ -30,18 +30,24 @@
 public class WorkspaceConfig {
 
   /** Default workspace is a root directory which supports read, but not 
write. */
-  public static final WorkspaceConfig DEFAULT = new WorkspaceConfig("/", 
false, null);
+  public static final WorkspaceConfig DEFAULT = new WorkspaceConfig("/", 
false, null, false);
 
   private final String location;
   private final boolean writable;
   private final String defaultInputFormat;
-
+  private final Boolean allowAccessOutsideWorkspace; // allow access 
outside the workspace by default. This
--- End diff --

Yes it would, I believe. But we want the value to be `true` for backward 
compatibility. (This also addresses your next comment.) So we need to know if 
the value is missing, and we can only do that with a non-primitive type AFAIK.
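
A tiny sketch of the distinction (hypothetical class, not the actual 
WorkspaceConfig; Jackson leaves an absent field null only for non-primitives):

    import com.fasterxml.jackson.databind.ObjectMapper;

    public class BoxedBooleanDemo {
      // Boxed Boolean: stays null when the JSON field is absent (old configs).
      public static class Ws { public Boolean allowAccessOutsideWorkspace; }

      public static void main(String[] args) throws Exception {
        Ws legacy = new ObjectMapper().readValue("{}", Ws.class);
        // null means "not specified" -> default to true for backward compatibility
        boolean allow = legacy.allowAccessOutsideWorkspace == null
            || legacy.allowAccessOutsideWorkspace;
        System.out.println(allow);  // prints true
      }
    }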


---


[GitHub] drill pull request #1050: DRILL-5964: Do not allow queries to access paths o...

2017-11-27 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1050#discussion_r152862636
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java 
---
@@ -252,11 +252,15 @@ private static String buildPath(final String[] path, 
final int folderIndex) {
 return builder.toString();
   }
 
-  public static FileSelection create(final DrillFileSystem fs, final 
String parent, final String path) throws IOException {
+  public static FileSelection create(final DrillFileSystem fs, final 
String parent, final String path,
+  final boolean allowAccessOutsideWorkspace) throws IOException {
 Stopwatch timer = Stopwatch.createStarted();
 boolean hasWildcard = path.contains(WILD_CARD);
 
 final Path combined = new Path(parent, removeLeadingSlash(path));
+if (!allowAccessOutsideWorkspace) {
+  checkBackPaths(parent, combined.toString(), path);
--- End diff --

Done


---


[GitHub] drill pull request #1050: DRILL-5964: Do not allow queries to access paths o...

2017-11-27 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1050#discussion_r152862954
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceConfig.java
 ---
@@ -30,18 +30,24 @@
 public class WorkspaceConfig {
 
   /** Default workspace is a root directory which supports read, but not 
write. */
-  public static final WorkspaceConfig DEFAULT = new WorkspaceConfig("/", 
false, null);
+  public static final WorkspaceConfig DEFAULT = new WorkspaceConfig("/", 
false, null, false);
 
   private final String location;
   private final boolean writable;
   private final String defaultInputFormat;
-
+  private final Boolean allowAccessOutsideWorkspace; // allow access 
outside the workspace by default. This
--- End diff --

I need to check if the value is not present (i.e. null). That will be the 
case with all storage plugin configurations that have already been created.


---


[GitHub] drill pull request #1050: DRILL-5964: Do not allow queries to access paths o...

2017-11-27 Thread parthchandra
Github user parthchandra commented on a diff in the pull request:

https://github.com/apache/drill/pull/1050#discussion_r152862722
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java 
---
@@ -359,15 +363,30 @@ private static Path handleWildCard(final String root) 
{
 }
   }
 
-  private static String removeLeadingSlash(String path) {
-if (path.charAt(0) == '/') {
+  public static String removeLeadingSlash(String path) {
+if (!path.isEmpty() && path.charAt(0) == '/') {
   String newPath = path.substring(1);
   return removeLeadingSlash(newPath);
 } else {
   return path;
 }
   }
 
+  // Check if the path is a valid sub path under the parent after removing 
backpaths. Throw an exception if
+  // it is not
+  // We pass subpath in as a parameter only for the error message
+  public static boolean checkBackPaths(String parent, String combinedPath, 
String subpath) {
+Preconditions.checkArgument(!parent.isEmpty());
+Preconditions.checkArgument(!combinedPath.isEmpty());
--- End diff --

Done


---

