[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984771#comment-14984771
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/228#discussion_r43598040
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java
 ---
@@ -93,13 +93,15 @@ public void ensureAtLeastOneField(ComplexWriter writer) {
 if (!atLeastOneWrite) {
   // if we had no columns, create one empty one so we can return some data for count purposes.
   SchemaPath sp = columns.get(0);
-  PathSegment root = sp.getRootSegment();
+  PathSegment fieldPath = sp.getRootSegment();
   BaseWriter.MapWriter fieldWriter = writer.rootAsMap();
-  while (root.getChild() != null && !root.getChild().isArray()) {
-    fieldWriter = fieldWriter.map(root.getNameSegment().getPath());
-    root = root.getChild();
+  while (fieldPath.getChild() != null && !fieldPath.getChild().isArray()) {
+    fieldWriter = fieldWriter.map(fieldPath.getNameSegment().getPath());
+    fieldPath = fieldPath.getChild();
+  }
+  if (fieldWriter.isEmptyMap()) {
--- End diff --

can you explain this change?


> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing 
> JDBC metadata (schema) for 0-row results...]
> ---
>
> Key: DRILL-2288
> URL: https://issues.apache.org/jira/browse/DRILL-2288
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Information Schema
>Reporter: Daniel Barclay (Drill)
>Assignee: Daniel Barclay (Drill)
> Fix For: 1.3.0
>
> Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
>
> The ResultSetMetaData object from getMetadata() of a ResultSet is not set up 
> (getColumnCount() returns zero, and trying to access any other metadata 
> throws IndexOutOfBoundsException) for a result set with zero rows, at least 
> for one from DatabaseMetaData.getColumns(...).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984767#comment-14984767
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/228#discussion_r43597968
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java
 ---
@@ -189,8 +189,8 @@ protected void putVector(String name, ValueVector vector) {
 Preconditions.checkNotNull(vector, "vector cannot be null")
 );
 if (old != null && old != vector) {
-  logger.debug("Field [%s] mutated from [%s] to [%s]", name, old.getClass().getSimpleName(),
-      vector.getClass().getSimpleName());
+  logger.debug("Field [{}] mutated from [{}] to [{}]", name, old.getClass().getSimpleName(),
--- End diff --

I think there is a limit to the number of bracket replacements on the 
logger interface. Is three allowed?
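As a hedged aside on the question above: SLF4J's logging methods include a varargs overload, so the number of `{}` placeholders is not limited to two or three; the count only has to match the supplied arguments. The toy formatter below is a hypothetical sketch of that substitution, not SLF4J's actual implementation.

```java
// Hypothetical sketch of SLF4J-style "{}" placeholder substitution.
// Each "{}" is replaced, left to right, by the next varargs argument.
public class PlaceholderSketch {
    static String format(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIndex = 0;
        int i = 0;
        while (i < pattern.length()) {
            // Consume "{}" only while arguments remain; otherwise copy verbatim.
            if (argIndex < args.length && i + 1 < pattern.length()
                    && pattern.charAt(i) == '{' && pattern.charAt(i + 1) == '}') {
                out.append(args[argIndex++]);
                i += 2;
            } else {
                out.append(pattern.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Three placeholders with three arguments, as in the diff above.
        System.out.println(format("Field [{}] mutated from [{}] to [{}]",
                "a", "IntVector", "VarCharVector"));
    }
}
```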







[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984769#comment-14984769
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/228#discussion_r43597994
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java
 ---
@@ -309,7 +309,14 @@ public Object getObject(int index) {
   Map vv = new JsonStringHashMap<>();
   for (String child:getChildFieldNames()) {
 ValueVector v = getChild(child);
-if (v != null) {
+// TODO(DRILL-4001):  Resolve this hack:
--- End diff --

Can you add what happens if this hack isn't done? What is the error?







[jira] [Commented] (DRILL-3983) Small test improvements

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984768#comment-14984768
 ] 

ASF GitHub Bot commented on DRILL-3983:
---

Github user julienledem commented on the pull request:

https://github.com/apache/drill/pull/221#issuecomment-152918240
  
Thanks!


> Small test improvements
> ---
>
> Key: DRILL-3983
> URL: https://issues.apache.org/jira/browse/DRILL-3983
> Project: Apache Drill
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 1.3.0
>
>






[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984764#comment-14984764
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/228#discussion_r43597916
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java ---
@@ -23,60 +23,214 @@
 import org.apache.drill.exec.record.selection.SelectionVector4;
 
 /**
- * A record batch contains a set of field values for a particular range of records. In the case of a record batch
- * composed of ValueVectors, ideally a batch fits within L2 cache (~256k per core). The set of value vectors do not
- * change unless the next() IterOutcome is a *_NEW_SCHEMA type.
- *
- * A key thing to know is that the Iterator provided by record batch must align with the rank positions of the field ids
- * provided utilizing getValueVectorId();
+ * A record batch contains a set of field values for a particular range of
--- End diff --

Nice documentation on this code.







[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984762#comment-14984762
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/228#discussion_r43597810
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/union/UnionAllRecordBatch.java
 ---
@@ -335,7 +343,8 @@ public IterOutcome nextBatch() throws SchemaChangeException {
         return iterRight;
 
       default:
-        throw new IllegalStateException(String.format("Unknown state %s.", iterRight));
+        throw new IllegalStateException(
--- End diff --

Why did you change the formatting here?







[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984760#comment-14984760
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/228#discussion_r43597780
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java
 ---
@@ -122,6 +122,7 @@ protected void killIncoming(final boolean sendUpstream) {
 
   @Override
   public IterOutcome innerNext() {
+    recordCount = 0;
--- End diff --

Why don't you add this to the parent class rather than to each child of the parent? (Either AbstractRecordBatch or AbstractSingleRecordBatch.)







[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984756#comment-14984756
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/228#discussion_r43597741
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java
 ---
@@ -325,10 +325,13 @@ public AggOutcome doWork() {
             if (EXTRA_DEBUG_1) {
               logger.debug("Received new schema.  Batch has {} records.", incoming.getRecordCount());
             }
-            newSchema = true;
-            this.cleanup();
-            // TODO: new schema case needs to be handled appropriately
-            return AggOutcome.UPDATE_AGGREGATOR;
+            final BatchSchema newIncomingSchema = incoming.getSchema();
+            if ((! newIncomingSchema.equals(schema)) && schema != null) {
--- End diff --

You are quelling a new schema here without correctly reloading vector 
references. A schema change means: the schema has changed and/or the vectors 
have changed. Your check here isn't enough to guarantee correct behavior. It is 
possible that the schema is equal but the vectors changed.
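The reviewer's point — that an equal schema does not imply unchanged vectors — is the general Java distinction between `equals()` and object identity. A minimal, hypothetical sketch (these are not Drill's BatchSchema/ValueVector classes):

```java
import java.util.List;

// Two "schemas" can compare equal while the underlying data objects are
// distinct, so an equals() check alone cannot prove the vectors were reused.
public class SchemaVsVectors {
    // Records get a field-wise equals(), like a schema comparing its columns.
    record Schema(List<String> fields) {}

    public static void main(String[] args) {
        Schema a = new Schema(List.of("id", "name"));
        Schema b = new Schema(List.of("id", "name"));
        int[] vectorsA = {1, 2};   // stand-ins for one batch's vectors
        int[] vectorsB = {1, 2};   // same content, different objects
        // equals() is true, yet identity (==) of the data objects differs:
        // a downstream operator would still need to re-reference the vectors.
        System.out.println(a.equals(b) && vectorsA != vectorsB);
    }
}
```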







[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984752#comment-14984752
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/228#discussion_r43597587
  
--- Diff: 
contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java
 ---
@@ -142,10 +148,12 @@ public void setup(OperatorContext context, OutputMutator output) throws Executio
       getOrCreateFamilyVector(column.getRootSegment().getPath(), false);
     }
   }
-  logger.debug("Opening scanner for HBase table '{}', Zookeeper quorum '{}', port '{}', znode '{}'.",
-      hbaseTableName, hbaseConf.get(HConstants.ZOOKEEPER_QUORUM),
-      hbaseConf.get(HBASE_ZOOKEEPER_PORT), hbaseConf.get(HConstants.ZOOKEEPER_ZNODE_PARENT));
-  hTable = new HTable(hbaseConf, hbaseTableName);
+  // Add vector for any column families not mentioned yet (in order to avoid
+  // creation of dummy NullableIntVectors for them).
+  for (HColumnDescriptor columnFamily :
--- End diff --

This also needs to be done for any requested column vectors.







[jira] [Created] (DRILL-4004) Fix bugs in JDK8 Tests before updating enforcer to JDK8

2015-11-01 Thread Jacques Nadeau (JIRA)
Jacques Nadeau created DRILL-4004:
-

 Summary: Fix bugs in JDK8 Tests before updating enforcer to JDK8
 Key: DRILL-4004
 URL: https://issues.apache.org/jira/browse/DRILL-4004
 Project: Apache Drill
  Issue Type: Bug
Reporter: Jacques Nadeau


The following tests fail on JDK8
{code}
org.apache.drill.exec.store.mongo.TestMongoFilterPushDown.testFilterPushDownIsEqual
org.apache.drill.exec.store.mongo.TestMongoFilterPushDown.testFilterPushDownGreaterThanWithSingleField
org.apache.drill.exec.store.mongo.TestMongoFilterPushDown.testFilterPushDownLessThanWithSingleField
org.apache.drill.TestFrameworkTest.testRepeatedColumnMatching
org.apache.drill.TestFrameworkTest.testCSVVerificationOfOrder_checkFailure
org.apache.drill.exec.physical.impl.flatten.TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject
org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation
org.apache.drill.exec.record.vector.TestValueVector.testVariableVectorReallocation
{code}







[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984731#comment-14984731
 ] 

ASF GitHub Bot commented on DRILL-2288:
---

GitHub user dsbos opened a pull request:

https://github.com/apache/drill/pull/228

DRILL-2288: Fix ScanBatch violation of IterOutcome protocol and downstream 
chain of bugs



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dsbos/incubator-drill bugs/drill-2288_etc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/228.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #228


commit faefc960f1a36cf25fa0b553bab75e1c3fc71222
Author: dbarclay 
Date:   2015-10-28T02:25:25Z

2288:  Pt. 1 Core:  Added unit test.  
[Drill2288GetColumnsMetadataWhenNoRowsTest, empty.json]

commit 396a41b155d1ad9413897ae4d87db2337642a1ea
Author: dbarclay 
Date:   2015-11-01T03:36:12Z

2288:  Pt. 1 Core:  Changed HBase test table #1's # of regions from 1 to 2. 
 [HBaseTestsSuite]

Also added TODO(DRILL-3954) comment about # of regions.

commit a0fe6b0787c284cda2006355e507c7caa6cae2a7
Author: dbarclay 
Date:   2015-10-28T02:35:11Z

2288:  Pt. 2 Core:  Documented IterOutcome much more clearly.  [RecordBatch]

Also edited some related Javadoc.

commit 24b3f4df90711b630b30fe4e2ad68ff798e5731a
Author: dbarclay 
Date:   2015-10-28T02:41:04Z

2288:  Pt. 2 Hyg.:  Edited doc., added @Override, etc.  
[AbstractRecordBatch, RecordBatch]

Purged unused SetupOutcome.
Added @Override.
Edited comments.
Fix some comments to doc. comments.

commit a3108ab18e5f99e9d18f57786b2efa717a61c432
Author: dbarclay 
Date:   2015-10-28T03:00:26Z

2288:  Pt. 3 Core&Hyg.:  Added validation of IterOutcome sequence.  
[IteratorValidatorBatchIterator]

Also:
Renamed internal members for clarity.
Added comments.

commit c8fc4b3f5c3df5871a9b07b8f4ae800ddbe0ce64
Author: dbarclay 
Date:   2015-10-28T03:31:14Z

2288:  Pt. 4 Core:  Fixed a NONE -> OK_NEW_SCHEMA in ScanBatch.next().  
[ScanBatch]

(With nearby comments.)

commit fcb1438724df83865a83749b097ca19d21cec444
Author: dbarclay 
Date:   2015-10-28T03:56:33Z

2288:  Pt. 4 Hyg.:  Edited comments, reordered, whitespace.  [ScanBatch]

Reordered
Added comments.
Aligned.

commit 9520345aae524d246876a744a8a676e00b0294dc
Author: dbarclay 
Date:   2015-10-28T04:02:25Z

2288:  Pt. 4 Core+:  Fixed UnionAllRecordBatch to receive IterOutcome 
sequence right.  (3659)  [UnionAllRecordBatch]

commit 3acf8ec8927974901d825859c8f00d1382aa2d87
Author: dbarclay 
Date:   2015-10-28T04:05:01Z

2288:  Pt. 5 Core:  Fixed ScanBatch.Mutator.isNewSchema() to stop spurious 
"new schema" reports (fix short-circuit OR, to call resetting method right).  
[ScanBatch]

commit 59ede9bda0e73c0bb6840018781f1f404d73f85c
Author: dbarclay 
Date:   2015-10-28T04:11:55Z

2288:  Pt. 5 Hyg.:  Renamed, edited comments, reordered.  [ScanBatch, 
SchemaChangeCallBack, AbstractSingleRecordBatch]

Renamed getSchemaChange -> getSchemaChangedAndReset.
Renamed schemaChange -> schemaChanged.
Added doc. comments.
Aligned.

commit d93dc1ee518db409e72b8672679640be65ae7a62
Author: dbarclay 
Date:   2015-10-28T04:20:47Z

2288:  Pt. 6 Core:  Avoided dummy Null.IntVec. column in JsonReader when 
not needed (MapWriter.isEmptyMap()).  [JsonReader, 3 vector files]

commit f989229ac5510273698af19d87b50044efed81c5
Author: dbarclay 
Date:   2015-10-28T04:32:44Z

2288:  Pt. 6 Hyg.:  Edited comments, message.  Fixed message formatting.  
[RecordReader, JSONFormatPlugin, JSONRecordReader, AbstractMapVector, 
JsonReader]

Fixed message formatting.
Edited comments.
Edited message.
Fixed spurious line break.

commit a360e9d0d7bfac0aa75831c35585f8ea890858dc
Author: dbarclay 
Date:   2015-10-28T05:06:13Z

2288:  Pt. 7 Core:  Added column families in HBaseRecordReader* to avoid 
dummy Null.IntVec. clash.  [HBaseRecordReader]

commit 18467a9ce14f6328b4766541267183927ebcdc9c
Author: dbarclay 
Date:   2015-10-28T05:06:52Z

2288:  Pt. 8 Core.1:  Cleared recordCount in 
OrderedPartitionRecordBatch.innerNext().  [OrderedPartitionRecordBatch]

commit 73bf71fc4af8e54599e908abbb993a64b066c097
Author: dbarclay 
Date:   2015-10-28T05:07:14Z

2288:  Pt. 8 Core.2:  Cleared recordCount in ProjectRecordBatch.innerNext.  
[ProjectRecordBatch]

commit 064187d1c23f2b4fc09e94066d66e79ef961c4f1
Author: dbarclay 
Date:   2015-10-28T05:08:22Z

2288:  Pt. 8 Core.3:  Cleared recordCount in TopNBatch.innerNext.  
[TopNBatch]

commit 8b9d1657ee22cee4432074f778e5d10e2c06e8e8
Author: dbarclay 
Date:   2015-10-28T05:24:35Z

2288:  Pt. 9 Core:  Had UnorderedRec

[jira] [Created] (DRILL-4003) Tests expecting Drill OversizedAllocationException yield OutOfMemoryError

2015-11-01 Thread Daniel Barclay (Drill) (JIRA)
Daniel Barclay (Drill) created DRILL-4003:
-

 Summary: Tests expecting Drill OversizedAllocationException yield 
OutOfMemoryError
 Key: DRILL-4003
 URL: https://issues.apache.org/jira/browse/DRILL-4003
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Tools, Build & Test
Reporter: Daniel Barclay (Drill)


Tests that expect Drill's {{OversizedAllocationException}} (for example, 
{{TestValueVector.testFixedVectorReallocation()}}) sometimes fail with an 
{{OutOfMemoryError}} instead.

(Do the tests check whether there's enough memory available for the test before 
proceeding?)







[jira] [Commented] (DRILL-3983) Small test improvements

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984674#comment-14984674
 ] 

ASF GitHub Bot commented on DRILL-3983:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/221#issuecomment-152898335
  
For reference, final tally shows that full test output went from 5.9M to 
2.4M with this change. Nice improvement.








[jira] [Updated] (DRILL-3983) Small test improvements

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3983:
--
Fix Version/s: 1.3.0







[jira] [Updated] (DRILL-3956) TEXT MySQL type unsupported

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3956:
--
Fix Version/s: 1.3.0

> TEXT MySQL type unsupported
> ---
>
> Key: DRILL-3956
> URL: https://issues.apache.org/jira/browse/DRILL-3956
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.2.0
>Reporter: Andrew
>Assignee: Jacques Nadeau
> Fix For: 1.3.0
>
> Attachments: DRILL-3956.patch
>
>
> The JDBC storage plugin will fail with an NPE when querying a MySQL table 
> that has a 'TEXT' column. The underlying problem appears to be that Calcite 
> has no notion of this type.





[jira] [Updated] (DRILL-3921) Hive LIMIT 1 queries take too long

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3921:
--
Fix Version/s: 1.3.0

> Hive LIMIT 1 queries take too long
> --
>
> Key: DRILL-3921
> URL: https://issues.apache.org/jira/browse/DRILL-3921
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
> Fix For: 1.3.0
>
>
> Fragment initialization on a Hive table (that is backed by a directory of 
> many files) can take really long. This is evident through LIMIT 1 queries. 
> The root cause is that the underlying reader in the HiveRecordReader is 
> initialized when the ctor is called, rather than when setup is called.
> Two changes need to be made:
> 1) lazily initialize the underlying record reader in HiveRecordReader
> 2) allow for running a callable as a proxy user within an operator (through 
> OperatorContext). This is required as initialization of the underlying record 
> reader needs to be done as a proxy user (proxy for owner of the file). 
> Previously, this was handled while creating the record batch tree.
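Change (1) above is the standard lazy-initialization pattern: keep the expensive work out of the constructor and run it on first use. A minimal sketch with hypothetical names (not Drill's actual HiveRecordReader):

```java
import java.util.function.Supplier;

// Hypothetical sketch: the reader's constructor only stores a factory; the
// expensive open happens on the first call to next(), i.e. at setup time.
public class LazyReaderSketch {
    static class LazyReader {
        private final Supplier<String> factory; // stands in for the costly reader ctor
        private String delegate;                // null until first needed

        LazyReader(Supplier<String> factory) {
            this.factory = factory;             // cheap: nothing opened yet
        }

        String next() {
            if (delegate == null) {
                delegate = factory.get();       // expensive work happens here, once
            }
            return delegate;
        }
    }

    public static void main(String[] args) {
        // For LIMIT 1 over many files, most readers would never pay this cost.
        LazyReader r = new LazyReader(() -> "opened");
        System.out.println(r.next());
    }
}
```

In the issue's terms, this is also the point where the open can be wrapped in a proxy-user callable, since it now runs inside the operator rather than during batch-tree construction.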





[jira] [Resolved] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3992.
---
   Resolution: Fixed
 Assignee: Jacques Nadeau
Fix Version/s: (was: 1.2.0)
   1.3.0

Resolved in 22e5316

> Unable to query Oracle DB using JDBC Storage Plug-In
> 
>
> Key: DRILL-3992
> URL: https://issues.apache.org/jira/browse/DRILL-3992
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
> Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00
>Reporter: Eric Roma
>Assignee: Jacques Nadeau
>Priority: Minor
>  Labels: newbie
> Fix For: 1.3.0
>
>
> *See External Issue URL for Stack Overflow Post*
> *Appears to be similar issue at 
> http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc*
> Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 
> 10.2.0.4.0 - 64bit in embedded mode.
> I'm curious if anyone has had any success connecting Apache Drill to an 
> Oracle DB. I've updated the drill-override.conf with the following 
> configurations (per documents):
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:2181",
>   drill.exec.sys.store.provider.local.path = "/mypath"
> }
> and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can 
> successfully create the storage plug-in:
> {
>   "type": "jdbc",
>   "driver": "oracle.jdbc.driver.OracleDriver",
>   "url": "jdbc:oracle:thin:@::",
>   "username": "USERNAME",
>   "password": "PASSWORD",
>   "enabled": true
> }
> but when I issue a query such as:
> select * from ..`dual`; 
> I get the following error:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: 
> From line 1, column 15 to line 1, column 20: Table 
> '..dual' not found [Error Id: 
> 57a4153c-6378-4026-b90c-9bb727e131ae on :].
> I've tried to query other schema/tables and get a similar result. I've also 
> tried connecting to Teradata and get the same error.





[jira] [Updated] (DRILL-1752) Drill cluster returns error when querying Mongo shards on an unsharded collection

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-1752:
--
Fix Version/s: (was: Future)
   1.3.0

> Drill cluster returns error when querying Mongo shards on an unsharded 
> collection
> -
>
> Key: DRILL-1752
> URL: https://issues.apache.org/jira/browse/DRILL-1752
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MongoDB
>Affects Versions: 0.6.0, 0.7.0
> Environment: Drill cluster on nodes with Mongo Shards
>Reporter: Andries Engelbrecht
>Priority: Minor
> Fix For: 1.3.0
>
> Attachments: DRILL-1752.patch
>
>
> Query fails on a large unsharded collection in MongoDB sharded cluster with 
> drillbits on each node with Mongo shards.
> Error message:
> 0: jdbc:drill:se0:5181> select * from unshard limit 2;
> Query failed: Failure while setting up query. Incoming endpoints 1 is greater 
> than number of chunks 0 [cb2121f7-eb3e-48cd-8530-474ca76c598d]
> Error: exception while executing query: Failure while trying to get next 
> result batch. (state=,code=0)
> 0: jdbc:drill:se0:5181> explain plan for select * from unshard limit 2;
> +++
> |text|json|
> +++
> | 00-00Screen
> 00-01  SelectionVectorRemover
> 00-02Limit(fetch=[2])
> 00-03  Scan(groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec 
> [dbName=review_syn, collectionName=unshard, filters=null], 
> columns=[SchemaPath [`*`)
>  | {
>   "head" : {
> "version" : 1,
> "generator" : {
>   "type" : "ExplainHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "mongo-scan",
> "@id" : 3,
> "mongoScanSpec" : {
>   "dbName" : "review_syn",
>   "collectionName" : "unshard",
>   "filters" : null
> },
> "storage" : {
>   "type" : "mongo",
>   "connection" : "mongodb://se4.dmz:27017",
>   "enabled" : true
> },
> "columns" : [ "`*`" ],
> "cost" : 625000.0
>   }, {
> "pop" : "limit",
> "@id" : 2,
> "child" : 3,
> "first" : 0,
> "last" : 2,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 625000.0
>   }, {
> "pop" : "selection-vector-remover",
> "@id" : 1,
> "child" : 2,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 625000.0
>   }, {
> "pop" : "screen",
> "@id" : 0,
> "child" : 1,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 625000.0
>   } ]
> } |
> +++



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4000) In all non-root fragments, Drill recreates storage plugin instances for every minor fragment

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-4000.
---
Resolution: Fixed

Resolved in 7f55051

> In all non-root fragments, Drill recreates storage plugin instances for every 
> minor fragment
> 
>
> Key: DRILL-4000
> URL: https://issues.apache.org/jira/browse/DRILL-4000
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.3.0
>
>
> Drill is creating ephemeral storage plugin instances when a plan is 
> deserialized. As such, every minor fragment of a query has Drill create a 
> separate storage plugin instance. Depending on the cost of storage plugin 
> creation, this could be quite expensive.
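One common remedy for this pattern is to memoize plugin instances by configuration key, so that deserializing many minor fragments reuses a single instance per drillbit. The sketch below is illustrative only; the class and method names are hypothetical and do not reflect Drill's actual plugin registry:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: cache storage-plugin instances per configuration key
// so plan deserialization reuses one instance instead of constructing a new
// (possibly expensive) plugin for every minor fragment.
public class PluginCache {
  private final Map<String, Object> instances = new ConcurrentHashMap<>();
  private final Function<String, Object> factory;

  public PluginCache(Function<String, Object> factory) {
    this.factory = factory;
  }

  // computeIfAbsent yields at most one instance per distinct config key.
  public Object getOrCreate(String configKey) {
    return instances.computeIfAbsent(configKey, factory);
  }
}
```

Repeated lookups with the same key return the identical instance, which is the property the fix above relies on.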





[jira] [Resolved] (DRILL-3983) Small test improvements

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3983.
---
Resolution: Fixed

Fixed in 77e7de4

> Small test improvements
> ---
>
> Key: DRILL-3983
> URL: https://issues.apache.org/jira/browse/DRILL-3983
> Project: Apache Drill
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>






[jira] [Resolved] (DRILL-3810) Filesystem plugin's support for file format's schema

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-3810.
---
Resolution: Fixed

Fixed in ce593eb

> Filesystem plugin's support for file format's schema
> 
>
> Key: DRILL-3810
> URL: https://issues.apache.org/jira/browse/DRILL-3810
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON, Storage - Other, Storage - Parquet, 
> Storage - Text & CSV
>Reporter: Bhallamudi Venkata Siva Kamesh
> Fix For: 1.3.0
>
>
> Filesystem Plugin supports multiple types of file formats, such as
>   * json
>   * avro
>   * text (csv|psv|tsv)
>   * parquet
> and can support any type of file format.
> Among these, some formats are schema-based, like *avro* and *parquet*, and 
> some are schemaless, like *json*.
> For schema-based file formats, Drill should be able to validate the 
> query against the file schema before starting to execute it.





[jira] [Updated] (DRILL-3810) Filesystem plugin's support for file format's schema

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3810:
--
Fix Version/s: (was: Future)
   1.3.0

> Filesystem plugin's support for file format's schema
> 
>
> Key: DRILL-3810
> URL: https://issues.apache.org/jira/browse/DRILL-3810
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - JSON, Storage - Other, Storage - Parquet, 
> Storage - Text & CSV
>Reporter: Bhallamudi Venkata Siva Kamesh
> Fix For: 1.3.0
>
>
> Filesystem Plugin supports multiple types of file formats, such as
>   * json
>   * avro
>   * text (csv|psv|tsv)
>   * parquet
> and can support any type of file format.
> Among these, some formats are schema-based, like *avro* and *parquet*, and 
> some are schemaless, like *json*.
> For schema-based file formats, Drill should be able to validate the 
> query against the file schema before starting to execute it.





[jira] [Resolved] (DRILL-1752) Drill cluster returns error when querying Mongo shards on an unsharded collection

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau resolved DRILL-1752.
---
Resolution: Fixed

Fixed in ce593eb4c1dc5388787f0896f8845c9b0bc5d2e8

> Drill cluster returns error when querying Mongo shards on an unsharded 
> collection
> -
>
> Key: DRILL-1752
> URL: https://issues.apache.org/jira/browse/DRILL-1752
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - MongoDB
>Affects Versions: 0.6.0, 0.7.0
> Environment: Drill cluster on nodes with Mongo Shards
>Reporter: Andries Engelbrecht
>Priority: Minor
> Fix For: Future
>
> Attachments: DRILL-1752.patch
>
>
> Query fails on a large unsharded collection in MongoDB sharded cluster with 
> drillbits on each node with Mongo shards.
> Error message:
> 0: jdbc:drill:se0:5181> select * from unshard limit 2;
> Query failed: Failure while setting up query. Incoming endpoints 1 is greater 
> than number of chunks 0 [cb2121f7-eb3e-48cd-8530-474ca76c598d]
> Error: exception while executing query: Failure while trying to get next 
> result batch. (state=,code=0)
> 0: jdbc:drill:se0:5181> explain plan for select * from unshard limit 2;
> +++
> |text|json|
> +++
> | 00-00Screen
> 00-01  SelectionVectorRemover
> 00-02Limit(fetch=[2])
> 00-03  Scan(groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec 
> [dbName=review_syn, collectionName=unshard, filters=null], 
> columns=[SchemaPath [`*`)
>  | {
>   "head" : {
> "version" : 1,
> "generator" : {
>   "type" : "ExplainHandler",
>   "info" : ""
> },
> "type" : "APACHE_DRILL_PHYSICAL",
> "options" : [ ],
> "queue" : 0,
> "resultMode" : "EXEC"
>   },
>   "graph" : [ {
> "pop" : "mongo-scan",
> "@id" : 3,
> "mongoScanSpec" : {
>   "dbName" : "review_syn",
>   "collectionName" : "unshard",
>   "filters" : null
> },
> "storage" : {
>   "type" : "mongo",
>   "connection" : "mongodb://se4.dmz:27017",
>   "enabled" : true
> },
> "columns" : [ "`*`" ],
> "cost" : 625000.0
>   }, {
> "pop" : "limit",
> "@id" : 2,
> "child" : 3,
> "first" : 0,
> "last" : 2,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 625000.0
>   }, {
> "pop" : "selection-vector-remover",
> "@id" : 1,
> "child" : 2,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 625000.0
>   }, {
> "pop" : "screen",
> "@id" : 0,
> "child" : 1,
> "initialAllocation" : 100,
> "maxAllocation" : 100,
> "cost" : 625000.0
>   } ]
> } |
> +++
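The error above arises because an unsharded collection reports zero chunks, so assigning one incoming endpoint to zero chunks violates the planner's invariant. A minimal sketch of the general fix idea, under the assumption (names hypothetical, not Drill's MongoGroupScan) that an unsharded collection can be treated as a single logical chunk:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch: if a collection is unsharded (no chunk metadata),
// treat the whole collection as one chunk so that the invariant
// "endpoints <= chunks" still holds during parallelization.
public class ChunkAssigner {
  public static List<String> chunksFor(List<String> shardChunks, String namespace) {
    if (shardChunks.isEmpty()) {
      // Unsharded: the full collection is one logical chunk.
      return Collections.singletonList(namespace);
    }
    return new ArrayList<>(shardChunks);
  }
}
```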





[jira] [Commented] (DRILL-3983) Small test improvements

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984664#comment-14984664
 ] 

ASF GitHub Bot commented on DRILL-3983:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/221


> Small test improvements
> ---
>
> Key: DRILL-3983
> URL: https://issues.apache.org/jira/browse/DRILL-3983
> Project: Apache Drill
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>






[jira] [Commented] (DRILL-4000) In all non-root fragments, Drill recreates storage plugin instances for every minor fragment

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984663#comment-14984663
 ] 

ASF GitHub Bot commented on DRILL-4000:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/227


> In all non-root fragments, Drill recreates storage plugin instances for every 
> minor fragment
> 
>
> Key: DRILL-4000
> URL: https://issues.apache.org/jira/browse/DRILL-4000
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.3.0
>
>
> Drill is creating ephemeral storage plugin instances when a plan is 
> deserialized. As such, every minor fragment of a query has Drill create a 
> separate storage plugin instance. Depending on the cost of storage plugin 
> creation, this could be quite expensive.





[jira] [Commented] (DRILL-3921) Hive LIMIT 1 queries take too long

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984665#comment-14984665
 ] 

ASF GitHub Bot commented on DRILL-3921:
---

Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/197


> Hive LIMIT 1 queries take too long
> --
>
> Key: DRILL-3921
> URL: https://issues.apache.org/jira/browse/DRILL-3921
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Reporter: Sudheesh Katkam
>Assignee: Sudheesh Katkam
>
> Fragment initialization on a Hive table (that is backed by a directory of 
> many files) can take really long. This is evident through LIMIT 1 queries. 
> The root cause is that the underlying reader in the HiveRecordReader is 
> initialized when the ctor is called, rather than when setup is called.
> Two changes need to be made:
> 1) lazily initialize the underlying record reader in HiveRecordReader
> 2) allow for running a callable as a proxy user within an operator (through 
> OperatorContext). This is required as initialization of the underlying record 
> reader needs to be done as a proxy user (proxy for owner of the file). 
> Previously, this was handled while creating the record batch tree.
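The first change can be sketched as follows; this is an illustrative stand-in (not the actual HiveRecordReader) showing why moving expensive work from the constructor to setup() helps a LIMIT 1 query:

```java
import java.util.function.Supplier;

// Illustrative sketch: defer expensive reader construction from the
// constructor to setup(), so fragments that never read pay nothing.
public class LazyReader {
  static int initCount = 0;                     // counts heavy initializations

  private final Supplier<String> expensiveInit; // stands in for opening files
  private String underlying;                    // created lazily

  public LazyReader(Supplier<String> expensiveInit) {
    this.expensiveInit = expensiveInit;         // cheap: nothing opened yet
  }

  public void setup() {
    if (underlying == null) {
      initCount++;
      underlying = expensiveInit.get();         // heavy work happens here
    }
  }

  public String read() {
    setup();                                    // idempotent guard
    return underlying;
  }
}
```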





[jira] [Commented] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In

2015-11-01 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984635#comment-14984635
 ] 

Steven Phillips commented on DRILL-3992:


+1

> Unable to query Oracle DB using JDBC Storage Plug-In
> 
>
> Key: DRILL-3992
> URL: https://issues.apache.org/jira/browse/DRILL-3992
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.2.0
> Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00
>Reporter: Eric Roma
>Priority: Minor
>  Labels: newbie
> Fix For: 1.2.0
>
>
> *See External Issue URL for Stack Overflow Post*
> *Appears to be similar issue at 
> http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc*
> Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release 
> 10.2.0.4.0 - 64bit in embedded mode.
> I'm curious if anyone has had any success connecting Apache Drill to an 
> Oracle DB. I've updated the drill-override.conf with the following 
> configurations (per documents):
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:2181",
>   drill.exec.sys.store.provider.local.path = "/mypath"
> }
> and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can 
> successfully create the storage plug-in:
> {
>   "type": "jdbc",
>   "driver": "oracle.jdbc.driver.OracleDriver",
>   "url": "jdbc:oracle:thin:@::",
>   "username": "USERNAME",
>   "password": "PASSWORD",
>   "enabled": true
> }
> but when I issue a query such as:
> select * from ..`dual`; 
> I get the following error:
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: 
> From line 1, column 15 to line 1, column 20: Table 
> '..dual' not found [Error Id: 
> 57a4153c-6378-4026-b90c-9bb727e131ae on :].
> I've tried to query other schema/tables and get a similar result. I've also 
> tried connecting to Teradata and get the same error.





[jira] [Commented] (DRILL-4000) In all non-root fragments, Drill recreates storage plugin instances for every minor fragment

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984632#comment-14984632
 ] 

ASF GitHub Bot commented on DRILL-4000:
---

Github user StevenMPhillips commented on the pull request:

https://github.com/apache/drill/pull/227#issuecomment-152887889
  
+1


> In all non-root fragments, Drill recreates storage plugin instances for every 
> minor fragment
> 
>
> Key: DRILL-4000
> URL: https://issues.apache.org/jira/browse/DRILL-4000
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Jacques Nadeau
>Assignee: Jacques Nadeau
> Fix For: 1.3.0
>
>
> Drill is creating ephemeral storage plugin instances when a plan is 
> deserialized. As such, every minor fragment of a query has Drill create a 
> separate storage plugin instance. Depending on the cost of storage plugin 
> creation, this could be quite expensive.





[jira] [Commented] (DRILL-3956) TEXT MySQL type unsupported

2015-11-01 Thread Steven Phillips (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984629#comment-14984629
 ] 

Steven Phillips commented on DRILL-3956:


+1

> TEXT MySQL type unsupported
> ---
>
> Key: DRILL-3956
> URL: https://issues.apache.org/jira/browse/DRILL-3956
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.2.0
>Reporter: Andrew
>Assignee: Steven Phillips
> Attachments: DRILL-3956.patch
>
>
> The JDBC storage plugin will fail with an NPE when querying a MySQL table 
> that has a 'TEXT' column. The underlying problem appears to be that Calcite 
> has no notion of this type.
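A typical defensive fix for this class of bug is to fall back to a string type for unrecognized JDBC type codes instead of returning null. The sketch below is hypothetical (it is not the attached patch, and the Drill type names are illustrative):

```java
import java.sql.Types;

// Hypothetical fallback: map unknown JDBC type codes (e.g. what a driver
// reports for MySQL TEXT) to a VARCHAR-like type rather than null, which is
// the kind of gap that produces the NPE described above.
public class JdbcTypeMapper {
  public static String toDrillType(int jdbcType) {
    switch (jdbcType) {
      case Types.INTEGER: return "INT";
      case Types.DOUBLE:  return "FLOAT8";
      case Types.VARCHAR: return "VARCHAR";
      default:            return "VARCHAR"; // safe fallback, never null
    }
  }
}
```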





[jira] [Commented] (DRILL-3983) Small test improvements

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984619#comment-14984619
 ] 

ASF GitHub Bot commented on DRILL-3983:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/221#issuecomment-152885146
  
lgtm. +1. Will merge shortly.


> Small test improvements
> ---
>
> Key: DRILL-3983
> URL: https://issues.apache.org/jira/browse/DRILL-3983
> Project: Apache Drill
>  Issue Type: Test
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>






[jira] [Updated] (DRILL-3956) TEXT MySQL type unsupported

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3956:
--
Assignee: Steven Phillips  (was: Jacques Nadeau)

> TEXT MySQL type unsupported
> ---
>
> Key: DRILL-3956
> URL: https://issues.apache.org/jira/browse/DRILL-3956
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.2.0
>Reporter: Andrew
>Assignee: Steven Phillips
> Attachments: DRILL-3956.patch
>
>
> The JDBC storage plugin will fail with an NPE when querying a MySQL table 
> that has a 'TEXT' column. The underlying problem appears to be that Calcite 
> has no notion of this type.





[jira] [Updated] (DRILL-3956) TEXT MySQL type unsupported

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau updated DRILL-3956:
--
Attachment: DRILL-3956.patch

> TEXT MySQL type unsupported
> ---
>
> Key: DRILL-3956
> URL: https://issues.apache.org/jira/browse/DRILL-3956
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.2.0
>Reporter: Andrew
>Assignee: Jacques Nadeau
> Attachments: DRILL-3956.patch
>
>
> The JDBC storage plugin will fail with an NPE when querying a MySQL table 
> that has a 'TEXT' column. The underlying problem appears to be that Calcite 
> has no notion of this type.





[jira] [Assigned] (DRILL-3956) TEXT MySQL type unsupported

2015-11-01 Thread Jacques Nadeau (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacques Nadeau reassigned DRILL-3956:
-

Assignee: Jacques Nadeau  (was: Andrew)

> TEXT MySQL type unsupported
> ---
>
> Key: DRILL-3956
> URL: https://issues.apache.org/jira/browse/DRILL-3956
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - Other
>Affects Versions: 1.2.0
>Reporter: Andrew
>Assignee: Jacques Nadeau
>
> The JDBC storage plugin will fail with an NPE when querying a MySQL table 
> that has a 'TEXT' column. The underlying problem appears to be that Calcite 
> has no notion of this type.





[jira] [Commented] (DRILL-3994) Build Fails on Windows after DRILL-3742

2015-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984612#comment-14984612
 ] 

ASF GitHub Bot commented on DRILL-3994:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/226#issuecomment-152881991
  
Still broken: 

[ERROR] Failed to execute goal 
org.apache.drill.tools:drill-fmpp-maven-plugin:1.3.0-SNAPSHOT:generate 
(generate-fmpp) on project drill-java-exec: FMPP processin
g session failed.
[ERROR] Caused by: A listener Java object has failed to handle event "end 
processing session". The class of the failing listener object is 
org.apache.drill.fmpp.mojo.FMPPMojo$1.
[ERROR] Caused by: org.apache.maven.plugin.MojoFailureException: 
C:\Users\jnadeau\AppData\Local\Temp\freemarker-tmp8027657275379826459\javacc\Parser.jj
 should start with 
C:\Users\jnadeau\AppData\Local\Temp\freemarker-tmp8027657275379826459/


> Build Fails on Windows after DRILL-3742
> ---
>
> Key: DRILL-3994
> URL: https://issues.apache.org/jira/browse/DRILL-3994
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Tools, Build & Test
>Reporter: Sudheesh Katkam
>Assignee: Julien Le Dem
>Priority: Critical
> Fix For: 1.3.0
>
>
> Build fails on Windows on the latest master:
> {code}
> c:\drill> mvn clean install -DskipTests 
> ...
> [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 
> approved: 169 licence.
> [INFO] 
> [INFO] <<< exec-maven-plugin:1.2.1:java (default) < validate @ drill-common 
> <<<
> [INFO] 
> [INFO] --- exec-maven-plugin:1.2.1:java (default) @ drill-common ---
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See 
> http://www.slf4j.org/codes.html#StaticLoggerBinder
>  for further details.
> Scanning: C:\drill\common\target\classes
> [WARNING] 
> java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: 
> file:C:/drill/common/target/classes/ not in 
> [file:/C:/drill/common/target/classes/]
>   at 
> org.apache.drill.common.scanner.BuildTimeScan.main(BuildTimeScan.java:129)
>   ... 6 more
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO] 
> [INFO] Apache Drill Root POM .. SUCCESS [ 10.016 
> s]
> [INFO] tools/Parent Pom ... SUCCESS [  1.062 
> s]
> [INFO] tools/freemarker codegen tooling ... SUCCESS [  6.922 
> s]
> [INFO] Drill Protocol . SUCCESS [ 10.062 
> s]
> [INFO] Common (Logical Plan, Base expressions)  FAILURE [  9.954 
> s]
> [INFO] contrib/Parent Pom . SKIPPED
> [INFO] contrib/data/Parent Pom  SKIPPED
> [INFO] contrib/data/tpch-sample-data .. SKIPPED
> [INFO] exec/Parent Pom  SKIPPED
> [INFO] exec/Java Execution Engine . SKIPPED
> [INFO] exec/JDBC Driver using dependencies  SKIPPED
> [INFO] JDBC JAR with all dependencies . SKIPPED
> [INFO] contrib/mongo-storage-plugin ... SKIPPED
> [INFO] contrib/hbase-storage-plugin ... SKIPPED
> [INFO] contrib/jdbc-storage-plugin  SKIPPED
> [INFO] contrib/hive-storage-plugin/Parent Pom . SKIPPED
> [INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SKIPPED
> [INFO] contrib/hive-storage-plugin/core ... SKIPPED
> [INFO] contrib/drill-gis-plugin ... SKIPPED
> [INFO] Packaging and Distribution Assembly  SKIPPED
> [INFO] contrib/sqlline  SKIPPED
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 38.813 s
> [INFO] Finished at: 2015-10-28T12:17:19-07:00
> [INFO] Final Memory: 67M/466M
> [INFO] 
> 
> [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:j

[jira] [Comment Edited] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread Daniel Barclay (Drill) (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983410#comment-14983410
 ] 

Daniel Barclay (Drill) edited comment on DRILL-2288 at 11/1/15 10:10 PM:
-

Chain of bugs and problems encountered and (partially) addressed:

1.  {{ScanBatch.next()}} returned {{NONE}} without ever returning 
{{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't 
get its schema, even for static-schema sources, or even get triggered to update 
their own schema).

2.  {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was 
not documented clearly (so developers didn't know correctly what to expect or 
provide).

3.  {{IteratorValidatorBatchIterator}} didn't validate the sequence of 
{{IterOutcome}} values (so developers weren't notified about incorrect results).

4.  {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} 
correctly (so it reported spurious/incorrect schema-change and/or 
empty-/non-empty input exceptions).

5.  {{ScanBatch.Mutator.isNewSchema()}} didn't handle a short-circuit OR 
("{{||}}") correctly in calling {{SchemaChangeCallBack.getSchemaChange()}} (so 
it didn't reset nested schema-change state, and so caused spurious 
{{OK_NEW_SCHEMA}} notifications and downstream exceptions).

6.  {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field 
already existed in the batch (so in that case it forcibly changed the type to 
{{NullableIntVector}}, causing schema changes and downstream exceptions). 
\[Note:  DRILL-2288 does not address other problems with {{NullableIntVector}} 
dummy columns from {{JsonRecordReader}}.]

7.  HBase tests used only one table region, ignoring known problems with 
multi-region HBase tables (so latent {{HBaseRecordReader}} problems were left 
undetected and unresolved.)   \[Note: DRILL-2288 addresses only one test table 
(increasing the number of regions on the other test tables exposed at least one 
other problem; others remain).]

8.  {{HBaseRecordReader}} didn't create a {{MapVector}} for every column family 
(so {{NullableIntVector}} dummy columns got created, causing spurious schema 
changes and downstream exceptions).

9.  Some {{RecordBatch}} classes didn't reset their record counts to zero 
({{OrderedPartitionRecordBatch.recordCount}}, 
{{ProjectRecordBatch.recordCount}}, and/or {{TopNBatch.recordCount}}) (so 
downstream code tried to access elements of (correctly) empty vectors, yielding 
{{IndexOutOfBoundException}} (with ~"... {{range (0, 0)}}") ).

10.  {{RecordBatchLoader}}'s record count was not reset to zero by 
{{UnorderedReceiverBatch}} (so, again, downstream code tried to access elements 
of (correctly) empty vectors, yielding {{IndexOutOfBoundException}} (with ~"... 
{{range (0, 0)}}") ).

11.  {{MapVector.load(...)}} left some existing vectors empty, not matching the 
returned length and the length of sibling vectors (so 
{{MapVector.getObject(int)}} got {{IndexOutOfBoundException}} (with ~"... 
{{range (0, 0)}}").  \[Note: DRILL-2288 does not address the root problem.]

12. {{BaseTestQuery.printResult(...)}} skipped deallocation calls in the case 
of a zero-record record batch (so when it read a zero-row record batch, it 
caused a memory leak reported at Drillbit shutdown time).

13. {{TestHBaseProjectPushDown.testRowKeyAndColumnPushDown()}} used delimited 
identifiers of a form (with a period) that Drill can't handle (so the test 
failed when the test ran with multiple fragments).
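The short-circuit pitfall in item 5 can be demonstrated in isolation. The names below are illustrative stand-ins for the Drill classes mentioned above: when a side-effecting call sits on the right of ||, it is skipped whenever the left operand is already true, so the nested state never resets.

```java
// Sketch of the item-5 bug: || must not guard a call whose side effect
// (resetting schema-change state) is required on every evaluation.
public class ShortCircuitDemo {
  static int resetCalls = 0;

  // Stands in for SchemaChangeCallBack.getSchemaChange(): reports and resets.
  static boolean getAndResetSchemaChange() {
    resetCalls++;
    return false;
  }

  static boolean buggy(boolean localChange) {
    return localChange || getAndResetSchemaChange(); // skipped if localChange
  }

  static boolean fixed(boolean localChange) {
    boolean nested = getAndResetSchemaChange();      // always evaluated
    return localChange || nested;
  }
}
```

With localChange already true, the buggy form never calls the reset, while the fixed form always does.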






[jira] [Comment Edited] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]

2015-11-01 Thread Daniel Barclay (Drill) (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983410#comment-14983410
 ] 

Daniel Barclay (Drill) edited comment on DRILL-2288 at 11/1/15 9:59 PM:



Chain of bugs and problems encountered and (partially) addressed:

1.  {{ScanBatch.next()}} returned {{NONE}} without ever returning 
{{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't 
get its schema, even for static-schema sources, or even get triggered to update 
their own schema).

2.  {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was 
not documented clearly (so developers didn't know correctly what to expect or 
provide).

3.  {{IteratorValidatorBatchIterator}} didn't validate the sequence of 
{{IterOutcome}} values (so developers weren't notified about incorrect results).

4.  {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} 
correctly (so it reported spurious/incorrect schema-change and/or 
empty-/non-empty input exceptions).

5.  {{ScanBatch.Mutator.isNewSchema()}} didn't handle a short-circuit OR 
("{{||}}") correctly in calling {{SchemaChangeCallBack.getSchemaChange()}} (so 
it didn't reset nested schema-change state, and so caused spurious 
{{OK_NEW_SCHEMA}} notifications and downstream exceptions).

6.  {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field 
already existed in the batch (so in that case it forcibly changed the type to 
{{NullableIntVector}}, causing schema changes and downstream exceptions). 
\[Note:  DRILL-2288 does not address other problems with {{NullableIntVector}} 
dummy columns from {{JsonRecordReader}}.]

7.  HBase tests used only one table region, ignoring known problems with 
multi-region HBase tables (so latent {{HBaseRecordReader}} problems were left 
undetected and unresolved.)   \[Note: DRILL-2288 addresses only one test table 
(increasing the number of regions on the other test tables exposes at least one 
other problem).]

8.  {{HBaseRecordReader}} didn't create a {{MapVector}} for every column family 
(so {{NullableIntVector}} dummy columns got created, causing spurious schema 
changes and downstream exceptions).

9.  Some {{RecordBatch}} classes didn't reset their record counts to zero 
({{OrderedPartitionRecordBatch.recordCount}}, 
{{ProjectRecordBatch.recordCount}}, and/or {{TopNBatch.recordCount}}) (so 
downstream code tried to access elements of (correctly) empty vectors, yielding 
{{IndexOutOfBoundException}} (with ~"... {{range (0, 0)}}") ).

10.  {{RecordBatchLoader}}'s record count was not reset to zero by 
{{UnorderedReceiverBatch}} (so, again, downstream code tried to access elements 
of (correctly) empty vectors, yielding {{IndexOutOfBoundException}} (with ~"... 
{{range (0, 0)}}") ).

11.  {{MapVector.load(...)}} left some existing vectors empty, not matching the 
returned length and the length of sibling vectors (so 
{{MapVector.getObject(int)}} got {{IndexOutOfBoundException}} (with ~"... 
{{range (0, 0)}}").  \[Note: DRILL-2288 does not address the root problem.]

12. {{BaseTestQuery.printResult(...)}} skipped deallocation calls in the case 
of a zero-record record batch (so when it read a zero-row record batch, it 
caused a memory leak reported at Drillbit shutdown time).

13. {{TestHBaseProjectPushDown.testRowKeyAndColumnPushDown()}} used delimited 
identifiers of a form (with a period) that Drill can't handle (so the test 
failed when the test ran with multiple fragments).






[jira] [Updated] (DRILL-3952) Improve Window Functions performance when not all batches are required to process the current batch

2015-11-01 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha updated DRILL-3952:
--
Assignee: Deneche A. Hakim  (was: Aman Sinha)

> Improve Window Functions performance when not all batches are required to 
> process the current batch
> ---
>
> Key: DRILL-3952
> URL: https://issues.apache.org/jira/browse/DRILL-3952
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.2.0
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
> Fix For: 1.3.0
>
>
> Currently, the window operator blocks until all batches of the current 
> partition are available. For some queries this is necessary (e.g. an 
> aggregate with no order-by in the window definition), but in other cases the 
> window operator could process the current batch and pass it downstream sooner.
> Implementing this should help the window operator use less memory and run 
> faster, especially in the presence of a limit operator.
> The purpose of this JIRA is to improve the window operator in the following 
> cases:
> - aggregate, when an order-by clause is present in the window definition, can 
> process the current batch as soon as it receives the last peer row
> - lead can process the current batch as soon as it receives 1 more batch
> - lag can process the current batch immediately
> - first_value can process the current batch immediately
> - last_value, when an order-by clause is present in the window definition, can 
> process the current batch as soon as it receives the last peer row
> - row_number, rank and dense_rank can process the current batch immediately 
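The per-function rules listed in the description can be encoded directly as a small decision table (a hypothetical helper for illustration only, not Drill's actual window-operator logic):

```java
// Sketch encoding the batch-release rules from the DRILL-3952 description
// (hypothetical helper; not Drill's WindowFrameRecordBatch implementation).
public class WindowRelease {
    enum Requirement {
        IMMEDIATELY,             // current batch can be processed right away
        AFTER_LAST_PEER_ROW,     // must see the last peer row of the frame
        AFTER_ONE_MORE_BATCH,    // lead needs one batch of lookahead
        ALL_PARTITION_BATCHES    // must buffer the whole partition
    }

    static Requirement canRelease(String function, boolean hasOrderBy) {
        switch (function) {
            case "lag":
            case "first_value":
            case "row_number":
            case "rank":
            case "dense_rank":
                return Requirement.IMMEDIATELY;
            case "lead":
                return Requirement.AFTER_ONE_MORE_BATCH;
            case "aggregate":
            case "last_value":
                // With an order-by the frame ends at the last peer row;
                // without one, the whole partition is the frame.
                return hasOrderBy ? Requirement.AFTER_LAST_PEER_ROW
                                  : Requirement.ALL_PARTITION_BATCHES;
            default:
                throw new IllegalArgumentException("unknown function: " + function);
        }
    }
}
```

For example, {{canRelease("aggregate", false)}} yields ALL_PARTITION_BATCHES, which is the one case where blocking for the whole partition remains necessary.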



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3538) We do not prune partitions when we count over partitioning key and filter over partitioning key

2015-11-01 Thread Aman Sinha (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Sinha resolved DRILL-3538.
---
Resolution: Not A Problem

Per previous comments, marking this as working as designed. 

> We do not prune partitions when we count over partitioning key and filter 
> over partitioning key
> ---
>
> Key: DRILL-3538
> URL: https://issues.apache.org/jira/browse/DRILL-3538
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.2.0
> Environment: 4 node cluster on CentOS
>Reporter: Khurram Faraaz
>Assignee: Aman Sinha
>Priority: Critical
> Fix For: 1.3.0
>
>
> We are not partition pruning when we do a count over partitioning key and 
> when the predicate involves the partitioning key. CTAS used was,
> {code}
> create table t3214 partition by (key2) as select cast(key1 as double) key1, 
> cast(key2 as char(1)) key2 from `twoKeyJsn.json`;
> {code}
> case 1) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key2) from t3214 
> where key2 = 'm';
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02Project(EXPR$0=[$0])
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@e2471d7])
> {code}
> case 2) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(*) from t3214 
> where key2 = 'm';
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02Project(EXPR$0=[$0])
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@211930a2])
> {code}
> case 3) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key1) from t3214 
> where key2 = 'm';
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02Project(EXPR$0=[$0])
> 00-03  
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@23fea3b0])
> {code}
> case 4) we do prune here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select avg(key1) from t3214 
> where key2 = 'm';
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), 
> $1)):ANY NOT NULL])
> 00-02StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
> 00-03  StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-04Project(key1=[$1])
> 00-05  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]], 
> selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
> {code}
> case 5) we do prune here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select min(key1) from t3214 
> where key2 = 'm';
> +--+--+
> | text | json |
> +--+--+
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02StreamAgg(group=[{}], EXPR$0=[MIN($0)])
> 00-03  StreamAgg(group=[{}], EXPR$0=[MIN($0)])
> 00-04Project(key1=[$1])
> 00-05  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]], 
> selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
> {code}
> commit id that I am testing on : 17e580a7





[jira] [Commented] (DRILL-3538) We do not prune partitions when we count over partitioning key and filter over partitioning key

2015-11-01 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984483#comment-14984483
 ] 

Aman Sinha commented on DRILL-3538:
---

[~khfaraaz] I am not sure why you say we are not pruning in cases 1, 2, 3.  The 
Explain looks fine to me.  There is no Filter node in the plan which indicates 
it has been pushed into the Scan.  The reason you see the Scan showing a 
PojoRecordReader is that for a trivial COUNT(*) query on Parquet data, Drill 
optimizes by reading the row count directly from the metadata instead of doing 
it through a separate aggregation.  If you are specifically looking for the 
Scan to display the attributes it displays for a regular scan, that's a 
separate issue. 
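The optimization described in the comment can be pictured with a toy sketch (hypothetical; Drill's actual rewrite lives in the planner): when a query reduces to a trivial count, the answer is the sum of per-file row counts taken from Parquet footer metadata, exposed as a single-row result rather than a full scan plus aggregation.

```java
// Hypothetical sketch of the "direct row count" shortcut described above:
// sum row counts recorded in (assumed) Parquet footers instead of scanning.
import java.util.Map;

public class TrivialCountRewrite {
    /** rowCounts: parquet file path -> row count read from that file's footer. */
    static long trivialCount(Map<String, Long> rowCounts) {
        // The single long produced here is what a PojoRecordReader-style
        // one-row scan would surface in the plan, replacing scan + aggregate.
        return rowCounts.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

This is why the EXPLAIN output in cases 1-3 shows a {{PojoRecordReader}} and no Filter node: the filter and count were already resolved at planning time.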






[jira] [Updated] (DRILL-4001) Empty vectors from previous batch left by MapVector.load(...)/RecordBatchLoader.load(...)

2015-11-01 Thread Daniel Barclay (Drill) (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Barclay (Drill) updated DRILL-4001:
--
Component/s: Execution - Data Types

> Empty vectors from previous batch left by 
> MapVector.load(...)/RecordBatchLoader.load(...)
> -
>
> Key: DRILL-4001
> URL: https://issues.apache.org/jira/browse/DRILL-4001
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Reporter: Daniel Barclay (Drill)
>
> In certain cases, {{MapVector.load(...)}} (called by 
> {{RecordBatchLoader.load(...)}}) returns with some map child vectors having a 
> length of zero instead of having a length matching the length of sibling 
> vectors and the number of records in the batch.  (This caused some of the 
> {{IndexOutOfBoundException}} errors seen in fixing DRILL-2288.)
> The condition seems to be that a child field (e.g., an HBase column in a 
> HBase column family) appears in an earlier batch and does not appear in a 
> later batch.  
> (The HBase column's child vector gets created (in the MapVector for the HBase 
> column family) during loading of the earlier batch.  During loading of the 
> later batch, all vectors get reset to zero length, and then only vectors for 
> fields _appearing in the batch message being loaded_ get loaded and set to 
> the length of the batch; other vectors created from earlier 
> messages/{{load}} calls are left with a length of zero (instead of, say, 
> being filled with nulls to the length of their siblings and the current 
> record batch).)
> See the TODO(DRILL-) mark and workaround in {{MapVector.getObject(int)}}.
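The condition can be modeled with a toy map of child-vector lengths (hypothetical classes, not Drill's {{MapVector}}): a load resets every existing child to length zero and then sizes only the fields present in the incoming batch, so a column seen earlier but absent now is left empty.

```java
// Toy model (hypothetical; not Drill's vector classes) of the DRILL-4001
// condition: a field loaded in an earlier batch but absent from the current
// one keeps a zero-length vector instead of being null-filled to batch size.
import java.util.HashMap;
import java.util.Map;

public class StaleVectorDemo {
    final Map<String, Integer> childLengths = new HashMap<>(); // field -> vector length

    void load(int recordCount, String... fieldsInBatch) {
        childLengths.replaceAll((field, len) -> 0);  // reset all existing child vectors
        for (String field : fieldsInBatch) {
            childLengths.put(field, recordCount);    // size only fields in this batch
        }
    }

    public static void main(String[] args) {
        StaleVectorDemo map = new StaleVectorDemo();
        map.load(5, "cf.colA", "cf.colB");  // earlier batch creates both columns
        map.load(3, "cf.colA");             // later batch omits cf.colB
        System.out.println(map.childLengths.get("cf.colA")); // sized to the batch
        System.out.println(map.childLengths.get("cf.colB")); // stuck at zero
    }
}
```

In the HBase scenario above, "cf.colB" plays the role of a column that appears in one column-family batch and not the next; a sibling-aware load would instead null-fill it to the batch length.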


