[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984771#comment-14984771 ]

ASF GitHub Bot commented on DRILL-2288:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/228#discussion_r43598040

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java ---
    @@ -93,13 +93,15 @@ public void ensureAtLeastOneField(ComplexWriter writer) {
         if (!atLeastOneWrite) {
           // if we had no columns, create one empty one so we can return some data for count purposes.
           SchemaPath sp = columns.get(0);
    -      PathSegment root = sp.getRootSegment();
    +      PathSegment fieldPath = sp.getRootSegment();
           BaseWriter.MapWriter fieldWriter = writer.rootAsMap();
    -      while (root.getChild() != null && !root.getChild().isArray()) {
    -        fieldWriter = fieldWriter.map(root.getNameSegment().getPath());
    -        root = root.getChild();
    +      while (fieldPath.getChild() != null && ! fieldPath.getChild().isArray()) {
    +        fieldWriter = fieldWriter.map(fieldPath.getNameSegment().getPath());
    +        fieldPath = fieldPath.getChild();
    +      }
    +      if (fieldWriter.isEmptyMap()) {
    --- End diff --

    can you explain this change?

> ScanBatch violates IterOutcome protocol for zero-row sources [was: missing
> JDBC metadata (schema) for 0-row results...]
>
>                 Key: DRILL-2288
>                 URL: https://issues.apache.org/jira/browse/DRILL-2288
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Information Schema
>            Reporter: Daniel Barclay (Drill)
>            Assignee: Daniel Barclay (Drill)
>             Fix For: 1.3.0
>
>         Attachments: Drill2288NoResultSetMetadataWhenZeroRowsTest.java
>
> The ResultSetMetaData object from getMetadata() of a ResultSet is not set up
> (getColumnCount() returns zero, and trying to access any other metadata
> throws IndexOutOfBoundsException) for a result set with zero rows, at least
> for one from DatabaseMetaData.getColumns(...).
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
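The loop questioned in the diff above walks a multi-segment column path (such as `a.b.c`), descending one nested map writer per segment until the last (or an array) segment. A rough stand-alone sketch of that walk, using a plain `Map` as a stand-in for Drill's `BaseWriter.MapWriter` (the `descend` helper and `PathWalkDemo` class are illustrative, not Drill code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PathWalkDemo {
    // Descend through all but the last path segment, creating a nested map
    // per segment — mirroring the `while (fieldPath.getChild() != null && ...)`
    // loop in JsonReader.ensureAtLeastOneField().
    @SuppressWarnings("unchecked")
    static Map<String, Object> descend(Map<String, Object> root, String[] path) {
        Map<String, Object> writer = root;
        for (int i = 0; i < path.length - 1; i++) {
            writer = (Map<String, Object>) writer
                    .computeIfAbsent(path[i], k -> new LinkedHashMap<String, Object>());
        }
        return writer;   // the map that would receive the final (empty) field
    }

    public static void main(String[] args) {
        Map<String, Object> root = new LinkedHashMap<>();
        Map<String, Object> inner = descend(root, "a.b.c".split("\\."));
        inner.put("c", null);          // the placeholder "empty" field
        System.out.println(root);      // {a={b={c=null}}}
    }
}
```

The `isEmptyMap()` check the reviewer asks about would then correspond to asking whether `inner` already contains any field before adding the placeholder.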
[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984767#comment-14984767 ]

ASF GitHub Bot commented on DRILL-2288:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/228#discussion_r43597968

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/AbstractMapVector.java ---
    @@ -189,8 +189,8 @@ protected void putVector(String name, ValueVector vector) {
             Preconditions.checkNotNull(vector, "vector cannot be null")
         );
         if (old != null && old != vector) {
    -      logger.debug("Field [%s] mutated from [%s] to [%s]", name, old.getClass().getSimpleName(),
    -        vector.getClass().getSimpleName());
    +      logger.debug("Field [{}] mutated from [{}] to [{}]", name, old.getClass().getSimpleName(),
    --- End diff --

    I think there is a limit to the number of bracket replacements on the logger interface. Is three allowed?
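For reference, SLF4J's parameterized logging is not limited to two placeholders: besides the one- and two-argument overloads, the `Logger` interface has a `debug(String format, Object... arguments)` varargs overload that pairs each `{}` with the next argument. A small self-contained mimic of that substitution (illustrative only — this is not SLF4J's implementation):

```java
public class PlaceholderDemo {
    // Pair each "{}" in the pattern with the next vararg, left to right,
    // the way SLF4J's MessageFormatter does. Any number of placeholders works.
    static String format(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int argIndex = 0;
        int i = 0;
        while (i < pattern.length()) {
            if (argIndex < args.length && pattern.startsWith("{}", i)) {
                out.append(args[argIndex++]);
                i += 2;
            } else {
                out.append(pattern.charAt(i++));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Three placeholders, three arguments — same shape as the diff above.
        System.out.println(format("Field [{}] mutated from [{}] to [{}]",
                "f", "IntVector", "VarCharVector"));
        // prints: Field [f] mutated from [IntVector] to [VarCharVector]
    }
}
```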
[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984769#comment-14984769 ]

ASF GitHub Bot commented on DRILL-2288:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/228#discussion_r43597994

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/MapVector.java ---
    @@ -309,7 +309,14 @@ public Object getObject(int index) {
         Map vv = new JsonStringHashMap<>();
         for (String child : getChildFieldNames()) {
           ValueVector v = getChild(child);
    -      if (v != null) {
    +      // TODO(DRILL-4001): Resolve this hack:
    --- End diff --

    Can you add what happens if this hack isn't done? What is the error?
[jira] [Commented] (DRILL-3983) Small test improvements
[ https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984768#comment-14984768 ]

ASF GitHub Bot commented on DRILL-3983:
---------------------------------------

Github user julienledem commented on the pull request:

    https://github.com/apache/drill/pull/221#issuecomment-152918240

    Thanks!

> Small test improvements
>
>                 Key: DRILL-3983
>                 URL: https://issues.apache.org/jira/browse/DRILL-3983
>             Project: Apache Drill
>          Issue Type: Test
>            Reporter: Julien Le Dem
>            Assignee: Julien Le Dem
>             Fix For: 1.3.0
[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984764#comment-14984764 ]

ASF GitHub Bot commented on DRILL-2288:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/228#discussion_r43597916

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/RecordBatch.java ---
    @@ -23,60 +23,214 @@
     import org.apache.drill.exec.record.selection.SelectionVector4;

     /**
    - * A record batch contains a set of field values for a particular range of records. In the case of a record batch
    - * composed of ValueVectors, ideally a batch fits within L2 cache (~256k per core). The set of value vectors do not
    - * change unless the next() IterOutcome is a *_NEW_SCHEMA type.
    - *
    - * A key thing to know is that the Iterator provided by record batch must align with the rank positions of the field ids
    - * provided utilizing getValueVectorId();
    + * A record batch contains a set of field values for a particular range of
    --- End diff --

    Nice documentation on this code.
[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984762#comment-14984762 ]

ASF GitHub Bot commented on DRILL-2288:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/228#discussion_r43597810

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/union/UnionAllRecordBatch.java ---
    @@ -335,7 +343,8 @@ public IterOutcome nextBatch() throws SchemaChangeException {
             return iterRight;

           default:
    -        throw new IllegalStateException(String.format("Unknown state %s.", iterRight));
    +        throw new IllegalStateException(
    --- End diff --

    Why did you change the formatting here?
[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984760#comment-14984760 ]

ASF GitHub Bot commented on DRILL-2288:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/228#discussion_r43597780

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java ---
    @@ -122,6 +122,7 @@ protected void killIncoming(final boolean sendUpstream) {

       @Override
       public IterOutcome innerNext() {
    +    recordCount = 0;
    --- End diff --

    Why don't you add this to parent class rather than each child of the parent? (Either AbstractRecordBatch or AbstractSingleRecordBatch.)
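The reviewer's suggestion is the classic template-method shape: do the shared reset once in the parent, before delegating to the subclass. A minimal sketch, where the class and method names echo Drill's `AbstractRecordBatch`/`innerNext()` but the bodies are hypothetical stand-ins, not Drill code:

```java
// Parent owns the per-batch reset so no subclass can forget it.
abstract class AbstractBatch {
    protected int recordCount;

    // Template method: reset happens here, once, for every subclass.
    public final String next() {
        recordCount = 0;
        return innerNext();
    }

    protected abstract String innerNext();
}

class ProjectBatch extends AbstractBatch {
    @Override
    protected String innerNext() {
        // ... produce output rows, then record how many were produced ...
        recordCount = 5;
        return "OK";
    }
}

public class BatchResetDemo {
    public static void main(String[] args) {
        AbstractBatch batch = new ProjectBatch();
        System.out.println(batch.next() + ", recordCount=" + batch.recordCount);
    }
}
```

The trade-off being debated: resetting in each child is explicit but easy to miss (as this bug shows), while resetting in the parent centralizes the invariant at the cost of hiding it from the subclass author.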
[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984756#comment-14984756 ]

ASF GitHub Bot commented on DRILL-2288:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/228#discussion_r43597741

    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/aggregate/HashAggTemplate.java ---
    @@ -325,10 +325,13 @@ public AggOutcome doWork() {
             if (EXTRA_DEBUG_1) {
               logger.debug("Received new schema. Batch has {} records.", incoming.getRecordCount());
             }
    -        newSchema = true;
    -        this.cleanup();
    -        // TODO: new schema case needs to be handled appropriately
    -        return AggOutcome.UPDATE_AGGREGATOR;
    +        final BatchSchema newIncomingSchema = incoming.getSchema();
    +        if ((! newIncomingSchema.equals(schema)) && schema != null) {
    --- End diff --

    You are quelling a new schema here without correctly reloading vector references. A schema change means: the schema has changed and/or the vectors have changed. Your check here isn't enough to guarantee correct behavior. It is possible that the schema is equal but the vectors changed.
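The reviewer's objection can be illustrated with stand-in types: value equality of schemas says nothing about the identity of the vectors a batch holds, so comparing schemas alone cannot tell you whether downstream vector references are stale. `Schema` and `VectorStub` below are hypothetical stand-ins for Drill's `BatchSchema` and `ValueVector`, used only to show the distinction:

```java
import java.util.Arrays;
import java.util.List;

final class Schema {
    private final List<String> fields;
    Schema(List<String> fields) { this.fields = fields; }
    // Value equality: two schemas with the same field list compare equal.
    @Override public boolean equals(Object o) {
        return o instanceof Schema && ((Schema) o).fields.equals(fields);
    }
    @Override public int hashCode() { return fields.hashCode(); }
}

// Identity matters here: downstream operators cache references to vectors.
final class VectorStub { }

public class SchemaVsVectors {
    public static void main(String[] args) {
        Schema before = new Schema(Arrays.asList("a", "b"));
        Schema after  = new Schema(Arrays.asList("a", "b"));
        VectorStub v1 = new VectorStub();
        VectorStub v2 = new VectorStub();  // reallocated across a "schema change" event

        System.out.println("schemas equal: " + before.equals(after));  // true
        System.out.println("same vectors:  " + (v1 == v2));            // false
    }
}
```

That is, an upstream batch can legitimately report OK_NEW_SCHEMA with an *equal* schema while having reallocated its vectors, which is why an `equals()` check on the schema is not sufficient grounds to skip reloading references.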
[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984752#comment-14984752 ]

ASF GitHub Bot commented on DRILL-2288:
---------------------------------------

Github user jacques-n commented on a diff in the pull request:

    https://github.com/apache/drill/pull/228#discussion_r43597587

    --- Diff: contrib/storage-hbase/src/main/java/org/apache/drill/exec/store/hbase/HBaseRecordReader.java ---
    @@ -142,10 +148,12 @@ public void setup(OperatorContext context, OutputMutator output) throws Executio
             getOrCreateFamilyVector(column.getRootSegment().getPath(), false);
           }
         }
    -    logger.debug("Opening scanner for HBase table '{}', Zookeeper quorum '{}', port '{}', znode '{}'.",
    -        hbaseTableName, hbaseConf.get(HConstants.ZOOKEEPER_QUORUM),
    -        hbaseConf.get(HBASE_ZOOKEEPER_PORT), hbaseConf.get(HConstants.ZOOKEEPER_ZNODE_PARENT));
    -    hTable = new HTable(hbaseConf, hbaseTableName);
    +    // Add vector for any column families not mentioned yet (in order to avoid
    +    // creation of dummy NullableIntVectors for them).
    +    for (HColumnDescriptor columnFamily :
    --- End diff --

    This also needs to be done for any requested column vectors.
[jira] [Created] (DRILL-4004) Fix bugs in JDK8 Tests before updating enforcer to JDK8
Jacques Nadeau created DRILL-4004:
-------------------------------------

             Summary: Fix bugs in JDK8 Tests before updating enforcer to JDK8
                 Key: DRILL-4004
                 URL: https://issues.apache.org/jira/browse/DRILL-4004
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Jacques Nadeau

The following tests fail on JDK8:

{code}
org.apache.drill.exec.store.mongo.TestMongoFilterPushDown.testFilterPushDownIsEqual
org.apache.drill.exec.store.mongo.TestMongoFilterPushDown.testFilterPushDownGreaterThanWithSingleField
org.apache.drill.exec.store.mongo.TestMongoFilterPushDown.testFilterPushDownLessThanWithSingleField
org.apache.drill.TestFrameworkTest.testRepeatedColumnMatching
org.apache.drill.TestFrameworkTest.testCSVVerificationOfOrder_checkFailure
org.apache.drill.exec.physical.impl.flatten.TestFlattenPlanning.testFlattenPlanningAvoidUnnecessaryProject
org.apache.drill.exec.record.vector.TestValueVector.testFixedVectorReallocation
org.apache.drill.exec.record.vector.TestValueVector.testVariableVectorReallocation
{code}
[jira] [Commented] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984731#comment-14984731 ]

ASF GitHub Bot commented on DRILL-2288:
---------------------------------------

GitHub user dsbos opened a pull request:

    https://github.com/apache/drill/pull/228

    DRILL-2288: Fix ScanBatch violation of IterOutcome protocol and downstream chain of bugs

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dsbos/incubator-drill bugs/drill-2288_etc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/228.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #228

----
commit faefc960f1a36cf25fa0b553bab75e1c3fc71222
Author: dbarclay
Date:   2015-10-28T02:25:25Z

    2288: Pt. 1 Core: Added unit test. [Drill2288GetColumnsMetadataWhenNoRowsTest, empty.json]

commit 396a41b155d1ad9413897ae4d87db2337642a1ea
Author: dbarclay
Date:   2015-11-01T03:36:12Z

    2288: Pt. 1 Core: Changed HBase test table #1's # of regions from 1 to 2. [HBaseTestsSuite]
    Also added TODO(DRILL-3954) comment about # of regions.

commit a0fe6b0787c284cda2006355e507c7caa6cae2a7
Author: dbarclay
Date:   2015-10-28T02:35:11Z

    2288: Pt. 2 Core: Documented IterOutcome much more clearly. [RecordBatch]
    Also edited some related Javadoc.

commit 24b3f4df90711b630b30fe4e2ad68ff798e5731a
Author: dbarclay
Date:   2015-10-28T02:41:04Z

    2288: Pt. 2 Hyg.: Edited doc., added @Override, etc. [AbstractRecordBatch, RecordBatch]
    Purged unused SetupOutcome. Added @Override. Edited comments. Fix some comments to doc. comments.

commit a3108ab18e5f99e9d18f57786b2efa717a61c432
Author: dbarclay
Date:   2015-10-28T03:00:26Z

    2288: Pt. 3 Core&Hyg.: Added validation of IterOutcome sequence. [IteratorValidatorBatchIterator]
    Also: Renamed internal members for clarity. Added comments.

commit c8fc4b3f5c3df5871a9b07b8f4ae800ddbe0ce64
Author: dbarclay
Date:   2015-10-28T03:31:14Z

    2288: Pt. 4 Core: Fixed a NONE -> OK_NEW_SCHEMA in ScanBatch.next(). [ScanBatch]
    (With nearby comments.)

commit fcb1438724df83865a83749b097ca19d21cec444
Author: dbarclay
Date:   2015-10-28T03:56:33Z

    2288: Pt. 4 Hyg.: Edited comments, reordered, whitespace. [ScanBatch]
    Reordered. Added comments. Aligned.

commit 9520345aae524d246876a744a8a676e00b0294dc
Author: dbarclay
Date:   2015-10-28T04:02:25Z

    2288: Pt. 4 Core+: Fixed UnionAllRecordBatch to receive IterOutcome sequence right. (3659) [UnionAllRecordBatch]

commit 3acf8ec8927974901d825859c8f00d1382aa2d87
Author: dbarclay
Date:   2015-10-28T04:05:01Z

    2288: Pt. 5 Core: Fixed ScanBatch.Mutator.isNewSchema() to stop spurious "new schema" reports
    (fix short-circuit OR, to call resetting method right). [ScanBatch]

commit 59ede9bda0e73c0bb6840018781f1f404d73f85c
Author: dbarclay
Date:   2015-10-28T04:11:55Z

    2288: Pt. 5 Hyg.: Renamed, edited comments, reordered. [ScanBatch, SchemaChangeCallBack, AbstractSingleRecordBatch]
    Renamed getSchemaChange -> getSchemaChangedAndReset. Renamed schemaChange -> schemaChanged.
    Added doc. comments. Aligned.

commit d93dc1ee518db409e72b8672679640be65ae7a62
Author: dbarclay
Date:   2015-10-28T04:20:47Z

    2288: Pt. 6 Core: Avoided dummy Null.IntVec. column in JsonReader when not needed
    (MapWriter.isEmptyMap()). [JsonReader, 3 vector files]

commit f989229ac5510273698af19d87b50044efed81c5
Author: dbarclay
Date:   2015-10-28T04:32:44Z

    2288: Pt. 6 Hyg.: Edited comments, message. Fixed message formatting.
    [RecordReader, JSONFormatPlugin, JSONRecordReader, AbstractMapVector, JsonReader]
    Fixed message formatting. Edited comments. Edited message. Fixed spurious line break.

commit a360e9d0d7bfac0aa75831c35585f8ea890858dc
Author: dbarclay
Date:   2015-10-28T05:06:13Z

    2288: Pt. 7 Core: Added column families in HBaseRecordReader* to avoid dummy Null.IntVec. clash.
    [HBaseRecordReader]

commit 18467a9ce14f6328b4766541267183927ebcdc9c
Author: dbarclay
Date:   2015-10-28T05:06:52Z

    2288: Pt. 8 Core.1: Cleared recordCount in OrderedPartitionRecordBatch.innerNext(). [OrderedPartitionRecordBatch]

commit 73bf71fc4af8e54599e908abbb993a64b066c097
Author: dbarclay
Date:   2015-10-28T05:07:14Z

    2288: Pt. 8 Core.2: Cleared recordCount in ProjectRecordBatch.innerNext. [ProjectRecordBatch]

commit 064187d1c23f2b4fc09e94066d66e79ef961c4f1
Author: dbarclay
Date:   2015-10-28T05:08:22Z

    2288: Pt. 8 Core.3: Cleared recordCount in TopNBatch.innerNext. [TopNBatch]

commit 8b9d1657ee22cee4432074f778e5d10e2c06e8e8
Author: dbarclay
Date:   2015-10-28T05:24:35Z

    2288: Pt. 9 Core: Had UnorderedRec
[jira] [Created] (DRILL-4003) Tests expecting Drill OversizedAllocationException yield OutOfMemoryError
Daniel Barclay (Drill) created DRILL-4003:
---------------------------------------------

             Summary: Tests expecting Drill OversizedAllocationException yield OutOfMemoryError
                 Key: DRILL-4003
                 URL: https://issues.apache.org/jira/browse/DRILL-4003
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Data Types, Tools, Build & Test
            Reporter: Daniel Barclay (Drill)

Tests that expect Drill's {{OversizedAllocationException}} (for example, {{TestValueVector.testFixedVectorReallocation()}}) sometimes fail with an {{OutOfMemoryError}} instead.

(Do the tests check whether there's enough memory available for the test before proceeding?)
[jira] [Commented] (DRILL-3983) Small test improvements
[ https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984674#comment-14984674 ]

ASF GitHub Bot commented on DRILL-3983:
---------------------------------------

Github user jacques-n commented on the pull request:

    https://github.com/apache/drill/pull/221#issuecomment-152898335

    For reference, final tally shows that full test output went from 5.9M to 2.4M with this change. Nice improvement.
[jira] [Updated] (DRILL-3983) Small test improvements
[ https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-3983:
----------------------------------
    Fix Version/s: 1.3.0
[jira] [Updated] (DRILL-3956) TEXT MySQL type unsupported
[ https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-3956:
----------------------------------
    Fix Version/s: 1.3.0

> TEXT MySQL type unsupported
>
>                 Key: DRILL-3956
>                 URL: https://issues.apache.org/jira/browse/DRILL-3956
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Other
>    Affects Versions: 1.2.0
>            Reporter: Andrew
>            Assignee: Jacques Nadeau
>             Fix For: 1.3.0
>
>         Attachments: DRILL-3956.patch
>
> The JDBC storage plugin will fail with an NPE when querying a MySQL table
> that has a 'TEXT' column. The underlying problem appears to be that Calcite
> has no notion of this type.
[jira] [Updated] (DRILL-3921) Hive LIMIT 1 queries take too long
[ https://issues.apache.org/jira/browse/DRILL-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-3921:
----------------------------------
    Fix Version/s: 1.3.0

> Hive LIMIT 1 queries take too long
>
>                 Key: DRILL-3921
>                 URL: https://issues.apache.org/jira/browse/DRILL-3921
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Sudheesh Katkam
>            Assignee: Sudheesh Katkam
>             Fix For: 1.3.0
>
> Fragment initialization on a Hive table (that is backed by a directory of
> many files) can take really long. This is evident through LIMIT 1 queries.
> The root cause is that the underlying reader in the HiveRecordReader is
> initialized when the ctor is called, rather than when setup is called.
>
> Two changes need to be made:
> 1) lazily initialize the underlying record reader in HiveRecordReader
> 2) allow for running a callable as a proxy user within an operator (through
> OperatorContext). This is required as initialization of the underlying record
> reader needs to be done as a proxy user (proxy for owner of the file).
> Previously, this was handled while creating the record batch tree.
[jira] [Resolved] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In
[ https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau resolved DRILL-3992.
-----------------------------------
       Resolution: Fixed
         Assignee: Jacques Nadeau
    Fix Version/s:     (was: 1.2.0)
                   1.3.0

Resolved in 22e5316

> Unable to query Oracle DB using JDBC Storage Plug-In
>
>                 Key: DRILL-3992
>                 URL: https://issues.apache.org/jira/browse/DRILL-3992
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.2.0
>         Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00
>            Reporter: Eric Roma
>            Assignee: Jacques Nadeau
>            Priority: Minor
>              Labels: newbie
>             Fix For: 1.3.0
>
> *See External Issue URL for Stack Overflow Post*
> *Appears to be similar issue at
> http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc*
>
> Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release
> 10.2.0.4.0 - 64bit in embedded mode.
>
> I'm curious if anyone has had any success connecting Apache Drill to an
> Oracle DB. I've updated the drill-override.conf with the following
> configurations (per documents):
>
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "localhost:2181",
>   drill.exec.sys.store.provider.local.path = "/mypath"
> }
>
> and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can
> successfully create the storage plug-in:
>
> {
>   "type": "jdbc",
>   "driver": "oracle.jdbc.driver.OracleDriver",
>   "url": "jdbc:oracle:thin:@::",
>   "username": "USERNAME",
>   "password": "PASSWORD",
>   "enabled": true
> }
>
> but when I issue a query such as:
>
> select * from ..`dual`;
>
> I get the following error:
>
> Query Failed: An Error Occurred
> org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR:
> From line 1, column 15 to line 1, column 20: Table
> '..dual' not found [Error Id:
> 57a4153c-6378-4026-b90c-9bb727e131ae on :].
>
> I've tried to query other schema/tables and get a similar result. I've also
> tried connecting to Teradata and get the same error.
[jira] [Updated] (DRILL-1752) Drill cluster returns error when querying Mongo shards on an unsharded collection
[ https://issues.apache.org/jira/browse/DRILL-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-1752:
----------------------------------
    Fix Version/s:     (was: Future)
                   1.3.0

> Drill cluster returns error when querying Mongo shards on an unsharded
> collection
>
>                 Key: DRILL-1752
>                 URL: https://issues.apache.org/jira/browse/DRILL-1752
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - MongoDB
>    Affects Versions: 0.6.0, 0.7.0
>         Environment: Drill cluster on nodes with Mongo Shards
>            Reporter: Andries Engelbrecht
>            Priority: Minor
>             Fix For: 1.3.0
>
>         Attachments: DRILL-1752.patch
>
> Query fails on a large unsharded collection in MongoDB sharded cluster with
> drillbits on each node with Mongo shards.
>
> Error message:
> 0: jdbc:drill:se0:5181> select * from unshard limit 2;
> Query failed: Failure while setting up query. Incoming endpoints 1 is greater
> than number of chunks 0 [cb2121f7-eb3e-48cd-8530-474ca76c598d]
> Error: exception while executing query: Failure while trying to get next
> result batch. (state=,code=0)
>
> 0: jdbc:drill:se0:5181> explain plan for select * from unshard limit 2;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      SelectionVectorRemover
> 00-02        Limit(fetch=[2])
> 00-03          Scan(groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec
> [dbName=review_syn, collectionName=unshard, filters=null],
> columns=[SchemaPath [`*`)
> | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ ],
>     "queue" : 0,
>     "resultMode" : "EXEC"
>   },
>   "graph" : [ {
>     "pop" : "mongo-scan",
>     "@id" : 3,
>     "mongoScanSpec" : {
>       "dbName" : "review_syn",
>       "collectionName" : "unshard",
>       "filters" : null
>     },
>     "storage" : {
>       "type" : "mongo",
>       "connection" : "mongodb://se4.dmz:27017",
>       "enabled" : true
>     },
>     "columns" : [ "`*`" ],
>     "cost" : 625000.0
>   }, {
>     "pop" : "limit",
>     "@id" : 2,
>     "child" : 3,
>     "first" : 0,
>     "last" : 2,
>     "initialAllocation" : 100,
>     "maxAllocation" : 100,
>     "cost" : 625000.0
>   }, {
>     "pop" : "selection-vector-remover",
>     "@id" : 1,
>     "child" : 2,
>     "initialAllocation" : 100,
>     "maxAllocation" : 100,
>     "cost" : 625000.0
>   }, {
>     "pop" : "screen",
>     "@id" : 0,
>     "child" : 1,
>     "initialAllocation" : 100,
>     "maxAllocation" : 100,
>     "cost" : 625000.0
>   } ]
> } |
> +------+------+
[jira] [Resolved] (DRILL-4000) In all non-root fragments, Drill recreates storage plugin instances for every minor fragment
[ https://issues.apache.org/jira/browse/DRILL-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau resolved DRILL-4000.
-----------------------------------
    Resolution: Fixed

Resolved in 7f55051

> In all non-root fragments, Drill recreates storage plugin instances for every
> minor fragment
>
>                 Key: DRILL-4000
>                 URL: https://issues.apache.org/jira/browse/DRILL-4000
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jacques Nadeau
>            Assignee: Jacques Nadeau
>             Fix For: 1.3.0
>
> Drill is creating ephemeral storage plugin instances when a plan is
> deserialized. As such, every minor fragment of a query has Drill create a
> separate storage plugin instance. Depending on the cost of storage plugin
> creation, this could be quite expensive.
[jira] [Resolved] (DRILL-3983) Small test improvements
[ https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau resolved DRILL-3983.
-----------------------------------
    Resolution: Fixed

Fixed in 77e7de4
[jira] [Resolved] (DRILL-3810) Filesystem plugin's support for file format's schema
[ https://issues.apache.org/jira/browse/DRILL-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau resolved DRILL-3810. --- Resolution: Fixed Fixed in ce593eb > Filesystem plugin's support for file format's schema > > > Key: DRILL-3810 > URL: https://issues.apache.org/jira/browse/DRILL-3810 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON, Storage - Other, Storage - Parquet, > Storage - Text & CSV >Reporter: Bhallamudi Venkata Siva Kamesh > Fix For: 1.3.0 > > > Filesystem Plugin supports multiple type of file formats like > * json > * avro > * text (csv|psv|tsv) > * parquet > and can support any type of file formats. > Among these file formats, some of the file formats are schema based like > *avro* and *parquet* and some of them are schema less like *json*. > For schema based file formats, Drill should have capability to validate the > query against file schema, before start executing the query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3810) Filesystem plugin's support for file format's schema
[ https://issues.apache.org/jira/browse/DRILL-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated DRILL-3810: -- Fix Version/s: (was: Future) 1.3.0 > Filesystem plugin's support for file format's schema > > > Key: DRILL-3810 > URL: https://issues.apache.org/jira/browse/DRILL-3810 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - JSON, Storage - Other, Storage - Parquet, > Storage - Text & CSV >Reporter: Bhallamudi Venkata Siva Kamesh > Fix For: 1.3.0 > > > Filesystem Plugin supports multiple type of file formats like > * json > * avro > * text (csv|psv|tsv) > * parquet > and can support any type of file formats. > Among these file formats, some of the file formats are schema based like > *avro* and *parquet* and some of them are schema less like *json*. > For schema based file formats, Drill should have capability to validate the > query against file schema, before start executing the query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
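The validation DRILL-3810 asks for — checking a query against a schema-based file format's schema before execution starts — can be sketched as a simple set-membership pass over the referenced columns. This is an illustrative outline only; the names are hypothetical and not Drill's API:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of up-front validation for schema-based formats such as Avro and
// Parquet: before executing, report every queried column that the file's
// schema does not contain, so the query can fail fast at planning time.
public class SchemaValidator {
    public static List<String> missingColumns(Set<String> fileSchemaColumns,
                                              List<String> queriedColumns) {
        return queriedColumns.stream()
                .filter(c -> !fileSchemaColumns.contains(c))
                .collect(Collectors.toList());
    }
}
```

An empty result means the query only references columns the file actually provides; anything else can be surfaced as a validation error before execution.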
[jira] [Resolved] (DRILL-1752) Drill cluster returns error when querying Mongo shards on an unsharded collection
[ https://issues.apache.org/jira/browse/DRILL-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau resolved DRILL-1752. --- Resolution: Fixed Fixed in ce593eb4c1dc5388787f0896f8845c9b0bc5d2e8 > Drill cluster returns error when querying Mongo shards on an unsharded > collection > - > > Key: DRILL-1752 > URL: https://issues.apache.org/jira/browse/DRILL-1752 > Project: Apache Drill > Issue Type: Bug > Components: Storage - MongoDB >Affects Versions: 0.6.0, 0.7.0 > Environment: Drill cluster on nodes with Mongo Shards >Reporter: Andries Engelbrecht >Priority: Minor > Fix For: Future > > Attachments: DRILL-1752.patch > > > Query fails on a large unsharded collection in MongoDB sharded cluster with > drillbits on each node with Mongo shards. > Error message: > 0: jdbc:drill:se0:5181> select * from unshard limit 2; > Query failed: Failure while setting up query. Incoming endpoints 1 is greater > than number of chunks 0 [cb2121f7-eb3e-48cd-8530-474ca76c598d] > Error: exception while executing query: Failure while trying to get next > result batch. 
(state=,code=0) > 0: jdbc:drill:se0:5181> explain plan for select * from unshard limit 2; > +++ > |text|json| > +++ > | 00-00Screen > 00-01 SelectionVectorRemover > 00-02Limit(fetch=[2]) > 00-03 Scan(groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec > [dbName=review_syn, collectionName=unshard, filters=null], > columns=[SchemaPath [`*`) > | { > "head" : { > "version" : 1, > "generator" : { > "type" : "ExplainHandler", > "info" : "" > }, > "type" : "APACHE_DRILL_PHYSICAL", > "options" : [ ], > "queue" : 0, > "resultMode" : "EXEC" > }, > "graph" : [ { > "pop" : "mongo-scan", > "@id" : 3, > "mongoScanSpec" : { > "dbName" : "review_syn", > "collectionName" : "unshard", > "filters" : null > }, > "storage" : { > "type" : "mongo", > "connection" : "mongodb://se4.dmz:27017", > "enabled" : true > }, > "columns" : [ "`*`" ], > "cost" : 625000.0 > }, { > "pop" : "limit", > "@id" : 2, > "child" : 3, > "first" : 0, > "last" : 2, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 625000.0 > }, { > "pop" : "selection-vector-remover", > "@id" : 1, > "child" : 2, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 625000.0 > }, { > "pop" : "screen", > "@id" : 0, > "child" : 1, > "initialAllocation" : 100, > "maxAllocation" : 100, > "cost" : 625000.0 > } ] > } | > +++ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
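The DRILL-1752 failure ("Incoming endpoints 1 is greater than number of chunks 0") arises because an unsharded collection reports zero chunks, and the parallelization logic assumed at least one. A hedged sketch of the guard such a fix needs — illustrative names, not the MongoGroupScan implementation:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the assignment guard: when a collection is unsharded it has no
// chunk metadata, so the scan must fall back to a single whole-collection
// scan unit instead of failing the endpoints-vs-chunks comparison.
public class ChunkAssignment {
    public static List<String> scanUnits(List<String> chunks) {
        if (chunks.isEmpty()) {
            // unsharded collection: one scan unit covering everything
            return Collections.singletonList("FULL_COLLECTION");
        }
        return new ArrayList<>(chunks);
    }
}
```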
[jira] [Commented] (DRILL-3983) Small test improvements
[ https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984664#comment-14984664 ] ASF GitHub Bot commented on DRILL-3983: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/221 > Small test improvements > --- > > Key: DRILL-3983 > URL: https://issues.apache.org/jira/browse/DRILL-3983 > Project: Apache Drill > Issue Type: Test >Reporter: Julien Le Dem >Assignee: Julien Le Dem > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4000) In all non-root fragments, Drill recreates storage plugin instances for every minor fragment
[ https://issues.apache.org/jira/browse/DRILL-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984663#comment-14984663 ] ASF GitHub Bot commented on DRILL-4000: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/227 > In all non-root fragments, Drill recreates storage plugin instances for every > minor fragment > > > Key: DRILL-4000 > URL: https://issues.apache.org/jira/browse/DRILL-4000 > Project: Apache Drill > Issue Type: Bug >Reporter: Jacques Nadeau >Assignee: Jacques Nadeau > Fix For: 1.3.0 > > > Drill is creating ephemeral storage plugin instances when a plan is > deserialized. As such, every minor fragment of a query has Drill create a > separate storage plugin instance. Depending on the cost of storage plugin > creation, this could be quite expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3921) Hive LIMIT 1 queries take too long
[ https://issues.apache.org/jira/browse/DRILL-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984665#comment-14984665 ] ASF GitHub Bot commented on DRILL-3921: --- Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/197 > Hive LIMIT 1 queries take too long > -- > > Key: DRILL-3921 > URL: https://issues.apache.org/jira/browse/DRILL-3921 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Reporter: Sudheesh Katkam >Assignee: Sudheesh Katkam > > Fragment initialization on a Hive table (that is backed by a directory of > many files) can take really long. This is evident through LIMIT 1 queries. > The root cause is that the underlying reader in the HiveRecordReader is > initialized when the ctor is called, rather than when setup is called. > Two changes need to be made: > 1) lazily initialize the underlying record reader in HiveRecordReader > 2) allow for running a callable as a proxy user within an operator (through > OperatorContext). This is required as initialization of the underlying record > reader needs to be done as a proxy user (proxy for owner of the file). > Previously, this was handled while creating the record batch tree. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
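The first change DRILL-3921 calls for — initializing the underlying record reader lazily in setup() rather than in the constructor — is the classic lazy-initialization pattern. A minimal sketch under assumed names (the real HiveRecordReader is more involved):

```java
import java.util.function.Supplier;

// Sketch of lazy initialization: the constructor only stores a factory, so
// creating many reader objects for a LIMIT 1 query stays cheap; the
// expensive open happens once, on the first setup() call.
public class LazyReader {
    private final Supplier<String> readerFactory; // stand-in for the real reader constructor
    private String reader;                        // created on first use

    public LazyReader(Supplier<String> readerFactory) {
        this.readerFactory = readerFactory;       // cheap: nothing is opened yet
    }

    public String setup() {
        if (reader == null) {
            reader = readerFactory.get();         // expensive work deferred to here
        }
        return reader;
    }
}
```

The second change (running the initialization as a proxy user) would wrap that `readerFactory.get()` call in a doAs-style privileged action, which is omitted here.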
[jira] [Commented] (DRILL-3992) Unable to query Oracle DB using JDBC Storage Plug-In
[ https://issues.apache.org/jira/browse/DRILL-3992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984635#comment-14984635 ] Steven Phillips commented on DRILL-3992: +1 > Unable to query Oracle DB using JDBC Storage Plug-In > > > Key: DRILL-3992 > URL: https://issues.apache.org/jira/browse/DRILL-3992 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.2.0 > Environment: Windows 7 Enterprise 64-bit, Oracle 10g, Teradata 15.00 >Reporter: Eric Roma >Priority: Minor > Labels: newbie > Fix For: 1.2.0 > > > *See External Issue URL for Stack Overflow Post* > *Appears to be similar issue at > http://stackoverflow.com/questions/33370438/apache-drill-1-2-and-sql-server-jdbc* > Using Apache Drill v1.2 and Oracle Database 10g Enterprise Edition Release > 10.2.0.4.0 - 64bit in embedded mode. > I'm curious if anyone has had any success connecting Apache Drill to an > Oracle DB. I've updated the drill-override.conf with the following > configurations (per documents): > drill.exec: { > cluster-id: "drillbits1", > zk.connect: "localhost:2181", > drill.exec.sys.store.provider.local.path = "/mypath" > } > and placed the ojdbc6.jar in \apache-drill-1.2.0\jars\3rdparty. I can > successfully create the storage plug-in: > { > "type": "jdbc", > "driver": "oracle.jdbc.driver.OracleDriver", > "url": "jdbc:oracle:thin:@::", > "username": "USERNAME", > "password": "PASSWORD", > "enabled": true > } > but when I issue a query such as: > select * from ..`dual`; > I get the following error: > Query Failed: An Error Occurred > org.apache.drill.common.exceptions.UserRemoteException: VALIDATION ERROR: > From line 1, column 15 to line 1, column 20: Table > '..dual' not found [Error Id: > 57a4153c-6378-4026-b90c-9bb727e131ae on :]. > I've tried to query other schema/tables and get a similar result. I've also > tried connecting to Teradata and get the same error. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4000) In all non-root fragments, Drill recreates storage plugin instances for every minor fragment
[ https://issues.apache.org/jira/browse/DRILL-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984632#comment-14984632 ] ASF GitHub Bot commented on DRILL-4000: --- Github user StevenMPhillips commented on the pull request: https://github.com/apache/drill/pull/227#issuecomment-152887889 +1 > In all non-root fragments, Drill recreates storage plugin instances for every > minor fragment > > > Key: DRILL-4000 > URL: https://issues.apache.org/jira/browse/DRILL-4000 > Project: Apache Drill > Issue Type: Bug >Reporter: Jacques Nadeau >Assignee: Jacques Nadeau > Fix For: 1.3.0 > > > Drill is creating ephemeral storage plugin instances when a plan is > deserialized. As such, every minor fragment of a query has Drill create a > separate storage plugin instance. Depending on the cost of storage plugin > creation, this could be quite expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3956) TEXT MySQL type unsupported
[ https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984629#comment-14984629 ] Steven Phillips commented on DRILL-3956: +1 > TEXT MySQL type unsupported > --- > > Key: DRILL-3956 > URL: https://issues.apache.org/jira/browse/DRILL-3956 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.2.0 >Reporter: Andrew >Assignee: Steven Phillips > Attachments: DRILL-3956.patch > > > The JDBC storage plugin will fail with an NPE when querying a MySQL table > that has a 'TEXT' column. The underlying problem appears to be that Calcite > has no notion of this type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
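The NPE in DRILL-3956 comes from a JDBC type (MySQL TEXT) that has no mapped equivalent, so the mapping returns null. A defensive fix is to fall back to VARCHAR for unrecognized character-like types. This sketch is illustrative of the idea, not the actual Calcite/Drill type-mapping code:

```java
import java.sql.Types;

// Sketch of a defensive JDBC type mapping: unknown or unmapped type codes
// (MySQL TEXT typically surfaces as LONGVARCHAR) fall back to VARCHAR
// instead of null, so downstream code never dereferences a missing type.
public class JdbcTypeMapper {
    public static String toSqlTypeName(int jdbcType) {
        switch (jdbcType) {
            case Types.INTEGER:     return "INTEGER";
            case Types.DOUBLE:      return "DOUBLE";
            case Types.VARCHAR:
            case Types.LONGVARCHAR: // how MySQL TEXT is commonly reported
            case Types.CLOB:        return "VARCHAR";
            default:                return "VARCHAR"; // never null: no NPE downstream
        }
    }
}
```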
[jira] [Commented] (DRILL-3983) Small test improvements
[ https://issues.apache.org/jira/browse/DRILL-3983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984619#comment-14984619 ] ASF GitHub Bot commented on DRILL-3983: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/221#issuecomment-152885146 lgtm. +1. Will merge shortly. > Small test improvements > --- > > Key: DRILL-3983 > URL: https://issues.apache.org/jira/browse/DRILL-3983 > Project: Apache Drill > Issue Type: Test >Reporter: Julien Le Dem >Assignee: Julien Le Dem > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3956) TEXT MySQL type unsupported
[ https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated DRILL-3956: -- Assignee: Steven Phillips (was: Jacques Nadeau) > TEXT MySQL type unsupported > --- > > Key: DRILL-3956 > URL: https://issues.apache.org/jira/browse/DRILL-3956 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.2.0 >Reporter: Andrew >Assignee: Steven Phillips > Attachments: DRILL-3956.patch > > > The JDBC storage plugin will fail with an NPE when querying a MySQL table > that has a 'TEXT' column. The underlying problem appears to be that Calcite > has no notion of this type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3956) TEXT MySQL type unsupported
[ https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau updated DRILL-3956: -- Attachment: DRILL-3956.patch > TEXT MySQL type unsupported > --- > > Key: DRILL-3956 > URL: https://issues.apache.org/jira/browse/DRILL-3956 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.2.0 >Reporter: Andrew >Assignee: Jacques Nadeau > Attachments: DRILL-3956.patch > > > The JDBC storage plugin will fail with an NPE when querying a MySQL table > that has a 'TEXT' column. The underlying problem appears to be that Calcite > has no notion of this type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (DRILL-3956) TEXT MySQL type unsupported
[ https://issues.apache.org/jira/browse/DRILL-3956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacques Nadeau reassigned DRILL-3956: - Assignee: Jacques Nadeau (was: Andrew) > TEXT MySQL type unsupported > --- > > Key: DRILL-3956 > URL: https://issues.apache.org/jira/browse/DRILL-3956 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Other >Affects Versions: 1.2.0 >Reporter: Andrew >Assignee: Jacques Nadeau > > The JDBC storage plugin will fail with an NPE when querying a MySQL table > that has a 'TEXT' column. The underlying problem appears to be that Calcite > has no notion of this type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3994) Build Fails on Windows after DRILL-3742
[ https://issues.apache.org/jira/browse/DRILL-3994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984612#comment-14984612 ] ASF GitHub Bot commented on DRILL-3994: --- Github user jacques-n commented on the pull request: https://github.com/apache/drill/pull/226#issuecomment-152881991 Still broken: [ERROR] Failed to execute goal org.apache.drill.tools:drill-fmpp-maven-plugin:1.3.0-SNAPSHOT:generate (generate-fmpp) on project drill-java-exec: FMPP processin g session failed. [ERROR] Caused by: A listener Java object has failed to handle event "end processing session". The class of the failing listener object is org.apache.drill.fmpp.mojo.FMPPMojo$1. [ERROR] Caused by: org.apache.maven.plugin.MojoFailureException: C:\Users\jnadeau\AppData\Local\Temp\freemarker-tmp8027657275379826459\javacc\Parser.jj should start with C:\Users\jnadeau\AppData\Local\Temp\freemarker-tmp8027657275379826459/ > Build Fails on Windows after DRILL-3742 > --- > > Key: DRILL-3994 > URL: https://issues.apache.org/jira/browse/DRILL-3994 > Project: Apache Drill > Issue Type: Bug > Components: Tools, Build & Test >Reporter: Sudheesh Katkam >Assignee: Julien Le Dem >Priority: Critical > Fix For: 1.3.0 > > > Build fails on Windows on the latest master: > {code} > c:\drill> mvn clean install -DskipTests > ... > [INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 > approved: 169 licence. > [INFO] > [INFO] <<< exec-maven-plugin:1.2.1:java (default) < validate @ drill-common > <<< > [INFO] > [INFO] --- exec-maven-plugin:1.2.1:java (default) @ drill-common --- > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". > SLF4J: Defaulting to no-operation (NOP) logger implementation > SLF4J: See > http://www.slf4j.org/codes.html#StaticLoggerBinder > for further details. 
> Scanning: C:\drill\common\target\classes > [WARNING] > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.IllegalArgumentException: > file:C:/drill/common/target/classes/ not in > [file:/C:/drill/common/target/classes/] > at > org.apache.drill.common.scanner.BuildTimeScan.main(BuildTimeScan.java:129) > ... 6 more > [INFO] > > [INFO] Reactor Summary: > [INFO] > [INFO] Apache Drill Root POM .. SUCCESS [ 10.016 > s] > [INFO] tools/Parent Pom ... SUCCESS [ 1.062 > s] > [INFO] tools/freemarker codegen tooling ... SUCCESS [ 6.922 > s] > [INFO] Drill Protocol . SUCCESS [ 10.062 > s] > [INFO] Common (Logical Plan, Base expressions) FAILURE [ 9.954 > s] > [INFO] contrib/Parent Pom . SKIPPED > [INFO] contrib/data/Parent Pom SKIPPED > [INFO] contrib/data/tpch-sample-data .. SKIPPED > [INFO] exec/Parent Pom SKIPPED > [INFO] exec/Java Execution Engine . SKIPPED > [INFO] exec/JDBC Driver using dependencies SKIPPED > [INFO] JDBC JAR with all dependencies . SKIPPED > [INFO] contrib/mongo-storage-plugin ... SKIPPED > [INFO] contrib/hbase-storage-plugin ... SKIPPED > [INFO] contrib/jdbc-storage-plugin SKIPPED > [INFO] contrib/hive-storage-plugin/Parent Pom . SKIPPED > [INFO] contrib/hive-storage-plugin/hive-exec-shaded ... SKIPPED > [INFO] contrib/hive-storage-plugin/core ... SKIPPED > [INFO] contrib/drill-gis-plugin ... 
SKIPPED > [INFO] Packaging and Distribution Assembly SKIPPED > [INFO] contrib/sqlline SKIPPED > [INFO] > > [INFO] BUILD FAILURE > [INFO] > > [INFO] Total time: 38.813 s > [INFO] Finished at: 2015-10-28T12:17:19-07:00 > [INFO] Final Memory: 67M/466M > [INFO] > > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:j
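The Windows failure quoted above ("...\Parser.jj should start with ...freemarker-tmp.../") is a path-separator mismatch: a backslash-separated Windows path is compared against a forward-slash base via a raw prefix check. A plausible sketch of the normalization such a check needs (illustrative, not the FMPPMojo code):

```java
// Sketch of the Windows-safe prefix check: normalize both paths to forward
// slashes before startsWith, so "C:\\tmp\\x\\y" matches the base "C:/tmp/x/".
public class PathCheck {
    public static boolean startsWithNormalized(String path, String base) {
        String p = path.replace('\\', '/');
        String b = base.replace('\\', '/');
        return p.startsWith(b);
    }
}
```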
[jira] [Comment Edited] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983410#comment-14983410 ] Daniel Barclay (Drill) edited comment on DRILL-2288 at 11/1/15 10:10 PM: - Chain of bugs and problems encountered and (partially) addressed: 1. {{ScanBatch.next()}} returned {{NONE}} without ever returning {{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't get its schema, even for static-schema sources, or even get trigger to update their own schema). 2. {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was not documented clearly (so developers didn't know correctly what to expect or provide). 3. {{IteratorValidatorBatchIterator}} didn't validate the sequence of {{IterOutcome values}} (so developers weren't notified about incorrect results). 4. {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} correctly (so it reported spurious/incorrect schema-change and/or empty-/non-empty input exceptions). 5. {{ScanBatch.Mutator.isNewSchema()}} didn't handle a short-circuit OR {"{{||}}"} correctly in calling {{SchemaChangeCallBack.getSchemaChange()}} (so it didn't reset nested schema-change state, and so caused spurious {{OK_NEW_SCHEMA}} notifications and downstream exceptions). 6. {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field already existed in the batch (so in that case it forcibly changed the type to {{NullableIntVector}}, causing schema changes and downstream exceptions). \[Note: DRILL-2288 does not address other problems with {{NullableIntVector}} dummy columns from {{JsonRecordReader}}.] 7. HBase tests used only one table region, ignoring known problems with multi-region HBase tables (so latent {{HBaseRecordReader}} problems were left undetected and unresolved.) 
\[Note: DRILL-2288 addresses only one test table (increasing the number of regions on the other test tables exposed at least one other problem; others remain).] 8. {{HBaseRecordReader}} didn't create a {{MapVector}} for every column family (so {{NullableIntVector}} dummy columns got created, causing spurious schema changes and downstream exceptions). 9. Some {{RecordBatch}} classes didn't reset their record counts to zero ({{OrderedPartitionRecordBatch.recordCount}}, {{ProjectRecordBatch.recordCount}}, and/or {{TopNBatch.recordCount}}) (so downstream code tried to access elements of (correctly) empty vectors, yielding {{IndexOutOfBoundException}} (with ~"... {{range (0, 0)}}") ). 10. {{RecordBatchLoader}}'s record count was not reset to zero by {{UnorderedReceiverBatch}} (so, again, downstream code tried to access elements of (correctly) empty vectors, yielding {{IndexOutOfBoundException}} (with ~"... {{range (0, 0)}}") ). 11. {{MapVector.load(...)}} left some existing vectors empty, not matching the returned length and the length of sibling vectors (so {{MapVector.getObject(int)}} got {{IndexOutOfBoundException}} (with ~"... {{range (0, 0)}}"). \[Note: DRILL-2288 does not address the root problem.] 12. {{BaseTestQuery.printResult(...)}} skipped deallocation calls in the case of a zero-record record batch (so when it read a zero-row record batch, it caused a memory leak reported at Drillbit shutdown time). 13. {{TestHBaseProjectPushDown.testRowKeyAndColumnPushDown()}} used delimited identifiers of a form (with a period) that Drill can't handle (so the test failed when the test ran with multiple fragments). was (Author: dsbos): Chain of bugs and problems encountered and (partially) addressed: 1. {{ScanBatch.next()}} returned {{NONE}} without ever returning {{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't get its schema, even for static-schema sources, or even get trigger to update their own schema). 2. 
{{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was not documented clearly (so developers didn't know correctly what to expect or provide). 3. {{IteratorValidatorBatchIterator}} didn't validate the sequence of {{IterOutcome values}} (so developers weren't notified about incorrect results). 4. {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} correctly (so it reported spurious/incorrect schema-change and/or empty-/non-empty input exceptions). 5. {{ScanBatch.Mutator.isNewSchema()}} didn't handle a short-circuit OR {"{{||}}"} correctly in calling {{SchemaChangeCallBack.getSchemaChange()}} (so it didn't reset nested schema-change state, and so caused spurious {{OK_NEW_SCHEMA}} notifications and downstream exceptions). 6. {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field already existed in the batch (so in that case it forcibly changed the type to {{NullableIntVector}}, causing schema changes and downstream exceptions). \[
[jira] [Comment Edited] (DRILL-2288) ScanBatch violates IterOutcome protocol for zero-row sources [was: missing JDBC metadata (schema) for 0-row results...]
[ https://issues.apache.org/jira/browse/DRILL-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983410#comment-14983410 ] Daniel Barclay (Drill) edited comment on DRILL-2288 at 11/1/15 9:59 PM: Chain of bugs and problems encountered and (partially) addressed: 1. {{ScanBatch.next()}} returned {{NONE}} without ever returning {{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't get its schema, even for static-schema sources, or even get trigger to update their own schema). 2. {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was not documented clearly (so developers didn't know correctly what to expect or provide). 3. {{IteratorValidatorBatchIterator}} didn't validate the sequence of {{IterOutcome values}} (so developers weren't notified about incorrect results). 4. {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} correctly (so it reported spurious/incorrect schema-change and/or empty-/non-empty input exceptions). 5. {{ScanBatch.Mutator.isNewSchema()}} didn't handle a short-circuit OR {"{{||}}"} correctly in calling {{SchemaChangeCallBack.getSchemaChange()}} (so it didn't reset nested schema-change state, and so caused spurious {{OK_NEW_SCHEMA}} notifications and downstream exceptions). 6. {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field already existed in the batch (so in that case it forcibly changed the type to {{NullableIntVector}}, causing schema changes and downstream exceptions). \[Note: DRILL-2288 does not address other problems with {{NullableIntVector}} dummy columns from {{JsonRecordReader}}.] 7. HBase tests used only one table region, ignoring known problems with multi-region HBase tables (so latent {{HBaseRecordReader}} problems were left undetected and unresolved.) \[Note: DRILL-2288 addresses only one test table (increasing the number of regions on the other test tables exposes at least one other problem).] 8. 
{{HBaseRecordReader}} didn't create a {{MapVector}} for every column family (so {{NullableIntVector}} dummy columns got created, causing spurious schema changes and downstream exceptions). 9. Some {{RecordBatch}} classes didn't reset their record counts to zero ({{OrderedPartitionRecordBatch.recordCount}}, {{ProjectRecordBatch.recordCount}}, and/or {{TopNBatch.recordCount}}) (so downstream code tried to access elements of (correctly) empty vectors, yielding {{IndexOutOfBoundException}} (with ~"... {{range (0, 0)}}") ). 10. {{RecordBatchLoader}}'s record count was not reset to zero by {{UnorderedReceiverBatch}} (so, again, downstream code tried to access elements of (correctly) empty vectors, yielding {{IndexOutOfBoundException}} (with ~"... {{range (0, 0)}}") ). 11. {{MapVector.load(...)}} left some existing vectors empty, not matching the returned length and the length of sibling vectors (so {{MapVector.getObject(int)}} got {{IndexOutOfBoundException}} (with ~"... {{range (0, 0)}}"). \[Note: DRILL-2288 does not address the root problem.] 12. {{BaseTestQuery.printResult(...)}} skipped deallocation calls in the case of a zero-record record batch (so when it read a zero-row record batch, it caused a memory leak reported at Drillbit shutdown time). 13. {{TestHBaseProjectPushDown.testRowKeyAndColumnPushDown()}} used delimited identifiers of a form (with a period) that Drill can't handle (so the test failed when the test ran with multiple fragments). was (Author: dsbos): Chain of bugs and problems encountered and (partially) addressed: 1. {{ScanBatch.next()}} returned {{NONE}} without ever returning {{OK_NEW_SCHEMA}} for a source having zero rows (so downstream operators didn't get its schema, even for static-schema sources, or even get trigger to update their own schema). 2. {{RecordBatch.IterOutcome}}, especially the allowed sequence of values, was not documented clearly (so developers didn't know correctly what to expect or provide). 3. 
{{IteratorValidatorBatchIterator}} didn't validate the sequence of {{IterOutcome values}} (so developers weren't notified about incorrect results). 4. {{UnionAllRecordBatch}} did not interpret {{NONE}} and {{OK_NEW_SCHEMA}} correctly (so it reported spurious/incorrect schema-change and/or empty-/non-empty input exceptions). 5. {{ScanBatch.Mutator.isNewSchema()}} didn't handle a short-circuit OR {"{{||}}"} correctly in calling {{SchemaChangeCallBack.getSchemaChange()}} (so it didn't reset nested schema-change state, and so caused spurious {{OK_NEW_SCHEMA}} notifications and downstream exceptions). 6. {{JsonRecordReader.ensureAtLeastOneField()}} didn't check whether any field already existed in the batch (so in that case it forcibly changed the type to {{NullableIntVector}}, causing schema changes and downstream exceptions). \[Note: DRILL-22
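Item 5 in the DRILL-2288 bug chain above describes a short-circuit OR pitfall: when the right operand of `||` is a call that also resets internal state (like `SchemaChangeCallBack.getSchemaChange()`), the call is skipped whenever the left side is already true, so the state leaks into the next check. A self-contained sketch of the pitfall and its fix, with hypothetical names standing in for the Drill classes:

```java
// Sketch of item 5's short-circuit bug: "localChange || cb.getAndReset()"
// skips the state-clearing call when localChange is true, leaving stale
// schema-change state behind. The fix calls getAndReset() unconditionally.
public class SchemaChangeFlag {
    private boolean pendingChange = true;        // nested schema-change state

    // Reports and clears the state, like getSchemaChange() in the real code.
    public boolean getAndReset() {
        boolean was = pendingChange;
        pendingChange = false;
        return was;
    }

    public static boolean isNewSchemaBuggy(boolean localChange, SchemaChangeFlag cb) {
        return localChange || cb.getAndReset();  // cb state NOT cleared when localChange is true
    }

    public static boolean isNewSchemaFixed(boolean localChange, SchemaChangeFlag cb) {
        boolean nested = cb.getAndReset();       // always clears the state
        return localChange || nested;
    }
}
```

The leaked state is what later produced the spurious OK_NEW_SCHEMA notifications described above.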
[jira] [Updated] (DRILL-3952) Improve Window Functions performance when not all batches are required to process the current batch
[ https://issues.apache.org/jira/browse/DRILL-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Sinha updated DRILL-3952: -- Assignee: Deneche A. Hakim (was: Aman Sinha) > Improve Window Functions performance when not all batches are required to > process the current batch > --- > > Key: DRILL-3952 > URL: https://issues.apache.org/jira/browse/DRILL-3952 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.2.0 >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim > Fix For: 1.3.0 > > > Currently, the window operator blocks until all batches of current partition > to be available. For some queries it's necessary (e.g. aggregate with no > order-by in the window definition), but for other cases the window operator > can process and pass the current batch downstream sooner. > Implementing this should help the window operator use less memory and run > faster, especially in the presence of a limit operator. > The purpose of this JIRA is to improve the window operator in the following > cases: > - aggregate, when order-by clause is available in window definition, can > process current batch as soon as it receives the last peer row > - lead can process current batch as soon as it receives 1 more batch > - lag can process current batch immediately > - first_value can process current batch immediately > - last_value, when order-by clause is available in window definition, can > process current batch as soon as it receives the last peer row > - row_number, rank and dense_rank can process current batch immediately -- This message was sent by Atlassian JIRA (v6.3.4#6332)
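The per-function cases enumerated in DRILL-3952 can be encoded directly as a classification of when the window operator may release the current batch. This sketch just transcribes the issue's list (assuming, where the issue requires it, that an order-by clause is present in the window definition); it is not the operator implementation:

```java
import java.util.Locale;

// Encoding of the DRILL-3952 cases: when each window function would allow
// the operator to process and pass the current batch downstream.
public class WindowReadiness {
    enum When { IMMEDIATELY, AFTER_LAST_PEER_ROW, AFTER_ONE_MORE_BATCH }

    public static When canProcess(String function) {
        switch (function.toLowerCase(Locale.ROOT)) {
            case "lag":
            case "first_value":
            case "row_number":
            case "rank":
            case "dense_rank":  return When.IMMEDIATELY;
            case "lead":        return When.AFTER_ONE_MORE_BATCH;
            case "aggregate":   // with order-by in the window definition
            case "last_value":  return When.AFTER_LAST_PEER_ROW;
            default: throw new IllegalArgumentException("unlisted function: " + function);
        }
    }
}
```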
[jira] [Resolved] (DRILL-3538) We do not prune partitions when we count over partitioning key and filter over partitioning key
[ https://issues.apache.org/jira/browse/DRILL-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha resolved DRILL-3538.
-------------------------------
    Resolution: Not A Problem

Per previous comments, marking this as working as designed.

> We do not prune partitions when we count over partitioning key and filter
> over partitioning key
> -------------------------------------------------------------------------
>
>                 Key: DRILL-3538
>                 URL: https://issues.apache.org/jira/browse/DRILL-3538
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: 4 node cluster on CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> We are not doing partition pruning when we do a count over the partitioning
> key and the predicate involves the partitioning key. The CTAS used was:
> {code}
> create table t3214 partition by (key2) as select cast(key1 as double) key1,
> cast(key2 as char(1)) key2 from `twoKeyJsn.json`;
> {code}
> case 1) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key2) from t3214 where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@e2471d7])
> {code}
> case 2) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(*) from t3214 where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@211930a2])
> {code}
> case 3) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key1) from t3214 where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@23fea3b0])
> {code}
> case 4) We do prune here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select avg(key1) from t3214 where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY NOT NULL])
> 00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
> 00-03          StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-04            Project(key1=[$1])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]], selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
> {code}
> case 5) We do prune here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select min(key1) from t3214 where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        StreamAgg(group=[{}], EXPR$0=[MIN($0)])
> 00-03          StreamAgg(group=[{}], EXPR$0=[MIN($0)])
> 00-04            Project(key1=[$1])
> 00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]], selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
> {code}
> commit id that I am testing on: 17e580a7
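The pruning being asked about in the cases above can be illustrated with a toy sketch, independent of Drill's planner. All names below are invented except the `/tmp/t3214/0_0_15.parquet` path taken from the plans: with `partition by (key2)`, each file in the table directory holds rows for a single key2 value, so an equality filter on key2 can discard whole files at planning time, before any rows are read.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionPruning {
  // filesByKey2 stands in for per-file partition-key metadata:
  // parquet file path -> the single key2 value that file contains.
  public static List<String> prune(Map<String, String> filesByKey2, String filterValue) {
    return filesByKey2.entrySet().stream()
        .filter(e -> e.getValue().equals(filterValue))  // keep matching partitions only
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, String> files = Map.of(
        "/tmp/t3214/0_0_10.parquet", "k",
        "/tmp/t3214/0_0_15.parquet", "m",
        "/tmp/t3214/0_0_21.parquet", "z");
    // For "where key2 = 'm'" only one file survives the prune, which matches
    // the numFiles=1 shown in the case 4 and case 5 plans.
    System.out.println(prune(files, "m"));
  }
}
```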
[jira] [Commented] (DRILL-3538) We do not prune partitions when we count over partitioning key and filter over partitioning key
[ https://issues.apache.org/jira/browse/DRILL-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984483#comment-14984483 ]

Aman Sinha commented on DRILL-3538:
-----------------------------------

[~khfaraaz] I am not sure why you say we are not pruning in cases 1, 2 and 3.
The Explain output looks fine to me: there is no Filter node in the plan,
which indicates the filter has been pushed into the Scan. The reason the Scan
shows a PojoRecordReader is that for a trivial COUNT(*) query on Parquet
data, Drill optimizes by reading the row count directly from the metadata
instead of computing it through a separate aggregation. If you are
specifically looking for the Scan to display the attributes it displays for a
regular scan, that's a separate issue.
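The COUNT(*) shortcut described in the comment above can be sketched as follows. This is a hypothetical simplification (the class and method names are invented): when only a trivial count is needed, summing the per-file row counts recorded in the Parquet footers replaces reading the data, and the scan degenerates into emitting a single precomputed row, which is why the plan shows a PojoRecordReader instead of the usual ParquetGroupScan attributes.

```java
import java.util.List;

public class MetadataCount {
  // rowCounts stands in for the row counts stored in each file's footer
  // metadata; no data pages are read to produce the count.
  public static long countFromFooters(List<Long> rowCounts) {
    return rowCounts.stream().mapToLong(Long::longValue).sum();
  }

  public static void main(String[] args) {
    // e.g. a pruned scan kept two files whose footers report 120 and 35 rows
    System.out.println(countFromFooters(List.of(120L, 35L)));  // 155
  }
}
```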
[jira] [Updated] (DRILL-4001) Empty vectors from previous batch left by MapVector.load(...)/RecordBatchLoader.load(...)
[ https://issues.apache.org/jira/browse/DRILL-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Barclay (Drill) updated DRILL-4001:
------------------------------------------
    Component/s: Execution - Data Types

> Empty vectors from previous batch left by
> MapVector.load(...)/RecordBatchLoader.load(...)
> ------------------------------------------------------------------------
>
>                 Key: DRILL-4001
>                 URL: https://issues.apache.org/jira/browse/DRILL-4001
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>            Reporter: Daniel Barclay (Drill)
>
> In certain cases, {{MapVector.load(...)}} (called by
> {{RecordBatchLoader.load(...)}}) returns with some map child vectors having
> a length of zero instead of a length matching that of their sibling vectors
> and the number of records in the batch. (This caused some of the
> {{IndexOutOfBoundsException}} errors seen while fixing DRILL-2288.)
> The trigger seems to be a child field (e.g., an HBase column in an HBase
> column family) that appears in an earlier batch but not in a later batch.
> (The HBase column's child vector gets created (in the MapVector for the
> HBase column family) during loading of the earlier batch. During loading of
> the later batch, all vectors get reset to zero length, and then only vectors
> for fields _appearing in the batch message being loaded_ get loaded and set
> to the length of the batch -- other vectors created by earlier
> messages/{{load}} calls are left with a length of zero (instead of, say,
> being filled with nulls to the length of their siblings and the current
> record batch).)
> See the TODO(DRILL-) mark and workaround in {{MapVector.getObject(int)}}.
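The symptom above, and the fix the report hints at ("being filled with nulls to the length of their siblings"), can be modeled with a toy sketch. All names here are invented, not Drill's actual vector classes: child "vectors" created by an earlier batch but absent from the current message are reset to length 0 and never refilled; padding them with nulls up to the record count keeps every child the same length as its siblings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapVectorModel {
  private final Map<String, List<Object>> children = new HashMap<>();

  public static List<Object> vals(Object... xs) {
    return new ArrayList<>(Arrays.asList(xs));
  }

  // Load one batch: only fields present in the message carry values.
  public void load(Map<String, List<Object>> batch, int recordCount) {
    children.replaceAll((name, v) -> new ArrayList<>());  // reset all to length 0
    batch.forEach((name, values) -> children.put(name, new ArrayList<>(values)));
    // The suggested fix: pad children missing from this batch with nulls so
    // no sibling is left shorter than the batch's record count.
    for (List<Object> v : children.values()) {
      while (v.size() < recordCount) {
        v.add(null);
      }
    }
  }

  public int childLength(String name) {
    return children.get(name).size();
  }

  public static void main(String[] args) {
    MapVectorModel map = new MapVectorModel();
    map.load(Map.of("f1", vals(1, 2), "f2", vals("a", "b")), 2);
    map.load(Map.of("f1", vals(3, 4, 5)), 3);  // "f2" is absent from batch 2
    System.out.println(map.childLength("f2"));  // 3 after padding, not 0
  }
}
```

Without the padding loop, `childLength("f2")` would be 0 after the second load, which is exactly the zero-length-sibling state the report describes.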