[GitHub] drill pull request #1170: DRILL-6223: Fixed several Drillbit failures due to...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1170#discussion_r178222960 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/limit/LimitRecordBatch.java --- @@ -60,13 +60,7 @@ public LimitRecordBatch(Limit popConfig, FragmentContext context, RecordBatch in protected boolean setupNewSchema() throws SchemaChangeException { container.zeroVectors(); transfers.clear(); - - -for(final VectorWrapper v : incoming) { - final TransferPair pair = v.getValueVector().makeTransferPair( - container.addOrGet(v.getField(), callBack)); - transfers.add(pair); -} +container.onSchemaChange(incoming, callBack, transfers); --- End diff -- `onSchemaChange()` may perhaps be the wrong name. It is why this functionality is called in this case. But, the actual functionality is closer to `setupTransfers()` (assuming the removed code was simply moved into the container class...) ---
[GitHub] drill pull request #1170: DRILL-6223: Fixed several Drillbit failures due to...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1170#discussion_r178225930 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java --- @@ -136,14 +138,28 @@ public void transferOut(VectorContainer containerOut) { public T addOrGet(final MaterializedField field, final SchemaChangeCallBack callBack) { final TypedFieldId id = getValueVectorId(SchemaPath.getSimplePath(field.getName())); final ValueVector vector; -final Class clazz = TypeHelper.getValueVectorClass(field.getType().getMinorType(), field.getType().getMode()); + if (id != null) { - vector = getValueAccessorById(id.getFieldIds()).getValueVector(); + vector= getValueAccessorById(id.getFieldIds()).getValueVector(); + final Class clazz = TypeHelper.getValueVectorClass(field.getType().getMinorType(), field.getType().getMode()); + + // Check whether incoming field and the current one are compatible; if not then replace previous one with the new one if (id.getFieldIds().length == 1 && clazz != null && !clazz.isAssignableFrom(vector.getClass())) { final ValueVector newVector = TypeHelper.getNewVector(field, this.getAllocator(), callBack); replace(vector, newVector); return (T) newVector; } + + // At this point, we know incoming and current fields are compatible. Maps can have children, + // we need to ensure they have the same structure. + if (MinorType.MAP.equals(field.getType().getMinorType()) + && vector != null + && !SchemaUtil.isSameSchemaIncludingOrder(vector.getField().getChildren(), field.getChildren())) { + +final ValueVector newVector = TypeHelper.getNewVector(field, this.getAllocator(), callBack); +replace(vector, newVector); +return (T) newVector; --- End diff -- We have two vectors, both maps. We found out their schemas differ. We are throwing away the old map, replacing it with a new one with no members. Is that what we want to do? Or, do we want to recursively merge the maps? Is this done elsewhere? If so, how do we remember to do that merge? ---
[GitHub] drill pull request #1170: DRILL-6223: Fixed several Drillbit failures due to...
Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/1170#discussion_r178225725 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java --- @@ -136,14 +138,28 @@ public void transferOut(VectorContainer containerOut) { public T addOrGet(final MaterializedField field, final SchemaChangeCallBack callBack) { final TypedFieldId id = getValueVectorId(SchemaPath.getSimplePath(field.getName())); final ValueVector vector; -final Class clazz = TypeHelper.getValueVectorClass(field.getType().getMinorType(), field.getType().getMode()); + if (id != null) { - vector = getValueAccessorById(id.getFieldIds()).getValueVector(); + vector= getValueAccessorById(id.getFieldIds()).getValueVector(); + final Class clazz = TypeHelper.getValueVectorClass(field.getType().getMinorType(), field.getType().getMode()); + + // Check whether incoming field and the current one are compatible; if not then replace previous one with the new one if (id.getFieldIds().length == 1 && clazz != null && !clazz.isAssignableFrom(vector.getClass())) { final ValueVector newVector = TypeHelper.getNewVector(field, this.getAllocator(), callBack); replace(vector, newVector); return (T) newVector; } + + // At this point, we know incoming and current fields are compatible. Maps can have children, + // we need to ensure they have the same structure. + if (MinorType.MAP.equals(field.getType().getMinorType()) --- End diff -- This is tricky and probably broken elsewhere. If the incoming type is a union or a list, then it can contain a nested map. ---
[GitHub] drill pull request #1170: DRILL-6223: Fixed several Drillbit failures due to...
GitHub user sachouche opened a pull request: https://github.com/apache/drill/pull/1170 DRILL-6223: Fixed several Drillbit failures due to schema changes Fixed several Issues due to Schema changes: 1) Changes in complex data types Drill Query Failing when selecting all columns from a Complex Nested Data File (Parquet) Set). There are differences in Schema among the files: The Parquet files exhibit differences both at the first level and within nested data types A select * will not cause an exception but using a limit clause will Note also this issue seems to happen only when multiple Drillbit minor fragments are involved (concurrency higher than one) 2) Dangling columns (both simple and complex) This situation can be easily reproduced for: - Select STAR queries which involve input data with different schemas - LIMIT or / and PROJECT operators are used - The data will be read from more than one minor fragment - This is because individual readers have logic to handle such use-cases but not downstream operators - So is reader-1 sends one batch with F1, F2, and F3 - The reader-2 sends batch F2, F3 - Then the LIMIT and PROJECT operator will fail to cleanup the dangling column F1 which will cause failures when downstream operators copy logic attempts copy the stale column F1 - This pull request adds logic to detect and eliminate dangling columns You can merge this pull request into a Git repository by running: $ git pull https://github.com/sachouche/drill DRILL-6223 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/drill/pull/1170.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1170 commit d986b6c7588c107bb7e49d2fc8eb3f25a60e1214 Author: Salim AchoucheDate: 2018-02-21T02:17:14Z DRILL-6223: Fixed several Drillbit failures due to schema changes ---