[GitHub] drill pull request #1170: DRILL-6223: Fixed several Drillbit failures due to...

2018-03-29 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1170#discussion_r178222960
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/limit/LimitRecordBatch.java ---
@@ -60,13 +60,7 @@ public LimitRecordBatch(Limit popConfig, FragmentContext context, RecordBatch in
   protected boolean setupNewSchema() throws SchemaChangeException {
     container.zeroVectors();
     transfers.clear();
-
-
-    for (final VectorWrapper<?> v : incoming) {
-      final TransferPair pair = v.getValueVector().makeTransferPair(
-          container.addOrGet(v.getField(), callBack));
-      transfers.add(pair);
-    }
+    container.onSchemaChange(incoming, callBack, transfers);
--- End diff --

`onSchemaChange()` may be the wrong name. A schema change is *why* this functionality is called in this case, but the actual functionality is closer to `setupTransfers()` (assuming the removed code was simply moved into the container class...)
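To make the naming point concrete, here is a sketch of the container-side shape the reviewer suggests calling `setupTransfers()`. The `ValueVector`/`TransferPair` types below are simplified stand-ins, not Drill's real interfaces, and `SimpleVector` stands in for whatever `addOrGet(...)` would return:

```java
import java.util.ArrayList;
import java.util.List;

public class SetupTransfersSketch {
  // Hypothetical, minimal stand-ins for Drill's vector/transfer abstractions.
  interface TransferPair {}
  interface ValueVector { TransferPair makeTransferPair(ValueVector target); }

  static class SimpleVector implements ValueVector {
    public TransferPair makeTransferPair(ValueVector target) {
      return new TransferPair() {};
    }
  }

  // The functionality being named: for each incoming vector, materialize a
  // matching vector in the container and record a transfer pair so batches
  // can be handed through cheaply. "setupTransfers" describes what it does;
  // "onSchemaChange" describes when it is called.
  static List<TransferPair> setupTransfers(List<ValueVector> incoming,
                                           List<ValueVector> container) {
    List<TransferPair> transfers = new ArrayList<>();
    for (ValueVector v : incoming) {
      ValueVector target = new SimpleVector(); // stands in for addOrGet(field, callBack)
      container.add(target);
      transfers.add(v.makeTransferPair(target));
    }
    return transfers;
  }

  public static void main(String[] args) {
    List<ValueVector> incoming = List.of(new SimpleVector(), new SimpleVector());
    List<ValueVector> container = new ArrayList<>();
    System.out.println(setupTransfers(incoming, container).size()); // 2
  }
}
```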


---


[GitHub] drill pull request #1170: DRILL-6223: Fixed several Drillbit failures due to...

2018-03-29 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1170#discussion_r178225930
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java ---
@@ -136,14 +138,28 @@ public void transferOut(VectorContainer containerOut) {
   public <T extends ValueVector> T addOrGet(final MaterializedField field, final SchemaChangeCallBack callBack) {
     final TypedFieldId id = getValueVectorId(SchemaPath.getSimplePath(field.getName()));
     final ValueVector vector;
-    final Class<?> clazz = TypeHelper.getValueVectorClass(field.getType().getMinorType(), field.getType().getMode());
+
     if (id != null) {
-      vector = getValueAccessorById(id.getFieldIds()).getValueVector();
+      vector = getValueAccessorById(id.getFieldIds()).getValueVector();
+      final Class<?> clazz = TypeHelper.getValueVectorClass(field.getType().getMinorType(), field.getType().getMode());
+
+      // Check whether incoming field and the current one are compatible; if not then replace previous one with the new one
       if (id.getFieldIds().length == 1 && clazz != null && !clazz.isAssignableFrom(vector.getClass())) {
         final ValueVector newVector = TypeHelper.getNewVector(field, this.getAllocator(), callBack);
         replace(vector, newVector);
         return (T) newVector;
       }
+
+      // At this point, we know incoming and current fields are compatible. Maps can have children,
+      // we need to ensure they have the same structure.
+      if (MinorType.MAP.equals(field.getType().getMinorType())
+          && vector != null
+          && !SchemaUtil.isSameSchemaIncludingOrder(vector.getField().getChildren(), field.getChildren())) {
+
+        final ValueVector newVector = TypeHelper.getNewVector(field, this.getAllocator(), callBack);
+        replace(vector, newVector);
+        return (T) newVector;
--- End diff --

We have two vectors, both maps. We found out their schemas differ. We are throwing away the old map, replacing it with a new one with no members. Is that what we want to do? Or do we want to recursively merge the maps? Is this done elsewhere? If so, how do we remember to do that merge?
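The "recursively merge" alternative the reviewer raises could look something like the sketch below. The `Node` tree is a toy stand-in for a map field's children, not Drill's `MaterializedField` API; it only illustrates keeping old members, adding new ones, and recursing where both sides define the same child:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MapSchemaMerge {
  // Toy schema node: a map field with ordered, named children (a leaf has none).
  static class Node {
    final Map<String, Node> children = new LinkedHashMap<>();
  }

  // Instead of discarding the old map wholesale, keep every old member, add
  // new members, and recursively merge members present on both sides.
  static Node merge(Node oldNode, Node newNode) {
    Node merged = new Node();
    merged.children.putAll(oldNode.children);
    for (Map.Entry<String, Node> e : newNode.children.entrySet()) {
      Node existing = merged.children.get(e.getKey());
      merged.children.put(e.getKey(),
          existing == null ? e.getValue() : merge(existing, e.getValue()));
    }
    return merged;
  }

  public static void main(String[] args) {
    Node oldMap = new Node();
    oldMap.children.put("city", new Node());
    Node newMap = new Node();
    newMap.children.put("zip", new Node());
    System.out.println(merge(oldMap, newMap).children.keySet()); // [city, zip]
  }
}
```

This is only one policy; a real merge would also have to reconcile conflicting child types, which is exactly the open question in the review.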


---


[GitHub] drill pull request #1170: DRILL-6223: Fixed several Drillbit failures due to...

2018-03-29 Thread paul-rogers
Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/1170#discussion_r178225725
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/VectorContainer.java ---
@@ -136,14 +138,28 @@ public void transferOut(VectorContainer containerOut) {
   public <T extends ValueVector> T addOrGet(final MaterializedField field, final SchemaChangeCallBack callBack) {
     final TypedFieldId id = getValueVectorId(SchemaPath.getSimplePath(field.getName()));
     final ValueVector vector;
-    final Class<?> clazz = TypeHelper.getValueVectorClass(field.getType().getMinorType(), field.getType().getMode());
+
     if (id != null) {
-      vector = getValueAccessorById(id.getFieldIds()).getValueVector();
+      vector = getValueAccessorById(id.getFieldIds()).getValueVector();
+      final Class<?> clazz = TypeHelper.getValueVectorClass(field.getType().getMinorType(), field.getType().getMode());
+
+      // Check whether incoming field and the current one are compatible; if not then replace previous one with the new one
       if (id.getFieldIds().length == 1 && clazz != null && !clazz.isAssignableFrom(vector.getClass())) {
         final ValueVector newVector = TypeHelper.getNewVector(field, this.getAllocator(), callBack);
         replace(vector, newVector);
         return (T) newVector;
       }
+
+      // At this point, we know incoming and current fields are compatible. Maps can have children,
+      // we need to ensure they have the same structure.
+      if (MinorType.MAP.equals(field.getType().getMinorType())
--- End diff --

This is tricky and probably broken elsewhere. If the incoming type is a 
union or a list, then it can contain a nested map.
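The reviewer's point is that a check keyed only on the top-level minor type misses maps nested inside unions and lists; finding them requires a recursive walk of the type tree. A hedged sketch, using a toy `TypeNode` rather than Drill's type metadata:

```java
import java.util.ArrayList;
import java.util.List;

public class NestedMapCheck {
  // Toy subset of Drill's minor types, enough to show the nesting problem.
  enum MinorType { MAP, UNION, LIST, INT, VARCHAR }

  static class TypeNode {
    final MinorType type;
    final List<TypeNode> children = new ArrayList<>();
    TypeNode(MinorType t) { this.type = t; }
  }

  // A top-level "type == MAP" test returns false for a union-of-list-of-map;
  // a recursive descent finds the nested map.
  static boolean containsMap(TypeNode node) {
    if (node.type == MinorType.MAP) {
      return true;
    }
    for (TypeNode child : node.children) {
      if (containsMap(child)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    TypeNode union = new TypeNode(MinorType.UNION);
    TypeNode list = new TypeNode(MinorType.LIST);
    list.children.add(new TypeNode(MinorType.MAP));
    union.children.add(list);
    System.out.println(containsMap(union)); // true
  }
}
```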


---


[GitHub] drill pull request #1170: DRILL-6223: Fixed several Drillbit failures due to...

2018-03-15 Thread sachouche
GitHub user sachouche opened a pull request:

https://github.com/apache/drill/pull/1170

DRILL-6223: Fixed several Drillbit failures due to schema changes

Fixed several issues due to schema changes:
1) Changes in complex data types
Drill queries fail when selecting all columns from a complex nested Parquet data set in which there are differences in schema among the files:

The Parquet files exhibit differences both at the first level and within nested data types
A select * will not cause an exception, but using a limit clause will
Note also this issue seems to happen only when multiple Drillbit minor 
fragments are involved (concurrency higher than one)

2) Dangling columns (both simple and complex)
This situation can be easily reproduced when:
- Select STAR queries involve input data with different schemas
- LIMIT and/or PROJECT operators are used
- The data is read from more than one minor fragment
- This is because individual readers have logic to handle such use-cases 
but downstream operators do not
- So if reader-1 sends one batch with F1, F2, and F3
- and reader-2 sends a batch with F2 and F3
- then the LIMIT and PROJECT operators will fail to clean up the dangling 
column F1, which causes failures when a downstream operator's copy logic 
attempts to copy the stale column F1
- This pull request adds logic to detect and eliminate dangling columns
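The dangling-column detection described above reduces to a set difference: columns present in the container's previous schema but absent from the incoming batch must be dropped before downstream copy logic touches them. A minimal sketch over plain column-name sets (the real fix operates on Drill's vector container, not strings):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DanglingColumns {
  // Columns carried over from an earlier batch's schema that the incoming
  // batch no longer provides are "dangling"; copying them downstream reads
  // stale vectors and fails.
  static Set<String> danglingColumns(Set<String> previous, Set<String> incoming) {
    Set<String> dangling = new LinkedHashSet<>(previous);
    dangling.removeAll(incoming);
    return dangling;
  }

  public static void main(String[] args) {
    // Reader-1 produced F1, F2, F3; reader-2 produces only F2, F3.
    Set<String> previous = new LinkedHashSet<>(List.of("F1", "F2", "F3"));
    Set<String> incoming = new LinkedHashSet<>(List.of("F2", "F3"));
    System.out.println(danglingColumns(previous, incoming)); // [F1]
  }
}
```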

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sachouche/drill DRILL-6223

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1170


commit d986b6c7588c107bb7e49d2fc8eb3f25a60e1214
Author: Salim Achouche 
Date:   2018-02-21T02:17:14Z

DRILL-6223: Fixed several Drillbit failures due to schema changes




---