Github user zheh12 commented on the issue:
https://github.com/apache/spark/pull/21554
I know what the SQL standard says here.
But I wonder: if I use `query.schema`, how would that affect the by-position
logic?
I think we should let the data source implementation decide whether to
resolve columns by position or by name.
The kudu-spark implementation, for example, chooses by-name and builds this mapping:
```
val indices: Array[(Int, Int)] = schema.fields.zipWithIndex.map {
  case (field, sparkIdx) =>
    sparkIdx -> table.getSchema.getColumnIndex(field.name)
}
```
But now we give it the wrong schema, so the mapping is always something like
(0,0), (1,1), which is effectively by-position.
I think this code is meant to be by-name, because a Kudu schema must put the
primary key first, so its column order usually differs from that of other table schemas.
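To make the difference concrete, here is a minimal sketch in plain Scala with no Spark or Kudu dependency. The column names and the `indicesByName` helper are hypothetical; they only mirror the shape of the kudu-spark mapping above. It shows that a by-name lookup reorders columns when it sees the query's column names, but collapses to the identity mapping when the incoming schema already carries the table's column names.

```scala
object ByNameVsByPosition {
  // Hypothetical Kudu-style table schema: primary key column first.
  val tableColumns: Seq[String] = Seq("id", "name", "age")

  // Hypothetical column order produced by the query.
  val queryColumns: Seq[String] = Seq("name", "age", "id")

  // Same idea as the kudu-spark snippet: map each incoming column index
  // to the index of the table column that has the same name.
  def indicesByName(incoming: Seq[String], table: Seq[String]): Seq[(Int, Int)] =
    incoming.zipWithIndex.map { case (name, sparkIdx) =>
      sparkIdx -> table.indexOf(name)
    }

  // Incoming schema keeps the query's names: the lookup really is by-name.
  val withQuerySchema: Seq[(Int, Int)] = indicesByName(queryColumns, tableColumns)

  // Incoming schema already uses the table's names: the lookup degenerates
  // to (0,0), (1,1), (2,2), which is by-position in practice.
  val withTableSchema: Seq[(Int, Int)] = indicesByName(tableColumns, tableColumns)
}
```

With the query's names the mapping is `(0,1), (1,2), (2,0)`; with the table's names it is `(0,0), (1,1), (2,2)`, so the data source can no longer tell that the query produced its columns in a different order.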
If we create the DataFrame with `query.schema`, by-position data sources
still work without error, and a data source can additionally choose
between by-name and by-position.
As it stands, the data source is forced to be by-position.
Also, as a developer, I may choose to implement `InsertableRelation`:
```
trait InsertableRelation {
  def insert(data: DataFrame, overwrite: Boolean): Unit
}
```
I may then be handed a DataFrame with the wrong schema, and nothing about the
DataFrame itself tells me that anything is wrong.
@cloud-fan Is my understanding correct?