Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/4729#issuecomment-76315917
I can't tell whether SPARK-5508 is using `InsertIntoHive` or not. I didn't see
whether `spark.sql.parquet.useDataSourceApi` is turned on or off in that JIRA.
If you simply replace `InsertIntoTable`'s relation in `ParquetConversions`,
then you will get `org.apache.spark.sql.AnalysisException`. So I don't know why
you said the test passed.
For SPARK-5950, there are a few issues:
1. The problem in `ParquetConversions`: as you did in #4782,
`InsertIntoTable`'s table is never replaced.
2. The `AnalysisException`. That is why I use `InsertIntoHiveTable` to replace
`InsertIntoTable` in `ParquetConversions`: `InsertIntoHiveTable`
doesn't check the equality of `containsNull`.
3. Since `containsNull` of `ArrayType`, `MapType`, and `StructType` is true
by default, the schema of the created Parquet table always has
`containsNull` set to true. Later, when you try to insert data with the same
schema but a different `containsNull` value, the Parquet library will complain
that the schemas differ, so the read will fail.
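The third issue can be illustrated with a minimal sketch. This is not Spark code: it is a hypothetical Python model of `ArrayType` with `containsNull` defaulting to true, as in Spark SQL, showing how strict schema equality rejects two schemas that differ only in `containsNull`:

```python
# Hypothetical simplified model of Spark SQL's ArrayType. In Spark SQL,
# containsNull defaults to true when a schema is created.
class ArrayType:
    def __init__(self, element_type, contains_null=True):
        self.element_type = element_type
        self.contains_null = contains_null

    def __eq__(self, other):
        # Strict equality: element type AND containsNull must match,
        # mirroring the schema check that makes the read fail.
        return (isinstance(other, ArrayType)
                and self.element_type == other.element_type
                and self.contains_null == other.contains_null)

# Schema of the created Parquet table: containsNull is true by default.
table_schema = ArrayType("int")

# Schema of the data being inserted: same element type, containsNull false.
data_schema = ArrayType("int", contains_null=False)

# The schemas are "the same" except for containsNull, yet strict
# equality treats them as different.
print(table_schema == data_schema)  # False
```

Under this model, the only way the comparison succeeds is if `containsNull` is ignored (as `InsertIntoHiveTable` does) or normalized on both sides.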
This PR solves all three problems (I will update it for `MapType` and
`StructType`); #4782 only addresses the first one.