eatoncys commented on issue #23010: [SPARK-26012][SQL] Null and '' values should not cause dynamic partition failure of string types URL: https://github.com/apache/spark/pull/23010#issuecomment-465517963

> this looks a little hacky. How about we create an analyzer rule, which deals with `InsertIntoHadoopFsRelationCommand`, and changes its `query` field to do the empty-string-to-null for partition columns?

Sorry, maybe I didn't understand the above correctly. Does it mean adding an analyzer rule to the `Analyzer` that matches `InsertIntoHadoopFsRelationCommand` and changes its `query` field? But `InsertIntoHadoopFsRelationCommand` is only created after the `query` plan has already been analyzed, in the code below (the relevant parts are marked with comments):

```scala
def writeAndRead(
    mode: SaveMode,
    data: LogicalPlan,            // <-- already-analyzed plan
    outputColumnNames: Seq[String],
    physicalPlan: SparkPlan): BaseRelation = {
  ...
  case format: FileFormat =>
    val cmd = planForWritingFileFormat(format, mode, data)  // <-- command created here
    ...
    val resolved = cmd.copy(
      partitionColumns = resolvedPartCols,
      outputColumnNames = outputColumnNames)
    resolved.run(sparkSession, physicalPlan)                // <-- run immediately
```

`InsertIntoHadoopFsRelationCommand` is created by `planForWritingFileFormat`, and `cmd.run` is then called immediately with the already-analyzed `physicalPlan`. So, where would the analyzer rule be added?
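For context, a rule of the kind the reviewer describes might look roughly like the sketch below. This is only an illustration of the idea, not actual Spark code: the rule name, the exact constructor shape of `InsertIntoHadoopFsRelationCommand`, and the expression tree used for the empty-string-to-null conversion are all assumptions on my part.

```scala
// Hypothetical sketch (names and details assumed, not real Spark source):
// an analyzer rule that matches InsertIntoHadoopFsRelationCommand and wraps
// string-typed partition columns of its `query` so that "" becomes null.
object EmptyStringToNullForPartitions extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case i: InsertIntoHadoopFsRelationCommand =>
      val partNames = i.partitionColumns.map(_.name).toSet
      val newOutput = i.query.output.map { attr =>
        if (partNames.contains(attr.name) && attr.dataType == StringType) {
          // Rewrite "" to null so both land in the default partition;
          // keep the original attribute name via an Alias.
          Alias(
            If(EqualTo(attr, Literal("")), Literal(null, StringType), attr),
            attr.name)()
        } else {
          attr
        }
      }
      i.copy(query = Project(newOutput, i.query))
  }
}
```

The open question in the comment above still applies to this sketch: since the command is built and run after analysis in `writeAndRead`, a rule registered in the `Analyzer` would never see it on that code path.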
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
