pralabhkumar commented on PR #37009: URL: https://github.com/apache/spark/pull/37009#issuecomment-1191680230
> Hey, I think the fix here is too hacky. Can we make this working independently with other options being set?

Hi @HyukjinKwon, thanks for reviewing. I went through the code again; here is my understanding (the same is mentioned in the JIRA).

IMHO, setting the `nullValue` option will not help here. Whatever value we set, the `,,` string will be converted to that value by the (external) Univocity parser. For example, `A,,` with `setNullValue("B")` is parsed as `A,B` by the Univocity parser. Then Spark's `UnivocityParser`, in `nullSafeDatum`, will always convert it back to null (since `datum == options.nullValue`), so the output will always be `A,null`, whereas we need `A,,`. Setting `nullValue` therefore will not help unless `options.naFilter` is false, which makes sure the above condition is not satisfied.

For missing values at the beginning or end of a line, the current logic in the `convert` method of `UnivocityParser` falls into the exception branch, `row.update(i, requiredSchema.existenceDefaultValues(i))`, and updates the field with the default value. We don't want those values set to null when `options.naFilter` is false.
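To make the round-trip problem concrete, here is a minimal sketch (not Spark's actual Scala implementation) of the `datum == options.nullValue` check described above. The function and parameter names are illustrative only:

```python
# Minimal sketch, assuming simplified names, of the nullSafeDatum behaviour
# described in the comment. Not Spark's actual code.

def null_safe_datum(datum, null_value, na_filter=True):
    """Mimic the `datum == options.nullValue` check.

    If the external parser has already substituted null_value for an empty
    field (e.g. "A,," parsed with setNullValue("B") yields ["A", "B", "B"]),
    this check collapses that value straight back to None whenever na_filter
    is enabled -- so the row becomes ["A", None, None] instead of keeping
    the empty fields distinct.
    """
    if na_filter and datum == null_value:
        return None
    return datum

# What the external parser hands over for "A,," after setNullValue("B"):
row = ["A", "B", "B"]
print([null_safe_datum(d, "B") for d in row])
print([null_safe_datum(d, "B", na_filter=False) for d in row])
```

With `na_filter=True` the substituted fields come back as `None`; only with the filter disabled does the substituted value survive, which is why the comment argues that `nullValue` alone cannot fix the issue.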
