Github user viirya commented on the issue:
https://github.com/apache/spark/pull/18865
@HyukjinKwon Thanks for the comment.
I think the current behavior confuses users in some ways, as it can produce
the weird query results shown in the previous discussion.
The previous fix, which includes all data columns when only
`_corrupt_record` is selected, didn't get consensus from @cloud-fan.
> If _corrupted_record is designed to have different values for different
> selected columns, it may makes sense to set _corrupted_record to null if no
> columns are selected.
I agree with @cloud-fan that this may make sense. But it can confuse
users, and I don't think it would be useful for this kind of usage. So for
the scenarios mentioned by @dm-tran, we'd better tell users how to achieve
the results they want.
Throwing an exception is one option. Logging an info message is another; it
doesn't directly disallow such queries but gives users a hint.
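
To make the two options concrete, here is a minimal sketch in plain Python
rather than Spark code. The function `check_required_columns`, its `strict`
flag, and the hint text are all hypothetical and only illustrate the
trade-off; the column name is the one from the discussion.

```python
import logging

logger = logging.getLogger("corrupt_record_check")

# Column name taken from the discussion above.
CORRUPT_COLUMN = "_corrupt_record"


def check_required_columns(required_columns, strict):
    """Return True if the query references only the corrupt record column.

    Hypothetical helper: strict=True models the "exception" option,
    strict=False models the "log info message" option.
    """
    only_corrupt = list(required_columns) == [CORRUPT_COLUMN]
    if not only_corrupt:
        return False

    # Placeholder hint text, not an actual Spark message.
    message = (
        "Query references only %s; the result may not be what you expect."
        % CORRUPT_COLUMN
    )
    if strict:
        # Option 1: disallow the query outright.
        raise ValueError(message)
    # Option 2: allow the query but leave a hint in the log.
    logger.info(message)
    return True
```

With `strict=True` the query fails fast and the user has to rewrite it; with
`strict=False` the query still runs and the hint is only visible if info-level
logging is enabled.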