[GitHub] spark issue #18865: [SPARK-21610][SQL] Corrupt records are not handled prope...

viirya Sun, 03 Sep 2017 06:05:32 -0700

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/18865
  
    @HyukjinKwon Thanks for the comment.
    
    I think the current behavior can confuse users, since it can produce the 
odd query results shown in the earlier discussion.
    
    The previous fix, which included all data columns when only 
`_corrupt_record` is selected, did not get agreement from @cloud-fan.
    
    > If _corrupted_record is designed to have different values for different 
selected columns, it may make sense to set _corrupted_record to null if no 
columns are selected.
    
    I agree with @cloud-fan that this may make sense. But it can confuse 
users, and I don't think the column would be useful when queried that way. So 
for the scenarios mentioned by @dm-tran, we'd better tell users how to get the 
results they want.
    
    Throwing an exception is one option. Logging an info message is another; 
it doesn't forbid the query outright but gives users a hint.
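
    To make the behavior under discussion concrete, here is a toy model in 
plain Python. It is only a sketch and not Spark's implementation: the 
two-column schema, `read_row`, `check_selection`, and the hint text are all 
hypothetical. It assumes a reader that parses only the selected data columns, 
so `_corrupt_record` depends on the selection and is null when no data column 
is selected.

```python
import logging

CORRUPT = "_corrupt_record"
SCHEMA = {"a": int, "b": int}  # hypothetical two-column schema

def read_row(line, selected):
    """Toy reader: parse only the selected data columns of a CSV-like line."""
    fields = dict(zip(SCHEMA, line.split(",")))
    row, corrupt = {}, None
    for col in selected:
        if col == CORRUPT:
            continue
        try:
            row[col] = SCHEMA[col](fields[col])
        except (KeyError, ValueError):
            # A selected column failed to parse: null it and keep the raw line.
            row[col] = None
            corrupt = line
    if CORRUPT in selected:
        row[CORRUPT] = corrupt
    return row

def check_selection(selected, strict=False):
    """Sketch of the two options: raise an exception, or only log a hint."""
    if list(selected) == [CORRUPT]:
        msg = ("only %s is selected; no data column is parsed, "
               "so it will be null" % CORRUPT)
        if strict:
            raise ValueError(msg)
        logging.getLogger(__name__).info(msg)

print(read_row("1,abc", ["a", CORRUPT]))  # column a parses cleanly
print(read_row("1,abc", ["b", CORRUPT]))  # column b fails to parse
print(read_row("1,abc", [CORRUPT]))       # nothing is parsed
```

    In this model the same input line is reported as corrupt or not depending 
on which columns are selected, which is the source of the confusion. With 
`strict=True`, `check_selection` rejects the query; otherwise it only logs.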




