GitHub user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/14240
  
    @cloud-fan It is not a bug. I prefer to make them consistent. I ran a few 
performance tests and found that it makes sense to return only one column from 
the scan, apply the Filter, and then let Project generate the two duplicate 
columns. This should be faster when the Filter removes most of the rows. 
    
    However, the optimization condition `projectSet.size == projects.size` only 
matters in one rare case: `SELECT b, b FROM oneToTenPruned`. It does not make 
sense to select the same column twice without specifying an alias, and when an 
alias is used, we always return one column. This PR removes the condition 
instead of adding it to the `Data Source Table Scan`.
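
    To make the plan described above concrete, here is a toy model in plain 
Python. It is only a sketch and not Spark's code: `scan`, `run`, the sample rows, 
and the predicate `b > 16` are all hypothetical, chosen just to show the scan 
reading each column once, the Filter working on the narrow rows, and the Project 
re-creating the duplicates.

```python
# Toy model in plain Python; this is NOT Spark's implementation.
# All names and data below are hypothetical, for illustration only.
rows = [{"a": i, "b": 2 * i} for i in range(1, 11)]  # stand-in for oneToTenPruned


def scan(table, columns):
    """Read each requested column exactly once per row."""
    return [{c: row[c] for c in columns} for row in table]


def run(table, projects, predicate):
    # Deduplicate the requested columns, keeping first-seen order.
    project_set = list(dict.fromkeys(projects))
    scanned = scan(table, project_set)             # one copy of each column
    kept = [r for r in scanned if predicate(r)]    # Filter sees narrow rows
    # Project expands back to the requested output, duplicates included.
    return [tuple(r[c] for c in projects) for r in kept]


projects = ["b", "b"]  # models SELECT b, b
# The two sizes differ only when a column is requested more than once.
sizes_match = len(set(projects)) == len(projects)
result = run(rows, projects, lambda r: r["b"] > 16)
```

    In this sketch `sizes_match` is `False` for `["b", "b"]`, and `result` is 
`[(18, 18), (20, 20)]`: the duplicate column only appears after the Filter has 
already dropped eight of the ten rows.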
    
    Let me know what you think. Thanks!
    


