[GitHub] [spark] linhongliu-db opened a new pull request #31286: [SPARK-34199][SQL] Block `table.*` inside function to follow ANSI standard and other SQL engines

GitBox Thu, 21 Jan 2021 20:58:23 -0800


linhongliu-db opened a new pull request #31286:
URL: https://github.com/apache/spark/pull/31286



   ### What changes were proposed in this pull request?
   In Spark, `count(table.*)` can produce a very surprising result, for example:
   ```sql
   select count(*) from (select 1 as a, null as b) t;
   -- output: 1
   select count(t.*) from (select 1 as a, null as b) t;
   -- output: 0
   ```
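   This is because Spark converts `count(*)` to `count(1)`, which counts every row,
   while it expands `t.*` into the columns of `t`. The second query therefore becomes
   `count(t.a, t.b)`, which counts only the rows where all arguments are non-null.
   This inconsistency confuses users. According to the ANSI standard, `count(*)` should
   always be `count(1)`, while `count(t.*)` is not allowed. Moreover, common databases
   such as MySQL and Oracle do not allow it either.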
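
To make the mechanism concrete, here is a minimal Python model of the two counting rules. It is only a sketch, not Spark code. It assumes a table is a list of tuples with `None` standing for SQL NULL, and the helper names `count_star` and `count_exprs` are hypothetical. `count(*)` counts every row, while a multi-argument `count(e1, e2, ...)` (the form `count(t.*)` expands to) counts only the rows in which every argument is non-null.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical row model: one tuple per row, None plays the role of SQL NULL.
Row = Tuple[Optional[object], ...]


def count_star(rows: Sequence[Row]) -> int:
    """Model of count(*) / count(1): every row counts, whatever its NULLs."""
    return len(rows)


def count_exprs(rows: Sequence[Row]) -> int:
    """Model of count(e1, e2, ...), e.g. count(t.a, t.b) after expanding t.*:
    a row counts only if every argument is non-NULL."""
    return sum(1 for row in rows if all(v is not None for v in row))


# The derived table from the example: select 1 as a, null as b
t = [(1, None)]

print(count_star(t))   # 1, matching count(*)
print(count_exprs(t))  # 0, matching count(t.*) before this change
```

Because `b` is NULL in the only row, the expanded form counts nothing, which is exactly the `0` in the example output.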
   
   So, this PR proposes to block this ambiguous behavior and report a clear error message to users.
   
   ### Why are the changes needed?
   To avoid ambiguous behavior and to follow the ANSI standard and other SQL engines.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, `count(table.*)` is now blocked and fails with an error message.
   
   
   ### How was this patch tested?
   Newly added and existing tests.

