[GitHub] [spark] thiyaga commented on pull request #38001: [SPARK-40562][SQL] Add `spark.sql.legacy.groupingIdWithAppendedUserGroupBy`

GitBox Mon, 26 Sep 2022 12:29:43 -0700


thiyaga commented on PR #38001:
URL: https://github.com/apache/spark/pull/38001#issuecomment-1258517989


   We use grouping sets in our queries and rely on `grouping__id` as an 
identifier to query the data for each group. If we use `grouping__id` 
directly, its values are prone to change whenever the grouping sets change 
(e.g. adding a new grouping set or adding a new column to an existing grouping 
set). Any grouping id change makes things even more complex when this data is 
consumed directly from reporting tools like Tableau. We need to do one of the 
following to mitigate the changing `grouping__id`:
   
   1. Transform the `grouping__id` into something deterministic that won't be 
impacted when the grouping sets change (e.g. convert `grouping__id` to 
`group_name`)
   2. Have some sort of logical DB view that handles the transformation at 
runtime (e.g. using CASE WHEN)
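
As a rough illustration of option 1, here is a minimal Python sketch that turns a grouping id into a name built from the columns that are actually grouped on. It assumes the grouping id is a bitmask over the ordered group-by columns, where the first column is the highest bit and a set bit means the column is aggregated away; that convention should be checked against the Spark version and settings in use. The helper `group_name` and the `by_...` / `grand_total` naming scheme are hypothetical and not part of Spark.

```python
from typing import Sequence


def group_name(grouping_id: int, columns: Sequence[str]) -> str:
    """Derive a stable name for a grouping set from its grouping id.

    Assumption: bit i (counted from the most significant of len(columns)
    bits) is 1 when columns[i] is NOT part of the grouping set.
    """
    n = len(columns)
    if not 0 <= grouping_id < (1 << n):
        raise ValueError(f"grouping id {grouping_id} does not fit {n} columns")

    kept = []
    for i, col in enumerate(columns):
        shift = n - 1 - i  # first column maps to the highest bit
        if ((grouping_id >> shift) & 1) == 0:
            kept.append(col)

    # The name depends only on the grouped columns, not on the numeric id.
    return "by_" + "_".join(kept) if kept else "grand_total"


if __name__ == "__main__":
    # Grouping set (a) before and after a column c is appended to the GROUP BY:
    # the numeric id differs (1 vs 3), the derived name does not.
    print(group_name(1, ["a", "b"]))
    print(group_name(3, ["a", "b", "c"]))
```

Under these assumptions, downstream consumers can filter on the derived name instead of the raw id, so appending a column to the group-by list changes the ids but not the names of the existing grouping sets.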
   
   In essence, we always have a dependency on `grouping__id` when grouping 
sets are used in our queries. Any change in how the grouping id is generated 
will have an immediate impact. This new parameter will let us keep using the 
legacy logic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

