sigmod commented on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-1075524713
> But I think that an extra shuffle could mean performance degradation in case of scalar subqueries (CTEs returning only one row).

Is it still way better than running the scalar subqueries over the same table multiple times?

I'm more worried about the complexities (i.e., pattern matching cognitive overhead) with CommonSubqueries and CommonSubqueriesExec. E.g., iiuc, a logical rule to optimize scalar subqueries won't be able to traverse into the subqueries inside CommonSubqueries?
