sigmod edited a comment on pull request #32298: URL: https://github.com/apache/spark/pull/32298#issuecomment-1075524713
> But I think that an extra shuffle could mean performance degradation in case of scalar subqueries (CTEs returning only one row).

Is it still way better than running the scalar subqueries over the same table multiple times? I'm more worried about the complexities (i.e., the pattern-matching cognitive overhead) introduced by new plan nodes like CommonSubqueries and CommonSubqueriesExec. Many rules have been implemented as pattern matches, so a rule that matches a Project is, in theory, also supposed to match CommonSubqueries?
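The pattern-matching concern can be sketched with simplified, hypothetical plan nodes (these are not the actual Spark classes; the names and shapes are assumptions for illustration). A rule written to match `Project` fires on a plain projection but is silently skipped once the plan is wrapped in a new node like `CommonSubqueries`, so every such rule would need to be taught about the new node:

```scala
// Hypothetical, simplified plan nodes to illustrate the concern.
sealed trait LogicalPlan { def children: Seq[LogicalPlan] }
case class Relation(name: String) extends LogicalPlan {
  val children: Seq[LogicalPlan] = Nil
}
case class Project(exprs: Seq[String], child: LogicalPlan) extends LogicalPlan {
  val children: Seq[LogicalPlan] = Seq(child)
}
// A new wrapper node, analogous to the proposed CommonSubqueries.
case class CommonSubqueries(subqueries: Seq[LogicalPlan], child: LogicalPlan)
    extends LogicalPlan {
  val children: Seq[LogicalPlan] = subqueries :+ child
}

// A rule that only pattern-matches Project: it does not fire when the
// plan root is a CommonSubqueries wrapper, even though a Project sits
// directly underneath it.
def ruleApplies(plan: LogicalPlan): Boolean = plan match {
  case Project(_, _) => true
  case _             => false
}

val plain   = Project(Seq("a"), Relation("t"))
val wrapped = CommonSubqueries(Seq(Relation("s")), plain)

assert(ruleApplies(plain))    // rule fires on a bare Project
assert(!ruleApplies(wrapped)) // rule silently skipped under the wrapper
```

This is the kind of silent non-match that adds cognitive overhead: each existing rule author must decide whether the rule should also descend into or match the new node.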
