Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/6351#issuecomment-105712920
That is not fundamentally a problem. Honestly, some more thought probably
needs to be put into the batches. Really, the only reasons for splitting
rules into separate batches are the following:
- Large batches are inherently more costly, as you must go through every rule on each iteration, even if only a small number are making changes. So if rules will never interact, they can be in separate batches.
- However, large batches are more powerful, as there is more opportunity for rules to interact.
- It's possible for rules to undo the result of other rules. In this case they *must* be in separate batches, or the plan will go back and forth until the iteration limit is reached.
- Another reason for batches is satisfying preconditions (e.g. a plan must be analyzed before it is optimized).
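To make the trade-offs above concrete, here is a minimal sketch of a batched rule executor. This is not Spark's actual `RuleExecutor`; the names `Batch`, `Rule`, and `execute` are hypothetical, but the loop shows why every rule in a batch is visited on each pass, and where the iteration limit stops rules that undo each other.

```scala
object BatchSketch {
  // A rule rewrites a "plan" of type P (for Catalyst, P would be a query plan).
  type Rule[P] = P => P

  // A batch runs its rules repeatedly until the plan stops changing
  // or maxIterations is hit (the back-and-forth limit mentioned above).
  case class Batch[P](name: String, maxIterations: Int, rules: Rule[P]*)

  def execute[P](batches: Seq[Batch[P]], plan: P): P =
    batches.foldLeft(plan) { (batchInput, batch) =>
      var current = batchInput
      var iteration = 0
      var changed = true
      while (changed && iteration < batch.maxIterations) {
        // Every rule in the batch runs on every iteration,
        // even if only a few of them actually change the plan.
        val next = batch.rules.foldLeft(current)((p, rule) => rule(p))
        changed = next != current
        current = next
        iteration += 1
      }
      current
    }

  def main(args: Array[String]): Unit = {
    // Two toy rules over Int "plans": double odd values, then cap at 100.
    val doubleOdd: Rule[Int] = p => if (p % 2 == 1) p * 2 else p
    val cap: Rule[Int] = p => math.min(p, 100)
    val result = execute(Seq(Batch("toy", 10, doubleOdd, cap)), 7)
    println(result) // 14: 7 is doubled once, then the batch reaches a fixed point
  }
}
```

If `doubleOdd` were paired with a rule that halved even values, the two would undo each other and the loop would only stop at `maxIterations`, which is why such rules must go in separate batches.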