[jira] [Commented] (SPARK-7754) Use PartialFunction literals instead of objects in Catalyst
[ https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553849#comment-14553849 ] Edoardo Vacchi commented on SPARK-7754: --- That's a fair observation; on the other hand: 1) is the code concerning rule application expecting throws at all? 2) RuleExecutor traces rules by their ruleName; if you really need named rules, you could use {{ rule.named(...) { } }} this does not solve the stack trace problem, but you'll still have meaningful information in the log Use PartialFunction literals instead of objects in Catalyst --- Key: SPARK-7754 URL: https://issues.apache.org/jira/browse/SPARK-7754 Project: Spark Issue Type: Improvement Reporter: Edoardo Vacchi Catalyst rules extend two distinct rule types: {{Rule[LogicalPlan]}} and {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}). The distinction is fairly subtle: in the end, both rule types are supposed to define a method {{apply(plan: LogicalPlan)}} (where LogicalPlan is either Logical- or Spark-) which returns a transformed plan (or a sequence thereof, in the case of Strategy). Ceremonies asides, the body of such method is always of the kind: {code:java} def apply(plan: PlanType) = plan match pf {code} where `pf` would be some `PartialFunction` of the PlanType: {code:java} val pf = { case ... = ... } {code} This is JIRA is a proposal to introduce utility methods to a) reduce the boilerplate to define rewrite rules b) turning them back into what they essentially represent: function types. These changes would be backwards compatible, and would greatly help in understanding what the code does. Current use of objects is redundant and possibly confusing. *{{Rule[LogicalPlan]}}* a) Introduce the utility object {code:java} object rule def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] = new Rule[LogicalPlan] { def apply (plan: LogicalPlan): LogicalPlan = plan transform pf } def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] = new Rule[LogicalPlan] { override val ruleName = name def apply (plan: LogicalPlan): LogicalPlan = plan transform pf } {code} b) progressively replace the boilerplate-y object definitions; e.g. {code:java} object MyRewriteRule extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transform { case ... = ... } {code} with {code:java} // define a Rule[LogicalPlan] val MyRewriteRule = rule { case ... = ... } {code} and/or : {code:java} // define a named Rule[LogicalPlan] val MyRewriteRule = rule.named(My rewrite rule) { case ... = ... } {code} *Strategies* A similar solution could be applied to shorten the code for Strategies, which are total functions only because they are all supposed to manage the default case, possibly returning `Nil`. In this case we might introduce the following utility: {code:java} object strategy { /** * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan]. * The partial function must therefore return *one single* SparkPlan for each case. * The method will automatically wrap them in a [[Seq]]. * Unhandled cases will automatically return Seq.empty */ def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy = new Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty } /** * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ]. * The partial function must therefore return a Seq[SparkPlan] for each case. * Unhandled cases will automatically return Seq.empty */ def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy = new Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan] } } {code} Usage: {code:java} val mystrategy = strategy { case ... = ... } val seqstrategy = strategy.seq { case ... = ... } {code} *Further possible improvements:* Making the utility methods `implicit`, thereby further reducing the rewrite rules to: {code:java} // define a PartialFunction[LogicalPlan, LogicalPlan] // the implicit would convert it into a Rule[LogicalPlan] at the use sites val MyRewriteRule = { case ... = ... } {code} *Caveats* Because of the way objects are initialized vs. vals, it might be necessary reorder instructions so that vals are actually initialized before they are used. E.g.: {code:java} class MyOptimizer extends
[jira] [Commented] (SPARK-7754) Use PartialFunction literals instead of objects in Catalyst
[ https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553617#comment-14553617 ] Reynold Xin commented on SPARK-7754: Note that there is at least one downside to this proposed refactoring: the stacktrace shown will be less useful due to anonymous classes. Use PartialFunction literals instead of objects in Catalyst --- Key: SPARK-7754 URL: https://issues.apache.org/jira/browse/SPARK-7754 Project: Spark Issue Type: Improvement Reporter: Edoardo Vacchi Catalyst rules extend two distinct rule types: {{Rule[LogicalPlan]}} and {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}). The distinction is fairly subtle: in the end, both rule types are supposed to define a method {{apply(plan: LogicalPlan)}} (where LogicalPlan is either Logical- or Spark-) which returns a transformed plan (or a sequence thereof, in the case of Strategy). Ceremonies asides, the body of such method is always of the kind: {code:java} def apply(plan: PlanType) = plan match pf {code} where `pf` would be some `PartialFunction` of the PlanType: {code:java} val pf = { case ... = ... } {code} This is JIRA is a proposal to introduce utility methods to a) reduce the boilerplate to define rewrite rules b) turning them back into what they essentially represent: function types. These changes would be backwards compatible, and would greatly help in understanding what the code does. Current use of objects is redundant and possibly confusing. *{{Rule[LogicalPlan]}}* a) Introduce the utility object {code:java} object rule def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] = new Rule[LogicalPlan] { def apply (plan: LogicalPlan): LogicalPlan = plan transform pf } def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): Rule[LogicalPlan] = new Rule[LogicalPlan] { override val ruleName = name def apply (plan: LogicalPlan): LogicalPlan = plan transform pf } {code} b) progressively replace the boilerplate-y object definitions; e.g. {code:java} object MyRewriteRule extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transform { case ... = ... } {code} with {code:java} // define a Rule[LogicalPlan] val MyRewriteRule = rule { case ... = ... } {code} and/or : {code:java} // define a named Rule[LogicalPlan] val MyRewriteRule = rule.named(My rewrite rule) { case ... = ... } {code} *Strategies* A similar solution could be applied to shorten the code for Strategies, which are total functions only because they are all supposed to manage the default case, possibly returning `Nil`. In this case we might introduce the following utility: {code:java} object strategy { /** * Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan]. * The partial function must therefore return *one single* SparkPlan for each case. * The method will automatically wrap them in a [[Seq]]. * Unhandled cases will automatically return Seq.empty */ def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy = new Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty } /** * Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] ]. * The partial function must therefore return a Seq[SparkPlan] for each case. * Unhandled cases will automatically return Seq.empty */ def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy = new Strategy { def apply(plan: LogicalPlan): Seq[SparkPlan] = if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan] } } {code} Usage: {code:java} val mystrategy = strategy { case ... = ... } val seqstrategy = strategy.seq { case ... = ... } {code} *Further possible improvements:* Making the utility methods `implicit`, thereby further reducing the rewrite rules to: {code:java} // define a PartialFunction[LogicalPlan, LogicalPlan] // the implicit would convert it into a Rule[LogicalPlan] at the use sites val MyRewriteRule = { case ... = ... } {code} *Caveats* Because of the way objects are initialized vs. vals, it might be necessary reorder instructions so that vals are actually initialized before they are used. E.g.: {code:java} class MyOptimizer extends Optimizer { override val batches: Seq[Batch] = ... Batch(Other rules, FixedPoint(100), MyRewriteRule // --- might throw NPE val MyRewriteRule = ... } {code} this is easily fixed by