[jira] [Commented] (SPARK-7754) Use PartialFunction literals instead of objects in Catalyst

2015-05-21 Thread Edoardo Vacchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553849#comment-14553849
 ] 

Edoardo Vacchi commented on SPARK-7754:
---

That's a fair observation; on the other hand:

1) is the code concerning rule application expecting throws at all? 
2) RuleExecutor traces rules by their ruleName; if you really need named rules, 
you could use {{ rule.named(...) { } }}
   this does not solve the stack trace problem, but you'll still have 
meaningful information in the log


 Use PartialFunction literals instead of objects in Catalyst
 ---

 Key: SPARK-7754
 URL: https://issues.apache.org/jira/browse/SPARK-7754
 Project: Spark
  Issue Type: Improvement
Reporter: Edoardo Vacchi

 Catalyst rules extend two distinct rule types: {{Rule[LogicalPlan]}} and 
 {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}).
 The distinction is fairly subtle: in the end, both rule types are supposed to 
 define a method {{apply(plan: LogicalPlan)}}
 (where LogicalPlan is either Logical- or Spark-) which returns a transformed 
 plan (or a sequence thereof, in the case
 of Strategy).
 Ceremonies asides, the body of such method is always of the kind:
 {code:java}
  def apply(plan: PlanType) = plan match pf
 {code}
 where `pf` would be some `PartialFunction` of the PlanType:
 {code:java}
   val pf = {
 case ... = ...
   }
 {code}
 This is JIRA is a proposal to introduce utility methods to
   a) reduce the boilerplate to define rewrite rules
   b) turning them back into what they essentially represent: function types.
 These changes would be backwards compatible, and would greatly help in 
 understanding what the code does. Current use of objects is redundant and 
 possibly confusing.
 *{{Rule[LogicalPlan]}}*
 a) Introduce the utility object
 {code:java}
   object rule 
 def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): 
 Rule[LogicalPlan] =
   new Rule[LogicalPlan] {
 def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
   }
 def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): 
 Rule[LogicalPlan] =
   new Rule[LogicalPlan] {
 override val ruleName = name
 def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
   }
 {code}
 b) progressively replace the boilerplate-y object definitions; e.g.
 {code:java}
 object MyRewriteRule extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case ... = ...
 }
 {code}
 with
 {code:java}
 // define a Rule[LogicalPlan]
 val MyRewriteRule = rule {
   case ... = ...
 }
 {code}
 and/or :
 {code:java}
 // define a named Rule[LogicalPlan]
 val MyRewriteRule = rule.named(My rewrite rule) {
   case ... = ...
 }
 {code}
 *Strategies*
 A similar solution could be applied to shorten the code for
 Strategies, which are total functions
 only because they are all supposed to manage the default case,
 possibly returning `Nil`. In this case
 we might introduce the following utility:
 {code:java}
 object strategy {
   /**
* Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
* The partial function must therefore return *one single* SparkPlan for 
 each case.
* The method will automatically wrap them in a [[Seq]].
* Unhandled cases will automatically return Seq.empty
*/
   def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
 new Strategy {
   def apply(plan: LogicalPlan): Seq[SparkPlan] =
 if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
 }
   /**
* Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] 
 ].
* The partial function must therefore return a Seq[SparkPlan] for each 
 case.
* Unhandled cases will automatically return Seq.empty
*/
  def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
 new Strategy {
   def apply(plan: LogicalPlan): Seq[SparkPlan] =
 if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
 }
 }
 {code}
 Usage:
 {code:java}
 val mystrategy = strategy { case ... = ... }
 val seqstrategy = strategy.seq { case ... = ... }
 {code}
 *Further possible improvements:*
 Making the utility methods `implicit`, thereby
 further reducing the rewrite rules to:
 {code:java}
 // define a PartialFunction[LogicalPlan, LogicalPlan]
 // the implicit would convert it into a Rule[LogicalPlan] at the use sites
 val MyRewriteRule = {
   case ... = ...
 }
 {code}
 *Caveats*
 Because of the way objects are initialized vs. vals, it might be necessary
 reorder instructions so that vals are actually initialized before they are 
 used.
 E.g.:
 {code:java}
 class MyOptimizer extends 

[jira] [Commented] (SPARK-7754) Use PartialFunction literals instead of objects in Catalyst

2015-05-20 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553617#comment-14553617
 ] 

Reynold Xin commented on SPARK-7754:


Note that there is at least one downside to this proposed refactoring: the 
stacktrace shown will be less useful due to anonymous classes.


 Use PartialFunction literals instead of objects in Catalyst
 ---

 Key: SPARK-7754
 URL: https://issues.apache.org/jira/browse/SPARK-7754
 Project: Spark
  Issue Type: Improvement
Reporter: Edoardo Vacchi

 Catalyst rules extend two distinct rule types: {{Rule[LogicalPlan]}} and 
 {{Strategy}} (which is an alias for {{GenericStrategy[SparkPlan]}}).
 The distinction is fairly subtle: in the end, both rule types are supposed to 
 define a method {{apply(plan: LogicalPlan)}}
 (where LogicalPlan is either Logical- or Spark-) which returns a transformed 
 plan (or a sequence thereof, in the case
 of Strategy).
 Ceremonies asides, the body of such method is always of the kind:
 {code:java}
  def apply(plan: PlanType) = plan match pf
 {code}
 where `pf` would be some `PartialFunction` of the PlanType:
 {code:java}
   val pf = {
 case ... = ...
   }
 {code}
 This is JIRA is a proposal to introduce utility methods to
   a) reduce the boilerplate to define rewrite rules
   b) turning them back into what they essentially represent: function types.
 These changes would be backwards compatible, and would greatly help in 
 understanding what the code does. Current use of objects is redundant and 
 possibly confusing.
 *{{Rule[LogicalPlan]}}*
 a) Introduce the utility object
 {code:java}
   object rule 
 def rule(pf: PartialFunction[LogicalPlan, LogicalPlan]): 
 Rule[LogicalPlan] =
   new Rule[LogicalPlan] {
 def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
   }
 def named(name: String)(pf: PartialFunction[LogicalPlan, LogicalPlan]): 
 Rule[LogicalPlan] =
   new Rule[LogicalPlan] {
 override val ruleName = name
 def apply (plan: LogicalPlan): LogicalPlan = plan transform pf
   }
 {code}
 b) progressively replace the boilerplate-y object definitions; e.g.
 {code:java}
 object MyRewriteRule extends Rule[LogicalPlan] {
   def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case ... = ...
 }
 {code}
 with
 {code:java}
 // define a Rule[LogicalPlan]
 val MyRewriteRule = rule {
   case ... = ...
 }
 {code}
 and/or :
 {code:java}
 // define a named Rule[LogicalPlan]
 val MyRewriteRule = rule.named(My rewrite rule) {
   case ... = ...
 }
 {code}
 *Strategies*
 A similar solution could be applied to shorten the code for
 Strategies, which are total functions
 only because they are all supposed to manage the default case,
 possibly returning `Nil`. In this case
 we might introduce the following utility:
 {code:java}
 object strategy {
   /**
* Generate a Strategy from a PartialFunction[LogicalPlan, SparkPlan].
* The partial function must therefore return *one single* SparkPlan for 
 each case.
* The method will automatically wrap them in a [[Seq]].
* Unhandled cases will automatically return Seq.empty
*/
   def apply(pf: PartialFunction[LogicalPlan, SparkPlan]): Strategy =
 new Strategy {
   def apply(plan: LogicalPlan): Seq[SparkPlan] =
 if (pf.isDefinedAt(plan)) Seq(pf.apply(plan)) else Seq.empty
 }
   /**
* Generate a Strategy from a PartialFunction[ LogicalPlan, Seq[SparkPlan] 
 ].
* The partial function must therefore return a Seq[SparkPlan] for each 
 case.
* Unhandled cases will automatically return Seq.empty
*/
  def seq(pf: PartialFunction[LogicalPlan, Seq[SparkPlan]]): Strategy =
 new Strategy {
   def apply(plan: LogicalPlan): Seq[SparkPlan] =
 if (pf.isDefinedAt(plan)) pf.apply(plan) else Seq.empty[SparkPlan]
 }
 }
 {code}
 Usage:
 {code:java}
 val mystrategy = strategy { case ... = ... }
 val seqstrategy = strategy.seq { case ... = ... }
 {code}
 *Further possible improvements:*
 Making the utility methods `implicit`, thereby
 further reducing the rewrite rules to:
 {code:java}
 // define a PartialFunction[LogicalPlan, LogicalPlan]
 // the implicit would convert it into a Rule[LogicalPlan] at the use sites
 val MyRewriteRule = {
   case ... = ...
 }
 {code}
 *Caveats*
 Because of the way objects are initialized vs. vals, it might be necessary
 reorder instructions so that vals are actually initialized before they are 
 used.
 E.g.:
 {code:java}
 class MyOptimizer extends Optimizer {
   override val batches: Seq[Batch] =
   ...
   Batch(Other rules, FixedPoint(100),
 MyRewriteRule // --- might throw NPE
   val MyRewriteRule = ...
 }
 {code}
 this is easily fixed by