Github user nongli commented on a diff in the pull request:
https://github.com/apache/spark/pull/10444#discussion_r48503899
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala ---
@@ -47,6 +48,34 @@ trait Predicate extends Expression {
   override def dataType: DataType = BooleanType
 }
+object Predicate extends PredicateHelper {
+  def toCNF(predicate: Expression, maybeThreshold: Option[Double] = None): Expression = {
+    val cnf = new CNFExecutor(predicate).execute(predicate)
+    val threshold = maybeThreshold.map(predicate.size * _).getOrElse(Double.MaxValue)
+    if (cnf.size > threshold) predicate else cnf
--- End diff ---
I disagree with 1. I don't see why it matters whether it is all CNF or none. I
think the heuristic we want is something like "maximize the number of simple
predicates that are in CNF form", where a simple predicate is one that
contains just one attribute or is a binary predicate between two attributes.
These are the candidates that benefit from further optimization.
We could try cost-basing this, or just stop the expansion after some amount;
see the sketch below.
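Purely to illustrate the "stop the expansion after some amount" option, here is
a minimal standalone sketch. Everything in it (`Expr`, `Leaf`, `BoundedCNF`,
the `budget` parameter) is a hypothetical stand-in, not Catalyst's
`Expression`/`CNFExecutor` API: each OR-over-AND distribution is charged
against a budget, and once the budget is spent the remaining subtree is left
untouched, so already-simple conjuncts still end up in CNF even when a full
conversion would blow up exponentially.

```scala
// Toy expression ADT standing in for Catalyst expressions (hypothetical).
sealed trait Expr {
  def size: Int = this match {
    case And(l, r) => 1 + l.size + r.size
    case Or(l, r)  => 1 + l.size + r.size
    case Not(e)    => 1 + e.size
    case Leaf(_)   => 1
  }
}
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr
case class Not(child: Expr) extends Expr
case class Leaf(name: String) extends Expr // a simple predicate

object BoundedCNF {
  // Convert to CNF, charging each OR-over-AND distribution against `budget`.
  // Once the budget is exhausted, the remaining subtree is returned
  // unexpanded. Input is assumed to already be in NNF for brevity.
  def toCNF(expr: Expr, budget: Int): Expr = {
    var spent = 0
    def go(e: Expr): Expr = e match {
      case And(l, r) => And(go(l), go(r))
      case Or(l, r) =>
        (go(l), go(r)) match {
          // Distribute OR over AND only while budget remains.
          case (And(ll, lr), rr) if spent < budget =>
            spent += 1
            go(And(Or(ll, rr), Or(lr, rr)))
          case (ll, And(rl, rr)) if spent < budget =>
            spent += 1
            go(And(Or(ll, rl), Or(ll, rr)))
          case (ll, rr) => Or(ll, rr)
        }
      case other => other // Leaf or Not over a leaf: already a literal in NNF
    }
    go(expr)
  }
}

// (a AND b) OR c  becomes  (a OR c) AND (b OR c), one distribution step:
// println(BoundedCNF.toCNF(Or(And(Leaf("a"), Leaf("b")), Leaf("c")), budget = 16))
```

A cost-based variant would replace the flat budget with a per-subtree check,
e.g. only distributing when the resulting conjuncts stay "simple" in the sense
above, but the all-or-nothing threshold in the diff seems stricter than needed.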