[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55336261
  
@marmbrus I see what you mean.  Updated to basically what you suggested, 
aside from building the map once.  Let me know when it's finalized and I can 
try to test one more time on live data.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/2345#discussion_r17452614
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -47,6 +48,53 @@ object Optimizer extends RuleExecutor[LogicalPlan] {
 }
 
 /**
+  *  Pushes operations to either side of a Union.
+  */
+object UnionPushdown extends Rule[LogicalPlan] {
+
+  /**
+    *  Maps Attributes from the left side to the corresponding Attribute on the right side.
+    */
+  def buildRewrites(union: Union): AttributeMap[Attribute] = {
+    assert(union.left.output.size == union.right.output.size)
+
+    AttributeMap(union.left.output.zip(union.right.output))
+  }
+
+  /**
+    *  Rewrites an expression so that it can be pushed to the right side of a Union operator.
+    *  This method relies on the fact that the output attributes of a union are always equal
+    *  to the left child's output.
+    */
+  def pushToRight[A <: Expression](e: A, union: Union, rewrites: AttributeMap[Attribute]): A = {
--- End diff --

Nit: `union` is not used.
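
For illustration only, here is a minimal sketch of the shape this nit points toward: the rewrite map is built once per union and `pushToRight` takes only that map. `Attr`, `RewriteOnce`, and `pushProjectList` are hypothetical toy stand-ins, not Catalyst classes, and this is not the code from the PR.

```scala
// Toy stand-in, not Catalyst's Attribute: an attribute is just an id plus a display name.
final case class Attr(id: Int, name: String)

object RewriteOnce {
  /** Pairs left and right output attributes by position. */
  def buildRewrites(left: Seq[Attr], right: Seq[Attr]): Map[Attr, Attr] = {
    require(left.size == right.size, "union children must have the same arity")
    left.zip(right).toMap
  }

  /** Needs only the map, so there is no `union` parameter to leave unused. */
  def pushToRight(a: Attr, rewrites: Map[Attr, Attr]): Attr =
    rewrites.getOrElse(a, a)

  /** Rewrites a whole project list against a single, shared map. */
  def pushProjectList(projectList: Seq[Attr], left: Seq[Attr], right: Seq[Attr]): Seq[Attr] = {
    val rewrites = buildRewrites(left, right) // built once per union, reused for every element
    projectList.map(pushToRight(_, rewrites))
  }
}
```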





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55336425
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20170/consoleFull) for PR 2345 at commit [`0788691`](https://github.com/apache/spark/commit/07886917b71ad0b23fbe68253a568d29882a21b1).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55336546
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/73/consoleFull) for PR 2345 at commit [`0788691`](https://github.com/apache/spark/commit/07886917b71ad0b23fbe68253a568d29882a21b1).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55336553
  
LGTM once the tests pass.





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55342753
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20176/consoleFull) for PR 2345 at commit [`5c8d24d`](https://github.com/apache/spark/commit/5c8d24d07eaaa35e7b596e047861b663fc10f03d).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55345242
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/73/consoleFull) for PR 2345 at commit [`0788691`](https://github.com/apache/spark/commit/07886917b71ad0b23fbe68253a568d29882a21b1).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`
  * `case class CreateTableAsSelect(`






[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55345198
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20170/consoleFull) for PR 2345 at commit [`0788691`](https://github.com/apache/spark/commit/07886917b71ad0b23fbe68253a568d29882a21b1).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`
  * `case class CreateTableAsSelect(`






[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55348969
  
Thanks!  I've merged to master.





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2345





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55350225
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20176/consoleFull) for PR 2345 at commit [`5c8d24d`](https://github.com/apache/spark/commit/5c8d24d07eaaa35e7b596e047861b663fc10f03d).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect(`
  * `case class CreateTableAsSelect(`






[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-10 Thread koeninger
GitHub user koeninger opened a pull request:

https://github.com/apache/spark/pull/2345

SPARK-3462 push down filters and projections into Unions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mediacrossinginc/spark SPARK-3462

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2345


commit ef47b3b80dd92f4652947ccffa5c9fea97adffb0
Author: Cody Koeninger cody.koenin...@mediacrossing.com
Date:   2014-09-10T05:07:58Z

SPARK-3462 push down filters and projections into Unions







[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55117296
  
Can one of the admins verify this patch?





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-10 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55172311
  
ok to test





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55172771
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20111/consoleFull) for PR 2345 at commit [`ef47b3b`](https://github.com/apache/spark/commit/ef47b3b80dd92f4652947ccffa5c9fea97adffb0).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-10 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55178764
  
Hey @koeninger, thanks for implementing this optimization!

Overall this looks pretty good.  A few minor suggestions:
 - I'm not sure that we want to check the names and qualifiers to find the 
corresponding `Attribute`s.  It is possible that either side could actually 
have tables that are aliased differently and thus the `Attribute`s would have 
different qualifiers.  Instead, I think that it is safe to assume that the 
analysis phase has checked name and type matching and just find the 
corresponding `Attribute` by ordering.  I'm thinking something like this:

```scala
/**
 * Pushes Project and Filter operations to either side of a Union.
 */
object UnionPushdown extends Rule[LogicalPlan] {

  /**
   * Rewrites an expression so that it can be pushed to the right side of a Union operator.
   * This method relies on the fact that the output attributes of a union are always equal to the
   * left child's output.
   */
  def pushToRight[A <: Expression](e: A, union: Union): A = {
    assert(union.left.output.size == union.right.output.size)

    // Maps Attributes from the left side to the corresponding Attribute on the right side.
    val rewrites = AttributeMap(union.left.output.zip(union.right.output))
    val result = e transform {
      case a: Attribute => rewrites(a)
    }

    // We must promise the compiler that we did not discard the names in the case of project
    // expressions.  This is safe since the only transformation is from Attribute => Attribute.
    result.asInstanceOf[A]
  }

  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    // Push down filter into union
    case Filter(condition, u @ Union(left, right)) =>
      Union(
        Filter(condition, left),
        Filter(pushToRight(condition, u), right))

    // Push down projection into union
    case Project(projectList, u @ Union(left, right)) =>
      Union(
        Project(projectList, left),
        Project(projectList.map(pushToRight(_, u)), right))
  }
}
```
 - Would also be great to add some tests to `FilterPushdownSuite` and maybe 
create a similar `ColumnPruningSuite`.
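
To make the positional-matching idea concrete outside of Catalyst, here is a small self-contained sketch. Every type in it (`Attr`, `Ref`, `Lit`, `GreaterThan`, `Relation`, `ToyUnionPushdown`, and the toy `Filter` and `Union`) is a hypothetical stand-in and not the real Spark API; the sketch only shows that pairing attributes by position keeps working when the two sides carry different names and qualifiers.

```scala
// Toy stand-ins for Catalyst's classes; NOT the real Spark API.
final case class Attr(name: String, qualifier: String, id: Int)

sealed trait Expr
final case class Ref(attr: Attr) extends Expr
final case class Lit(value: Int) extends Expr
final case class GreaterThan(left: Expr, right: Expr) extends Expr

sealed trait Plan { def output: Seq[Attr] }
final case class Relation(output: Seq[Attr]) extends Plan
final case class Filter(condition: Expr, child: Plan) extends Plan {
  def output: Seq[Attr] = child.output
}
final case class Union(left: Plan, right: Plan) extends Plan {
  // A union exposes the left child's attributes as its own output.
  def output: Seq[Attr] = left.output
}

object ToyUnionPushdown {
  /** Pairs left and right attributes by position, ignoring names and qualifiers. */
  def buildRewrites(u: Union): Map[Attr, Attr] = {
    require(u.left.output.size == u.right.output.size)
    u.left.output.zip(u.right.output).toMap
  }

  /** Replaces every left-side attribute reference with its right-side counterpart. */
  def pushToRight(e: Expr, rewrites: Map[Attr, Attr]): Expr = e match {
    case Ref(a)            => Ref(rewrites(a))
    case GreaterThan(l, r) => GreaterThan(pushToRight(l, rewrites), pushToRight(r, rewrites))
    case lit: Lit          => lit
  }

  /** Pushes a Filter that sits directly on top of a Union into both children. */
  def apply(plan: Plan): Plan = plan match {
    case Filter(cond, u @ Union(l, r)) =>
      val rewrites = buildRewrites(u)
      Union(Filter(cond, l), Filter(pushToRight(cond, rewrites), r))
    case other => other
  }
}
```

A test in the spirit of the suggested suite additions could build a filter over a union of two single-column relations with different qualifiers and compare the result of `ToyUnionPushdown` against a hand-written plan in which the right-hand filter refers to the right-hand attribute.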





[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2345#issuecomment-55185718
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20111/consoleFull) for PR 2345 at commit [`ef47b3b`](https://github.com/apache/spark/commit/ef47b3b80dd92f4652947ccffa5c9fea97adffb0).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.

