Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r104825321
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -20,19 +20,342 @@ package
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r104825193
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -20,19 +20,342 @@ package
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/15363
@gatorsmile @wzhfy Would you please review this PR. Thank you.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r94650968
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -42,7 +366,7 @@ object ReorderJoin extends Rule
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/16228#discussion_r92523690
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/estimation/JoinEstimation.scala
---
@@ -0,0 +1,175
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/15363
The following updates were made:
1. Incorporate table and column statistics into the star join detection
algorithm. The fact table is chosen based on table cardinality, and dimensions are
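The comment above describes choosing the fact table from table cardinality statistics. A toy Python sketch of that selection idea (hypothetical names and a made-up dominance ratio; not the Catalyst implementation):

```python
# Hypothetical sketch: pick the fact table as the join input whose estimated
# row count clearly dominates the others; if no table dominates, there is no
# unambiguous star and reordering should not apply. The 0.9 ratio is illustrative.
def select_fact_table(row_counts, dominance_ratio=0.9):
    """row_counts: dict mapping table name -> estimated cardinality."""
    ordered = sorted(row_counts.items(), key=lambda kv: kv[1], reverse=True)
    if len(ordered) < 2:
        return None
    (fact, largest), (_, second) = ordered[0], ordered[1]
    # Require the largest table to clearly dominate the second largest.
    if second / largest < dominance_ratio:
        return fact
    return None
```

For example, `select_fact_table({"sales": 10_000_000, "date_dim": 3650, "store": 500})` returns `"sales"`, while two near-equal tables yield `None`.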
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r86595007
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -83,10 +88,221 @@ object ReorderJoin extends Rule
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/15363
retest this please
---
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r85478316
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -83,10 +88,221 @@ object ReorderJoin extends Rule
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r85478259
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -83,10 +88,221 @@ object ReorderJoin extends Rule
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r85477835
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala
---
@@ -83,10 +88,221 @@ object ReorderJoin extends Rule
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r85477511
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -373,6 +373,11 @@ object SQLConf {
.booleanConf
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/15363
@davies Thank you for reviewing the code! I see this work as evolving and
improving with the support of CBO. Without statistics and features such as
cardinality and selectivity, we cannot
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r85477459
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
---
@@ -261,3 +262,34 @@ object PhysicalAggregation
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/15363#discussion_r84777635
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StarJoinSuite.scala
---
@@ -0,0 +1,354 @@
+/*
+ * Licensed to the Apache Software
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/14847
@viirya Hi Simon, I have some general comments/questions:
1. It will help to include in the design document some example queries
together with their corresponding optimized + physical
GitHub user ioana-delaney opened a pull request:
https://github.com/apache/spark/pull/15363
[SPARK-17791][SQL] Join reordering using star schema detection
## What changes were proposed in this pull request?
Star schema consists of one or more fact tables referencing a
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/15289
@JoshRosen In your example, we don't want to first count one million rows
coming from the base table and then return zero rows based on the false
predicate in the outer query
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/15289
@JoshRosen The original predicate has to be kept above the aggregation. An
optimization would be to also push down the predicate below the aggregation,
lower in the plan for early filtering
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/15289
@JoshRosen Wouldn't it be a better design to push down the predicate but also
keep the original one? If the aggregate sits above a complex join, not
pushing down the predicate may
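The exchange above argues that the original predicate must remain above the aggregation, while a copy can additionally be pushed below it for early filtering. A toy Python sketch of that idea (illustrative data only, not Spark code):

```python
# Toy illustration of: SELECT k, SUM(v) FROM t GROUP BY k, filtered on k = 'a'.
rows = [("a", 1), ("a", 2), ("b", 10)]

# Pushing the predicate on the grouping key below the aggregation discards
# non-qualifying rows early...
early = [(k, v) for (k, v) in rows if k == "a"]

groups = {}
for k, v in early:
    groups[k] = groups.get(k, 0) + v

# ...but the original predicate is still applied above the aggregate, so the
# pushed-down copy is purely an optimization, not a correctness requirement.
result = {k: s for k, s in groups.items() if k == "a"}
```

Here `result` is `{"a": 3}`: the early filter shrinks the aggregation input, and the retained predicate above the aggregate guarantees the same answer either way.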
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/13867#discussion_r76094136
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -912,19 +912,24 @@ class Analyzer
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/13867#discussion_r76093305
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -912,19 +912,24 @@ class Analyzer
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13867
@hvanhovell I rebased my changes to the latest build. Please take a look.
Thank you.
---
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13867
Can someone please review the changes? Thank you.
---
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13867
@hvanhovell Would you please let me know if you agree with my previous
reply?
An alternative design is to remove the try-catch expression from
resolveOuterReferences() altogether
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13867
@hvanhovell Thank you for reviewing the changes. I replied to your comments
and made some updates. Please let me know.
---
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/13867#discussion_r68513197
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -911,19 +911,30 @@ class Analyzer
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/13867#discussion_r68513152
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -911,19 +911,30 @@ class Analyzer
GitHub user ioana-delaney opened a pull request:
https://github.com/apache/spark/pull/13867
[SPARK-16161][SQL] Ambiguous error message for unsupported correlated
predicate subqueries
## What changes were proposed in this pull request?
Subqueries with deep correlation fail with
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13570
@hvanhovell The EXISTS/NOT EXISTS predicates will have an empty condition.
e.g.
select c1 from t1 where EXISTS (select c2 from t2)
== Optimized Logical Plan ==
Project
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13570
@hvanhovell Thank you for reviewing the changes and I apologize for the
delay in replying.
I simplified the code. However, I don't think this is what you suggested.
What you sugg
GitHub user ioana-delaney opened a pull request:
https://github.com/apache/spark/pull/13570
[SPARK-15832][SQL] Embedded IN/EXISTS predicate subquery throws
TreeNodeException
## What changes were proposed in this pull request?
Queries with embedded existential sub-query
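The PR title above concerns existential subqueries embedded inside a larger predicate rather than appearing as a top-level conjunct. A toy Python sketch of what such an embedded EXISTS means (hypothetical data and query shape):

```python
# Toy illustration of: WHERE c1 > 3 OR EXISTS (SELECT c2 FROM t2 WHERE c2 = c1)
# Here EXISTS is nested inside a disjunction instead of standing alone.
t1 = [(1,), (5,)]
t2 = [(5,)]

result = [c1 for (c1,) in t1
          if c1 > 3 or any(c2 == c1 for (c2,) in t2)]
```

Row `(1,)` fails both branches and row `(5,)` passes the first, so `result` is `[5]`; planning this shape is harder than a top-level EXISTS because the subquery cannot simply be rewritten as a semi-join conjunct.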
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13418
@cloud-fan @gatorsmile @davies @rxin @hvanhovell Thank you all. This was my
first PR!
---
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13418
@cloud-fan Thank you.
---
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13418
Thank you @cloud-fan. I mentioned the local relations in the test case
description and moved the test cases under withTempTable.
---
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13418
@cloud-fan I moved the unit tests to a new test case. Thank you.
---
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13418
@cloud-fan I replaced p.expressions with projectList. Thanks.
---
Github user ioana-delaney commented on the issue:
https://github.com/apache/spark/pull/13418
@gatorsmile @davies @rxin @cloud-fan I've incorporated the comments. Please
advise. Thank you.
---
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/13418#discussion_r65438111
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
---
@@ -84,6 +84,13 @@ object ScalarSubquery
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/13418#discussion_r65437967
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -1468,7 +1468,8 @@ object DecimalAggregates
Github user ioana-delaney commented on a diff in the pull request:
https://github.com/apache/spark/pull/13418#discussion_r65294821
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -1468,7 +1468,8 @@ object DecimalAggregates
GitHub user ioana-delaney opened a pull request:
https://github.com/apache/spark/pull/13418
[SPARK-15677][SQL] Query with scalar sub-query in the SELECT list throws
UnsupportedOperationException
## What changes were proposed in this pull request?
Queries with scalar sub-query
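The PR above concerns scalar subqueries in the SELECT list. A scalar subquery must produce at most one value, which is then attached to every outer row; a toy Python sketch of that contract (hypothetical tables, not the Spark fix):

```python
# Toy illustration of: SELECT c1, (SELECT MAX(c2) FROM t2) FROM t1
t1 = [(1,), (2,)]
t2 = [(7,)]

# The subquery collapses to a single scalar, computed once and reused for
# each outer row; more than one inner row would be a runtime error in SQL.
scalar = max(c2 for (c2,) in t2)
result = [(c1, scalar) for (c1,) in t1]
```

Here `result` is `[(1, 7), (2, 7)]`: the scalar value is broadcast across all rows of the outer query.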