Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@maropu, I don't think we can. Actually this is how we deal with [simpler
joins](https://github.com/apache/spark/pull/22318#issuecomment-427080091)
Do you think changing the behaviour
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22617#discussion_r229480460
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
---
@@ -68,9 +100,7 @@ object
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22789
Thanks @mgaido91 @hvanhovell for the review.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22617#discussion_r228944791
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
---
@@ -68,9 +100,7 @@ object
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22617#discussion_r228889856
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/ExternalAppendOnlyUnsafeRowArrayBenchmark.scala
---
@@ -68,9 +100,7 @@ object
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22817
Thanks @dongjoon-hyun , @gatorsmile, @cloud-fan , @hvanhovell for the
review.
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22789#discussion_r228753985
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
---
@@ -146,7 +146,10 @@ trait CodegenSupport extends
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22789#discussion_r228753682
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
---
@@ -319,4 +319,15 @@ class WholeStageCodegenSuite
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22617
@dongjoon-hyun , @kiszk could you please help me take a step forward
with this PR?
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22789
@kiszk , @mgaido91, @hvanhovell anything I can add to this PR?
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22817#discussion_r228737835
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
---
@@ -2578,4 +2578,12 @@ class DataFrameSuite extends QueryTest
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22817
@hvanhovell @gatorsmile I think this is a regression from 2.2 to 2.3
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22804
Thanks @dongjoon-hyun , @wangyum for the review.
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22817
@gatorsmile , I looked into this and it seems if we use `mapChildren` in
`ResolveReferences` then `UnresolvedExtractValue` should define 2 children
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22817#discussion_r228285647
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala
---
@@ -407,7 +407,10 @@ case class ResolvedStar
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22817
I will try to investigate a bit more and come up with another solution.
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22817
So based on the UT results it seems that simply changing the resolution to
bottom-up causes issues with `LambdaFunction`s in the current version of Spark.
The issue seems
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22804
Thanks @dongjoon-hyun for the fixes. Merged.
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22817
Thanks @gatorsmile , I thought the issue in SPARK-25816 and in the added UT
is caused by the top-down resolution. I thought that `UnresolvedExtractValue(child,
fieldExpr) if child.resolved` could be resolved
GitHub user peter-toth opened a pull request:
https://github.com/apache/spark/pull/22817
[SPARK-25816][SQL] ResolveReferences should work bottom-up manner on
expressions
## What changes were proposed in this pull request?
ResolveReferences works in a top-down manner when
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22804#discussion_r227470048
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/execution/benchmark/ObjectHashAggregateExecBenchmark.scala
---
@@ -21,207 +21,212 @@ import
GitHub user peter-toth opened a pull request:
https://github.com/apache/spark/pull/22804
[SPARK-25665][SQL][TEST] Refactor ObjectHashAggregateExecBenchmark to…
## What changes were proposed in this pull request?
Refactor ObjectHashAggregateExecBenchmark to use main method
GitHub user peter-toth opened a pull request:
https://github.com/apache/spark/pull/22789
[SPARK-25767][SQL] fix inputVars preparation if outputVars is a lazy stream
## What changes were proposed in this pull request?
Code generation is incorrect if `outputVars` parameter
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22766
cc @cloud-fan I believe this is a regression because of
https://issues.apache.org/jira/browse/SPARK-18186
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22766#discussion_r226345858
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
---
@@ -339,40 +339,38 @@ private[hive] case class HiveUDAFFunction
GitHub user peter-toth opened a pull request:
https://github.com/apache/spark/pull/22766
[SPARK-25768][SQL] fix constant argument expecting UDAFs
## What changes were proposed in this pull request?
This change makes all fields of `HiveUDAFFunction` lazy.
## How
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22664
Thanks for the review @dongjoon-hyun and @dbtsai .
I have one question though, I still don't see
https://issues.apache.org/jira/browse/SPARK-25662 assigned to me. Could you
please look
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@srowen, I saw your last comment on
https://github.com/peter-toth/spark/tree/SPARK-25150. I submitted this PR to
solve that ticket and I believe the description here explains what is the real
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22664#discussion_r22366
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DataSourceReadBenchmark.scala
---
@@ -34,10 +34,15 @@ import
GitHub user peter-toth opened a pull request:
https://github.com/apache/spark/pull/22664
[SPARK-25662][TEST] Refactor DataSourceReadBenchmark to use main method
## What changes were proposed in this pull request?
1. Refactor DataSourceReadBenchmark
## How
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22603
Thanks @dongjoon-hyun , `petertoth` is my JIRA user id.
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22603
Thanks @cloud-fan for the review. I've fixed your findings.
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
Also please consider that currently (and also after this PR) using `b` and
`c` from the description:
```
b.join(c, b("id") === b("id"), "inner"
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22603
Thanks @dongjoon-hyun for the review. I've fixed your findings.
---
-
To unsubscribe, e-mail: reviews-unsubscr
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22617
cc @dongjoon-hyun @seancxmao
GitHub user peter-toth opened a pull request:
https://github.com/apache/spark/pull/22617
[SPARK-25484][TEST] Refactor ExternalAppendOnlyUnsafeRowArrayBenchmark
## What changes were proposed in this pull request?
1. Refactor ExternalAppendOnlyUnsafeRowArrayBenchmark
2
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
Thanks @viirya, your analysis is correct.
Unfortunately an attribute doesn't have a reference to its dataset so I
don't think this scenario can be solved easily. I believe the good
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@cloud-fan could you please help me with this PR and take it one step
forward?
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22603#discussion_r221898450
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
---
@@ -315,7 +315,12 @@ object InMemoryFileIndex
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22603#discussion_r221890344
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InMemoryFileIndex.scala
---
@@ -315,7 +315,12 @@ object InMemoryFileIndex
GitHub user peter-toth opened a pull request:
https://github.com/apache/spark/pull/22603
SPARK-25062: clean up BlockLocations in InMemoryFileIndex
## What changes were proposed in this pull request?
`InMemoryFileIndex` caches `FileStatus` objects to paths. Each `FileStatus
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22518#discussion_r219617722
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/subquery.scala ---
@@ -166,7 +168,7 @@ case class ReuseSubquery(conf: SQLConf) extends
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22518#discussion_r219616464
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala
---
@@ -1268,4 +1269,16 @@ class SubquerySuite extends QueryTest
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@cloud-fan, does the new description define the scope as you suggested?
Is there anything I can add to this PR
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@cloud-fan , I added some explanation to the description in which cases
this PR helps and also where it doesn't
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@cloud-fan this PR doesn't solve that question.
There are some hacks in `Dataset.join` to handle `EqualTo` and
`EqualNullSafe` with duplicated attributes and those hacks are still required
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215571877
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -921,12 +924,18 @@ class Analyzer
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215571612
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -805,10 +807,10 @@ class Analyzer
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215571667
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala
---
@@ -23,12 +23,14 @@ package
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215571480
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala
---
@@ -23,12 +23,14 @@ package
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215504208
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -921,12 +924,18 @@ class Analyzer
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215503790
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -921,12 +924,18 @@ class Analyzer
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215499599
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala
---
@@ -23,12 +23,14 @@ package
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215274203
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -754,11 +754,14 @@ class Analyzer
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215274137
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala
---
@@ -23,12 +23,14 @@ package
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215255291
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -754,11 +754,16 @@ class Analyzer
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r215189187
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -754,11 +754,16 @@ class Analyzer
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
@mgaido91 , 2.2 also suffered from this.
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r214793247
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -295,4 +295,14 @@ class DataFrameJoinSuite extends QueryTest
Github user peter-toth commented on the issue:
https://github.com/apache/spark/pull/22318
Also added missing `if attr.resolved` which I think will fix the UT issues.
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r214732767
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -817,7 +819,7 @@ class Analyzer
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r214732751
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -805,10 +807,10 @@ class Analyzer
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r214732731
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -921,12 +930,16 @@ class Analyzer
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r214666748
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -295,4 +295,17 @@ class DataFrameJoinSuite extends QueryTest
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r21451
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -895,6 +897,13 @@ class Analyzer(
case
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r214666333
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -295,4 +295,17 @@ class DataFrameJoinSuite extends QueryTest
Github user peter-toth commented on a diff in the pull request:
https://github.com/apache/spark/pull/22318#discussion_r214666206
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala ---
@@ -295,4 +295,17 @@ class DataFrameJoinSuite extends QueryTest
GitHub user peter-toth opened a pull request:
https://github.com/apache/spark/pull/22318
[SPARK-25150][SQL] Fix attribute deduplication in join
## What changes were proposed in this pull request?
Fixes attribute deduplication in join conditions.
## How
68 matches