Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208125902
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SmallDataSortBenchmark.scala
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208129700
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -166,7 +170,13 @@ class RangePartitioner[K : Ordering : ClassTag, V
GitHub user sddyljsx opened a pull request:
https://github.com/apache/spark/pull/22028
[SPARK-25046][SQL] Fix Alter View can excute sql like "ALTER VIEW ... AS
INSERT INTO"
## What changes were proposed in this pull request?
Alter View can execute SQL like
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208441135
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SmallDataSortBenchmark.scala
---
@@ -0,0 +1,95 @@
+/*
+ * Licensed to
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208441067
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SmallDataSortBenchmark.scala
---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
@ueshin please test again
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208801492
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
---
@@ -2799,6 +2799,26 @@ class SQLQuerySuite extends QueryTest with
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208801520
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala
---
@@ -2799,6 +2799,26 @@ class SQLQuerySuite extends QueryTest with
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208801641
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
---
@@ -294,7 +296,12 @@ object ShuffleExchangeExec
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
This optimization is only for SQL, but other places also use
RangePartitioner. How might it affect those other places?
The failed UTs are caused by
```
else if (sampleCacheEnabled
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208803055
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -166,7 +169,16 @@ class RangePartitioner[K : Ordering : ClassTag, V
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208803622
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
---
@@ -294,7 +296,12 @@ object ShuffleExchangeExec
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r208804032
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -166,7 +169,16 @@ class RangePartitioner[K : Ordering : ClassTag, V
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
retest this, please
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
@ueshin
Please retest it; an unknown error occurred.
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r209417115
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -166,9 +169,17 @@ class RangePartitioner[K : Ordering : ClassTag, V
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r209417551
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
---
@@ -294,7 +296,12 @@ object ShuffleExchangeExec
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r209417745
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
---
@@ -294,7 +296,12 @@ object ShuffleExchangeExec
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r209418058
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
---
@@ -294,7 +296,12 @@ object ShuffleExchangeExec
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r209418116
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
---
@@ -294,7 +296,12 @@ object ShuffleExchangeExec
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r209420016
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -166,9 +169,17 @@ class RangePartitioner[K : Ordering : ClassTag, V
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
Please help retest it. @kiszk @viirya
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r209486199
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/ShuffleExchangeExec.scala
---
@@ -294,7 +296,12 @@ object ShuffleExchangeExec
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
I think I need another retest. Please help. @viirya
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r211130877
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -155,6 +156,8 @@ class RangePartitioner[K : Ordering : ClassTag, V
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r211131294
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1207,6 +1207,13 @@ object SQLConf {
.intConf
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r211131380
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1207,6 +1207,13 @@ object SQLConf {
.intConf
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
We may not know in advance how big this query is. The data at the beginning
is large, but it may be very small after filtering.
I encountered this problem while using the Thrift server for queries
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r211230520
--- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala ---
@@ -166,9 +169,20 @@ class RangePartitioner[K : Ordering : ClassTag, V
Github user sddyljsx commented on a diff in the pull request:
https://github.com/apache/spark/pull/21859#discussion_r211270748
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1207,6 +1207,13 @@ object SQLConf {
.intConf
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
I read the source code again.
The RangePartitioner[K, V] in ShuffleExchangeExec is an instance of
RangePartitioner[InternalRow, Null]. RangePartitioner only samples K for getting
the
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
'The ShuffleWriter should treat RangePartitioner specially and consume the
sampled data in RangePartitioner instead of the input iterator.' This idea is
good; maybe we can cache both t
GitHub user sddyljsx opened a pull request:
https://github.com/apache/spark/pull/21859
[SPARK-24900][SQL]speed up sort when the dataset is small
## What changes were proposed in this pull request?
When running SQL like 'select * from order where order_status = 4
Github user sddyljsx commented on the issue:
https://github.com/apache/spark/pull/21859
@felixcheung
Thanks for the review.
**1. How small is 'small':**
This optimization works when the sampled data of the RangePartitioner
covers all the da
34 matches