Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r212704387
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -348,30 +350,30 @@ abstract class SparkPlan extends
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22219
Yes, I verified the results of a variety of queries, as well as memory usage and performance.
This patch passed all our query tests, and there was no performance
degradation in our test c
GitHub user Dooyoung-Hwang opened a pull request:
https://github.com/apache/spark/pull/22219
[SPARK-25224][SQL] Improvement of Spark SQL ThriftServer memory management
## What changes were proposed in this pull request?
Spark SQL has only two options for managing
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22219
Changed the access modifier of collectCountAndIterator to private[sql], and updated
the documentation of the feature that I defined in ThriftServer
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r213597686
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -329,49 +329,52 @@ abstract class SparkPlan extends
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r213597102
--- Diff:
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
---
@@ -289,6 +289,14
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r212877431
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3237,6 +3237,20 @@ class Dataset[T] private[sql
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r212855096
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3237,6 +3237,20 @@ class Dataset[T] private[sql
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r212862913
--- Diff:
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
---
@@ -289,6 +289,14
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r212866466
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -329,49 +329,52 @@ abstract class SparkPlan extends
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r213200433
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3237,6 +3237,28 @@ class Dataset[T] private[sql
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22219
Add test cases.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r213682460
--- Diff:
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
---
@@ -289,6 +289,14
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r215122865
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3237,6 +3238,28 @@ class Dataset[T] private[sql
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r214908763
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3237,6 +3238,28 @@ class Dataset[T] private[sql
GitHub user Dooyoung-Hwang opened a pull request:
https://github.com/apache/spark/pull/22347
[SPARK-25353][SQL] Refactoring executeTake(n: Int) in SparkPlan
## What changes were proposed in this pull request?
In some cases, executeTake in SparkPlan could deserialize more than
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22347
cc: @maropu @viirya @kiszk @HyukjinKwon
This PR was split off from https://github.com/apache/spark/pull/22219 at
@maropu's suggestion
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22347
@kiszk
It is impossible to count decoded rows without modifying SparkPlan, because
there is no way to count how many rows were iterated.
Instead, I can simulate this patch in a Scala worksheet
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22347
I tested on my local PC, a 3.3 GHz Intel Core i5, and selected 400,000 rows x
25 times.
I measured the total execution time between decodeUnsafeRows.
My test data is skewed, so gathered
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22347
Jenkins, retest this please
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22219
I refactored the collectionResultAsSeqView function using an implicit class.
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22219
Dear reviewers (cc: @dongjoon-hyun),
I updated the following.
1. No behavior changes if the new config is off. So, [PR
SPARK-25353](https://github.com/apache/spark/pull/22347
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r225500776
--- Diff:
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
---
@@ -120,10 +120,11
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22347
Thank you for the review.
Yes, ThriftServer will use the intermediate "collection view" in this PR.
And [Original PR of
ThriftServer](https://github.com/apache/spark/
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22347#discussion_r222623133
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -348,30 +349,30 @@ abstract class SparkPlan extends
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22347#discussion_r222622501
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -348,30 +349,30 @@ abstract class SparkPlan extends
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22347
I added example code for the issue case to the PR description.
Github user Dooyoung-Hwang commented on the issue:
https://github.com/apache/spark/pull/22219
@kiszk @viirya @HyukjinKwon @cloud-fan
Could you review this patch?
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r214819341
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3237,6 +3238,28 @@ class Dataset[T] private[sql
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r214818164
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ---
@@ -329,17 +337,26 @@ abstract class SparkPlan extends
Github user Dooyoung-Hwang commented on a diff in the pull request:
https://github.com/apache/spark/pull/22219#discussion_r214818478
--- Diff:
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkExecuteStatementOperation.scala
---
@@ -120,10 +120,11