[GitHub] spark issue #22347: [SPARK-25353][SQL] executeTake in SparkPlan is modified ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97242/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #97242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97242/testReport)** for PR 22347 at commit [`8666272`](https://github.com/apache/spark/commit/86662722e53bfcae2c75e61d170c983abd599b3a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #97242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97242/testReport)** for PR 22347 at commit [`8666272`](https://github.com/apache/spark/commit/86662722e53bfcae2c75e61d170c983abd599b3a).
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22347 retest this please
Github user tooptoop4 commented on the issue: https://github.com/apache/spark/pull/22347 @dongjoon-hyun can this be merged?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96937/ Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #96937 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96937/testReport)** for PR 22347 at commit [`8666272`](https://github.com/apache/spark/commit/86662722e53bfcae2c75e61d170c983abd599b3a). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user Dooyoung-Hwang commented on the issue: https://github.com/apache/spark/pull/22347 I added example code for the issue case to the PR description.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #96937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96937/testReport)** for PR 22347 at commit [`8666272`](https://github.com/apache/spark/commit/86662722e53bfcae2c75e61d170c983abd599b3a).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96870/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #96870 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96870/testReport)** for PR 22347 at commit [`a8f1481`](https://github.com/apache/spark/commit/a8f14817ce3f52f710c3341148c2e1f3374335eb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user Dooyoung-Hwang commented on the issue: https://github.com/apache/spark/pull/22347 Thank you for the review. Yes, ThriftServer will use the intermediate "collection view" from this PR, and the [original ThriftServer PR](https://github.com/apache/spark/pull/22219) will be updated accordingly if this PR is merged.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #96870 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96870/testReport)** for PR 22347 at commit [`a8f1481`](https://github.com/apache/spark/commit/a8f14817ce3f52f710c3341148c2e1f3374335eb).
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22347 Thank you for your first contribution, @Dooyoung-Hwang ! So, this is a spin-off PR from STS?
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22347 Retest this please.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95892/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #95892 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95892/testReport)** for PR 22347 at commit [`a8f1481`](https://github.com/apache/spark/commit/a8f14817ce3f52f710c3341148c2e1f3374335eb). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #95892 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95892/testReport)** for PR 22347 at commit [`a8f1481`](https://github.com/apache/spark/commit/a8f14817ce3f52f710c3341148c2e1f3374335eb).
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22347 retest this please
Github user Dooyoung-Hwang commented on the issue: https://github.com/apache/spark/pull/22347 Jenkins, retest this please
Github user Dooyoung-Hwang commented on the issue: https://github.com/apache/spark/pull/22347

I tested on my local PC (3.3 GHz Intel Core i5) and selected 400,000 rows x 25 times. I measured the total time spent in decodeUnsafeRows. My test data is skewed, so the rows gathered from executors are distributed between 40 and 80. The average execution time decreased from 175.92 ms to 93.52 ms. Memory usage also improved: total GC time decreased from 13.883 sec to 10.764 sec.

## Before Patch

### GC statistics

S0 | S1 | E | O | M | CCS | YGC | YGCT | FGC | FGCT | GCT
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
0 | 100 | 24.92 | 48.66 | 96.92 | 88.25 | 150 | 13.883 | 0 | 0 | 13.883

### Wall time : AVG 175.92 ms

Row Count | Decode time (ms)
-- | --
428942 | 73
473726 | 106
476322 | 78
509996 | 83
510590 | 124
556896 | 94
556896 | 362
595272 | 193
595272 | 175
642478 | 120
644970 | 279
679544 | 269
693354 | 116
723532 | 124
729912 | 136
730218 | 120
730246 | 184
773640 | 183
774148 | 380
810198 | 128
811606 | 131
859090 | 138
895474 | 314
895954 | 339
939636 | 149

## After Patch : 93.52ms

### GC statistics

S0 | S1 | E | O | M | CCS | YGC | YGCT | FGC | FGCT | GCT
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
0 | 100 | 81.37 | 33.34 | 97.21 | 88.35 | 127 | 10.764 | 0 | 0 | 10.764

### Wall time : AVG 93.52 ms

Row Count | Decode time (ms)
-- | --
421922 | 61
422516 | 180
422850 | 110
473218 | 62
473218 | 103
473438 | 115
507198 | 60
554606 | 144
557202 | 119
601392 | 71
642652 | 61
645276 | 64
679036 | 64
679036 | 63
729624 | 242
729652 | 62
729912 | 131
773814 | 122
774234 | 62
807908 | 59
810198 | 64
814900 | 72
844772 | 59
858582 | 127
858582 | 61
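As a quick cross-check of the figures reported above, the following sketch recomputes the two averages from the listed decode times and derives the relative reduction. The object and method names (`DecodeTimeSummary`, `mean`, `reductionPercent`) are made up for illustration and are not part of the patch; the numbers are copied from the two wall-time tables.

```scala
object DecodeTimeSummary {
  // Decode times in ms, copied from the "Before Patch" wall-time table.
  val before: Seq[Int] = Seq(
    73, 106, 78, 83, 124, 94, 362, 193, 175, 120, 279, 269, 116,
    124, 136, 120, 184, 183, 380, 128, 131, 138, 314, 339, 149)

  // Decode times in ms, copied from the "After Patch" wall-time table.
  val after: Seq[Int] = Seq(
    61, 180, 110, 62, 103, 115, 60, 144, 119, 71, 61, 64, 64,
    63, 242, 62, 131, 122, 62, 59, 64, 72, 59, 127, 61)

  // Arithmetic mean of the samples.
  def mean(xs: Seq[Int]): Double = xs.sum.toDouble / xs.size

  // Relative reduction of `newValue` compared with `oldValue`, in percent.
  def reductionPercent(oldValue: Double, newValue: Double): Double =
    (1.0 - newValue / oldValue) * 100.0

  def main(args: Array[String]): Unit = {
    val b = mean(before)
    val a = mean(after)
    println(f"before = $b%.2f ms, after = $a%.2f ms")
    println(f"wall-time reduction = ${reductionPercent(b, a)}%.1f%%")
    println(f"GC-time reduction = ${reductionPercent(13.883, 10.764)}%.1f%%")
  }
}
```

The recomputed means agree with the reported 175.92 ms and 93.52 ms, which corresponds to a wall-time reduction of a little under half for this single, skewed local run.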
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95859/ Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22347 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #95859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95859/testReport)** for PR 22347 at commit [`a8f1481`](https://github.com/apache/spark/commit/a8f14817ce3f52f710c3341148c2e1f3374335eb). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22347 **[Test build #95859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95859/testReport)** for PR 22347 at commit [`a8f1481`](https://github.com/apache/spark/commit/a8f14817ce3f52f710c3341148c2e1f3374335eb).
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22347 ok to test
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22347 Let me mark this ok to test, since there seems to be progress here anyway.
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22347 btw, do we have any actual performance benefit (wall time) from this patch?
Github user Dooyoung-Hwang commented on the issue: https://github.com/apache/spark/pull/22347

@kiszk It is impossible to count decoded rows without modifying SparkPlan, because there is no way to count how many elements were iterated. Instead, I can simulate this patch in a Scala worksheet with the code below.

```scala
import scala.collection.mutable.ArrayBuffer

// Number of elements decoded so far
var decodeCount = 0

// Simulated decoder: each entry of buf is the number of rows left in a slot;
// every call to next() "decodes" one row and bumps decodeCount.
def decoding(buf: Array[Int]): Iterator[String] = {
  new Iterator[String] {
    var remain = buf.sum
    var index = 0

    override def hasNext: Boolean = remain > 0

    override def next(): String = {
      while (buf(index) == 0) index += 1
      buf(index) -= 1
      remain -= 1
      decodeCount += 1 // increase decodeCount
      f"[decode Result:$remain]"
    }
  }
}

// reset decodeCount
decodeCount = 0

// Before Patch : decode without scala view
val buf = new ArrayBuffer[String]
val inputIter = Array(Array(2, 2, 2), Array(2), Array(2)).iterator
while (inputIter.hasNext) buf ++= Array(inputIter.next()).flatMap(decoding)
val result1 = buf.take(3).toArray

// ensure decode count is 10
assert(decodeCount == 10)

// reset decodeCount
decodeCount = 0

// After Patch : decode with scala view
val result2 = ArrayBuffer(Array(2, 2, 2), Array(2), Array(2)).toArray.view
  .flatMap(decoding).take(3).force

// ensure decode count is 3
assert(decodeCount == 3)

// assert same element
assert(result1 sameElements result2)
```
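The same eager-versus-lazy contrast can also be written as a small standalone program. The sketch below is only an illustration under stated assumptions: it uses a plain `Iterator` pipeline rather than the collection view used by the patch, and `LazyTakeSketch`, `CountingDecoder`, `eagerTake`, and `lazyTake` are hypothetical names that do not exist in Spark.

```scala
object LazyTakeSketch {
  // Hypothetical stand-in for row decoding: counts every element it produces.
  final class CountingDecoder {
    var decoded = 0

    // Lazily "decodes" a chunk; the counter only moves when next() is called.
    def decode(chunk: Array[Int]): Iterator[String] =
      chunk.iterator.map { v => decoded += 1; s"row-$v" }
  }

  // Eager variant: every chunk is fully decoded before the first n rows are kept.
  def eagerTake(chunks: Seq[Array[Int]], n: Int, d: CountingDecoder): Array[String] =
    chunks.flatMap(c => d.decode(c).toVector).take(n).toArray

  // Lazy variant: decoding stops as soon as n rows have been produced.
  def lazyTake(chunks: Seq[Array[Int]], n: Int, d: CountingDecoder): Array[String] =
    chunks.iterator.flatMap(c => d.decode(c)).take(n).toArray

  def main(args: Array[String]): Unit = {
    val chunks = Seq(Array(1, 2, 3), Array(4), Array(5))
    val eager = new CountingDecoder
    val lzy = new CountingDecoder

    val r1 = eagerTake(chunks, 3, eager)
    val r2 = lazyTake(chunks, 3, lzy)

    // Both variants return the same rows; only the amount of decoding differs.
    assert(r1.sameElements(r2))
    println(s"eager decoded ${eager.decoded} rows, lazy decoded ${lzy.decoded} rows")
  }
}
```

With five input elements and a limit of three, the eager variant decodes all five while the lazy one decodes three. An iterator is used here because its single-pass laziness is the same across Scala versions, whereas the worksheet above relies on `view ... force`.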
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22347 Thank you for your update. Would it be better to add a test case to confirm that the state of the internal structures is as you expected? @maropu