[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-20 Thread zsxwing
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128688973 --- Diff: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java --- @@ -257,4 +257,11 @@ public Properties cryptoConf() {

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79816/ Test PASSed. ---

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #79816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79816/testReport)** for PR 16578 at commit

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79824/testReport)** for PR 18468 at commit

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18694 This only makes sense if the downstream operators consume the iterator of SortMergeJoin first and then perform their work. If the downstream operators are piped with SortMergeJoin, once the iterator

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18697 **[Test build #79823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79823/testReport)** for PR 18697 at commit

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79822/ Test FAILed. ---

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18697 Merged build finished. Test FAILed.

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18697 **[Test build #79822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79822/testReport)** for PR 18697 at commit

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18697 **[Test build #79822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79822/testReport)** for PR 18697 at commit

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread aray
Github user aray commented on the issue: https://github.com/apache/spark/pull/18697 Plan for the example query before the patch (with partitioning as suffix): ``` *HashAggregate(keys=[parent#228], functions=[], output=[level2#274]) hashpartitioning(parent#228, 5) +-

[GitHub] spark pull request #18697: [SPARK-16683][SQL] Repeated joins to same table c...

2017-07-20 Thread aray
GitHub user aray opened a pull request: https://github.com/apache/spark/pull/18697 [SPARK-16683][SQL] Repeated joins to same table can leak attributes via partitioning ## What changes were proposed in this pull request? In some complex queries where the same table is

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r128684617 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0")

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/18694 The cleanup hook is used after the task is done. The diff solves the leak for SortMergeJoin only and does not apply to the limit case. Limit is another special case and needs to be taken care of separately.

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18694 I doubt this is actually a memory leak, as `UnsafeExternalSorter` already avoids memory leaks by registering a cleanup task:

[GitHub] spark pull request #18492: [SPARK-19326] Speculated task attempts do not get...

2017-07-20 Thread janewangfb
Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18492#discussion_r128683971 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -572,20 +572,35 @@ private[spark] class ExecutorAllocationManager(

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128683903 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #79821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79821/testReport)** for PR 18652 at commit

[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #79820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79820/testReport)** for PR 18652 at commit

[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #79819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79819/testReport)** for PR 18652 at commit

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18468 I saw @rxin's comment at #18680 on the performance implications of the new column vector implementation. As this introduces another implementation, shall we consider the issue for this too?

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128678034 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -1248,4 +1251,238 @@ class ColumnarBatchSuite

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128676879 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -1248,4 +1251,238 @@ class ColumnarBatchSuite

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128678520 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128679302 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128679416 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/18694 The memory leak happens in the following scenario. For example, in an inner join, once the left side is exhausted, we stop advancing the right side. Because the right side has not reached the end, the memory

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18694 **[Test build #79818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79818/testReport)** for PR 18694 at commit

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan
Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128679491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79817/testReport)** for PR 18468 at commit

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128677320 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to

[GitHub] spark pull request #18688: delete the superfluous ‘ symbol in rdd-programm...

2017-07-20 Thread wangyangting
Github user wangyangting closed the pull request at: https://github.com/apache/spark/pull/18688

[GitHub] spark issue #18688: delete the superfluous ‘ symbol in rdd-programming-gui...

2017-07-20 Thread wangyangting
Github user wangyangting commented on the issue: https://github.com/apache/spark/pull/18688 You're right, but I think the use of "*ByKey" will be better.

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18694 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79813/ Test PASSed. ---

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18694 Merged build finished. Test PASSed.

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18694 **[Test build #79813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79813/testReport)** for PR 18694 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #79816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79816/testReport)** for PR 16578 at commit

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79812/ Test PASSed. ---

[GitHub] spark issue #18589: [SPARK-16872][ML] Add Gaussian NB

2017-07-20 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/18589 @MLnick Sorry for the late reply. It has been a long time since I got the last comments in the previous PR https://github.com/apache/spark/pull/15324, so I thought that the community may dislike that

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Merged build finished. Test PASSed.

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #79812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79812/testReport)** for PR 15435 at commit

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468 ping @cloud-fan , @ueshin

[GitHub] spark issue #18658: [SPARK-20871][SQL] limit logging of Janino code

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18658 When the size is too large, which exception will be thrown? Do we truncate in all cases, or just when the size is too large?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #79815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79815/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79815/ Test FAILed. ---

[GitHub] spark pull request #18658: [SPARK-20871][SQL] limit logging of Janino code

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18658#discussion_r128675175 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1037,25 +1037,25 @@ object

[GitHub] spark pull request #18658: [SPARK-20871][SQL] limit logging of Janino code

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18658#discussion_r128675159 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1037,25 +1037,25 @@ object

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #79815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79815/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I just pushed a revision of `SelectedField.scala`. Let's see what Jenkins says. I expect it to pass, and assuming it does I will return the ball to the reviewers' court.

[GitHub] spark pull request #3658: [SPARK-4809] Rework Guava library shading.

2017-07-20 Thread caneGuy
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/3658#discussion_r128674205 --- Diff: streaming/pom.xml --- @@ -105,6 +105,14 @@ + + +org.apache.maven.plugins +

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79811/ Test PASSed. ---

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468 Merged build finished. Test PASSed.

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79811/testReport)** for PR 18468 at commit

[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18559 Have we highlighted this in release notes?

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18694 Can you show some experiments that indicate there is a memory leak?

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128671222 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128668866 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -130,460 +138,507 @@ class UDFRegistration private[sql]

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128668731 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -130,460 +138,507 @@ class UDFRegistration private[sql]

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128668538 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3185,170 +3185,207 @@ object functions { val inputTypes = (1 to

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128668434 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -130,460 +138,507 @@ class UDFRegistration private[sql]

[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128668137 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -443,12 +459,57 @@ final class

[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128668114 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -443,12 +459,57 @@ final class

[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128668000 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -375,6 +390,7 @@ final class

[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128667967 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -321,6 +321,17 @@ package object config { .intConf

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128667716 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark issue #18692: [SPARK-21417][SQL] Detect joind conditions via filter ex...

2017-07-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18692 I think we already did it via constraint propagation, didn't we?

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18618 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79814/ Test PASSed. ---

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18618 Merged build finished. Test PASSed.

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18618 **[Test build #79814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79814/testReport)** for PR 18618 at commit

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 @cloud-fan I understand your concern. A `TransportRequestHandler` is for a channel/connection. We want to track the sending chunks of all connections. So I guess we must have a manager for

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18618 **[Test build #79814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79814/testReport)** for PR 18618 at commit

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18694 **[Test build #79813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79813/testReport)** for PR 18694 at commit

[GitHub] spark pull request #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames i...

2017-07-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18618#discussion_r128664429 --- Diff: python/pyspark/sql/types.py --- @@ -445,9 +445,12 @@ class StructType(DataType): This is the data type representing a

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18694 What is the perf impact and how large is it?

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18694 retest this please

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18618 @holdenk, sure, makes sense, but let me just leave a deprecation note for `StructType.names` if you are okay with it too (at least I use this a lot in production code ...).

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128664006 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -947,6 +967,10 @@ private class

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128663898 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128663906 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -604,6 +609,12 @@ case class SortMergeJoinExec(

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #79812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79812/testReport)** for PR 15435 at commit

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79811/testReport)** for PR 18468 at commit

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 jenkins, test this please.

[GitHub] spark issue #18605: [SparkR][SPARK-21381]:SparkR: pass on setHandleInvalid f...

2017-07-20 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/18605 @yanboliang I have made changes accordingly. Thanks!

[GitHub] spark issue #18695: [SPARK-12717][PYTHON] Adding thread-safe broadcast pickl...

2017-07-20 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18695 ping @holdenk @davies , does this fix look ok to you? Thanks!

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128657154 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -130,460 +138,507 @@ class UDFRegistration private[sql]

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128657173 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -79,8 +79,15 @@ class UDFRegistration private[sql]

[GitHub] spark issue #18683: [SPARK-21474][CORE] Make number of parallel fetches from...

2017-07-20 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18683 Did you see an explicit performance change while changing this parameter? I would think netty client thread number for shuffle is still a bottleneck even if you split into more fetch requests.

[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...

2017-07-20 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18690 I think it is because you don't configure metrics sink here, so by default none of the metrics are reported. For example if I enabled jmx sink

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 cc @jkbradley I think it's OK now.

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-20 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r128648171 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -271,7 +273,7 @@ object OneVsRestModel extends

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128646482 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128646011 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -18,8 +18,17 @@ package org.apache.spark.deploy.worker

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128648684 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128648242 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128646523 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128647444 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -43,7 +52,7 @@ object DriverWrapper {

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128648079 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17373 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79806/ Test PASSed. ---
