date:20170720

[GitHub] spark pull request #18388: [SPARK-21175] Reject OpenBlocks when memory short...

2017-07-20 Thread zsxwing

Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/18388#discussion_r128688973 --- Diff: common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java --- @@ -257,4 +257,11 @@ public Properties cryptoConf() {

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79816/ Test PASSed. ---

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #79816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79816/testReport)** for PR 16578 at commit

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79824/testReport)** for PR 18468 at commit

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/18694 This only makes sense if the downstream operators consume the iterator of SortMergeJoin first and then perform their work. If the downstream operators are piped with SortMergeJoin, once the iterator

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18697 **[Test build #79823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79823/testReport)** for PR 18697 at commit

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18697 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79822/ Test FAILed. ---

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18697 Merged build finished. Test FAILed.

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18697 **[Test build #79822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79822/testReport)** for PR 18697 at commit

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18697 **[Test build #79822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79822/testReport)** for PR 18697 at commit

[GitHub] spark issue #18697: [SPARK-16683][SQL] Repeated joins to same table can leak...

2017-07-20 Thread aray

Github user aray commented on the issue: https://github.com/apache/spark/pull/18697 Plan for the example query before the patch (with partitioning as suffix): ``` *HashAggregate(keys=[parent#228], functions=[], output=[level2#274]) hashpartitioning(parent#228, 5) +-

[GitHub] spark pull request #18697: [SPARK-16683][SQL] Repeated joins to same table c...

2017-07-20 Thread aray

GitHub user aray opened a pull request: https://github.com/apache/spark/pull/18697 [SPARK-16683][SQL] Repeated joins to same table can leak attributes via partitioning ## What changes were proposed in this pull request? In some complex queries where the same table is

[GitHub] spark pull request #18313: [SPARK-21087] [ML] CrossValidator, TrainValidatio...

2017-07-20 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18313#discussion_r128684617 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala --- @@ -113,15 +122,28 @@ class CrossValidator @Since("1.2.0")

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan

Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/18694 The cleanup hook is used only after the task is done. The diff solves the leak for SortMergeJoin only and does not apply to the limit case. Limit is another special case and needs to be taken care of separately.
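A minimal sketch of the point about the cleanup hook, written in plain Python rather than Spark's own Scala/Java. `TaskContext`, `Sorter`, and their methods here are hypothetical stand-ins invented for illustration, not Spark's API: memory registered for cleanup at task completion is not lost forever, but it stays held for as long as the task keeps running.

```python
class TaskContext:
    """Hypothetical miniature of a per-task context that runs callbacks at task end."""

    def __init__(self):
        self._listeners = []

    def add_completion_listener(self, fn):
        self._listeners.append(fn)

    def mark_completed(self):
        # Invoke every registered cleanup callback once the task is done.
        for fn in self._listeners:
            fn()


class Sorter:
    """Hypothetical sorter that frees its pages only through the completion hook."""

    def __init__(self, ctx):
        self.pages = ["page-0", "page-1"]
        ctx.add_completion_listener(self.cleanup)

    def cleanup(self):
        self.pages = []


ctx = TaskContext()
sorter = Sorter(ctx)
held_while_running = len(sorter.pages)  # pages are still held mid-task
ctx.mark_completed()
held_after_task = len(sorter.pages)     # pages are freed only at task end
```

Under these assumptions the hook guarantees eventual release, which is why the thread debates whether the behavior counts as a leak or as memory held longer than necessary.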

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/18694 I'd doubt this is actually a memory leak as `UnsafeExternalSorter` already avoids memory leaks by registering a cleanup task:

[GitHub] spark pull request #18492: [SPARK-19326] Speculated task attempts do not get...

2017-07-20 Thread janewangfb

Github user janewangfb commented on a diff in the pull request: https://github.com/apache/spark/pull/18492#discussion_r128683971 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -572,20 +572,35 @@ private[spark] class ExecutorAllocationManager(

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan

Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128683903 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #79821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79821/testReport)** for PR 18652 at commit

[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #79820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79820/testReport)** for PR 18652 at commit

[GitHub] spark issue #18652: [SPARK-21497][SQL][WIP] Pull non-deterministic equi join...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18652 **[Test build #79819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79819/testReport)** for PR 18652 at commit

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/18468 I saw @rxin's comment at #18680 on the performance implications of a new column vector implementation. As this introduces another implementation, shall we consider the issue for this too?

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin

Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128678034 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -1248,4 +1251,238 @@ class ColumnarBatchSuite

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin

Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128676879 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala --- @@ -1248,4 +1251,238 @@ class ColumnarBatchSuite

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin

Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128678520 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin

Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128679302 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread ueshin

Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128679416 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan

Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/18694 The memory leak happens in the following scenario. For example, in an inner join, once the left side is exhausted, we stop advancing the right side. Because the right side has not reached the end, the memory
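The scenario in this comment can be sketched in plain Python. This is an illustration under stated assumptions, not the code in the pull request: `SpillableSource` and `merge_join` are hypothetical names, keys are assumed unique and sorted, and a source is assumed to free its buffer only when it is iterated past its last row.

```python
class SpillableSource:
    """Hypothetical sorted input that holds its buffer until fully drained."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.released = False  # True once the buffer has been freed

    def __iter__(self):
        for row in self._rows:
            yield row
        # Reached only when the consumer reads past the last row.
        self.release()

    def release(self):
        self._rows = []
        self.released = True


def merge_join(left, right, eager_release=False):
    """Inner-join two key-sorted sources with unique keys; returns matched keys."""
    out = []
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        if l == r:
            out.append(l)
            l = next(left_it, None)
            r = next(right_it, None)
        elif l < r:
            l = next(left_it, None)
        else:
            r = next(right_it, None)
    if eager_release:
        # No further output is possible, so free both sides immediately.
        left.release()
        right.release()
    return out


left, right = SpillableSource([1, 2]), SpillableSource([1, 2, 3, 4])
matched = merge_join(left, right)
# The left side was drained, but the right side still holds rows 3 and 4.
leak_observed = left.released and not right.released

left2, right2 = SpillableSource([1, 2]), SpillableSource([1, 2, 3, 4])
merge_join(left2, right2, eager_release=True)
fixed = right2.released
```

The `eager_release` flag stands in for the general idea of releasing resources as soon as the join knows it is finished, instead of waiting for a later hook.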

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18694 **[Test build #79818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79818/testReport)** for PR 18694 at commit

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread zhzhan

Github user zhzhan commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128679491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79817/testReport)** for PR 18468 at commit

[GitHub] spark pull request #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector...

2017-07-20 Thread viirya

Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18468#discussion_r128677320 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/CachedBatchColumnVector.java --- @@ -0,0 +1,259 @@ +/* + * Licensed to

[GitHub] spark pull request #18688: delete the superfluous ‘ symbol in rdd-programm...

2017-07-20 Thread wangyangting

Github user wangyangting closed the pull request at: https://github.com/apache/spark/pull/18688

[GitHub] spark issue #18688: delete the superfluous ‘ symbol in rdd-programming-gui...

2017-07-20 Thread wangyangting

Github user wangyangting commented on the issue: https://github.com/apache/spark/pull/18688 You're right, but I think the use of "*ByKey" will be better.

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18694 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79813/ Test PASSed. ---

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18694 Merged build finished. Test PASSed.

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18694 **[Test build #79813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79813/testReport)** for PR 18694 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #79816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79816/testReport)** for PR 16578 at commit

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79812/ Test PASSed. ---

[GitHub] spark issue #18589: [SPARK-16872][ML] Add Gaussian NB

2017-07-20 Thread zhengruifeng

Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/18589 @MLnick Sorry for the late reply. It has been a long time since I got the last comments in the previous PR https://github.com/apache/spark/pull/15324, so I thought that the community may dislike that

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15435 Merged build finished. Test PASSed.

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #79812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79812/testReport)** for PR 15435 at commit

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread kiszk

Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468 ping @cloud-fan, @ueshin

[GitHub] spark issue #18658: [SPARK-20871][SQL] limit logging of Janino code

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18658 When the size is too large, which exception will be thrown? Truncate for all the cases? or just when the size is too large?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #79815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79815/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79815/ Test FAILed. ---

[GitHub] spark pull request #18658: [SPARK-20871][SQL] limit logging of Janino code

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18658#discussion_r128675175 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1037,25 +1037,25 @@ object

[GitHub] spark pull request #18658: [SPARK-20871][SQL] limit logging of Janino code

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18658#discussion_r128675159 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1037,25 +1037,25 @@ object

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #79815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79815/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread mallman

Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I just pushed a revision of `SelectedField.scala`. Let's see what Jenkins says. I expect it to pass, and assuming it does I will return the ball to the reviewers' court.

[GitHub] spark pull request #3658: [SPARK-4809] Rework Guava library shading.

2017-07-20 Thread caneGuy

Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/3658#discussion_r128674205 --- Diff: streaming/pom.xml --- @@ -105,6 +105,14 @@ + + +org.apache.maven.plugins +

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79811/ Test PASSed. ---

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18468 Merged build finished. Test PASSed.

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79811 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79811/testReport)** for PR 18468 at commit

[GitHub] spark issue #18559: [SPARK-21335][SQL] support un-aliased subquery

2017-07-20 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/18559 Have we highlighted this in release notes?

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread viirya

Github user viirya commented on the issue: https://github.com/apache/spark/pull/18694 Can you show some experiments that indicate there's a memory leak?

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread viirya

Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128671222 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128668866 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -130,460 +138,507 @@ class UDFRegistration private[sql]

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128668731 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -130,460 +138,507 @@ class UDFRegistration private[sql]

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128668538 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -3185,170 +3185,207 @@ object functions { val inputTypes = (1 to

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128668434 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -130,460 +138,507 @@ class UDFRegistration private[sql]

[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128668137 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -443,12 +459,57 @@ final class

[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128668114 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -443,12 +459,57 @@ final class

[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128668000 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -375,6 +390,7 @@ final class

[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18487#discussion_r128667967 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -321,6 +321,17 @@ package object config { .intConf

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128667716 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark issue #18692: [SPARK-21417][SQL] Detect joind conditions via filter ex...

2017-07-20 Thread cloud-fan

Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18692 I think we already did it via constraint propagation, didn't we?

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18618 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79814/ Test PASSed. ---

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18618 Merged build finished. Test PASSed.

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18618 **[Test build #79814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79814/testReport)** for PR 18618 at commit

[GitHub] spark issue #18388: [SPARK-21175] Reject OpenBlocks when memory shortage on ...

2017-07-20 Thread jinxing64

Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18388 @cloud-fan I understand your concern. A `TransportRequestHandler` is for a channel/connection. We want to track the sending chunks of all connections. So I guess we must have a manager for
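A rough sketch of the design question raised here, in plain Python and not taken from the pull request: if each handler serves a single connection, a limit that spans all connections needs one shared object. `ChunkTracker`, `ConnectionHandler`, and the `max_in_flight` limit are hypothetical names used only for illustration.

```python
import threading


class ChunkTracker:
    """Hypothetical shared manager counting chunks in flight across all connections."""

    def __init__(self, max_in_flight):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._max = max_in_flight

    def try_acquire(self, n):
        # Admit the request only if the global budget is not exceeded.
        with self._lock:
            if self._in_flight + n > self._max:
                return False
            self._in_flight += n
            return True

    def release(self, n):
        with self._lock:
            self._in_flight -= n


class ConnectionHandler:
    """Hypothetical per-connection handler; every handler shares one tracker."""

    def __init__(self, tracker):
        self._tracker = tracker

    def open_blocks(self, num_chunks):
        # Returns False when the request should be rejected.
        return self._tracker.try_acquire(num_chunks)


tracker = ChunkTracker(max_in_flight=3)
conn_a, conn_b = ConnectionHandler(tracker), ConnectionHandler(tracker)
first = conn_a.open_blocks(2)     # fits within the shared budget
second = conn_b.open_blocks(2)    # would exceed the budget across connections
tracker.release(2)                # connection A finished sending
retry = conn_b.open_blocks(2)     # now fits
```

A counter kept inside each handler would see only its own connection, which is the limitation the comment points at.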

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18618 **[Test build #79814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79814/testReport)** for PR 18618 at commit

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18694 **[Test build #79813 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79813/testReport)** for PR 18694 at commit

[GitHub] spark pull request #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames i...

2017-07-20 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18618#discussion_r128664429 --- Diff: python/pyspark/sql/types.py --- @@ -445,9 +445,12 @@ class StructType(DataType): This is the data type representing a

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18694 What is the perf impact and how large is it?

[GitHub] spark issue #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18694 retest this please

[GitHub] spark issue #18618: [SPARK-20090][PYTHON] Add StructType.fieldNames in PySpa...

2017-07-20 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18618 @holdenk, sure, makes sense, but let me just leave a deprecation note for `StructType.names` if you are okay with it too (at least I use this a lot in production code ...).

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128664006 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -947,6 +967,10 @@ private class

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128663898 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -649,6 +660,11 @@ private[joins] class

[GitHub] spark pull request #18694: [SPARK-21492][SQL] Memory leak in SortMergeJoin

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18694#discussion_r128663906 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -604,6 +609,12 @@ case class SortMergeJoinExec(

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15435 **[Test build #79812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79812/testReport)** for PR 15435 at commit

[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

2017-07-20 Thread SparkQA

Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18468 **[Test build #79811 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79811/testReport)** for PR 18468 at commit

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 jenkins, test this please.

[GitHub] spark issue #18605: [SparkR][SPARK-21381]:SparkR: pass on setHandleInvalid f...

2017-07-20 Thread wangmiao1981

Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/18605 @yanboliang I have made changes accordingly. Thanks!

[GitHub] spark issue #18695: [SPARK-12717][PYTHON] Adding thread-safe broadcast pickl...

2017-07-20 Thread BryanCutler

Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18695 ping @holdenk @davies , does this fix look ok to you? Thanks!

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128657154 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -130,460 +138,507 @@ class UDFRegistration private[sql]

[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-20 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17848#discussion_r128657173 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -79,8 +79,15 @@ class UDFRegistration private[sql]

[GitHub] spark issue #18683: [SPARK-21474][CORE] Make number of parallel fetches from...

2017-07-20 Thread jerryshao

Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18683 Did you see a measurable performance change when changing this parameter? I would expect the number of Netty client threads for shuffle to remain a bottleneck even if you split the work into more fetch requests.

[GitHub] spark issue #18690: [SPARK-21334][CORE] Add metrics reporting service to Ext...

2017-07-20 Thread jerryshao

Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/18690 I think it is because you haven't configured a metrics sink here, so by default none of the metrics are reported. For example, if I enabled jmx sink

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2017-07-20 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/15435 cc @jkbradley I think it's OK now.

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-07-20 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r128648171 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -271,7 +273,7 @@ object OneVsRestModel extends

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin

Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128646482 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin

Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128646011 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -18,8 +18,17 @@ package org.apache.spark.deploy.worker

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin

Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128648684 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin

Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128648242 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin

Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128646523 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin

Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128647444 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -43,7 +52,7 @@ object DriverWrapper {

[GitHub] spark pull request #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for st...

2017-07-20 Thread vanzin

Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/18630#discussion_r128648079 --- Diff: core/src/main/scala/org/apache/spark/deploy/worker/DriverWrapper.scala --- @@ -66,4 +77,68 @@ object DriverWrapper { System.exit(-1)

[GitHub] spark issue #17373: [SPARK-12664] Expose probability in mlp model

2017-07-20 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17373 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79806/
