[GitHub] spark pull request #17544: [SPARK-20231] [SQL] Refactor star schema code for...

2017-04-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17544#discussion_r110031650 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala --- @@ -20,339 +20,13 @@ package org.apache.spark.sql.catalyst

[GitHub] spark issue #14617: [SPARK-17019][Core] Expose on-heap and off-heap memory u...

2017-04-05 Thread jsoltren
Github user jsoltren commented on the issue: https://github.com/apache/spark/pull/14617 This looks good to me. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17531 Merged build finished. Test FAILed.

[GitHub] spark issue #17532: [SPARK-20214][ML] Make sure converted csc matrix has sor...

2017-04-05 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17532 Btw, I'd really like to get this into 2.2, which will be cut soon. Let me know if you'd like me to take it over. Thanks!

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17531 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75553/ Test FAILed. ---

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17531 **[Test build #75553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75553/testReport)** for PR 17531 at commit [`42f49f2`](https://github.com/apache/spark/commit/4

[GitHub] spark issue #17537: [SPARK-20204][SQL][Followup] SQLConf should react to cha...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17537 **[Test build #75558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75558/testReport)** for PR 17537 at commit [`1fb23cf`](https://github.com/apache/spark/commit/1f

[GitHub] spark pull request #17544: [SPARK-20231] [SQL] Refactor star schema code for...

2017-04-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/17544#discussion_r110029591 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/StarSchemaDetection.scala --- @@ -0,0 +1,351 @@ +/* + * Licensed t

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17531 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75552/ Test FAILed. ---

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17531 Merged build finished. Test FAILed.

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17531 **[Test build #75552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75552/testReport)** for PR 17531 at commit [`28ebc94`](https://github.com/apache/spark/commit/2

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/17531 I will leave it around a bit in case @JoshRosen has any further comments. Feel free to merge btw if you don't!

[GitHub] spark issue #17544: [SPARK-20231] [SQL] Refactor star schema code for the su...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17544 **[Test build #75557 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75557/testReport)** for PR 17544 at commit [`99732ff`](https://github.com/apache/spark/commit/99

[GitHub] spark pull request #17531: [SPARK-20217][core] Executor should not fail stag...

2017-04-05 Thread mridulm
Github user mridulm commented on a diff in the pull request: https://github.com/apache/spark/pull/17531#discussion_r110028203 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,7 +432,7 @@ private[spark] class Executor( setTaskFinishedA

[GitHub] spark issue #17544: [SPARK-20231] [SQL] Refactor star schema code for the su...

2017-04-05 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17544 ok to test

[GitHub] spark issue #17544: [SPARK-20231] [SQL] Refactor star schema code for the su...

2017-04-05 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/17544 add to whitelist

[GitHub] spark issue #17543: [SPARK-20230] FetchFailedExceptions should invalidate fi...

2017-04-05 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/17543 That JIRA is great. I'll close this PR for now and link my JIRA in there.

[GitHub] spark pull request #17543: [SPARK-20230] FetchFailedExceptions should invali...

2017-04-05 Thread brkyvz
Github user brkyvz closed the pull request at: https://github.com/apache/spark/pull/17543

[GitHub] spark issue #17543: [SPARK-20230] FetchFailedExceptions should invalidate fi...

2017-04-05 Thread kayousterhout
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/17543 In theory (as you may know), the way this is supposed to work is that, since each reduce task reads the map outputs in random order, we delay re-scheduling the earlier stage, to try to collect

[GitHub] spark issue #17543: [SPARK-20230] FetchFailedExceptions should invalidate fi...

2017-04-05 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/17543 Let me try to draw a graph to better explain this.

[GitHub] spark issue #17543: [SPARK-20230] FetchFailedExceptions should invalidate fi...

2017-04-05 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/17543 Yes, your explanation is on point. If I have 4+ executors that died, then all retries of Stage B will also eventually fail. If we didn't ignore these failures, we could have re-computed the outputs o

[GitHub] spark pull request #17543: [SPARK-20230] FetchFailedExceptions should invali...

2017-04-05 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/17543#discussion_r110014380 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1281,10 +1281,24 @@ class DAGScheduler( val failedSta

[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r110013198 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/broadcastMode.scala --- @@ -26,10 +26,7 @@ import org.apache.spark.sql.cata

[GitHub] spark issue #17545: [SPARK-20232][Python] Improve combineByKey docs

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17545 Can one of the admins verify this patch?

[GitHub] spark pull request #17545: [SPARK-20232][Python] Improve combineByKey docs

2017-04-05 Thread dgingrich
GitHub user dgingrich opened a pull request: https://github.com/apache/spark/pull/17545 [SPARK-20232][Python] Improve combineByKey docs ## What changes were proposed in this pull request? Improve combineByKey documentation: * Add note on memory allocation * Chan

[GitHub] spark pull request #17543: [SPARK-20230] FetchFailedExceptions should invali...

2017-04-05 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/17543#discussion_r110011464 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1281,10 +1281,24 @@ class DAGScheduler( val failedStage = st

[GitHub] spark pull request #17543: [SPARK-20230] FetchFailedExceptions should invali...

2017-04-05 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/17543#discussion_r110011422 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1281,10 +1281,24 @@ class DAGScheduler( val failedStage = st

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 I'm sorry but you won't convince me that it's a useful feature to have Spark be a big security hole when inserted into a kerberos environment. As I said, if you want to change your approach t

[GitHub] spark issue #17544: [SPARK-20231] [SQL] Refactor star schema code for the su...

2017-04-05 Thread ioana-delaney
Github user ioana-delaney commented on the issue: https://github.com/apache/spark/pull/17544 @gatorsmile I did a small refactoring for star schema. Would you please review? Thank you.

[GitHub] spark issue #17539: [SPARK-20224][SS] Updated docs for streaming dropDuplica...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17539 **[Test build #75556 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75556/testReport)** for PR 17539 at commit [`c78ebe8`](https://github.com/apache/spark/commit/c7

[GitHub] spark issue #17544: [SPARK-20231] [SQL] Refactor star schema code for the su...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17544 Can one of the admins verify this patch?

[GitHub] spark pull request #17543: [SPARK-20230] FetchFailedExceptions should invali...

2017-04-05 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/17543#discussion_r110010527 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1281,10 +1281,24 @@ class DAGScheduler( val failedSta

[GitHub] spark pull request #17544: [SPARK-20231] [SQL] Refactor star schema code for...

2017-04-05 Thread ioana-delaney
GitHub user ioana-delaney opened a pull request: https://github.com/apache/spark/pull/17544 [SPARK-20231] [SQL] Refactor star schema code for the subsequent star join detection in CBO ## What changes were proposed in this pull request? This commit moves star schema code fro

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 Said another way, people need another layer to use spark standalone in secured environments anyway.

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 To me it's basically the same as users including S3 credentials when submitting to spark standalone. Kerberos just requires more machinery. It might be a little harder to get at the spark conf

[GitHub] spark pull request #17543: [SPARK-20230] FetchFailedExceptions should invali...

2017-04-05 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request: https://github.com/apache/spark/pull/17543#discussion_r110009919 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1281,10 +1281,24 @@ class DAGScheduler( val failedSta

[GitHub] spark issue #17512: [SPARK-20196][PYTHON][SQL] update doc for catalog functi...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17512 **[Test build #7 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/7/testReport)** for PR 17512 at commit [`5ed1950`](https://github.com/apache/spark/commit/5e

[GitHub] spark issue #17523: [SPARK-20064][PySpark] Bump the PySpark verison number t...

2017-04-05 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17523 In _theory_ I can command Jenkins (the pr-status board says "asked to test" rather than "admin needed" but idk). Let's see if "Jenkins retest this please" does anything.

[GitHub] spark pull request #17512: [SPARK-20196][PYTHON][SQL] update doc for catalog...

2017-04-05 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17512#discussion_r110009267 --- Diff: R/pkg/R/SQLContext.R --- @@ -544,12 +544,15 @@ sql <- function(x, ...) { dispatchFunc("sql(sqlQuery)", x, ...) } -#' Crea

[GitHub] spark issue #17523: [SPARK-20064][PySpark] Bump the PySpark verison number t...

2017-04-05 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17523 probably need someone who can command Jenkins (I can't)

[GitHub] spark issue #17296: [SPARK-19953][ML] Random Forest Models use parent UID wh...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17296 Merged build finished. Test PASSed.

[GitHub] spark issue #17296: [SPARK-19953][ML] Random Forest Models use parent UID wh...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17296 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75551/ Test PASSed. ---

[GitHub] spark issue #17296: [SPARK-19953][ML] Random Forest Models use parent UID wh...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17296 **[Test build #75551 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75551/testReport)** for PR 17296 at commit [`7c7ce13`](https://github.com/apache/spark/commit/7

[GitHub] spark pull request #17413: [SPARK-20085] [MESOS] Configurable mesos labels f...

2017-04-05 Thread kalvinnchau
Github user kalvinnchau commented on a diff in the pull request: https://github.com/apache/spark/pull/17413#discussion_r110004837 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -422,6 +431,2

[GitHub] spark pull request #17413: [SPARK-20085] [MESOS] Configurable mesos labels f...

2017-04-05 Thread kalvinnchau
Github user kalvinnchau commented on a diff in the pull request: https://github.com/apache/spark/pull/17413#discussion_r110004474 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -422,6 +431,2

[GitHub] spark issue #17543: [SPARK-20230] FetchFailedExceptions should invalidate fi...

2017-04-05 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/17543 cc @kayousterhout @markhamstra for feedback.

[GitHub] spark issue #17543: [SPARK-20230] FetchFailedExceptions should invalidate fi...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17543 **[Test build #75554 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75554/testReport)** for PR 17543 at commit [`bc80aab`](https://github.com/apache/spark/commit/bc

[GitHub] spark pull request #17543: [SPARK-20230] FetchFailedExceptions should invali...

2017-04-05 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/17543 [SPARK-20230] FetchFailedExceptions should invalidate file caches in MapOutputTracker even if newer stages are launched ## What changes were proposed in this pull request? If you lose insta

[GitHub] spark pull request #17413: [SPARK-20085] [MESOS] Configurable mesos labels f...

2017-04-05 Thread kalvinnchau
Github user kalvinnchau commented on a diff in the pull request: https://github.com/apache/spark/pull/17413#discussion_r110003294 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -67,6 +67,8 @

[GitHub] spark pull request #17413: [SPARK-20085] [MESOS] Configurable mesos labels f...

2017-04-05 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17413#discussion_r110002201 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -422,6 +431,21 @@

[GitHub] spark pull request #17413: [SPARK-20085] [MESOS] Configurable mesos labels f...

2017-04-05 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17413#discussion_r110002396 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -67,6 +67,8 @@ pri

[GitHub] spark pull request #17413: [SPARK-20085] [MESOS] Configurable mesos labels f...

2017-04-05 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/17413#discussion_r110002068 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -422,6 +431,21 @@

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17531 **[Test build #75553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75553/testReport)** for PR 17531 at commit [`42f49f2`](https://github.com/apache/spark/commit/42

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 That is not what your change does, though. If you want to change the master / worker scripts to refresh kerberos credentials, that would be a lot more acceptable. This change is just not acce

[GitHub] spark issue #17413: [SPARK-20085] [MESOS] Configurable mesos labels for exec...

2017-04-05 Thread mgummelt
Github user mgummelt commented on the issue: https://github.com/apache/spark/pull/17413 LGTM. @srowen Can we please get a merge? Thanks.

[GitHub] spark pull request #17413: [SPARK-20085] [MESOS] Configurable mesos labels f...

2017-04-05 Thread kalvinnchau
Github user kalvinnchau commented on a diff in the pull request: https://github.com/apache/spark/pull/17413#discussion_r10663 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -408,6 +410,2

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 That's right, but you still need a separate out of band process refreshing with the KDC. My thinking is why not have spark do that on your behalf?

[GitHub] spark issue #17531: [SPARK-20217][core] Executor should not fail stage if ki...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17531 **[Test build #75552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75552/testReport)** for PR 17531 at commit [`28ebc94`](https://github.com/apache/spark/commit/28

[GitHub] spark pull request #17531: [SPARK-20217][core] Executor should not fail stag...

2017-04-05 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/17531#discussion_r109998390 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,7 +432,7 @@ private[spark] class Executor( setTaskFinishedAnd

[GitHub] spark pull request #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace impr...

2017-04-05 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16793

[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

2017-04-05 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16793 Merged to master

[GitHub] spark issue #16793: [SPARK-19454][PYTHON][SQL] DataFrame.replace improvement...

2017-04-05 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/16793 LGTM

[GitHub] spark issue #17533: [SPARK-20219] Schedule tasks based on size of input from...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17533 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75547/ Test FAILed. ---

[GitHub] spark issue #17533: [SPARK-20219] Schedule tasks based on size of input from...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17533 **[Test build #75547 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75547/testReport)** for PR 17533 at commit [`b0c3abc`](https://github.com/apache/spark/commit/b

[GitHub] spark issue #17533: [SPARK-20219] Schedule tasks based on size of input from...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17533 Merged build finished. Test FAILed.

[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-05 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17469 Jenkins OK to test.

[GitHub] spark issue #17523: [SPARK-20064][PySpark] Bump the PySpark verison number t...

2017-04-05 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/17523 not sure why Jenkins hasn't triggered. Let's try Jenkins Test This Please

[GitHub] spark issue #17296: [SPARK-19953][ML] Random Forest Models use parent UID wh...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17296 **[Test build #75551 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75551/testReport)** for PR 17296 at commit [`7c7ce13`](https://github.com/apache/spark/commit/7c

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17540 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75549/ Test FAILed. ---

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17540 Merged build finished. Test FAILed.

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17540 **[Test build #75549 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75549/testReport)** for PR 17540 at commit [`f9342b5`](https://github.com/apache/spark/commit/f

[GitHub] spark pull request #17413: [SPARK-20085] [MESOS] Configurable mesos labels f...

2017-04-05 Thread mgummelt
Github user mgummelt commented on a diff in the pull request: https://github.com/apache/spark/pull/17413#discussion_r109991082 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -408,6 +410,22 @

[GitHub] spark issue #17537: [SPARK-20204][SQL][Followup] SQLConf should react to cha...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17537 Merged build finished. Test FAILed.

[GitHub] spark issue #17537: [SPARK-20204][SQL][Followup] SQLConf should react to cha...

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17537 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75548/ Test FAILed. ---

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/17530 Then in your setup you can configure things so that the cluster already has the user's keytab; Spark doesn't need to distribute it for you.

[GitHub] spark issue #17537: [SPARK-20204][SQL][Followup] SQLConf should react to cha...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17537 **[Test build #75548 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75548/testReport)** for PR 17537 at commit [`9187cca`](https://github.com/apache/spark/commit/9

[GitHub] spark issue #17542: Corrects interval notation in doc comment

2017-04-05 Thread w3iBStime
Github user w3iBStime commented on the issue: https://github.com/apache/spark/pull/17542 I don't have a local clone of the code; I just used the GitHub web UI for a small change. Unfortunately, it doesn't look like GitHub can search for "1.0]".

[GitHub] spark pull request #17413: [SPARK-20085] [MESOS] Configurable mesos labels f...

2017-04-05 Thread mgummelt
Github user mgummelt commented on a diff in the pull request: https://github.com/apache/spark/pull/17413#discussion_r109989797 --- Diff: docs/running-on-mesos.md --- @@ -368,6 +368,15 @@ See the [configuration page](configuration.html) for information on Spark config

[GitHub] spark issue #17469: [SPARK-20132][Docs] Add documentation for column string ...

2017-04-05 Thread map222
Github user map222 commented on the issue: https://github.com/apache/spark/pull/17469 I think the latest commit addresses the formatting issues from above: removed spaces inside `(..)`, removed the `\n` newlines, and made the blockquotes more consistent with the rest of the code.

[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17541 **[Test build #75550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75550/testReport)** for PR 17541 at commit [`02f4a02`](https://github.com/apache/spark/commit/0

[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17541 Merged build finished. Test FAILed.

[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17541 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75550/ Test FAILed.

[GitHub] spark pull request #17469: [SPARK-20132][Docs] Add documentation for column ...

2017-04-05 Thread map222
Github user map222 commented on a diff in the pull request: https://github.com/apache/spark/pull/17469#discussion_r109989329 --- Diff: python/pyspark/sql/column.py --- @@ -250,11 +250,39 @@ def __iter__(self): raise TypeError("Column is not iterable") # s

[GitHub] spark issue #17530: [SPARK-5158] Access kerberized HDFS from Spark standalon...

2017-04-05 Thread themodernlife
Github user themodernlife commented on the issue: https://github.com/apache/spark/pull/17530 In our setup each user gets their own standalone cluster. Users cannot submit jobs to each other's clusters. By providing a keytab on cluster creation and having Spark manage renewal on behalf

[GitHub] spark issue #17542: Corrects interval notation in doc comment

2017-04-05 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17542 That's fine, can you look for other instances?

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-05 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/17540 @srowen, agreed. Closely related but not the same code paths. The question is: when should `withNewExecutionId` get called? I'm running the test suite now and this patch causes test failures

[GitHub] spark issue #17542: Corrects interval notation in doc comment

2017-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17542 Can one of the admins verify this patch?

[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r109986108 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala --- @@ -98,7 +98,7 @@ class ExchangeSuite extends SparkPlanTest with

[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r109986073 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/ExchangeSuite.scala --- @@ -70,7 +70,7 @@ class ExchangeSuite extends SparkPlanTest with

[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17541#discussion_r109985386 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -748,7 +748,7 @@ object SQLConf { .doubleConf .

[GitHub] spark pull request #17542: Corrects interval notation in doc comment

2017-04-05 Thread w3iBStime
GitHub user w3iBStime opened a pull request: https://github.com/apache/spark/pull/17542 Corrects interval notation in doc comment The random number generated by XORShiftRandom.nextDouble() is a value between zero and one, including zero but not including one. I.e., 0 <= x < 1. I'v

[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17541 **[Test build #75550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75550/testReport)** for PR 17541 at commit [`02f4a02`](https://github.com/apache/spark/commit/02

[GitHub] spark issue #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17541 cc @rxin @gatorsmile

[GitHub] spark pull request #17541: [SPARK-20229][SQL] add semanticHash to QueryPlan

2017-04-05 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/17541 [SPARK-20229][SQL] add semanticHash to QueryPlan ## What changes were proposed in this pull request? Like `Expression`, `QueryPlan` should also have a `semanticHash` method, then we can

[GitHub] spark issue #17535: [SPARK-20222][SQL] Bring back the Spark SQL UI when exec...

2017-04-05 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17535 Looks closely related to https://github.com/apache/spark/pull/17540 ?

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-05 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17540 Looks closely related to https://github.com/apache/spark/pull/17535 ?

[GitHub] spark issue #17471: [SPARK-3577] Report Spill size on disk for UnsafeExterna...

2017-04-05 Thread sitalkedia
Github user sitalkedia commented on the issue: https://github.com/apache/spark/pull/17471 I did not see that we already have an open PR for this. Sure, I will add a test to this PR and also file a separate JIRA.

[GitHub] spark pull request #17531: [SPARK-20217][core] Executor should not fail stag...

2017-04-05 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/17531#discussion_r109980881 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,7 +432,7 @@ private[spark] class Executor( setTaskFinishe

[GitHub] spark issue #17540: [SPARK-20213][SQL][UI] Fix DataFrameWriter operations in...

2017-04-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17540 **[Test build #75549 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75549/testReport)** for PR 17540 at commit [`f9342b5`](https://github.com/apache/spark/commit/f9

[GitHub] spark pull request #17531: [SPARK-20217][core] Executor should not fail stag...

2017-04-05 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/17531#discussion_r109979571 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -432,7 +432,7 @@ private[spark] class Executor( setTaskFinishe
