[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-16 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11175


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-16 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-185044331
  
@kayousterhout did you get some time to look into this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-15 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-184129148
  
@kayousterhout I'll let you push the button on this one when you're sure 
but it looks OK to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-183897497
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-183897498
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51266/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-183896933
  
**[Test build #51266 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51266/consoleFull)**
 for PR 11175 at commit 
[`26b4370`](https://github.com/apache/spark/commit/26b43706ad5e0ca0fb84a9af0b089383db0b02a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-14 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-183877104
  
Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-183880185
  
**[Test build #51266 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51266/consoleFull)**
 for PR 11175 at commit 
[`26b4370`](https://github.com/apache/spark/commit/26b43706ad5e0ca0fb84a9af0b089383db0b02a6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-182778235
  
I don't quite see why this solves a lock problem. Should this be a set?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11167#discussion_r52653078
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -177,13 +177,15 @@ private[spark] class TaskSetManager(
 
   var emittedTaskSizeWarning = false
 
-  /** Add a task to all the pending-task lists that it should be on. */
+  /**
+* Add a task to all the pending-task lists that it should be on.
+* Note that it's okay if we add a task to the same queue twice because
+* dequeueTaskFromList will skip already-running tasks.
--- End diff --

Can you move your new comment to the comment above pendingTasksForExecutor? 
  I'd change that comment to say something like: 

// Set of pending tasks for each executor. These collections are actually
// treated as stacks, in which new tasks are added to the end of the
// ArrayBuffer and removed from the end. This makes it faster to detect
// tasks that repeatedly fail because whenever a task failed, it is put
// back at the head of the stack. These collections may contain duplicates
// for two reasons:
// (1) : Tasks are only removed lazily; when a task is launched, it remains 
in all the pending lists except
// the one that it was launched from.
// (2): Tasks may be re-added to these lists multiple times as a result
// of failures.
// Duplicates are handled in dequeueTaskFromList, which ensures that a
// task hasn't already started running before launching it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183010519
  
OK, but the JIRA describes a deadlock, or at least that's how it reads.
@kayousterhout does that sound right? it sounds OK.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183011444
  
Sorry, I did not make it clear. Its not a deadlock, the lock is held for a 
very long time. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183034458
  
Thanks @kayousterhout. I updated the comment accordingly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183022065
  
Thanks for fixing this! You're right that the change you linked to is 
broken; when I removed the readding parameter, I should have killed the 
if-statement. I updated the JIRA description to describe this problem somewhat 
better and link to the offending change.

This should be merged into 1.6 when it's ready since it's a regression.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183087398
  
No problem. Created a new pull request 
https://github.com/apache/spark/pull/11175


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/11167#discussion_r52690994
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,9 +186,7 @@ private[spark] class TaskSetManager(
   private def addPendingTask(index: Int) {
 // Utility method that adds `index` to a list only if it's not already 
there
 def addTo(list: ArrayBuffer[Int]) {
-  if (!list.contains(index)) {
-list += index
-  }
+  list += index
 }
--- End diff --

Sorry for not getting to this party until after the merge, but it strikes 
me that `addTo` is now only making a negative contribution toward understanding 
the code -- mostly because the "...if it's not already there" comment is now 
wrong.  I don't see why the handful of uses of `addTo` shouldn't just be 
replaced with, e.g.:
```scala
pendingTasksForExecutor.getOrElseUpdate(e.executorId, new ArrayBuffer) += 
index
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread sitalkedia
Github user sitalkedia commented on a diff in the pull request:

https://github.com/apache/spark/pull/11167#discussion_r52692951
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,9 +186,7 @@ private[spark] class TaskSetManager(
   private def addPendingTask(index: Int) {
 // Utility method that adds `index` to a list only if it's not already 
there
 def addTo(list: ArrayBuffer[Int]) {
-  if (!list.contains(index)) {
-list += index
-  }
+  list += index
 }
--- End diff --

Updated the PR : #11175 according to @markhamstra 's comment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-183126342
  
Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread kayousterhout
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/11167#discussion_r52691236
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,9 +186,7 @@ private[spark] class TaskSetManager(
   private def addPendingTask(index: Int) {
 // Utility method that adds `index` to a list only if it's not already 
there
 def addTo(list: ArrayBuffer[Int]) {
-  if (!list.contains(index)) {
-list += index
-  }
+  list += index
 }
--- End diff --

You're in luck because I accidentally merged before Jenkins got to it, so 
we reverted it pending tests. There's a new PR here: 
https://github.com/apache/spark/pull/11175

Would you mind sticking this comment there? Agree with your sentiment.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/11175#discussion_r52692010
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
@@ -181,9 +186,7 @@ private[spark] class TaskSetManager(
   private def addPendingTask(index: Int) {
 // Utility method that adds `index` to a list only if it's not already 
there
 def addTo(list: ArrayBuffer[Int]) {
-  if (!list.contains(index)) {
-list += index
-  }
+  list += index
 }
--- End diff --

It strikes me that `addTo` is now only making a negative contribution 
toward understanding the code -- mostly because the "...if it's not already 
there" comment is now wrong. I don't see why the handful of uses of `addTo` 
shouldn't just be replaced with, e.g.:
```scala
pendingTasksForExecutor.getOrElseUpdate(e.executorId, new ArrayBuffer) += 
index
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183066243
  
Jenkins, test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183081536
  
@sitalkedia can you re-open this? (It was closed automatically when 
inadventently merged)  I don't think we can test it unless it's open


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183082872
  
@kayousterhout I am not seeing an option to reopen this pull request. 
Should I create a new pull request?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183083738
  
That would be great thanks!! Sorry for the trouble caused by my rushed 
merging!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread sitalkedia
GitHub user sitalkedia opened a pull request:

https://github.com/apache/spark/pull/11175

[SPARK-13279] Remove unnecessary duplicate check in addPendingTask fu…

…nction

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sitalkedia/spark fix_stuck_driver

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11175


commit 3fe1af8fd9e7e5807a3767919407d32d50fc6313
Author: Sital Kedia 
Date:   2016-02-11T05:58:52Z

[SPARK-13279] Remove unnecessary duplicate check in addPendingTask function




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183066743
  
I reverted this since it hasn't passed tests yet.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183066629
  
I just merged this but then realized Jenkins hadn't run yet... reverted the 
pushed version but will re-merge when Jenkins passes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11167


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread kayousterhout
Github user kayousterhout commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-183087801
  
Jenkins this is ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11175#issuecomment-183087708
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-11 Thread sitalkedia
Github user sitalkedia commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-183006784
  
As you can see from the jstack of the driver  http://pastebin.com/m8CP6VMv. 
The dag-scheduler-event-loop thread has taken a lock and is spending a lot of 
time in the addPendingTask function. For each task added, It is iterating over 
the list of tasks to check for duplicates. Which becomes an o(n2) operation and 
when the number of tasks is huge, it takes more than 5 minutes. 

As mentioned in the comment, the addPendingTask function does not really 
need to check for duplicates because dequeueTaskFromList will skip already 
running tasks. If we remove the duplicate check from addPendingTask function, 
then the time period for which the lock is held is very short and things are 
working fine.  We can not make this as a set because we treat the list of 
pending tasks as a stack, please see 
https://github.com/sitalkedia/spark/blob/fix_stuck_driver/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L113
 . 

Please note that this is a regression from Spark 1.5 and is introduced in 
https://github.com/facebook/FB-Spark/commit/3535b91ddc9fd05b613a121e09263b0f378bd5fa#diff-bad3987c83bd22d46416d3dd9d208e76L789






---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-182732313
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-10 Thread sitalkedia
GitHub user sitalkedia opened a pull request:

https://github.com/apache/spark/pull/11167

[SPARK-13279] Remove unnecessary duplicate check in addPendingTask fu…

…nction

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sitalkedia/spark fix_stuck_driver

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11167.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11167


commit c3068010427fc9a6d9f6110ec39f9caf08c10a5e
Author: Sital Kedia 
Date:   2016-02-11T05:58:52Z

[SPARK-13279] Remove unnecessary duplicate check in addPendingTask function




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13279] Remove unnecessary duplicate che...

2016-02-10 Thread aching
Github user aching commented on the pull request:

https://github.com/apache/spark/pull/11167#issuecomment-182736721
  
lgtm


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org