[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-26 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225899#comment-14225899
 ] 

Sean Owen commented on SPARK-4584:
--

A lot of JDK code consults the {{SecurityManager}}, if one is 
installed:

{code}
SecurityManager sm = System.getSecurityManager();
if (sm != null) {
  ...
}
{code}

Setting any {{SecurityManager}} turns on a whole lot of not-cheap permission 
checks throughout the JDK, so I think setting one is pretty undesirable from a 
performance perspective. It also precludes enabling a real 
{{SecurityManager}} in contexts that want one, although I find it unlikely 
that would ever really work with Spark.

How about just documenting this and telling users not to call 
{{System.exit}} in their code, which is widely accepted as a no-no in 
Java/Scala anyway?

 2x Performance regression for Spark-on-YARN
 ---

 Key: SPARK-4584
 URL: https://issues.apache.org/jira/browse/SPARK-4584
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.2.0
Reporter: Nishkam Ravi
Assignee: Marcelo Vanzin
Priority: Blocker

 Significant performance regression observed for Spark-on-YARN (up to 2x) after 
 1.2 rebase. The offending commit is 70e824f750aa8ed446eec104ba158b0503ba58a9 
 from Oct 7th. The problem can be reproduced with JavaWordCount against a large 
 enough input dataset in YARN cluster mode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-26 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226506#comment-14226506
 ] 

Marcelo Vanzin commented on SPARK-4584:
---

[~sowen] that was going to be my suggestion. For 1.2, I'll just remove the 
security manager and declare "use System.exit() at your own risk"; the 
behavior should then be the same as 1.1. Post 1.2, we could add some new 
exception (e.g. {{SparkAppException}}) that users can throw if they want the 
runtime to exit with a specific error code. But I don't think even that is 
strictly necessary, just a nice-to-have.
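
A minimal sketch of how such an exception could carry an exit code, assuming a 
wrapper around the user's main method. Only the name {{SparkAppException}} 
comes from the comment above; the constructor, {{getExitCode}}, the 
{{AppRunner}} helper and the fallback code 1 are hypothetical and not part of 
Spark.

```java
// Hypothetical sketch: neither class below is part of Spark's API.
final class SparkAppException extends RuntimeException {
    private final int exitCode;

    SparkAppException(String message, int exitCode) {
        super(message);
        this.exitCode = exitCode;
    }

    int getExitCode() {
        return exitCode;
    }
}

final class AppRunner {
    // Runs user code and maps its outcome to an exit code, so that user code
    // does not need to call System.exit() itself.
    static int run(Runnable userMain) {
        try {
            userMain.run();
            return 0; // normal completion
        } catch (SparkAppException e) {
            return e.getExitCode(); // exit code requested by the user
        } catch (RuntimeException e) {
            return 1; // any other failure; the value 1 is an arbitrary choice
        }
    }
}
```

The runtime would then decide what to do with the returned code, instead of 
having to intercept {{System.exit}}.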

BTW, we've tried to extend the security manager so that all operations except 
for {{checkExit}} are no-ops, but even that doesn't help.
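
For reference, a stripped-down manager of that kind might look roughly like 
the sketch below. The class name {{ExitOnlySecurityManager}} is made up for 
illustration, and this is not the code from the offending commit. The block 
only defines the class and does not install it, because recent JDKs deprecate 
{{SecurityManager}} and restrict {{System.setSecurityManager}}.

```java
// Hypothetical sketch, not the code from the Spark commit.
@SuppressWarnings({"removal", "deprecation"})
final class ExitOnlySecurityManager extends SecurityManager {
    // In Java 7-era JDKs most check* methods delegate to checkPermission,
    // so making both overloads no-ops allows nearly everything.
    @Override
    public void checkPermission(java.security.Permission perm) {
        // no-op: allow
    }

    @Override
    public void checkPermission(java.security.Permission perm, Object context) {
        // no-op: allow
    }

    // The only check that is kept: refuse System.exit() calls.
    @Override
    public void checkExit(int status) {
        throw new SecurityException("System.exit(" + status + ") intercepted");
    }
}
```

Even with every other check reduced to a no-op, JDK code would still see a 
non-null manager and take the {{sm != null}} branch on each call, which may be 
part of why stubbing the methods out does not help.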




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226637#comment-14226637
 ] 

Apache Spark commented on SPARK-4584:
-

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/3484




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224185#comment-14224185
 ] 

Sandy Ryza commented on SPARK-4584:
---

I took a look at the jobs Nishkam ran before and after that commit.  The second 
stage in the before job takes 69 seconds and the second stage in the after 
job takes 158 seconds.  This seems to be caused by the individual tasks taking 
longer.

Totally confused about how that commit could have caused the regression.




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225313#comment-14225313
 ] 

Nishkam Ravi commented on SPARK-4584:
-

Looked into this issue a bit more. Source of the problem is 
setupSystemSecurityManager(). If I comment out the invocation, we get the 
performance back. 




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225321#comment-14225321
 ] 

Nishkam Ravi commented on SPARK-4584:
-

Looks like java.lang.SecurityManager is a hog. I'm looking to see if we use it 
elsewhere in Spark. If so, we might want to consider removing it everywhere 
and replacing it with something more efficient.




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225471#comment-14225471
 ] 

Nishkam Ravi commented on SPARK-4584:
-

I don't see SecurityManager anywhere else in Spark, which is great. 

Marcelo and I tried a lightweight version of SecurityManager with all methods 
except for checkExit() stubbed out, and we continue to see the performance 
issue. 

Maybe something like Runtime.addShutdownHook could be used instead?
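
A rough sketch of the shutdown-hook idea, using only the standard 
{{Runtime.addShutdownHook}} API; the class and method names are illustrative. 
Note that a hook is just a thread run during JVM shutdown: it is not passed 
the exit status and cannot cancel the exit, so it may not be a drop-in 
replacement for {{checkExit}}.

```java
// Illustrative sketch; the class and method names are made up.
final class ShutdownHookSketch {
    // Registers a thread that runs the given cleanup action when the JVM
    // shuts down, and returns it so the caller can unregister it later.
    static Thread register(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "app-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        register(() -> System.err.println("JVM is shutting down"));
        // The hook runs on normal termination and on System.exit(),
        // but it does not learn which exit status was requested.
    }
}
```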




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225479#comment-14225479
 ] 

Nishkam Ravi commented on SPARK-4584:
-

Use of SecurityManager may be suppressing a JIT compiler optimization.




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225554#comment-14225554
 ] 

Andrew Or commented on SPARK-4584:
--

Hey [~nravi] how much data are you shuffling? Can you check the Shuffle Write 
field on the Spark UI?




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225563#comment-14225563
 ] 

Nishkam Ravi commented on SPARK-4584:
-

[~andrewor14] Just curious, why is that relevant here? 




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225567#comment-14225567
 ] 

Nishkam Ravi commented on SPARK-4584:
-

Around 2GB in the map stage and 1.5GB in the collect phase.




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225593#comment-14225593
 ] 

Andrew Or commented on SPARK-4584:
--

It may not be, but I just wanted to get a sense of how large your dataset is. I 
wasn't able to reproduce the performance discrepancy between the two commits on 
my end. I'm doing a simple groupBy on spark-perf that shuffles about 16GB with 
2000 partitions, and I haven't observed a significant performance difference 
(~2%, likely just noise). These results are based on the median over 10 runs.
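
To make the comparison concrete, the median and the relative difference could 
be computed as in the sketch below. The class name and any timings fed to it 
are made up for illustration; they are not the measurements from these runs.

```java
// Illustrative helper; not part of spark-perf.
final class MedianTiming {
    // Returns the median of the given run times (e.g. in milliseconds).
    static double median(long[] runTimes) {
        if (runTimes.length == 0) {
            throw new IllegalArgumentException("need at least one run");
        }
        long[] sorted = runTimes.clone(); // leave the caller's array untouched
        java.util.Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Relative change of "after" versus "before", as a fraction of "before".
    static double relativeChange(double before, double after) {
        return (after - before) / before;
    }
}
```

Using the median rather than the mean keeps a single slow outlier run from 
dominating the comparison.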

The other thing is that this only seems to affect the application master, which 
becomes the driver in cluster mode but otherwise shouldn't affect the 
application performance at all. What mode were you running in, client or 
cluster mode? How many partitions did you have?




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-25 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225603#comment-14225603
 ] 

Nishkam Ravi commented on SPARK-4584:
-

I would recommend working with JavaWordCount. We don't see this regression for 
most of our other workloads either, including those from spark-perf. I don't 
think you would need a large input dataset to reproduce the problem. To see the 
perf diff, commenting out the invocation of setupSystemSecurityManager() would 
suffice. I'm running in YARN cluster mode and have 450 partitions. 




[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN

2014-11-24 Thread Nishkam Ravi (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223930#comment-14223930
 ] 

Nishkam Ravi commented on SPARK-4584:
-

In YARN cluster mode.

 2x Performance regression for Spark-on-YARN
 ---

 Key: SPARK-4584
 URL: https://issues.apache.org/jira/browse/SPARK-4584
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.2.0
Reporter: Nishkam Ravi

 Significant performance regression observed for Spark-on-YARN (up to 2x) after 
 1.2 rebase. The offending commit is 70e824f750aa8ed446eec104ba158b0503ba58a9 
 from Oct 7th. The problem can be reproduced with JavaWordCount against a large 
 enough input dataset.


