[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225899#comment-14225899 ]

Sean Owen commented on SPARK-4584:
----------------------------------

{{SecurityManager}} is something that a lot of JDK code consults, if one is installed:

{code}
SecurityManager sm = System.getSecurityManager();
if (sm != null) {
  ...
}
{code}

Setting any {{SecurityManager}} turns on a whole lot of not-cheap permission checks throughout the JDK. I think setting one is pretty undesirable from a performance perspective. It also precludes enabling a real {{SecurityManager}} in contexts that want one, although I find it unlikely that would ever really work with Spark.

How about just documenting this and telling users not to call {{System.exit()}} in their code, which is widely accepted as a no-no in Java/Scala anyway?

2x Performance regression for Spark-on-YARN
-------------------------------------------

Key: SPARK-4584
URL: https://issues.apache.org/jira/browse/SPARK-4584
Project: Spark
Issue Type: Bug
Components: YARN
Affects Versions: 1.2.0
Reporter: Nishkam Ravi
Assignee: Marcelo Vanzin
Priority: Blocker

Significant performance regression observed for Spark-on-YARN (up to 2x) after 1.2 rebase. The offending commit is 70e824f750aa8ed446eec104ba158b0503ba58a9 from Oct 7th. The problem can be reproduced with JavaWordCount against a large enough input dataset in YARN cluster mode.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226506#comment-14226506 ]

Marcelo Vanzin commented on SPARK-4584:
---------------------------------------

[~sowen] that was going to be my suggestion. For 1.2, I'll just remove the security manager and declare "use System.exit() at your own risk"; the behavior should then be the same as in 1.1.

Post 1.2, we could add some new exception (e.g. {{SparkAppException}}) that users can throw if they want the runtime to exit with a specific error code. But I don't think even that is strictly necessary; it's just a nice-to-have.

BTW, we've tried to extend the security manager so that all operations except for {{checkExit}} are no-ops, but even that doesn't help.
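A minimal sketch of the exception-based idea floated in the comment above, assuming a runner that wraps the user's entry point. {{SparkAppException}} is only the name suggested in the thread; it is not an existing Spark class, and {{UserAppRunner}}, {{runUserApp}} and the fallback code 1 are hypothetical names and choices made for illustration.

```java
// Hypothetical exception type, named after the suggestion in the thread.
// It is NOT an existing Spark class.
class SparkAppException extends RuntimeException {
    private final int exitCode;

    SparkAppException(String message, int exitCode) {
        super(message);
        this.exitCode = exitCode;
    }

    int getExitCode() {
        return exitCode;
    }
}

// Hypothetical stand-in for whatever code invokes the user's main class.
class UserAppRunner {
    // Runs the user code and maps its outcome to an exit code,
    // without installing a SecurityManager to intercept System.exit().
    static int runUserApp(Runnable userMain) {
        try {
            userMain.run();
            return 0;                     // user code returned normally
        } catch (SparkAppException e) {
            return e.getExitCode();       // user asked for a specific code
        } catch (RuntimeException e) {
            return 1;                     // assumed generic failure code
        }
    }

    public static void main(String[] args) {
        int code = runUserApp(() -> {
            throw new SparkAppException("bad input", 42);
        });
        System.out.println("exit code: " + code);
    }
}
```

With this shape the runtime learns the intended exit code from the exception instead of trapping {{System.exit()}}, so no permission checks are switched on in the JDK.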
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226637#comment-14226637 ]

Apache Spark commented on SPARK-4584:
-------------------------------------

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/3484
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224185#comment-14224185 ]

Sandy Ryza commented on SPARK-4584:
-----------------------------------

I took a look at the jobs Nishkam ran before and after that commit. The second stage takes 69 seconds in the "before" job and 158 seconds in the "after" job. This seems to be caused by the individual tasks taking longer. I'm totally confused about how that commit could have caused the regression.
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225313#comment-14225313 ]

Nishkam Ravi commented on SPARK-4584:
-------------------------------------

Looked into this issue a bit more. The source of the problem is setupSystemSecurityManager(). If I comment out the invocation, we get the performance back.
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225321#comment-14225321 ]

Nishkam Ravi commented on SPARK-4584:
-------------------------------------

Looks like java.lang.SecurityManager is a hog. I'm looking to see if we use it elsewhere in Spark. If so, we might want to consider removing it everywhere and replacing it with something more efficient.
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225471#comment-14225471 ]

Nishkam Ravi commented on SPARK-4584:
-------------------------------------

I don't see SecurityManager anywhere else in Spark, which is great. Marcelo and I tried a lightweight version of SecurityManager with all methods except checkExit() stubbed out, and we continue to see the performance issue. Maybe something like Runtime.addShutdownHook could be used instead?
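A rough sketch of what the shutdown-hook alternative mentioned above might look like, using only the standard {{Runtime.addShutdownHook}} and {{Runtime.removeShutdownHook}} calls. The {{ExitWatcher}} class and its method names are hypothetical and not taken from Spark. Note one limitation compared with {{checkExit}}: a shutdown hook runs after the exit has been requested, so it cannot veto the exit and is not told the exit status; it can only notice that the JVM is going down before the user code was marked as finished.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper; names are illustrative and not part of Spark.
class ExitWatcher {
    private final AtomicBoolean finished = new AtomicBoolean(false);
    private final Runnable onPrematureExit;
    private final Thread hook;

    ExitWatcher(Runnable onPrematureExit) {
        this.onPrematureExit = onPrematureExit;
        // The hook thread is only started by the JVM during shutdown,
        // which includes shutdown triggered by System.exit().
        this.hook = new Thread(this::reportIfUnfinished, "exit-watcher");
    }

    // Body of the hook: report only if user code never finished normally.
    void reportIfUnfinished() {
        if (!finished.get()) {
            onPrematureExit.run();
        }
    }

    void install() {
        Runtime.getRuntime().addShutdownHook(hook);
    }

    // Call once the user code has returned normally.
    void markFinished() {
        finished.set(true);
    }

    // Returns true if the hook was registered and has now been removed.
    boolean uninstall() {
        return Runtime.getRuntime().removeShutdownHook(hook);
    }

    public static void main(String[] args) {
        ExitWatcher watcher = new ExitWatcher(
            () -> System.err.println("JVM exiting before user code finished"));
        watcher.install();
        // ... user code would run here ...
        watcher.markFinished();
    }
}
```

Unlike an installed {{SecurityManager}}, a registered hook adds no per-call checks inside the JDK while the job is running.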
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225479#comment-14225479 ]

Nishkam Ravi commented on SPARK-4584:
-------------------------------------

Use of SecurityManager may be suppressing a JIT compiler optimization.
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225554#comment-14225554 ]

Andrew Or commented on SPARK-4584:
----------------------------------

Hey [~nravi], how much data are you shuffling? Can you check the Shuffle Write field on the Spark UI?
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225563#comment-14225563 ]

Nishkam Ravi commented on SPARK-4584:
-------------------------------------

[~andrewor14] Just curious, why is that relevant here?
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225567#comment-14225567 ]

Nishkam Ravi commented on SPARK-4584:
-------------------------------------

Around 2GB in the map stage and 1.5GB in the collect phase.
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225593#comment-14225593 ]

Andrew Or commented on SPARK-4584:
----------------------------------

It may not be, but I just wanted to get a sense of how large your dataset is. I wasn't able to reproduce the performance discrepancy between the two commits on my end. I'm doing a simple groupBy on spark-perf that shuffles about 16GB with 2000 partitions, and I haven't observed a significant performance difference (~2%, likely just noise). These results are based on the median over 10 runs.

The other thing is that this only seems to affect the application master, which becomes the driver in cluster mode but otherwise shouldn't affect application performance at all. What mode were you running in, client or cluster? How many partitions did you have?
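For readers who want to compare two builds the same way, here is a small sketch of a "median over N runs" timing helper. It is a generic illustration, not the spark-perf harness; {{MedianTiming}}, {{median}} and {{medianMillis}} are hypothetical names, and the workload passed in is whatever job launch you want to time.

```java
import java.util.Arrays;

// Hypothetical helper for comparing run times; not part of spark-perf.
class MedianTiming {
    // Median of the samples; averages the two middle values for even counts.
    static double median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return (sorted.length % 2 == 1)
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Runs the workload `runs` times and returns the median wall-clock
    // time in milliseconds.
    static double medianMillis(Runnable workload, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            elapsed[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return median(elapsed);
    }

    public static void main(String[] args) {
        double ms = medianMillis(() -> {
            // Placeholder workload; replace with the job being measured.
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println(sum);
            }
        }, 10);
        System.out.println("median ms over 10 runs: " + ms);
    }
}
```

Using the median rather than the mean keeps a single slow run from dominating the comparison.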
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225603#comment-14225603 ]

Nishkam Ravi commented on SPARK-4584:
-------------------------------------

I would recommend working with JavaWordCount. We don't see this regression for most of our other workloads either, including those from spark-perf. I don't think you would need a large input dataset to reproduce the problem. To see the perf difference, commenting out the invocation of setupSystemSecurityManager() should suffice. I'm running in YARN cluster mode and have 450 partitions.
[jira] [Commented] (SPARK-4584) 2x Performance regression for Spark-on-YARN
    [ https://issues.apache.org/jira/browse/SPARK-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14223930#comment-14223930 ]

Nishkam Ravi commented on SPARK-4584:
-------------------------------------

In YARN cluster mode.