[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14626720#comment-14626720 ]

Apache Spark commented on SPARK-4598:

User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/7399

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Sub-task
Components: Web UI
Affects Versions: 1.2.0
Reporter: meiyoula

On the HistoryServer stage page, clicking the task link in the Description column triggers a GC error. The detailed error message is:

2014-11-17 16:36:30,851 | WARN | [qtp1083955615-352] | Error for /history/application_1416206401491_0010/stages/stage/ | org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:590)
java.lang.OutOfMemoryError: GC overhead limit exceeded
2014-11-17 16:36:30,851 | WARN | [qtp1083955615-364] | handle failed | org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:697)
java.lang.OutOfMemoryError: GC overhead limit exceeded

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228677#comment-14228677 ]

Josh Rosen commented on SPARK-4598:

I was able to reproduce this issue using the SparkPi example. I captured a heap dump in YourKit, and it looks like the raw, uncompressed HTML of the Stage page is over 75 megabytes and the Scala XML tree corresponding to the page is hundreds of megabytes (~200). The actual HTML itself should be highly compressible, since it contains a lot of redundancy.

In the longer term, we could also explore approaches that perform more of the rendering / formatting in the browser using JavaScript; this would allow us to send the task table data as JSON or CSV, which would contain much less redundancy, and we could also avoid the overheads of the XML library.

As a shorter-term hack, though, I wonder whether there's some trick to reduce the overall memory usage of the intermediate scala.xml data structures, since it seems odd that we end up materializing such a large object graph when large portions of it could be lazily streamed. Maybe there's some simple trick where sprinkling in a few {{.iterator}} calls would improve things.

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.2.0
Reporter: meiyoula
Priority: Critical
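A minimal sketch of the lazy-streaming idea raised in the comment above, written in Python for brevity rather than with scala.xml. It is not Spark's code: the task fields, `render_rows`, `stream_table`, and `fake_tasks` are all hypothetical names, and the data is made up. The point is only that rows are produced one at a time and written straight to the output, so the full table never exists as one in-memory object graph.

```python
import io
from html import escape
from typing import Iterable, Iterator, Tuple

# Hypothetical task record: (task id, status, duration in ms).
Task = Tuple[int, str, int]


def render_rows(tasks: Iterable[Task]) -> Iterator[str]:
    """Yield one <tr> string per task instead of building the whole table."""
    for task_id, status, duration_ms in tasks:
        yield (
            f"<tr><td>{task_id}</td><td>{escape(status)}</td>"
            f"<td>{duration_ms} ms</td></tr>"
        )


def stream_table(tasks: Iterable[Task], out) -> int:
    """Write the table to `out` row by row; return the number of rows written."""
    out.write("<table>")
    count = 0
    for row in render_rows(tasks):
        out.write(row)  # each row can be garbage-collected after this write
        count += 1
    out.write("</table>")
    return count


def fake_tasks(n: int) -> Iterator[Task]:
    """Made-up task data, generated lazily, for illustration only."""
    return ((i, "SUCCESS", 10 + i % 7) for i in range(n))


buf = io.StringIO()
rows_written = stream_table(fake_tasks(1000), buf)
```

Here `buf` stands in for the HTTP response stream. With a real response stream, peak memory would depend on the size of one row, not on the number of tasks.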
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228684#comment-14228684 ]

Josh Rosen commented on SPARK-4598:

Actually, it might be pretty hard to trim down the memory usage via scala.xml tricks. Adding some functionality to return the stage table information as CSV data might be a cleaner way to handle this. This doesn't necessarily imply using AJAX requests to load the data from the backend; we could just dump the CSV data into a script tag and load it via JavaScript. We might be able to hide all of this complexity behind the StageTableBase class, so we could limit this change to a small section of the code.

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 1.2.0
Reporter: meiyoula
Priority: Critical
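The "CSV in a script tag" idea from the comment above could look roughly like the following. This is a sketch in Python under stated assumptions, not the StageTableBase implementation: the column names, the `task-data` element id, the `text/csv` type attribute, and both helper functions are hypothetical.

```python
import csv
import io

# Hypothetical column set; the real stage table has many more columns.
HEADER = ["task_id", "status", "duration_ms"]


def tasks_to_csv(tasks):
    """Serialize task tuples to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(tasks)
    return buf.getvalue()


def embed_csv(csv_text, element_id="task-data"):
    """Wrap CSV text in a non-executable script element for client-side parsing."""
    # "</" inside the data would end the script element early, so escape it;
    # client-side code has to reverse this before parsing the CSV.
    safe = csv_text.replace("</", "<\\/")
    return f'<script type="text/csv" id="{element_id}">\n{safe}</script>'


tasks = [(0, "SUCCESS", 12), (1, "FAILED", 340)]
page_fragment = embed_csv(tasks_to_csv(tasks))
```

Each task costs one short CSV line instead of a row of repeated `<tr>`/`<td>` markup, which is where the reduction in redundancy would come from. Browser-side JavaScript would read the element's text and build the table.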
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228438#comment-14228438 ]

Masayoshi TSUZUKI commented on SPARK-4598:

Discussion of this problem seems to be taking place on the GitHub PR: https://github.com/apache/spark/pull/3456

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: meiyoula
Priority: Critical
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228453#comment-14228453 ]

Masayoshi TSUZUKI commented on SPARK-4598:

A similar problem was reported in JIRA (https://issues.apache.org/jira/browse/SPARK-2017), but that one is about a client-side problem. When I looked into SPARK-2017, I produced over 1,000,000 tasks, but the server didn't stop with an OOM (my web browser just became unresponsive for several minutes). @rxin and @carlosfuertes also didn't seem to hit the server-side OOM.

What's the difference? Has the source changed? A closer look at the difference might give a clue for solving the OOM.

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: meiyoula
Priority: Critical
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228508#comment-14228508 ]

Josh Rosen commented on SPARK-4598:

[~meiyoula], Do you have a sample job / workload that will let me reproduce this issue? Which Spark version are you using and how big is your driver memory? Do you know if this is a regression from an earlier Spark version?

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: meiyoula
Priority: Critical
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228646#comment-14228646 ]

meiyoula commented on SPARK-4598:

[~joshrosen], I tested this with the GitHub master code from the last two days, running the SparkPi example with the default driver memory. This is the command:

./spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client ../lib/spark-examples*.jar 10

While the application is running and has executed 50,000 tasks, opening the stage page in the SparkUI brings the web UI down. After the application has finished, opening the stage page in the HistoryServer brings the web UI down as well. Note that the HistoryServer memory also uses the default value.

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: meiyoula
Priority: Critical
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14227842#comment-14227842 ]

Patrick Wendell commented on SPARK-4598:

Having sorting with pagination seems very difficult to do correctly, since we rely on JavaScript for sorting in the frontend. It would be helpful to understand the exact memory requirements of serving hundreds of thousands of tasks. Where does the memory usage come from? Can we just optimize the use of memory? We need to store all of those tasks in the driver anyway.

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: meiyoula
Priority: Critical
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14228043#comment-14228043 ]

meiyoula commented on SPARK-4598:

Yeah, optimizing the use of memory may resolve the problem for now, but it's not an effective solution. Sorting happens before pagination, so it is not a problem. Using pagination in the HistoryServer/SparkUI can lower the memory requirements, so why not do this? It would help the Spark cluster's capabilities and be good for Spark users.

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: meiyoula
Priority: Critical
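The "sort first, then paginate" approach described in the comment above can be sketched as follows. This is an illustration in Python, not the code from the pull request: `page_of`, its parameters, and the task tuples are hypothetical. The sort runs on the server over the raw task data, and only the requested slice would then be rendered to HTML.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def page_of(
    rows: Sequence[T],
    key: Callable[[T], object],
    page: int,
    page_size: int,
    descending: bool = False,
) -> Tuple[List[T], int]:
    """Sort all rows, then return (rows on the 1-based `page`, total pages)."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    total_pages = max(1, -(-len(rows) // page_size))  # ceiling division
    page = min(max(page, 1), total_pages)  # clamp out-of-range requests
    ordered = sorted(rows, key=key, reverse=descending)
    start = (page - 1) * page_size
    return ordered[start:start + page_size], total_pages


# Made-up (task_id, duration_ms) pairs for illustration.
tasks = [(i, (i * 37) % 101) for i in range(250)]
slowest, pages = page_of(
    tasks, key=lambda t: t[1], page=1, page_size=100, descending=True
)
```

Because the ordering is computed over the full data set before slicing, page 1 sorted by duration really does hold the globally slowest tasks, which a JavaScript sort over a single page could not guarantee.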
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14224810#comment-14224810 ]

Patrick Wendell commented on SPARK-4598:

It is a good idea to paginate this page. However, can you also work around this by having more driver memory? I'm worried that pagination will break important things like sorting.

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: meiyoula
Priority: Critical
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225557#comment-14225557 ]

WangTaoTheTonic commented on SPARK-4598:

I think pagination is a better idea. Allocating more driver memory just to view this page is not very appropriate. Another question: how much more memory is enough? We can't estimate the memory requirement from the number of tasks every time.

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: meiyoula
Priority: Critical
[jira] [Commented] (SPARK-4598) Paginate stage page to avoid OOM with 100,000 tasks
[ https://issues.apache.org/jira/browse/SPARK-4598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14225566#comment-14225566 ]

xukun commented on SPARK-4598:

I think you can sort the task data first and then paginate the page. I think the original data does not need much memory; it is the rendered task page that needs a lot of memory.

Paginate stage page to avoid OOM with 100,000 tasks

Key: SPARK-4598
URL: https://issues.apache.org/jira/browse/SPARK-4598
Project: Spark
Issue Type: Bug
Components: Spark Core
Reporter: meiyoula
Priority: Critical