[jira] [Commented] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error
[ https://issues.apache.org/jira/browse/SPARK-30686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030653#comment-17030653 ]

Behroz Sikander commented on SPARK-30686:
-----------------------------------------

I have pinged. I hope someone can help with the ticket.

> Spark 2.4.4 metrics endpoint throwing error
> -------------------------------------------
>
>                 Key: SPARK-30686
>                 URL: https://issues.apache.org/jira/browse/SPARK-30686
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: Behroz Sikander
>            Priority: Major
>
> I am using Spark standalone in HA mode with ZooKeeper.
> Once the driver is up and running, whenever I try to access the metrics API
> using the following URL
> http://master_address/proxy/app-20200130041234-0123/api/v1/applications
> I get the following exception. It seems that the request never even reaches
> the Spark code. It would be helpful if somebody can help me.
> {code:java}
> HTTP ERROR 500
> Problem accessing /api/v1/applications. Reason:
>     Server Error
> Caused by:
> java.lang.NullPointerException: while trying to invoke the method org.glassfish.jersey.servlet.WebComponent.service(java.net.URI, java.net.URI, javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) of a null object loaded from field org.glassfish.jersey.servlet.ServletContainer.webComponent of an object loaded from local variable 'this'
> 	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
> 	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
> 	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
> 	at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
> 	at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
> 	at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> 	at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> 	at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> 	at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> 	at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
> 	at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> 	at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> 	at org.spark_project.jetty.server.Server.handle(Server.java:539)
> 	at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
> 	at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> 	at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
> 	at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
> 	at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
> 	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
> 	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
> 	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
> 	at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
> 	at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
> 	at java.lang.Thread.run(Thread.java:808)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
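The endpoint in the report follows the standalone master's application-proxy URL pattern. As a small illustrative sketch (the master address and application id below are the report's own placeholders, and the helper name is hypothetical), the REST URL can be assembled like this:

```python
def metrics_url(master: str, app_id: str) -> str:
    """Build the proxied REST metrics endpoint (URL pattern from the report)."""
    return f"http://{master}/proxy/{app_id}/api/v1/applications"

# Reproduces the URL from the bug report:
print(metrics_url("master_address", "app-20200130041234-0123"))
# http://master_address/proxy/app-20200130041234-0123/api/v1/applications
```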
[jira] [Commented] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error
[ https://issues.apache.org/jira/browse/SPARK-30686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027291#comment-17027291 ]

Behroz Sikander commented on SPARK-30686:
-----------------------------------------

Could it be linked to [https://github.com/apache/spark/pull/19748] ?
[jira] [Commented] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error
[ https://issues.apache.org/jira/browse/SPARK-30686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026929#comment-17026929 ]

Behroz Sikander commented on SPARK-30686:
-----------------------------------------

1. I don't know the exact steps; I tried to reproduce it but couldn't figure out how.
2. I don't understand the question. If you are asking about the response of the /api/v1/version endpoint, then yes, it also returns this error.
[jira] [Commented] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error
[ https://issues.apache.org/jira/browse/SPARK-30686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026805#comment-17026805 ]

Behroz Sikander commented on SPARK-30686:
-----------------------------------------

Sometimes, I have also seen

{code:java}
Caused by: java.lang.NoClassDefFoundError: org/glassfish/jersey/internal/inject/AbstractBinder
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:863)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:529)
	at java.net.URLClassLoader.access$100(URLClassLoader.java:75)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:430)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:424)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:423)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:490)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
	at org.glassfish.jersey.media.sse.SseFeature.configure(SseFeature.java:123)
	at org.glassfish.jersey.model.internal.CommonConfig.configureFeatures(CommonConfig.java:730)
	at org.glassfish.jersey.model.internal.CommonConfig.configureMetaProviders(CommonConfig.java:648)
	at org.glassfish.jersey.server.ResourceConfig.configureMetaProviders(ResourceConfig.java:829)
	at org.glassfish.jersey.server.ApplicationHandler.initialize(ApplicationHandler.java:453)
	at org.glassfish.jersey.server.ApplicationHandler.access$500(ApplicationHandler.java:184)
	at org.glassfish.jersey.server.ApplicationHandler$3.call(ApplicationHandler.java:350)
	at org.glassfish.jersey.server.ApplicationHandler$3.call(ApplicationHandler.java:347)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.processWithException(Errors.java:255)
	at org.glassfish.jersey.server.ApplicationHandler.<init>(ApplicationHandler.java:347)
	at org.glassfish.jersey.servlet.WebComponent.<init>(WebComponent.java:392)
	at org.glassfish.jersey.servlet.ServletContainer.init(ServletContainer.java:177)
	at org.glassfish.jersey.servlet.ServletContainer.init(ServletContainer.java:369)
	at javax.servlet.GenericServlet.init(GenericServlet.java:244)
	at org.spark_project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:643)
	at org.spark_project.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:499)
	at org.spark_project.jetty.servlet.ServletHolder.ensureInstance(ServletHolder.java:791)
	at org.spark_project.jetty.servlet.ServletHolder.prepare(ServletHolder.java:776)
	at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:579)
	at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
	at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
	at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.spark_project.jetty.server.Server.handle(Server.java:539)
	at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
	at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
	at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
	at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
	at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
	at java.lang.Thread.run(Thread.java:808)
Caused by: java.lang.ClassNotFoundException:
{code}
[jira] [Created] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error
Behroz Sikander created SPARK-30686:
---------------------------------------

             Summary: Spark 2.4.4 metrics endpoint throwing error
                 Key: SPARK-30686
                 URL: https://issues.apache.org/jira/browse/SPARK-30686
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.4
            Reporter: Behroz Sikander

I am using Spark standalone in HA mode with ZooKeeper.
Once the driver is up and running, whenever I try to access the metrics API using the following URL

http://master_address/proxy/app-20200130041234-0123/api/v1/applications

I get the HTTP 500 NullPointerException quoted in full in the comments above. It seems that the request never even reaches the Spark code. It would be helpful if somebody can help me.
[jira] [Commented] (SPARK-26302) retainedBatches configuration can eat up memory on driver
[ https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718824#comment-16718824 ]

Behroz Sikander commented on SPARK-26302:
-----------------------------------------

By "code", I meant a warning in the logs. I will prepare a documentation commit.

> retainedBatches configuration can eat up memory on driver
> ---------------------------------------------------------
>
>                 Key: SPARK-26302
>                 URL: https://issues.apache.org/jira/browse/SPARK-26302
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation, DStreams
>    Affects Versions: 2.4.0
>            Reporter: Behroz Sikander
>            Priority: Minor
>         Attachments: heap_dump_detail.png
>
> The documentation for the configuration "spark.streaming.ui.retainedBatches" says:
> "How many batches the Spark Streaming UI and status APIs remember before garbage collecting"
> The default for this configuration is 1000.
> From our experience, the documentation is incomplete, and we found that out the hard way.
> The size of a single BatchUIData is around 750 KB. Increasing this value to something like 5000 increases the total size to ~4 GB.
> If your driver heap is not big enough, the job starts to slow down, has frequent GCs, and has long scheduling delays. Once the heap is full, the job cannot be recovered.
> A note of caution should be added to the documentation to let users know the impact of this seemingly harmless configuration property.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
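The heap impact described in the issue can be estimated directly from the figures it gives (roughly 750 KB per retained BatchUIData). These are the report's rough numbers, not a measured Spark invariant, and the helper below is only an illustration:

```python
BATCH_UI_DATA_KB = 750  # approximate size of one BatchUIData, per the report

def retained_batches_gb(retained: int) -> float:
    """Rough driver-heap footprint of spark.streaming.ui.retainedBatches."""
    return retained * BATCH_UI_DATA_KB / (1024 * 1024)

print(round(retained_batches_gb(1000), 2))  # default of 1000: ~0.72 GB
print(round(retained_batches_gb(5000), 2))  # ~3.58 GB, i.e. the report's "~4 GB"
```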
[jira] [Commented] (SPARK-26302) retainedBatches configuration can eat up memory on driver
[ https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718623#comment-16718623 ]

Behroz Sikander commented on SPARK-26302:
-----------------------------------------

If I understand correctly, the idea is to add a general warning, but where? In the documentation, or somewhere in the code?
[jira] [Commented] (SPARK-26302) retainedBatches configuration can cause memory leak
[ https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715544#comment-16715544 ]

Behroz Sikander commented on SPARK-26302:
-----------------------------------------

>> I think the same applies to all spark.ui.*retained* parameters. If a warning is added here, all the other places have to be adapted.

I agree.

>> What I can imagine is a general warning, but it would be hard to find a committer to merge it.

The impact is so indirect that it is really hard to debug this issue. It is worth the effort to find a committer, because a warning would be really helpful, also considering the number of configuration properties.

>> Is it really a memory leak, and not slow processing or out of memory?

It is slow processing and long scheduling delays, followed by out of memory.
[jira] [Created] (SPARK-26302) retainedBatches configuration can cause memory leak
Behroz Sikander created SPARK-26302:
---------------------------------------

             Summary: retainedBatches configuration can cause memory leak
                 Key: SPARK-26302
                 URL: https://issues.apache.org/jira/browse/SPARK-26302
             Project: Spark
          Issue Type: Improvement
          Components: Documentation
    Affects Versions: 2.4.0
            Reporter: Behroz Sikander
         Attachments: heap_dump_detail.png

The issue text is the description quoted in full in the comments above.
[jira] [Commented] (SPARK-26302) retainedBatches configuration can cause memory leak
[ https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712559#comment-16712559 ]

Behroz Sikander commented on SPARK-26302:
-----------------------------------------

I am willing to do a PR for the documentation once someone gives the go-ahead.
[jira] [Updated] (SPARK-26302) retainedBatches configuration can cause memory leak
[ https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Behroz Sikander updated SPARK-26302:
------------------------------------

    Attachment: heap_dump_detail.png
[jira] [Commented] (SPARK-24794) DriverWrapper should have both master addresses in -Dspark.master
[ https://issues.apache.org/jira/browse/SPARK-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624790#comment-16624790 ]

Behroz Sikander commented on SPARK-24794:
-----------------------------------------

Can someone please have a look at this PR?

> DriverWrapper should have both master addresses in -Dspark.master
> -----------------------------------------------------------------
>
>                 Key: SPARK-24794
>                 URL: https://issues.apache.org/jira/browse/SPARK-24794
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 2.2.1
>            Reporter: Behroz Sikander
>            Priority: Major
>
> In standalone cluster mode, one can launch a driver with supervise mode enabled. Spark launches the driver with a JVM argument -Dspark.master, which is set to the [host and port of the current master|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L149].
>
> During the life of the context, the Spark masters can switch for any reason. After that, if the driver dies unexpectedly and comes back up, it tries to connect to the master that was set initially via -Dspark.master, but that master is now in STANDBY mode. The context tries multiple times to connect to the standby and then just kills itself.
>
> *Suggestion:*
> While launching the driver process, the Spark master should use the [spark.master passed as input|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L124] instead of the host and port of the current master.
>
> Log messages that we observe:
>
> {code:java}
> 2018-07-11 13:03:21,801 INFO appclient-register-master-threadpool-0 org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: Connecting to master spark://10.100.100.22:7077...
> ...
> 2018-07-11 13:03:21,806 INFO netty-rpc-connection-0 org.apache.spark.network.client.TransportClientFactory []: Successfully created connection to /10.100.100.22:7077 after 1 ms (0 ms spent in bootstraps)
> ...
> 2018-07-11 13:03:41,802 INFO appclient-register-master-threadpool-0 org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: Connecting to master spark://10.100.100.22:7077...
> ...
> 2018-07-11 13:04:01,802 INFO appclient-register-master-threadpool-0 org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: Connecting to master spark://10.100.100.22:7077...
> ...
> 2018-07-11 13:04:21,806 ERROR appclient-registration-retry-thread org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend []: Application has been killed. Reason: All masters are unresponsive! Giving up.{code}
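The ticket's suggestion (propagate the user-supplied HA master list to the driver instead of pinning the currently elected leader) can be sketched as follows. This is an illustrative Python model, not Spark's Scala code; the helper name and the second master address are hypothetical:

```python
def driver_master_jvm_arg(submitted_master: str) -> str:
    """Build -Dspark.master from the spark.master URL the user submitted.

    In standalone HA mode that URL may list every master, so a restarted
    (supervised) driver can still reach whichever master is currently ACTIVE,
    rather than retrying only the old leader, which may now be in STANDBY.
    """
    return f"-Dspark.master={submitted_master}"

# With both masters listed, a driver restart after failover can reconnect:
print(driver_master_jvm_arg("spark://10.100.100.22:7077,10.100.100.23:7077"))
# -Dspark.master=spark://10.100.100.22:7077,10.100.100.23:7077
```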
[jira] [Updated] (SPARK-24794) DriverWrapper should have both master addresses in -Dspark.master
[ https://issues.apache.org/jira/browse/SPARK-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Behroz Sikander updated SPARK-24794:
------------------------------------
[jira] [Created] (SPARK-24794) DriverWrapper should have both master addresses in -Dspark.master
Behroz Sikander created SPARK-24794:
------------------------------------

Summary: DriverWrapper should have both master addresses in -Dspark.master
Key: SPARK-24794
URL: https://issues.apache.org/jira/browse/SPARK-24794
Project: Spark
Issue Type: Bug
Components: Deploy
Affects Versions: 2.2.1
Reporter: Behroz Sikander
[jira] [Updated] (SPARK-24794) DriverWrapper should have both master addresses in -Dspark.master
[ https://issues.apache.org/jira/browse/SPARK-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Behroz Sikander updated SPARK-24794:
------------------------------------