[jira] [Commented] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error

2020-02-05 Thread Behroz Sikander (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17030653#comment-17030653
 ] 

Behroz Sikander commented on SPARK-30686:
-

I have pinged. I hope someone can help with the ticket.

> Spark 2.4.4 metrics endpoint throwing error
> ---
>
> Key: SPARK-30686
> URL: https://issues.apache.org/jira/browse/SPARK-30686
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Behroz Sikander
>Priority: Major
>
> I am using Spark standalone in HA mode with ZooKeeper.
> Once the driver is up and running, whenever I try to access the metrics API 
> using the following URL
> http://master_address/proxy/app-20200130041234-0123/api/v1/applications
> I get the following exception.
> It seems that the request never even reaches the Spark code. It would be 
> helpful if somebody could look into this.
> {code:java}
> HTTP ERROR 500
> Problem accessing /api/v1/applications. Reason:
> Server Error
> Caused by:
> java.lang.NullPointerException: while trying to invoke the method 
> org.glassfish.jersey.servlet.WebComponent.service(java.net.URI, java.net.URI, 
> javax.servlet.http.HttpServletRequest, 
> javax.servlet.http.HttpServletResponse) of a null object loaded from field 
> org.glassfish.jersey.servlet.ServletContainer.webComponent of an object 
> loaded from local variable 'this'
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
>   at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.spark_project.jetty.server.Server.handle(Server.java:539)
>   at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at 
> org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:808)
> {code}






[jira] [Commented] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error

2020-01-31 Thread Behroz Sikander (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17027291#comment-17027291
 ] 

Behroz Sikander commented on SPARK-30686:
-

Could it be linked to [https://github.com/apache/spark/pull/19748] ?

> Spark 2.4.4 metrics endpoint throwing error
> ---
>
> Key: SPARK-30686
> URL: https://issues.apache.org/jira/browse/SPARK-30686
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Behroz Sikander
>Priority: Major
>
> I am using Spark standalone in HA mode with ZooKeeper.
> Once the driver is up and running, whenever I try to access the metrics API 
> using the following URL
> http://master_address/proxy/app-20200130041234-0123/api/v1/applications
> I get the following exception.
> It seems that the request never even reaches the Spark code. It would be 
> helpful if somebody could look into this.
> {code:java}
> HTTP ERROR 500
> Problem accessing /api/v1/applications. Reason:
> Server Error
> Caused by:
> java.lang.NullPointerException: while trying to invoke the method 
> org.glassfish.jersey.servlet.WebComponent.service(java.net.URI, java.net.URI, 
> javax.servlet.http.HttpServletRequest, 
> javax.servlet.http.HttpServletResponse) of a null object loaded from field 
> org.glassfish.jersey.servlet.ServletContainer.webComponent of an object 
> loaded from local variable 'this'
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
>   at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.spark_project.jetty.server.Server.handle(Server.java:539)
>   at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at 
> org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:808)
> {code}






[jira] [Commented] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error

2020-01-30 Thread Behroz Sikander (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026929#comment-17026929
 ] 

Behroz Sikander commented on SPARK-30686:
-

1- I don't know the exact steps; I tried to reproduce the issue but couldn't figure it out.

2- I don't fully understand the question. If you are asking about the response of 
the /api/v1/version endpoint, then yes, it returns the same error.

 
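A minimal sketch of how I probe both endpoints through the standalone master's proxy 
(the master host and application id below are placeholders taken from the example URL 
in the description):

{code:scala}
// Minimal probe, not from Spark itself: hit both REST endpoints through the
// standalone master's proxy. Master host and application id are placeholders.
import scala.io.Source
import scala.util.{Failure, Success, Try}

object MetricsEndpointProbe {
  def main(args: Array[String]): Unit = {
    val master = "http://master_address"       // placeholder: standalone master web UI
    val appId  = "app-20200130041234-0123"     // placeholder: running application id
    val endpoints = Seq(
      s"$master/proxy/$appId/api/v1/applications",
      s"$master/proxy/$appId/api/v1/version"
    )
    endpoints.foreach { url =>
      Try(Source.fromURL(url).mkString) match {
        case Success(body) => println(s"$url -> ${body.take(200)}")
        case Failure(e)    => println(s"$url -> failed: ${e.getMessage}") // HTTP 500 in this report
      }
    }
  }
}
{code}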

> Spark 2.4.4 metrics endpoint throwing error
> ---
>
> Key: SPARK-30686
> URL: https://issues.apache.org/jira/browse/SPARK-30686
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.4
>Reporter: Behroz Sikander
>Priority: Major
>
> I am using Spark standalone in HA mode with ZooKeeper.
> Once the driver is up and running, whenever I try to access the metrics API 
> using the following URL
> http://master_address/proxy/app-20200130041234-0123/api/v1/applications
> I get the following exception.
> It seems that the request never even reaches the Spark code. It would be 
> helpful if somebody could look into this.
> {code:java}
> HTTP ERROR 500
> Problem accessing /api/v1/applications. Reason:
> Server Error
> Caused by:
> java.lang.NullPointerException: while trying to invoke the method 
> org.glassfish.jersey.servlet.WebComponent.service(java.net.URI, java.net.URI, 
> javax.servlet.http.HttpServletRequest, 
> javax.servlet.http.HttpServletResponse) of a null object loaded from field 
> org.glassfish.jersey.servlet.ServletContainer.webComponent of an object 
> loaded from local variable 'this'
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
>   at 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
>   at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>   at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>   at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>   at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>   at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>   at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>   at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>   at org.spark_project.jetty.server.Server.handle(Server.java:539)
>   at 
> org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>   at 
> org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>   at 
> org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>   at 
> org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>   at 
> org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>   at 
> org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>   at 
> org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>   at java.lang.Thread.run(Thread.java:808)
> {code}






[jira] [Commented] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error

2020-01-30 Thread Behroz Sikander (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17026805#comment-17026805
 ] 

Behroz Sikander commented on SPARK-30686:
-

Sometimes, I have also seen the following:


{code:java}
Caused by:
java.lang.NoClassDefFoundError: 
org/glassfish/jersey/internal/inject/AbstractBinder
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:863)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:529)
at java.net.URLClassLoader.access$100(URLClassLoader.java:75)
at java.net.URLClassLoader$1.run(URLClassLoader.java:430)
at java.net.URLClassLoader$1.run(URLClassLoader.java:424)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:490)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at 
org.glassfish.jersey.media.sse.SseFeature.configure(SseFeature.java:123)
at 
org.glassfish.jersey.model.internal.CommonConfig.configureFeatures(CommonConfig.java:730)
at 
org.glassfish.jersey.model.internal.CommonConfig.configureMetaProviders(CommonConfig.java:648)
at 
org.glassfish.jersey.server.ResourceConfig.configureMetaProviders(ResourceConfig.java:829)
at 
org.glassfish.jersey.server.ApplicationHandler.initialize(ApplicationHandler.java:453)
at 
org.glassfish.jersey.server.ApplicationHandler.access$500(ApplicationHandler.java:184)
at 
org.glassfish.jersey.server.ApplicationHandler$3.call(ApplicationHandler.java:350)
at 
org.glassfish.jersey.server.ApplicationHandler$3.call(ApplicationHandler.java:347)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at 
org.glassfish.jersey.internal.Errors.processWithException(Errors.java:255)
at 
org.glassfish.jersey.server.ApplicationHandler.(ApplicationHandler.java:347)
at 
org.glassfish.jersey.servlet.WebComponent.(WebComponent.java:392)
at 
org.glassfish.jersey.servlet.ServletContainer.init(ServletContainer.java:177)
at 
org.glassfish.jersey.servlet.ServletContainer.init(ServletContainer.java:369)
at javax.servlet.GenericServlet.init(GenericServlet.java:244)
at 
org.spark_project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:643)
at 
org.spark_project.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:499)
at 
org.spark_project.jetty.servlet.ServletHolder.ensureInstance(ServletHolder.java:791)
at 
org.spark_project.jetty.servlet.ServletHolder.prepare(ServletHolder.java:776)
at 
org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:579)
at 
org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
at 
org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.spark_project.jetty.server.Server.handle(Server.java:539)
at 
org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
at 
org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at 
org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
at 
org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:808)
Caused by: java.lang.ClassNotFoundException: 

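The trace above is cut off at the ClassNotFoundException, but it points to a class-loading 
problem. A small diagnostic sketch (hypothetical, not part of Spark) to check whether the 
Jersey class in question is visible on the driver's classpath, and from which jar it is loaded:

{code:scala}
// Hypothetical diagnostic: check whether the class named in the NoClassDefFoundError
// can be loaded at all, and if so, which jar it comes from. Run it with the same
// classpath as the driver.
object JerseyClasspathCheck {
  def main(args: Array[String]): Unit = {
    val className = "org.glassfish.jersey.internal.inject.AbstractBinder"
    try {
      val clazz  = Class.forName(className)
      val source = clazz.getProtectionDomain.getCodeSource // may be null for bootstrap classes
      println(s"$className loaded from: $source")
    } catch {
      case _: ClassNotFoundException => println(s"$className is not on the classpath")
      case e: NoClassDefFoundError   => println(s"$className failed to link: ${e.getMessage}")
    }
  }
}
{code}

If the class resolves from an unexpected jar, or not at all, mixed Jersey versions on the 
driver classpath are a likely cause.
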
[jira] [Created] (SPARK-30686) Spark 2.4.4 metrics endpoint throwing error

2020-01-30 Thread Behroz Sikander (Jira)
Behroz Sikander created SPARK-30686:
---

 Summary: Spark 2.4.4 metrics endpoint throwing error
 Key: SPARK-30686
 URL: https://issues.apache.org/jira/browse/SPARK-30686
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: Behroz Sikander


I am using Spark standalone in HA mode with ZooKeeper.

Once the driver is up and running, whenever I try to access the metrics API 
using the following URL
http://master_address/proxy/app-20200130041234-0123/api/v1/applications
I get the following exception.

It seems that the request never even reaches the Spark code. It would be 
helpful if somebody could look into this.

{code:java}
HTTP ERROR 500
Problem accessing /api/v1/applications. Reason:

Server Error
Caused by:
java.lang.NullPointerException: while trying to invoke the method 
org.glassfish.jersey.servlet.WebComponent.service(java.net.URI, java.net.URI, 
javax.servlet.http.HttpServletRequest, javax.servlet.http.HttpServletResponse) 
of a null object loaded from field 
org.glassfish.jersey.servlet.ServletContainer.webComponent of an object loaded 
from local variable 'this'
at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
at 
org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
at 
org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
at 
org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
at 
org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
at 
org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.spark_project.jetty.server.Server.handle(Server.java:539)
at 
org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
at 
org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at 
org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at 
org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
at 
org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at 
org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at 
org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at 
org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at 
org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:808)
{code}







[jira] [Commented] (SPARK-26302) retainedBatches configuration can eat up memory on driver

2018-12-12 Thread Behroz Sikander (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718824#comment-16718824
 ] 

Behroz Sikander commented on SPARK-26302:
-

By code, I meant a warning in the logs.

I will prepare a documentation commit.

> retainedBatches configuration can eat up memory on driver
> -
>
> Key: SPARK-26302
> URL: https://issues.apache.org/jira/browse/SPARK-26302
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, DStreams
>Affects Versions: 2.4.0
>Reporter: Behroz Sikander
>Priority: Minor
> Attachments: heap_dump_detail.png
>
>
> The documentation for configuration "spark.streaming.ui.retainedBatches" says
> "How many batches the Spark Streaming UI and status APIs remember before 
> garbage collecting"
> The default for this configuration is 1000.
> From our experience, the documentation is incomplete and we found it the hard 
> way.
> The size of a single BatchUIData is around 750KB. Increasing this value to 
> something like 5000 increases the total size to ~4GB.
> If your driver heap is not big enough, the job starts to slow down, has 
> frequent GCs and long scheduling delays. Once the heap is full, the job 
> cannot be recovered.
> A note of caution should be added to the documentation to let users know the 
> impact of this seemingly harmless configuration property.






[jira] [Commented] (SPARK-26302) retainedBatches configuration can eat up memory on driver

2018-12-12 Thread Behroz Sikander (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16718623#comment-16718623
 ] 

Behroz Sikander commented on SPARK-26302:
-

If I understand correctly, the idea is to add a general warning, but where? 
In the documentation, or somewhere in the code?


> retainedBatches configuration can eat up memory on driver
> -
>
> Key: SPARK-26302
> URL: https://issues.apache.org/jira/browse/SPARK-26302
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, DStreams
>Affects Versions: 2.4.0
>Reporter: Behroz Sikander
>Priority: Minor
> Attachments: heap_dump_detail.png
>
>
> The documentation for configuration "spark.streaming.ui.retainedBatches" says
> "How many batches the Spark Streaming UI and status APIs remember before 
> garbage collecting"
> The default for this configuration is 1000.
> From our experience, the documentation is incomplete and we found it the hard 
> way.
> The size of a single BatchUIData is around 750KB. Increasing this value to 
> something like 5000 increases the total size to ~4GB.
> If your driver heap is not big enough, the job starts to slow down, has 
> frequent GCs and long scheduling delays. Once the heap is full, the job 
> cannot be recovered.
> A note of caution should be added to the documentation to let users know the 
> impact of this seemingly harmless configuration property.






[jira] [Commented] (SPARK-26302) retainedBatches configuration can cause memory leak

2018-12-10 Thread Behroz Sikander (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16715544#comment-16715544
 ] 

Behroz Sikander commented on SPARK-26302:
-

>> I think the same applies to all spark.ui.*retained* parameters. If a warning 
>> added here all other places has to be adapted.
I agree.

>> What I can imagine is a general warning but it would be hard to find a 
>> committer to merge it.
The impact is so indirect that it's really hard to debug this issue. It is worth 
the effort to find a committer, because a warning will be really helpful given 
the number of configuration properties involved.

>> Is it really memory leak and not slow processing or out of memory?
It is slow processing and long scheduling delays, followed by running out of memory.

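For illustration (values are examples, not recommendations), the settings under discussion 
can be pinned explicitly when building the SparkConf, so their driver-heap cost is a 
deliberate choice rather than a default:

{code:scala}
// Example only: set the retained* limits explicitly instead of relying on defaults,
// so the driver-heap cost of UI/status data is a conscious decision.
import org.apache.spark.SparkConf

object RetainedUiSettings {
  val conf: SparkConf = new SparkConf()
    .set("spark.streaming.ui.retainedBatches", "1000") // ~750 KB per retained batch per this ticket
    .set("spark.ui.retainedJobs", "1000")
    .set("spark.ui.retainedStages", "1000")
}
{code}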

> retainedBatches configuration can cause memory leak
> ---
>
> Key: SPARK-26302
> URL: https://issues.apache.org/jira/browse/SPARK-26302
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, DStreams
>Affects Versions: 2.4.0
>Reporter: Behroz Sikander
>Priority: Minor
> Attachments: heap_dump_detail.png
>
>
> The documentation for configuration "spark.streaming.ui.retainedBatches" says
> "How many batches the Spark Streaming UI and status APIs remember before 
> garbage collecting"
> The default for this configuration is 1000.
> From our experience, the documentation is incomplete and we found it the hard 
> way.
> The size of a single BatchUIData is around 750KB. Increasing this value to 
> something like 5000 increases the total size to ~4GB.
> If your driver heap is not big enough, the job starts to slow down, has 
> frequent GCs and long scheduling delays. Once the heap is full, the job 
> cannot be recovered.
> A note of caution should be added to the documentation to let users know the 
> impact of this seemingly harmless configuration property.






[jira] [Created] (SPARK-26302) retainedBatches configuration can cause memory leak

2018-12-07 Thread Behroz Sikander (JIRA)
Behroz Sikander created SPARK-26302:
---

 Summary: retainedBatches configuration can cause memory leak
 Key: SPARK-26302
 URL: https://issues.apache.org/jira/browse/SPARK-26302
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 2.4.0
Reporter: Behroz Sikander
 Attachments: heap_dump_detail.png

The documentation for configuration "spark.streaming.ui.retainedBatches" says

"How many batches the Spark Streaming UI and status APIs remember before 
garbage collecting"

The default for this configuration is 1000.
From our experience, the documentation is incomplete and we found that out the hard 
way.

The size of a single BatchUIData is around 750KB. Increasing this value to 
something like 5000 increases the total size to ~4GB.

If your driver heap is not big enough, the job starts to slow down, has 
frequent GCs and long scheduling delays. Once the heap is full, the job 
cannot be recovered.

A note of caution should be added to the documentation to let users know the 
impact of this seemingly harmless configuration property.
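
For concreteness, a back-of-the-envelope sketch of the numbers above (assuming roughly 
750 KB per retained BatchUIData, as measured in the attached heap dump):

{code:scala}
// Rough driver-heap estimate from the numbers in this ticket.
object RetainedBatchesEstimate {
  def main(args: Array[String]): Unit = {
    val bytesPerBatch   = 750L * 1024   // ~750 KB per BatchUIData
    val retainedBatches = 5000          // value used in our setup
    val totalGiB = bytesPerBatch * retainedBatches / math.pow(1024, 3)
    println(f"~$totalGiB%.1f GiB of driver heap held just by retained batch UI data")
    // prints ~3.6 GiB, in line with the ~4 GB seen in the heap dump
  }
}
{code}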







[jira] [Commented] (SPARK-26302) retainedBatches configuration can cause memory leak

2018-12-07 Thread Behroz Sikander (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712559#comment-16712559
 ] 

Behroz Sikander commented on SPARK-26302:
-

I am willing to open a PR for the documentation once someone gives the go-ahead.

> retainedBatches configuration can cause memory leak
> ---
>
> Key: SPARK-26302
> URL: https://issues.apache.org/jira/browse/SPARK-26302
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0
>Reporter: Behroz Sikander
>Priority: Minor
> Attachments: heap_dump_detail.png
>
>
> The documentation for configuration "spark.streaming.ui.retainedBatches" says
> "How many batches the Spark Streaming UI and status APIs remember before 
> garbage collecting"
> The default for this configuration is 1000.
> From our experience, the documentation is incomplete and we found it the hard 
> way.
> The size of a single BatchUIData is around 750KB. Increasing this value to 
> something like 5000 increases the total size to ~4GB.
> If your driver heap is not big enough, the job starts to slow down, has 
> frequent GCs and long scheduling delays. Once the heap is full, the job 
> cannot be recovered.
> A note of caution should be added to the documentation to let users know the 
> impact of this seemingly harmless configuration property.






[jira] [Updated] (SPARK-26302) retainedBatches configuration can cause memory leak

2018-12-07 Thread Behroz Sikander (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Behroz Sikander updated SPARK-26302:

Attachment: heap_dump_detail.png

> retainedBatches configuration can cause memory leak
> ---
>
> Key: SPARK-26302
> URL: https://issues.apache.org/jira/browse/SPARK-26302
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0
>Reporter: Behroz Sikander
>Priority: Minor
> Attachments: heap_dump_detail.png
>
>
> The documentation for configuration "spark.streaming.ui.retainedBatches" says
> "How many batches the Spark Streaming UI and status APIs remember before 
> garbage collecting"
> The default for this configuration is 1000.
> From our experience, the documentation is incomplete and we found it the hard 
> way.
> The size of a single BatchUIData is around 750KB. Increasing this value to 
> something like 5000 increases the total size to ~4GB.
> If your driver heap is not big enough, the job starts to slow down, has 
> frequent GCs and long scheduling delays. Once the heap is full, the job 
> cannot be recovered.
> A note of caution should be added to the documentation to let users know the 
> impact of this seemingly harmless configuration property.






[jira] [Commented] (SPARK-24794) DriverWrapper should have both master addresses in -Dspark.master

2018-09-22 Thread Behroz Sikander (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16624790#comment-16624790
 ] 

Behroz Sikander commented on SPARK-24794:
-

Can someone please have a look at this PR?

> DriverWrapper should have both master addresses in -Dspark.master
> -
>
> Key: SPARK-24794
> URL: https://issues.apache.org/jira/browse/SPARK-24794
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.2.1
>Reporter: Behroz Sikander
>Priority: Major
>
> In standalone cluster mode, one could launch a Driver with supervise mode 
> enabled. Spark launches the driver with a JVM argument -Dspark.master which 
> is set to [host and port of current 
> master|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L149].
>  
> During the life of the context, the Spark masters can switch for any reason. 
> After that, if the driver dies unexpectedly and comes back up, it tries to connect 
> to the master that was set initially via -Dspark.master, but that master 
> is now in STANDBY mode. The context tries multiple times to connect to the 
> standby master and then just kills itself.
>  
> *Suggestion:*
> While launching the driver process, Spark master should use the [spark.master 
> passed as 
> input|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L124]
>  instead of master and port of the current master.
> Log messages that we observe:
>  
> {code:java}
> 2018-07-11 13:03:21,801 INFO appclient-register-master-threadpool-0 
> org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
> Connecting to master spark://10.100.100.22:7077..
> .
> 2018-07-11 13:03:21,806 INFO netty-rpc-connection-0 
> org.apache.spark.network.client.TransportClientFactory []: Successfully 
> created connection to /10.100.100.22:7077 after 1 ms (0 ms spent in 
> bootstraps)
> .
> 2018-07-11 13:03:41,802 INFO appclient-register-master-threadpool-0 
> org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
> Connecting to master spark://10.100.100.22:7077...
> .
> 2018-07-11 13:04:01,802 INFO appclient-register-master-threadpool-0 
> org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
> Connecting to master spark://10.100.100.22:7077...
> .
> 2018-07-11 13:04:21,806 ERROR appclient-registration-retry-thread 
> org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend []: Application 
> has been killed. Reason: All masters are unresponsive! Giving up.{code}






[jira] [Updated] (SPARK-24794) DriverWrapper should have both master addresses in -Dspark.master

2018-07-12 Thread Behroz Sikander (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Behroz Sikander updated SPARK-24794:

Description: 
In standalone cluster mode, one could launch a Driver with supervise mode 
enabled. Spark launches the driver with a JVM argument -Dspark.master which is 
set to [host and port of current 
master|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L149].

 

During the life of context, the spark masters can switch due to any reason. 
After that if the driver dies unexpectedly and comes up it tries to connect 
with the master which was set initially with -Dspark.master but that master is 
in STANDBY mode. The context tries multiple times to connect to standby and 
then just kills itself.

 

*Suggestion:*

While launching the driver process, Spark master should use the [spark.master 
passed as 
input|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L124]
 instead of master and port of the current master.

Log messages that we observe:

 
{code:java}
2018-07-11 13:03:21,801 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077..
.
2018-07-11 13:03:21,806 INFO netty-rpc-connection-0 
org.apache.spark.network.client.TransportClientFactory []: Successfully created 
connection to /10.100.100.22:7077 after 1 ms (0 ms spent in bootstraps)
.
2018-07-11 13:03:41,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:01,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:21,806 ERROR appclient-registration-retry-thread 
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend []: Application 
has been killed. Reason: All masters are unresponsive! Giving up.{code}

  was:
In standalone cluster mode, one could launch a Driver with supervise mode 
enabled. Spark launches the driver with a JVM argument -Dspark.master which is 
set to [host and port of current 
master|[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L149]]

 

During the life of context, the spark masters can switch due to any reason. 
After that if the driver dies unexpectedly and comes up it tries to connect 
with the master which was set initially with -Dspark.master but that master is 
in STANDBY mode. The context tries multiple times to connect to standby and 
then just kills itself.

 

*Suggestion:*

While launching the driver process, Spark master should use the [spark.master 
passed as 
input|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L124]
 instead of master and port of the current master.

Log messages that we observe:

 
{code:java}
2018-07-11 13:03:21,801 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077..
.
2018-07-11 13:03:21,806 INFO netty-rpc-connection-0 
org.apache.spark.network.client.TransportClientFactory []: Successfully created 
connection to /10.100.100.22:7077 after 1 ms (0 ms spent in bootstraps)
.
2018-07-11 13:03:41,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:01,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:21,806 ERROR appclient-registration-retry-thread 
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend []: Application 
has been killed. Reason: All masters are unresponsive! Giving up.{code}


> DriverWrapper should have both master addresses in -Dspark.master
> -
>
> Key: SPARK-24794
> URL: https://issues.apache.org/jira/browse/SPARK-24794
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.2.1
>Reporter: Behroz Sikander
>Priority: Major
>
> In standalone cluster mode, one could launch a Driver with supervise mode 
> enabled. Spark launches the driver with a JVM argument -Dspark.master which 
> is set to [host and port of current 
> 

[jira] [Created] (SPARK-24794) DriverWrapper should have both master addresses in -Dspark.master

2018-07-12 Thread Behroz Sikander (JIRA)
Behroz Sikander created SPARK-24794:
---

 Summary: DriverWrapper should have both master addresses in 
-Dspark.master
 Key: SPARK-24794
 URL: https://issues.apache.org/jira/browse/SPARK-24794
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 2.2.1
Reporter: Behroz Sikander


In standalone cluster mode, one can launch a driver with supervise mode 
enabled. Spark launches the driver with a JVM argument -Dspark.master, which is 
set to the [host and port of the current 
master|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L149].

 

During the life of the context, the Spark masters can switch for any reason. 
After that, if the driver dies unexpectedly and comes back up, it tries to connect 
to the master that was set initially via -Dspark.master, but that master is now 
in STANDBY mode. The context tries multiple times to connect to the standby 
master and then just kills itself.

 

*Suggestion:*

While launching the driver process, Spark master should use the [spark.master 
passed as 
input|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L124]
 instead of the host and port of the current master.

Log messages that we observe:

 
{code:java}
2018-07-11 13:03:21,801 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077..
.
2018-07-11 13:03:21,806 INFO netty-rpc-connection-0 
org.apache.spark.network.client.TransportClientFactory []: Successfully created 
connection to /10.100.100.22:7077 after 1 ms (0 ms spent in bootstraps)
.
2018-07-11 13:03:41,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:01,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:21,806 ERROR appclient-registration-retry-thread 
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend []: Application 
has been killed. Reason: All masters are unresponsive! Giving up.{code}
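
To illustrate the suggestion (the second master address below is made up for the example): 
Spark standalone accepts a comma-separated list of masters in a single spark:// URL, so the 
driver could be handed the full HA list instead of only the master that happened to be 
active at submission time:

{code:scala}
// Illustration only; 10.100.100.23 is a hypothetical second master address.
import org.apache.spark.SparkConf

object SparkMasterHaExample {
  // What the restarted driver effectively sees today: only the master that was
  // ACTIVE at submission time.
  val currentBehavior: SparkConf =
    new SparkConf().set("spark.master", "spark://10.100.100.22:7077")

  // What this ticket suggests: the full HA list, so the driver can reach whichever
  // master is currently ACTIVE.
  val suggested: SparkConf =
    new SparkConf().set("spark.master", "spark://10.100.100.22:7077,10.100.100.23:7077")
}
{code}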






[jira] [Updated] (SPARK-24794) DriverWrapper should have both master addresses in -Dspark.master

2018-07-12 Thread Behroz Sikander (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Behroz Sikander updated SPARK-24794:

Description: 
In standalone cluster mode, one could launch a Driver with supervise mode 
enabled. Spark launches the driver with a JVM argument -Dspark.master which is 
set to [host and port of current 
master|[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L149]]

 

During the life of context, the spark masters can switch due to any reason. 
After that if the driver dies unexpectedly and comes up it tries to connect 
with the master which was set initially with -Dspark.master but that master is 
in STANDBY mode. The context tries multiple times to connect to standby and 
then just kills itself.

 

*Suggestion:*

While launching the driver process, Spark master should use the [spark.master 
passed as 
input|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L124]
 instead of master and port of the current master.

Log messages that we observe:

 
{code:java}
2018-07-11 13:03:21,801 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077..
.
2018-07-11 13:03:21,806 INFO netty-rpc-connection-0 
org.apache.spark.network.client.TransportClientFactory []: Successfully created 
connection to /10.100.100.22:7077 after 1 ms (0 ms spent in bootstraps)
.
2018-07-11 13:03:41,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:01,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:21,806 ERROR appclient-registration-retry-thread 
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend []: Application 
has been killed. Reason: All masters are unresponsive! Giving up.{code}

  was:
In standalone cluster mode, one could launch a Driver with supervise mode 
enabled. Spark launches the driver with a JVM argument -Dspark.master which is 
set to [host and port of current 
master|[https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L149].]

 

During the life of context, the spark masters can switch due to any reason. 
After that if the driver dies unexpectedly and comes up it tries to connect 
with the master which was set initially with -Dspark.master but that master is 
in STANDBY mode. The context tries multiple times to connect to standby and 
then just kills itself.

 

*Suggestion:*

While launching the driver process, Spark master should use the [spark.master 
passed as 
input|https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L124]
 instead of master and port of the current master.

Log messages that we observe:

 
{code:java}
2018-07-11 13:03:21,801 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077..
.
2018-07-11 13:03:21,806 INFO netty-rpc-connection-0 
org.apache.spark.network.client.TransportClientFactory []: Successfully created 
connection to /10.100.100.22:7077 after 1 ms (0 ms spent in bootstraps)
.
2018-07-11 13:03:41,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:01,802 INFO appclient-register-master-threadpool-0 
org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint []: 
Connecting to master spark://10.100.100.22:7077...
.
2018-07-11 13:04:21,806 ERROR appclient-registration-retry-thread 
org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend []: Application 
has been killed. Reason: All masters are unresponsive! Giving up.{code}


> DriverWrapper should have both master addresses in -Dspark.master
> -
>
> Key: SPARK-24794
> URL: https://issues.apache.org/jira/browse/SPARK-24794
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.2.1
>Reporter: Behroz Sikander
>Priority: Major
>
> In standalone cluster mode, one could launch a Driver with supervise mode 
> enabled. Spark launches the driver with a JVM argument -Dspark.master which 
> is set to [host and port of current 
>