[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791751#comment-16791751
 ] 

Ajith S edited comment on SPARK-26961 at 3/13/19 2:37 PM:
--

1) Yes, registerAsParallelCapable will return true, but if you inspect the 
classloader instance, parallelLockMap is still null, as it was already 
initialized via the super class constructor, so *it has no effect for already 
created instances*.

!image-2019-03-13-19-53-52-390.png!

 

2) URLClassLoader is parallel capable because it does the registration in a static 
block, which runs before the parent (ClassLoader) constructor is invoked. Also, as 
per the javadoc:

[https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html]
{code:java}
Note that the ClassLoader class is registered as parallel capable by default. 
However, its subclasses still need to register themselves if they are parallel 
capable. {code}
Hence MutableURLClassLoader, unlike URLClassLoader, lost its parallel capability 
by failing to register.
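
For illustration only (this is not Spark code; the helper class name is hypothetical 
and it relies on the private JDK 8 field parallelLockMap), a small sketch of one way 
to perform the inspection mentioned above on a live classloader instance:
{code:java}
import java.lang.reflect.Field;
import java.net.URL;
import java.net.URLClassLoader;

public class ParallelCapableCheck {

    // Reads the private JDK 8 field that getClassLoadingLock() consults.
    // A null map means the instance was constructed as NOT parallel capable.
    static boolean isParallelCapable(ClassLoader loader) throws Exception {
        Field f = ClassLoader.class.getDeclaredField("parallelLockMap");
        f.setAccessible(true);
        return f.get(loader) != null;
    }

    public static void main(String[] args) throws Exception {
        // URLClassLoader registers itself in a static block, so this prints true.
        System.out.println(isParallelCapable(new URLClassLoader(new URL[0])));
        // An already constructed MutableURLClassLoader instance would print false,
        // no matter when registerAsParallelCapable() is called afterwards.
    }
}{code}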

 


was (Author: ajithshetty):
1) Yes, registerAsParallelCapable will return true, but if you inspect the 
classloader instance, parallelLockMap is still null, as it was already 
initialized via the super class constructor, so it has no effect.

!image-2019-03-13-19-53-52-390.png!

 

2) URLClassLoader is parallel capable because it does the registration in a static 
block, which runs before the parent (ClassLoader) constructor is invoked. Also, as 
per the javadoc:

[https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html]
{code:java}
Note that the ClassLoader class is registered as parallel capable by default. 
However, its subclasses still need to register themselves if they are parallel 
capable. {code}
Hence MutableURLClassLoader, unlike URLClassLoader, lost its parallel capability 
by failing to register.

 

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
> Attachments: image-2019-03-13-19-53-52-390.png
>
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-13 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791751#comment-16791751
 ] 

Ajith S edited comment on SPARK-26961 at 3/13/19 2:32 PM:
--

1) Yes, registerAsParallelCapable will return true, but if you inspect the 
classloader instance, parallelLockMap is still null, as it was already 
initialized via the super class constructor, so it has no effect.

!image-2019-03-13-19-53-52-390.png!

 

2) URLClassLoader is parallel capable because it does the registration in a static 
block, which runs before the parent (ClassLoader) constructor is invoked. Also, as 
per the javadoc:

[https://docs.oracle.com/javase/8/docs/api/java/lang/ClassLoader.html]
{code:java}
Note that the ClassLoader class is registered as parallel capable by default. 
However, its subclasses still need to register themselves if they are parallel 
capable. {code}
Hence MutableURLClassLoader, unlike URLClassLoader, lost its parallel capability 
by failing to register.

 


was (Author: ajithshetty):
Yes, registerAsParallelCapable will return true, but if you inspect the 
classloader instance, parallelLockMap is still null, as it was already 
initialized via the super class constructor, so it has no effect.

!image-2019-03-13-19-53-52-390.png!

 

URLClassLoader is parallel capable because it does the registration in a static 
block, which runs before the parent (ClassLoader) constructor is invoked.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
> Attachments: image-2019-03-13-19-53-52-390.png
>
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-12 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790208#comment-16790208
 ] 

Ajith S edited comment on SPARK-26961 at 3/12/19 6:45 AM:
--

[~srowen] Yes, I am of the same opinion: fix it via 
registerAsParallelCapable. But it is not possible to do via a companion object; 
I tried and hit an issue, see [https://github.com/scala/bug/issues/11429]

Maybe we need to move these classloaders from Scala to a Java implementation to 
achieve this.

[~xsapphire] I think these classloaders are child classloaders of 
LaunchAppClassLoader, which already has the classes for the jars on the 
classpath, so the overhead may not be of a higher magnitude.
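
As an illustration only (the class name is hypothetical and this is not Spark's 
actual fix), a Java implementation could do the registration in a static 
initializer, which the Scala companion-object approach linked above cannot achieve:
{code:java}
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical Java port of a mutable URL classloader.
public class JavaMutableURLClassLoader extends URLClassLoader {

    static {
        // Runs at class initialization, before any constructor, so every instance
        // gets a non-null parallelLockMap. Parallel capability is not inherited:
        // each subclass must register itself.
        ClassLoader.registerAsParallelCapable();
    }

    public JavaMutableURLClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    // Exposes URLClassLoader#addURL so jars can be appended at runtime.
    @Override
    public void addURL(URL url) {
        super.addURL(url);
    }
}{code}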


was (Author: ajithshetty):
[~srowen] Yes, I am of the same opinion: fix it via 
registerAsParallelCapable. But it is not possible to do via a companion object; 
I tried and hit an issue, see https://github.com/scala/bug/issues/11429

[~xsapphire] I think these classloaders are child classloaders of 
LaunchAppClassLoader, which already has the classes for the jars on the 
classpath, so the overhead may not be of a higher magnitude.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-12 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16790208#comment-16790208
 ] 

Ajith S edited comment on SPARK-26961 at 3/12/19 6:43 AM:
--

[~srowen] Yes, I am of the same opinion: fix it via 
registerAsParallelCapable. But it is not possible to do via a companion object; 
I tried and hit an issue, see https://github.com/scala/bug/issues/11429

[~xsapphire] I think these classloaders are child classloaders of 
LaunchAppClassLoader, which already has the classes for the jars on the 
classpath, so the overhead may not be of a higher magnitude.


was (Author: ajithshetty):
[~srowen] Yes, I am of the same opinion: fix it via 
registerAsParallelCapable. Will raise a PR for this.

[~xsapphire] I think these classloaders are child classloaders of 
LaunchAppClassLoader, which already has the classes for the jars on the 
classpath, so the overhead may not be of a higher magnitude.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at 
> org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1772)
>  at 
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.doFilter(AmIpFilter.java:171)
>  at 
> org.spark_project.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1759)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at 
> org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at 
> org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at 
> org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at 
> org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at 
> org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at 
> org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:534)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
>  at 
> 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-04 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783161#comment-16783161
 ] 

Ajith S edited comment on SPARK-26961 at 3/4/19 9:47 AM:
-

The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked:
{code:java}
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
locked <0x0005b7991168> (a org.apache.spark.util.MutableURLClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348){code}
Checking java.lang.ClassLoader#getClassLoadingLock, we see:
{code:java}
protected Object getClassLoadingLock(String className) {
    Object lock = this;
    if (parallelLockMap != null) {
        Object newLock = new Object();
        lock = parallelLockMap.putIfAbsent(className, newLock);
        if (lock == null) {
            lock = newLock;
        }
    }
    return lock;
}{code}
Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked. The only way to 
get this behaviour is via ClassLoader.registerAsParallelCapable(), e.g. in the 
static block of java.net.URLClassLoader:
{code:java}
static {
    sun.misc.SharedSecrets.setJavaNetAccess(
        new sun.misc.JavaNetAccess() {
            public URLClassPath getURLClassPath(URLClassLoader u) {
                return u.ucp;
            }

            public String getOriginalHostName(InetAddress ia) {
                return ia.holder.getOriginalHostName();
            }
        }
    );
    ClassLoader.registerAsParallelCapable();
}{code}
Without that registration (parallelLockMap stays null), the entire classloader is 
locked in java.lang.ClassLoader#loadClassInternal:
{code:java}
// This method is invoked by the virtual machine to load a class.
private Class<?> loadClassInternal(String name)
    throws ClassNotFoundException
{
    // For backward compatibility, explicitly lock on 'this' when
    // the current class loader is not parallel capable.
    if (parallelLockMap == null) {
        synchronized (this) {
            return loadClass(name);
        }
    } else {
        return loadClass(name);
    }
}{code}
Even though MutableURLClassLoader is a subclass of URLClassLoader, it still needs 
to call ClassLoader.registerAsParallelCapable() explicitly, and I see that Spark 
does not have any ClassLoader.registerAsParallelCapable() in MutableURLClassLoader.
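
As a small self-contained sketch (illustrative only, not Spark code) of the 
difference described above: a subclass that does not register gets the classloader 
instance itself as the class-loading lock, while a registered subclass gets a 
per-class-name lock object.
{code:java}
import java.net.URL;
import java.net.URLClassLoader;

class UnregisteredLoader extends URLClassLoader {
    UnregisteredLoader() { super(new URL[0]); }
    Object lockFor(String name) { return getClassLoadingLock(name); }
}

class RegisteredLoader extends URLClassLoader {
    static { ClassLoader.registerAsParallelCapable(); }
    RegisteredLoader() { super(new URL[0]); }
    Object lockFor(String name) { return getClassLoadingLock(name); }
}

public class LockGranularityDemo {
    public static void main(String[] args) {
        UnregisteredLoader u = new UnregisteredLoader();
        RegisteredLoader r = new RegisteredLoader();
        // Not parallel capable: the lock is the loader instance itself.
        System.out.println(u.lockFor("a.B") == u);  // true
        // Parallel capable: a dedicated lock object per class name.
        System.out.println(r.lockFor("a.B") == r);  // false
    }
}{code}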

Are there any code changes in your version of Spark, or is it the same as the 
open-source version?


was (Author: ajithshetty):
The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked:
{code:java}
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
locked <0x0005b7991168> (a org.apache.spark.util.MutableURLClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348){code}
Checking java.lang.ClassLoader, we see:
{code:java}
protected Object getClassLoadingLock(String className) {
    Object lock = this;
    if (parallelLockMap != null) {
        Object newLock = new Object();
        lock = parallelLockMap.putIfAbsent(className, newLock);
        if (lock == null) {
            lock = newLock;
        }
    }
    return lock;
}{code}
Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked. The only way to 
get this behaviour is via ClassLoader.registerAsParallelCapable(), e.g. in the 
static block of java.net.URLClassLoader:
{code:java}
static {
    sun.misc.SharedSecrets.setJavaNetAccess(
        new sun.misc.JavaNetAccess() {
            public URLClassPath getURLClassPath(URLClassLoader u) {
                return u.ucp;
            }

            public String getOriginalHostName(InetAddress ia) {
                return ia.holder.getOriginalHostName();
            }
        }
    );
    ClassLoader.registerAsParallelCapable();
}{code}
Even though MutableURLClassLoader is a subclass of URLClassLoader, it still needs 
to call ClassLoader.registerAsParallelCapable() explicitly, and I see that Spark 
does not have any ClassLoader.registerAsParallelCapable() in MutableURLClassLoader.

Are there any code changes in your version of Spark, or is it the same as the 
open-source version?

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-04 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783161#comment-16783161
 ] 

Ajith S edited comment on SPARK-26961 at 3/4/19 9:40 AM:
-

The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked:
{code:java}
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
locked <0x0005b7991168> (a org.apache.spark.util.MutableURLClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348){code}
Checking java.lang.ClassLoader, we see:
{code:java}
protected Object getClassLoadingLock(String className) {
    Object lock = this;
    if (parallelLockMap != null) {
        Object newLock = new Object();
        lock = parallelLockMap.putIfAbsent(className, newLock);
        if (lock == null) {
            lock = newLock;
        }
    }
    return lock;
}{code}
Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked. The only way to 
get this behaviour is via ClassLoader.registerAsParallelCapable(), e.g. in the 
static block of java.net.URLClassLoader:
{code:java}
static {
    sun.misc.SharedSecrets.setJavaNetAccess(
        new sun.misc.JavaNetAccess() {
            public URLClassPath getURLClassPath(URLClassLoader u) {
                return u.ucp;
            }

            public String getOriginalHostName(InetAddress ia) {
                return ia.holder.getOriginalHostName();
            }
        }
    );
    ClassLoader.registerAsParallelCapable();
}{code}
Even though MutableURLClassLoader is a subclass of URLClassLoader, it still needs 
to call ClassLoader.registerAsParallelCapable() explicitly, and I see that Spark 
does not have any ClassLoader.registerAsParallelCapable() in MutableURLClassLoader.

Are there any code changes in your version of Spark, or is it the same as the 
open-source version?


was (Author: ajithshetty):
The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked. Checking java.lang.ClassLoader, we see:
{code:java}
protected Object getClassLoadingLock(String className) {
    Object lock = this;
    if (parallelLockMap != null) {
        Object newLock = new Object();
        lock = parallelLockMap.putIfAbsent(className, newLock);
        if (lock == null) {
            lock = newLock;
        }
    }
    return lock;
}{code}
Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked. The only way to 
get this behaviour is via ClassLoader.registerAsParallelCapable(), e.g. in the 
static block of java.net.URLClassLoader:
{code:java}
static {
    sun.misc.SharedSecrets.setJavaNetAccess(
        new sun.misc.JavaNetAccess() {
            public URLClassPath getURLClassPath(URLClassLoader u) {
                return u.ucp;
            }

            public String getOriginalHostName(InetAddress ia) {
                return ia.holder.getOriginalHostName();
            }
        }
    );
    ClassLoader.registerAsParallelCapable();
}{code}
Even though MutableURLClassLoader is a subclass of URLClassLoader, it still needs 
to call ClassLoader.registerAsParallelCapable() explicitly, and I see that Spark 
does not have any ClassLoader.registerAsParallelCapable() in MutableURLClassLoader.

Are there any code changes in your version of Spark, or is it the same as the 
open-source version?

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-03 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783031#comment-16783031
 ] 

Ajith S edited comment on SPARK-26961 at 3/4/19 6:53 AM:
-

The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked. Checking java.lang.ClassLoader, we see:
{code:java}
protected Object getClassLoadingLock(String className) {
    Object lock = this;
    if (parallelLockMap != null) {
        Object newLock = new Object();
        lock = parallelLockMap.putIfAbsent(className, newLock);
        if (lock == null) {
            lock = newLock;
        }
    }
    return lock;
}{code}
Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked.

The only way to get this behaviour is via ClassLoader.registerAsParallelCapable(), 
i.e. in the static block of java.net.URLClassLoader:
{code:java}
static {
    sun.misc.SharedSecrets.setJavaNetAccess(
        new sun.misc.JavaNetAccess() {
            public URLClassPath getURLClassPath(URLClassLoader u) {
                return u.ucp;
            }

            public String getOriginalHostName(InetAddress ia) {
                return ia.holder.getOriginalHostName();
            }
        }
    );
    ClassLoader.registerAsParallelCapable();
}{code}
So all subclasses of URLClassLoader will lock the entire classloader during class 
loading, which causes this lock.


was (Author: ajithshetty):
The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked. Checking java.lang.ClassLoader:

protected Object getClassLoadingLock(String className) {
 Object lock = this;
 if (parallelLockMap != null) {
 Object newLock = new Object();
 lock = parallelLockMap.putIfAbsent(className, newLock);
 if (lock == null) {
 lock = newLock;
 }
 }
 return lock;
}

Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked.

The only way to get this behaviour is via ClassLoader.registerAsParallelCapable(), 
i.e. in the static block of java.net.URLClassLoader:

static {
 sun.misc.SharedSecrets.setJavaNetAccess (
 new sun.misc.JavaNetAccess() {
 public URLClassPath getURLClassPath (URLClassLoader u) {
 return u.ucp;
 }

 public String getOriginalHostName(InetAddress ia) {
 return ia.holder.getOriginalHostName();
 }
 }
 );
 *{color:#FF}ClassLoader.registerAsParallelCapable();{color}*
}

So all subclasses of URLClassLoader will lock the entire classloader during class 
loading, which causes this lock.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-03 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783031#comment-16783031
 ] 

Ajith S edited comment on SPARK-26961 at 3/4/19 6:54 AM:
-

The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked. Checking java.lang.ClassLoader, we see:
{code:java}
protected Object getClassLoadingLock(String className) {
    Object lock = this;
    if (parallelLockMap != null) {
        Object newLock = new Object();
        lock = parallelLockMap.putIfAbsent(className, newLock);
        if (lock == null) {
            lock = newLock;
        }
    }
    return lock;
}{code}
Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked.

The only way to get this behaviour is via ClassLoader.registerAsParallelCapable(), 
i.e. in the static block of java.net.URLClassLoader:
{code:java}
static {
    sun.misc.SharedSecrets.setJavaNetAccess(
        new sun.misc.JavaNetAccess() {
            public URLClassPath getURLClassPath(URLClassLoader u) {
                return u.ucp;
            }

            public String getOriginalHostName(InetAddress ia) {
                return ia.holder.getOriginalHostName();
            }
        }
    );
    ClassLoader.registerAsParallelCapable();
}{code}
So all subclasses of URLClassLoader will lock the entire classloader during class 
loading, which causes this lock.


was (Author: ajithshetty):
The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked. Checking java.lang.ClassLoader, we see:
{code:java}
protected Object getClassLoadingLock(String className) {
    Object lock = this;
    if (parallelLockMap != null) {
        Object newLock = new Object();
        lock = parallelLockMap.putIfAbsent(className, newLock);
        if (lock == null) {
            lock = newLock;
        }
    }
    return lock;
}{code}
Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked.

The only way to get this behaviour is via ClassLoader.registerAsParallelCapable(), 
i.e. in the static block of java.net.URLClassLoader:
{code:java}
static {
    sun.misc.SharedSecrets.setJavaNetAccess(
        new sun.misc.JavaNetAccess() {
            public URLClassPath getURLClassPath(URLClassLoader u) {
                return u.ucp;
            }

            public String getOriginalHostName(InetAddress ia) {
                return ia.holder.getOriginalHostName();
            }
        }
    );
    ClassLoader.registerAsParallelCapable();
}{code}
So all subclasses of URLClassLoader will lock the entire classloader during class 
loading, which causes this lock.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doGet(JettyUtils.scala:161)
>  at 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-03 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783031#comment-16783031
 ] 

Ajith S edited comment on SPARK-26961 at 3/4/19 6:52 AM:
-

The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked. Checking java.lang.ClassLoader:

protected Object getClassLoadingLock(String className) {
 Object lock = this;
 if (parallelLockMap != null) {
 Object newLock = new Object();
 lock = parallelLockMap.putIfAbsent(className, newLock);
 if (lock == null) {
 lock = newLock;
 }
 }
 return lock;
}

Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked.

The only way to get this behaviour is via ClassLoader.registerAsParallelCapable(), 
i.e. in the static block of java.net.URLClassLoader:

static {
 sun.misc.SharedSecrets.setJavaNetAccess (
 new sun.misc.JavaNetAccess() {
 public URLClassPath getURLClassPath (URLClassLoader u) {
 return u.ucp;
 }

 public String getOriginalHostName(InetAddress ia) {
 return ia.holder.getOriginalHostName();
 }
 }
 );
 *{color:#FF}ClassLoader.registerAsParallelCapable();{color}*
}

So all subclasses of URLClassLoader will lock the entire classloader during class 
loading, which causes this lock.


was (Author: ajithshetty):
The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked. Checking java.lang.ClassLoader:

{{protected Object getClassLoadingLock(String className) {}}
{{ Object lock = this;}}
{{ if (parallelLockMap != null) {}}
{{ Object newLock = new Object();}}
{{ lock = parallelLockMap.putIfAbsent(className, newLock);}}
{{ if (lock == null) {}}
{{ lock = newLock;}}
{{ }}}
{{ }}}
{{ return lock;}}
{{}}}

Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked.

The only way to get this behaviour is via ClassLoader.registerAsParallelCapable(), 
i.e. in the static block of java.net.URLClassLoader:

{{static {}}
{{ sun.misc.SharedSecrets.setJavaNetAccess (}}
{{ new sun.misc.JavaNetAccess() {}}
{{ public URLClassPath getURLClassPath (URLClassLoader u) {}}
{{ return u.ucp;}}
{{ }}}

{{ public String getOriginalHostName(InetAddress ia) {}}
{{ return ia.holder.getOriginalHostName();}}
{{ }}}
{{ }}}
{{ );}}
{{ *{color:#FF}ClassLoader.registerAsParallelCapable();{color}*}}
{{}}}

So all subclasses of URLClassLoader will lock the entire classloader during class 
loading, which causes this lock.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at org.apache.spark.ui.JettyUtils$$anon$4.doRequest(JettyUtils.scala:176)
>  at 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-03-03 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783031#comment-16783031
 ] 

Ajith S edited comment on SPARK-26961 at 3/4/19 6:51 AM:
-

The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked. Checking java.lang.ClassLoader:

{{protected Object getClassLoadingLock(String className) {}}
{{ Object lock = this;}}
{{ if (parallelLockMap != null) {}}
{{ Object newLock = new Object();}}
{{ lock = parallelLockMap.putIfAbsent(className, newLock);}}
{{ if (lock == null) {}}
{{ lock = newLock;}}
{{ }}}
{{ }}}
{{ return lock;}}
{{}}}

Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked.

The only way to get this behaviour is via ClassLoader.registerAsParallelCapable(), 
i.e. in the static block of java.net.URLClassLoader:

{{static {}}
{{ sun.misc.SharedSecrets.setJavaNetAccess (}}
{{ new sun.misc.JavaNetAccess() {}}
{{ public URLClassPath getURLClassPath (URLClassLoader u) {}}
{{ return u.ucp;}}
{{ }}}

{{ public String getOriginalHostName(InetAddress ia) {}}
{{ return ia.holder.getOriginalHostName();}}
{{ }}}
{{ }}}
{{ );}}
{{ *{color:#FF}ClassLoader.registerAsParallelCapable();{color}*}}
{{}}}

So all subclasses of URLClassLoader will lock the entire classloader during class 
loading, which causes this lock.


was (Author: ajithshetty):
The problem here is that org.apache.spark.util.MutableURLClassLoader (the entire 
classloader) is getting locked. Checking java.lang.ClassLoader:

{{protected Object getClassLoadingLock(String className) {}}
{{ Object lock = this;}}
{{ if (parallelLockMap != null) {}}
{{ Object newLock = new Object();}}
{{ lock = parallelLockMap.putIfAbsent(className, newLock);}}
{{ if (lock == null)}}{{{ lock = newLock; }}}{{}}}
{{ return lock;}}
{{ }}}

Here we see that, when parallelLockMap is non-null, a new lock object is created 
per class name, so the entire classloader itself is not locked.

The only way to get this behaviour is via ClassLoader.registerAsParallelCapable(), 
i.e. in the static block of java.net.URLClassLoader:

{{static {}}
{{ sun.misc.SharedSecrets.setJavaNetAccess (}}
{{ new sun.misc.JavaNetAccess() {}}
{{ public URLClassPath getURLClassPath (URLClassLoader u) {}}
{{ return u.ucp;}}
{{ }}}

{{ public String getOriginalHostName(InetAddress ia) {}}
{{ return ia.holder.getOriginalHostName();}}
{{ }}}
{{ }}}
{{ );}}
{{ *{color:#FF}ClassLoader.registerAsParallelCapable();{color}*}}
{{}}}

So all subclasses of URLClassLoader will lock the entire classloader during class 
loading, which causes this lock.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> "ForkJoinPool-1-worker-57":
>  waiting to lock monitor 0x7f3860574298 (object 0x0005b7991168, a 
> org.apache.spark.util.MutableURLClassLoader),
>  which is held by "ForkJoinPool-1-worker-7"
> "ForkJoinPool-1-worker-7":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is held by "ForkJoinPool-1-worker-57"
> Java stack information for the threads listed above:
> ===
> "SparkUI-907":
>  at org.apache.hadoop.conf.Configuration.getOverlay(Configuration.java:1328)
>  - waiting to lock <0x0005c0c1e5e0> (a 
> org.apache.hadoop.conf.Configuration)
>  at 
> org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
>  at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
>  at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
>  at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
>  at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
>  at 
> org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
>  at java.net.URL.getURLStreamHandler(URL.java:1142)
>  at java.net.URL.<init>(URL.java:599)
>  at java.net.URL.<init>(URL.java:490)
>  at java.net.URL.<init>(URL.java:439)
>  at 

[jira] [Comment Edited] (SPARK-26961) Found Java-level deadlock in Spark Driver

2019-02-28 Thread Ajith S (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16780534#comment-16780534
 ] 

Ajith S edited comment on SPARK-26961 at 2/28/19 1:54 PM:
--

I think the root cause is 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L185]
 

URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())

 

This induces the deadlock; it may be a bug in a non-Spark environment as well.

Thread 1: does a class load, which results in the following:
 - waiting to lock <0x0005c0c1e5e0> (a org.apache.hadoop.conf.Configuration)
at 
org.apache.hadoop.conf.Configuration.handleDeprecation(Configuration.java:684)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1088)
at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
at 
org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
at java.net.URL.getURLStreamHandler(URL.java:1142)
at java.net.URL.<init>(URL.java:420)
at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:812)
at sun.misc.URLClassPath$JarLoader$3.run(URLClassPath.java:1094)
at sun.misc.URLClassPath$JarLoader$3.run(URLClassPath.java:1091)
at java.security.AccessController.doPrivileged(Native Method)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1090)
at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:1050)
at sun.misc.URLClassPath.getResource(URLClassPath.java:239)
at java.net.URLClassLoader$1.run(URLClassLoader.java:365)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
 - locked <0x0005b7991168> (a org.apache.spark.util.MutableURLClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)  

Thread 2: creates a new URL:
 - waiting to lock <0x0005b7991168> (a 
org.apache.spark.util.MutableURLClassLoader)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.xerces.parsers.ObjectFactory.findProviderClass(Unknown Source)
at org.apache.xerces.parsers.ObjectFactory.newInstance(Unknown Source)
at org.apache.xerces.parsers.ObjectFactory.createObject(Unknown Source)
at org.apache.xerces.parsers.ObjectFactory.createObject(Unknown Source)
at org.apache.xerces.parsers.DOMParser.<init>(Unknown Source)
at org.apache.xerces.parsers.DOMParser.<init>(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.<init>(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.newDocumentBuilder(Unknown 
Source)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2737)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2696)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2579)
 - locked <0x0005c0c1e5e0> (a org.apache.hadoop.conf.Configuration)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:1091)
at org.apache.hadoop.conf.Configuration.getTrimmed(Configuration.java:1145)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2363)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2840)
at 
org.apache.hadoop.fs.FsUrlStreamHandlerFactory.createURLStreamHandler(FsUrlStreamHandlerFactory.java:74)
at java.net.URL.getURLStreamHandler(URL.java:1142)
at java.net.URL.<init>(URL.java:599)
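
For illustration, a stripped-down analogue (not Spark code; the two monitors merely 
stand in for the MutableURLClassLoader and the hadoop Configuration) of the lock 
inversion described above:
{code:java}
public class DeadlockSketch {
    private static final Object classLoaderLock = new Object();   // stands in for MutableURLClassLoader
    private static final Object configurationLock = new Object(); // stands in for org.apache.hadoop.conf.Configuration

    public static void main(String[] args) {
        Thread t1 = new Thread(() -> {
            synchronized (classLoaderLock) {          // loadClass() on a non-parallel-capable loader
                sleep(100);
                synchronized (configurationLock) {    // FsUrlStreamHandlerFactory -> Configuration.get()
                    System.out.println("t1 done");
                }
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (configurationLock) {        // Configuration.getProps()
                sleep(100);
                synchronized (classLoaderLock) {      // xerces factory lookup -> loadClass()
                    System.out.println("t2 done");
                }
            }
        });
        t1.start();
        t2.start();   // jstack on this JVM then reports a Java-level deadlock
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}{code}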


was (Author: ajithshetty):
I think the root cause is 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala#L185]
 

URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())

 

This induces the deadlock.

> Found Java-level deadlock in Spark Driver
> -
>
> Key: SPARK-26961
> URL: https://issues.apache.org/jira/browse/SPARK-26961
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Rong Jialei
>Priority: Major
>
> Our Spark job usually finishes in minutes; however, we recently found it 
> taking days to run, and we could only kill it when this happened.
> An investigation showed that all worker containers could not connect to the 
> driver after start, and the driver was hanging; using jstack, we found a 
> Java-level deadlock.
>  
> *Jstack output for the deadlock part is shown below:*
>  
> Found one Java-level deadlock:
> =
> "SparkUI-907":
>  waiting to lock monitor 0x7f387761b398 (object 0x0005c0c1e5e0, a 
> org.apache.hadoop.conf.Configuration),
>  which is