[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2019-01-30 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756826#comment-16756826
 ] 

Kingsley Jones commented on SPARK-12216:


Well, that fits. No source-code formatter for PowerShell!

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2019-01-30 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756825#comment-16756825
 ] 

Kingsley Jones commented on SPARK-12216:


{code:powershell}
# Shell to launch the local Apache Spark REPL and clean up temp files on close

$sparkid = (Start-Process spark-shell -PassThru).Id  # launch spark-shell and save its process Id
$currdir = Get-Location

Wait-Process -Id $sparkid  # block until the spark-shell process exits, then run the cleanup

# once execution reaches here the spark-shell has exited and we can clean up

Set-Location $env:TEMP

Remove-Item spark* -Recurse -Force
Remove-Item jansi* -Recurse -Force
Remove-Item hsperfdata* -Recurse -Force

Set-Location $currdir
{code}
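For what it's worth: saved as, say, spark-clean.ps1 (the name is arbitrary), this can be launched in place of spark-shell itself, so the temp-file cleanup runs automatically every time the REPL session ends.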




[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2019-01-19 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747325#comment-16747325
 ] 

Kingsley Jones commented on SPARK-12216:


Here is where the bug has popped up over on the Scala sbt thread:

[https://github.com/sbt/sbt/issues/2496]

The source is a set of old JDK bugs which seem to have hit web application servers before.

From a management perspective, it is Java itself that seems broken on Windows.

Since the problem occurring here seems to come from the Scala implementation, a monkey-patch might work.

There is this one over here:

[https://github.com/jeantil/class-monkey]

At some point I will give it a go, but for now I have decided to run Python Dask on my 
on-premises cluster and burst to a Linux cluster if I need a Spark job. 
This is frustrating, but it is all 100% part of the "let's not fix anything" culture 
of today.

 




[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-17 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745850#comment-16745850
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/18/19 7:21 AM:
-

I don't know why I persist with posting anything on this issues thread when 
nobody cares.

However, I did notice this: when testing the
{code:java}
pyspark {code}
shell, the temp-directory cleanup seems to go smoothly.

The problem with this stuff is that nobody in Windows land can trust how deep 
they have to go to determine whether the whole Spark engine is hopelessly broken, or 
whether it is just the platform's reputation suffering because of a weird REPL bug.

I suggest you look into the possibility that the Scala REPL is busted.

If your advice to Windows developers is to use pyspark where possible, that would 
be great to know.

In my company, I am moving as much data workflow as possible to Python Dask, 
because I just don't trust Spark with all this weird reported behavior and no 
active follow-up.

 


was (Author: kingsley):
I don't know why I persist with posting anything on this issues thread when 
nobody cares.

However, I did notice this. When testing  the
{code:java}
pyspark {code}
shell the temp directory file cleanup seems to go smoothly.

The problem with this stuff is that nobody in Windows land can trust how deep 
they have to go to determine if the whole spark engine is hopelessly broken or 
it is just the reputation of the platform because of a weird REPL bug.

I suggest you look into the possibility that scala REPL is busted.

 




[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2019-01-17 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745850#comment-16745850
 ] 

Kingsley Jones commented on SPARK-12216:


I don't know why I persist with posting anything on this issues thread when 
nobody cares.

However, I did notice this: when testing the
{code:java}
pyspark {code}
shell, the temp-directory cleanup seems to go smoothly.

The problem with this stuff is that nobody in Windows land can trust how deep 
they have to go to determine whether the whole Spark engine is hopelessly broken, or 
whether it is just the platform's reputation suffering because of a weird REPL bug.

I suggest you look into the possibility that the Scala REPL is busted.

 




[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421530#comment-16421530
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM:
-

Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 
2.2.1, Hadoop 2.7

My tests support the contention of [~IgorBabalich] ... it seems that 
classloaders instantiated by the code are never being closed. On *nix this 
is not a problem, since the files are not locked. On Windows, however, the files 
are locked.

In addition to the resources mentioned by Igor, this Oracle bug fix from Java 7 
seems relevant:

[https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html]

A new method, "close()", was introduced to address the problem, which shows up on 
Windows due to the differing treatment of file locks between the Windows and *nix 
file systems.

I would point out that this is a generic Java issue which breaks the 
cross-platform intention of that platform as a whole.

The Oracle blog also contains a post:

[https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader]

I have been searching the Apache Spark code-base for classloader instances, in 
search of any ".close()" action. I could not find any, so I believe 
[~IgorBabalich] is correct - the issue has to do with classloaders not being 
closed.
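
For illustration only (an assumed sketch of the Java 7 API under discussion, not Spark's actual code), the pattern that the fix enables looks like this:

{code:scala}
// Since Java 7, java.net.URLClassLoader implements Closeable. Closing it
// releases the file/JAR handles that, on Windows, keep the loaded class
// files locked and therefore undeletable.
import java.io.File
import java.net.{URL, URLClassLoader}

val replOutputDir = new File(sys.props("java.io.tmpdir"), "repl-classes-demo") // hypothetical directory
val loader = new URLClassLoader(Array[URL](replOutputDir.toURI.toURL),
                                Thread.currentThread.getContextClassLoader)
try {
  // ... load and use classes compiled into replOutputDir ...
} finally {
  loader.close() // without this, Windows refuses to delete the directory while the JVM is alive
}
{code}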

I would fix it myself, but thus far it is not clear to me *when* the classloader 
needs to be closed. That is just ignorance on my part. The question is whether 
the classloader should be closed while it is still available as a variable at the point 
where it was instantiated, or later during the ShutdownHookManager cleanup. 
If the latter, then it is not clear to me how to actually get a list of open 
classloaders.

That is where I am at so far. I am prepared to put some work into this, but I 
need some help from those who know the codebase to help answer the above 
question - maybe with a well-isolated test.

MY TESTS...

This issue has been around in one form or another for at least four years and 
shows up on many threads.

The standard answer is that it is a "permissions issue" to do with Windows.

That assertion is objectively false.

There is simple test to prove it.

At a windows prompt, start spark-shell

C:\spark\spark-shell   

then get the temp file directory:

scala> sc.getConf.get("spark.repl.class.outputDir")

it will be in the %AppData%\Local\Temp tree, e.g.

C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f

where the last directory name has a GUID that changes in each session.

With the spark session still open, go to the Temp directory and try to delete 
the given directory.

You won't be able to... there is a lock on it.

Now issue

scala> :quit

to quit the session.

The stack trace will show that ShutdownHookManager tried to delete the 
directory above but could not.

If you now try and delete it through the file system you can.

This is because the JVM actually cleans up the locks on exit.

So, it is not a permission issue, but a feature of the Windows treatment of 
file locks.
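
The platform difference can be demonstrated with plain java.io, independent of Spark (a minimal sketch, assuming default Windows/Linux file-system semantics):

{code:scala}
import java.io.{File, FileInputStream}

val f = File.createTempFile("lock-demo", ".tmp")
val in = new FileInputStream(f)  // holds an OS-level handle on the file
println(f.delete())              // false on Windows (the open handle locks the file), true on Linux
in.close()
println(f.delete())              // true on Windows once the handle is released (false on Linux: already gone)
{code}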

This is the *known issue* that was addressed in the Java bug fix through the 
introduction of the Closeable interface (a close() method) on URLClassLoader. It was 
fixed there because many enterprise systems run on Windows.

Now... to further test the cause, I used the Windows Subsystem for Linux.

To access this (post install) you run

C:> bash

from a command prompt.

In order to get this to work, I used the same Spark install, but had to install 
a fresh copy of the JDK on Ubuntu within the Windows bash subsystem. This is 
standard Ubuntu stuff, but note that the path to your Windows C: drive is /mnt/c.

If I rerun the same test, the new output of 

scala> sc.getConf.get("spark.repl.class.outputDir")

will be a different folder location under Linux /tmp but with the same setup.

With the Spark session still active, it is possible to delete the Spark folders 
in the /tmp folder. This is the difference between Windows and Linux: although 
bash is running Ubuntu on Windows, it has the Linux file-locking behaviour, which 
means you can delete the Spark temp folders *while a session is still running*.

If you run through a new session with spark-shell at the Linux prompt and issue 
:quit, it will shut down without any stack-trace error from ShutdownHookManager.

So, my conclusions are as follows:

1) this is not a permissions issue, as per the common assertion

2) it is a Windows-specific problem with *known* causes - namely the difference 
in file locking compared with Linux

3) it was considered a *bug* in the Java ecosystem and was fixed as such from 
Java 1.7 with the .close() method

Further...

People who need to run Spark on Windows infrastructure (like me) can either run 
a Docker container or use the Windows Linux 
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432207#comment-16432207
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM:
-

 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
 loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
scala> val parent1 = loader.getParent()
 parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49
scala> val parent2 = parent1.getParent()
 parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2
scala> val parent3 = parent2.getParent()
 parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b
scala> val parent4 = parent3.getParent()
 parent4: ClassLoader = null
{code}
 

I did experiment with trying to find the open ClassLoaders in the Scala session 
(shown above).

Tab completion shows the exposed methods on the loaders, but there is no close 
method:

 
{code:java}
scala> loader.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus
scala> parent1.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus
{code}
 

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close the loaders 
within ShutdownHookManager?
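
A rough sketch of that idea, purely as an illustration (this is not something Spark's ShutdownHookManager actually does, and closing the system or application classloader of a live JVM would generally be unsafe):

{code:scala}
import java.net.URLClassLoader
import scala.annotation.tailrec

// Walk the parent chain from the context classloader and close any
// URLClassLoader found, releasing the file handles it still holds.
@tailrec
def closeUrlLoaders(cl: ClassLoader): Unit = {
  if (cl != null) {
    cl match {
      case u: URLClassLoader => u.close()
      case _                 => // other loader types expose no close()
    }
    closeUrlLoaders(cl.getParent)
  }
}

// e.g. invoked from a shutdown hook, before the temp directories are deleted:
closeUrlLoaders(Thread.currentThread.getContextClassLoader)
{code}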


was (Author: kingsley):
 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
 loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
scala> val parent1 = loader.getParent()
 parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49
scala> val parent2 = parent1.getParent()
 parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2
scala> val parent3 = parent2.getParent()
 parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b
scala> val parent4 = parent3.getParent()
 parent4: ClassLoader = null
{code}
 

I did experiment with trying to find the open ClassLoaders in the scala session 
(shown above).

 shows exposed methods on the loaders, but there is no close 
method:

scala> loader.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus

scala> parent1.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close them 
within ShutDownHookManager ???


[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:54 AM:
-

Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the Scala language issue tracker for a class within the Scala 
REPL, IMain.scala:

[https://github.com/scala/bug/issues/10045]

There the poster writes:

 
{code:java}
scala.tools.nsc.interpreter.IMain.TranslatingClassLoader{code}
calls a non-thread-safe method
{code:java}
translateSimpleResource{code}
(this method calls {{SymbolTable.enteringPhase}}), which makes it 
non-thread-safe. However, a ClassLoader must be thread-safe, since a class can 
be loaded on an arbitrary thread.

 

In my REPL reflection experiment above the relevant class is:

 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{code}
 

That is the same class that the above bug marks as non-thread-safe.

See this Stack Overflow question for a discussion of thread-safety issues 
in Scala:

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The Scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit, the REPL will barf with a stack trace referencing this 
particular class (namely TranslatingClassLoader), which is flagged in the open 
Scala language bug above as non-thread-safe.

I am gonna try and contact the person who raised the bug on the Scala issues 
thread and get some input. It seemed like he could only reproduce it with a 
complicated SQL script.

Here we have, with Apache Spark, a simple and (in my tests) 100% reproducible 
instance of the bug on Windows 10 and Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


was (Author: kingsley):
Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

`scala.tools.nsc.interpreter.IMain.TranslatingClassLoader` calls a 
non-thread-safe method `{{translateSimpleResource`}} (this method calls 
`{{SymbolTable.enteringPhase`}}), which makes it become non-thread-safe.

"However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread."

In my REPL reflection experiment above the relevant class is:

{color:#33}`scala> val loader = 
Thread.currentThread.getContextClassLoader()`{color}
{color:#33}`loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f`{color}

{color:#33}Ergo ... the same class in the above  bug being marked as non 
threadsafe.{color}

{color:#33}See this stack overflow for a discussion of thread safety:{color}

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432207#comment-16432207
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:55 AM:
-

 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
 loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
scala> val parent1 = loader.getParent()
 parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49
scala> val parent2 = parent1.getParent()
 parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2
scala> val parent3 = parent2.getParent()
 parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b
scala> val parent4 = parent3.getParent()
 parent4: ClassLoader = null
{code}
 

I did experiment with trying to find the open ClassLoaders in the scala session 
(shown above).

Tab completion shows the exposed methods on the loaders, but there is no close 
method:

scala> loader.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus

scala> parent1.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close them 
within ShutDownHookManager ???


was (Author: kingsley):
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f

scala> val parent1 = loader.getParent()
parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49

scala> val parent2 = parent1.getParent()
parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2

scala> val parent3 = parent2.getParent()
parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b

scala> val parent4 = parent3.getParent()
parent4: ClassLoader = null

I did experiment with trying to find the open ClassLoaders in the scala session 
(shown above).

 shows exposed methods on the loaders, but there is no close 
method:

scala> loader.
clearAssertionStatus   getResource   getResources   
setClassAssertionStatus setPackageAssertionStatus
getParent  getResourceAsStream   loadClass  
setDefaultAssertionStatus

scala> parent1.
clearAssertionStatus   getResource   getResources   
setClassAssertionStatus setPackageAssertionStatus
getParent  getResourceAsStream   loadClass  
setDefaultAssertionStatus

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close them 
within ShutDownHookManager ???



[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:51 AM:
-

Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

{{scala.tools.nsc.interpreter.IMain.TranslatingClassLoader}} calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

"However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread."

In my REPL reflection experiment above the relevant class is:

{{scala> val loader = Thread.currentThread.getContextClassLoader()}}
{{loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f}}

Ergo ... the same class in the above bug being marked as non-threadsafe.

See this Stack Overflow question for a discussion of thread safety:

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


was (Author: kingsley):
Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

'scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

In my REPL reflection experiment above the relevant class is:

{color:#33}scala> val loader = 
Thread.currentThread.getContextClassLoader(){color}
 {color:#33} loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{color}

{color:#33}Ergo ... the very class in the above  bug being marked as non 
threadsafe.{color}

{color:#33}See this stack overflow for a discussion of thread safety:{color}

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:50 AM:
-

Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

In my REPL reflection experiment above the relevant class is:

{{scala> val loader = Thread.currentThread.getContextClassLoader()}}
{{loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f}}

Ergo ... the very class in the above bug being marked as non-threadsafe.

See this Stack Overflow question for a discussion of thread safety:

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


was (Author: kingsley):
Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

In my REPL reflection experiment above the relevant class is:

{color:#33}scala> val loader = 
Thread.currentThread.getContextClassLoader(){color}
 {color:#33} loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{color}

{color:#33}Ergo ... the very class in the above  bug being marked as non 
threadsafe.{color}

{color:#33}See this stack overflow for a discussion of thread safety:{color}

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]


The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.
 
On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:49 AM:
-

Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

In my REPL reflection experiment above the relevant class is:

{{scala> val loader = Thread.currentThread.getContextClassLoader()}}
{{loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f}}

Ergo ... the very class in the above bug being marked as non-threadsafe.

See this Stack Overflow question for a discussion of thread safety:

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]


The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.
 
On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


was (Author: kingsley):
Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

 

In my REPL reflection experiment above the relevant class is:

{color:#33}scala> val loader = 
Thread.currentThread.getContextClassLoader(){color}
{color:#33} loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{color}

{color:#33}Ergo ... the very class in the above  bug being marked as non 
threadsafe.{color}

 

{color:#33}See this stack overflow for a discussion of thread safety:{color}

{color:#33}{color:#006000}https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety{color}{color}

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

 

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely {color:#24292e}TranslatingClassLoader) which is 
identified as an open bug in the scala language issues marked "non 
threadsafe".{color}

 

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color}

 

It seemed like he could only produce it with a complicated SQL script.

 

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

 

That fits the perps modus operandi in my book... marked non threadsafe and 
causes a sensitive operating system like Windows to barf.

 


[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744524#comment-16744524
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:47 AM:
-

I am going down the Rabbit Hole of the Scala REPL.
 I think this is the right code branch
 
[https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala]

lines 352 to 357 define TranslatingClassLoader

It appears to be the central mechanism the Scala REPL uses to parse, compile and 
load any class defined in the REPL. There is an open bug in the Scala issue 
tracker identifying this class as not thread-safe.

https://github.com/scala/bug/issues/10045

The Scala stuff has a different idiom to Java, so maybe the closing of 
classloaders is less refined in practice (meaning it is just less clear what 
the right way is to catch 'em all).


was (Author: kingsley):
I am going down the Rabbit Hole of the Scala REPL.
I think this is the right code branch
https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala

lines 569 to 577
  /** This instance is no longer needed, so release any resources
*  it is using.  The reporter's output gets flushed.
*/
  override def close(): Unit = {
reporter.flush()
if (initializeComplete) {
  global.close()
}
  }

perhaps .close() is not closing everything.

The scala stuff has a different idiom to Java so maybe the closing of 
classloaders is less refined in experience (meaning it is just less clear what 
is the right way to catch 'em all)


[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones commented on SPARK-12216:


Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (which in turn calls 
{{SymbolTable.enteringPhase}}), making the classloader itself non-thread-safe.

However, a ClassLoader must be thread-safe, since classes can be loaded from 
arbitrary threads.
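
Purely as an illustration of that point (my own sketch, not from the bug 
report): the snippet below exercises the shared context classloader from 
several threads inside a scala/spark-shell session. It only shows the kind of 
concurrent use a classloader has to tolerate; it will not necessarily trigger 
the race.

val loader = Thread.currentThread.getContextClassLoader
val workers = (1 to 4).map { _ =>
  new Thread(new Runnable {
    // repeatedly resolve a class through the shared loader
    def run(): Unit =
      (1 to 1000).foreach(_ => loader.loadClass("java.lang.String"))
  })
}
workers.foreach(_.start())
workers.foreach(_.join())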

 

In my REPL reflection experiment above the relevant class is:

scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f

Ergo... the very class marked as non-thread-safe in the bug above.

See this Stack Overflow thread for a discussion of thread safety:

https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

 

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit, the REPL will barf with a stack trace referencing this 
particular class (namely TranslatingClassLoader), which the open Scala bug 
above marks as non-thread-safe.

I am going to try to contact the person who raised the bug on the Scala issues 
thread and get some input.

 

It seemed like he could reproduce it only with a complicated SQL script.

 

Here, with Apache Spark, we have a simple and (in my tests) 100% reproducible 
instance of the bug on both Windows 10 and Windows Server 2016.

 

That fits the perp's modus operandi in my book... marked non-thread-safe, and 
it causes a sensitive operating system like Windows to barf.

 




[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743838#comment-16743838
 ] 

Kingsley Jones commented on SPARK-12216:


I had another go at investigating this as it continues to greatly frustrate my 
deployments.

Firstly, I had no difficulty following the build instructions for Spark from 
source on Windows. The only "trick" needed was to use the git bash shell on 
Windows to manage the launch of the build process. There is nothing complicated 
about that, so I would encourage others to try.

Secondly, the build on Windows 10 worked first time, which was also my first 
time using Maven. So there is no real reason why development work to 
investigate this issue cannot be done on the Windows platform (that is what I 
am doing now).

Thirdly, I re-investigated my comment from April 2018 above... 

The classLoader which is at the child level is:

scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f

On searching the Spark source code, and online, I found that this class loader 
is actually from the Scala REPL.

It is not actually part of the Spark source tree.

When looking at the cruft showing up in the Windows temp directory, the classes 
that pop up seem to be associated with the REPL.

This makes sense, since the REPL will barf with the errors indicated above if 
you do nothing more than launch a spark-shell and then close it straight away.

The conclusions I reached:

1) it is certainly possible to hack on this from a Windows 10 machine (I am 
trying the incremental builds via the SBT toolchain)

2) the bug does seem to be related to classloader clean-up, but the fault (at 
least for the REPL) may NOT be in the Spark source code but in the Scala REPL 
(or maybe in an interaction between how Spark loads the relevant REPL code?)

3) watching files go in and out of the Windows temp area is reproducible with 
high reliability (as commenters above maintain); one way to watch that 
directory is sketched below
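
A minimal sketch of such a watcher (my illustration, not anything from the 
Spark codebase), assuming you substitute the directory reported by 
sc.getConf.get("spark.repl.class.outputDir") for the placeholder path below:

import java.nio.file.{FileSystems, Paths, StandardWatchEventKinds => K}
import scala.collection.JavaConverters._

// Placeholder: point this at the spark-* temp directory for your session.
val dir = Paths.get("C:/Users/you/AppData/Local/Temp")

val watcher = FileSystems.getDefault.newWatchService()
dir.register(watcher, K.ENTRY_CREATE, K.ENTRY_DELETE, K.ENTRY_MODIFY)

// Print every file that appears or disappears until the key goes invalid.
var valid = true
while (valid) {
  val key = watcher.take()   // blocks until an event arrives
  key.pollEvents().asScala.foreach(ev => println(s"${ev.kind()} -> ${ev.context()}"))
  valid = key.reset()
}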


Remaining questions on my mind...


Since this issue pops up for pretty much anybody who simply tries Spark on 
Windows, yet the build from source showed NO problems, I think the runtime 
issue probably comes down to how classes are loaded and the difference in file 
locking between Windows and Linux.


The question is how to isolate which part of the codebase is actually producing 
the classes that cannot be cleaned up. Is it the main Spark source code, or is 
it the Scala REPL? Since I got immediately discouraged from using Spark in any 
real evaluation (100% reliable barf-outs on the very first example are 
discouraging), I never actually ran cluster test jobs to see whether the thing 
worked outside the REPL. It is quite possible the problem is just the REPL.


I would suggest it would help to get at least some dialogue on this.


It does not seem very desirable to shut out 50% of global developers on a 
project which is manifestly *easy* to build on Windows 10 (using Maven + git), 
but where the first encounter of any experienced Windows developer leaves the 
immediate impression that this is an ultra-flaky codebase.


Quite possibly, it is just the REPL and it is actually a Scala code-base issue.


Folks like me will invest time and energy investigating such bugs, but only if 
we think there is a will to fix issues that are persistent and that discourage 
adoption among people who very often have few enterprise choices other than 
Windows. This is not out of religious attachment, but due to dependencies in 
the data workflow chain that are not easy to fix. In particular, many financial 
industry developers have to use Windows stacks in some part of the data 
acquisition process. These are not developers who are inexperienced in 
cross-platform work, Linux or Java.


The reason such folks are liable to get discouraged is their prior bad 
experience with Java in distributed systems that had Windows components, owing 
to the well known difference in file locking behaviour between Linux and 
Windows. When such folks see barf-outs of this kind, they worry that we are 
back in the EJB hell of earlier times, when systems broke all over the place 
and there were elaborate hacks to patch the problem.


Please consider how we can better test and isolate what is really causing this 
problem.

[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2018-04-10 Thread Kingsley Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432207#comment-16432207
 ] 

Kingsley Jones commented on SPARK-12216:


scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f

scala> val parent1 = loader.getParent()
parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49

scala> val parent2 = parent1.getParent()
parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2

scala> val parent3 = parent2.getParent()
parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b

scala> val parent4 = parent3.getParent()
parent4: ClassLoader = null

I did experiment with trying to find the open ClassLoaders in the scala session 
(shown above).

Tab completion shows the exposed methods on the loaders, but there is no close 
method:

scala> loader.
clearAssertionStatus   getResource   getResources   
setClassAssertionStatus setPackageAssertionStatus
getParent  getResourceAsStream   loadClass  
setDefaultAssertionStatus

scala> parent1.
clearAssertionStatus   getResource   getResources   
setClassAssertionStatus setPackageAssertionStatus
getParent  getResourceAsStream   loadClass  
setDefaultAssertionStatus

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close them 
within ShutdownHookManager???
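
For what it is worth, here is a minimal sketch of that idea (my own 
illustration, not code from Spark or Scala): walk the parent chain from the 
context classloader and close any loader that happens to implement 
java.io.Closeable, as URLClassLoader does since Java 7. Whether the REPL's 
TranslatingClassLoader exposes anything closeable is exactly the open question.

import java.io.Closeable

// Walk the classloader parent chain, closing whatever turns out to be closeable.
def closeLoaders(start: ClassLoader): Unit = {
  var current = start
  while (current != null) {
    current match {
      case c: Closeable =>
        try c.close()
        catch { case e: Exception => println(s"could not close $current: $e") }
      case _ =>
        println(s"$current exposes no close method")
    }
    current = current.getParent
  }
}

closeLoaders(Thread.currentThread.getContextClassLoader)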



[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2018-03-31 Thread Kingsley Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421530#comment-16421530
 ] 

Kingsley Jones commented on SPARK-12216:


Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 
2.2.1, Hadoop 2.7

My tests support the contention of Igor Babalich... it seems that classloaders 
instantiated by the code are never closed. On *nix this is not a problem, since 
the files are not locked. However, on Windows the files are locked.

In addition to the resources mentioned by Igor this Oracle bug fix from Java 7 
seems relevant:

[https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html]

A new method "close()" was introduced to address the problem, which shows up on 
Windows due to the differing treatment of file locks between the Windows file 
system and *nix file system.

I would point out that this is a generic java issue which breaks the 
cross-platform intention of that platform as a whole.

The Oracle blog also contains a post:

[https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader]

I have been searching the Apache Spark code-base for classloader instances, 
looking for any ".close()" call. I could not find any, so I believe 
[~IgorBabalich] is correct - the issue has to do with classloaders not being 
closed.
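
As a point of reference only (my own sketch, not Spark code), the Java 7 fix 
means a URLClassLoader can be closed explicitly, after which the jar it holds 
open can be deleted on Windows; something equivalent is what appears to be 
missing in the shutdown path. The jar path below is a placeholder.

import java.net.{URL, URLClassLoader}
import java.nio.file.{Files, Paths}

// Placeholder path: any jar on disk will do for this experiment.
val jar = Paths.get("C:/tmp/example.jar")

val loader = new URLClassLoader(Array[URL](jar.toUri.toURL), getClass.getClassLoader)

// ... load a class or resource through `loader` here, which is what
// causes Windows to take a lock on the jar ...

// Once close() has been called the lock is released and the delete works;
// without it, the delete fails on Windows until the JVM exits.
loader.close()
println(s"deleted: ${Files.deleteIfExists(jar)}")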

I would fix it myself, but thus far it is not clear to me *when* the classloader 
needs to be closed. That is just ignorance on my part. The question is whether 
the classloader should be closed while it is still available as a variable at 
the point where it was instantiated, or later during the ShutdownHookManager 
cleanup. If the latter, then it is not clear to me how to actually get a list 
of open classloaders.

That is where I am at so far. I am prepared to put some work into this, but I 
need some help from those who know the codebase to help answer the above 
question - maybe with a well-isolated test.

MY TESTS...

This issue has been around in one form or another for at least four years and 
shows up on many threads.

The standard answer is that it is a "permissions issue" to do with Windows.

That assertion is objectively false.

There is a simple test to prove it.

At a Windows prompt, start spark-shell:

C:\spark\spark-shell   

then get the temp file directory:

scala> sc.getConf.get("spark.repl.class.outputDir")

it will be under the %AppData%\Local\Temp tree, e.g.

C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f

where the last directory name has a GUID that changes on each iteration.

With the spark session still open, go to the Temp directory and try to delete 
the given directory.

You won't be able to... there is a lock on it.

Now issue

scala> :quit

to quit the session.

The stack trace will show that ShutdownHookManager tried to delete the 
directory above but could not.

If you now try to delete it through the file system, you can.

This is because the JVM actually cleans up the locks on exit.

So, it is not a permission issue, but a feature of the Windows treatment of 
file locks.
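
The same check can be driven from inside the session rather than from the file 
manager. This is just a convenience sketch of the manual test above (nothing 
here is Spark API beyond reading the conf value quoted earlier):

import java.io.File

// The REPL class output directory reported by the running session.
val outputDir = new File(sc.getConf.get("spark.repl.class.outputDir"))

// Walk the directory and attempt the deletes while the session is alive.
// On Windows the class files report as locked; under the Linux subsystem
// the same deletes succeed.
def tryDelete(f: File): Unit = {
  if (f.isDirectory) f.listFiles().foreach(tryDelete)
  println(s"${if (f.delete()) "deleted" else "LOCKED"}  ${f.getPath}")
}
tryDelete(outputDir)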

This is the *known issue* that was addressed in the Java bug fix through the 
introduction of a Closeable close() method on URLClassLoader. It was fixed 
there because many enterprise systems run on Windows.

Now... to further test the cause, I used the Windows Linux Subsystem.

To access this (post install) you run

C:> bash

from a command prompt.

In order to get this to work, I used the same spark install, but had to install 
a fresh copy of the JDK on Ubuntu within the Windows bash subsystem. This is 
standard Ubuntu stuff, but the path to your Windows C drive is /mnt/c

If I rerun the same test, the new output of 

scala> sc.getConf.get("spark.repl.class.outputDir")

will be a different folder location under Linux /tmp but with the same setup.

With the spark session still active, it is possible to delete the spark folders 
in /tmp *while the session is running*. This is the difference between Windows 
and Linux. While bash is running Ubuntu on Windows, it has the Linux file 
locking behaviour, which means you can delete the spark temp folders while a 
session is running.

If you run through a new session with spark-shell at the Linux prompt and issue 
:quit, it will shut down without any stack trace error from ShutdownHookManager.

So, my conclusions are as follows:

1) this is not a permissions issue as per the common assertion

2) it is a Windows-specific problem for *known* reasons - namely the difference 
in file locking compared with Linux

3) it was considered a *bug* in the Java ecosystem and was fixed as such from 
Java 1.7 with the .close() method

Further...

People who need to run Spark on Windows infrastructure (like me) can either run 
a Docker container or use the Windows Linux Subsystem to launch processes. So 
we do have a workaround.