[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2022-09-26 Thread John Pellman (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609507#comment-17609507
 ] 

John Pellman edited comment on SPARK-12216 at 9/26/22 1:42 PM:
---

Just as another data point, it appears that a variant of this issue also rears 
its head on GNU/Linux (Debian 10, Spark 3.1.2, Scala 2.12.14) if you set your 
temp directory to be on an NFS mount:

{code}
22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception 
while deleting Spark temp dir: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3
java.io.IOException: Failed to delete: 
/hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
at 
org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1141)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4(ShutdownHookManager.scala:65)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4$adapted(ShutdownHookManager.scala:62)
at 
scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at 
scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
at 
org.apache.spark.util.ShutdownHookManager$.$anonfun$new$2(ShutdownHookManager.scala:62)
at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
at 
org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.util.Try$.apply(Try.scala:213)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
{code}

The problem in this case seems to be that {{spark-shell}} attempts a recursive 
unlink while files are still open (NFS client-side [silly 
renames|http://nfs.sourceforge.net/#faq_d2]). It looks like this overall issue 
might be less of a "weird Windows thing" and more a matter of spark-shell not 
waiting until all file handles are closed before attempting to remove the temp 
dir. The behavior is non-deterministic and cannot be reproduced consistently.

The obvious workaround here is to not put temp directories on NFS, but it does 
seem like Spark is relying on file-handling behavior that is specific to how 
Linux behaves on non-NFS volumes, rather than doing a sanity check within 
spark-shell/scala (which might not be a bad idea).
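
Purely to illustrate the kind of sanity check I mean (a rough sketch, not Spark 
code; the helper name, retry count and wait are made up), one could poll for the 
leftover NFS silly-rename placeholders before attempting the recursive delete:

{code}
import java.io.File

// Hypothetical guard: wait briefly for the client-side ".nfsXXXX" placeholders,
// which NFS leaves behind for files that are still open, to disappear before
// the temp dir is deleted recursively.
def awaitNfsHandlesReleased(dir: File, retries: Int = 10, waitMs: Long = 500): Boolean = {
  def sillyRenames(f: File): Array[File] = {
    val children = Option(f.listFiles).getOrElse(Array.empty[File])
    children.filter(_.getName.startsWith(".nfs")) ++
      children.filter(_.isDirectory).flatMap(sillyRenames)
  }
  (1 to retries).exists { _ =>
    val stillOpen = sillyRenames(dir).nonEmpty
    if (stillOpen) Thread.sleep(waitMs)
    !stillOpen
  }
}
{code}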




[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-17 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745850#comment-16745850
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/18/19 7:21 AM:
-

I don't know why I persist with posting anything on this issue thread when 
nobody cares.

However, I did notice this: when testing the {{pyspark}} shell, the temp 
directory cleanup seems to go smoothly.

The problem with this stuff is that nobody in Windows land can tell how deep 
they have to dig to determine whether the whole Spark engine is hopelessly 
broken or whether it is just the platform's reputation suffering because of a 
weird REPL bug.

I suggest you look into the possibility that the Scala REPL is busted.

If your advice to Windows developers is to use pyspark where possible, it would 
be great to say so.

In my company, I am moving as much of our data workflow as possible to Python 
Dask, because I just don't trust Spark with all this weird reported behavior 
and no active follow-up.

 



> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)






[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421530#comment-16421530
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM:
-

Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 
2.2.1, Hadoop 2.7

My tests support the contention of [~IgorBabalich] ... it seems that 
classloaders instantiated by the code are never being closed. On *nix this is 
not a problem since the files are not locked. On Windows, however, the files 
are locked.

In addition to the resources mentioned by Igor, this Oracle bug fix from Java 7 
seems relevant:

[https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html]

A new method "close()" was introduced to address the problem, which shows up on 
Windows due to the differing treatment of file locks between the Windows file 
system and *nix file system.
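
For illustration only (this is not Spark's code; the directory URL is made up), 
the Java 7 pattern looks like this from Scala:

{code}
import java.net.{URL, URLClassLoader}

// Hypothetical loader over a temp directory of compiled classes.
val tempClasses = new URL("file:///C:/Users/someuser/AppData/Local/Temp/repl-classes/")
val loader = new URLClassLoader(Array(tempClasses), getClass.getClassLoader)
try {
  // ... load and use classes via loader.loadClass(...) ...
} finally {
  // Java 7+: releases the loader's open file/JAR handles, so Windows can delete the files.
  loader.close()
}
{code}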

I would point out that this is a generic java issue which breaks the 
cross-platform intention of that platform as a whole.

The Oracle blog also contains a post:

[https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader]

I have been searching the Apache Spark code-base for classloader instances, in 
search of any ".close()" action. I could not find any, so I believe 
[~IgorBabalich] is correct - the issue has to do with classloaders not being 
closed.

I would fix it myself, but thus far it is not clear to me *when* the 
classloader needs to be closed. That is just ignorance on my part. The question 
is whether the classloader should be closed while it is still available as a 
variable at the point where it was instantiated, or later during the 
ShutdownHookManager cleanup. If the latter, then it was not clear to me how to 
actually get a list of open class loaders.

That is where I am at so far. I am prepared to put some work into this, but I 
need some help from those who know the codebase to help answer the above 
question - maybe with a well-isolated test.

MY TESTS...

This issue has been around in one form or another for at least four years and 
shows up on many threads.

The standard answer is that it is a "permissions issue" to do with Windows.

That assertion is objectively false.

There is a simple test to prove it.

At a Windows prompt, start spark-shell:

C:\spark\spark-shell

Then get the temp file directory:

scala> sc.getConf.get("spark.repl.class.outputDir")

It will be in the %AppData%\Local\Temp tree, e.g.

C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f

where the last directory name has a GUID that changes on each run.

With the spark session still open, go to the Temp directory and try to delete 
the given directory.

You won't be able to... there is a lock on it.

Now issue

scala> :quit

to quit the session.

The stack trace will show that ShutdownHookManager tried to delete the 
directory above but could not.

If you now try and delete it through the file system you can.

This is because the JVM actually cleans up the locks on exit.

So, it is not a permissions issue, but a consequence of the Windows treatment 
of file locks.
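
A minimal way to see the difference (hypothetical file name; not part of any 
test suite) is to hold a file open and try to delete it:

{code}
import java.io.{File, FileInputStream}

// FileInputStream keeps an OS-level handle open on the file.
val f = new File(System.getProperty("java.io.tmpdir"), "lock-probe.txt")
f.createNewFile()
val in = new FileInputStream(f)
println(s"delete while handle open: ${f.delete()}") // typically false on Windows, true on *nix
in.close()
println(s"delete after close: ${f.delete()}")       // now true on Windows; false on *nix (already gone)
{code}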

This is the *known issue* that was addressed in the Java bug fix through 
introduction of a Closeable interface close method for URLClassLoader. It was 
fixed there since many enterprise systems run on Windows.

Now... to further test the cause, I used the Windows Subsystem for Linux.

To access this (post-install) you run

C:> bash

from a command prompt.

In order to get this to work, I used the same Spark install, but had to install 
a fresh copy of the JDK on Ubuntu within the Windows bash subsystem. This is 
standard Ubuntu stuff, but the path to your Windows C: drive is /mnt/c.

If I rerun the same test, the new output of 

scala> sc.getConf.get("spark.repl.class.outputDir")

will be a different folder location under Linux /tmp but with the same setup.

With the Spark session still active, it is possible to delete the Spark folders 
under /tmp *while the session is still running*. This is the difference between 
Windows and Linux: while bash is running Ubuntu on Windows, it has the Linux 
file-locking behaviour, which means you can delete the Spark temp folders while 
a session is running.

If you run through a new session with spark-shell at the Linux prompt and issue 
:quit, it will shut down without any stack-trace error from ShutdownHookManager.

So, my conclusions are as follows:

1) this is not a permissions issue, as per the common assertion

2) it is a Windows-specific problem for *known* reasons - namely the difference 
in file locking compared with Linux

3) it was considered a *bug* in the Java ecosystem and was fixed as such from 
Java 1.7 with the .close() method

Further...

People who need to run Spark on Windows infrastructure (like me) can either run 
a Docker container or use the Windows Linux subsystem.

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432207#comment-16432207
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM:
-

 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
 loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
scala> val parent1 = loader.getParent()
 parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49
scala> val parent2 = parent1.getParent()
 parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2
scala> val parent3 = parent2.getParent()
 parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b
scala> val parent4 = parent3.getParent()
 parent4: ClassLoader = null
{code}
 

I did experiment with trying to find the open ClassLoaders in the Scala session 
(shown above).

Tab completion shows the exposed methods on the loaders, but there is no close 
method:

 
{code:java}
scala> loader.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus
scala> parent1.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus
{code}
 

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close them 
within ShutdownHookManager?
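
Something along these lines is what I had in mind - a rough sketch only, 
assuming it is safe to close parent loaders at shutdown and that only 
URLClassLoader instances matter, neither of which I have verified:

{code}
import java.net.URLClassLoader

def closeLoaderChain(start: ClassLoader): Unit = {
  var cl = start
  while (cl != null) {
    cl match {
      case u: URLClassLoader => u.close() // releases the file handles the loader holds
      case _                 =>           // other loaders expose no close()
    }
    cl = cl.getParent
  }
}

// e.g. closeLoaderChain(Thread.currentThread.getContextClassLoader)
{code}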




[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:54 AM:
-

Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the Scala language issue tracker for a class within the 
Scala REPL, IMain.scala:

[https://github.com/scala/bug/issues/10045]

There the poster writes:

{{scala.tools.nsc.interpreter.IMain.TranslatingClassLoader}} calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it non-thread-safe. However, a 
ClassLoader must be thread-safe, since a class can be loaded on an arbitrary 
thread.

 

In my REPL reflection experiment above the relevant class is:

 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{code}
 

That is the same class that the above bug marks as non-thread-safe.

See this Stack Overflow question for a discussion of thread-safety issues in 
Scala:

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then {{:quit}}, the REPL will barf with a stack trace referencing 
this particular class (namely TranslatingClassLoader), which is identified as 
non-thread-safe in the open Scala bug above.

I am going to try to contact the person who raised the bug on the Scala issue 
tracker and get some input. It seemed like he could only reproduce it with a 
complicated SQL script.

Here, with Apache Spark, we have a simple and (in my tests) 100% reproducible 
instance of the bug on Windows 10 and Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.





[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744524#comment-16744524
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:47 AM:
-

I am going down the Rabbit Hole of the Scala REPL.
 I think this is the right code branch
 
[https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala]

lines 357 to 352 define TranslatingClassLoader

It appears to be the central mechanism the Scala REPL uses to parse, compile 
and load any class defined in the REPL. There is an open bug in the Scala issue 
tracker identifying this class as not thread-safe.

https://github.com/scala/bug/issues/10045

Scala has a different idiom from Java, so maybe the practice of closing 
classloaders is less well established (meaning it is just less clear what the 
right way is to catch 'em all).
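
For what it's worth, the Java-style idiom would be something like a loan 
pattern (just a sketch; withCloseable is my own made-up helper, not anything in 
Spark or the Scala library):

{code}
// Ensure close() always runs, even if the body throws.
def withCloseable[A <: AutoCloseable, B](resource: A)(body: A => B): B =
  try body(resource) finally resource.close()

// e.g. withCloseable(new java.net.URLClassLoader(urls, parent)) { cl => /* use cl */ }
{code}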


was (Author: kingsley):
I am going down the Rabbit Hole of the Scala REPL.
I think this is the right code branch
https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala

lines 569 to 577:
{code}
  /** This instance is no longer needed, so release any resources
   *  it is using.  The reporter's output gets flushed.
   */
  override def close(): Unit = {
    reporter.flush()
    if (initializeComplete) {
      global.close()
    }
  }
{code}

perhaps .close() is not closing everything.

The scala stuff has a different idiom to Java so maybe the closing of 
classloaders is less refined in experience (meaning it is just less clear what 
is the right way to catch 'em all)


[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2018-11-09 Thread Deej (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681103#comment-16681103
 ] 

Deej edited comment on SPARK-12216 at 11/9/18 9:08 AM:
---

This issue has *NOT* been fixed, so marking it as Resolved is plain silly. 
Moreover, suggesting that users switch to other OSes is not only reckless but 
also regressive when there is a large community of users attempting to adopt 
Spark as one of their large-scale data processing tools. So please stop with 
the condescension and work on fixing this bug, as the community has been 
expecting for a long while now.

 

As others have reported, I am able to launch spark-shell and perform basic 
tasks (including sc.stop()) successfully. However, the moment I try to quit the 
REPL session, it craps out immediately. Also, I am able to manually delete the 
temp files/folders Spark creates in the temp directory, so there are no 
permissions issues. Even executing these commands from a command prompt running 
as Administrator results in the same error, reinforcing the conclusion that 
this is not related to permissions on the temp folder at all.

Here is my setup to reproduce this issue:

OS: Windows 10

Spark: version 2.3.2

Scala: version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)

Stack trace:
{code}
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@41167ded
scala> sc.stop()
scala> :quit
2018-11-09 00:10:42 ERROR ShutdownHookManager:91 - Exception while deleting Spark temp dir: C:\Users\user1\AppData\Local\Temp\spark-b155db59-b7c5-4f64-8cfb-00d8f95ea348\repl-fed61a6e-3a1e-46cf-90e9-3fbfcb8a1d87
java.io.IOException: Failed to delete: C:\Users\user1\AppData\Local\Temp\spark-b155db59-b7c5-4f64-8cfb-00d8f95ea348\repl-fed61a6e-3a1e-46cf-90e9-3fbfcb8a1d87
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1074)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
{code}


was (Author: laal):
This issue has *NOT* been fixed, so marking it as Resolved is plain silly. 
Moreover, suggesting users to switch to other OSes is not only reckless but 
also regressive when there is a large community of users attempting to adopt 
Spark as one of their large scale data processing tools. So please stop with 
the condescension and work on fixing this bug as the community has been 
expecting for a long while now.

 

As others have reported, I am able to successfully launch spark-shell and 
perform basic tasks (including sc.stop()) successfully. However, the moment I 
try to quit the repl session, it craps out immediately.

Here is my set-up to reproduce this issue:-

OS: Windows 10

Spark: version 2.3.2

 /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)
 
Stack trace:
===
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@41167ded
scala> sc.stop()
scala> :quit
2018-11-09 00:10:42 ERROR ShutdownHookManager:91 - Exception while deleting 
Spark temp dir: 
C:\Users\user1\AppData\Local\Temp\spark-b155db59-b7c5-4f64-8cfb-00d8f95ea348\repl-fed61a6e-3a1e-46cf-90e9-3fbfcb8a1d87
java.io.IOException: Failed to delete: 
C:\Users\user1\AppData\Local\Temp\spark-b155db59-b7c5-4f64-8cfb-00d8f95ea348\repl-fed61a6e-3a1e-46cf-90e9-3fbfcb8a1d87
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1074)
    at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
    at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
    at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
    at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2018-03-31 Thread Kingsley Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421530#comment-16421530
 ] 

Kingsley Jones edited comment on SPARK-12216 at 4/1/18 12:21 AM:
-

Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 
2.2.1, Hadoop 2.7

My tests support the contention of [~IgorBabalich] ... it seems that 
classloaders instantiated by the code are not ever being closed. On *nix this 
is not a problem since the files are not locked. However, on windows the files 
are locked.

In addition to the resources mentioned by Igor this Oracle bug fix from Java 7 
seems relevant:

[https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html]

A new method "close()" was introduced to address the problem, which shows up on 
Windows due to the differing treatment of file locks between the Windows file 
system and *nix file system.

I would point out that this is a generic java issue which breaks the 
cross-platform intention of that platform as a whole.

The Oracle blog also contains a post:

[https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader]

I have been searching the Apache Spark code-base for classloader instances, in 
search of any ".close()" action. I could not find any, so I believe 
[~IgorBabalich] is correct - the issue has to do with classloaders not being 
closed.
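For anyone who wants to see the Java 7+ API in isolation, here is a minimal sketch in Scala (plain JDK usage, not Spark code; the directory name is just a placeholder):

{code:scala}
import java.net.{URL, URLClassLoader}
import java.nio.file.Paths

// URLClassLoader implements java.io.Closeable since Java 7, so the handles it
// holds on jars/class directories can be released explicitly. On Windows those
// handles are what keep the temp directories locked.
val classDir = Paths.get(sys.props("java.io.tmpdir"), "repl-classes") // placeholder path
val loader = new URLClassLoader(Array[URL](classDir.toUri.toURL), getClass.getClassLoader)
try {
  // use the loader...
  println(loader.loadClass("java.lang.String").getName) // delegates to the parent
} finally {
  loader.close() // releases the file locks
}
{code}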

I would fix it myself, but thus far it is not clear to me *when* the classloader needs to be closed. That is just ignorance on my part. The question is whether the classloader should be closed while it is still available as a variable at the point where it was instantiated, or later during the ShutdownHookManager cleanup. If the latter, then it is not clear to me how to actually get a list of open class loaders.

That is where I am at so far. I am prepared to put some work into this, but I 
need some help from those who know the codebase to help answer the above 
question - maybe with a well-isolated test.
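To make the question concrete, here is a purely hypothetical sketch of the second option. It is NOT how Spark is written today, and the names (TrackedClassLoaders, create) are made up for illustration; the idea is simply to keep a registry of the loaders as they are created and close them from a hook that runs at shutdown:

{code:scala}
import java.net.{URL, URLClassLoader}
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration only: track every URLClassLoader we hand out and
// close them all at JVM shutdown, so the file locks are gone before any
// temp-directory cleanup runs.
object TrackedClassLoaders {
  private val open = ArrayBuffer.empty[URLClassLoader]

  def create(urls: Array[URL], parent: ClassLoader): URLClassLoader = synchronized {
    val loader = new URLClassLoader(urls, parent)
    open += loader
    loader
  }

  // A plain JVM shutdown hook is used here for the sketch; Spark itself would
  // presumably register this with its own ShutdownHookManager so that it runs
  // before the hook that deletes the temp directories.
  sys.addShutdownHook {
    synchronized {
      open.foreach(l => try l.close() catch { case _: Exception => () })
      open.clear()
    }
  }
}
{code}

Whether closing at the point of creation is feasible instead presumably depends on whether the REPL still needs those classes for the rest of the session.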

MY TESTS...

This issue has been around in one form or another for at least four years and 
shows up on many threads.

The standard answer is that it is a "permissions issue" to do with Windows.

That assertion is objectively false.

There is a simple test to prove it.

At a Windows prompt, start spark-shell:

C:\spark\spark-shell   

then get the temp file directory:

scala> sc.getConf.get("spark.repl.class.outputDir")

it will be in the %AppData%\Local\Temp tree, e.g.

C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f

where the last directory name contains a GUID that changes on each iteration.

With the spark session still open, go to the Temp directory and try to delete 
the given directory.

You won't be able to... there is a lock on it.

Now issue

scala> :quit

to quit the session.

The stack trace will show that ShutdownHookManager tried to delete the 
directory above but could not.

If you now try and delete it through the file system you can.

This is because the JVM actually cleans up the locks on exit.
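The same check can be scripted from inside the shell. This is only a small sketch to make the difference observable (run it in a throwaway session, since it deliberately tries to delete the REPL's own class output); on Windows the deletes are expected to return false while the session is alive, while on Linux they succeed:

{code:scala}
// Paste into a throwaway spark-shell session.
import java.io.File

val outputDir = new File(sc.getConf.get("spark.repl.class.outputDir"))
val entries = Option(outputDir.listFiles).getOrElse(Array.empty[File])

// File.delete() returns false when the file is locked (Windows) and true
// when the platform lets go of it (Linux, or after the JVM has exited).
entries.foreach(f => println(s"${f.getName} deleted=${f.delete()}"))
println(s"temp dir itself deleted=${outputDir.delete()}")
{code}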

So, it is not a permission issue, but a feature of the Windows treatment of 
file locks.

This is the *known issue* that was addressed in the Java bug fix through the introduction of the Closeable interface's close() method on URLClassLoader. It was fixed there since many enterprise systems run on Windows.

Now... to further test the cause, I used the Windows Subsystem for Linux.

To access this (post-install) you run

C:> bash

from a command prompt.

In order to get this to work, I used the same Spark install, but had to install a fresh copy of the JDK on Ubuntu within the Windows Bash subsystem. This is standard Ubuntu stuff, but the path to your Windows C: drive is /mnt/c.

If I rerun the same test, the new output of 

scala> sc.getConf.get("spark.repl.class.outputDir")

will be a different folder location under Linux /tmp but with the same setup.

With the Spark session still active it is possible to delete the Spark folders under /tmp *while the session is still running*. This is the difference between Windows and Linux: even though bash is running Ubuntu on Windows, it has the Linux file-locking behaviour, which means you can delete the Spark temp folders while a session is running.

If you run through a new session with spark-shell at the Linux prompt and issue :quit, it will shut down without any stack-trace error from ShutdownHookManager.

So, my conclusions are as follows:

1) this is not a permissions issue as per the common assertion

2) it is a Windows-specific problem for *known* reasons - namely the difference in file locking compared with Linux

3) it was considered a *bug* in the Java ecosystem and was fixed as such from Java 1.7 with the .close() method

Further...

People who need to run Spark on windows infrastructure (like me) can either run 
a docker container or use the windows linux 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-08-20 Thread Mark S (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134527#comment-16134527
 ] 

Mark S edited comment on SPARK-12216 at 8/20/17 8:12 PM:
-

I seem to have the same issue in my Windows 10 environment when I am running in local mode.
{code:java}
SparkSession spark = SparkSession
    .builder()
    .master("local")
    .appName("Java Spark SQL App")
    .config("spark.some.config.option", "some-value")
    .getOrCreate();

Dataset<Row> df = spark.read().json("*.gz").toDF();
// Dataset<Row> df = spark.read().json("C:\\dev\\source\\_misc\\Company\\sample-data\\data01.gz").toDF();
// Dataset<Row> df = spark.read().json("/mnt/c/dev/source/_misc/Company/sample-data/data01.gz").toDF();

df.createOrReplaceTempView("MyDataTable");
Dataset<Row> result01 = spark.sql("Select postcode, count(*) from MyDataTable Group by postcode");

// result01.write().format("parquet").save("output.parquet");
result01.write().parquet("output.parquet");
{code}

*Question - Am I to assume this Spark 2.x + Windows issue will not be fixed?*

BTW.  I can confirm that running Spark using  "Bash on Ubuntu for Windows" as 
suggested by [~jerome.scheuring] does work.

{noformat}
17/08/20 18:29:36 INFO FileOutputCommitter: Saved output of task 
'attempt_20170820182935_0017_m_00_0' to 
file:/mnt/c/dev/source/_misc/Company/Project/target/output.parquet/_temporary/0/task_201708201829
35_0017_m_00
17/08/20 18:29:36 INFO SparkHadoopMapRedUtil: 
attempt_20170820182935_0017_m_00_0: Committed
17/08/20 18:29:36 INFO Executor: Finished task 0.0 in stage 17.0 (TID 606). 
2294 bytes result sent to driver
17/08/20 18:29:36 INFO TaskSetManager: Finished task 0.0 in stage 17.0 (TID 
606) in 899 ms on localhost (executor driver) (1/1)
17/08/20 18:29:36 INFO DAGScheduler: ResultStage 17 (parquet at 
SparkApp.java:61) finished in 0.900 s
17/08/20 18:29:36 INFO TaskSchedulerImpl: Removed TaskSet 17.0, whose tasks 
have all completed, from pool
17/08/20 18:29:36 INFO DAGScheduler: Job 8 finished: parquet at 
SparkApp.java:61, took 6.870277 s
{noformat}


h3. Environment 1
* Windows 10 
* Java 8
* Spark 2.2.0
* Parquet 1.8.2
* Stacktrace
{noformat}
Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:188)
... 35 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-05-02 Thread Supriya Pasham (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994192#comment-15994192
 ] 

Supriya Pasham edited comment on SPARK-12216 at 5/3/17 2:46 AM:


Hi Team,

I am executing 'spark-submit' with a jar and properties file in the below 
manner 

  -> spark-submit --class package.classname --master local[*] 
\Spark.jar data.properties

When I run the above command, 2-3 exceptions are immediately displayed in the command prompt with the exception details below.

I have seen that this issue is marked as resolved, but I didn't find a correct resolution.
Please let me know if there is a solution to this issue -

ERROR ShutdownHookManager: Exception while deleting Spark temp dir: 
C:\Users\user1\AppData\Local\Temp\spark-5e37d680-2e9f-4aed-ac59-2f24d8387
855
java.io.IOException: Failed to delete: 
C:\Users\user1\AppData\Local\Temp\spark-5e37d680-2e9f-4aed-ac59-2f24d8387855



Environment details: I am running the commands on a Windows 7 machine.
Request you to provide a solution asap.


was (Author: supriya):
Hi Team,

I am executing 'spark-submit' with a jar and properties file in the below 
manner 

  -> spark-submit --class package.classname --master local[*] 
\Spark.jar data.properties

When I run the above command, 2-3 exceptions are immediately displayed in the command prompt with the exception details below.

I have seen that this issue is marked as resolved, but I didn't find a correct resolution.
Please let me know if there is a solution to this issue -

ERROR ShutdownHookManager: Exception while deleting Spark temp dir: 
C:\Users\user1\AppData\Local\Temp\spark-5e37d680-2e9f-4aed-ac59-2f24d8387
855
java.io.IOException: Failed to delete: 
C:\Users\user1\AppData\Local\Temp\spark-5e37d680-2e9f-4aed-ac59-2f24d8387855


Request you to provide a solution asap.

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-03-24 Thread Jouni H (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987
 ] 

Jouni H edited comment on SPARK-12216 at 3/24/17 10:25 PM:
---

I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar


bugtest.txt:
{noformat}
one
two
three
{noformat}

sparkbugtest.py:

{noformat}
import sys
import time

from pyspark.sql import SparkSession

def main():

    # Initialize the spark context.
    spark = SparkSession\
        .builder\
        .appName("SparkParseLogTest")\
        .getOrCreate()

    lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])

    time.sleep(10)  # just for debug to see how files change in the temporary folder

    # at this point there is only this script (sparkbugtest.py) in the temporary folder

    lines.saveAsTextFile(sys.argv[2])

    # at this point there is both sparkbugtest.py and aws-java-sdk-1.7.4.jar in the temporary folder

    time.sleep(10)  # for debug

if __name__ == "__main__":
    main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

What happens in the userFiles tmp folder is interesting:

* At first there is the {{sparkbugtest.py}}
* At the end (I think during saveAsTextFile, or after it), the 
{{aws-java-sdk-1.7.4.jar}} file is copied there and the {{sparkbugtest.py}} 
gets deleted
* After the spark-submit has ended the {{aws-java-sdk-1.7.4.jar}} is still in 
the temporary folder that couldn't be deleted 

The temp folder in this example was like: 
C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\




was (Author: jouni):
I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

The input in the bugtest.txt doesn't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

# Initialize the spark context.
spark = SparkSession\
.builder\
.appName("SparkParseLogTest")\
.getOrCreate()

lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
lines.saveAsTextFile(sys.argv[2])

if __name__ == "__main__":
main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

What happens in the userFiles tmp folder is interesting:

* At first there is the {{sparkbugtest.py}}
* At the end (I think during saveAsTextFile, or after it), the 
{{aws-java-sdk-1.7.4.jar}} file is copied there and the {{sparkbugtest.py}} 
gets deleted
* After the spark-submit has ended the {{aws-java-sdk-1.7.4.jar}} is still in 
the temporary folder that couldn't be deleted 

The temp folder in this example was like: 
C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\



> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-03-24 Thread Jouni H (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987
 ] 

Jouni H edited comment on SPARK-12216 at 3/24/17 9:58 PM:
--

I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

The input in the bugtest.txt doesn't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

    # Initialize the spark context.
    spark = SparkSession\
        .builder\
        .appName("SparkParseLogTest")\
        .getOrCreate()

    lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
    lines.saveAsTextFile(sys.argv[2])

if __name__ == "__main__":
    main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

What happens in the userFiles tmp folder is interesting:

* At first there is the {{sparkbugtest.py}}
* At the end (I think during saveAsTextFile, or after it), the 
{{aws-java-sdk-1.7.4.jar}} file is copied there and the {{sparkbugtest.py}} 
gets deleted
* After the spark-submit has ended the {{aws-java-sdk-1.7.4.jar}} is still in 
the temporary folder that couldn't be deleted 

The temp folder in this example was like: 
C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\




was (Author: jouni):
I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

The input in the bugtest.txt doesn't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

# Initialize the spark context.
spark = SparkSession\
.builder\
.appName("SparkParseLogTest")\
.getOrCreate()

lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
lines.saveAsTextFile(sys.argv[2])

if __name__ == "__main__":
main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

What happens in the userFiles tmp folder is interesting:

* At first there is the {{sparkbugtest.py}}
* At the end, the {{aws-java-sdk-1.7.4.jar}} file is copied there and the 
{{sparkbugtest.py}} gets deleted
* After the spark-submit has ended the {{aws-java-sdk-1.7.4.jar}} is still in 
the temporary folder that couldn't be deleted 

The temp folder in this example was like: 
C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\



> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-03-24 Thread Jouni H (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987
 ] 

Jouni H edited comment on SPARK-12216 at 3/24/17 9:54 PM:
--

I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

The input in the bugtest.txt doesn't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

    # Initialize the spark context.
    spark = SparkSession\
        .builder\
        .appName("SparkParseLogTest")\
        .getOrCreate()

    lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
    lines.saveAsTextFile(sys.argv[2])

if __name__ == "__main__":
    main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

What happens in the userFiles tmp folder is interesting:

* At first there is the {{sparkbugtest.py}}
* At the end, the {{aws-java-sdk-1.7.4.jar}} file is copied there and the 
{{sparkbugtest.py}} gets deleted
* After the spark-submit has ended the {{aws-java-sdk-1.7.4.jar}} is still in 
the temporary folder that couldn't be deleted 

The temp folder in this example was like: 
C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\




was (Author: jouni):
I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

The input in the bugtest.txt doesn't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

# Initialize the spark context.
spark = SparkSession\
.builder\
.appName("SparkParseLogTest")\
.getOrCreate()

lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
lines.saveAsTextFile(sys.argv[2])

if __name__ == "__main__":
main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

After the error is thrown and spark-submit has ended, I take a look at the folder that couldn't be deleted; it has the .jar file inside, for example
{{C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\aws-java-sdk-1.7.4.jar}}


> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-03-24 Thread Jouni H (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987
 ] 

Jouni H edited comment on SPARK-12216 at 3/24/17 9:37 PM:
--

I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

The input in the bugtest.txt doesn't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

    # Initialize the spark context.
    spark = SparkSession\
        .builder\
        .appName("SparkParseLogTest")\
        .getOrCreate()

    lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
    lines.saveAsTextFile(sys.argv[2])

if __name__ == "__main__":
    main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

After the error is thrown and spark-submit has ended, I take a look at the folder that couldn't be deleted; it has the .jar file inside, for example
{{C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\aws-java-sdk-1.7.4.jar}}



was (Author: jouni):
I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

The input in the bugtest.txt doesn't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

# Initialize the spark context.
spark = SparkSession\
.builder\
.appName("SparkParseLogTest")\
.getOrCreate()

lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
lines.saveAsTextFile(sys.argv[2])

if __name__ == "__main__":
main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-03-24 Thread Jouni H (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987
 ] 

Jouni H edited comment on SPARK-12216 at 3/24/17 8:59 PM:
--

I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

aws-java-sdk-1.7.4.jar can be downloaded from here: 
https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

The input in the bugtest.txt doesn't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

    # Initialize the spark context.
    spark = SparkSession\
        .builder\
        .appName("SparkParseLogTest")\
        .getOrCreate()

    lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
    lines.saveAsTextFile(sys.argv[2])

if __name__ == "__main__":
    main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html


was (Author: jouni):
I was able to reproduce this bug on Windows with the latest spark version: 
spark-2.1.0-bin-hadoop2.7

This bug happens for me when I include --jars for spark-submit AND call saveAsTextFile in the script.

Example scenarios:

* ERROR when I include --jars AND call saveAsTextFile
* Works when I call saveAsTextFile but don't pass any --jars on the command line
* Works when I include --jars on the command line but don't call saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar 
sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the 
command line, it causes the shutdown bug.

The input in the bugtest.txt doesn't matter.

Example script:

{noformat}
import sys

from pyspark.sql import SparkSession

def main():

# Initialize the spark context.
spark = SparkSession\
.builder\
.appName("SparkParseLogTest")\
.getOrCreate()

lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
lines.saveAsTextFile(sys.argv[2])

if __name__ == "__main__":
main()

{noformat}

I also use winutils.exe as mentioned here: 
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-02-26 Thread Vlad Sadilovski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884892#comment-15884892
 ] 

Vlad Sadilovski edited comment on SPARK-12216 at 2/26/17 7:58 PM:
--

I found the following bug while installing Spark on Windows 10 Professional. When exiting spark-shell it complains as follows:

scala> :quit
17/02/26 14:44:01 ERROR util.ShutdownHookManager: Exception while deleting 
Spark temp dir: C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
java.io.IOException: Failed to delete: 
C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

I immediately switched to Windows File Explorer and was able to delete this folder, so it doesn't look like a permission issue.

Is this the same issue reported by the OP? If so, why is this issue closed when it is still reproducible in the latest Spark distribution? Are there any workarounds (besides moving to Linux)?

Thanks


was (Author: vlovsky):
I found the following bug while installing Spark on Windows 10 Professional. When exiting spark-shell it complains as follows:

scala> :quit
17/02/26 14:44:01 ERROR util.ShutdownHookManager: Exception while deleting 
Spark temp dir: C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
java.io.IOException: Failed to delete: 
C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-02-26 Thread Vlad Sadilovski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884892#comment-15884892
 ] 

Vlad Sadilovski edited comment on SPARK-12216 at 2/26/17 7:57 PM:
--

I found the following bug while installing Spark on Windows 10 Professional. When exiting spark-shell it complains as follows:

scala> :quit
17/02/26 14:44:01 ERROR util.ShutdownHookManager: Exception while deleting 
Spark temp dir: C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
java.io.IOException: Failed to delete: 
C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

Is this the same issue reported by the OP? If so, why is this issue closed when it is still reproducible in the latest Spark distribution? Are there any workarounds (besides moving to Linux)?

Thanks


was (Author: vlovsky):
I found the following bug while installing Spark on Windows. When exiting spark-shell it complains as follows:

scala> :quit
17/02/26 14:44:01 ERROR util.ShutdownHookManager: Exception while deleting 
Spark temp dir: C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
java.io.IOException: Failed to delete: 
C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2016-12-01 Thread Brian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713978#comment-15713978
 ] 

Brian edited comment on SPARK-12216 at 12/2/16 4:24 AM:


Theory or not about what caused it, it's a bug in Spark. Other programs and libraries I run on Windows do not have this problem... Just because you don't know how to fix a bug doesn't mean it doesn't exist; I really don't understand that logic.


was (Author: brian44):
Theory or no for what caused it, it's a bug in spark.  Other programs and 
libraries I run on windows do not have this problem... Just because you don't 
knwo how to fix a bug doesn't mean it doesn't exist, I really don't understand 
that logic.

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
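The spark-* directory in this trace is created under the JVM temp directory (the user's 
AppData\Local\Temp in this case). A partial mitigation, sketched below, is to point both 
{{java.io.tmpdir}} and Spark's scratch space at a dedicated directory that can be cleared 
out of band; it does not fix the failed delete, and the D:\spark-tmp path is only an 
example. Which of the two settings the shell honours can vary by version, so setting both 
is the safer bet:

{noformat}
spark-shell ^
  --conf "spark.local.dir=D:\spark-tmp" ^
  --driver-java-options "-Djava.io.tmpdir=D:\spark-tmp"
{noformat}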






[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2016-10-11 Thread Jerome Scheuring (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566349#comment-15566349
 ] 

Jerome Scheuring edited comment on SPARK-12216 at 10/11/16 7:59 PM:


_Note that I am entirely new to the process of submitting issues on this 
system: if this needs to be a new issue, I would appreciate someone letting me 
know._

A bug very similar to this one is 100% reproducible across multiple machines, 
running both Windows 8.1 and Windows 10, compiled with Scala 2.11 and running 
under Spark 2.0.1.

It occurs

* in Scala, but not Python (have not tried R)
* only when reading CSV files (and not, for example, when reading Parquet files)
* only when running local, not submitted to a cluster

_Update:_ The bug also does not occur when the program is run against the Spark 2.0.1 
installation inside "Bash on Ubuntu on Windows" (i.e. the Linux subsystem) on the same 
Windows 10 machine where the bug _does_ occur when the program is executed from Windows 
directly.

This program will produce the bug (if {{poemData}} is defined per the 
commented-out section, rather than being read from a CSV file, the bug does not 
occur):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {

    val poemSchema = StructType(
      Seq(
        StructField("label", IntegerType),
        StructField("line", StringType)
      )
    )

    val sparkSession = SparkSession.builder()
      .appName("Spark Bug Demonstration")
      .master("local[*]")
      .getOrCreate()

    // val poemData = sparkSession.createDataFrame(Seq(
    //   (0, "There's many a strong farmer"),
    //   (0, "Who's heart would break in two"),
    //   (1, "If he could see the townland"),
    //   (1, "That we are riding to;")
    // )).toDF("label", "line")

    val poemData = sparkSession.read
      .option("quote", value = "")
      .schema(poemSchema)
      .csv(args(0))

    println(s"Record count: ${poemData.count()}")
  }
}
{code}

Assuming that {{args(0)}} contains the path to a file with comma-separated 
integer/string pairs, as in:

{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}
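
Since the component under report is the Spark shell, the same read can also be pasted 
into {{spark-shell}} to check whether the shutdown-hook error shows up on {{:q}} with 
identical data. A rough sketch, assuming a Spark 2.x shell where {{spark}} is the 
prebuilt session, and with a placeholder CSV path:

{code}
// Paste into spark-shell; the CSV path below is a placeholder.
import org.apache.spark.sql.types._

val poemSchema = StructType(Seq(
  StructField("label", IntegerType),
  StructField("line", StringType)
))

val poemData = spark.read
  .option("quote", "")
  .schema(poemSchema)
  .csv("C:\\data\\poem.csv")

println(s"Record count: ${poemData.count()}")
{code}

If the delete error only appears after the CSV read, and not after the commented-out 
in-memory DataFrame, that matches the pattern described above.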





[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2016-03-13 Thread Guram Savinov (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192281#comment-15192281
 ] 

Guram Savinov edited comment on SPARK-12216 at 3/13/16 10:49 AM:
-

I have the same problem when exiting from spark-shell on Windows 7.
It does not seem to be a permissions problem, because I start the console as admin 
and have no trouble removing these directories manually.
Maybe the directory is locked by some thread when the shutdown hook executes.

Take a look at this post; it has details on how to find which Java/Scala thread has locked a file:
http://jakzaprogramowac.pl/pytanie/12478,how-to-find-which-java-scala-thread-has-locked-a-file
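
Short of attaching a debugger as that post describes, a quick way to see what an open 
handle does to deletes on Windows is a standalone sketch like the following (not Spark 
code; it deliberately leaves a stream open). The delete fails on Windows while the handle 
is open, but generally succeeds on Linux:

{code}
import java.io.{File, FileInputStream, IOException}
import java.nio.file.Files

object OpenHandleDemo {
  def main(args: Array[String]): Unit = {
    val f = File.createTempFile("lock-demo", ".tmp")
    val in = new FileInputStream(f)   // handle deliberately left open
    try {
      Files.delete(f.toPath)          // fails on Windows while `in` is open
      println(s"deleted ${f.getName} with the handle still open")
    } catch {
      case e: IOException =>
        println(s"could not delete ${f.getName}: $e")
    } finally {
      in.close()
      if (f.exists()) f.delete()      // cleanup succeeds once the handle is closed
    }
  }
}
{code}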





