[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609507#comment-17609507 ]

John Pellman edited comment on SPARK-12216 at 9/26/22 1:42 PM:
---
Just as another data point, it appears that a variant of this issue also rears its head on GNU/Linux (Debian 10, Spark 3.1.2, Scala 2.12.14) if you set your temp directory to be on an NFS mount:

{code}
22/09/26 13:19:09 ERROR org.apache.spark.util.ShutdownHookManager: Exception while deleting Spark temp dir: /hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3
java.io.IOException: Failed to delete: /hadoop/spark/tmp/spark-af087c3d-6abf-40cb-b3c8-b86e38f2f827/repl-60fe6d34-7dfd-4530-bbb6-f1ace7e953b3/$line10/.nfs026e00cd1377
	at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:144)
	at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
	at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
	at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
	at org.apache.spark.network.util.JavaUtils.deleteRecursivelyUsingJavaIO(JavaUtils.java:128)
	at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:118)
	at org.apache.spark.network.util.JavaUtils.deleteRecursively(JavaUtils.java:91)
	at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1141)
	at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4(ShutdownHookManager.scala:65)
	at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$4$adapted(ShutdownHookManager.scala:62)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
	at org.apache.spark.util.ShutdownHookManager$.$anonfun$new$2(ShutdownHookManager.scala:62)
	at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:214)
	at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$2(ShutdownHookManager.scala:188)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.util.SparkShutdownHookManager.$anonfun$runAll$1(ShutdownHookManager.scala:188)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at scala.util.Try$.apply(Try.scala:213)
	at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
	at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
{code}

The problem in this case seems to be that {{spark-shell}} is attempting to do a recursive unlink while files are still open (NFS client-side [silly renames|http://nfs.sourceforge.net/#faq_d2]). It looks like this overall issue might be less of a "weird Windows thing" and more a matter of spark-shell not waiting until all file handles are closed before attempting to remove the temp dir. This behavior cannot be reproduced consistently and appears to be non-deterministic.

The obvious workaround here is to not put temp directories on NFS, but it does seem like Spark is relying on file-handling behavior that is specific to how Linux behaves on non-NFS volumes, rather than doing a sanity check within spark-shell/scala (which might not be a bad idea).
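As an aside, one conceivable mitigation for the silly-rename case would be to retry the recursive delete instead of giving up on the first failed pass. The following is only a hedged sketch in plain Java - the {{RetryDelete}} class and its retry/backoff parameters are hypothetical illustrations, not Spark's actual {{JavaUtils}} logic:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

public class RetryDelete {
    // Hypothetical workaround sketch: retry the recursive deletion a few
    // times, tolerating NFS "silly rename" placeholders (.nfsXXXX files)
    // that linger while some client still holds an open handle.
    static boolean deleteRecursivelyWithRetry(Path root, int attempts, long waitMillis)
            throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            try (Stream<Path> walk = Files.walk(root)) {
                // Deepest paths first, so files go before their directories.
                walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                    try {
                        Files.deleteIfExists(p);
                    } catch (IOException ignored) {
                        // Likely an open .nfs* file; try again on the next pass.
                    }
                });
            } catch (IOException e) {
                // Root already gone, or not walkable; fall through to the check.
            }
            if (!Files.exists(root)) return true;
            Thread.sleep(waitMillis);  // give the NFS client time to release handles
        }
        return !Files.exists(root);
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("spark-tmp-demo");
        Files.createFile(dir.resolve("repl-class-file"));
        System.out.println("deleted=" + deleteRecursivelyWithRetry(dir, 3, 10));
    }
}
```

On a local filesystem the first pass succeeds; the retries only matter when something (NFS, or a Windows lock) keeps a handle open past the first attempt.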
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745850#comment-16745850 ]

Kingsley Jones edited comment on SPARK-12216 at 1/18/19 7:21 AM:
-
I don't know why I persist with posting anything on this issue thread when nobody cares. However, I did notice this: when testing the {{pyspark}} shell, the temp directory cleanup seems to go smoothly.

The problem with this is that nobody in Windows land can trust how deep they have to go to determine whether the whole Spark engine is hopelessly broken, or whether it is just the platform's reputation suffering because of a weird REPL bug. I suggest you look into the possibility that the Scala REPL is busted. If your advice to Windows developers is to use pyspark where possible, that would be great. In my company, I am moving as much data workflow as possible to Python Dask, because I just don't trust Spark given all this weird reported behavior and no active follow-up.
> Spark failed to delete temp directory
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.5.2
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions drwxrwxrwx as detected by winutils ls
> Reporter: stefan
> Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> 	at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> 	at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> 	at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> 	at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> 	at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> 	at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> 	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> 	at scala.util.Try$.apply(Try.scala:161)
> 	at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> 	at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> 	at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421530#comment-16421530 ]

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM:
-
Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 2.2.1, Hadoop 2.7.

My tests support the contention of [~IgorBabalich]: it seems that classloaders instantiated by the code are never being closed. On *nix this is not a problem, since the files are not locked. However, on Windows the files are locked. In addition to the resources mentioned by Igor, this Oracle bug fix from Java 7 seems relevant: [https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html]

A new method "close()" was introduced to address the problem, which shows up on Windows due to the differing treatment of file locks between the Windows file system and *nix file systems. I would point out that this is a generic Java issue which breaks the cross-platform intention of that platform as a whole. The Oracle blog also contains a post: [https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader]

I have been searching the Apache Spark code base for classloader instances, in search of any ".close()" action. I could not find any, so I believe [~IgorBabalich] is correct - the issue has to do with classloaders not being closed. I would fix it myself, but thus far it is not clear to me *when* the classloader needs to be closed. That is just ignorance on my part. The question is whether the classloader should be closed while it is still available as a variable at the point where it was instantiated, or later during the ShutdownHookManager cleanup. If the latter, it was not clear to me how to actually get a list of open classloaders. That is where I am at so far. I am prepared to put some work into this, but I need some help from those who know the codebase to help answer the above question - maybe with a well-isolated test.

MY TESTS...

This issue has been around in one form or another for at least four years and shows up on many threads. The standard answer is that it is a "permissions issue" to do with Windows. That assertion is objectively false, and there is a simple test to prove it.

At a windows prompt, start spark-shell:

{code}
C:\spark\spark-shell
{code}

then get the temp file directory:

{code}
scala> sc.getConf.get("spark.repl.class.outputDir")
{code}

It will be in the %AppData%\Local\Temp tree, e.g. C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f where the last directory name has a GUID that changes in each iteration.

With the spark session still open, go to the Temp directory and try to delete the given directory. You won't be able to... there is a lock on it. Now issue

{code}
scala> :quit
{code}

to quit the session. The stack trace will show that ShutdownHookManager tried to delete the directory above but could not. If you now try to delete it through the file system, you can. This is because the JVM actually cleans up the locks on exit. So it is not a permissions issue, but a feature of the Windows treatment of file locks. This is the *known issue* that was addressed in the Java bug fix through the introduction of a Closeable interface close() method for URLClassLoader. It was fixed there since many enterprise systems run on Windows.

Now... to further test the cause, I used the Windows Linux Subsystem. To access this (post install) you run C:> bash from a command prompt. In order to get this to work, I used the same Spark install, but had to install a fresh copy of the JDK on Ubuntu within the Windows bash subsystem. This is standard Ubuntu stuff, but the path to your Windows C drive is /mnt/c. If I rerun the same test, the new output of

{code}
scala> sc.getConf.get("spark.repl.class.outputDir")
{code}

will be a different folder location under Linux /tmp, but with the same setup.

With the spark session still active, it is possible to delete the spark folders in the /tmp folder *while the session is still active*. This is the difference between Windows and Linux. While bash is running Ubuntu on Windows, it has the different file-locking behaviour, which means you can delete the spark temp folders while a session is running. If you run through a new session with spark-shell at the Linux prompt and issue :quit, it will shut down without any stacktrace error from ShutdownHookManager.

So, my conclusions are as follows:
1) this is not a permissions issue, as per the common assertion
2) it is a Windows-specific problem for *known* reasons - namely the difference in file locking as compared with Linux
3) it was considered a *bug* in the Java ecosystem and was fixed as such from Java 1.7 with the .close() method

Further... people who need to run Spark on Windows infrastructure (like me) can either run a Docker container or use the Windows Linux Subsystem.
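As a side note, the Java 7 close() fix referenced above is easy to demonstrate in isolation. This is a minimal, self-contained sketch and not Spark code; note that on Linux the delete would succeed even without the close(), so the locking difference only bites on Windows:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseLoaderDemo {
    public static void main(String[] args) throws Exception {
        // URLClassLoader implements Closeable since Java 7; close() releases
        // the loader's open jar handles, which on Windows would otherwise
        // keep the files locked and undeletable.
        Path jar = Files.createTempFile("demo", ".jar");
        URLClassLoader loader = new URLClassLoader(new URL[]{ jar.toUri().toURL() });

        loader.close();             // release any file handles held by the loader
        Files.delete(jar);          // now deletable, even under Windows file locking
        System.out.println("deleted=" + !Files.exists(jar));
    }
}
```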
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432207#comment-16432207 ]

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM:
-
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f

scala> val parent1 = loader.getParent()
parent1: ClassLoader = scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49

scala> val parent2 = parent1.getParent()
parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2

scala> val parent3 = parent2.getParent()
parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b

scala> val parent4 = parent3.getParent()
parent4: ClassLoader = null
{code}

I did experiment with trying to find the open ClassLoaders in the scala session (shown above). Tab completion shows the exposed methods on the loaders, but there is no close method:

{code:java}
scala> loader.
clearAssertionStatus   getResource   getResources   setClassAssertionStatus   setPackageAssertionStatus
getParent   getResourceAsStream   loadClass   setDefaultAssertionStatus

scala> parent1.
clearAssertionStatus   getResource   getResources   setClassAssertionStatus   setPackageAssertionStatus
getParent   getResourceAsStream   loadClass   setDefaultAssertionStatus
{code}

There is no close method on any of these, so I could not try closing them prior to quitting the session. This was just a simple hack to see if there was any way to use reflection to find the open ClassLoaders. I thought perhaps it might be possible to walk this tree and then close them within ShutdownHookManager?
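For what it's worth, the "walk the tree and close them" idea could look roughly like this in plain Java. This is a hypothetical helper, not Spark code, and on Java 9+ the built-in loaders no longer extend URLClassLoader, so it may close nothing at all:

```java
import java.io.Closeable;

public class WalkLoaders {
    // Illustrative sketch only: walk the context classloader's parent chain
    // and close any loader that happens to implement Closeable
    // (URLClassLoader does since Java 7). Returns how many were closed.
    public static int closeClosableLoaders() {
        int closed = 0;
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        while (cl != null) {
            if (cl instanceof Closeable) {
                try {
                    ((Closeable) cl).close();
                    closed++;
                } catch (Exception ignored) {
                    // Best effort; a failed close is no worse than no close.
                }
            }
            cl = cl.getParent();
        }
        return closed;
    }

    public static void main(String[] args) {
        // On Java 8 the app/ext loaders are URLClassLoaders and would be
        // closed; on Java 9+ this typically prints closed=0.
        System.out.println("closed=" + closeClosableLoaders());
    }
}
```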
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675 ]

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:54 AM:
-
Okay, so I think we have a candidate for what is actually causing the problem. There is an open bug on the scala language site for a class within the scala REPL, IMain.scala: [https://github.com/scala/bug/issues/10045]

There the poster writes: {{scala.tools.nsc.interpreter.IMain.TranslatingClassLoader}} calls a non-thread-safe method, {{translateSimpleResource}} (which in turn calls {{SymbolTable.enteringPhase}}), and so becomes non-thread-safe itself. However, a ClassLoader must be thread-safe, since a class can be loaded from an arbitrary thread.

In my REPL reflection experiment above, the relevant class is:

{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
{code}

That is the same class the above bug marks as non-thread-safe. See this Stack Overflow question for a discussion of thread-safety issues in scala: [https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile and execute any code entered into the REPL. On Windows 10, if you simply start a spark-shell from the command line, do nothing, and then :quit, the REPL will barf with a stacktrace pointing to this particular class (namely TranslatingClassLoader), which is identified in the open scala bug as non-thread-safe.

I am going to try to contact the person who raised the bug on the scala issues thread and get some input. It seemed like he could only reproduce it with a complicated SQL script. Here we have with Apache Spark a simple and, in my tests, 100% reproducible instance of the bug on Windows 10 and Windows Server 2016.

UPDATE: I cross-posted on [https://github.com/scala/bug/issues/10045] with an explanation of the observations made here and a link back to this issue.
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675 ] Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:51 AM: - Okay, so I think we have a candidate for what is actually causing the problem. There is an open bug on the scala language site for a class within the scala REPL IMain.scala: [https://github.com/scala/bug/issues/10045]

{{scala.tools.nsc.interpreter.IMain.TranslatingClassLoader}} calls a non-thread-safe method {{translateSimpleResource}} (this method calls {{SymbolTable.enteringPhase}}), which makes the loader itself non-thread-safe. "However, a ClassLoader must be thread-safe since the class can be loaded in an arbitrary thread."

In my REPL reflection experiment above, the relevant class is:

{code}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
{code}

Ergo ... the same class that the above bug marks as non-thread-safe. See this Stack Overflow question for a discussion of thread safety: [https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL has internal classloaders which are used to compile and execute any code entered into the REPL. On Windows 10, if you simply start a spark-shell from the command line, do nothing, and then :quit, the REPL will barf with a stacktrace referencing this particular class (TranslatingClassLoader), which the open scala bug identifies as non-thread-safe. I am going to try to contact the person who raised the bug on the scala issues thread and get some input. It seemed he could only reproduce it with a complicated SQL script; here, with Apache Spark, we have a simple and (in my tests) 100% reproducible instance of the bug on Windows 10 and Windows Server 2016.
UPDATE: I cross-posted on [https://github.com/scala/bug/issues/10045] with an explanation of the observations made here and a link back to this issue.
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744524#comment-16744524 ] Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:47 AM: - I am going down the Rabbit Hole of the Scala REPL. I think this is the right code branch: [https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala]

Lines 352 to 357 define TranslatingClassLoader. It appears to be the central mechanism of the scala REPL to parse, compile and load any class that is defined in the REPL. There is an open bug in the scala issues section marking this class as not thread-safe: https://github.com/scala/bug/issues/10045

The scala stuff has a different idiom to Java, so maybe the closing of classloaders is less refined in experience (meaning it is just less clear what is the right way to catch 'em all).

was (Author: kingsley): I am going down the Rabbit Hole of the Scala REPL. I think this is the right code branch: https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala

Lines 569 to 577:

{code}
/** This instance is no longer needed, so release any resources
 *  it is using.  The reporter's output gets flushed.
 */
override def close(): Unit = {
  reporter.flush()
  if (initializeComplete) {
    global.close()
  }
}
{code}

Perhaps .close() is not closing everything.
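The close-before-delete ordering implied by the comment above can be sketched in plain Java. This is a hedged sketch, not Spark's actual ShutdownHookManager code, and `CloseBeforeDelete` is a made-up name: it shows the pattern the Java 7 `URLClassLoader.close()` fix enables, namely releasing the loader's handles on a directory before attempting to delete it.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch (not Spark's code): close the classloader over a REPL-style output
// directory *before* that directory is deleted. On Windows, deleting while
// the loader still holds handles is what makes the shutdown hook fail.
public class CloseBeforeDelete {
    public static void main(String[] args) throws IOException {
        Path outDir = Files.createTempDirectory("repl-out-");
        Path clazz = Files.write(outDir.resolve("Dummy.class"), new byte[]{(byte) 0xCA});

        // Loader rooted at the output directory, like the REPL's class loader.
        URLClassLoader loader = new URLClassLoader(new URL[]{outDir.toUri().toURL()});
        loader.getResource("Dummy.class"); // touch the loader's code source

        loader.close();          // Java 7+ Closeable: release any held handles
        Files.delete(clazz);     // deletion can now succeed, also on Windows
        Files.delete(outDir);
        System.out.println("deleted cleanly");
    }
}
```

The open question raised in the comments remains unchanged: the sketch works only because the loader is still in scope when the directory is deleted, which is exactly what the shutdown hook does not have.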
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16681103#comment-16681103 ] Deej edited comment on SPARK-12216 at 11/9/18 9:08 AM: --- This issue has *NOT* been fixed, so marking it as Resolved is plain silly. Moreover, suggesting that users switch to other OSes is not only reckless but also regressive when there is a large community of users attempting to adopt Spark as one of their large-scale data processing tools. So please stop with the condescension and work on fixing this bug, as the community has been expecting for a long while now. As others have reported, I am able to launch spark-shell and perform basic tasks (including sc.stop()) successfully. However, the moment I try to quit the REPL session, it craps out immediately. Also, I am able to manually delete the temp files/folders Spark creates in the temp directory, so there are no permissions issues. Even executing these commands from a command prompt running as Administrator results in the same error, reinforcing that this is not related to permissions on the temp folder at all.
Here is my set-up to reproduce this issue:

OS: Windows 10
Spark: version 2.3.2
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_171)

Stack trace:

{code}
scala> sc
res0: org.apache.spark.SparkContext = org.apache.spark.SparkContext@41167ded

scala> sc.stop()

scala> :quit
2018-11-09 00:10:42 ERROR ShutdownHookManager:91 - Exception while deleting Spark temp dir: C:\Users\user1\AppData\Local\Temp\spark-b155db59-b7c5-4f64-8cfb-00d8f95ea348\repl-fed61a6e-3a1e-46cf-90e9-3fbfcb8a1d87
java.io.IOException: Failed to delete: C:\Users\user1\AppData\Local\Temp\spark-b155db59-b7c5-4f64-8cfb-00d8f95ea348\repl-fed61a6e-3a1e-46cf-90e9-3fbfcb8a1d87
        at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1074)
        at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
        at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
{code}
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421530#comment-16421530 ] Kingsley Jones edited comment on SPARK-12216 at 4/1/18 12:21 AM: - Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 2.2.1, Hadoop 2.7. My tests support the contention of [~IgorBabalich] ... it seems that classloaders instantiated by the code are not ever being closed. On *nix this is not a problem since the files are not locked. However, on Windows the files are locked. In addition to the resources mentioned by Igor, this Oracle note from Java 7 seems relevant: [https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html] A new method close() was introduced to address the problem, which shows up on Windows due to the differing treatment of file locks between the Windows and *nix file systems. I would point out that this is a generic Java issue which breaks the cross-platform intention of the platform as a whole. The Oracle blog also contains a post: [https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader] I have been searching the Apache Spark code-base for classloader instances, in search of any .close() action. I could not find any, so I believe [~IgorBabalich] is correct - the issue has to do with classloaders not being closed. I would fix it myself, but thus far it is not clear to me *when* the classloader needs to be closed. That is just ignorance on my part. The question is whether the classloader should be closed while it is still available as a variable at the point where it was instantiated, or later during the ShutdownHookManager cleanup. If the latter, then it is not clear to me how to actually get a list of open classloaders. That is where I am at so far. I am prepared to put some work into this, but I need some help from those who know the codebase to answer the above question - maybe with a well-isolated test. MY TESTS...
This issue has been around in one form or another for at least four years and shows up on many threads. The standard answer is that it is a "permissions issue" to do with Windows. That assertion is objectively false. There is a simple test to prove it. At a Windows prompt, start spark-shell:

{code}
C:\spark\spark-shell
{code}

then get the temp file directory:

{code}
scala> sc.getConf.get("spark.repl.class.outputDir")
{code}

It will be in the %AppData%\Local\Temp tree, e.g. C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f, where the last folder name has a GUID that changes in each iteration. With the spark session still open, go to the Temp directory and try to delete the given directory. You won't be able to... there is a lock on it. Now issue

{code}
scala> :quit
{code}

to quit the session. The stack trace will show that ShutdownHookManager tried to delete the directory above but could not. If you now try to delete it through the file system, you can. This is because the JVM actually cleans up the locks on exit. So it is not a permissions issue, but a feature of the Windows treatment of file locks. This is the *known issue* that was addressed in the Java bug fix through the introduction of a Closeable close method on URLClassLoader. It was fixed there since many enterprise systems run on Windows. Now... to further test the cause, I used the Windows Linux Subsystem. To access this (post install) you run bash from a command prompt. To get this to work, I used the same spark install, but had to install a fresh copy of the JDK on ubuntu within the Windows bash subsystem. This is standard ubuntu stuff, but the path to your Windows C drive is /mnt/c. If I rerun the same test, the new output of sc.getConf.get("spark.repl.class.outputDir") will be a different folder location under Linux /tmp, but with the same setup.
With the spark session still active, it is possible to delete the spark folders in the /tmp folder *while the session is still active*. This is the difference between Windows and Linux. While bash is running Ubuntu on Windows, it has the different file-locking behaviour, which means you can delete the spark temp folders while a session is running. If you run through a new session with spark-shell at the linux prompt and issue :quit, it will shut down without any stacktrace error from ShutdownHookManager. So, my conclusions are as follows:

1) this is not a permissions issue, as per the common assertion
2) it is a Windows-specific problem for *known* reasons - namely the difference in file locking as compared with Linux
3) it was considered a *bug* in the Java ecosystem and was fixed as such from Java 1.7 with the .close() method

Further... People who need to run Spark on windows infrastructure (like me) can either run a docker container or use the Windows Linux Subsystem.
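The locking difference behind these conclusions can be demonstrated with a few lines of plain Java (a minimal sketch; the class name is made up for illustration): deleting a file that another handle still holds open fails on Windows but succeeds on *nix, which is exactly why the same shutdown hook barfs on one OS and not the other.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Demonstrates the OS difference behind this bug: deleting a file while
// another handle still has it open fails on Windows but succeeds on *nix,
// where the unlink is deferred until the last handle closes.
public class DeleteWhileOpen {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lockdemo-");
        Path file = dir.resolve("held.txt");
        Files.write(file, "payload".getBytes());

        boolean deleted;
        try (InputStream in = Files.newInputStream(file)) {
            deleted = file.toFile().delete(); // false on Windows, true on *nix
        }
        System.out.println("deleted while open: " + deleted);

        // Cleanup for the Windows case, where the delete above failed.
        Files.deleteIfExists(file);
        Files.deleteIfExists(dir);
    }
}
```

Run under Windows and then under WSL against the same working directory and the printed flag flips, matching the spark-shell observations above.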
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421530#comment-16421530 ] Kingsley Jones edited comment on SPARK-12216 at 4/1/18 12:21 AM: - Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 2.2.1, Hadoop 2.7 My tests support the contention of [~IgorBabalich]Igor Bablich... it seems that classloaders instantiated by the code are not ever being closed. On *nix this is not a problem since the files are not locked. However, on windows the files are locked. In addition to the resources mentioned by Igor this Oracle bug fix from Java 7 seems relevant: [https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html] A new method "close()" was introduced to address the problem, which shows up on Windows due to the differing treatment of file locks between the Windows file system and *nix file system. I would point out that this is a generic java issue which breaks the cross-platform intention of that platform as a whole. The Oracle blog also contains a post: [https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader] I have been searching the Apache Spark code-base for classloader instances, in search of any ".close()" action. I could not find any, so I believe [~IgorBabalich] is correct - the issue has to do with classloaders not being closed. I would fix it myself, but thusfar it is not clear to me *when* the classloader needs to be closed. That is just ignorance on my part. The question is whether the classloader should be closed when still available as variable at the point where it has been instantiated, or later during the ShutdownHookManger cleanup. If the latter, then it was not clear to me how to actually get a list of open class loaders. That is where I am at so far. I am prepared to put some work into this, but I need some help from those who know the codebase to help answer the above question - maybe with a well-isolated test. MY TESTS... 
This issue has been around in one form or another for at least four years and shows up on many threads. The standard answer is that it is a "permissions issue" to do with Windows. That assertion is objectively false. There is simple test to prove it. At a windows prompt, start spark-shell C:\spark\spark-shell then get the temp file directory: scala> sc.getConf.get("spark.repl.class.outputDir") it will be in %AppData%\Local\Temp tree e.g. C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f where the last file name has GUID that changes in each iteration. With the spark session still open, go to the Temp directory and try to delete the given directory. You won't be able to... there is a lock on it. Now issue scala> :quit to quit the session. The stack trace will show that ShutdownHookManager tried to delete the directory above but could not. If you now try and delete it through the file system you can. This is because the JVM actually cleans up the locks on exit. So, it is not a permission issue, but a feature of the Windows treatment of file locks. This is the *known issue* that was addressed in the Java bug fix through introduction of a Closeable interface close method for URLClassLoader. It was fixed there since many enterprise systems run on Windows. Now... to further test the cause, I used the Windows Linux Subsytem. To acces this (post install) you run C:> bash from a command prompt. In order to get this to work, I used the same spark install, but had to install a fresh copy of jdk on ubuntu within the Windows bash subsystem. This is standard ubuntu stuff, but the path to your windows c drive is /mnt/c If I rerun the same test, the new output of scala> sc.getConf.get("spark.repl.class.outputDir") will be a different folder location under Linux /tmp but with the same setup. 
With the spark session still active, it is possible to delete the spark folders in the /tmp folder *while the session is still active*. This is the difference between Windows and Linux. While bash is running Ubuntu on Windows, it has the different file locking behaviour, which means you can delete the spark temp folders while a session is running. If you run through a new session with spark-shell at the linux prompt and issue :quit, it will shut down without any stacktrace error from ShutdownHookManager.

So, my conclusions are as follows:
1) this is not a permissions issue, as per the common assertion
2) it is a Windows-specific problem for *known* reasons - namely the difference in file locking as compared with Linux
3) it was considered a *bug* in the Java ecosystem and was fixed as such from Java 1.7 with the .close() method

Further... People who need to run Spark on Windows infrastructure (like me) can either run a docker container or use the Windows
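To make conclusion 3 concrete, here is a minimal sketch (plain Java, not Spark code; the class name is made up for illustration) of the Java 7 fix referenced above: URLClassLoader implements java.io.Closeable since Java 7, and closing it releases the file handles that would otherwise keep the backing jar locked on Windows.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderCloseDemo {
    // Returns true if the backing file could be deleted after the loader
    // was closed. Deleting the open jar is the step that fails on Windows
    // when the loader is never closed.
    public static boolean closeThenDelete() throws Exception {
        // Stand-in jar file; in Spark this would be a REPL class directory
        // or a --jars artifact under the spark-* temp folder.
        File jar = File.createTempFile("demo", ".jar");

        // try-with-resources works because URLClassLoader has implemented
        // Closeable since Java 7 - the fix discussed in this comment.
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[]{jar.toURI().toURL()})) {
            // loader.loadClass(...) would go here; loading classes from
            // the jar is what takes the file lock on Windows.
        }

        // With the loader closed, the file is deletable on Windows too.
        return jar.delete();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("deleted after close: " + closeThenDelete());
    }
}
```

The open question in the comment above remains where in Spark's lifecycle such a close() call belongs; this sketch only demonstrates the mechanism.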
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134527#comment-16134527 ] Mark S edited comment on SPARK-12216 at 8/20/17 8:12 PM:
-
I seem to have the same issue in my Windows 10 environment when I am running in local mode.
{code:java}
SparkSession spark = SparkSession
    .builder()
    .master("local")
    .appName("Java Spark SQL App")
    .config("spark.some.config.option", "some-value")
    .getOrCreate();

Dataset<Row> df = spark.read().json("*.gz").toDF();
// Dataset<Row> df = spark.read().json("C:\\dev\\source\\_misc\\Company\\sample-data\\data01.gz").toDF();
// Dataset<Row> df = spark.read().json("/mnt/c/dev/source/_misc/Company/sample-data/data01.gz").toDF();

df.createOrReplaceTempView("MyDataTable");
Dataset<Row> result01 = spark.sql("Select postcode, count(*) from MyDataTable Group by postcode");

// result01.write().format("parquet").save("output.parquet");
result01.write().parquet("output.parquet");
{code}
*Question - Am I to assume this Spark 2.x + Windows issue will not be fixed?*

BTW, I can confirm that running Spark using "Bash on Ubuntu for Windows" as suggested by [~jerome.scheuring] does work.
{noformat}
17/08/20 18:29:36 INFO FileOutputCommitter: Saved output of task 'attempt_20170820182935_0017_m_00_0' to file:/mnt/c/dev/source/_misc/Company/Project/target/output.parquet/_temporary/0/task_20170820182935_0017_m_00
17/08/20 18:29:36 INFO SparkHadoopMapRedUtil: attempt_20170820182935_0017_m_00_0: Committed
17/08/20 18:29:36 INFO Executor: Finished task 0.0 in stage 17.0 (TID 606). 2294 bytes result sent to driver
17/08/20 18:29:36 INFO TaskSetManager: Finished task 0.0 in stage 17.0 (TID 606) in 899 ms on localhost (executor driver) (1/1)
17/08/20 18:29:36 INFO DAGScheduler: ResultStage 17 (parquet at SparkApp.java:61) finished in 0.900 s
17/08/20 18:29:36 INFO TaskSchedulerImpl: Removed TaskSet 17.0, whose tasks have all completed, from pool
17/08/20 18:29:36 INFO DAGScheduler: Job 8 finished: parquet at SparkApp.java:61, took 6.870277 s
{noformat}
h3. Environment 1
* Windows 10
* Java 8
* Spark 2.2.0
* Parquet 1.8.2
* Stacktrace
{noformat}
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:188)
... 35 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by:
{noformat}
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994192#comment-15994192 ] Supriya Pasham edited comment on SPARK-12216 at 5/3/17 2:46 AM:
Hi Team,

I am executing 'spark-submit' with a jar and properties file in the following manner:

spark-submit --class package.classname --master local[*] \Spark.jar data.properties

When I run the above command, 2-3 exceptions are immediately displayed in the command prompt, with the details below. I have seen that this issue is marked as resolved, but I didn't find the correct resolution. Please let me know if there is a solution to this issue:

ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\user1\AppData\Local\Temp\spark-5e37d680-2e9f-4aed-ac59-2f24d8387855
java.io.IOException: Failed to delete: C:\Users\user1\AppData\Local\Temp\spark-5e37d680-2e9f-4aed-ac59-2f24d8387855

Environment details: I am running the commands on a Windows 7 machine.

Request you to provide a solution asap.
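There is no real fix in this situation, but a common mitigation for the noisy shutdown error is to silence the logger for the class that reports it. Assuming a log4j 1.x log4j.properties on the classpath (the layout Spark 1.x/2.x distributions ship in conf/), an entry such as the following suppresses the ERROR message; note it does nothing about the underlying file lock, and the leftover spark-* directories under the temp folder still accumulate and need periodic manual cleanup:

```properties
# Hypothetical addition to conf/log4j.properties: hide the shutdown-time
# "Failed to delete" error; the temp directories are still left behind.
log4j.logger.org.apache.spark.util.ShutdownHookManager=OFF
```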
> Spark failed to delete temp directory
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions drwxrwxrwx as detected by winutils ls
> Reporter: stefan
> Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987 ] Jouni H edited comment on SPARK-12216 at 3/24/17 10:25 PM:
-
I was able to reproduce this bug on Windows with the latest spark version: spark-2.1.0-bin-hadoop2.7.

This bug happens for me when I include --jars for spark-submit AND use saveAsTextFile in the script. Example scenarios:
* ERROR when I include --jars AND use saveAsTextFile
* Works when I use saveAsTextFile but don't use any --jars on the command line
* Works when I include --jars on the command line but don't use saveAsTextFile (commented out)

Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar sparkbugtest.py bugtest.txt ./output/test1/}}

The script here doesn't need the --jars file, but if you include it on the command line, it causes the shutdown bug. aws-java-sdk-1.7.4.jar can be downloaded from here: https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar

bugtest.txt:
{noformat}
one
two
three
{noformat}

sparkbugtest.py:
{noformat}
import sys
import time

from pyspark.sql import SparkSession


def main():
    # Initialize the spark context.
    spark = SparkSession\
        .builder\
        .appName("SparkParseLogTest")\
        .getOrCreate()

    lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0])
    time.sleep(10)  # just for debug, to see how files change in the temporary folder
    # at this point there is only this script (sparkbugtest.py) in the temporary folder
    lines.saveAsTextFile(sys.argv[2])
    # at this point there are both sparkbugtest.py and aws-java-sdk-1.7.4.jar in the temporary folder
    time.sleep(10)  # for debug


if __name__ == "__main__":
    main()
{noformat}

I also use winutils.exe as mentioned here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html

What happens in the userFiles tmp folder is interesting:
* At first there is the {{sparkbugtest.py}}
* At the end (I think during saveAsTextFile, or after it), the {{aws-java-sdk-1.7.4.jar}} file is copied there and the {{sparkbugtest.py}} gets deleted
* After spark-submit has ended, the {{aws-java-sdk-1.7.4.jar}} is still in the temporary folder that couldn't be deleted

The temp folder in this example was like: C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987 ] Jouni H edited comment on SPARK-12216 at 3/24/17 9:58 PM: -- I was able to reproduce this bug on Windows with the latest spark version: spark-2.1.0-bin-hadoop2.7 This bug happens for me when I include --jars for spark-submit AND use saveAsTextOut on the script. Example scenarios: * ERROR when include --jars AND use saveAsTextFile * Works when use saveAsTextFile, but don't use any --jars on command line * Works when you include --jars on command line but don't use saveAsTextOut (comment out) Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar sparkbugtest.py bugtest.txt ./output/test1/}} The script here doesn't need the --jars file, but if you include it on the command line, it causes the shutdown bug. aws-java-sdk-1.7.4.jar can be downloaded from here: https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar The input in the bugtest.txt doesn't matter. Example script: {noformat} import sys from pyspark.sql import SparkSession def main(): # Initialize the spark context. 
spark = SparkSession\ .builder\ .appName("SparkParseLogTest")\ .getOrCreate() lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0]) lines.saveAsTextFile(sys.argv[2]) if __name__ == "__main__": main() {noformat} I also use winutils.exe as mentioned here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html What happens in the userFiles tmp folder is interesting: * At first there is the {{sparkbugtest.py}} * At the end (I think during saveAsTextFile, or after it), the {{aws-java-sdk-1.7.4.jar}} file is copied there and the {{sparkbugtest.py}} get's deleted * After the spark-submit has ended the {{aws-java-sdk-1.7.4.jar}} is still in the temporary folder that couldn't be deleted The temp folder in this example was like: C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\ was (Author: jouni): I was able to reproduce this bug on Windows with the latest spark version: spark-2.1.0-bin-hadoop2.7 This bug happens for me when I include --jars for spark-submit AND use saveAsTextOut on the script. Example scenarios: * ERROR when include --jars AND use saveAsTextFile * Works when use saveAsTextFile, but don't use any --jars on command line * Works when you include --jars on command line but don't use saveAsTextOut (comment out) Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar sparkbugtest.py bugtest.txt ./output/test1/}} The script here doesn't need the --jars file, but if you include it on the command line, it causes the shutdown bug. aws-java-sdk-1.7.4.jar can be downloaded from here: https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar The input in the bugtest.txt doesn't matter. Example script: {noformat} import sys from pyspark.sql import SparkSession def main(): # Initialize the spark context. 
spark = SparkSession\ .builder\ .appName("SparkParseLogTest")\ .getOrCreate() lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0]) lines.saveAsTextFile(sys.argv[2]) if __name__ == "__main__": main() {noformat} I also use winutils.exe as mentioned here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html What happens in the userFiles tmp folder is interesting: * At first there is the {{sparkbugtest.py}} * At the end, the {{aws-java-sdk-1.7.4.jar}} file is copied there and the {{sparkbugtest.py}} get's deleted * After the spark-submit has ended the {{aws-java-sdk-1.7.4.jar}} is still in the temporary folder that couldn't be deleted The temp folder in this example was like: C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\ > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >Reporter: stefan >Priority:
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987 ] Jouni H edited comment on SPARK-12216 at 3/24/17 9:54 PM: -- I was able to reproduce this bug on Windows with the latest spark version: spark-2.1.0-bin-hadoop2.7 This bug happens for me when I include --jars for spark-submit AND use saveAsTextOut on the script. Example scenarios: * ERROR when include --jars AND use saveAsTextFile * Works when use saveAsTextFile, but don't use any --jars on command line * Works when you include --jars on command line but don't use saveAsTextOut (comment out) Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar sparkbugtest.py bugtest.txt ./output/test1/}} The script here doesn't need the --jars file, but if you include it on the command line, it causes the shutdown bug. aws-java-sdk-1.7.4.jar can be downloaded from here: https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar The input in the bugtest.txt doesn't matter. Example script: {noformat} import sys from pyspark.sql import SparkSession def main(): # Initialize the spark context. 
spark = SparkSession\ .builder\ .appName("SparkParseLogTest")\ .getOrCreate() lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0]) lines.saveAsTextFile(sys.argv[2]) if __name__ == "__main__": main() {noformat} I also use winutils.exe as mentioned here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html What happens in the userFiles tmp folder is interesting: * At first there is the {{sparkbugtest.py}} * At the end, the {{aws-java-sdk-1.7.4.jar}} file is copied there and the {{sparkbugtest.py}} get's deleted * After the spark-submit has ended the {{aws-java-sdk-1.7.4.jar}} is still in the temporary folder that couldn't be deleted The temp folder in this example was like: C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\ was (Author: jouni): I was able to reproduce this bug on Windows with the latest spark version: spark-2.1.0-bin-hadoop2.7 This bug happens for me when I include --jars for spark-submit AND use saveAsTextOut on the script. Example scenarios: * ERROR when include --jars AND use saveAsTextFile * Works when use saveAsTextFile, but don't use any --jars on command line * Works when you include --jars on command line but don't use saveAsTextOut (comment out) Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar sparkbugtest.py bugtest.txt ./output/test1/}} The script here doesn't need the --jars file, but if you include it on the command line, it causes the shutdown bug. aws-java-sdk-1.7.4.jar can be downloaded from here: https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar The input in the bugtest.txt doesn't matter. Example script: {noformat} import sys from pyspark.sql import SparkSession def main(): # Initialize the spark context. 
spark = SparkSession\ .builder\ .appName("SparkParseLogTest")\ .getOrCreate() lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0]) lines.saveAsTextFile(sys.argv[2]) if __name__ == "__main__": main() {noformat} I also use winutils.exe as mentioned here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html After the error is thrown and and spark-submit has ended, I take a look at the folder that couldn't be deleted, it has the .jar file inside, for example {{C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\aws-java-sdk-1.7.4.jar}} > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >Reporter: stefan >Priority: Minor > > The mailing list archives have no obvious solution to this: > scala> :q > Stopping spark context. > 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark > temp dir: >
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987 ] Jouni H edited comment on SPARK-12216 at 3/24/17 9:37 PM: -- I was able to reproduce this bug on Windows with the latest spark version: spark-2.1.0-bin-hadoop2.7 This bug happens for me when I include --jars for spark-submit AND use saveAsTextOut on the script. Example scenarios: * ERROR when include --jars AND use saveAsTextFile * Works when use saveAsTextFile, but don't use any --jars on command line * Works when you include --jars on command line but don't use saveAsTextOut (comment out) Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar sparkbugtest.py bugtest.txt ./output/test1/}} The script here doesn't need the --jars file, but if you include it on the command line, it causes the shutdown bug. aws-java-sdk-1.7.4.jar can be downloaded from here: https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar The input in the bugtest.txt doesn't matter. Example script: {noformat} import sys from pyspark.sql import SparkSession def main(): # Initialize the spark context. 
spark = SparkSession\ .builder\ .appName("SparkParseLogTest")\ .getOrCreate() lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0]) lines.saveAsTextFile(sys.argv[2]) if __name__ == "__main__": main() {noformat} I also use winutils.exe as mentioned here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html After the error is thrown and and spark-submit has ended, I take a look at the folder that couldn't be deleted, it has the .jar file inside, for example {{C:\Users\Jouni\AppData\Local\Temp\spark-9b68fc91-7ee7-481a-970d-38a6db6f6160\userFiles-948dc876-bced-4778-98a7-90944a7fb155\aws-java-sdk-1.7.4.jar}} was (Author: jouni): I was able to reproduce this bug on Windows with the latest spark version: spark-2.1.0-bin-hadoop2.7 This bug happens for me when I include --jars for spark-submit AND use saveAsTextOut on the script. Example scenarios: * ERROR when include --jars AND use saveAsTextFile * Works when use saveAsTextFile, but don't use any --jars on command line * Works when you include --jars on command line but don't use saveAsTextOut (comment out) Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar sparkbugtest.py bugtest.txt ./output/test1/}} The script here doesn't need the --jars file, but if you include it on the command line, it causes the shutdown bug. aws-java-sdk-1.7.4.jar can be downloaded from here: https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar The input in the bugtest.txt doesn't matter. Example script: {noformat} import sys from pyspark.sql import SparkSession def main(): # Initialize the spark context. 
spark = SparkSession\ .builder\ .appName("SparkParseLogTest")\ .getOrCreate() lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0]) lines.saveAsTextFile(sys.argv[2]) if __name__ == "__main__": main() {noformat} I also use winutils.exe as mentioned here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >Reporter: stefan >Priority: Minor > > The mailing list archives have no obvious solution to this: > scala> :q > Stopping spark context. > 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark > temp dir: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > java.io.IOException: Failed to delete: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60) > at
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15940987#comment-15940987 ] Jouni H edited comment on SPARK-12216 at 3/24/17 8:59 PM: -- I was able to reproduce this bug on Windows with the latest spark version: spark-2.1.0-bin-hadoop2.7 This bug happens for me when I include --jars for spark-submit AND use saveAsTextOut on the script. Example scenarios: * ERROR when include --jars AND use saveAsTextFile * Works when use saveAsTextFile, but don't use any --jars on command line * Works when you include --jars on command line but don't use saveAsTextOut (comment out) Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar sparkbugtest.py bugtest.txt ./output/test1/}} The script here doesn't need the --jars file, but if you include it on the command line, it causes the shutdown bug. aws-java-sdk-1.7.4.jar can be downloaded from here: https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar The input in the bugtest.txt doesn't matter. Example script: {noformat} import sys from pyspark.sql import SparkSession def main(): # Initialize the spark context. spark = SparkSession\ .builder\ .appName("SparkParseLogTest")\ .getOrCreate() lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0]) lines.saveAsTextFile(sys.argv[2]) if __name__ == "__main__": main() {noformat} I also use winutils.exe as mentioned here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html was (Author: jouni): I was able to reproduce this bug on Windows with the latest spark version: spark-2.1.0-bin-hadoop2.7 This bug happens for me when I include --jars for spark-submit AND use saveAsTextOut on the script. 
Example scenarios: * ERROR when include --jars AND use saveAsTextFile * Works when use saveAsTextFile, but don't use any --jars on command line * Works when you include --jars on command line but don't use saveAsTextOut (comment out) Example command line: {{spark-submit --jars aws-java-sdk-1.7.4.jar sparkbugtest.py bugtest.txt ./output/test1/}} The script here doesn't need the --jars file, but if you include it on the command line, it causes the shutdown bug. The input in the bugtest.txt doesn't matter. Example script: {noformat} import sys from pyspark.sql import SparkSession def main(): # Initialize the spark context. spark = SparkSession\ .builder\ .appName("SparkParseLogTest")\ .getOrCreate() lines = spark.read.text(sys.argv[1]).rdd.map(lambda r: r[0]) lines.saveAsTextFile(sys.argv[2]) if __name__ == "__main__": main() {noformat} I also use winutils.exe as mentioned here: https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-tips-and-tricks-running-spark-windows.html > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >Reporter: stefan >Priority: Minor > > The mailing list archives have no obvious solution to this: > scala> :q > Stopping spark context. 
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark temp dir: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
>     at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
>     at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
>     at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
>     at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>     at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
>     at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
>     at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
>     at
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884892#comment-15884892 ] Vlad Sadilovski edited comment on SPARK-12216 at 2/26/17 7:58 PM:
--
I found the following bug while installing Spark on Windows 10 Pro. When exiting spark-shell it fails with the following:

scala> :quit
17/02/26 14:44:01 ERROR util.ShutdownHookManager: Exception while deleting Spark temp dir: C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
java.io.IOException: Failed to delete: C:\temp\spark-6cb6ceac-06a8-44d7-9f3a-907c2dc7067f
    at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

I immediately switched to the Windows file explorer and was able to delete this folder, so it doesn't look like a permission issue. Is this the same issue reported by the OP? If so, why is this issue closed when it is still reproducible in the latest Spark distribution? Are there any workarounds (besides moving to Linux)? Thanks
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15713978#comment-15713978 ] Brian edited comment on SPARK-12216 at 12/2/16 4:24 AM:
Theory or no theory for what caused it, it's a bug in Spark. Other programs and libraries I run on Windows do not have this problem. Just because you don't know how to fix a bug doesn't mean it doesn't exist; I really don't understand that logic.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566349#comment-15566349 ] Jerome Scheuring edited comment on SPARK-12216 at 10/11/16 7:59 PM:
_Note that I am entirely new to the process of submitting issues on this system: if this needs to be a new issue, I would appreciate someone letting me know._

A bug very similar to this one is 100% reproducible across multiple machines, running both Windows 8.1 and Windows 10, compiled with Scala 2.11 and running under Spark 2.0.1. It occurs:
* in Scala, but not Python (have not tried R)
* only when reading CSV files (and not, for example, when reading Parquet files)
* only when running local, not submitted to a cluster

_Update:_ The bug also does not occur when run on the installation of Spark 2.0.1 on the Windows 10 machine inside "Bash on Ubuntu on Windows", i.e. the Linux subsystem running on the same Windows 10 machine where the bug _does_ occur when the program is executed from Windows.
This program will produce the bug (if {{poemData}} is defined per the commented-out section, rather than being read from a CSV file, the bug does not occur):
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {
    val poemSchema = StructType(
      Seq(
        StructField("label", IntegerType),
        StructField("line", StringType)
      )
    )

    val sparkSession = SparkSession.builder()
      .appName("Spark Bug Demonstration")
      .master("local[*]")
      .getOrCreate()

    //val poemData = sparkSession.createDataFrame(Seq(
    //  (0, "There's many a strong farmer"),
    //  (0, "Who's heart would break in two"),
    //  (1, "If he could see the townland"),
    //  (1, "That we are riding to;")
    //)).toDF("label", "line")

    val poemData = sparkSession.read
      .option("quote", value = "")
      .schema(poemSchema)
      .csv(args(0))

    println(s"Record count: ${poemData.count()}")
  }
}
{code}
Assuming that {{args(0)}} contains the path to a file with comma-separated integer/string pairs, as in:
{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15192281#comment-15192281 ] Guram Savinov edited comment on SPARK-12216 at 3/13/16 10:49 AM:
-
I have the same problem when exiting spark-shell on Windows 7. It doesn't seem to be a permissions problem, because I start the console as admin and have no problem removing these directories manually. Maybe the directory is locked by some thread when the shutdown hook executes. Take a look at this post; it has details about a possible directory lock: http://jakzaprogramowac.pl/pytanie/12478,how-to-find-which-java-scala-thread-has-locked-a-file