[jira] [Updated] (SPARK-26634) OutputCommitCoordinator may allow task of FetchFailureStage commit again
[ https://issues.apache.org/jira/browse/SPARK-26634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liupengcheng updated SPARK-26634: - Affects Version/s: (was: 2.4.0) > OutputCommitCoordinator may allow task of FetchFailureStage commit again > > > Key: SPARK-26634 > URL: https://issues.apache.org/jira/browse/SPARK-26634 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0 >Reporter: liupengcheng >Priority: Major > > In our production Spark cluster, we encountered a case where a task of a stage retried due to FetchFailure was denied to commit, even though it was the first attempt of that retry stage. > After careful investigation, we found that the canCommit call of OutputCommitCoordinator had already allowed a task of the FetchFailure stage (with the same partition number as the new task of the retry stage) to commit, which resulted in TaskCommitDenied for every task (of the same partition) in the retry stage. Because TaskCommitDenied does not count towards failures, this might cause the application to hang forever. 
> > {code:java} > 2019-01-09,08:39:53,676 INFO org.apache.spark.scheduler.TaskSetManager: > Starting task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, > executor 456, partition 138, PROCESS_LOCAL, 5829 bytes) > 2019-01-09,08:43:37,514 INFO org.apache.spark.scheduler.TaskSetManager: > Finished task 138.0 in stage 5.0 (TID 30634) in 466958 ms on > zjy-hadoop-prc-st1212.bj (executor 1632) (674/5000) > 2019-01-09,08:45:57,372 WARN org.apache.spark.scheduler.TaskSetManager: Lost > task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, executor 456): > TaskCommitDenied (Driver denied task commit) for job: 5, partition: 138, > attemptNumber: 1 > 2019-01-09,08:45:57,373 INFO > org.apache.spark.scheduler.OutputCommitCoordinator: Task was denied > committing, stage: 5, partition: 138, attempt number: 0, attempt > number(counting failed stage): 1 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
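The failure mode described above can be modeled in a few lines. This is a deliberately simplified sketch, not Spark's actual OutputCommitCoordinator implementation: it assumes a coordinator keyed only by (stage, partition), which is exactly what would let a task from the failed stage attempt win the commit and starve every retry of that partition:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal model of the reported bug: commit authorization is keyed by
// (stage, partition) only, ignoring which *stage attempt* the task came
// from. A task from the pre-FetchFailure attempt can therefore win the
// commit lock, and every task of the retried stage attempt is denied.
class CommitCoordinatorModel {
    // key: "stage:partition" -> task attempt number holding the commit
    private final Map<String, Integer> committers = new ConcurrentHashMap<>();

    boolean canCommit(int stage, int partition, int taskAttempt) {
        Integer winner = committers.putIfAbsent(stage + ":" + partition, taskAttempt);
        return winner == null || winner == taskAttempt;
    }
}

class Demo {
    public static void main(String[] args) {
        CommitCoordinatorModel c = new CommitCoordinatorModel();
        // Task 138.0 of the *failed* stage attempt (task attempt 0) asks first and wins.
        boolean staleCommit = c.canCommit(5, 138, 0);
        // Task 138.0 of retry stage 5.1 (task attempt 1) is then denied:
        // TaskCommitDenied, which does not count towards task failures.
        boolean retryCommit = c.canCommit(5, 138, 1);
        System.out.println(staleCommit + " " + retryCommit); // true false
    }
}
```

Tracking the stage attempt alongside the partition in the key (or invalidating commit state when a stage is resubmitted) would avoid the starvation.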
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421530#comment-16421530 ] Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM: - Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 2.2.1, Hadoop 2.7. My tests support the contention of [~IgorBabalich] ... it seems that classloaders instantiated by the code are never closed. On *nix this is not a problem, since the files are not locked. However, on Windows the files are locked. In addition to the resources mentioned by Igor, this Oracle note on a Java 7 fix seems relevant: [https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html] A new method "close()" was introduced to address the problem, which shows up on Windows due to the differing treatment of file locks between the Windows file system and *nix file systems. I would point out that this is a generic Java issue which breaks the cross-platform intention of that platform as a whole. The Oracle blog also contains a relevant post: [https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader] I have been searching the Apache Spark code-base for classloader instances, in search of any ".close()" action. I could not find any, so I believe [~IgorBabalich] is correct - the issue has to do with classloaders not being closed. I would fix it myself, but thus far it is not clear to me *when* the classloader needs to be closed. That is just ignorance on my part. The question is whether the classloader should be closed while it is still available as a variable at the point where it was instantiated, or later during the ShutdownHookManager cleanup. If the latter, it is not clear to me how to actually get a list of open classloaders. That is where I am at so far. I am prepared to put some work into this, but I need some help from those who know the codebase to answer the above question - maybe with a well-isolated test. MY TESTS... 
This issue has been around in one form or another for at least four years and shows up on many threads. The standard answer is that it is a "permissions issue" to do with Windows. That assertion is objectively false, and there is a simple test to prove it. At a Windows prompt, start spark-shell C:\spark\spark-shell then get the temp file directory: scala> sc.getConf.get("spark.repl.class.outputDir") It will be in the %AppData%\Local\Temp tree, e.g. C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f where the last directory name has a GUID that changes on each iteration. With the spark session still open, go to the Temp directory and try to delete the given directory. You won't be able to... there is a lock on it. Now issue scala> :quit to quit the session. The stack trace will show that ShutdownHookManager tried to delete the directory above but could not. If you now try to delete it through the file system, you can. This is because the JVM actually cleans up the locks on exit. So, it is not a permissions issue, but a feature of the Windows treatment of file locks. This is the *known issue* that was addressed in the Java bug fix through the introduction of a Closeable close() method for URLClassLoader. It was fixed there because many enterprise systems run on Windows. Now... to further test the cause, I used the Windows Subsystem for Linux. To access this (post install) you run C:> bash from a command prompt. In order to get this to work, I used the same spark install, but had to install a fresh copy of the JDK on ubuntu within the Windows bash subsystem. This is standard ubuntu stuff, but note that the path to your Windows C drive is /mnt/c If I rerun the same test, the new output of scala> sc.getConf.get("spark.repl.class.outputDir") will be a different folder location under the Linux /tmp but with the same setup. 
With the spark session still active, it is possible to delete the spark folders in the /tmp folder *while the session is still active*. This is the difference between Windows and Linux. While bash is running Ubuntu on Windows, it has the different file-locking behaviour, which means you can delete the spark temp folders while a session is running. If you run through a new session with spark-shell at the linux prompt and issue :quit, it will shut down without any stacktrace error from ShutdownHookManager. So, my conclusions are as follows: 1) this is not a permissions issue, as per the common assertion; 2) it is a Windows-specific problem for *known* reasons - namely the difference in file locking compared with Linux; 3) it was considered a *bug* in the Java ecosystem and was fixed as such from Java 1.7 with the .close() method. Further... people who need to run Spark on Windows infrastructure (like me) can either run a docker container or use the Windows Subsystem for Linux.
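For reference, the Java 7 fix discussed above works as follows. This is an illustrative sketch, not Spark code (the directory name and structure are mine): URLClassLoader implements Closeable, and calling close() releases the file handles that would otherwise keep Windows from deleting the directory while the JVM is still running:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the Java 7 URLClassLoader fix: close() releases the
// underlying file handles, so the temp directory can be deleted while
// the JVM is still running - including on Windows.
class CloseLoaderDemo {
    public static void main(String[] args) throws Exception {
        Path tempDir = Files.createTempDirectory("repl-classes-demo");
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[]{tempDir.toUri().toURL()})) {
            // ... load generated classes through `loader` here ...
        } // try-with-resources calls close(), releasing any file locks
        Files.delete(tempDir); // now succeeds, even on Windows
        System.out.println("deleted: " + !Files.exists(tempDir));
    }
}
```

If the Spark shell closed its REPL output-directory classloader this way before ShutdownHookManager runs, the recursive delete should no longer fail on Windows.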
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432207#comment-16432207 ] Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM: - {code:java} scala> val loader = Thread.currentThread.getContextClassLoader() loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f scala> val parent1 = loader.getParent() parent1: ClassLoader = scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49 scala> val parent2 = parent1.getParent() parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2 scala> val parent3 = parent2.getParent() parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b scala> val parent4 = parent3.getParent() parent4: ClassLoader = null {code} I did experiment with trying to find the open ClassLoaders in the scala session (shown above). Tab completion shows the exposed methods on the loaders, but there is no close method: {code:java} scala> loader. clearAssertionStatus getResource getResources setClassAssertionStatus setPackageAssertionStatus getParent getResourceAsStream loadClass setDefaultAssertionStatus scala> parent1. clearAssertionStatus getResource getResources setClassAssertionStatus setPackageAssertionStatus getParent getResourceAsStream loadClass setDefaultAssertionStatus {code} There is no close method on any of these, so I could not try closing them prior to quitting the session. This was just a simple hack to see whether there was any way to use reflection to find the open ClassLoaders. I thought perhaps it might be possible to walk this tree and then close the loaders within ShutdownHookManager? 
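If a shutdown-time sweep were the right place, one hypothetical approach (the method name and its placement are my assumption, not existing Spark code) would be to walk the parent chain from the context classloader and close whatever implements Closeable, which URLClassLoader does since Java 7:

```java
import java.io.Closeable;

// Hypothetical sweep along the lines discussed above: walk the context
// classloader's parent chain and close every loader that implements
// Closeable. An experiment sketch, not Spark's ShutdownHookManager.
class LoaderSweep {
    static int closeLoaderChain(ClassLoader start) {
        int closed = 0;
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            if (cl instanceof Closeable) {
                try {
                    ((Closeable) cl).close();
                    closed++;
                } catch (Exception e) {
                    // best effort: keep walking even if one close fails
                }
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        int n = closeLoaderChain(Thread.currentThread().getContextClassLoader());
        System.out.println("closed " + n + " closeable loaders");
    }
}
```

Note this only reaches loaders on the current thread's chain; loaders created elsewhere (such as the REPL's TranslatingClassLoader, if it is not on that chain) would need to be tracked at creation time.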
> Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >Reporter: stefan >Priority: Minor > > The mailing list archives have no obvious solution to this: > scala> :q > Stopping spark context. > 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark > temp dir: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > java.io.IOException: Failed to delete: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60) > at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60) > at >
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675 ] Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:54 AM: - Okay, so I think we have a candidate for what is actually causing the problem. There is an open bug on the scala language site for a class within the scala REPL IMain.scala [https://github.com/scala/bug/issues/10045] There the poster writes: {code:java} scala.tools.nsc.interpreter.IMain.TranslatingClassLoader{code} calls a non-thread-safe method {code:java} translateSimpleResource{code} (this method calls {{SymbolTable.enteringPhase}}), which makes it non-thread-safe. However, a ClassLoader must be thread-safe, since a class can be loaded on an arbitrary thread. In my REPL reflection experiment above the relevant class is: {code:java} scala> val loader = Thread.currentThread.getContextClassLoader() loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{code} That is the same class that the above bug marks as non-thread-safe. See this Stack Overflow question for a discussion of thread-safety issues in scala: [https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety] The scala REPL code has some internal classloaders which are used to compile and execute any code entered into the REPL. On Windows 10, if you simply start a spark-shell from the command line, do nothing, and then :quit, the REPL will barf with a stacktrace pointing at this particular class (namely TranslatingClassLoader), which is identified in the open scala bug as non-thread-safe. I am going to try to contact the person who raised the bug on the scala issues thread and get some input. It seemed like he could only reproduce it with a complicated SQL script. 
Here we have with Apache Spark a simple, and on my tests 100% reproducible, instance of the bug on Windows 10 and, in my tests, Windows Server 2016. UPDATE: I cross-posted on [https://github.com/scala/bug/issues/10045] with an explanation of the observations made here and a link back to this issue. 
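To illustrate why the pattern flagged in the scala bug is dangerous, here is a toy model (not scala compiler code; names are mine) of an enteringPhase-style save/mutate/restore around shared state. The unsynchronized variant can observe another caller's phase when entered from arbitrary threads, which is exactly the situation ClassLoader.loadClass is in:

```java
// Toy model of the thread-safety hazard described in scala/bug#10045:
// a method that saves shared state, mutates it, does work, and restores
// it is only correct if callers cannot interleave.
class PhasedTable {
    private String phase = "typer";

    // Unsafe pattern: between the write and the read, another thread
    // running the same method can change `phase` underneath us.
    String unsafeEnteringPhase(String p) {
        String saved = phase;
        phase = p;
        String seen = phase; // may be another caller's phase under contention
        phase = saved;
        return seen;
    }

    // Safe variant: the whole save/work/restore sequence is one critical
    // section, so each caller always observes its own phase.
    synchronized String safeEnteringPhase(String p) {
        String saved = phase;
        phase = p;
        String seen = phase;
        phase = saved;
        return seen;
    }
}
```

A classloader built on the unsafe pattern can corrupt its compiler state when classes are loaded concurrently, which is consistent with the REPL-shutdown stacktraces reported here.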
[jira] [Updated] (SPARK-26641) Separate capacity configurations for different event queues
[ https://issues.apache.org/jira/browse/SPARK-26641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-26641: --- Description: I maintain a Spark-on-YARN cluster in production, and I frequently see this error: `Dropping event from queue eventLog. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.` The Spark event log is written to a production HDFS cluster that takes a few TB of writes every day. The cost of these frequent writes makes `EventLoggingListener` a bottleneck, while the other event queues rarely hit this problem, so I think the capacity configurations of the different event queues should be separated. > Separate capacity configurations for different event queues > -- > > Key: SPARK-26641 > URL: https://issues.apache.org/jira/browse/SPARK-26641 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0, 2.4.0, 3.0.0 >Reporter: jiaan.geng >Priority: Minor
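For what it's worth, a per-queue override along the lines this ticket proposes might look as follows. The shared capacity key exists since Spark 2.3; the per-queue key follows the naming scheme this proposal suggests, so verify both against your Spark version before relying on them:

```shell
# Hedged sketch: the default capacity is shared by all listener queues via
# spark.scheduler.listenerbus.eventqueue.capacity (Spark 2.3+). The per-queue
# form below (spark.scheduler.listenerbus.eventqueue.<name>.capacity) is the
# kind of override this ticket proposes - here the eventLog queue gets a
# larger buffer than the rest. Application name and jar are placeholders.
spark-submit \
  --conf spark.scheduler.listenerbus.eventqueue.capacity=10000 \
  --conf spark.scheduler.listenerbus.eventqueue.eventLog.capacity=40000 \
  --class com.example.MyApp my-app.jar
```

A larger eventLog buffer absorbs bursts while HDFS writes are slow, without over-allocating memory for queues that never fall behind.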
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675 ] Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:51 AM: - Okay, so I think we have a candidate for what is actually causing the problem. There is an open bug on the scala language site for a class within the scala REPL IMain.scala [https://github.com/scala/bug/issues/10045] `scala.tools.nsc.interpreter.IMain.TranslatingClassLoader` calls a non-thread-safe method `{{translateSimpleResource`}} (this method calls `{{SymbolTable.enteringPhase`}}), which makes it become non-thread-safe. "However, a ClassLoader must be thread-safe since the class can be loaded in arbitrary thread." In my REPL reflection experiment above the relevant class is: {color:#33}`scala> val loader = Thread.currentThread.getContextClassLoader()`{color} {color:#33}`loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f`{color} {color:#33}Ergo ... the same class in the above bug being marked as non threadsafe.{color} {color:#33}See this stack overflow for a discussion of thread safety:{color} [https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety] The scala REPL code has some internal classloaders which are used to compile and execute any code entered into the REPL. On Windows 10, if you simply start a spark-shell from the command line, do nothing, and then :quit the REPL will barf with a stacktrace to this particular class reference (namely TranslatingClassLoader) which is identified as an open bug in the scala language issues marked "non threadsafe". {color:#24292e}I am gonna try and contact the person who raised the bug on the scala issues thread and get some input.{color} It seemed like he could only produce it with a complicated SQL script. Here we have with Apache Spark a simple, and on my tests 100% reproducible, instance of the bug on Windows 10 and in my tests Windows Server 2016. 
UPDATE: I cross-posted on [https://github.com/scala/bug/issues/10045] with an explanation of the observations made here and a link back to this issue. was (Author: kingsley): Okay, so I think we have a candidate for what is actually causing the problem. There is an open bug on the scala language site for a class within the scala REPL IMain.scala [https://github.com/scala/bug/issues/10045] 'scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a non-thread-safe method {{translateSimpleResource}} (this method calls {{SymbolTable.enteringPhase}}), which makes it become non-thread-safe. However, a ClassLoader must be thread-safe since the class can be loaded in arbitrary thread. In my REPL reflection experiment above the relevant class is: {color:#33}scala> val loader = Thread.currentThread.getContextClassLoader(){color} {color:#33} loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{color} {color:#33}Ergo ... the very class in the above bug being marked as non threadsafe.{color} {color:#33}See this stack overflow for a discussion of thread safety:{color} [https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety] The scala REPL code has some internal classloaders which are used to compile and execute any code entered into the REPL. On Windows 10, if you simply start a spark-shell from the command line, do nothing, and then :quit the REPL will barf with a stacktrace to this particular class reference (namely TranslatingClassLoader) which is identified as an open bug in the scala language issues marked "non threadsafe". {color:#24292e}I am gonna try and contact the person who raised the bug on the scala issues thread and get some input.{color} It seemed like he could only produce it with a complicated SQL script. Here we have with Apache Spark a simple, and on my tests 100% reproducible, instance of the bug on Windows 10 and in my tests Windows Server 2016. 
UPDATE: I cross-posted on [https://github.com/scala/bug/issues/10045] with an explanation of the observations made here and a link back to this issue. > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675 ] Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:50 AM: - Okay, so I think we have a candidate for what is actually causing the problem. There is an open bug on the scala language site for a class within the scala REPL IMain.scala [https://github.com/scala/bug/issues/10045] 'scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a non-thread-safe method {{translateSimpleResource}} (this method calls {{SymbolTable.enteringPhase}}), which makes it become non-thread-safe. However, a ClassLoader must be thread-safe since the class can be loaded in arbitrary thread. In my REPL reflection experiment above the relevant class is: {color:#33}scala> val loader = Thread.currentThread.getContextClassLoader(){color} {color:#33} loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{color} {color:#33}Ergo ... the very class in the above bug being marked as non threadsafe.{color} {color:#33}See this stack overflow for a discussion of thread safety:{color} [https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety] The scala REPL code has some internal classloaders which are used to compile and execute any code entered into the REPL. On Windows 10, if you simply start a spark-shell from the command line, do nothing, and then :quit the REPL will barf with a stacktrace to this particular class reference (namely TranslatingClassLoader) which is identified as an open bug in the scala language issues marked "non threadsafe". {color:#24292e}I am gonna try and contact the person who raised the bug on the scala issues thread and get some input.{color} It seemed like he could only produce it with a complicated SQL script. Here we have with Apache Spark a simple, and on my tests 100% reproducible, instance of the bug on Windows 10 and in my tests Windows Server 2016. 
UPDATE: I cross-posted on [https://github.com/scala/bug/issues/10045] with an explanation of the observations made here and a link back to this issue. > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >
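The claim above — that a ClassLoader must be safe for use from arbitrary threads — can be illustrated with a minimal Java sketch. This is hypothetical demonstration code, not Spark or Scala REPL code: it only shows that a newly created worker thread inherits, and therefore shares, the creating thread's context ClassLoader, so any unsynchronized state inside that loader is reachable from multiple threads at once.

```java
public class ContextLoaderDemo {
    public static void main(String[] args) throws Exception {
        // Every thread carries a context ClassLoader; frameworks routinely
        // load classes through it from worker threads.
        ClassLoader main = Thread.currentThread().getContextClassLoader();
        final ClassLoader[] seen = new ClassLoader[1];
        Thread worker = new Thread(() -> {
            // A new thread inherits the creator's context loader by default,
            // so both threads hold a reference to the SAME loader instance.
            seen[0] = Thread.currentThread().getContextClassLoader();
        });
        worker.start();
        worker.join();
        System.out.println(main == seen[0]);
    }
}
```

This prints `true`: the two threads share one loader instance, which is why a loader whose `translateSimpleResource` mutates shared compiler state without synchronization is unsafe.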
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675 ] Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:49 AM: - Okay, so I think we have a candidate for what is actually causing the problem. There is an open bug on the Scala language tracker for a class within the Scala REPL, IMain.scala: [https://github.com/scala/bug/issues/10045] scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a non-thread-safe method {{translateSimpleResource}} (this method calls {{SymbolTable.enteringPhase}}), which makes the loader non-thread-safe. However, a ClassLoader must be thread-safe, since classes can be loaded from arbitrary threads. In my REPL reflection experiment above, the relevant class is: scala> val loader = Thread.currentThread.getContextClassLoader() loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f Ergo ... the very class flagged as non-thread-safe in the bug above. See this Stack Overflow question for a discussion of thread safety: [https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety] The Scala REPL code has internal classloaders which are used to compile and execute any code entered into the REPL. On Windows 10, if you simply start a spark-shell from the command line, do nothing, and then :quit, the REPL will barf with a stack trace pointing to this particular class (namely TranslatingClassLoader), which the open Scala bug marks as non-thread-safe. I am going to try to contact the person who raised the bug on the Scala issues thread and get some input. It seemed he could only reproduce it with a complicated SQL script. Here, with Apache Spark, we have a simple and, in my tests, 100% reproducible instance of the bug on Windows 10 and Windows Server 2016. 
UPDATE: I cross-posted on [https://github.com/scala/bug/issues/10045] with an explanation of the observations made here and a link back to this issue. > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive
[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744524#comment-16744524 ] Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:47 AM: - I am going down the Rabbit Hole of the Scala REPL. I think this is the right code branch: [https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala] Lines 352 to 357 define TranslatingClassLoader. It appears to be the central mechanism the Scala REPL uses to parse, compile and load any class defined in the REPL. There is an open bug in the Scala issues tracker identifying this class as not thread-safe: https://github.com/scala/bug/issues/10045 Scala has a different idiom from Java, so maybe experience with closing classloaders is less refined (meaning it is just less clear what is the right way to catch 'em all). was (Author: kingsley): I am going down the Rabbit Hole of the Scala REPL. I think this is the right code branch: https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala Lines 569 to 577:

/** This instance is no longer needed, so release any resources
  * it is using. The reporter's output gets flushed.
  */
override def close(): Unit = {
  reporter.flush()
  if (initializeComplete) {
    global.close()
  }
}

Perhaps .close() is not closing everything. 
Scala has a different idiom from Java, so maybe experience with closing classloaders is less refined (meaning it is just less clear what is the right way to catch 'em all). > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >Reporter: stefan >Priority: Minor > > The mailing list archives have no obvious solution to this: > scala> :q > Stopping spark context. > 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark > temp dir: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > java.io.IOException: Failed to delete: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60) > at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234) > at > 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234) > at scala.util.Try$.apply(Try.scala:161) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional
[jira] [Assigned] (SPARK-26641) Separate capacity Configurations of different event queue.
[ https://issues.apache.org/jira/browse/SPARK-26641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26641: Assignee: Apache Spark > Separate capacity Configurations of different event queue. > -- > > Key: SPARK-26641 > URL: https://issues.apache.org/jira/browse/SPARK-26641 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0, 2.4.0, 3.0.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Minor > > I maintain a Spark-on-YARN cluster in production, and I often see the error: > `Dropping event from queue eventLog. This likely means one of the listeners > is too slow and cannot keep up with the rate at which tasks are being started > by the scheduler.` > The Spark event log is written to a production HDFS cluster, which absorbs a few TB of writes. The cost of such frequent writes makes the `EventLoggingListener` the bottleneck. > But the other event queues rarely show this problem, so I think the capacity configurations of the different event queues should be separated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26641) Separate capacity Configurations of different event queue.
[ https://issues.apache.org/jira/browse/SPARK-26641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26641: Assignee: (was: Apache Spark) > Separate capacity Configurations of different event queue. > -- > > Key: SPARK-26641 > URL: https://issues.apache.org/jira/browse/SPARK-26641 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.3.0, 2.4.0, 3.0.0 >Reporter: jiaan.geng >Priority: Minor > > I maintain a Spark-on-YARN cluster in production, and I often see the error: > `Dropping event from queue eventLog. This likely means one of the listeners > is too slow and cannot keep up with the rate at which tasks are being started > by the scheduler.` > The Spark event log is written to a production HDFS cluster, which absorbs a few TB of writes. The cost of such frequent writes makes the `EventLoggingListener` the bottleneck. > But the other event queues rarely show this problem, so I think the capacity configurations of the different event queues should be separated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26641) Separate capacity Configurations of different event queue.
jiaan.geng created SPARK-26641: -- Summary: Separate capacity Configurations of different event queue. Key: SPARK-26641 URL: https://issues.apache.org/jira/browse/SPARK-26641 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.0, 2.3.0, 3.0.0 Reporter: jiaan.geng I maintain a Spark-on-YARN cluster in production, and I often see the error: `Dropping event from queue eventLog. This likely means one of the listeners is too slow and cannot keep up with the rate at which tasks are being started by the scheduler.` The Spark event log is written to a production HDFS cluster, which absorbs a few TB of writes. The cost of such frequent writes makes the `EventLoggingListener` the bottleneck. But the other event queues rarely show this problem, so I think the capacity configurations of the different event queues should be separated. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
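The "Dropping event from queue eventLog" warning described above comes from a bounded listener-bus queue whose slow consumer cannot keep up with producers. A minimal, hypothetical Java sketch (illustrative only — not Spark's actual listener bus, and the queue name and capacity are made up) shows the mechanism: when a bounded queue is full, a non-blocking `offer` fails and the event is dropped rather than stalling the scheduler.

```java
import java.util.concurrent.ArrayBlockingQueue;

public class BoundedEventQueueDemo {
    public static void main(String[] args) {
        // A tiny bounded queue standing in for the event-log listener queue.
        // No consumer drains it here, modeling a listener that is too slow.
        ArrayBlockingQueue<String> eventLogQueue = new ArrayBlockingQueue<>(2);
        int dropped = 0;
        for (int i = 0; i < 5; i++) {
            // offer() returns false instead of blocking when the queue is full,
            // which is when a real bus would log the "Dropping event" warning.
            if (!eventLogQueue.offer("event-" + i)) {
                dropped++;
            }
        }
        System.out.println("queued=" + eventLogQueue.size() + " dropped=" + dropped);
    }
}
```

This prints `queued=2 dropped=3`. Giving each queue its own capacity setting, as the ticket proposes, lets a write-heavy queue like the event log be sized larger without inflating every other queue.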
[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675 ] Kingsley Jones commented on SPARK-12216: Okay, so I think we have a candidate for what is actually causing the problem. There is an open bug on the Scala language tracker for a class within the Scala REPL, IMain.scala: [https://github.com/scala/bug/issues/10045] scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a non-thread-safe method {{translateSimpleResource}} (this method calls {{SymbolTable.enteringPhase}}), which makes the loader non-thread-safe. However, a ClassLoader must be thread-safe, since classes can be loaded from arbitrary threads. In my REPL reflection experiment above, the relevant class is: scala> val loader = Thread.currentThread.getContextClassLoader() loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f Ergo ... the very class flagged as non-thread-safe in the bug above. See this Stack Overflow question for a discussion of thread safety: https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety The Scala REPL code has internal classloaders which are used to compile and execute any code entered into the REPL. On Windows 10, if you simply start a spark-shell from the command line, do nothing, and then :quit, the REPL will barf with a stack trace pointing to this particular class (namely TranslatingClassLoader), which the open Scala bug marks as non-thread-safe. I am going to try to contact the person who raised the bug on the Scala issues thread and get some input. It seemed he could only reproduce it with a complicated SQL script. 
Here, with Apache Spark, we have a simple and, in my tests, 100% reproducible instance of the bug on Windows 10 and Windows Server 2016. That fits the perp's modus operandi in my book... marked non-thread-safe, and it causes a sensitive operating system like Windows to barf. > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >Reporter: stefan >Priority: Minor > > The mailing list archives have no obvious solution to this: > scala> :q > Stopping spark context. 
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark > temp dir: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > java.io.IOException: Failed to delete: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60) > at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234) > at
[jira] [Assigned] (SPARK-26466) Use ConfigEntry for hardcoded configs for submit categories.
[ https://issues.apache.org/jira/browse/SPARK-26466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-26466: - Assignee: Jungtaek Lim > Use ConfigEntry for hardcoded configs for submit categories. > > > Key: SPARK-26466 > URL: https://issues.apache.org/jira/browse/SPARK-26466 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takuya Ueshin >Assignee: Jungtaek Lim >Priority: Major > > Make the following hardcoded configs use {{ConfigEntry}}. > {code} > spark.kryo > spark.kryoserializer > spark.jars > spark.submit > spark.serializer > spark.deploy > spark.worker > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26466) Use ConfigEntry for hardcoded configs for submit categories.
[ https://issues.apache.org/jira/browse/SPARK-26466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26466. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23532 [https://github.com/apache/spark/pull/23532] > Use ConfigEntry for hardcoded configs for submit categories. > > > Key: SPARK-26466 > URL: https://issues.apache.org/jira/browse/SPARK-26466 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takuya Ueshin >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > Make the following hardcoded configs use {{ConfigEntry}}. > {code} > spark.kryo > spark.kryoserializer > spark.jars > spark.submit > spark.serializer > spark.deploy > spark.worker > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
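The idea behind this ticket — replacing hardcoded config-key strings with typed entries — can be sketched in a simplified, hypothetical Java rendering. Spark's real `ConfigEntry` is Scala and far richer (builders, documentation strings, unit conversion); this sketch only shows the core benefit: each key is declared once, with a type and a default, and readers go through the entry instead of repeating the string. The key name and default value below are illustrative, not Spark's actual values.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigEntryDemo {

    // A typed config key declared in exactly one place: the name, the value
    // type, and the default all live together instead of being scattered
    // as hardcoded strings through the code base.
    static final class ConfigEntry<T> {
        final String key;
        final T defaultValue;

        ConfigEntry(String key, T defaultValue) {
            this.key = key;
            this.defaultValue = defaultValue;
        }

        @SuppressWarnings("unchecked")
        T readFrom(Map<String, Object> conf) {
            return (T) conf.getOrDefault(key, defaultValue);
        }
    }

    // Illustrative entry; the key string and default are assumptions.
    static final ConfigEntry<Integer> KRYO_BUFFER_MAX_MB =
        new ConfigEntry<>("spark.kryoserializer.buffer.max", 64);

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        System.out.println(KRYO_BUFFER_MAX_MB.readFrom(conf)); // default applies
        conf.put("spark.kryoserializer.buffer.max", 128);
        System.out.println(KRYO_BUFFER_MAX_MB.readFrom(conf)); // override wins
    }
}
```

This prints `64` then `128`: callers never mention the raw string, so a typo in a key name becomes a compile-time symbol error rather than a silently ignored setting.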
[jira] [Assigned] (SPARK-26600) Update spark-submit usage message
[ https://issues.apache.org/jira/browse/SPARK-26600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-26600: - Assignee: Luca Canali > Update spark-submit usage message > - > > Key: SPARK-26600 > URL: https://issues.apache.org/jira/browse/SPARK-26600 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Luca Canali >Assignee: Luca Canali >Priority: Minor > Fix For: 3.0.0 > > > The spark-submit usage message should be brought in sync with recent changes, in particular regarding K8S support. These are the proposed changes to the usage message: > --executor-cores NUM -> can be used for Spark on YARN and K8S > --principal PRINCIPAL and --keytab KEYTAB -> can be used for Spark on YARN > and K8S > --total-executor-cores NUM -> can be used for Spark standalone, YARN and K8S > In addition, this PR proposes to remove certain implementation details from the --keytab argument description, as the implementation details vary between YARN and K8S, for example. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26600) Update spark-submit usage message
[ https://issues.apache.org/jira/browse/SPARK-26600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26600. --- Resolution: Fixed Issue resolved by pull request 23518 [https://github.com/apache/spark/pull/23518] > Update spark-submit usage message > - > > Key: SPARK-26600 > URL: https://issues.apache.org/jira/browse/SPARK-26600 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Luca Canali >Assignee: Luca Canali >Priority: Minor > Fix For: 3.0.0 > > > The spark-submit usage message should be brought in sync with recent changes, in particular regarding K8S support. These are the proposed changes to the usage message: > --executor-cores NUM -> can be used for Spark on YARN and K8S > --principal PRINCIPAL and --keytab KEYTAB -> can be used for Spark on YARN > and K8S > --total-executor-cores NUM -> can be used for Spark standalone, YARN and K8S > In addition, this PR proposes to remove certain implementation details from the --keytab argument description, as the implementation details vary between YARN and K8S, for example. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL
[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744620#comment-16744620 ] Ke Jia commented on SPARK-26639: [~hyukjin.kwon] Thanks for your interest. As discussed in [https://github.com/apache/spark/pull/14548], when I run Q23b of TPC-DS, the visualized plan does show the subquery executed only once, as follows: !https://user-images.githubusercontent.com/11972570/51232955-813af880-19a3-11e9-9d1c-96bb9de0c130.png! But the stages show that the same subquery may execute more than once: !https://user-images.githubusercontent.com/11972570/51233118-fb6b7d00-19a3-11e9-9b48-9cebfb74ebd1.png! So I suspect subquery reuse does not actually take effect. Maybe I am missing something. Thanks. > The reuse subquery function maybe does not work in SPARK SQL > > > Key: SPARK-26639 > URL: https://issues.apache.org/jira/browse/SPARK-26639 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > > The subquery reuse feature has done in > [https://github.com/apache/spark/pull/14548] > In my test, I found the visualized plan do show the subquery is executed > once. But the stage of same subquery execute maybe not once. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26640) Code cleanup from lgtm.com analysis
[ https://issues.apache.org/jira/browse/SPARK-26640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26640: Assignee: Apache Spark (was: Sean Owen) > Code cleanup from lgtm.com analysis > --- > > Key: SPARK-26640 > URL: https://issues.apache.org/jira/browse/SPARK-26640 > Project: Spark > Issue Type: Improvement > Components: ML, Spark Core, SQL, Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Sean Owen >Assignee: Apache Spark >Priority: Minor > > https://github.com/apache/spark/pull/23567 brought to my attention that > lgtm.com has a recent analysis of Spark that turned up at least one bug: > https://issues.apache.org/jira/browse/SPARK-26638 > See > https://lgtm.com/projects/g/apache/spark/snapshot/0655f1624ff7b73e5c8937ab9e83453a5a3a4466/files/dev/create-release/releaseutils.py?sort=name=ASC=heatmap#x1434915b6576fb40:1 > Most of these are valid, small suggestions to clean up the code. I'm going to > make a PR to implement the obvious ones. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26640) Code cleanup from lgtm.com analysis
[ https://issues.apache.org/jira/browse/SPARK-26640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26640: Assignee: Sean Owen (was: Apache Spark) > Code cleanup from lgtm.com analysis > --- > > Key: SPARK-26640 > URL: https://issues.apache.org/jira/browse/SPARK-26640 > Project: Spark > Issue Type: Improvement > Components: ML, Spark Core, SQL, Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Minor > > https://github.com/apache/spark/pull/23567 brought to my attention that > lgtm.com has a recent analysis of Spark that turned up at least one bug: > https://issues.apache.org/jira/browse/SPARK-26638 > See > https://lgtm.com/projects/g/apache/spark/snapshot/0655f1624ff7b73e5c8937ab9e83453a5a3a4466/files/dev/create-release/releaseutils.py?sort=name=ASC=heatmap#x1434915b6576fb40:1 > Most of these are valid, small suggestions to clean up the code. I'm going to > make a PR to implement the obvious ones. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26636) How to know that a partition is ready when using Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-26636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-26636. -- Resolution: Invalid > How to know that a partition is ready when using Structured Streaming > -- > > Key: SPARK-26636 > URL: https://issues.apache.org/jira/browse/SPARK-26636 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.2 >Reporter: Guo Wei >Priority: Minor > > When using structured streaming, we use "partitionBy" api to partition the > output data, and use the watermark based on event-time to handle delay > records, but how to tell downstream users that a partition is ready? For > example, when to write an empty "hadoop.done" file in a paritition directory? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26640) Code cleanup from lgtm.com analysis
Sean Owen created SPARK-26640: - Summary: Code cleanup from lgtm.com analysis Key: SPARK-26640 URL: https://issues.apache.org/jira/browse/SPARK-26640 Project: Spark Issue Type: Improvement Components: ML, Spark Core, SQL, Structured Streaming, Tests Affects Versions: 2.4.0 Reporter: Sean Owen Assignee: Sean Owen https://github.com/apache/spark/pull/23567 brought to my attention that lgtm.com has a recent analysis of Spark that turned up at least one bug: https://issues.apache.org/jira/browse/SPARK-26638 See https://lgtm.com/projects/g/apache/spark/snapshot/0655f1624ff7b73e5c8937ab9e83453a5a3a4466/files/dev/create-release/releaseutils.py?sort=name=ASC=heatmap#x1434915b6576fb40:1 Most of these are valid, small suggestions to clean up the code. I'm going to make a PR to implement the obvious ones. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26636) How to know that a partition is ready when using Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-26636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744614#comment-16744614 ] Hyukjin Kwon commented on SPARK-26636: -- Questions like this should go to the mailing list; let's interact there first before filing an issue. I think you could get a better answer there. > How to know that a partition is ready when using Structured Streaming > -- > > Key: SPARK-26636 > URL: https://issues.apache.org/jira/browse/SPARK-26636 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.2 >Reporter: Guo Wei >Priority: Minor > > When using structured streaming, we use "partitionBy" api to partition the > output data, and use the watermark based on event-time to handle delay > records, but how to tell downstream users that a partition is ready? For > example, when to write an empty "hadoop.done" file in a paritition directory? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL
[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744613#comment-16744613 ] Hyukjin Kwon commented on SPARK-26639: -- What's the input/output, and expected input/output? Can you describe a reproducer please? What's an issue here? > The reuse subquery function maybe does not work in SPARK SQL > > > Key: SPARK-26639 > URL: https://issues.apache.org/jira/browse/SPARK-26639 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > > The subquery reuse feature has done in > [https://github.com/apache/spark/pull/14548] > In my test, I found the visualized plan do show the subquery is executed > once. But the stage of same subquery execute maybe not once. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL
[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744606#comment-16744606 ] Ke Jia commented on SPARK-26639: [@davies|https://github.com/davies] [@hvanhovell|https://github.com/hvanhovell] [@gatorsmile|https://github.com/gatorsmile] can you help verify this issue? Thanks for your help! > The reuse subquery function maybe does not work in SPARK SQL > > > Key: SPARK-26639 > URL: https://issues.apache.org/jira/browse/SPARK-26639 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > > The subquery reuse feature has done in > [https://github.com/apache/spark/pull/14548] > In my test, I found the visualized plan do show the subquery is executed > once. But the stage of same subquery execute maybe not once. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL
[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated SPARK-26639: --- Description: The subquery reuse feature was implemented in [https://github.com/apache/spark/pull/14548] In my test, I found that the visualized plan does show the subquery executed once, but the stage for the same subquery may run more than once. was: The subquery reuse feature was implemented in [PR#14548|https://github.com/apache/spark/pull/14548] In my test, I found that the visualized plan does show the subquery executed once, but the stage for the same subquery may run more than once. > The reuse subquery function maybe does not work in SPARK SQL > > > Key: SPARK-26639 > URL: https://issues.apache.org/jira/browse/SPARK-26639 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > > The subquery reuse feature was implemented in > [https://github.com/apache/spark/pull/14548] > In my test, I found that the visualized plan does show the subquery executed > once, but the stage for the same subquery may run more than once. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL
[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated SPARK-26639: --- Description: The subquery reuse feature was implemented in [PR#14548|https://github.com/apache/spark/pull/14548] In my test, I found that the visualized plan does show the subquery executed once, but the stage for the same subquery may run more than once. was: The subquery reuse feature was implemented in [14548|https://github.com/apache/spark/pull/14548] In my test, I found that the visualized plan does show the subquery executed once, but the stage for the same subquery may run more than once. > The reuse subquery function maybe does not work in SPARK SQL > > > Key: SPARK-26639 > URL: https://issues.apache.org/jira/browse/SPARK-26639 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > > The subquery reuse feature was implemented in > [PR#14548|https://github.com/apache/spark/pull/14548] > In my test, I found that the visualized plan does show the subquery executed > once, but the stage for the same subquery may run more than once. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL
[ https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated SPARK-26639: --- Description: The subquery reuse feature was implemented in [14548|https://github.com/apache/spark/pull/14548] In my test, I found that the visualized plan does show the subquery executed once, but the stage for the same subquery may run more than once. > The reuse subquery function maybe does not work in SPARK SQL > > > Key: SPARK-26639 > URL: https://issues.apache.org/jira/browse/SPARK-26639 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 2.3.2, 2.4.0 >Reporter: Ke Jia >Priority: Major > > The subquery reuse feature was implemented in > [14548|https://github.com/apache/spark/pull/14548] > In my test, I found that the visualized plan does show the subquery executed > once, but the stage for the same subquery may run more than once. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL
Ke Jia created SPARK-26639: -- Summary: The reuse subquery function maybe does not work in SPARK SQL Key: SPARK-26639 URL: https://issues.apache.org/jira/browse/SPARK-26639 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 2.4.0, 2.3.2 Reporter: Ke Jia -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
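The reuse feature under discussion deduplicates semantically equal subqueries so that each should execute only once. A toy Python model of that idea may help frame the report; it is purely illustrative (Spark's actual ReuseSubquery rule operates on canonicalized physical plans, and all names below are made up):

```python
# Toy model of subquery reuse: equivalent subqueries are canonicalized and
# executed once, with later occurrences served from a cache. The bug report
# above says the plan *shows* reuse, but the stages run more than once.
executions = []

def run_subquery(sql):
    """Pretend to execute a subquery, recording that an execution happened."""
    executions.append(sql)
    return "result-of(%s)" % sql

cache = {}

def get_or_run(sql):
    # Crude canonical form: collapse whitespace and ignore case.
    key = " ".join(sql.split()).lower()
    if key not in cache:
        cache[key] = run_subquery(sql)
    return cache[key]

a = get_or_run("SELECT max(x) FROM t")
b = get_or_run("select   max(x) from t")  # equivalent: served from cache
print(len(executions))  # 1 -- the second occurrence was not re-executed
```

If Spark's reuse worked like this model, the stage for the second occurrence would never be scheduled; the reporter's observation suggests that is not always the case.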
[jira] [Resolved] (SPARK-26633) Add ExecutorClassLoader.getResourceAsStream
[ https://issues.apache.org/jira/browse/SPARK-26633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-26633. - Resolution: Fixed Assignee: Kris Mok Fix Version/s: 3.0.0 2.4.1 > Add ExecutorClassLoader.getResourceAsStream > --- > > Key: SPARK-26633 > URL: https://issues.apache.org/jira/browse/SPARK-26633 > Project: Spark > Issue Type: Improvement > Components: Spark Shell >Affects Versions: 3.0.0 >Reporter: Kris Mok >Assignee: Kris Mok >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > {{ExecutorClassLoader}} is capable of loading dynamically generated classes > from the REPL via either RPC or HDFS, but right now it always delegates > resource loading to the parent class loader. That makes the dynamically > generated classes unavailable to uses other than class loading. > Such need may arise, for example, when json4s wants to parse the Class file > to extract parameter name information. Internally it'd call the class > loader's {{getResourceAsStream}} to obtain the Class file content as an > {{InputStream}}. > This ticket tracks an improvement to the {{ExecutorClassLoader}} to allow > fetching dynamically generated Class files from the REPL as resource streams. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
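The improvement described above can be sketched with a small Python analogue: a loader that serves dynamically generated "class file" bytes as a resource stream itself instead of delegating to its parent. This is only a model of the idea, not Spark's Scala `ExecutorClassLoader`; all names here are illustrative.

```python
import io

class ParentLoader:
    """Stand-in for the parent class loader, which cannot see REPL classes."""
    def get_resource_as_stream(self, name):
        return None

class ExecutorClassLoaderModel:
    def __init__(self, parent, generated):
        self.parent = parent
        self.generated = generated  # name -> bytes (fetched via RPC/HDFS)

    def get_resource_as_stream(self, name):
        data = self.generated.get(name)
        if data is not None:
            return io.BytesIO(data)  # serve REPL-generated bytes directly
        return self.parent.get_resource_as_stream(name)  # else delegate

loader = ExecutorClassLoaderModel(ParentLoader(),
                                  {"Foo.class": b"\xca\xfe\xba\xbe"})
stream = loader.get_resource_as_stream("Foo.class")
print(stream.read()[:4])  # b'\xca\xfe\xba\xbe' -- the class-file magic bytes
```

A caller like json4s that asks for `Foo.class` as a stream now gets the generated bytes rather than `None` from the parent.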
[jira] [Assigned] (SPARK-26638) Pyspark vector classes always return error for unary negation
[ https://issues.apache.org/jira/browse/SPARK-26638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26638: Assignee: Apache Spark (was: Sean Owen) > Pyspark vector classes always return error for unary negation > - > > Key: SPARK-26638 > URL: https://issues.apache.org/jira/browse/SPARK-26638 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sean Owen >Assignee: Apache Spark >Priority: Major > > It looks like the implementation of {{__neg__}} for Pyspark vector classes is > wrong: > {code} > def _delegate(op): > def func(self, other): > if isinstance(other, DenseVector): > other = other.array > return DenseVector(getattr(self.array, op)(other)) > return func > __neg__ = _delegate("__neg__") > {code} > This delegation works for binary operators but not for unary, and indeed, it > doesn't work at all: > {code} > from pyspark.ml.linalg import DenseVector > v = DenseVector([1,2,3]) > -v > ... > TypeError: func() missing 1 required positional argument: 'other' > {code} > This was spotted by static analysis on lgtm.com: > https://lgtm.com/projects/g/apache/spark/alerts/?mode=tree=python=7850093 > Easy to fix and add a test for, as I presume we want this to be implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26638) Pyspark vector classes always return error for unary negation
[ https://issues.apache.org/jira/browse/SPARK-26638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26638: Assignee: Sean Owen (was: Apache Spark) > Pyspark vector classes always return error for unary negation > - > > Key: SPARK-26638 > URL: https://issues.apache.org/jira/browse/SPARK-26638 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sean Owen >Assignee: Sean Owen >Priority: Major > > It looks like the implementation of {{__neg__}} for Pyspark vector classes is > wrong: > {code} > def _delegate(op): > def func(self, other): > if isinstance(other, DenseVector): > other = other.array > return DenseVector(getattr(self.array, op)(other)) > return func > __neg__ = _delegate("__neg__") > {code} > This delegation works for binary operators but not for unary, and indeed, it > doesn't work at all: > {code} > from pyspark.ml.linalg import DenseVector > v = DenseVector([1,2,3]) > -v > ... > TypeError: func() missing 1 required positional argument: 'other' > {code} > This was spotted by static analysis on lgtm.com: > https://lgtm.com/projects/g/apache/spark/alerts/?mode=tree=python=7850093 > Easy to fix and add a test for, as I presume we want this to be implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26638) Pyspark vector classes always return error for unary negation
Sean Owen created SPARK-26638: - Summary: Pyspark vector classes always return error for unary negation Key: SPARK-26638 URL: https://issues.apache.org/jira/browse/SPARK-26638 Project: Spark Issue Type: Bug Components: ML, PySpark Affects Versions: 2.4.0, 2.3.2 Reporter: Sean Owen Assignee: Sean Owen It looks like the implementation of {{__neg__}} for Pyspark vector classes is wrong: {code} def _delegate(op): def func(self, other): if isinstance(other, DenseVector): other = other.array return DenseVector(getattr(self.array, op)(other)) return func __neg__ = _delegate("__neg__") {code} This delegation works for binary operators but not for unary, and indeed, it doesn't work at all: {code} from pyspark.ml.linalg import DenseVector v = DenseVector([1,2,3]) -v ... TypeError: func() missing 1 required positional argument: 'other' {code} This was spotted by static analysis on lgtm.com: https://lgtm.com/projects/g/apache/spark/alerts/?mode=tree=python=7850093 Easy to fix and add a test for, as I presume we want this to be implemented. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
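The fix follows directly from the diagnosis above: binary operators are invoked with an `other` argument, unary operators with none, so unary methods need their own delegator. A self-contained sketch of the idea, using a minimal stand-in class rather than the real `pyspark.ml.linalg.DenseVector`:

```python
import numpy as np

class DenseVector:
    """Minimal stand-in for pyspark.ml.linalg.DenseVector, for illustration."""
    def __init__(self, values):
        self.array = np.asarray(values, dtype=float)
    def __repr__(self):
        return "DenseVector(%s)" % self.array.tolist()

def _delegate(op):
    # Works for binary operators: they are called with an `other` argument.
    def func(self, other):
        if isinstance(other, DenseVector):
            other = other.array
        return DenseVector(getattr(self.array, op)(other))
    return func

def _delegate_unary(op):
    # Unary operators take no argument, so they need a separate delegator.
    def func(self):
        return DenseVector(getattr(self.array, op)())
    return func

DenseVector.__add__ = _delegate("__add__")
DenseVector.__neg__ = _delegate_unary("__neg__")

v = DenseVector([1, 2, 3])
print(-v)      # DenseVector([-1.0, -2.0, -3.0])
print(v + v)   # DenseVector([2.0, 4.0, 6.0])
```

With a unary delegator in place, `-v` no longer hits the `missing 1 required positional argument` error shown in the report.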
[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744524#comment-16744524 ] Kingsley Jones commented on SPARK-12216: I am going down the rabbit hole of the Scala REPL. I think this is the relevant code path, https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala lines 569 to 577: {code} /** This instance is no longer needed, so release any resources * it is using. The reporter's output gets flushed. */ override def close(): Unit = { reporter.flush() if (initializeComplete) { global.close() } } {code} Perhaps {{close()}} is not releasing everything. Scala follows different idioms than Java here, so it is less clear what the right way is to close every class loader. > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.5.2 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) > both \tmp and \tmp\hive have permissions > drwxrwxrwx as detected by winutils ls >Reporter: stefan >Priority: Minor > > The mailing list archives have no obvious solution to this: > scala> :q > Stopping spark context. 
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark > temp dir: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > java.io.IOException: Failed to delete: > C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff > at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60) > at scala.collection.mutable.HashSet.foreach(HashSet.scala:79) > at > org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60) > at > org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234) > at > org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234) > at scala.util.Try$.apply(Try.scala:161) > at > org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234) > at > org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216) > at > org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54) -- This message 
was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26566) Upgrade apache/arrow to 0.12.0
[ https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-26566: - Description: _This is just a placeholder for now to collect what needs to be fixed when we upgrade next time_ Version 0.12.0 includes the following: * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098 * conversion to date object no longer needed, ARROW-3910 was: _This is just a placeholder for now to collect what needs to be fixed when we upgrade next time_ Version 0.12.0 includes the following: * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098 > Upgrade apache/arrow to 0.12.0 > -- > > Key: SPARK-26566 > URL: https://issues.apache.org/jira/browse/SPARK-26566 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 2.4.0 >Reporter: Bryan Cutler >Priority: Major > > _This is just a placeholder for now to collect what needs to be fixed when we > upgrade next time_ > Version 0.12.0 includes the following: > * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098 > * conversion to date object no longer needed, ARROW-3910 > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26625) spark.redaction.regex should include oauthToken
[ https://issues.apache.org/jira/browse/SPARK-26625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Cheah resolved SPARK-26625. Resolution: Fixed Fix Version/s: 3.0.0 > spark.redaction.regex should include oauthToken > --- > > Key: SPARK-26625 > URL: https://issues.apache.org/jira/browse/SPARK-26625 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.4.0 >Reporter: Vinoo Ganesh >Priority: Critical > Fix For: 3.0.0 > > > The regex (spark.redaction.regex) that is used to decide which config > properties or environment settings are sensitive should also include > oauthToken to match spark.kubernetes.authenticate.submission.oauthToken -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
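The redaction mechanism referenced above matches configuration keys against a regex and masks matching values. A minimal sketch of that behavior follows; the `DEFAULT_RE` pattern below is an assumption for illustration (the real default is whatever `spark.redaction.regex` is set to), and the point is simply that extending it makes `oauthToken` keys get masked:

```python
import re

# Assumed 2.4-era default for spark.redaction.regex (illustrative only),
# and the extended pattern the ticket proposes.
DEFAULT_RE = re.compile(r"(?i)secret|password")
PROPOSED_RE = re.compile(r"(?i)secret|password|oauthToken")

def redact(conf, pattern):
    # Mask the value of any config key the redaction pattern matches.
    return {k: ("*********(redacted)" if pattern.search(k) else v)
            for k, v in conf.items()}

conf = {
    "spark.kubernetes.authenticate.submission.oauthToken": "abc123",
    "spark.app.name": "demo",
}

print(redact(conf, DEFAULT_RE))   # the oauth token leaks with the old pattern
print(redact(conf, PROPOSED_RE))  # the oauth token is masked when included
```

Non-sensitive settings like `spark.app.name` pass through unchanged under either pattern.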
[jira] [Assigned] (SPARK-26619) Prune the unused serializers from `SerializeFromObject`
[ https://issues.apache.org/jira/browse/SPARK-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-26619: --- Assignee: Liang-Chi Hsieh > Prune the unused serializers from `SerializeFromObject` > --- > > Key: SPARK-26619 > URL: https://issues.apache.org/jira/browse/SPARK-26619 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: DB Tsai >Assignee: Liang-Chi Hsieh >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26619) Prune the unused serializers from `SerializeFromObject`
[ https://issues.apache.org/jira/browse/SPARK-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-26619. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23562 [https://github.com/apache/spark/pull/23562] > Prune the unused serializers from `SerializeFromObject` > --- > > Key: SPARK-26619 > URL: https://issues.apache.org/jira/browse/SPARK-26619 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: DB Tsai >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26550) New datasource for benchmarking
[ https://issues.apache.org/jira/browse/SPARK-26550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-26550. --- Resolution: Fixed Assignee: Maxim Gekk Fix Version/s: 3.0.0 > New datasource for benchmarking > --- > > Key: SPARK-26550 > URL: https://issues.apache.org/jira/browse/SPARK-26550 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > The purpose of the new datasource is materialisation of a dataset without the > additional overhead of actions and of converting row values to other types. > This can be used in benchmarking, as well as in cases where a dataset needs to > be materialised for its side effects, as in caching. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream
[ https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26629: - Fix Version/s: (was: 2.3.4) > Error with multiple file stream in a query + restart on a batch that has no > data for one file stream > > > Key: SPARK-26629 > URL: https://issues.apache.org/jira/browse/SPARK-26629 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > When a streaming query has multiple file streams, and there is a batch where > one of the file streams dont have data in that batch, then if the query has > to restart from that, it will throw the following error. > {code} > java.lang.IllegalStateException: batch 1 doesn't exist > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300) > at > org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120) > at > org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25) > at > 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205) > {code} > **Reason** > Existing {{HDFSMetadata.verifyBatchIds}} throws error whenever the batchIds > list was empty. In the context of {{FileStreamSource.getBatch}} (where verify > is called) and FileStreamSourceLog (subclass of HDFSMetadata), this is > usually okay because, in a streaming query with one file stream, the batchIds > can never be empty: > A batch is planned only when the FileStreamSourceLog has seen new offset > (that is, there are new data files). 
> So FileStreamSource.getBatch will be called on X to Y where Y will always be > greater than X. This internally calls {{HDFSMetadata.verifyBatchIds(X+1, Y)}} with > ids X+1 through Y. > For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds > = Seq(5), start = 5, end = 5)}}. However, the invariant Y greater than X does not hold > when there are two file stream sources, as a batch may be planned even when > only one of the file streams has data. So one of the file streams may have no > data, which can lead to {{FileStreamSource.getBatch(X, X) -> verify(batchIds = > Seq.empty, start = X+1, end = X) -> failure}}. > Note that FileStreamSource.getBatch(X, X) gets called only when restarting a > query in a batch
[jira] [Updated] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream
[ https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26629: - Fix Version/s: 3.0.0 2.4.1 2.3.4 > Error with multiple file stream in a query + restart on a batch that has no > data for one file stream > > > Key: SPARK-26629 > URL: https://issues.apache.org/jira/browse/SPARK-26629 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0, 2.4.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > Fix For: 2.4.1, 3.0.0, 2.3.4 > > > When a streaming query has multiple file streams, and there is a batch where > one of the file streams dont have data in that batch, then if the query has > to restart from that, it will throw the following error. > {code} > java.lang.IllegalStateException: batch 1 doesn't exist > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300) > at > org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120) > at > org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25) > at > 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205) > {code} > **Reason** > Existing {{HDFSMetadata.verifyBatchIds}} throws error whenever the batchIds > list was empty. In the context of {{FileStreamSource.getBatch}} (where verify > is called) and FileStreamSourceLog (subclass of HDFSMetadata), this is > usually okay because, in a streaming query with one file stream, the batchIds > can never be empty: > A batch is planned only when the FileStreamSourceLog has seen new offset > (that is, there are new data files). 
> So FileStreamSource.getBatch will be called on X to Y where Y will always be > greater than X. This internally calls {{HDFSMetadata.verifyBatchIds(X+1, Y)}} with > ids X+1 through Y. > For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds > = Seq(5), start = 5, end = 5)}}. However, the invariant Y greater than X does not hold > when there are two file stream sources, as a batch may be planned even when > only one of the file streams has data. So one of the file streams may have no > data, which can lead to {{FileStreamSource.getBatch(X, X) -> verify(batchIds = > Seq.empty, start = X+1, end = X) -> failure}}. > Note that FileStreamSource.getBatch(X, X) gets called only
[jira] [Resolved] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream
[ https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-26629. -- Resolution: Fixed > Error with multiple file stream in a query + restart on a batch that has no > data for one file stream > > > Key: SPARK-26629 > URL: https://issues.apache.org/jira/browse/SPARK-26629 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > Fix For: 2.4.1, 3.0.0, 2.3.4 > > > When a streaming query has multiple file streams, and there is a batch where > one of the file streams dont have data in that batch, then if the query has > to restart from that, it will throw the following error. > {code} > java.lang.IllegalStateException: batch 1 doesn't exist > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300) > at > org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120) > at > org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25) > at > 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205) > {code} > **Reason** > Existing {{HDFSMetadata.verifyBatchIds}} throws error whenever the batchIds > list was empty. In the context of {{FileStreamSource.getBatch}} (where verify > is called) and FileStreamSourceLog (subclass of HDFSMetadata), this is > usually okay because, in a streaming query with one file stream, the batchIds > can never be empty: > A batch is planned only when the FileStreamSourceLog has seen new offset > (that is, there are new data files). 
> So FileStreamSource.getBatch will be called on X to Y where X will always be < Y. This internally calls {{HDFSMetadata.verifyBatchIds(X+1, Y)}} with Y-X ids. > For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds = Seq(5), start = 5, end = 5)}}. However, the invariant X < Y does not hold when there are two file stream sources, as a batch may be planned even when only one of the file streams has data. So one of the file streams may have no data, which can call {{FileStreamSource.getBatch(X, X) -> verify(batchIds = Seq.empty, start = X+1, end = X) -> failure}}. > Note that FileStreamSource.getBatch(X, X) gets called only when restarting a query in a batch where
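The empty-range failure described above can be sketched in a small model. This is a hypothetical Python sketch with illustrative names; the real logic is Scala in HDFSMetadataLog.verifyBatchIds and FileStreamSource.getBatch.

```python
# Minimal model of the verifyBatchIds invariant described above
# (illustrative only; the real implementation is Scala).

def verify_batch_ids(batch_ids, start, end):
    """Raise unless batch_ids is exactly the contiguous range start..end."""
    if not batch_ids:
        # The failing path: an empty id list always errors, mirroring
        # "java.lang.IllegalStateException: batch N doesn't exist".
        raise ValueError(f"batch {start} doesn't exist")
    if list(batch_ids) != list(range(start, end + 1)):
        raise ValueError("non-contiguous batch ids")

def get_batch(available_ids, x, y):
    """Model of FileStreamSource.getBatch(x, y): fetch batches x+1..y."""
    ids = [i for i in available_ids if x + 1 <= i <= y]
    verify_batch_ids(ids, x + 1, y)
    return ids

# One file stream: X < Y always holds, so the range is non-empty.
assert get_batch([5], 4, 5) == [5]

# Two file streams: a restart can call getBatch(X, X) on the stream
# that had no data in that batch -> empty range -> failure.
try:
    get_batch([5], 5, 5)
except ValueError as e:
    print(e)  # batch 6 doesn't exist
```

The fix in the PR amounts to tolerating the getBatch(X, X) case on restart rather than failing on an empty id list.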
[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744288#comment-16744288 ] Bryan Cutler commented on SPARK-26591: -- [~elch10] please go ahead and make a Jira for Arrow regarding the pyarrow import error. Also, include all relevant details about your system. > Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain > environment > > > Key: SPARK-26591 > URL: https://issues.apache.org/jira/browse/SPARK-26591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 > Environment: Python 3.6.7 > Pyspark 2.4.0 > OS: > {noformat} > Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 > x86_64 x86_64 GNU/Linux{noformat} > CPU: > > {code:java} > Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB > clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz > {code} > > >Reporter: Elchin >Priority: Major > Attachments: core > > > When I try to use pandas_udf from examples in > [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]: > {code:java} > from pyspark.sql.functions import pandas_udf, PandasUDFType > from pyspark.sql.types import IntegerType, StringType > slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is > crashed{code} > I get the error: > {code:java} > [1] 17969 illegal hardware instruction (core dumped) python3{code} > The environment is: > Python 3.6.7 > PySpark 2.4.0 > PyArrow: 0.11.1 > Pandas: 0.23.4 > NumPy: 1.15.4 > OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 > x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream
[ https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-26629: - Affects Version/s: 2.3.3 > Error with multiple file stream in a query + restart on a batch that has no > data for one file stream > > > Key: SPARK-26629 > URL: https://issues.apache.org/jira/browse/SPARK-26629 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1 >Reporter: Tathagata Das >Assignee: Tathagata Das >Priority: Major > Fix For: 2.4.1, 3.0.0, 2.3.4 > > > When a streaming query has multiple file streams, and there is a batch where > one of the file streams dont have data in that batch, then if the query has > to restart from that, it will throw the following error. > {code} > java.lang.IllegalStateException: batch 1 doesn't exist > at > org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300) > at > org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120) > at > org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291) > at scala.collection.Iterator$class.foreach(Iterator.scala:891) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1334) > at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) > at > org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25) > at > 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251) > at > org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175) > at > org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56) > at > org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205) > {code} > **Reason** > Existing {{HDFSMetadata.verifyBatchIds}} throws error whenever the batchIds > list was empty. In the context of {{FileStreamSource.getBatch}} (where verify > is called) and FileStreamSourceLog (subclass of HDFSMetadata), this is > usually okay because, in a streaming query with one file stream, the batchIds > can never be empty: > A batch is planned only when the FileStreamSourceLog has seen new offset > (that is, there are new data files). 
> So FileStreamSource.getBatch will be called on X to Y where X will always be < Y. This internally calls {{HDFSMetadata.verifyBatchIds(X+1, Y)}} with Y-X ids. > For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds = Seq(5), start = 5, end = 5)}}. However, the invariant X < Y does not hold when there are two file stream sources, as a batch may be planned even when only one of the file streams has data. So one of the file streams may have no data, which can call {{FileStreamSource.getBatch(X, X) -> verify(batchIds = Seq.empty, start = X+1, end = X) -> failure}}. > Note that FileStreamSource.getBatch(X, X) gets called only when restarting a query in a batch
[jira] [Assigned] (SPARK-25713) Implement copy() for ColumnarArray
[ https://issues.apache.org/jira/browse/SPARK-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25713: Assignee: Apache Spark > Implement copy() for ColumnarArray > -- > > Key: SPARK-25713 > URL: https://issues.apache.org/jira/browse/SPARK-25713 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Liwen Sun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25713) Implement copy() for ColumnarArray
[ https://issues.apache.org/jira/browse/SPARK-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744259#comment-16744259 ] Apache Spark commented on SPARK-25713: -- User 'ayudovin' has created a pull request for this issue: https://github.com/apache/spark/pull/23569 > Implement copy() for ColumnarArray > -- > > Key: SPARK-25713 > URL: https://issues.apache.org/jira/browse/SPARK-25713 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Liwen Sun >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25713) Implement copy() for ColumnarArray
[ https://issues.apache.org/jira/browse/SPARK-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744257#comment-16744257 ] Apache Spark commented on SPARK-25713: -- User 'ayudovin' has created a pull request for this issue: https://github.com/apache/spark/pull/23569 > Implement copy() for ColumnarArray > -- > > Key: SPARK-25713 > URL: https://issues.apache.org/jira/browse/SPARK-25713 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Liwen Sun >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25713) Implement copy() for ColumnarArray
[ https://issues.apache.org/jira/browse/SPARK-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25713: Assignee: (was: Apache Spark) > Implement copy() for ColumnarArray > -- > > Key: SPARK-25713 > URL: https://issues.apache.org/jira/browse/SPARK-25713 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Liwen Sun >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25992) Accumulators giving KeyError in pyspark
[ https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-25992. -- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 Fixed at https://github.com/apache/spark/pull/23564 I documented the limitation. > Accumulators giving KeyError in pyspark > --- > > Key: SPARK-25992 > URL: https://issues.apache.org/jira/browse/SPARK-25992 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.1 >Reporter: Abdeali Kothari >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > I am using accumulators and when I run my code, I sometimes get some warn > messages. When I checked, there was nothing accumulated - not sure if I lost > info from the accumulator or it worked and I can ignore this error ? > The message: > {noformat} > Exception happened during processing of request from > ('127.0.0.1', 62099) > Traceback (most recent call last): > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 317, in > _handle_request_noblock > self.process_request(request, client_address) > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 348, in > process_request > self.finish_request(request, client_address) > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 361, in > finish_request > self.RequestHandlerClass(request, client_address, self) > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 696, in > __init__ > self.handle() > File "/usr/local/hadoop/spark2.3.1/python/pyspark/accumulators.py", line 238, > in handle > _accumulatorRegistry[aid] += update > KeyError: 0 > > 2018-11-09 19:09:08 ERROR DAGScheduler:91 - Failed to update accumulators for > task 0 > org.apache.spark.SparkException: EOF reached before Python server acknowledged > at > org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:634) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1131) 
> at > org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1123) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1123) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1206) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26615) Fixing transport server/client resource leaks in the core unittests
[ https://issues.apache.org/jira/browse/SPARK-26615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26615. --- Resolution: Fixed Fix Version/s: 2.4.1 3.0.0 Issue resolved by pull request 23540 [https://github.com/apache/spark/pull/23540] > Fixing transport server/client resource leaks in the core unittests > > > Key: SPARK-26615 > URL: https://issues.apache.org/jira/browse/SPARK-26615 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.0.0, 2.4.1 > > > The testing of the SPARK-24938 PR ([https://github.com/apache/spark/pull/22114]) > always fails with OOM. Analysing this problem led to identifying some > resource leaks where TransportClient/TransportServer instances are not closed > properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26615) Fixing transport server/client resource leaks in the core unittests
[ https://issues.apache.org/jira/browse/SPARK-26615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-26615: - Assignee: Attila Zsolt Piros > Fixing transport server/client resource leaks in the core unittests > > > Key: SPARK-26615 > URL: https://issues.apache.org/jira/browse/SPARK-26615 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0, 3.0.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > > The testing of the SPARK-24938 PR ([https://github.com/apache/spark/pull/22114]) > always fails with OOM. Analysing this problem led to identifying some > resource leaks where TransportClient/TransportServer instances are not closed > properly. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26429) Add jdbc sink support for Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-26429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wang Yanlin resolved SPARK-26429. - Resolution: Duplicate > Add jdbc sink support for Structured Streaming > -- > > Key: SPARK-26429 > URL: https://issues.apache.org/jira/browse/SPARK-26429 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Wang Yanlin >Priority: Major > > Currently, Spark SQL supports reading from and writing to JDBC in batch mode, but does not > support Structured Streaming. Even though we can write to JDBC using a > foreach sink, providing an easier way to write to JDBC > would be helpful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26636) How to know that a partition is ready when using Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-26636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guo Wei updated SPARK-26636: Description: When using Structured Streaming, we use the "partitionBy" API to partition the output data, and use the event-time watermark to handle late records, but how do we tell downstream users that a partition is ready? For example, when should we write an empty "hadoop.done" file in a partition directory? (was: When using structured streaming, we use "partitionBy" api to partition the output data, and use the watermark based on event-time to handle delay records, but how to tell downstream users that a partition data is ready? For example, when to write an empty "hadoop.done" file in a paritition directory?) > How to know that a partition is ready when using Structured Streaming > -- > > Key: SPARK-26636 > URL: https://issues.apache.org/jira/browse/SPARK-26636 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.2 >Reporter: Guo Wei >Priority: Minor > > When using Structured Streaming, we use the "partitionBy" API to partition the > output data, and use the event-time watermark to handle late > records, but how do we tell downstream users that a partition is ready? For > example, when should we write an empty "hadoop.done" file in a partition directory? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
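Spark has no built-in "partition is done" signal, but the watermark gives one workable rule of thumb: a partition covering event times [start, end) can be marked done once the query's event-time watermark has passed `end`, since no more late records for it will be admitted. The following sketch is illustrative only (the helper names and the "hadoop.done" marker layout are assumptions, not a Spark API):

```python
# Hedged sketch of one way to decide partition readiness from the
# watermark. Not a built-in Spark API; names and paths are illustrative.

from datetime import datetime, timedelta

def partition_is_ready(watermark: datetime, partition_end: datetime) -> bool:
    """A partition is ready once the watermark has moved past its end."""
    return watermark >= partition_end

def done_marker_path(base: str, partition_start: datetime) -> str:
    # Where one might write the empty "hadoop.done" file for downstream users.
    return f"{base}/dt={partition_start:%Y-%m-%d}/hour={partition_start:%H}/hadoop.done"

hour = datetime(2019, 1, 16, 10)
wm_behind = datetime(2019, 1, 16, 10, 59)
wm_passed = datetime(2019, 1, 16, 11, 0)

assert not partition_is_ready(wm_behind, hour + timedelta(hours=1))
assert partition_is_ready(wm_passed, hour + timedelta(hours=1))
print(done_marker_path("/data/out", hour))
# /data/out/dt=2019-01-16/hour=10/hadoop.done
```

In practice the current watermark can be read from the query's progress events, and the marker write would be done by a small driver-side loop or a foreachBatch hook.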
[jira] [Assigned] (SPARK-26637) Makes GetArrayItem nullability more precise
[ https://issues.apache.org/jira/browse/SPARK-26637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26637: Assignee: (was: Apache Spark) > Makes GetArrayItem nullability more precise > --- > > Key: SPARK-26637 > URL: https://issues.apache.org/jira/browse/SPARK-26637 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takeshi Yamamuro >Priority: Minor > > In master, GetArrayItem nullable is always true; > https://github.com/apache/spark/blob/cf133e611020ed178f90358464a1b88cdd9b7889/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L236 > But, If input array size is constant and ordinal is foldable, we could make > GetArrayItem nullability more precise. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26636) How to know that a partition is ready when using Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-26636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guo Wei updated SPARK-26636: Summary: How to know that a partition is ready when using Structured Streaming (was: How to know a partition is ready when using Structured Streaming ) > How to know that a partition is ready when using Structured Streaming > -- > > Key: SPARK-26636 > URL: https://issues.apache.org/jira/browse/SPARK-26636 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.3.2 >Reporter: Guo Wei >Priority: Minor > > When using structured streaming, we use "partitionBy" api to partition the > output data, and use the watermark based on event-time to handle delay > records, but how to tell downstream users that a partition data is ready? > For example, when to write an empty "hadoop.done" file in a paritition > directory? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26637) Makes GetArrayItem nullability more precise
[ https://issues.apache.org/jira/browse/SPARK-26637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26637: Assignee: Apache Spark > Makes GetArrayItem nullability more precise > --- > > Key: SPARK-26637 > URL: https://issues.apache.org/jira/browse/SPARK-26637 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Takeshi Yamamuro >Assignee: Apache Spark >Priority: Minor > > In master, GetArrayItem nullable is always true; > https://github.com/apache/spark/blob/cf133e611020ed178f90358464a1b88cdd9b7889/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L236 > But, If input array size is constant and ordinal is foldable, we could make > GetArrayItem nullability more precise. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26637) Makes GetArrayItem nullability more precise
Takeshi Yamamuro created SPARK-26637: Summary: Makes GetArrayItem nullability more precise Key: SPARK-26637 URL: https://issues.apache.org/jira/browse/SPARK-26637 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: Takeshi Yamamuro In master, GetArrayItem's nullable is always true; https://github.com/apache/spark/blob/cf133e611020ed178f90358464a1b88cdd9b7889/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L236 But if the input array size is constant and the ordinal is foldable, we could make GetArrayItem's nullability more precise. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
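The proposed refinement can be modeled outside Catalyst. This is an illustrative Python sketch, not the Scala implementation: with a known constant array size and a foldable (compile-time constant) ordinal, the result is nullable only if the element type is nullable, the array itself is nullable, or the ordinal falls out of bounds.

```python
# Illustrative model of the proposed GetArrayItem nullability refinement
# (not Catalyst code). Today nullable is reported as true unconditionally.

def get_array_item_nullable(array_nullable, element_nullable,
                            known_size=None, foldable_ordinal=None):
    if known_size is None or foldable_ordinal is None:
        return True  # current behavior: always nullable
    if not (0 <= foldable_ordinal < known_size):
        return True  # out-of-bounds access yields null
    return array_nullable or element_nullable

# array(1, 2, 3)[0]: constant size, foldable ordinal, non-null elements
assert get_array_item_nullable(False, False, known_size=3, foldable_ordinal=0) is False
# same expression with an out-of-bounds ordinal stays nullable
assert get_array_item_nullable(False, False, known_size=3, foldable_ordinal=5) is True
# unknown size or non-foldable ordinal: fall back to nullable = true
assert get_array_item_nullable(False, False) is True
```

Tighter nullability matters downstream because the optimizer can skip null checks and simplify expressions over non-nullable columns.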
[jira] [Created] (SPARK-26636) How to know a partition is ready when using Structured Streaming
Guo Wei created SPARK-26636: --- Summary: How to know a partition is ready when using Structured Streaming Key: SPARK-26636 URL: https://issues.apache.org/jira/browse/SPARK-26636 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.3.2 Reporter: Guo Wei When using structured streaming, we use "partitionBy" api to partition the output data, and use the watermark based on event-time to handle delay records, but how to tell downstream users that a partition data is ready? For example, when to write an empty "hadoop.done" file in a paritition directory? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743838#comment-16743838 ] Kingsley Jones commented on SPARK-12216: I had another go at investigating this as it continues to greatly frustrate my deployments. Firstly, I had no difficulty following the build instructions for Spark from source on Windows. The only "trick" needed was to use git bash shell on windows to manage the launch of the build process. There is nothing complicated about that, and so I would encourage others to try. Secondly, the build on Windows 10 worked first time which was also my first time using maven. So there are no real reasons why development work cannot be done on the Windows platform to investigate this issue (that is what I am doing now). Thirdly, I re-investigated my comment from April 2018 above... The classLoader which is at the child level is: scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f On searching the Spark source code, and online, I found that this class loader is actually from the Scala REPL. It is not actually part of the Spark source tree. When looking at the cruft showing up in the Windows temp directory the classes that pop up seem associated with the REPL. This makes sense, since the REPL will barf with the errors indicated above if you do nothing more than launch a spark-shell and then close it straight away. The conclusions I reached: 1) certainly it is possible to hack this on a Windows 10 machine (I am trying the incremental builds via SBT toolchain) 2) the bug does seem to be related to classloader clean-up but the fault (at least for the REPL) may NOT be Spark source code but the Scala REPL (or maybe an interaction between how Spark loads the relevant REPL code ???) 3) watching files go in and out of the Windows temp area seems reproducible with high reliability (as commenters above maintain) Remaining questions on my mind... 
Since this issue pops up for pretty much anybody who simply tries Spark on Windows, but the build from source showed NO problems, I think the runtime issue probably has to do with how classes are loaded and the file-system difference between Windows and Linux on file locks. The question is how to isolate which part of the codebase is actually producing the classes that are not cleanable. Is it the main Spark source code, or is it the Scala REPL? Since I got immediately discouraged from using Spark in any real evaluation (100% reliable barf-outs on the very first example are discouraging), I never actually ran cluster test jobs to see if the thing worked outside the REPL. It is quite possible the problem is just the REPL. I would suggest it would help to get at least some dialogue on this. It does not seem very desirable to shut out 50% of global developers on a project which is manifestly *easy* to build on Windows 10 (using maven + git) but where the first encounter of any experienced Windows developer is an immediate sensation that this is an ultra-flaky codebase. Quite possibly, it is just the REPL and it is actually a Scala codebase issue. Folks like me will invest time and energy investigating such bugs, but only if we think there is a will to fix issues that are persistent and discourage adoption among folks who very often have few enterprise choices other than Windows. This is not out of religious attachment, but due to dependencies in the data workflow chain that are not easy to fix. In particular, many financial-industry developers have to use Windows stacks in some part of the data acquisition process. These are not developers inexperienced in cross-platform work, Linux, or Java. The reason such folks are liable to get discouraged is their prior bad experience with Java in distributed systems that had Windows components, due to the well-known difference in file-locking behaviour between Linux and Windows. 
When such folks see barf-outs of this kind, they get discouraged that we are back in the EJB hell of previous times when systems broke all over the place and there were elaborate hacks to patch the problem. Please consider how we can better test and isolate what is really causing this problem. > Spark failed to delete temp directory > -- > > Key: SPARK-12216 > URL: https://issues.apache.org/jira/browse/SPARK-12216 > Project: Spark > Issue Type: Bug > Components: Spark Shell > Environment: windows 7 64 bit > Spark 1.52 > Java 1.8.0.65 > PATH includes: > C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin > C:\ProgramData\Oracle\Java\javapath > C:\Users\Stefan\scala\bin > SYSTEM variables set are: > JAVA_HOME=C:\Program Files\Java\jre1.8.0_65 > HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin > (where the bin\winutils resides) >
[jira] [Commented] (SPARK-26059) Spark standalone mode, does not correctly record a failed Spark Job.
[ https://issues.apache.org/jira/browse/SPARK-26059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743783#comment-16743783 ] Prashant Sharma commented on SPARK-26059: - One workaround is to capture the exit code of the driver JVM. It is non-zero if the job fails with exceptions, and zero if the job succeeds. > Spark standalone mode, does not correctly record a failed Spark Job. > > > Key: SPARK-26059 > URL: https://issues.apache.org/jira/browse/SPARK-26059 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Core >Affects Versions: 3.0.0 >Reporter: Prashant Sharma >Priority: Major > > To reproduce, submit a failing job to the Spark standalone master. The > status of the failed job is shown as FINISHED, regardless of whether it failed or succeeded. > EDIT: It happens only when deploy-mode is client; when deploy-mode is > cluster it works as expected. > - Reported by: Surbhi Bakhtiyar. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
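The workaround above can be scripted: in client deploy mode the spark-submit process is the driver JVM, so its exit code still reflects job success even when the master UI shows FINISHED. A minimal sketch (the spark-submit command line shown in the comment is illustrative; the demo runs a portable stand-in "driver" instead):

```python
# Capture the driver JVM's exit code, per the workaround described above.
# The spark-submit invocation in the comment below is illustrative only.
import subprocess
import sys

def run_and_report(cmd):
    """Run a driver command and return True iff it exited with code 0."""
    result = subprocess.run(cmd)
    return result.returncode == 0

# Real usage would look something like (illustrative master URL and script):
#   ok = run_and_report(["spark-submit", "--master", "spark://master:7077",
#                        "--deploy-mode", "client", "job.py"])

# Portable demo: a "driver" process that fails with a non-zero exit code.
ok = run_and_report([sys.executable, "-c", "import sys; sys.exit(1)"])
print("job succeeded:", ok)  # job succeeded: False
```

Note this only works in client mode; in cluster mode the driver runs elsewhere and, per the issue, the status is already recorded correctly there.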
[jira] [Updated] (SPARK-26616) Expose document frequency in IDFModel
[ https://issues.apache.org/jira/browse/SPARK-26616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jatin Puri updated SPARK-26616: --- Docs Text: Provide DocumentFrequency in IDF (was: Provide DocumentFrequency vector in IDF) > Expose document frequency in IDFModel > - > > Key: SPARK-26616 > URL: https://issues.apache.org/jira/browse/SPARK-26616 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.4.0 >Reporter: Jatin Puri >Priority: Minor > > As part of `org.apache.spark.ml.feature.IDFModel`, the following can be > exposed: > > 1. Document frequency vector > 2. Number of documents > > The above are already computed when calculating the idf vector. They simply need to > be exposed as `public val`s. > > This avoids re-implementation for someone who needs to compute the > document frequency of terms. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26635) illegal hardware instruction
[ https://issues.apache.org/jira/browse/SPARK-26635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elchin resolved SPARK-26635. Resolution: Fixed > illegal hardware instruction > > > Key: SPARK-26635 > URL: https://issues.apache.org/jira/browse/SPARK-26635 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 2.4.0 >Reporter: Elchin >Priority: Major > > I can't import pyarrow: > {code:java} > >>> import pyarrow as pa > [1] 31441 illegal hardware instruction (core dumped) python3{code} > The environment is: > Python 3.6.7 > PySpark 2.4.0 > PyArrow: 0.11.1 > Pandas: 0.23.4 > NumPy: 1.15.4 > OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 > x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-26635) illegal hardware instruction
[ https://issues.apache.org/jira/browse/SPARK-26635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elchin closed SPARK-26635. -- > illegal hardware instruction > > > Key: SPARK-26635 > URL: https://issues.apache.org/jira/browse/SPARK-26635 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 2.4.0 >Reporter: Elchin >Priority: Major > > I can't import pyarrow: > {code:java} > >>> import pyarrow as pa > [1] 31441 illegal hardware instruction (core dumped) python3{code} > The environment is: > Python 3.6.7 > PySpark 2.4.0 > PyArrow: 0.11.1 > Pandas: 0.23.4 > NumPy: 1.15.4 > OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 > x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26635) illegal hardware instruction
Elchin created SPARK-26635: -- Summary: illegal hardware instruction Key: SPARK-26635 URL: https://issues.apache.org/jira/browse/SPARK-26635 Project: Spark Issue Type: Bug Components: PySpark, Spark Core Affects Versions: 2.4.0 Reporter: Elchin I can't import pyarrow: {code:java} >>> import pyarrow as pa [1] 31441 illegal hardware instruction (core dumped) python3{code} The environment is: Python 3.6.7 PySpark 2.4.0 PyArrow: 0.11.1 Pandas: 0.23.4 NumPy: 1.15.4 OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elchin closed SPARK-26591. -- > Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain > environment > > > Key: SPARK-26591 > URL: https://issues.apache.org/jira/browse/SPARK-26591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 > Environment: Python 3.6.7 > Pyspark 2.4.0 > OS: > {noformat} > Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 > x86_64 x86_64 GNU/Linux{noformat} > CPU: > > {code:java} > Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB > clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz > {code} > > >Reporter: Elchin >Priority: Major > Attachments: core > > > When I try to use pandas_udf from examples in > [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]: > {code:java} > from pyspark.sql.functions import pandas_udf, PandasUDFType > from pyspark.sql.types import IntegerType, StringType > slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is > crashed{code} > I get the error: > {code:java} > [1] 17969 illegal hardware instruction (core dumped) python3{code} > The environment is: > Python 3.6.7 > PySpark 2.4.0 > PyArrow: 0.11.1 > Pandas: 0.23.4 > NumPy: 1.15.4 > OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 > x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elchin resolved SPARK-26591. Resolution: Feedback Received > Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain > environment > > > Key: SPARK-26591 > URL: https://issues.apache.org/jira/browse/SPARK-26591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 > Environment: Python 3.6.7 > Pyspark 2.4.0 > OS: > {noformat} > Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 > x86_64 x86_64 GNU/Linux{noformat} > CPU: > > {code:java} > Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB > clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz > {code} > > >Reporter: Elchin >Priority: Major > Attachments: core > > > When I try to use pandas_udf from examples in > [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]: > {code:java} > from pyspark.sql.functions import pandas_udf, PandasUDFType > from pyspark.sql.types import IntegerType, StringType > slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is > crashed{code} > I get the error: > {code:java} > [1] 17969 illegal hardware instruction (core dumped) python3{code} > The environment is: > Python 3.6.7 > PySpark 2.4.0 > PyArrow: 0.11.1 > Pandas: 0.23.4 > NumPy: 1.15.4 > OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 > x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743761#comment-16743761 ] Hyukjin Kwon commented on SPARK-26591: -- Hm, I see. The problem is specific to PyArrow then. Let's leave this ticket closed on the Spark side then. > Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain > environment > > > Key: SPARK-26591 > URL: https://issues.apache.org/jira/browse/SPARK-26591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 > Environment: Python 3.6.7 > Pyspark 2.4.0 > OS: > {noformat} > Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 > x86_64 x86_64 GNU/Linux{noformat} > CPU: > > {code:java} > Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB > clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz > {code} > > >Reporter: Elchin >Priority: Major > Attachments: core > > > When I try to use pandas_udf from examples in > [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]: > {code:java} > from pyspark.sql.functions import pandas_udf, PandasUDFType > from pyspark.sql.types import IntegerType, StringType > slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is > crashed{code} > I get the error: > {code:java} > [1] 17969 illegal hardware instruction (core dumped) python3{code} > The environment is: > Python 3.6.7 > PySpark 2.4.0 > PyArrow: 0.11.1 > Pandas: 0.23.4 > NumPy: 1.15.4 > OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 > x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25992) Accumulators giving KeyError in pyspark
[ https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25992: Assignee: (was: Apache Spark) > Accumulators giving KeyError in pyspark > --- > > Key: SPARK-25992 > URL: https://issues.apache.org/jira/browse/SPARK-25992 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.1 >Reporter: Abdeali Kothari >Priority: Major > > I am using accumulators and when I run my code, I sometimes get some warn > messages. When I checked, there was nothing accumulated - not sure if I lost > info from the accumulator or it worked and I can ignore this error ? > The message: > {noformat} > Exception happened during processing of request from > ('127.0.0.1', 62099) > Traceback (most recent call last): > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 317, in > _handle_request_noblock > self.process_request(request, client_address) > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 348, in > process_request > self.finish_request(request, client_address) > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 361, in > finish_request > self.RequestHandlerClass(request, client_address, self) > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 696, in > __init__ > self.handle() > File "/usr/local/hadoop/spark2.3.1/python/pyspark/accumulators.py", line 238, > in handle > _accumulatorRegistry[aid] += update > KeyError: 0 > > 2018-11-09 19:09:08 ERROR DAGScheduler:91 - Failed to update accumulators for > task 0 > org.apache.spark.SparkException: EOF reached before Python server acknowledged > at > org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:634) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1131) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1123) > at > 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1123) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1206) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25992) Accumulators giving KeyError in pyspark
[ https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25992: Assignee: Apache Spark > Accumulators giving KeyError in pyspark > --- > > Key: SPARK-25992 > URL: https://issues.apache.org/jira/browse/SPARK-25992 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.3.1 >Reporter: Abdeali Kothari >Assignee: Apache Spark >Priority: Major > > I am using accumulators and when I run my code, I sometimes get some warn > messages. When I checked, there was nothing accumulated - not sure if I lost > info from the accumulator or it worked and I can ignore this error ? > The message: > {noformat} > Exception happened during processing of request from > ('127.0.0.1', 62099) > Traceback (most recent call last): > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 317, in > _handle_request_noblock > self.process_request(request, client_address) > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 348, in > process_request > self.finish_request(request, client_address) > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 361, in > finish_request > self.RequestHandlerClass(request, client_address, self) > File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 696, in > __init__ > self.handle() > File "/usr/local/hadoop/spark2.3.1/python/pyspark/accumulators.py", line 238, > in handle > _accumulatorRegistry[aid] += update > KeyError: 0 > > 2018-11-09 19:09:08 ERROR DAGScheduler:91 - Failed to update accumulators for > task 0 > org.apache.spark.SparkException: EOF reached before Python server acknowledged > at > org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:634) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1131) > at > 
org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1123) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1123) > at > org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1206) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743758#comment-16743758 ] Elchin commented on SPARK-26591: [~bryanc] I can't even import pyarrow: {code:java} >>> import pyarrow as pa [1] 31441 illegal hardware instruction (core dumped) python3 {code} > Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain > environment > > > Key: SPARK-26591 > URL: https://issues.apache.org/jira/browse/SPARK-26591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 > Environment: Python 3.6.7 > Pyspark 2.4.0 > OS: > {noformat} > Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 > x86_64 x86_64 GNU/Linux{noformat} > CPU: > > {code:java} > Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB > clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz > {code} > > >Reporter: Elchin >Priority: Major > Attachments: core > > > When I try to use pandas_udf from examples in > [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]: > {code:java} > from pyspark.sql.functions import pandas_udf, PandasUDFType > from pyspark.sql.types import IntegerType, StringType > slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is > crashed{code} > I get the error: > {code:java} > [1] 17969 illegal hardware instruction (core dumped) python3{code} > The environment is: > Python 3.6.7 > PySpark 2.4.0 > PyArrow: 0.11.1 > Pandas: 0.23.4 > NumPy: 1.15.4 > OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 > x86_64 x86_64 x86_64 GNU/Linux -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26634) OutputCommitCoordinator may allow task of FetchFailureStage commit again
[ https://issues.apache.org/jira/browse/SPARK-26634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26634: Assignee: Apache Spark > OutputCommitCoordinator may allow task of FetchFailureStage commit again > > > Key: SPARK-26634 > URL: https://issues.apache.org/jira/browse/SPARK-26634 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0, 2.4.0 >Reporter: liupengcheng >Assignee: Apache Spark >Priority: Major > > In our production Spark cluster, we encountered a case in which a task of a stage retried > due to FetchFailure was denied the right to commit, even though it was the first task > attempt of the retry stage. > After careful investigation, we found that a canCommit call to > OutputCommitCoordinator can allow a task of the FetchFailure stage (with the > same partition number as a new task of the retry stage) to commit, which results in > TaskCommitDenied for every task (of the same partition) in the retry stage. Because > TaskCommitDenied does not count towards failures, this can cause the application to > hang forever. 
> > {code:java} > 2019-01-09,08:39:53,676 INFO org.apache.spark.scheduler.TaskSetManager: > Starting task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, > executor 456, partition 138, PROCESS_LOCAL, 5829 bytes) > 2019-01-09,08:43:37,514 INFO org.apache.spark.scheduler.TaskSetManager: > Finished task 138.0 in stage 5.0 (TID 30634) in 466958 ms on > zjy-hadoop-prc-st1212.bj (executor 1632) (674/5000) > 2019-01-09,08:45:57,372 WARN org.apache.spark.scheduler.TaskSetManager: Lost > task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, executor 456): > TaskCommitDenied (Driver denied task commit) for job: 5, partition: 138, > attemptNumber: 1 > 166483 2019-01-09,08:45:57,373 INFO > org.apache.spark.scheduler.OutputCommitCoordinator: Task was denied > committing, stage: 5, partition: 138, attempt number: 0, attempt > number(counting failed stage): 1 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
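[Editor's illustration, not part of the original ticket.] The race described above can be sketched as a toy model (this is not Spark's implementation): if the coordinator keys its "authorized committer" state only by (stage, partition) and ignores the stage attempt, a straggling task from the failed FetchFailure stage attempt can win commit authorization, after which every task for that partition in the retry attempt is denied, matching the log lines in the ticket:

```python
# Toy coordinator: first task to ask for a (stage, partition) pair becomes
# the authorized committer; everyone else is denied. The modeled bug is that
# the stage *attempt* is not part of the key, so a task from the failed
# attempt can lock out the whole retry attempt.

authorized = {}  # (stage_id, partition) -> authorized task attempt number

def can_commit(stage_id, stage_attempt, partition, task_attempt):
    key = (stage_id, partition)  # bug: stage_attempt ignored
    if key not in authorized:
        authorized[key] = task_attempt
        return True
    return authorized[key] == task_attempt

# Task 138.0 of stage 5.0 (the attempt that hit FetchFailure) asks first
# and is authorized...
assert can_commit(5, 0, 138, 0) is True
# ...so task 138.0 of retry stage 5.1 (attemptNumber 1, as in the log)
# gets TaskCommitDenied, even though it is the retry stage's first attempt.
assert can_commit(5, 1, 138, 1) is False
```

Since TaskCommitDenied does not count towards task failures, the retry stage can keep resubmitting the partition indefinitely, which is the hang the ticket reports.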
[jira] [Assigned] (SPARK-26634) OutputCommitCoordinator may allow task of FetchFailureStage commit again
[ https://issues.apache.org/jira/browse/SPARK-26634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26634: Assignee: (was: Apache Spark) > OutputCommitCoordinator may allow task of FetchFailureStage commit again > > > Key: SPARK-26634 > URL: https://issues.apache.org/jira/browse/SPARK-26634 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.1.0, 2.4.0 >Reporter: liupengcheng >Priority: Major > > In our production Spark cluster, we encountered a case in which a task of a stage retried > due to FetchFailure was denied the right to commit, even though it was the first task > attempt of the retry stage. > After careful investigation, we found that a canCommit call to > OutputCommitCoordinator can allow a task of the FetchFailure stage (with the > same partition number as a new task of the retry stage) to commit, which results in > TaskCommitDenied for every task (of the same partition) in the retry stage. Because > TaskCommitDenied does not count towards failures, this can cause the application to > hang forever. 
> > {code:java} > 2019-01-09,08:39:53,676 INFO org.apache.spark.scheduler.TaskSetManager: > Starting task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, > executor 456, partition 138, PROCESS_LOCAL, 5829 bytes) > 2019-01-09,08:43:37,514 INFO org.apache.spark.scheduler.TaskSetManager: > Finished task 138.0 in stage 5.0 (TID 30634) in 466958 ms on > zjy-hadoop-prc-st1212.bj (executor 1632) (674/5000) > 2019-01-09,08:45:57,372 WARN org.apache.spark.scheduler.TaskSetManager: Lost > task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, executor 456): > TaskCommitDenied (Driver denied task commit) for job: 5, partition: 138, > attemptNumber: 1 > 166483 2019-01-09,08:45:57,373 INFO > org.apache.spark.scheduler.OutputCommitCoordinator: Task was denied > committing, stage: 5, partition: 138, attempt number: 0, attempt > number(counting failed stage): 1 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26619) Prune the unused serializers from `SerializeFromObject`
[ https://issues.apache.org/jira/browse/SPARK-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26619: Assignee: (was: Apache Spark) > Prune the unused serializers from `SerializeFromObject` > --- > > Key: SPARK-26619 > URL: https://issues.apache.org/jira/browse/SPARK-26619 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: DB Tsai >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26619) Prune the unused serializers from `SerializeFromObject`
[ https://issues.apache.org/jira/browse/SPARK-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26619: Assignee: Apache Spark > Prune the unused serializers from `SerializeFromObject` > --- > > Key: SPARK-26619 > URL: https://issues.apache.org/jira/browse/SPARK-26619 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: DB Tsai >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org