[jira] [Updated] (SPARK-26634) OutputCommitCoordinator may allow task of FetchFailureStage commit again

2019-01-16 Thread liupengcheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liupengcheng updated SPARK-26634:
-
Affects Version/s: (was: 2.4.0)

> OutputCommitCoordinator may allow task of FetchFailureStage commit again
> 
>
> Key: SPARK-26634
> URL: https://issues.apache.org/jira/browse/SPARK-26634
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: liupengcheng
>Priority: Major
>
> In our production Spark cluster, we encountered a case where a task of a stage 
> retried due to FetchFailure was denied the right to commit, even though the task 
> was the first attempt of that retry stage.
> After careful investigation, we found that a call to canCommit on the 
> OutputCommitCoordinator had already allowed a task of the FetchFailure stage (with 
> the same partition number as the new task of the retry stage) to commit, which 
> results in TaskCommitDenied for every task of that partition in the retry stage. 
> Because TaskCommitDenied does not count towards task failures, this can cause the 
> application to hang forever.
>  
> {code:java}
> 2019-01-09,08:39:53,676 INFO org.apache.spark.scheduler.TaskSetManager: 
> Starting task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, 
> executor 456, partition 138, PROCESS_LOCAL, 5829 bytes)
> 2019-01-09,08:43:37,514 INFO org.apache.spark.scheduler.TaskSetManager: 
> Finished task 138.0 in stage 5.0 (TID 30634) in 466958 ms on 
> zjy-hadoop-prc-st1212.bj (executor 1632) (674/5000)
> 2019-01-09,08:45:57,372 WARN org.apache.spark.scheduler.TaskSetManager: Lost 
> task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, executor 456): 
> TaskCommitDenied (Driver denied task commit) for job: 5, partition: 138, 
> attemptNumber: 1
> 166483 2019-01-09,08:45:57,373 INFO 
> org.apache.spark.scheduler.OutputCommitCoordinator: Task was denied 
> committing, stage: 5, partition: 138, attempt number: 0, attempt 
> number(counting failed stage): 1
> {code}
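> The log above shows a task of the retried stage attempt (5.1) being denied while a 
> task of the failed attempt (5.0) finishes. The sketch below is a hypothetical, 
> simplified coordinator (not the real OutputCommitCoordinator) that authorizes 
> commits by (stage, partition) only; it illustrates how a task of the failed stage 
> attempt can claim the commit right and starve every task of the retried attempt:
> {code:java}
> // Hypothetical illustration only, not Spark's actual implementation.
> object NaiveCommitCoordinator {
>   // (stage, partition) -> the attempt number that holds the commit right
>   private val authorized = scala.collection.mutable.Map[(Int, Int), Int]()
>
>   def canCommit(stage: Int, partition: Int, attemptNumber: Int): Boolean = synchronized {
>     authorized.get((stage, partition)) match {
>       case Some(winner) => winner == attemptNumber // every other attempt is denied
>       case None =>
>         authorized((stage, partition)) = attemptNumber
>         true
>     }
>   }
> }
>
> // A task of the fetch-failed stage attempt claims the commit right first ...
> NaiveCommitCoordinator.canCommit(stage = 5, partition = 138, attemptNumber = 0) // true
> // ... so the task of the retried stage attempt is denied; since TaskCommitDenied
> // does not count towards task failures, the stage can be retried forever.
> NaiveCommitCoordinator.canCommit(stage = 5, partition = 138, attemptNumber = 1) // false
> {code}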



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421530#comment-16421530
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM:
-

Same issue under Windows 10 and Windows Server 2016 using Java 1.8, Spark 
2.2.1, Hadoop 2.7

My tests support the contention of [~IgorBabalich] ... it seems that 
classloaders instantiated by the code are never closed. On *nix this 
is not a problem since the files are not locked. However, on Windows the files 
are locked.

In addition to the resources mentioned by Igor, this Oracle bug fix from Java 7 
seems relevant:

[https://docs.oracle.com/javase/7/docs/technotes/guides/net/ClassLoader.html]

A new close() method was introduced to address the problem, which shows up on 
Windows due to the different treatment of file locks between the Windows and 
*nix file systems.

I would point out that this is a generic Java issue which breaks the 
cross-platform intention of that platform as a whole.

The Oracle blog also contains a post:

[https://blogs.oracle.com/corejavatechtips/closing-a-urlclassloader]
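For reference, the pattern that the Java 7 fix enables looks roughly like this 
(the jar path and class name below are made up for illustration):

{code:java}
import java.net.{URL, URLClassLoader}

// Sketch only: closing the loader releases the underlying file handles, so on
// Windows the jar (or class output directory) can be deleted while the JVM runs.
val loader = new URLClassLoader(Array(new URL("file:///C:/temp/example.jar")))
try {
  val clazz = loader.loadClass("com.example.SomeClass")
  println(clazz.getName)
} finally {
  loader.close() // added in Java 7 via the Closeable interface
}
{code}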

I have been searching the Apache Spark codebase for classloader instances, 
looking for any ".close()" call. I could not find any, so I believe 
[~IgorBabalich] is correct - the issue has to do with classloaders not being 
closed.

I would fix it myself, but thus far it is not clear to me *when* the classloader 
needs to be closed. That is just ignorance on my part. The question is whether 
the classloader should be closed while it is still available as a variable at the 
point where it was instantiated, or later during the ShutdownHookManager cleanup. 
If the latter, it was not clear to me how to actually get a list of the open 
classloaders.

That is where I am at so far. I am prepared to put some work into this, but I 
need help from those who know the codebase to answer the above question - maybe 
with a well-isolated test.

MY TESTS...

This issue has been around in one form or another for at least four years and 
shows up on many threads.

The standard answer is that it is a "permissions issue" to do with Windows.

That assertion is objectively false.

There is a simple test to prove it.

At a Windows prompt, start spark-shell:

C:\spark\spark-shell   

then get the temp file directory:

scala> sc.getConf.get("spark.repl.class.outputDir")

it will be in the %AppData%\Local\Temp tree, e.g.

C:\Users\kings\AppData\Local\Temp\spark-d67b262e-f6c8-43d7-8790-731308497f02\repl-4cc87dce-8608-4643-b869-b0287ac4571f

where the last path component contains a GUID that changes on each run.

With the spark session still open, go to the Temp directory and try to delete 
the given directory.

You won't be able to... there is a lock on it.

Now issue

scala> :quit

to quit the session.

The stack trace will show that ShutdownHookManager tried to delete the 
directory above but could not.

If you now try and delete it through the file system you can.

This is because the JVM actually cleans up the locks on exit.

So, it is not a permission issue, but a feature of the Windows treatment of 
file locks.

This is the *known issue* that was addressed in the Java bug fix through the 
introduction of a close() method (via the Closeable interface) on URLClassLoader. 
It was fixed there since many enterprise systems run on Windows.

Now... to further test the cause, I used the Windows Subsystem for Linux.

To access this (post-install) you run

C:> bash

from a command prompt.

In order to get this to work, I used the same Spark install, but had to install 
a fresh copy of the JDK on Ubuntu within the Windows bash subsystem. This is 
standard Ubuntu stuff, but note that the path to your Windows C drive is /mnt/c.

If I rerun the same test, the new output of 

scala> sc.getConf.get("spark.repl.class.outputDir")

will be a different folder location under Linux /tmp but with the same setup.

With the Spark session still active, it is possible to delete the Spark folders 
in /tmp *while the session is still running*. This is the difference between 
Windows and Linux: when bash is running Ubuntu on Windows, the file-locking 
behaviour is different, which means you can delete the Spark temp folders while 
a session is running.

If you run through a new session with spark-shell at the Linux prompt and issue 
:quit, it will shut down without any stack trace from ShutdownHookManager.

So, my conclusions are as follows:

1) this is not a permissions issue, as is commonly asserted

2) it is a Windows-specific problem for *known* reasons - namely the difference 
in file locking compared with Linux

3) it was considered a *bug* in the Java ecosystem and was fixed as such in 
Java 1.7 with the .close() method

Further...

People who need to run Spark on Windows infrastructure (like me) can either run 
a Docker container or use the Windows Linux 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432207#comment-16432207
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:56 AM:
-

 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
 loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
scala> val parent1 = loader.getParent()
 parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49
scala> val parent2 = parent1.getParent()
 parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2
scala> val parent3 = parent2.getParent()
 parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b
scala> val parent4 = parent3.getParent()
 parent4: ClassLoader = null
{code}
 

I did experiment with trying to find the open ClassLoaders in the scala session 
(shown above).

Tab completion shows the exposed methods on the loaders, but there is no close 
method:

 
{code:java}
scala> loader.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus
scala> parent1.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus
{code}
 

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close them 
within ShutdownHookManager?
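As a rough illustration of that idea (a sketch of the "walk the tree" approach 
only, not something Spark currently does):

{code:java}
import java.net.URLClassLoader

// Sketch only: follow the context classloader's parent chain and close any
// URLClassLoader found. Other loader types expose no close() method.
def closeUrlClassLoaders(start: ClassLoader): Unit = {
  var cl = start
  while (cl != null) {
    cl match {
      case u: URLClassLoader => u.close() // releases jar/file locks on Windows
      case _                 => // nothing to close
    }
    cl = cl.getParent
  }
}

closeUrlClassLoaders(Thread.currentThread.getContextClassLoader)
{code}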


was (Author: kingsley):
 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
 loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
scala> val parent1 = loader.getParent()
 parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49
scala> val parent2 = parent1.getParent()
 parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2
scala> val parent3 = parent2.getParent()
 parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b
scala> val parent4 = parent3.getParent()
 parent4: ClassLoader = null
{code}
 

I did experiment with trying to find the open ClassLoaders in the scala session 
(shown above).

 shows exposed methods on the loaders, but there is no close 
method:

scala> loader.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus

scala> parent1.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close them 
within ShutDownHookManager ???

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:54 AM:
-

Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

There the poster writes:

 
{code:java}
scala.tools.nsc.interpreter.IMain.TranslatingClassLoader{code}
calls a non-thread-safe method
{code:java}
translateSimpleResource{code}
(this method calls {{SymbolTable.enteringPhase}}), which makes it become 
non-thread-safe. However, a ClassLoader must be thread-safe since the class can 
be loaded in arbitrary thread.

 

In my REPL reflection experiment above the relevant class is:

 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{code}
 

That is the same class that the above bug marks as non-thread-safe.

See this Stack Overflow post for a discussion of thread-safety issues in Scala:

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit, the REPL will barf with a stack trace pointing at this 
particular class (namely TranslatingClassLoader), which is the subject of the 
open Scala bug marked "non thread-safe".

I am going to try to contact the person who raised the bug on the Scala issues 
thread and get some input. It seemed like he could only reproduce it with a 
complicated SQL script.

Here, with Apache Spark, we have a simple and, in my tests, 100% reproducible 
instance of the bug on Windows 10 and Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


was (Author: kingsley):
Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

`scala.tools.nsc.interpreter.IMain.TranslatingClassLoader` calls a 
non-thread-safe method `{{translateSimpleResource`}} (this method calls 
`{{SymbolTable.enteringPhase`}}), which makes it become non-thread-safe.

"However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread."

In my REPL reflection experiment above the relevant class is:

{color:#33}`scala> val loader = 
Thread.currentThread.getContextClassLoader()`{color}
{color:#33}`loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f`{color}

{color:#33}Ergo ... the same class in the above  bug being marked as non 
threadsafe.{color}

{color:#33}See this stack overflow for a discussion of thread safety:{color}

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have 

[jira] [Updated] (SPARK-26641) Seperate capacity Configurations of different event queue.

2019-01-16 Thread jiaan.geng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-26641:
---
Description: 
I maintain a production Spark on YARN cluster, and I often see the error:

`Dropping event from queue eventLog. This likely means one of the listeners is 
too slow and cannot keep up with the rate at which tasks are being started by 
the scheduler.`

The Spark event log is written to a production HDFS cluster that takes a few TB 
of writes every day. The cost of these frequent writes makes 
`EventLoggingListener` the bottleneck.

The other event queues rarely show this problem, so I think the capacity 
configurations of the different event queues should be separated.

  was:
I always maintance a spark on yarn cluster on line.I always found the error:

`Dropping event from queue eventLog. This likely means one of the listeners is 
too slow and cannot keep up with the rate at which tasks are being started by 
the scheduler.`

Spark event log write to a hdfs cluster on line, and this hdfs cluster support 
write a few TB.The performance issuse of write frequently, lead to the 
`EventLoggingListener` exists the bottleneck.

But other event queue appear the problem rarely,so I think seperate the 
configurations of different event queue.


> Seperate capacity Configurations of different event queue.
> --
>
> Key: SPARK-26641
> URL: https://issues.apache.org/jira/browse/SPARK-26641
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.4.0, 3.0.0
>Reporter: jiaan.geng
>Priority: Minor
>
> I maintain a production Spark on YARN cluster, and I often see the error:
> `Dropping event from queue eventLog. This likely means one of the listeners 
> is too slow and cannot keep up with the rate at which tasks are being started 
> by the scheduler.`
> The Spark event log is written to a production HDFS cluster that takes a few 
> TB of writes every day. The cost of these frequent writes makes 
> `EventLoggingListener` the bottleneck.
> The other event queues rarely show this problem, so I think the capacity 
> configurations of the different event queues should be separated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432207#comment-16432207
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:55 AM:
-

 
{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
 loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
scala> val parent1 = loader.getParent()
 parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49
scala> val parent2 = parent1.getParent()
 parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2
scala> val parent3 = parent2.getParent()
 parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b
scala> val parent4 = parent3.getParent()
 parent4: ClassLoader = null
{code}
 

I did experiment with trying to find the open ClassLoaders in the scala session 
(shown above).

Tab completion shows the exposed methods on the loaders, but there is no close 
method:

{code:java}
scala> loader.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus
scala> parent1.
 clearAssertionStatus getResource getResources setClassAssertionStatus 
setPackageAssertionStatus
 getParent getResourceAsStream loadClass setDefaultAssertionStatus
{code}

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close them 
within ShutdownHookManager?


was (Author: kingsley):
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f

scala> val parent1 = loader.getParent()
parent1: ClassLoader = 
scala.reflect.internal.util.ScalaClassLoader$URLClassLoader@66e6af49

scala> val parent2 = parent1.getParent()
parent2: ClassLoader = sun.misc.Launcher$AppClassLoader@5fcfe4b2

scala> val parent3 = parent2.getParent()
parent3: ClassLoader = sun.misc.Launcher$ExtClassLoader@5257226b

scala> val parent4 = parent3.getParent()
parent4: ClassLoader = null

I did experiment with trying to find the open ClassLoaders in the scala session 
(shown above).

 shows exposed methods on the loaders, but there is no close 
method:

scala> loader.
clearAssertionStatus   getResource   getResources   
setClassAssertionStatus setPackageAssertionStatus
getParent  getResourceAsStream   loadClass  
setDefaultAssertionStatus

scala> parent1.
clearAssertionStatus   getResource   getResources   
setClassAssertionStatus setPackageAssertionStatus
getParent  getResourceAsStream   loadClass  
setDefaultAssertionStatus

There is no close method on any of these, so I could not try closing them prior 
to quitting the session.

This was just a simple hack to see if there was any way to use reflection to 
find the open ClassLoaders.

I thought perhaps it might be possible to walk this tree and then close them 
within ShutDownHookManager ???


> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:51 AM:
-

Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

{{scala.tools.nsc.interpreter.IMain.TranslatingClassLoader}} calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

"However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread."

In my REPL reflection experiment above the relevant class is:

{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
{code}

Ergo ... the same class that the above bug marks as non-thread-safe.

See this Stack Overflow post for a discussion of thread safety:

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


was (Author: kingsley):
Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

'scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

In my REPL reflection experiment above the relevant class is:

{color:#33}scala> val loader = 
Thread.currentThread.getContextClassLoader(){color}
 {color:#33} loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{color}

{color:#33}Ergo ... the very class in the above  bug being marked as non 
threadsafe.{color}

{color:#33}See this stack overflow for a discussion of thread safety:{color}

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>  

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:50 AM:
-

Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

{{scala.tools.nsc.interpreter.IMain.TranslatingClassLoader}} calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

In my REPL reflection experiment above the relevant class is:

{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
{code}

Ergo ... the very class that the above bug marks as non-thread-safe.

See this Stack Overflow post for a discussion of thread safety:

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


was (Author: kingsley):
Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

In my REPL reflection experiment above the relevant class is:

{color:#33}scala> val loader = 
Thread.currentThread.getContextClassLoader(){color}
 {color:#33} loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{color}

{color:#33}Ergo ... the very class in the above  bug being marked as non 
threadsafe.{color}

{color:#33}See this stack overflow for a discussion of thread safety:{color}

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]


The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.
 
On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>  

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:49 AM:
-

Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

In my REPL reflection experiment above the relevant class is:

{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
{code}

Ergo ... the very class that the above bug marks as non-thread-safe.

See this Stack Overflow post for a discussion of thread safety:

[https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety]


The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.
 
On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely TranslatingClassLoader) which is identified as an open 
bug in the scala language issues marked "non threadsafe".

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color} It seemed like he could only 
produce it with a complicated SQL script.

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

UPDATE: I cross-posted on 

[https://github.com/scala/bug/issues/10045]

with an explanation of the observations made here and a link back to this issue.


was (Author: kingsley):
Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

 

In my REPL reflection experiment above the relevant class is:

{color:#33}scala> val loader = 
Thread.currentThread.getContextClassLoader(){color}
{color:#33} loader: ClassLoader = 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f{color}

{color:#33}Ergo ... the very class in the above  bug being marked as non 
threadsafe.{color}

 

{color:#33}See this stack overflow for a discussion of thread safety:{color}

{color:#33}{color:#006000}https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety{color}{color}

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

 

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit the REPL will barf with a stacktrace to this particular 
class reference (namely {color:#24292e}TranslatingClassLoader) which is 
identified as an open bug in the scala language issues marked "non 
threadsafe".{color}

 

{color:#24292e}I am gonna try and contact the person who raised the bug on the 
scala issues thread and get some input.{color}

 

It seemed like he could only produce it with a complicated SQL script.

 

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

 

That fits the perps modus operandi in my book... marked non threadsafe and 
causes a sensitive operating system like Windows to barf.

 

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive 

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744524#comment-16744524
 ] 

Kingsley Jones edited comment on SPARK-12216 at 1/17/19 4:47 AM:
-

I am going down the Rabbit Hole of the Scala REPL.
 I think this is the right code branch
 
[https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala]

lines 357 to 352 define TranslatingClassLoader

It appears to be the central mechanism the Scala REPL uses to parse, compile and 
load any class defined in the REPL. There is an open bug in the Scala issue 
tracker marking this class as not thread-safe.

https://github.com/scala/bug/issues/10045

Scala has a different idiom from Java, so maybe the practice of closing 
classloaders is less well established (meaning it is just less clear what the 
right way is to catch them all).


was (Author: kingsley):
I am going down the Rabbit Hole of the Scala REPL.
I think this is the right code branch
https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala

lines 569 to 577
  /** This instance is no longer needed, so release any resources
*  it is using.  The reporter's output gets flushed.
*/
  override def close(): Unit = {
reporter.flush()
if (initializeComplete) {
  global.close()
}
  }

perhaps .close() is not closing everything.

The scala stuff has a different idiom to Java so maybe the closing of 
classloaders is less refined in experience (meaning it is just less clear what 
is the right way to catch 'em all)

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional 

[jira] [Assigned] (SPARK-26641) Seperate capacity Configurations of different event queue.

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26641:


Assignee: Apache Spark

> Seperate capacity Configurations of different event queue.
> --
>
> Key: SPARK-26641
> URL: https://issues.apache.org/jira/browse/SPARK-26641
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.4.0, 3.0.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Minor
>
> I maintain a production Spark on YARN cluster, and I often see the error:
> `Dropping event from queue eventLog. This likely means one of the listeners 
> is too slow and cannot keep up with the rate at which tasks are being started 
> by the scheduler.`
> The Spark event log is written to a production HDFS cluster that takes a few 
> TB of writes. The cost of these frequent writes makes `EventLoggingListener` 
> the bottleneck.
> The other event queues rarely show this problem, so I think the capacity 
> configurations of the different event queues should be separated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26641) Seperate capacity Configurations of different event queue.

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26641:


Assignee: (was: Apache Spark)

> Seperate capacity Configurations of different event queue.
> --
>
> Key: SPARK-26641
> URL: https://issues.apache.org/jira/browse/SPARK-26641
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0, 2.4.0, 3.0.0
>Reporter: jiaan.geng
>Priority: Minor
>
> I maintain a production Spark on YARN cluster, and I often see the error:
> `Dropping event from queue eventLog. This likely means one of the listeners 
> is too slow and cannot keep up with the rate at which tasks are being started 
> by the scheduler.`
> The Spark event log is written to a production HDFS cluster that takes a few 
> TB of writes. The cost of these frequent writes makes `EventLoggingListener` 
> the bottleneck.
> The other event queues rarely show this problem, so I think the capacity 
> configurations of the different event queues should be separated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26641) Seperate capacity Configurations of different event queue.

2019-01-16 Thread jiaan.geng (JIRA)
jiaan.geng created SPARK-26641:
--

 Summary: Seperate capacity Configurations of different event queue.
 Key: SPARK-26641
 URL: https://issues.apache.org/jira/browse/SPARK-26641
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.0, 2.3.0, 3.0.0
Reporter: jiaan.geng


I maintain a production Spark on YARN cluster, and I often see the error:

`Dropping event from queue eventLog. This likely means one of the listeners is 
too slow and cannot keep up with the rate at which tasks are being started by 
the scheduler.`

The Spark event log is written to a production HDFS cluster that takes a few TB 
of writes. The cost of these frequent writes makes `EventLoggingListener` the 
bottleneck.

The other event queues rarely show this problem, so I think the capacity 
configurations of the different event queues should be separated.
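For illustration, a sketch of the idea in SparkConf terms. The shared capacity 
key below exists today; the per-queue key is only a hypothetical shape of what 
this proposal asks for, not an existing configuration:

{code:java}
import org.apache.spark.SparkConf

// Today a single capacity setting is shared by every listener event queue.
val conf = new SparkConf()
  .set("spark.scheduler.listenerbus.eventqueue.capacity", "10000")
  // Hypothetical per-queue override for the slow eventLog queue; the key name
  // below illustrates the proposal and is not an existing Spark 2.4 config.
  .set("spark.scheduler.listenerbus.eventqueue.eventLog.capacity", "100000")
{code}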



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744675#comment-16744675
 ] 

Kingsley Jones commented on SPARK-12216:


Okay, so I think we have a candidate for what is actually causing the problem.

There is an open bug on the scala language site for a class within the scala 
REPL IMain.scala

[https://github.com/scala/bug/issues/10045]

scala.tools.nsc.interpreter.IMain.TranslatingClassLoader calls a 
non-thread-safe method {{translateSimpleResource}} (this method calls 
{{SymbolTable.enteringPhase}}), which makes it become non-thread-safe.

However, a ClassLoader must be thread-safe since the class can be loaded in 
arbitrary thread.

 

In my REPL reflection experiment above the relevant class is:

{code:java}
scala> val loader = Thread.currentThread.getContextClassLoader()
loader: ClassLoader = scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f
{code}

Ergo ... the very class that the above bug marks as non-thread-safe.

 

See this Stack Overflow post for a discussion of thread safety:

https://stackoverflow.com/questions/46258558/scala-objects-and-thread-safety

The scala REPL code has some internal classloaders which are used to compile 
and execute any code entered into the REPL.

 

On Windows 10, if you simply start a spark-shell from the command line, do 
nothing, and then :quit, the REPL will barf with a stack trace pointing at this 
particular class reference (namely TranslatingClassLoader), which is the subject 
of the open Scala bug marked "non thread-safe".

 

I am going to try to contact the person who raised the bug on the Scala issues 
thread and get some input.

 

It seemed like he could only produce it with a complicated SQL script.

 

Here we have with Apache Spark a simple, and on my tests 100% reproducible, 
instance of the bug on Windows 10 and in my tests Windows Server 2016.

 

That fits the perp's modus operandi in my book... marked non-thread-safe, and it 
causes a sensitive operating system like Windows to barf.

 

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 

[jira] [Assigned] (SPARK-26466) Use ConfigEntry for hardcoded configs for submit categories.

2019-01-16 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-26466:
-

Assignee: Jungtaek Lim

> Use ConfigEntry for hardcoded configs for submit categories.
> 
>
> Key: SPARK-26466
> URL: https://issues.apache.org/jira/browse/SPARK-26466
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Assignee: Jungtaek Lim
>Priority: Major
>
> Make the following hardcoded configs use {{ConfigEntry}}.
> {code}
> spark.kryo
> spark.kryoserializer
> spark.jars
> spark.submit
> spark.serializer
> spark.deploy
> spark.worker
> {code}
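> For context, a rough sketch of the shape of a ConfigEntry as written inside 
> org.apache.spark.internal.config (ConfigBuilder is Spark-internal, so this is 
> illustrative only; the doc text here is made up):
> {code:java}
> import org.apache.spark.internal.config.ConfigBuilder
>
> // A hardcoded string such as "spark.kryo.registrationRequired" becomes a typed entry.
> val KRYO_REGISTRATION_REQUIRED = ConfigBuilder("spark.kryo.registrationRequired")
>   .doc("Whether to require registration with Kryo.")
>   .booleanConf
>   .createWithDefault(false)
>
> // Call sites then read conf.get(KRYO_REGISTRATION_REQUIRED) instead of
> // conf.getBoolean("spark.kryo.registrationRequired", false).
> {code}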



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26466) Use ConfigEntry for hardcoded configs for submit categories.

2019-01-16 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26466.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23532
[https://github.com/apache/spark/pull/23532]

> Use ConfigEntry for hardcoded configs for submit categories.
> 
>
> Key: SPARK-26466
> URL: https://issues.apache.org/jira/browse/SPARK-26466
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> Make the following hardcoded configs use {{ConfigEntry}}.
> {code}
> spark.kryo
> spark.kryoserializer
> spark.jars
> spark.submit
> spark.serializer
> spark.deploy
> spark.worker
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26600) Update spark-submit usage message

2019-01-16 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-26600:
-

Assignee: Luca Canali

> Update spark-submit usage message
> -
>
> Key: SPARK-26600
> URL: https://issues.apache.org/jira/browse/SPARK-26600
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
> Fix For: 3.0.0
>
>
> The spark-submit usage message should be brought in sync with recent changes, 
> in particular regarding K8S support. These are the proposed changes to the 
> usage message:
> --executor-cores NUM -> can be used for Spark on YARN and K8S
> --principal PRINCIPAL and --keytab KEYTAB -> can be used for Spark on YARN 
> and K8S
> --total-executor-cores NUM -> can be used for Spark standalone, YARN and K8S
> In addition, this PR proposes to remove certain implementation details from 
> the --keytab argument description, as the implementation details vary between 
> YARN and K8S, for example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26600) Update spark-submit usage message

2019-01-16 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26600.
---
Resolution: Fixed

Issue resolved by pull request 23518
[https://github.com/apache/spark/pull/23518]

> Update spark-submit usage message
> -
>
> Key: SPARK-26600
> URL: https://issues.apache.org/jira/browse/SPARK-26600
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
> Fix For: 3.0.0
>
>
> The spark-submit usage message should be brought in sync with recent changes, 
> in particular regarding K8S support. These are the proposed changes to the 
> usage message:
> --executor-cores NUM -> can be used for Spark on YARN and K8S
> --principal PRINCIPAL and --keytab KEYTAB -> can be used for Spark on YARN 
> and K8S
> --total-executor-cores NUM -> can be used for Spark standalone, YARN and K8S
> In addition, this PR proposes to remove certain implementation details from 
> the --keytab argument description, as the implementation details vary between 
> YARN and K8S, for example.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-16 Thread Ke Jia (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744620#comment-16744620
 ] 

Ke Jia commented on SPARK-26639:


[~hyukjin.kwon] Thanks for your interest.

As discussed in [https://github.com/apache/spark/pull/14548], when I run Q23b 
of TPC-DS, I found that the visualized plan does show the subquery being 
executed once, as below.

!https://user-images.githubusercontent.com/11972570/51232955-813af880-19a3-11e9-9d1c-96bb9de0c130.png!

But the stages for the same subquery may execute more than once, as below:

!https://user-images.githubusercontent.com/11972570/51233118-fb6b7d00-19a3-11e9-9b48-9cebfb74ebd1.png!
So I suspect that subquery reuse does not actually take effect here. Maybe I am 
missing something. Thanks.
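
A minimal way to reproduce the observation, assuming a SparkSession {{spark}} and a TPC-DS-like {{store_sales}} table (this sketch is illustrative and is not the reporter's exact Q23b query):

{code:scala}
// The same scalar subquery appears twice, so subquery reuse should let it run
// once. Compare the SQL tab (which shows a reused subquery node) with the
// Stages page to see how many times its stages actually ran.
val df = spark.sql("""
  SELECT ss_item_sk, ss_quantity
  FROM store_sales
  WHERE ss_quantity > (SELECT avg(ss_quantity) FROM store_sales)
     OR ss_list_price > (SELECT avg(ss_quantity) FROM store_sales)
""")
df.collect()
{code}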

> The reuse subquery function maybe does not work in SPARK SQL
> 
>
> Key: SPARK-26639
> URL: https://issues.apache.org/jira/browse/SPARK-26639
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Ke Jia
>Priority: Major
>
> The subquery reuse feature was implemented in 
> [https://github.com/apache/spark/pull/14548]
> In my test, I found that the visualized plan does show the subquery being 
> executed once, but the stages for the same subquery may execute more than once.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26640) Code cleanup from lgtm.com analysis

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26640:


Assignee: Apache Spark  (was: Sean Owen)

> Code cleanup from lgtm.com analysis
> ---
>
> Key: SPARK-26640
> URL: https://issues.apache.org/jira/browse/SPARK-26640
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, Spark Core, SQL, Structured Streaming, Tests
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Assignee: Apache Spark
>Priority: Minor
>
> https://github.com/apache/spark/pull/23567 brought to my attention that 
> lgtm.com has a recent analysis of Spark that turned up at least one bug: 
> https://issues.apache.org/jira/browse/SPARK-26638
> See 
> https://lgtm.com/projects/g/apache/spark/snapshot/0655f1624ff7b73e5c8937ab9e83453a5a3a4466/files/dev/create-release/releaseutils.py?sort=name=ASC=heatmap#x1434915b6576fb40:1
> Most of these are valid, small suggestions to clean up the code. I'm going to 
> make a PR to implement the obvious ones.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26640) Code cleanup from lgtm.com analysis

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26640:


Assignee: Sean Owen  (was: Apache Spark)

> Code cleanup from lgtm.com analysis
> ---
>
> Key: SPARK-26640
> URL: https://issues.apache.org/jira/browse/SPARK-26640
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, Spark Core, SQL, Structured Streaming, Tests
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Minor
>
> https://github.com/apache/spark/pull/23567 brought to my attention that 
> lgtm.com has a recent analysis of Spark that turned up at least one bug: 
> https://issues.apache.org/jira/browse/SPARK-26638
> See 
> https://lgtm.com/projects/g/apache/spark/snapshot/0655f1624ff7b73e5c8937ab9e83453a5a3a4466/files/dev/create-release/releaseutils.py?sort=name=ASC=heatmap#x1434915b6576fb40:1
> Most of these are valid, small suggestions to clean up the code. I'm going to 
> make a PR to implement the obvious ones.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26636) How to know that a partition is ready when using Structured Streaming

2019-01-16 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-26636.
--
Resolution: Invalid

> How to know that a partition is ready when using Structured Streaming 
> --
>
> Key: SPARK-26636
> URL: https://issues.apache.org/jira/browse/SPARK-26636
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.2
>Reporter: Guo Wei
>Priority: Minor
>
> When using structured streaming, we use the "partitionBy" API to partition the 
> output data, and use the event-time based watermark to handle delayed 
> records, but how do we tell downstream users that a partition is ready? For 
> example, when should we write an empty "hadoop.done" file in a partition directory?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26640) Code cleanup from lgtm.com analysis

2019-01-16 Thread Sean Owen (JIRA)
Sean Owen created SPARK-26640:
-

 Summary: Code cleanup from lgtm.com analysis
 Key: SPARK-26640
 URL: https://issues.apache.org/jira/browse/SPARK-26640
 Project: Spark
  Issue Type: Improvement
  Components: ML, Spark Core, SQL, Structured Streaming, Tests
Affects Versions: 2.4.0
Reporter: Sean Owen
Assignee: Sean Owen


https://github.com/apache/spark/pull/23567 brought to my attention that 
lgtm.com has a recent analysis of Spark that turned up at least one bug: 
https://issues.apache.org/jira/browse/SPARK-26638

See 
https://lgtm.com/projects/g/apache/spark/snapshot/0655f1624ff7b73e5c8937ab9e83453a5a3a4466/files/dev/create-release/releaseutils.py?sort=name=ASC=heatmap#x1434915b6576fb40:1

Most of these are valid, small suggestions to clean up the code. I'm going to 
make a PR to implement the obvious ones.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26636) How to know that a partition is ready when using Structured Streaming

2019-01-16 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744614#comment-16744614
 ] 

Hyukjin Kwon commented on SPARK-26636:
--

Questions should go to the mailing list. Let's discuss it there first before 
filing an issue; I think you will get a better answer there.

> How to know that a partition is ready when using Structured Streaming 
> --
>
> Key: SPARK-26636
> URL: https://issues.apache.org/jira/browse/SPARK-26636
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.2
>Reporter: Guo Wei
>Priority: Minor
>
> When using structured streaming, we use the "partitionBy" API to partition the 
> output data, and use the event-time based watermark to handle delayed 
> records, but how do we tell downstream users that a partition is ready? For 
> example, when should we write an empty "hadoop.done" file in a partition directory?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-16 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744613#comment-16744613
 ] 

Hyukjin Kwon commented on SPARK-26639:
--

What's the input/output, and the expected input/output? Can you describe a 
reproducer, please? What's the issue here?

> The reuse subquery function maybe does not work in SPARK SQL
> 
>
> Key: SPARK-26639
> URL: https://issues.apache.org/jira/browse/SPARK-26639
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Ke Jia
>Priority: Major
>
> The subquery reuse feature was implemented in 
> [https://github.com/apache/spark/pull/14548]
> In my test, I found that the visualized plan does show the subquery being 
> executed once, but the stages for the same subquery may execute more than once.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-16 Thread Ke Jia (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744606#comment-16744606
 ] 

Ke Jia commented on SPARK-26639:


[@davies|https://github.com/davies] [@hvanhovell|https://github.com/hvanhovell] 
[@gatorsmile|https://github.com/gatorsmile] can you help verify this issue? 
Thanks for your help!

> The reuse subquery function maybe does not work in SPARK SQL
> 
>
> Key: SPARK-26639
> URL: https://issues.apache.org/jira/browse/SPARK-26639
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Ke Jia
>Priority: Major
>
> The subquery reuse feature was implemented in 
> [https://github.com/apache/spark/pull/14548]
> In my test, I found that the visualized plan does show the subquery being 
> executed once, but the stages for the same subquery may execute more than once.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-16 Thread Ke Jia (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-26639:
---
Description: 
The subquery reuse feature has done in 
[https://github.com/apache/spark/pull/14548]

In my test, I found the visualized plan do show the subquery is executed once. 
But the stage of same subquery execute maybe not once.

 

  was:
The subquery reuse feature has done in 
[PR#14548|[https://github.com/apache/spark/pull/14548]]

In my test, I found the visualized plan do show the subquery is executed once. 
But the stage of same subquery execute maybe not once.

 


> The reuse subquery function maybe does not work in SPARK SQL
> 
>
> Key: SPARK-26639
> URL: https://issues.apache.org/jira/browse/SPARK-26639
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Ke Jia
>Priority: Major
>
> The subquery reuse feature was implemented in 
> [https://github.com/apache/spark/pull/14548]
> In my test, I found that the visualized plan does show the subquery being 
> executed once, but the stages for the same subquery may execute more than once.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-16 Thread Ke Jia (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-26639:
---
Description: 
The subquery reuse feature has done in 
[PR#14548|[https://github.com/apache/spark/pull/14548]]

In my test, I found the visualized plan do show the subquery is executed once. 
But the stage of same subquery execute maybe not once.

 

  was:
The subquery reuse feature has done in 
[14548|[https://github.com/apache/spark/pull/14548]|https://github.com/apache/spark/pull/14548].]

In my test, I found the visualized plan do show the subquery is executed once. 
But the stage of same subquery execute maybe not once.

 


> The reuse subquery function maybe does not work in SPARK SQL
> 
>
> Key: SPARK-26639
> URL: https://issues.apache.org/jira/browse/SPARK-26639
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Ke Jia
>Priority: Major
>
> The subquery reuse feature was implemented in 
> [PR#14548|[https://github.com/apache/spark/pull/14548]]
> In my test, I found that the visualized plan does show the subquery being 
> executed once, but the stages for the same subquery may execute more than once.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-16 Thread Ke Jia (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ke Jia updated SPARK-26639:
---
Description: 
The subquery reuse feature has done in 
[14548|[https://github.com/apache/spark/pull/14548]|https://github.com/apache/spark/pull/14548].]

In my test, I found the visualized plan do show the subquery is executed once. 
But the stage of same subquery execute maybe not once.

 

> The reuse subquery function maybe does not work in SPARK SQL
> 
>
> Key: SPARK-26639
> URL: https://issues.apache.org/jira/browse/SPARK-26639
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Ke Jia
>Priority: Major
>
> The subquery reuse feature was implemented in 
> [14548|[https://github.com/apache/spark/pull/14548]|https://github.com/apache/spark/pull/14548].]
> In my test, I found that the visualized plan does show the subquery being 
> executed once, but the stages for the same subquery may execute more than once.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26639) The reuse subquery function maybe does not work in SPARK SQL

2019-01-16 Thread Ke Jia (JIRA)
Ke Jia created SPARK-26639:
--

 Summary: The reuse subquery function maybe does not work in SPARK 
SQL
 Key: SPARK-26639
 URL: https://issues.apache.org/jira/browse/SPARK-26639
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Web UI
Affects Versions: 2.4.0, 2.3.2
Reporter: Ke Jia






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26633) Add ExecutorClassLoader.getResourceAsStream

2019-01-16 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-26633.
-
   Resolution: Fixed
 Assignee: Kris Mok
Fix Version/s: 3.0.0
   2.4.1

> Add ExecutorClassLoader.getResourceAsStream
> ---
>
> Key: SPARK-26633
> URL: https://issues.apache.org/jira/browse/SPARK-26633
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 3.0.0
>Reporter: Kris Mok
>Assignee: Kris Mok
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> {{ExecutorClassLoader}} is capable of loading dynamically generated classes 
> from the REPL via either RPC or HDFS, but right now it always delegates 
> resource loading to the parent class loader. That makes the dynamically 
> generated classes unavailable to uses other than class loading.
> Such need may arise, for example, when json4s wants to parse the Class file 
> to extract parameter name information. Internally it'd call the class 
> loader's {{getResourceAsStream}} to obtain the Class file content as an 
> {{InputStream}}.
> This ticket tracks an improvement to the {{ExecutorClassLoader}} to allow 
> fetching dynamically generated Class files from the REPL as resource streams.
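
As an illustration of the kind of lookup this enables (the class name below is made up; real REPL-generated names only look roughly like this), a library such as json4s would resolve the generated class bytes as a resource stream through the class loader:

{code:scala}
import java.io.ByteArrayOutputStream

val className = "$line3.$read$$iw$$iw$Foo"                 // made-up REPL-style name
val resourcePath = className.replace('.', '/') + ".class"
val loader = Thread.currentThread().getContextClassLoader
val in = loader.getResourceAsStream(resourcePath)
if (in != null) {
  // Read the bytecode, e.g. for parameter-name extraction.
  val out = new ByteArrayOutputStream()
  val buf = new Array[Byte](4096)
  var n = in.read(buf)
  while (n != -1) { out.write(buf, 0, n); n = in.read(buf) }
  in.close()
  println(s"read ${out.size()} bytes of bytecode for $className")
} else {
  // Likely the pre-fix behaviour for REPL-generated classes, since resource
  // loading was delegated to the parent loader.
  println(s"no resource found for $className")
}
{code}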



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26638) Pyspark vector classes always return error for unary negation

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26638:


Assignee: Apache Spark  (was: Sean Owen)

> Pyspark vector classes always return error for unary negation
> -
>
> Key: SPARK-26638
> URL: https://issues.apache.org/jira/browse/SPARK-26638
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Sean Owen
>Assignee: Apache Spark
>Priority: Major
>
> It looks like the implementation of {{__neg__}} for Pyspark vector classes is 
> wrong:
> {code}
> def _delegate(op):
> def func(self, other):
> if isinstance(other, DenseVector):
> other = other.array
> return DenseVector(getattr(self.array, op)(other))
> return func
> __neg__ = _delegate("__neg__")
> {code}
> This delegation works for binary operators but not for unary, and indeed, it 
> doesn't work at all:
> {code}
> from pyspark.ml.linalg import DenseVector
> v = DenseVector([1,2,3])
> -v
> ...
> TypeError: func() missing 1 required positional argument: 'other'
> {code}
> This was spotted by static analysis on lgtm.com:
> https://lgtm.com/projects/g/apache/spark/alerts/?mode=tree=python=7850093
> Easy to fix and add a test for, as I presume we want this to be implemented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26638) Pyspark vector classes always return error for unary negation

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26638:


Assignee: Sean Owen  (was: Apache Spark)

> Pyspark vector classes always return error for unary negation
> -
>
> Key: SPARK-26638
> URL: https://issues.apache.org/jira/browse/SPARK-26638
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
>
> It looks like the implementation of {{__neg__}} for Pyspark vector classes is 
> wrong:
> {code}
> def _delegate(op):
> def func(self, other):
> if isinstance(other, DenseVector):
> other = other.array
> return DenseVector(getattr(self.array, op)(other))
> return func
> __neg__ = _delegate("__neg__")
> {code}
> This delegation works for binary operators but not for unary, and indeed, it 
> doesn't work at all:
> {code}
> from pyspark.ml.linalg import DenseVector
> v = DenseVector([1,2,3])
> -v
> ...
> TypeError: func() missing 1 required positional argument: 'other'
> {code}
> This was spotted by static analysis on lgtm.com:
> https://lgtm.com/projects/g/apache/spark/alerts/?mode=tree=python=7850093
> Easy to fix and add a test for, as I presume we want this to be implemented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26638) Pyspark vector classes always return error for unary negation

2019-01-16 Thread Sean Owen (JIRA)
Sean Owen created SPARK-26638:
-

 Summary: Pyspark vector classes always return error for unary 
negation
 Key: SPARK-26638
 URL: https://issues.apache.org/jira/browse/SPARK-26638
 Project: Spark
  Issue Type: Bug
  Components: ML, PySpark
Affects Versions: 2.4.0, 2.3.2
Reporter: Sean Owen
Assignee: Sean Owen


It looks like the implementation of {{__neg__}} for Pyspark vector classes is 
wrong:

{code}
def _delegate(op):
def func(self, other):
if isinstance(other, DenseVector):
other = other.array
return DenseVector(getattr(self.array, op)(other))
return func

__neg__ = _delegate("__neg__")
{code}

This delegation works for binary operators but not for unary, and indeed, it 
doesn't work at all:

{code}
from pyspark.ml.linalg import DenseVector
v = DenseVector([1,2,3])
-v
...
TypeError: func() missing 1 required positional argument: 'other'
{code}

This was spotted by static analysis on lgtm.com:
https://lgtm.com/projects/g/apache/spark/alerts/?mode=tree=python=7850093

Easy to fix and add a test for, as I presume we want this to be implemented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744524#comment-16744524
 ] 

Kingsley Jones commented on SPARK-12216:


I am going down the Rabbit Hole of the Scala REPL.
I think this is the right code branch
https://github.com/scala/scala/blob/0c335456f295459efa22d91a7b7d49bb9b5f3c15/src/repl/scala/tools/nsc/interpreter/IMain.scala

lines 569 to 577
  /** This instance is no longer needed, so release any resources
*  it is using.  The reporter's output gets flushed.
*/
  override def close(): Unit = {
reporter.flush()
if (initializeComplete) {
  global.close()
}
  }

Perhaps .close() is not closing everything.

Scala has a different idiom from Java, so maybe the practice of closing 
classloaders is less mature there (meaning it is just less clear what the right 
way is to catch them all).
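
For reference, a minimal sketch of the Java 7+ {{close()}} mechanism referred to above (the jar path is made up): until a {{URLClassLoader}} is closed it may keep its jars open, and on Windows an open handle keeps the file locked so the temp-directory cleanup fails.

{code:scala}
import java.net.{URL, URLClassLoader}
import java.nio.file.Paths

// Illustrative path only.
val jar = Paths.get("C:/Users/example/AppData/Local/Temp/spark-1234/repl-classes.jar")
val loader = new URLClassLoader(Array[URL](jar.toUri.toURL), getClass.getClassLoader)
try {
  // load classes/resources through `loader` here ...
} finally {
  // Releases the loader's open jar handles; without this, Windows keeps the
  // file locked and the shutdown hook's recursive delete fails.
  loader.close()
}
{code}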

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
> org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26566) Upgrade apache/arrow to 0.12.0

2019-01-16 Thread Bryan Cutler (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated SPARK-26566:
-
Description: 
_This is just a placeholder for now to collect what needs to be fixed when we 
upgrade next time_

Version 0.12.0 includes the following:
 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
 * conversion to date object no longer needed, ARROW-3910

 

  was:
_This is just a placeholder for now to collect what needs to be fixed when we 
upgrade next time_

Version 0.12.0 includes the following:
 * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098

 


> Upgrade apache/arrow to 0.12.0
> --
>
> Key: SPARK-26566
> URL: https://issues.apache.org/jira/browse/SPARK-26566
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Bryan Cutler
>Priority: Major
>
> _This is just a placeholder for now to collect what needs to be fixed when we 
> upgrade next time_
> Version 0.12.0 includes the following:
>  * pyarrow open_stream deprecated, use ipc.open_stream, ARROW-4098
>  * conversion to date object no longer needed, ARROW-3910
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26625) spark.redaction.regex should include oauthToken

2019-01-16 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-26625.

   Resolution: Fixed
Fix Version/s: 3.0.0

> spark.redaction.regex should include oauthToken
> ---
>
> Key: SPARK-26625
> URL: https://issues.apache.org/jira/browse/SPARK-26625
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.4.0
>Reporter: Vinoo Ganesh
>Priority: Critical
> Fix For: 3.0.0
>
>
> The regex (spark.redaction.regex) that is used to decide which config 
> properties or environment settings are sensitive should also include 
> oauthToken to match  spark.kubernetes.authenticate.submission.oauthToken
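
Until the default pattern is extended, one possible workaround is to widen it in your own configuration. The base pattern below is an assumption about the shipped default, so check the documented value of spark.redaction.regex for your Spark version before relying on it:

{code:scala}
import org.apache.spark.SparkConf

// Assumed default is "(?i)secret|password"; adding "token" also catches
// spark.kubernetes.authenticate.submission.oauthToken in logs and the UI.
val conf = new SparkConf()
  .set("spark.redaction.regex", "(?i)secret|password|token")
{code}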



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26619) Prune the unused serializers from `SerializeFromObject`

2019-01-16 Thread DB Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai reassigned SPARK-26619:
---

Assignee: Liang-Chi Hsieh

> Prune the unused serializers from `SerializeFromObject`
> ---
>
> Key: SPARK-26619
> URL: https://issues.apache.org/jira/browse/SPARK-26619
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: DB Tsai
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26619) Prune the unused serializers from `SerializeFromObject`

2019-01-16 Thread DB Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai resolved SPARK-26619.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23562
[https://github.com/apache/spark/pull/23562]

> Prune the unused serializers from `SerializeFromObject`
> ---
>
> Key: SPARK-26619
> URL: https://issues.apache.org/jira/browse/SPARK-26619
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: DB Tsai
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26550) New datasource for benchmarking

2019-01-16 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26550.
---
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 3.0.0

> New datasource for benchmarking
> ---
>
> Key: SPARK-26550
> URL: https://issues.apache.org/jira/browse/SPARK-26550
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> The purpose of the new datasource is materialisation of a dataset without the 
> additional overhead associated with actions and with converting row values to 
> other types. This can be used in benchmarking, as well as in cases where a 
> dataset needs to be materialised for its side effects, as in caching.
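
A usage sketch of what such a sink looks like from the caller's side; the format name "noop" is an assumption about how the datasource ends up being registered, so check the merged PR for the actual name:

{code:scala}
// Materialise a dataset with no conversion or output cost, e.g. to time just
// the query execution itself.
spark.range(0, 100000000L)
  .selectExpr("id", "id % 10 AS bucket")
  .write
  .format("noop")        // assumed short name for the benchmarking sink
  .mode("overwrite")
  .save()
{code}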



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream

2019-01-16 Thread Shixiong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-26629:
-
Fix Version/s: (was: 2.3.4)

> Error with multiple file stream in a query + restart on a batch that has no 
> data for one file stream
> 
>
> Key: SPARK-26629
> URL: https://issues.apache.org/jira/browse/SPARK-26629
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> When a streaming query has multiple file streams, and there is a batch where 
> one of the file streams does not have data in that batch, then if the query has 
> to restart from that batch, it will throw the following error.
> {code}
> java.lang.IllegalStateException: batch 1 doesn't exist
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205)
> {code}
> **Reason**
> The existing {{HDFSMetadata.verifyBatchIds}} throws an error whenever the batchIds 
> list is empty. In the context of {{FileStreamSource.getBatch}} (where verify 
> is called) and FileStreamSourceLog (a subclass of HDFSMetadata), this is 
> usually okay because, in a streaming query with one file stream, the batchIds 
> can never be empty:
> A batch is planned only when the FileStreamSourceLog has seen a new offset 
> (that is, there are new data files).
> So FileStreamSource.getBatch will be called on X to Y where Y will always be > X. 
> This internally calls {{HDFSMetadata.verifyBatchIds(X+1, Y)}} with the ids 
> X+1 to Y.
> For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds 
> = Seq(5), start = 5, end = 5)}}. However, the invariant of Y > X does not hold 
> when there are two file stream sources, as a batch may be planned even when 
> only one of the file streams has data. So one of the file streams may not have 
> data, which can lead to {{FileStreamSource.getBatch(X, X) -> verify(batchIds = 
> Seq.empty, start = X+1, end = X) -> failure}}.
> Note that FileStreamSource.getBatch(X, X) gets called only when restarting a 
> query in a batch 
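
A simplified sketch of the invariant described in the Reason section above; this models the failure mode only and is not the actual {{HDFSMetadataLog}} code or its exact signature:

{code:scala}
// The check demands a non-empty, contiguous range of batch ids.
def verifyBatchIds(batchIds: Seq[Long], start: Long, end: Long): Unit = {
  require(batchIds.nonEmpty, s"batch $start doesn't exist")            // trips on the empty case
  require(batchIds.sorted == (start to end).toSeq, s"unexpected ids: $batchIds")
}

verifyBatchIds(Seq(5L), 5L, 5L)       // one file stream with new data: passes
// With two file streams, the one without new data in the restarted batch gets
// getBatch(X, X), so the expected range (X+1 to X) is empty and the check throws
// even though "no new batches for this source" is a legitimate state:
// verifyBatchIds(Seq.empty, 6L, 5L)  // fails with "batch 6 doesn't exist"
{code}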

[jira] [Updated] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream

2019-01-16 Thread Shixiong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-26629:
-
Fix Version/s: 3.0.0
   2.4.1
   2.3.4

> Error with multiple file stream in a query + restart on a batch that has no 
> data for one file stream
> 
>
> Key: SPARK-26629
> URL: https://issues.apache.org/jira/browse/SPARK-26629
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0, 2.4.1
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Major
> Fix For: 2.4.1, 3.0.0, 2.3.4
>
>
> When a streaming query has multiple file streams, and there is a batch where 
> one of the file streams does not have data in that batch, then if the query has 
> to restart from that batch, it will throw the following error.
> {code}
> java.lang.IllegalStateException: batch 1 doesn't exist
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205)
> {code}
> **Reason**
> The existing {{HDFSMetadata.verifyBatchIds}} throws an error whenever the batchIds 
> list is empty. In the context of {{FileStreamSource.getBatch}} (where verify 
> is called) and FileStreamSourceLog (a subclass of HDFSMetadata), this is 
> usually okay because, in a streaming query with one file stream, the batchIds 
> can never be empty:
> A batch is planned only when the FileStreamSourceLog has seen a new offset 
> (that is, there are new data files).
> So FileStreamSource.getBatch will be called on X to Y where Y will always be > X. 
> This internally calls {{HDFSMetadata.verifyBatchIds(X+1, Y)}} with the ids 
> X+1 to Y.
> For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds 
> = Seq(5), start = 5, end = 5)}}. However, the invariant of Y > X does not hold 
> when there are two file stream sources, as a batch may be planned even when 
> only one of the file streams has data. So one of the file streams may not have 
> data, which can lead to {{FileStreamSource.getBatch(X, X) -> verify(batchIds = 
> Seq.empty, start = X+1, end = X) -> failure}}.
> Note that FileStreamSource.getBatch(X, X) gets called only 

[jira] [Resolved] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream

2019-01-16 Thread Shixiong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-26629.
--
Resolution: Fixed

> Error with multiple file stream in a query + restart on a batch that has no 
> data for one file stream
> 
>
> Key: SPARK-26629
> URL: https://issues.apache.org/jira/browse/SPARK-26629
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Major
> Fix For: 2.4.1, 3.0.0, 2.3.4
>
>
> When a streaming query has multiple file streams, and there is a batch where 
> one of the file streams does not have data in that batch, then if the query has 
> to restart from that batch, it will throw the following error.
> {code}
> java.lang.IllegalStateException: batch 1 doesn't exist
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205)
> {code}
> **Reason**
> The existing {{HDFSMetadata.verifyBatchIds}} throws an error whenever the batchIds 
> list is empty. In the context of {{FileStreamSource.getBatch}} (where verify 
> is called) and FileStreamSourceLog (a subclass of HDFSMetadata), this is 
> usually okay because, in a streaming query with one file stream, the batchIds 
> can never be empty:
> A batch is planned only when the FileStreamSourceLog has seen a new offset 
> (that is, there are new data files).
> So FileStreamSource.getBatch will be called on X to Y where Y will always be > X. 
> This internally calls {{HDFSMetadata.verifyBatchIds(X+1, Y)}} with the ids 
> X+1 to Y.
> For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds 
> = Seq(5), start = 5, end = 5)}}. However, the invariant of Y > X does not hold 
> when there are two file stream sources, as a batch may be planned even when 
> only one of the file streams has data. So one of the file streams may not have 
> data, which can lead to {{FileStreamSource.getBatch(X, X) -> verify(batchIds = 
> Seq.empty, start = X+1, end = X) -> failure}}.
> Note that FileStreamSource.getBatch(X, X) gets called only when restarting a 
> query in a batch where 

[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment

2019-01-16 Thread Bryan Cutler (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744288#comment-16744288
 ] 

Bryan Cutler commented on SPARK-26591:
--

[~elch10] please go ahead and make a Jira for Arrow regarding the pyarrow 
import error. Also, include all relevant details about your system.

> Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain 
> environment
> 
>
> Key: SPARK-26591
> URL: https://issues.apache.org/jira/browse/SPARK-26591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
> Environment: Python 3.6.7
> Pyspark 2.4.0
> OS:
> {noformat}
> Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux{noformat}
> CPU:
>  
> {code:java}
> Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB
> clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz
> {code}
>  
>  
>Reporter: Elchin
>Priority: Major
> Attachments: core
>
>
> When I try to use pandas_udf from examples in 
> [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> from pyspark.sql.types import IntegerType, StringType
> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # here it crashes
> {code}
> I get the error:
> {code:java}
> [1]    17969 illegal hardware instruction (core dumped)  python3{code}
> The environment is:
> Python 3.6.7
>  PySpark 2.4.0
>  PyArrow: 0.11.1
>  Pandas: 0.23.4
>  NumPy: 1.15.4
>  OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26629) Error with multiple file stream in a query + restart on a batch that has no data for one file stream

2019-01-16 Thread Shixiong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-26629:
-
Affects Version/s: 2.3.3

> Error with multiple file stream in a query + restart on a batch that has no 
> data for one file stream
> 
>
> Key: SPARK-26629
> URL: https://issues.apache.org/jira/browse/SPARK-26629
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.4.0, 2.4.1
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>Priority: Major
> Fix For: 2.4.1, 3.0.0, 2.3.4
>
>
> When a streaming query has multiple file streams, and there is a batch where 
> one of the file streams does not have data in that batch, then if the query has 
> to restart from that batch, it will throw the following error.
> {code}
> java.lang.IllegalStateException: batch 1 doesn't exist
>   at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog$.verifyBatchIds(HDFSMetadataLog.scala:300)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSourceLog.get(FileStreamSourceLog.scala:120)
>   at 
> org.apache.spark.sql.execution.streaming.FileStreamSource.getBatch(FileStreamSource.scala:181)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:294)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets$2.apply(MicroBatchExecution.scala:291)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at 
> org.apache.spark.sql.execution.streaming.StreamProgress.foreach(StreamProgress.scala:25)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$populateStartOffsets(MicroBatchExecution.scala:291)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:178)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:251)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:61)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:175)
>   at 
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>   at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:169)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:205)
> {code}
> **Reason**
> The existing {{HDFSMetadata.verifyBatchIds}} throws an error whenever the batchIds 
> list is empty. In the context of {{FileStreamSource.getBatch}} (where verify 
> is called) and FileStreamSourceLog (a subclass of HDFSMetadata), this is 
> usually okay because, in a streaming query with one file stream, the batchIds 
> can never be empty:
> A batch is planned only when the FileStreamSourceLog has seen a new offset 
> (that is, there are new data files).
> So FileStreamSource.getBatch will be called on X to Y where Y will always be > X. 
> This internally calls {{HDFSMetadata.verifyBatchIds(X+1, Y)}} with the ids 
> X+1 to Y.
> For example, {{FileStreamSource.getBatch(4, 5)}} will call {{verify(batchIds 
> = Seq(5), start = 5, end = 5)}}. However, the invariant of Y > X does not hold 
> when there are two file stream sources, as a batch may be planned even when 
> only one of the file streams has data. So one of the file streams may not have 
> data, which can lead to {{FileStreamSource.getBatch(X, X) -> verify(batchIds = 
> Seq.empty, start = X+1, end = X) -> failure}}.
> Note that FileStreamSource.getBatch(X, X) gets called only when restarting a 
> query in a batch 

[jira] [Assigned] (SPARK-25713) Implement copy() for ColumnarArray

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25713:


Assignee: Apache Spark

> Implement copy() for ColumnarArray
> --
>
> Key: SPARK-25713
> URL: https://issues.apache.org/jira/browse/SPARK-25713
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Liwen Sun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25713) Implement copy() for ColumnarArray

2019-01-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744259#comment-16744259
 ] 

Apache Spark commented on SPARK-25713:
--

User 'ayudovin' has created a pull request for this issue:
https://github.com/apache/spark/pull/23569

> Implement copy() for ColumnarArray
> --
>
> Key: SPARK-25713
> URL: https://issues.apache.org/jira/browse/SPARK-25713
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Liwen Sun
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25713) Implement copy() for ColumnarArray

2019-01-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744257#comment-16744257
 ] 

Apache Spark commented on SPARK-25713:
--

User 'ayudovin' has created a pull request for this issue:
https://github.com/apache/spark/pull/23569

> Implement copy() for ColumnarArray
> --
>
> Key: SPARK-25713
> URL: https://issues.apache.org/jira/browse/SPARK-25713
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Liwen Sun
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25713) Implement copy() for ColumnarArray

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25713:


Assignee: (was: Apache Spark)

> Implement copy() for ColumnarArray
> --
>
> Key: SPARK-25713
> URL: https://issues.apache.org/jira/browse/SPARK-25713
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Liwen Sun
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25992) Accumulators giving KeyError in pyspark

2019-01-16 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25992.
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   2.4.1

Fixed at https://github.com/apache/spark/pull/23564

I documented the limitation.

> Accumulators giving KeyError in pyspark
> ---
>
> Key: SPARK-25992
> URL: https://issues.apache.org/jira/browse/SPARK-25992
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Abdeali Kothari
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> I am using accumulators, and when I run my code I sometimes get some warning 
> messages. When I checked, there was nothing accumulated - I am not sure whether I 
> lost info from the accumulator, or whether it worked and I can ignore this error?
> The message:
> {noformat}
> Exception happened during processing of request from
> ('127.0.0.1', 62099)
> Traceback (most recent call last):
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 317, in 
> _handle_request_noblock
> self.process_request(request, client_address)
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 348, in 
> process_request
> self.finish_request(request, client_address)
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 361, in 
> finish_request
> self.RequestHandlerClass(request, client_address, self)
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 696, in 
> __init__
> self.handle()
> File "/usr/local/hadoop/spark2.3.1/python/pyspark/accumulators.py", line 238, 
> in handle
> _accumulatorRegistry[aid] += update
> KeyError: 0
> 
> 2018-11-09 19:09:08 ERROR DAGScheduler:91 - Failed to update accumulators for 
> task 0
> org.apache.spark.SparkException: EOF reached before Python server acknowledged
>   at 
> org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:634)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1131)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1123)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1123)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1206)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {noformat}
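
For context, the failing pattern is ordinary PySpark accumulator usage. A minimal sketch (assuming a local SparkContext; the names and counts are illustrative, not taken from the reporter's job):

{code:java}
# Minimal sketch of the accumulator pattern described above (illustrative only).
from pyspark import SparkContext

sc = SparkContext("local[2]", "accumulator-sketch")
counted = sc.accumulator(0)  # driver-side accumulator, updated from executors

def count_row(_):
    # Executor-side updates are sent back to the Python accumulator server on
    # the driver; the KeyError in the traceback above is raised by that server.
    counted.add(1)

sc.parallelize(range(1000)).foreach(count_row)
print(counted.value)  # expected: 1000
sc.stop()
{code}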



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26615) Fixing transport server/client resource leaks in the core unittests

2019-01-16 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26615.
---
   Resolution: Fixed
Fix Version/s: 2.4.1
   3.0.0

Issue resolved by pull request 23540
[https://github.com/apache/spark/pull/23540]

> Fixing transport server/client resource leaks in the core unittests 
> 
>
> Key: SPARK-26615
> URL: https://issues.apache.org/jira/browse/SPARK-26615
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: 3.0.0, 2.4.1
>
>
> Testing of the SPARK-24938 PR ([https://github.com/apache/spark/pull/22114])
> always fails with OOM. Analysing this problem led to identifying some
> resource leaks where TransportClient/TransportServer instances are not closed
> properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26615) Fixing transport server/client resource leaks in the core unittests

2019-01-16 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-26615:
-

Assignee: Attila Zsolt Piros

> Fixing transport server/client resource leaks in the core unittests 
> 
>
> Key: SPARK-26615
> URL: https://issues.apache.org/jira/browse/SPARK-26615
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Attila Zsolt Piros
>Assignee: Attila Zsolt Piros
>Priority: Major
>
> Testing of the SPARK-24938 PR ([https://github.com/apache/spark/pull/22114])
> always fails with OOM. Analysing this problem led to identifying some
> resource leaks where TransportClient/TransportServer instances are not closed
> properly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26429) Add jdbc sink support for Structured Streaming

2019-01-16 Thread Wang Yanlin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wang Yanlin resolved SPARK-26429.
-
Resolution: Duplicate

> Add jdbc sink support for Structured Streaming
> --
>
> Key: SPARK-26429
> URL: https://issues.apache.org/jira/browse/SPARK-26429
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Wang Yanlin
>Priority: Major
>
> Currently, Spark SQL supports reading from and writing to JDBC in batch mode,
> but not in Structured Streaming. In practice, even though we can write to JDBC
> using the foreach sink, providing an easier way to write to JDBC would be
> helpful.
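
For reference, a hedged sketch of the foreach-style workaround mentioned in the description, written with foreachBatch (Spark 2.4+) so the existing batch JDBC writer is reused per micro-batch; the source, JDBC URL, table name and credentials below are placeholders:

{code:java}
# Hedged sketch: reuse the batch JDBC writer from a streaming query via
# foreachBatch. The rate source, URL, table and credentials are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("jdbc-sink-sketch").getOrCreate()

stream_df = (spark.readStream
             .format("rate")                 # stand-in source
             .option("rowsPerSecond", 10)
             .load())

def write_batch_to_jdbc(batch_df, batch_id):
    # Each micro-batch is appended with the existing batch JDBC writer.
    (batch_df.write
     .mode("append")
     .jdbc("jdbc:postgresql://db-host:5432/mydb",      # placeholder URL
           "streaming_sink",                           # placeholder table
           properties={"user": "spark", "password": "secret"}))

query = (stream_df.writeStream
         .foreachBatch(write_batch_to_jdbc)
         .option("checkpointLocation", "/tmp/jdbc-sink-checkpoint")
         .start())
{code}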



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26636) How to know that a partition is ready when using Structured Streaming

2019-01-16 Thread Guo Wei (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Wei updated SPARK-26636:

Description: When using Structured Streaming, we use the "partitionBy" API to 
partition the output data and the event-time watermark to handle delayed 
records, but how can we tell downstream users that a partition is ready? For 
example, when should we write an empty "hadoop.done" file in a partition 
directory?  
(was: When using structured streaming, we use "partitionBy" api  to partition 
the output data, and use the watermark based on event-time to handle delay 
records, but how to tell downstream users  that a partition data is ready? For 
example, when to write an empty "hadoop.done" file in a paritition directory?)

> How to know that a partition is ready when using Structured Streaming 
> --
>
> Key: SPARK-26636
> URL: https://issues.apache.org/jira/browse/SPARK-26636
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.2
>Reporter: Guo Wei
>Priority: Minor
>
> When using Structured Streaming, we use the "partitionBy" API to partition the
> output data and the event-time watermark to handle delayed records, but how can
> we tell downstream users that a partition is ready? For example, when should we
> write an empty "hadoop.done" file in a partition directory?
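
Not an official feature, but one hedged way to approximate this today: the event-time watermark is reported in the streaming query's progress, so a driver-side helper can drop the marker once the watermark has moved past a partition's date. A minimal sketch, assuming output partitioned as date=YYYY-MM-DD under output_dir and a query with a watermark defined:

{code:java}
# Hedged sketch: write an empty "hadoop.done" marker for each date partition
# that the current event-time watermark has already passed. Paths, the
# partition layout and the polling loop are illustrative assumptions.
import os
import time

def mark_ready_partitions(query, output_dir, marked):
    progress = query.lastProgress
    if not progress or not progress.get("eventTime"):
        return
    watermark = progress["eventTime"].get("watermark")   # e.g. "2019-01-16T00:00:00.000Z"
    if not watermark:
        return
    watermark_date = watermark[:10]                      # "YYYY-MM-DD"
    for part in os.listdir(output_dir):                  # e.g. "date=2019-01-15"
        date = part.split("=", 1)[-1]
        if date < watermark_date and part not in marked:
            open(os.path.join(output_dir, part, "hadoop.done"), "w").close()
            marked.add(part)

# Example driver-side loop (illustrative):
#   marked = set()
#   while query.isActive:
#       mark_ready_partitions(query, "/data/stream-output", marked)
#       time.sleep(60)
{code}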



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26637) Makes GetArrayItem nullability more precise

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26637:


Assignee: (was: Apache Spark)

> Makes GetArrayItem nullability more precise
> ---
>
> Key: SPARK-26637
> URL: https://issues.apache.org/jira/browse/SPARK-26637
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takeshi Yamamuro
>Priority: Minor
>
> In master, GetArrayItem's nullable is always true;
> https://github.com/apache/spark/blob/cf133e611020ed178f90358464a1b88cdd9b7889/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L236
> But if the input array size is constant and the ordinal is foldable, we could
> make GetArrayItem's nullability more precise.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26636) How to know that a partition is ready when using Structured Streaming

2019-01-16 Thread Guo Wei (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guo Wei updated SPARK-26636:

Summary: How to know that a partition is ready when using Structured 
Streaming   (was: How to know a partition is ready when using Structured 
Streaming )

> How to know that a partition is ready when using Structured Streaming 
> --
>
> Key: SPARK-26636
> URL: https://issues.apache.org/jira/browse/SPARK-26636
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.2
>Reporter: Guo Wei
>Priority: Minor
>
> When using Structured Streaming, we use the "partitionBy" API to partition the
> output data and the event-time watermark to handle delayed records, but how can
> we tell downstream users that a partition's data is ready? For example, when
> should we write an empty "hadoop.done" file in a partition directory?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26637) Makes GetArrayItem nullability more precise

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26637:


Assignee: Apache Spark

> Makes GetArrayItem nullability more precise
> ---
>
> Key: SPARK-26637
> URL: https://issues.apache.org/jira/browse/SPARK-26637
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Minor
>
> In master, GetArrayItem's nullable is always true;
> https://github.com/apache/spark/blob/cf133e611020ed178f90358464a1b88cdd9b7889/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L236
> But if the input array size is constant and the ordinal is foldable, we could
> make GetArrayItem's nullability more precise.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26637) Makes GetArrayItem nullability more precise

2019-01-16 Thread Takeshi Yamamuro (JIRA)
Takeshi Yamamuro created SPARK-26637:


 Summary: Makes GetArrayItem nullability more precise
 Key: SPARK-26637
 URL: https://issues.apache.org/jira/browse/SPARK-26637
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Takeshi Yamamuro


In master, GetArrayItem's nullable is always true;
https://github.com/apache/spark/blob/cf133e611020ed178f90358464a1b88cdd9b7889/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala#L236

But if the input array size is constant and the ordinal is foldable, we could 
make GetArrayItem's nullability more precise.
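
To see the current behaviour from the user side, a small hedged PySpark illustration (the column indexing below resolves to GetArrayItem; the session settings are illustrative):

{code:java}
# Even for a literal array of non-null elements and an in-range constant
# ordinal, the extracted field is still reported as nullable today.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.master("local[1]").appName("getarrayitem").getOrCreate()

df = spark.range(1).select(F.array(F.lit(1), F.lit(2)).alias("arr"))
elem = df.select(F.col("arr")[0].alias("elem"))   # resolves to GetArrayItem

print(elem.schema.fields[0].nullable)   # True today; could be False in this case
spark.stop()
{code}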



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26636) How to know a partition is ready when using Structured Streaming

2019-01-16 Thread Guo Wei (JIRA)
Guo Wei created SPARK-26636:
---

 Summary: How to know a partition is ready when using Structured 
Streaming 
 Key: SPARK-26636
 URL: https://issues.apache.org/jira/browse/SPARK-26636
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 2.3.2
Reporter: Guo Wei


When using Structured Streaming, we use the "partitionBy" API to partition the 
output data and the event-time watermark to handle delayed records, but how can 
we tell downstream users that a partition's data is ready? For example, when 
should we write an empty "hadoop.done" file in a partition directory?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2019-01-16 Thread Kingsley Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743838#comment-16743838
 ] 

Kingsley Jones commented on SPARK-12216:


I had another go at investigating this as it continues to greatly frustrate my 
deployments.

Firstly, I had no difficulty following the build instructions for Spark from 
source on Windows. The only "trick" needed was to use the Git Bash shell on 
Windows to manage the launch of the build process. There is nothing complicated 
about that, so I would encourage others to try.

Secondly, the build on Windows 10 worked first time, which was also my first 
time using Maven. So there is no real reason why development work to 
investigate this issue cannot be done on the Windows platform (that is what I 
am doing now).

Thirdly, I re-investigated my comment from April 2018 above... 

The class loader at the child level is:

scala.tools.nsc.interpreter.IMain$TranslatingClassLoader@3a1a20f

On searching the Spark source code, and online, I found that this class loader 
is actually from the Scala REPL.

It is not actually part of the Spark source tree.

When looking at the cruft showing up in the Windows temp directory, the classes 
that pop up seem to be associated with the REPL.

This makes sense, since the REPL will barf with the errors indicated above if 
you do nothing more than launch a spark-shell and then close it straight away.

The conclusions I reached:
1) it is certainly possible to hack on this on a Windows 10 machine (I am 
trying incremental builds via the SBT toolchain)

2) the bug does seem to be related to classloader clean-up, but the fault (at 
least for the REPL) may NOT be in the Spark source code but in the Scala REPL 
(or perhaps in an interaction between how Spark loads the relevant REPL code?)

3) watching files go in and out of the Windows temp area is reproducible with 
high reliability (as commenters above maintain)


Remaining questions on my mind...


Since this issue pops up for pretty much anybody who simply tries Spark on 
Windows, while the build from source showed NO problems, I think the runtime 
issue is probably down to how classes are loaded and the difference in 
file-locking behaviour between Windows and Linux.


The question is how to isolate which part of the codebase is actually producing 
the classes that are not cleanable: is it the main Spark source code, or is it 
the Scala REPL? Since I was immediately discouraged from using Spark in any real 
evaluation (100% reliable barf-outs on the very first example are discouraging), 
I never actually ran cluster test jobs to see whether the problem exists outside 
the REPL. It is quite possible the problem is just the REPL.


I would suggest it would help to get at least some dialogue on this.


It does not seem very desirable to shut out 50% of global developers on a 
project which is manifestly *easy* to build on Windows 10 (using Maven + Git) 
but where the first encounter of any experienced Windows developer is an 
immediate impression that this is an ultra-flaky codebase.


Quite possibly, it is just the REPL and it is actually a Scala code-base issue.


Folks like me will invest time and energy investigating such bugs, but only if 
we think there is a will to fix issues that are persistent and that discourage 
adoption among folks who very often have few enterprise choices other than 
Windows. This is not out of religious attachment, but due to dependencies in 
the data workflow chain that are not easy to change. In particular, many 
financial-industry developers have to use Windows stacks in some part of the 
data acquisition process. These are not developers inexperienced in 
cross-platform work, Linux, or Java.


The reason such folks are liable to get discouraged is their prior bad 
experiences with Java in distributed systems that had Windows components, owing 
to the well-known difference in file-locking behaviour between Linux and 
Windows. When such folks see barf-outs of this kind, they fear we are back in 
the EJB hell of earlier times, when systems broke all over the place and there 
were elaborate hacks to patch the problem.


Please consider how we can better test and isolate what is really causing this 
problem.











> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> 

[jira] [Commented] (SPARK-26059) Spark standalone mode, does not correctly record a failed Spark Job.

2019-01-16 Thread Prashant Sharma (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743783#comment-16743783
 ] 

Prashant Sharma commented on SPARK-26059:
-

One workaround is to capture the exit code of the driver JVM. It is non-zero if 
the job fails with exceptions, and zero if the job is successful.
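
A hedged sketch of that workaround from a wrapper script (the master URL and application path are placeholders; client deploy mode keeps the driver in the spark-submit process, so its exit code reflects the job outcome):

{code:java}
# Capture the driver JVM's exit code from spark-submit instead of trusting the
# FINISHED status on the master UI. Master URL and application path are
# placeholders.
import subprocess
import sys

result = subprocess.run(
    ["spark-submit",
     "--master", "spark://master-host:7077",
     "--deploy-mode", "client",
     "/path/to/job.py"])

if result.returncode != 0:
    print("Spark job failed with exit code %d" % result.returncode)
    sys.exit(result.returncode)
print("Spark job succeeded")
{code}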

> Spark standalone mode, does not correctly record a failed Spark Job.
> 
>
> Key: SPARK-26059
> URL: https://issues.apache.org/jira/browse/SPARK-26059
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 3.0.0
>Reporter: Prashant Sharma
>Priority: Major
>
> In order to reproduce, submit a failing job to a Spark standalone master. The 
> status of the failed job is shown as FINISHED, irrespective of whether it 
> failed or succeeded. 
> EDIT: This happens only when deploy-mode is client; when deploy-mode is 
> cluster it works as expected.
> - Reported by: Surbhi Bakhtiyar.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26616) Expose document frequency in IDFModel

2019-01-16 Thread Jatin Puri (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jatin Puri updated SPARK-26616:
---
Docs Text: Provide DocumentFrequency in IDF  (was: Provide 
DocumentFrequency vector in IDF)

> Expose document frequency in IDFModel
> -
>
> Key: SPARK-26616
> URL: https://issues.apache.org/jira/browse/SPARK-26616
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 2.4.0
>Reporter: Jatin Puri
>Priority: Minor
>
> As part of `org.apache.spark.ml.feature.IDFModel`, the following can be 
> exposed:
>  
> 1. Document frequency vector
> 2. Number of documents
>  
> The above are already computed when calculating the idf vector. They simply 
> need to be exposed as `public val`s.
>  
> This avoids re-implementation for someone who needs to compute the document 
> frequency of terms.
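
To make the avoided re-implementation concrete, a hedged sketch that recovers document frequencies from the already-exposed idf vector, assuming the caller still knows the number of training documents m and that the default minDocFreq=0 was used, based on Spark ML's documented formula idf = log((m + 1) / (df + 1)):

{code:java}
# Hedged sketch: invert idf = log((m + 1) / (df + 1)) to get df back from
# IDFModel.idf. The fitted model and document count m are assumed to come from
# the caller's own pipeline; terms filtered by minDocFreq would break this.
import math

def document_frequencies(idf_model, m):
    # df = (m + 1) / exp(idf) - 1, rounded back to an integer count
    return [int(round((m + 1) / math.exp(v) - 1)) for v in idf_model.idf]

# Example (assumed upstream):
#   idf_model = IDF(inputCol="tf", outputCol="tfidf").fit(tf_df)
#   m = tf_df.count()
#   dfs = document_frequencies(idf_model, m)
{code}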



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26635) illegal hardware instruction

2019-01-16 Thread Elchin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elchin resolved SPARK-26635.

Resolution: Fixed

> illegal hardware instruction
> 
>
> Key: SPARK-26635
> URL: https://issues.apache.org/jira/browse/SPARK-26635
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 2.4.0
>Reporter: Elchin
>Priority: Major
>
> I can't import pyarrow:
> {code:java}
> >>> import pyarrow as pa
> [1]    31441 illegal hardware instruction (core dumped)  python3{code}
> The environment is:
> Python 3.6.7
>  PySpark 2.4.0
>  PyArrow: 0.11.1
>  Pandas: 0.23.4
>  NumPy: 1.15.4
>  OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-26635) illegal hardware instruction

2019-01-16 Thread Elchin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elchin closed SPARK-26635.
--

> illegal hardware instruction
> 
>
> Key: SPARK-26635
> URL: https://issues.apache.org/jira/browse/SPARK-26635
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 2.4.0
>Reporter: Elchin
>Priority: Major
>
> I can't import pyarrow:
> {code:java}
> >>> import pyarrow as pa
> [1]    31441 illegal hardware instruction (core dumped)  python3{code}
> The environment is:
> Python 3.6.7
>  PySpark 2.4.0
>  PyArrow: 0.11.1
>  Pandas: 0.23.4
>  NumPy: 1.15.4
>  OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26635) illegal hardware instruction

2019-01-16 Thread Elchin (JIRA)
Elchin created SPARK-26635:
--

 Summary: illegal hardware instruction
 Key: SPARK-26635
 URL: https://issues.apache.org/jira/browse/SPARK-26635
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Spark Core
Affects Versions: 2.4.0
Reporter: Elchin


I can't import pyarrow:
{code:java}
>>> import pyarrow as pa
[1]    31441 illegal hardware instruction (core dumped)  python3{code}
The environment is:

Python 3.6.7
 PySpark 2.4.0
 PyArrow: 0.11.1
 Pandas: 0.23.4
 NumPy: 1.15.4
 OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment

2019-01-16 Thread Elchin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elchin closed SPARK-26591.
--

> Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain 
> environment
> 
>
> Key: SPARK-26591
> URL: https://issues.apache.org/jira/browse/SPARK-26591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
> Environment: Python 3.6.7
> Pyspark 2.4.0
> OS:
> {noformat}
> Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux{noformat}
> CPU:
>  
> {code:java}
> Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB
> clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz
> {code}
>  
>  
>Reporter: Elchin
>Priority: Major
> Attachments: core
>
>
> When I try to use pandas_udf from examples in 
> [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> from pyspark.sql.types import IntegerType, StringType
> slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is 
> crashed{code}
> I get the error:
> {code:java}
> [1]    17969 illegal hardware instruction (core dumped)  python3{code}
> The environment is:
> Python 3.6.7
>  PySpark 2.4.0
>  PyArrow: 0.11.1
>  Pandas: 0.23.4
>  NumPy: 1.15.4
>  OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment

2019-01-16 Thread Elchin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elchin resolved SPARK-26591.

Resolution: Feedback Received

> Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain 
> environment
> 
>
> Key: SPARK-26591
> URL: https://issues.apache.org/jira/browse/SPARK-26591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
> Environment: Python 3.6.7
> Pyspark 2.4.0
> OS:
> {noformat}
> Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux{noformat}
> CPU:
>  
> {code:java}
> Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB
> clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz
> {code}
>  
>  
>Reporter: Elchin
>Priority: Major
> Attachments: core
>
>
> When I try to use pandas_udf from examples in 
> [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> from pyspark.sql.types import IntegerType, StringType
> slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is 
> crashed{code}
> I get the error:
> {code:java}
> [1]    17969 illegal hardware instruction (core dumped)  python3{code}
> The environment is:
> Python 3.6.7
>  PySpark 2.4.0
>  PyArrow: 0.11.1
>  Pandas: 0.23.4
>  NumPy: 1.15.4
>  OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment

2019-01-16 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743761#comment-16743761
 ] 

Hyukjin Kwon commented on SPARK-26591:
--

Hm, I see. The problem is specific to PyArrow, then. Let's leave this ticket 
closed on the Spark side.

> Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain 
> environment
> 
>
> Key: SPARK-26591
> URL: https://issues.apache.org/jira/browse/SPARK-26591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
> Environment: Python 3.6.7
> Pyspark 2.4.0
> OS:
> {noformat}
> Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux{noformat}
> CPU:
>  
> {code:java}
> Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB
> clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz
> {code}
>  
>  
>Reporter: Elchin
>Priority: Major
> Attachments: core
>
>
> When I try to use pandas_udf from examples in 
> [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> from pyspark.sql.types import IntegerType, StringType
> slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is 
> crashed{code}
> I get the error:
> {code:java}
> [1]    17969 illegal hardware instruction (core dumped)  python3{code}
> The environment is:
> Python 3.6.7
>  PySpark 2.4.0
>  PyArrow: 0.11.1
>  Pandas: 0.23.4
>  NumPy: 1.15.4
>  OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25992) Accumulators giving KeyError in pyspark

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25992:


Assignee: (was: Apache Spark)

> Accumulators giving KeyError in pyspark
> ---
>
> Key: SPARK-25992
> URL: https://issues.apache.org/jira/browse/SPARK-25992
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Abdeali Kothari
>Priority: Major
>
> I am using accumulators, and when I run my code I sometimes get warning
> messages. When I checked, there was nothing accumulated - I am not sure whether
> I lost information from the accumulator, or whether it worked and I can ignore
> this error. The message:
> {noformat}
> Exception happened during processing of request from
> ('127.0.0.1', 62099)
> Traceback (most recent call last):
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 317, in 
> _handle_request_noblock
> self.process_request(request, client_address)
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 348, in 
> process_request
> self.finish_request(request, client_address)
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 361, in 
> finish_request
> self.RequestHandlerClass(request, client_address, self)
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 696, in 
> __init__
> self.handle()
> File "/usr/local/hadoop/spark2.3.1/python/pyspark/accumulators.py", line 238, 
> in handle
> _accumulatorRegistry[aid] += update
> KeyError: 0
> 
> 2018-11-09 19:09:08 ERROR DAGScheduler:91 - Failed to update accumulators for 
> task 0
> org.apache.spark.SparkException: EOF reached before Python server acknowledged
>   at 
> org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:634)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1131)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1123)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1123)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1206)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25992) Accumulators giving KeyError in pyspark

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25992:


Assignee: Apache Spark

> Accumulators giving KeyError in pyspark
> ---
>
> Key: SPARK-25992
> URL: https://issues.apache.org/jira/browse/SPARK-25992
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.1
>Reporter: Abdeali Kothari
>Assignee: Apache Spark
>Priority: Major
>
> I am using accumulators, and when I run my code I sometimes get warning
> messages. When I checked, there was nothing accumulated - I am not sure whether
> I lost information from the accumulator, or whether it worked and I can ignore
> this error. The message:
> {noformat}
> Exception happened during processing of request from
> ('127.0.0.1', 62099)
> Traceback (most recent call last):
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 317, in 
> _handle_request_noblock
> self.process_request(request, client_address)
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 348, in 
> process_request
> self.finish_request(request, client_address)
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 361, in 
> finish_request
> self.RequestHandlerClass(request, client_address, self)
> File "/Users/abdealijk/anaconda3/lib/python3.6/socketserver.py", line 696, in 
> __init__
> self.handle()
> File "/usr/local/hadoop/spark2.3.1/python/pyspark/accumulators.py", line 238, 
> in handle
> _accumulatorRegistry[aid] += update
> KeyError: 0
> 
> 2018-11-09 19:09:08 ERROR DAGScheduler:91 - Failed to update accumulators for 
> task 0
> org.apache.spark.SparkException: EOF reached before Python server acknowledged
>   at 
> org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:634)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1131)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1123)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1123)
>   at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1206)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1820)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
>   at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26591) Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain environment

2019-01-16 Thread Elchin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743758#comment-16743758
 ] 

Elchin commented on SPARK-26591:


[~bryanc] I can't even import pyarrow:
{code:java}
>>> import pyarrow as pa
[1]    31441 illegal hardware instruction (core dumped)  python3
{code}

> Scalar Pandas UDF fails with 'illegal hardware instruction' in a certain 
> environment
> 
>
> Key: SPARK-26591
> URL: https://issues.apache.org/jira/browse/SPARK-26591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
> Environment: Python 3.6.7
> Pyspark 2.4.0
> OS:
> {noformat}
> Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux{noformat}
> CPU:
>  
> {code:java}
> Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB
> clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz
> {code}
>  
>  
>Reporter: Elchin
>Priority: Major
> Attachments: core
>
>
> When I try to use pandas_udf from examples in 
> [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> from pyspark.sql.types import IntegerType, StringType
> slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is 
> crashed{code}
> I get the error:
> {code:java}
> [1]    17969 illegal hardware instruction (core dumped)  python3{code}
> The environment is:
> Python 3.6.7
>  PySpark 2.4.0
>  PyArrow: 0.11.1
>  Pandas: 0.23.4
>  NumPy: 1.15.4
>  OS: Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 
> x86_64 x86_64 x86_64 GNU/Linux



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26634) OutputCommitCoordinator may allow task of FetchFailureStage commit again

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26634:


Assignee: Apache Spark

> OutputCommitCoordinator may allow task of FetchFailureStage commit again
> 
>
> Key: SPARK-26634
> URL: https://issues.apache.org/jira/browse/SPARK-26634
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.4.0
>Reporter: liupengcheng
>Assignee: Apache Spark
>Priority: Major
>
> In our production Spark cluster, we encountered a case where a task of a stage 
> retried due to FetchFailure was denied to commit, even though the task was the 
> first attempt of this retry stage.
> After careful investigation, it was found that the canCommit call of 
> OutputCommitCoordinator would allow a task of the FetchFailure stage (with the 
> same partition number as the new task of the retry stage) to commit, which 
> results in TaskCommitDenied for every task (of the same partition) of the 
> retry stage. Because TaskCommitDenied does not count towards failures, this 
> might cause the application to hang forever.
>  
> {code:java}
> 2019-01-09,08:39:53,676 INFO org.apache.spark.scheduler.TaskSetManager: 
> Starting task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, 
> executor 456, partition 138, PROCESS_LOCAL, 5829 bytes)
> 2019-01-09,08:43:37,514 INFO org.apache.spark.scheduler.TaskSetManager: 
> Finished task 138.0 in stage 5.0 (TID 30634) in 466958 ms on 
> zjy-hadoop-prc-st1212.bj (executor 1632) (674/5000)
> 2019-01-09,08:45:57,372 WARN org.apache.spark.scheduler.TaskSetManager: Lost 
> task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, executor 456): 
> TaskCommitDenied (Driver denied task commit) for job: 5, partition: 138, 
> attemptNumber: 1
> 166483 2019-01-09,08:45:57,373 INFO 
> org.apache.spark.scheduler.OutputCommitCoordinator: Task was denied 
> committing, stage: 5, partition: 138, attempt number: 0, attempt 
> number(counting failed stage): 1
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26634) OutputCommitCoordinator may allow task of FetchFailureStage commit again

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26634:


Assignee: (was: Apache Spark)

> OutputCommitCoordinator may allow task of FetchFailureStage commit again
> 
>
> Key: SPARK-26634
> URL: https://issues.apache.org/jira/browse/SPARK-26634
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.4.0
>Reporter: liupengcheng
>Priority: Major
>
> In our production Spark cluster, we encountered a case where a task of a stage 
> retried due to FetchFailure was denied to commit, even though the task was the 
> first attempt of this retry stage.
> After careful investigation, it was found that the canCommit call of 
> OutputCommitCoordinator would allow a task of the FetchFailure stage (with the 
> same partition number as the new task of the retry stage) to commit, which 
> results in TaskCommitDenied for every task (of the same partition) of the 
> retry stage. Because TaskCommitDenied does not count towards failures, this 
> might cause the application to hang forever.
>  
> {code:java}
> 2019-01-09,08:39:53,676 INFO org.apache.spark.scheduler.TaskSetManager: 
> Starting task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, 
> executor 456, partition 138, PROCESS_LOCAL, 5829 bytes)
> 2019-01-09,08:43:37,514 INFO org.apache.spark.scheduler.TaskSetManager: 
> Finished task 138.0 in stage 5.0 (TID 30634) in 466958 ms on 
> zjy-hadoop-prc-st1212.bj (executor 1632) (674/5000)
> 2019-01-09,08:45:57,372 WARN org.apache.spark.scheduler.TaskSetManager: Lost 
> task 138.0 in stage 5.1 (TID 31437, zjy-hadoop-prc-st159.bj, executor 456): 
> TaskCommitDenied (Driver denied task commit) for job: 5, partition: 138, 
> attemptNumber: 1
> 166483 2019-01-09,08:45:57,373 INFO 
> org.apache.spark.scheduler.OutputCommitCoordinator: Task was denied 
> committing, stage: 5, partition: 138, attempt number: 0, attempt 
> number(counting failed stage): 1
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26619) Prune the unused serializers from `SerializeFromObject`

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26619:


Assignee: (was: Apache Spark)

> Prune the unused serializers from `SerializeFromObject`
> ---
>
> Key: SPARK-26619
> URL: https://issues.apache.org/jira/browse/SPARK-26619
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: DB Tsai
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26619) Prune the unused serializers from `SerializeFromObject`

2019-01-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26619:


Assignee: Apache Spark

> Prune the unused serializers from `SerializeFromObject`
> ---
>
> Key: SPARK-26619
> URL: https://issues.apache.org/jira/browse/SPARK-26619
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: DB Tsai
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org