[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2016-10-11 Thread Jerome Scheuring (JIRA)

[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566349#comment-15566349 ]

Jerome Scheuring edited comment on SPARK-12216 at 10/11/16 7:59 PM:


_Note that I am entirely new to the process of submitting issues on this 
system: if this needs to be a new issue, I would appreciate someone letting me 
know._

A bug very similar to this one is 100% reproducible across multiple machines 
running Windows 8.1 and Windows 10, with code compiled against Scala 2.11 and 
run under Spark 2.0.1.

It occurs:

* in Scala, but not in Python (R has not been tried)
* only when reading CSV files (not, for example, when reading Parquet files)
* only when running in local mode, not when submitted to a cluster

_Update:_ The bug also does not occur when the program is run under the Spark 
2.0.1 installation inside "Bash on Ubuntu on Windows" (the Linux subsystem) on 
the same Windows 10 machine where the bug _does_ occur when the program is run 
from Windows itself.

This program reproduces the bug (if {{poemData}} is defined per the 
commented-out section, rather than being read from a CSV file, the bug does 
not occur):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {

    // Schema for the input: an integer label and a line of text.
    val poemSchema = StructType(
      Seq(
        StructField("label", IntegerType),
        StructField("line", StringType)
      )
    )

    val sparkSession = SparkSession.builder()
      .appName("Spark Bug Demonstration")
      .master("local[*]")
      .getOrCreate()

    // Defining the data inline instead of reading it from CSV
    // does NOT trigger the bug:
    // val poemData = sparkSession.createDataFrame(Seq(
    //   (0, "There's many a strong farmer"),
    //   (0, "Who's heart would break in two"),
    //   (1, "If he could see the townland"),
    //   (1, "That we are riding to;")
    // )).toDF("label", "line")

    // Reading the same data from a CSV file does trigger it.
    val poemData = sparkSession.read
      .option("quote", "")
      .schema(poemSchema)
      .csv(args(0))

    println(s"Record count: ${poemData.count()}")
  }
}
{code}
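
(The {{quote}} option is set to the empty string so that the apostrophes in 
the text are not treated as quote characters; per the Spark CSV documentation, 
an empty string, as opposed to null, turns quoting off entirely.)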

Assuming that {{args(0)}} contains the path to a file with comma-separated 
integer/string pairs, as in:

{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}
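
When the bug fires, the directories that {{ShutdownHookManager}} could not 
delete remain under the system temp directory after the JVM exits. A quick 
sketch of a check for them (a hypothetical helper, not part of the program 
above; it assumes Spark is using the default {{java.io.tmpdir}} location):

{code}
import java.io.File

object LeftoverSparkTempDirs {
  def main(args: Array[String]): Unit = {
    // On Windows, java.io.tmpdir is typically C:\Users\<name>\AppData\Local\Temp.
    val tmp = new File(System.getProperty("java.io.tmpdir"))
    Option(tmp.listFiles()).getOrElse(Array.empty[File])
      .filter(f => f.isDirectory && f.getName.startsWith("spark-"))
      .foreach(d => println(d.getAbsolutePath))
  }
}
{code}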


was (Author: jerome.scheuring):
_Note that I am entirely new to the process of submitting issues on this 
system: if this needs to be a new issue, I would appreciate someone letting me 
know._

A bug very similar to this one is 100% reproducible across multiple machines, 
running both Windows 8.1 and Windows 10, compiled with Scala 2.11 and running 
under Spark 2.0.1.

It occurs

* in Scala, but not Python (have not tried R)
* only when reading CSV files (and not, for example, when reading Parquet files)
* only when running local, not submitted to a cluster

This program will produce the bug (if {{poemData}} is defined per the 
commented-out section, rather than being read from a CSV file, the bug does not 
occur):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {

val poemSchema = StructType(
  Seq(
StructField("label",IntegerType), 
StructField("line",StringType)
  )
)

val sparkSession = SparkSession.builder()
  .appName("Spark Bug Demonstration")
  .master("local[*]")
  .getOrCreate()

//val poemData = sparkSession.createDataFrame(Seq(
//  (0, "There's many a strong farmer"),
//  (0, "Who's heart would break in two"),
//  (1, "If he could see the townland"),
//  (1, "That we are riding to;")
//)).toDF("label", "line")

val poemData = sparkSession.read
  .option("quote", value="")
  .schema(poemSchema)
  .csv(args(0))

println(s"Record count: ${poemData.count()}")

  }
}
{code}

Assuming that {{args(0)}} contains the path to a file with comma-separated 
integer/string pairs, as in:

{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>  

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2016-10-11 Thread Jerome Scheuring (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566349#comment-15566349
 ] 

Jerome Scheuring edited comment on SPARK-12216 at 10/11/16 7:34 PM:


_Note that I am entirely new to the process of submitting issues on this 
system: if this needs to be a new issue, I would appreciate someone letting me 
know._

A bug very similar to this one is 100% reproducible across multiple machines, 
running both Windows 8.1 and Windows 10, compiled with Scala 2.11 and running 
under Spark 2.0.1.

It occurs

* in Scala, but not Python (have not tried R)
* only when reading CSV files (and not, for example, when reading Parquet files)
* only when running local, not submitted to a cluster

This program will produce the bug (if {{poemData}} is defined per the 
commented-out section, rather than being read from a CSV file, the bug does not 
occur):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {

val poemSchema = StructType(
  Seq(
StructField("label",IntegerType), 
StructField("line",StringType)
  )
)

val sparkSession = SparkSession.builder()
  .appName("Spark Bug Demonstration")
  .master("local[*]")
  .getOrCreate()

//val poemData = sparkSession.createDataFrame(Seq(
//  (0, "There's many a strong farmer"),
//  (0, "Who's heart would break in two"),
//  (1, "If he could see the townland"),
//  (1, "That we are riding to;")
//)).toDF("label", "line")

val poemData = sparkSession.read
  .option("quote", value="")
  .schema(poemSchema)
  .csv(args(0))

println(s"Record count: ${poemData.count()}")

  }
}
{code}

Assuming that {{args(0)}} contains the path to a file with comma-separated 
integer/string pairs, as in:

{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}


was (Author: jerome.scheuring):
_Note that I am entirely new to the process of submitting issues on this 
system: if this needs to be a new issue, I would appreciate someone letting me 
know._

A bug very similar to this one is 100% reproducible across multiple machines, 
running both Windows 8.1 and Windows 10.

It occurs

* in Scala, but not Python (have not tried R)
* only when reading CSV files (and not, for example, when reading Parquet files)
* only when running local, not submitted to a cluster

This program will produce the bug (if {{poemData}} is defined per the 
commented-out section, rather than being read from a CSV file, the bug does not 
occur):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {

val poemSchema = StructType(
  Seq(
StructField("label",IntegerType), 
StructField("line",StringType)
  )
)

val sparkSession = SparkSession.builder()
  .appName("Spark Bug Demonstration")
  .master("local[*]")
  .getOrCreate()

//val poemData = sparkSession.createDataFrame(Seq(
//  (0, "There's many a strong farmer"),
//  (0, "Who's heart would break in two"),
//  (1, "If he could see the townland"),
//  (1, "That we are riding to;")
//)).toDF("label", "line")

val poemData = sparkSession.read
  .option("quote", value="")
  .schema(poemSchema)
  .csv(args(0))

println(s"Record count: ${poemData.count()}")

  }
}
{code}

Assuming that {{args(0)}} contains the path to a file with comma-separated 
integer/string pairs, as in:

{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 

[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2016-10-11 Thread Jerome Scheuring (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566349#comment-15566349
 ] 

Jerome Scheuring commented on SPARK-12216:
--

_Note that I am entirely new to the process of submitting issues on this 
system: if this needs to be a new issue, I would appreciate someone letting me 
know._

A bug very similar to this one is 100% reproducible across multiple machines, 
running both Windows 8.1 and Windows 10.

It occurs

* in Scala, but not Python (have not tried R)
* only when reading CSV files (and not, for example, when reading Parquet files)
* only when running local, not submitted to a cluster

This program will produce the bug (if {{poemData}} is defined per the 
commented-out section, rather than being read from a CSV file, the bug does not 
occur):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object SparkBugDemo {
  def main(args: Array[String]): Unit = {

val poemSchema = StructType(
  Seq(
StructField("label",IntegerType), 
StructField("line",StringType)
  )
)

val sparkSession = SparkSession.builder()
  .appName("Spark Bug Demonstration")
  .master("local[*]")
  .getOrCreate()

//val poemData = sparkSession.createDataFrame(Seq(
//  (0, "There's many a strong farmer"),
//  (0, "Who's heart would break in two"),
//  (1, "If he could see the townland"),
//  (1, "That we are riding to;")
//)).toDF("label", "line")

val poemData = sparkSession.read
  .option("quote", value="")
  .schema(poemSchema)
  .csv(args(0))

println(s"Record count: ${poemData.count()}")

  }
}
{code}

Assuming that {{args(0)}} contains the path to a file with comma-separated 
integer/string pairs, as in:

{noformat}
0,There's many a strong farmer
0,Who's heart would break in two
1,If he could see the townland
1,That we are riding to;
{noformat}

> Spark failed to delete temp directory 
> --------------------------------------
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: Windows 7 64-bit
> Spark 1.5.2
> Java 1.8.0_65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:234)
> at scala.util.Try$.apply(Try.scala:161)
> at 
> org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:216)
> at 
>
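
A plausible mechanism for the {{Failed to delete}} error above, assuming the 
usual Windows file-locking semantics (an inference, not something confirmed in 
this thread): Windows refuses to delete a file that still has an open handle, 
so a temp file Spark leaves open at shutdown makes {{Utils.deleteRecursively}} 
fail, while the same code succeeds on Linux. A minimal Scala sketch of the 
difference:

{code}
import java.io.{File, FileInputStream}

object OpenHandleDeleteDemo {
  def main(args: Array[String]): Unit = {
    val f = File.createTempFile("spark-demo", ".txt")
    val in = new FileInputStream(f) // hold an open handle to the file
    // On Windows this returns false (the open handle locks the file);
    // on Linux the delete succeeds even while the handle is open.
    println(s"delete() with open handle: ${f.delete()}")
    in.close()
    // Once the handle is closed, Windows allows the delete.
    println(s"delete() after close: ${f.delete()}")
  }
}
{code}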