[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2017-08-20 Thread Mark S (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16134527#comment-16134527
 ] 

Mark S edited comment on SPARK-12216 at 8/20/17 8:12 PM:
-

I seem to have the same issue in my Windows 10 environment when running in
local mode.
{code:java}
SparkSession spark = SparkSession
    .builder()
    .master("local")
    .appName("Java Spark SQL App")
    .config("spark.some.config.option", "some-value")
    .getOrCreate();

Dataset<Row> df = spark.read().json("*.gz").toDF();
// Dataset<Row> df = spark.read().json("C:\\dev\\source\\_misc\\Company\\sample-data\\data01.gz").toDF();
// Dataset<Row> df = spark.read().json("/mnt/c/dev/source/_misc/Company/sample-data/data01.gz").toDF();

df.createOrReplaceTempView("MyDataTable");
Dataset<Row> result01 = spark.sql("SELECT postcode, count(*) FROM MyDataTable GROUP BY postcode");

// result01.write().format("parquet").save("output.parquet");
result01.write().parquet("output.parquet");
{code}
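For what it's worth, a frequently reported cause of parquet-write failures in Spark local mode on Windows is a missing {{winutils.exe}} (Hadoop's native helper binary). A commonly suggested workaround, which I have not verified here, is to point {{hadoop.home.dir}} at a directory containing {{bin\winutils.exe}} before the first SparkSession is created; the class name and path below are placeholders:

{code:java}
// Workaround sketch (unverified): Spark on Windows relies on Hadoop's
// winutils.exe for file permission operations such as chmod.
// Download winutils.exe matching your Hadoop version and place it under
// C:\hadoop\bin (the path is a placeholder - use whatever directory you choose).
public class WinutilsWorkaround {
    public static void main(String[] args) {
        // Must be set before the first SparkSession/SparkContext is created.
        System.setProperty("hadoop.home.dir", "C:\\hadoop");

        // ...then build the SparkSession exactly as in the snippet above.
        System.out.println(System.getProperty("hadoop.home.dir"));
    }
}
{code}

Setting the {{HADOOP_HOME}} environment variable to the same directory before launching the JVM is reported to have the same effect.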

*Question - Am I to assume that this Spark 2.x + Windows issue will not be fixed?*

BTW, I can confirm that running Spark under "Bash on Ubuntu for Windows", as
suggested by [~jerome.scheuring], does work.

{noformat}
17/08/20 18:29:36 INFO FileOutputCommitter: Saved output of task 'attempt_20170820182935_0017_m_00_0' to file:/mnt/c/dev/source/_misc/Company/Project/target/output.parquet/_temporary/0/task_20170820182935_0017_m_00
17/08/20 18:29:36 INFO SparkHadoopMapRedUtil: attempt_20170820182935_0017_m_00_0: Committed
17/08/20 18:29:36 INFO Executor: Finished task 0.0 in stage 17.0 (TID 606). 2294 bytes result sent to driver
17/08/20 18:29:36 INFO TaskSetManager: Finished task 0.0 in stage 17.0 (TID 606) in 899 ms on localhost (executor driver) (1/1)
17/08/20 18:29:36 INFO DAGScheduler: ResultStage 17 (parquet at SparkApp.java:61) finished in 0.900 s
17/08/20 18:29:36 INFO TaskSchedulerImpl: Removed TaskSet 17.0, whose tasks have all completed, from pool
17/08/20 18:29:36 INFO DAGScheduler: Job 8 finished: parquet at SparkApp.java:61, took 6.870277 s
{noformat}


h3. Environment 1
* Windows 10 
* Java 8
* Spark 2.2.0
* Parquet 1.8.2
* Stacktrace
{noformat}
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:188)
... 35 more
Caused by: org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:272)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:191)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:190)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: (null) entry in command string: null chmod 0644
{noformat}