I'm trying to save a table using this code in PySpark 1.6.1:

    prices = sqlContext.sql(
        "SELECT AVG(amount) AS mean_price, country FROM src GROUP BY country")
    prices.collect()
    prices.write.saveAsTable('prices', format='parquet', mode='overwrite',
                             path='/mnt/bigdisk/tables')
but I'm getting this error:

    16/05/13 02:04:24 INFO HadoopRDD: Input split: file:/mnt/bigdisk/src.csv:100663296+33554432
    16/05/13 02:04:33 WARN TaskMemoryManager: leak 68.0 MB memory from org.apache.spark.unsafe.map.BytesToBytesMap@f9f1b5e
    16/05/13 02:04:33 ERROR Executor: Managed memory leak detected; size = 71303168 bytes, TID = 4085
    16/05/13 02:04:33 ERROR Executor: Exception in task 2.0 in stage 35.0 (TID 4085)
    java.io.FileNotFoundException: /mnt/bigdisk/spark_tmp/blockmgr-69da47e4-3a75-4244-80d3-9c7c0943e7f8/25/temp_shuffle_77078209-a2c5-466c-bba1-ff1a700f257c (No such file or directory)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)

Any ideas what could be wrong?

Thanks,
Imran