beliefer opened a new pull request #23841: [SPARK-26936][SQL] Fix bug of insert overwrite local dir and inconsistent behavior with Hive
URL: https://github.com/apache/spark/pull/23841

## What changes were proposed in this pull request?

The `insert overwrite local directory` feature behaves inconsistently with Hive, and it also has a bug.

### First, the inconsistent behavior

Suppose the local path `/home/spark/` exists on the driver node but does not contain a child directory `result`. To save the data of Hive table `A` into `/home/spark/result/A/`, I run the following SQL:

`insert overwrite local directory '/home/spark/result/A/' select * from A;`

When this SQL is executed, Hive creates the parent directory `result` and the child directory `A`, and finally moves the data into `/home/spark/result/A/`. Spark SQL does none of this. This PR uses `LocalFileSystem` to create the path when it does not exist.

### Second, the bug in `insert overwrite local directory`

Executing the SQL above raises the following `HiveException`:

`Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Mkdirs failed to create file:/home/xitong/hive/stagingdir_hive_2019-02-19_17-31-00_678_1816816774691551856-1/-ext-10000/_temporary/0/_temporary/attempt_20190219173233_0002_m_000000_3 (exists=false, cwd=file:/data10/yarn/nm-local-dir/usercache/xitong/appcache/application_1543893582405_6126857/container_e124_1543893582405_6126857_01_000011) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)`

Currently, Spark SQL generates a local temporary path inside the local staging directory. Because the scheme of this temporary path is `file`, the tasks fail to create it and the `HiveException` above is thrown. This PR changes the local temporary path to an HDFS temporary path and uses a `DistributedFileSystem` instance to copy the data from the HDFS temporary path to the local directory.

## How was this patch tested?

Existing unit tests and test suites.
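For illustration only, the local-side behavior described above can be sketched as two steps. First, create the target directory and any missing parents, as Hive does. Second, replace the target's contents with the files from a staging directory. This sketch makes several assumptions that go beyond the PR:

* It assumes the staged files are already on the local disk.
* It uses only `java.nio.file`, not the Hadoop `LocalFileSystem`/`DistributedFileSystem` code the PR actually changes.
* It simplifies the overwrite step to deleting regular files.
* `LocalDirExport` and `exportToLocalDir` are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LocalDirExport {

    // Lists the direct children of a directory; the stream is closed before returning.
    static List<Path> children(Path dir) throws IOException {
        try (Stream<Path> s = Files.list(dir)) {
            return s.collect(Collectors.toList());
        }
    }

    // Hypothetical helper (not a Spark or Hive API): emulates
    // INSERT OVERWRITE LOCAL DIRECTORY for a staging directory on the local disk.
    static void exportToLocalDir(Path staging, Path target) throws IOException {
        // Like Hive: create 'target' plus any missing parents,
        // e.g. 'result/' in '/home/spark/result/A/'.
        Files.createDirectories(target);
        // Simplified overwrite: delete regular files already in the target
        // (subdirectories are left untouched in this sketch).
        for (Path old : children(target)) {
            if (Files.isRegularFile(old)) {
                Files.delete(old);
            }
        }
        // Copy each staged output file into the target directory.
        for (Path src : children(staging)) {
            if (Files.isRegularFile(src)) {
                Files.copy(src, target.resolve(src.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("local-dir-export");
        Path staging = Files.createDirectories(root.resolve("staging"));
        Files.write(staging.resolve("part-00000"), "1\ta\n".getBytes(StandardCharsets.UTF_8));
        // Neither 'result' nor 'A' exists yet.
        Path target = root.resolve("result").resolve("A");
        exportToLocalDir(staging, target);
        System.out.println(Files.exists(target.resolve("part-00000"))); // prints true
    }
}
```

Per the description, the staging directory in the PR lives on HDFS. In that setup, the copy step would go through a Hadoop `FileSystem` rather than `Files.copy`. The directory-creation step is what resolves the inconsistency with Hive.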
