beliefer opened a new pull request #23841: [SPARK-26936][SQL] Fix bug of insert 
overwrite local dir and inconsistent behavior with Hive
URL: https://github.com/apache/spark/pull/23841
 
 
   ## What changes were proposed in this pull request?
   The 'insert overwrite local directory' feature behaves inconsistently with Hive and also has a bug.
   
   ### First, let me introduce the inconsistent behavior.
   
   Suppose the local path '/home/spark/' exists on the driver node but does not contain a child directory 'result'.
   I want to save the data of Hive table A into '/home/spark/result/A/', so I use the following SQL:
   `insert overwrite local directory '/home/spark/result/A/' select * from A;`
   When I execute this SQL, Hive creates the parent directory 'result' and the child directory 'A', and finally moves the data into '/home/spark/result/A/'.
   Spark SQL does not do this.
   This PR uses LocalFileSystem to create the path if it does not exist.
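
   The directory creation described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the object and method name `ensureLocalDir` are hypothetical, and it assumes hadoop-common is on the classpath (as it is in any Spark build). It relies only on `FileSystem.getLocal`, `exists` and `mkdirs` from the Hadoop API.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object EnsureLocalDir {
  // Hypothetical helper (an assumption for illustration, not the PR's code):
  // make sure `dir` exists on the local file system, creating any missing
  // parent directories. Returns true if the directory exists afterwards.
  def ensureLocalDir(dir: String, conf: Configuration = new Configuration()): Boolean = {
    val localFs = FileSystem.getLocal(conf) // a LocalFileSystem instance
    val path = new Path(dir)
    // mkdirs behaves like `mkdir -p`: missing parents such as 'result' are created too.
    localFs.exists(path) || localFs.mkdirs(path)
  }
}
```

   For the example in this description, `EnsureLocalDir.ensureLocalDir("/home/spark/result/A/")` would create both 'result' and 'A' when only '/home/spark/' exists.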
   
   ### Second, let me introduce the bug in 'insert overwrite local directory'.
   
   If I execute the SQL mentioned before, a HiveException will appear as 
follows:
   `Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.io.IOException: Mkdirs failed to create 
file:/home/xitong/hive/stagingdir_hive_2019-02-19_17-31-00_678_1816816774691551856-1/-ext-10000/_temporary/0/_temporary/attempt_20190219173233_0002_m_000000_3
 (exists=false, 
cwd=file:/data10/yarn/nm-local-dir/usercache/xitong/appcache/application_1543893582405_6126857/container_e124_1543893582405_6126857_01_000011)
   at 
org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)`
   Currently, Spark SQL generates a temporary path inside the local staging directory. The scheme of this temporary path is `file`, so the HiveException above appears.
   This PR changes the local temporary path to an HDFS temporary path and uses a DistributedFileSystem instance to copy the data from the HDFS temporary path to the local directory.
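
   The copy step can be sketched as follows. This is an illustration under stated assumptions, not the implementation in this PR: `copyStagingToLocal` and its parameters are hypothetical names, and the staging path is assumed to already hold the query output. `Path.getFileSystem` returns a DistributedFileSystem when the staging path uses the `hdfs` scheme, and `copyToLocalFile` is the standard Hadoop call for copying to the local file system.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CopyStagingToLocal {
  // Hypothetical helper (an assumption for illustration, not the PR's code):
  // copy everything under `staging` into `localDir` on the local file system.
  def copyStagingToLocal(staging: Path, localDir: Path, conf: Configuration): Unit = {
    // Resolves to DistributedFileSystem when `staging` is an hdfs:// path.
    val srcFs = staging.getFileSystem(conf)
    val localFs = FileSystem.getLocal(conf)
    // Create the target directory and any missing parents first.
    localFs.mkdirs(localDir)
    // Copy each entry of the staging directory; delSrc = false keeps the source.
    srcFs.listStatus(staging).foreach { status =>
      val dst = new Path(localDir, status.getPath.getName)
      srcFs.copyToLocalFile(false, status.getPath, dst)
    }
  }
}
```

   Because the tasks write to a path that every executor can reach, the `Mkdirs failed to create file:/...` error no longer depends on the layout of each node's local disk; only this final copy touches the local file system.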
   ## How was this patch tested?
   
   Tested with the existing JUnit tests and test suites.
   
   
