jiaan.geng created SPARK-27140:
----------------------------------

             Summary: The 'insert overwrite local directory' feature has inconsistent behavior in different environments.
                 Key: SPARK-27140
                 URL: https://issues.apache.org/jira/browse/SPARK-27140
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0, 2.3.0, 3.0.0
            Reporter: jiaan.geng
In local[*] mode, maropu gave a test case as follows:
{code:java}
$ ls /tmp/noexistdir
ls: /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")

scala> spark.table("t").explain
== Physical Plan ==
Scan hive default.t [c0#5, c1#6], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")

$ ls /tmp/noexistdir/t/
_SUCCESS  part-00000-bbea4213-071a-49b4-aac8-8510e7263d45-c000
{code}
This test case proves that Spark creates the nonexistent path and moves the intermediate result from the local temporary path into the newly created directory. This test is based on the newest master.

I followed the test case provided by maropu, but found different behavior. I ran the same SQL in local[*] deploy mode on Spark 2.3.0. The inconsistent behavior appears as follows:
{code:java}
$ ls /tmp/noexistdir
ls: cannot access /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.table("t").explain
== Physical Plan ==
HiveTableScan [c0#5, c1#6], HiveTableRelation `default`.`t`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
res1: org.apache.spark.sql.DataFrame = []

$ ls /tmp/noexistdir/t/
/tmp/noexistdir/t

$ vi /tmp/noexistdir/t
1
{code}
Here /tmp/noexistdir/t is a plain file containing the query result, not a directory of part files. Then I pulled the master branch, compiled it, and deployed it on my Hadoop cluster, and I got the inconsistent behavior again. The Spark version tested is 3.0.0.
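To make the difference between the two observed behaviors concrete, here is a minimal sketch in plain Java (a hypothetical illustration only, not Spark's actual implementation; all class and method names are made up): one path materializes results by creating the target as a directory and moving the staged part files into it, the other renames the single part file onto the target path itself, which is how the target can end up as a plain file.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of the two behaviors described above.
public class InsertOverwriteSketch {

    // Behavior observed on current master in local[*] mode: the nonexistent
    // target is created as a directory and the staged part files are moved
    // into it, so `ls` afterwards shows _SUCCESS and part-* files.
    public static void moveStagingDirToTarget(File staging, File target) throws IOException {
        target.mkdirs(); // the target becomes a directory
        for (File part : staging.listFiles()) {
            Files.move(part.toPath(), new File(target, part.getName()).toPath(),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    // Behavior observed on 2.3.0: the part file is renamed onto the target
    // path itself, so the target ends up as a plain file holding the rows.
    public static void movePartFileOntoTarget(File partFile, File target) throws IOException {
        target.getParentFile().mkdirs(); // only the parent directory is created
        Files.move(partFile.toPath(), target.toPath(),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Both sequences succeed without error, which is why the inconsistency is only visible when inspecting the target path afterwards.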
{code:java}
$ ls /tmp/noexistdir
ls: cannot access /tmp/noexistdir: No such file or directory

Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
Spark context Web UI available at http://10.198.66.204:55326
Spark context available as 'sc' (master = local[*], app id = local-1551259036573).
Spark session available as 'spark'.
Welcome to spark version 3.0.0-SNAPSHOT
Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * from t""")
res1: org.apache.spark.sql.DataFrame = []

$ ll /tmp/noexistdir/t
-rw-r--r-- 1 xitong xitong 0 Feb 27 17:19 /tmp/noexistdir/t

$ vi /tmp/noexistdir/t
1
{code}
So /tmp/noexistdir/t is created as a plain file here too, not a directory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
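A small helper like the following (hypothetical, not part of Spark; the class and method names are made up) captures the check used throughout this report to tell the two outcomes apart:

```java
import java.io.File;

// Hypothetical helper: classify what `INSERT OVERWRITE LOCAL DIRECTORY`
// left behind at the target path.
public class TargetPathCheck {
    public static String classify(String path) {
        File f = new File(path);
        if (!f.exists()) {
            return "missing";
        }
        if (f.isDirectory()) {
            return "directory"; // expected: _SUCCESS and part-* files inside
        }
        return "file"; // the inconsistent behavior reported above
    }
}
```

On master in local[*] mode the target classifies as "directory", while on 2.3.0 and on the cluster build of 3.0.0-SNAPSHOT it classifies as "file".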