jiaan.geng created SPARK-27140:
----------------------------------

             Summary: The feature 'insert overwrite local directory' behaves 
inconsistently across environments.
                 Key: SPARK-27140
                 URL: https://issues.apache.org/jira/browse/SPARK-27140
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.0, 2.3.0, 3.0.0
            Reporter: jiaan.geng


In local[*] mode, maropu gave the following test case:
{code:java}
$ls /tmp/noexistdir
ls: /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")
scala> spark.table("t").explain
== Physical Plan ==
Scan hive default.t [c0#5, c1#6], HiveTableRelation `default`.`t`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")
scala> sql("""select * from t""").show
+---+---+
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * 
from t""")

$ls /tmp/noexistdir/t/
_SUCCESS  part-00000-bbea4213-071a-49b4-aac8-8510e7263d45-c000
{code}
This test case shows that Spark creates the non-existent path and moves the 
intermediate results from the local temporary path into the newly created 
directory. This test was run against the newest master.
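The create-and-move step described above can be sketched in plain shell (an illustration of the observed behavior only, not Spark's actual implementation; the staging and target paths below are made up for the sketch):

```shell
#!/bin/sh
# Sketch: stage results in a temporary directory, then create the
# missing target path and move the staged part files into it.
staging=$(mktemp -d)
touch "$staging/part-00000" "$staging/_SUCCESS"   # stand-ins for real output

target=$(mktemp -d)/noexistdir/t    # target's parent does not exist yet
mkdir -p "$target"                  # create the non-existent path
mv "$staging"/* "$target"/          # move staged results into it

ls "$target"    # -> _SUCCESS  part-00000
```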

I followed the test case maropu provided, but observed different behavior.
I ran the same SQL in local[*] deploy mode on Spark 2.3.0.
The inconsistent behavior appears as follows:
{code:java}
ls /tmp/noexistdir
ls: cannot access /tmp/noexistdir: No such file or directory

scala> sql("""create table t(c0 int, c1 int)""")
res0: org.apache.spark.sql.DataFrame = []
scala> spark.table("t").explain
== Physical Plan ==
HiveTableScan [c0#5, c1#6], HiveTableRelation `default`.`t`, 
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c0#5, c1#6]

scala> sql("""insert into t values(1, 1)""")
scala> sql("""select * from t""").show
+---+---+                                                                       
| c0| c1|
+---+---+
|  1|  1|
+---+---+

scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * 
from t""")
res1: org.apache.spark.sql.DataFrame = [] 

ls /tmp/noexistdir/t/
/tmp/noexistdir/t

vi /tmp/noexistdir/t
  1 
{code}
Then I pulled the master branch, compiled it, and deployed it on my Hadoop 
cluster. I got the inconsistent behavior again. The Spark version under test 
was 3.0.0.
{code:java}
ls /tmp/noexistdir
ls: cannot access /tmp/noexistdir: No such file or directory
Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector 
with the Serial old collector is deprecated and will likely be removed in a 
future release
Spark context Web UI available at http://10.198.66.204:55326
Spark context available as 'sc' (master = local[*], app id = 
local-1551259036573).
Spark session available as 'spark'.
Welcome to spark version 3.0.0-SNAPSHOT
Using Scala version 2.12.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> sql("""select * from t""").show
+---+---+                                                                       
| c0| c1|
+---+---+
|  1|  1|
+---+---+


scala> sql("""insert overwrite local directory '/tmp/noexistdir/t' select * 
from t""")
res1: org.apache.spark.sql.DataFrame = []                                       

scala> 
ll /tmp/noexistdir/t
-rw-r--r-- 1 xitong xitong 0 Feb 27 17:19 /tmp/noexistdir/t
vi /tmp/noexistdir/t
  1
{code}
Again, /tmp/noexistdir/t was created as a zero-byte plain file rather than a directory.
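A small shell helper (hypothetical, not part of Spark) makes the two outcomes easy to tell apart; here it is exercised against stand-in paths under a temp directory that mimic the expected and the buggy layouts:

```shell
#!/bin/sh
# classify: report whether an "insert overwrite local directory" target
# ended up as a directory (expected), a plain file (the bug), or missing.
classify() {
  if [ -d "$1" ]; then echo directory
  elif [ -f "$1" ]; then echo file
  else echo missing
  fi
}

# Stand-in paths mimicking the two observed outcomes:
tmp=$(mktemp -d)
mkdir -p "$tmp/ok/t" && touch "$tmp/ok/t/_SUCCESS"   # expected: a directory
mkdir -p "$tmp/bug" && touch "$tmp/bug/t"            # observed bug: a file

classify "$tmp/ok/t"    # -> directory
classify "$tmp/bug/t"   # -> file
classify "$tmp/none/t"  # -> missing
```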

So the behavior of 'insert overwrite local directory' with a non-existent 
target path is inconsistent across environments: on current master in 
local[*] mode the path is created as a directory containing the result files, 
while on 2.3.0 (local[*]) and on a Hadoop cluster build of master it ends up 
as a plain file.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
