[ 
https://issues.apache.org/jira/browse/SPARK-30328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-30328.
----------------------------------
    Resolution: Invalid

> Fail to write local files with RDD.saveAsTextFile when using incorrect 
> Hadoop configuration files
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30328
>                 URL: https://issues.apache.org/jira/browse/SPARK-30328
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.0
>            Reporter: chendihao
>            Priority: Major
>
> We find that incorrect Hadoop configuration files cause `RDD.saveAsTextFile` 
> to fail when saving to the local file system. This is unexpected because we 
> specify a local URL, and the `DataFrame.write.text` API does not have this 
> issue. It is easy to reproduce and verify with Spark 2.3.0.
> 1. Do not set the `HADOOP_CONF_DIR` environment variable.
> 2. Install pyspark and run the local Python script below. It should work and 
> save the files to the local file system.
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("local").getOrCreate()
> sc = spark.sparkContext
> rdd = sc.parallelize([1, 2, 3])
> rdd.saveAsTextFile("file:///tmp/rdd.text")
> {code}
> 3. Set the `HADOOP_CONF_DIR` environment variable and put the Hadoop 
> configuration files there. Make sure `core-site.xml` is well-formed but 
> contains an unresolvable host name.
> 4. Run the same Python script again. When it tries to connect to HDFS and 
> hits the unresolvable host name, a Java exception is thrown.
> We think `saveAsTextFile` with a `file:///` URL should not attempt to 
> connect to HDFS at all, whether `HADOOP_CONF_DIR` is set or not. In fact, 
> the following DataFrame code works with the same incorrect Hadoop 
> configuration files.
> {code:java}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.master("local").getOrCreate()
> rows = [("a", "1"), ("b", "2")]  # sample rows for illustration
> df = spark.createDataFrame(rows, ["attribute", "value"])
> df.write.parquet("file:///tmp/df.parquet")
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
