[ https://issues.apache.org/jira/browse/SPARK-21661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-21661.
----------------------------------
    Resolution: Fixed

I think it was fixed in the PR above. Please reopen this if it still exists.

> SparkSQL can't merge load table from Hadoop
> -------------------------------------------
>
>                 Key: SPARK-21661
>                 URL: https://issues.apache.org/jira/browse/SPARK-21661
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Dapeng Sun
>
> Here is the original file listing of the external table on HDFS:
> {noformat}
> Permission  Owner  Group       Size   Last Modified          Replication  Block Size  Name
> -rw-r--r--  root   supergroup  0 B    8/6/2017, 11:43:03 PM  3            256 MB      income_band_001.dat
> -rw-r--r--  root   supergroup  0 B    8/6/2017, 11:39:31 PM  3            256 MB      income_band_002.dat
> ...
> -rw-r--r--  root   supergroup  327 B  8/6/2017, 11:44:47 PM  3            256 MB      income_band_530.dat
> {noformat}
> After SparkSQL loads the table, every input file produces an output file, even files of 0 B.
> When the same load is done in Hive, the data files are merged according to the size of the
> original files.
> Reproduce:
> {noformat}
> CREATE EXTERNAL TABLE t1 (a int, b string) STORED AS TEXTFILE LOCATION
> "hdfs://xxx:9000/data/t1";
> CREATE TABLE t2 STORED AS PARQUET AS SELECT * FROM t1;
> {noformat}
> Table t2 ends up with many small files that contain no data.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
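Editor's note: for readers on a Spark version that still shows this behavior, one workaround (a sketch, not the fix referenced above) is to cap the number of output partitions of the CTAS query with the COALESCE query hint, which is available in Spark SQL since 2.4. The partition count of 1 below is illustrative; choose a value matched to the data volume.

{noformat}
-- Workaround sketch (Spark 2.4+): the COALESCE hint reduces the number of
-- partitions, and therefore the number of Parquet files, written by the CTAS.
CREATE TABLE t2 STORED AS PARQUET AS SELECT /*+ COALESCE(1) */ * FROM t1;
{noformat}

Equivalently, in the DataFrame API the same effect comes from `spark.table("t1").coalesce(1).write.parquet(...)`.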