Hi,

unfortunately it is not so straightforward:

xxx_parquet.db

is the folder of a managed database created by Hive/Impala. Every
sub-element in it is a Hive/Impala table; the tables are folders in HDFS,
each table has a different schema, and each table folder contains one or
more Parquet files.

That means

xxxxxx001_suffix
xxxxxx002_suffix

are folders, each containing Parquet files like

xxxxxx001_suffix/parquet_file1_with_schema1

xxxxxx002_suffix/parquet_file1_with_schema2
xxxxxx002_suffix/parquet_file2_with_schema2

It seems only a union can do this job~

Nonetheless, thank you very much; maybe the only problem is that Spark is
eating up too much memory...



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-1-SQL-on-160-G-parquet-file-snappy-compressed-made-by-cloudera-impala-23-core-and-60G-mem-d-tp10254p10335.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
