Re: Spark 1.0.1 SQL on 160 G parquet file (snappy compressed, made by cloudera impala), 23 core and 60G mem / node, yarn-client mode, always failed

chutium Mon, 21 Jul 2014 11:21:13 -0700

Hi,

unfortunately it is not so straightforward


xxx_parquet.db

is a folder of managed database created by hive/impala, so, every sub
element in it is a table in hive/impala, they are folders in HDFS, and each
table has different schema, and in its folder there are one or more parquet
files.

that means

xxxxxx001_suffix
xxxxxx002_suffix

are folders, there are some parquet files like

xxxxxx001_suffix/parquet_file1_with_schema1

xxxxxx002_suffix/parquet_file1_with_schema2
xxxxxx002_suffix/parquet_file2_with_schema2

it seems only union can do this job~

Nonetheless, thank you very much, maybe the only reason is that spark eating
up too much memory...



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-1-SQL-on-160-G-parquet-file-snappy-compressed-made-by-cloudera-impala-23-core-and-60G-mem-d-tp10254p10335.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark 1.0.1 SQL on 160 G parquet file (snappy compressed, made by cloudera impala), 23 core and 60G mem / node, yarn-client mode, always failed

Reply via email to