160 GB of Parquet files (ca. 30 files, Snappy-compressed, written by Cloudera Impala).
The job does about 30 full table scans: it takes 3-5 columns out, then applies some normal Scala operations like substring, groupBy, and filter, and at the end saves the result as a file in HDFS.

In yarn-client mode (23 cores and 60 GB memory per node) it always fails!

Startup script (3 NodeManagers, each running one executor):

Some screenshots:
http://apache-spark-user-list.1001560.n3.nabble.com/file/n10254/spark1.png
http://apache-spark-user-list.1001560.n3.nabble.com/file/n10254/spark2.png

I got some log like:

The same job works in standalone mode (3 slaves).

Startup script (each node 24 cores, 64 GB memory):

Any idea? Thanks a lot!

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-0-1-SQL-on-160-G-parquet-file-snappy-compressed-made-by-cloudera-impala-23-core-and-60G-mem-d-tp10254.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
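For context, a minimal sketch of the kind of pipeline described above, using the Spark 1.0.x API (sqlContext.parquetFile, registerAsTable). The HDFS paths, table name, and column names (user_id, url) are hypothetical placeholders, not taken from the original job:

```scala
// Sketch only: assumes a running Spark 1.0.x cluster; paths and
// column names below are illustrative, not from the original post.
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object ParquetScanJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext()           // master/memory come from the launch script
    val sqlContext = new SQLContext(sc)

    // Load the Impala-written, Snappy-compressed Parquet files
    val table = sqlContext.parquetFile("hdfs:///warehouse/events")
    table.registerAsTable("events")       // Spark 1.0.x name (later: registerTempTable)

    // Take a few columns out, then apply ordinary RDD operations
    val rows = sqlContext.sql("SELECT user_id, url FROM events")
    val result = rows
      .map(r => (r.getString(0), r.getString(1).take(10)))   // substring-style op
      .filter { case (_, prefix) => prefix.nonEmpty }        // filter
      .groupBy { case (uid, _) => uid }                      // groupBy
      .mapValues(_.size)

    // Save the result as a file in HDFS
    result.saveAsTextFile("hdfs:///tmp/scan-output")
  }
}
```

Each full scan of the 160 GB input materializes whole row groups per executor, so with 23 cores sharing 60 GB per node the per-task memory is tight; that difference in executor sizing is one plausible reason the standalone setup (24 cores, 64 GB) behaves differently.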