Hi,

we consistently hit an issue when inserting into or creating a table with the
Amazon EMR Spark distribution: when the inserted result set is around 1 GB, the
Spark SQL query never finishes.

Inserting a smaller result set (around 500 MB) works fine.

Neither the default *spark.sql.shuffle.partitions* (200) nor
*set spark.sql.shuffle.partitions=1* helps.
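
For reference, roughly what we run in spark-sql (simplified; the source table
name here is just a placeholder, not our real query):

```sql
-- tried forcing a single shuffle partition before the insert; did not help
SET spark.sql.shuffle.partitions=1;

-- the insert that hangs once the result set reaches ~1 GB
INSERT OVERWRITE TABLE db_xxx.some_huge_table
SELECT * FROM some_source_table;  -- placeholder source
```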

The log stops at:
*/15/04/01 15:48:13 INFO s3n.S3NativeFileSystem: rename
s3://hive-db/tmp/hive-hadoop/hive_2015-04-01_15-47-43_036_1196347178448825102-15/-ext-10000
s3://hive-db/db_xxx/some_huge_table/*

after which only metrics.MetricsSaver log lines appear.

We set
/  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>s3://hive-db</value>
  </property>/
but hive.exec.scratchdir is not set; I have no idea why the temp files were
created in /s3://hive-db/tmp/hive-hadoop//.
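
In case anyone wants to experiment: explicitly pointing the scratch dir at HDFS
in hive-site.xml should look roughly like this (untested on our side, and since
an S3 "rename" is really a copy, it may not fix the slow rename itself):

```xml
<!-- hypothetical workaround: keep Hive scratch/staging files on HDFS -->
<property>
  <name>hive.exec.scratchdir</name>
  <value>hdfs:///tmp/hive-${user.name}</value>
</property>
```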

We just tried the newest Spark 1.3.0 on AMI 3.5.x and AMI 3.6
(https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/VersionInformation.md);
it still does not work.

Has anyone hit the same issue? Any ideas on how to fix it?

I believe Amazon EMR's Spark build uses
com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem to access S3, not the
original Hadoop s3n implementation, right?

Both
/home/hadoop/spark/classpath/emr/*
and
/home/hadoop/spark/classpath/emrfs/*
are on the classpath.

BTW, is there any plan to use the new Hadoop s3a implementation instead of
s3n?
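
If someone wants to try s3a in the meantime: as far as I understand, with
Hadoop 2.6+ it can be enabled in core-site.xml roughly like this (untested
here; credential properties omitted):

```xml
<!-- hypothetical: route s3a:// URIs to the new S3A implementation (Hadoop 2.6+) -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
```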

Thanks for any help.

Teng



