Hi, I have a Spark SQL performance issue. My code contains a simple JavaBean:

    public class Person implements Externalizable {
        private int id;
        private String name;
        private double salary;
        ....................
    }

I apply a schema to an RDD and register it as a table:

    JavaRDD<Person> rdds = ...
    rdds.cache();
    DataFrame dataFrame = sqlContext.createDataFrame(rdds, Person.class);
    dataFrame.registerTempTable("person");
    sqlContext.cacheTable("person");

Then I run a SQL query:

    sqlContext.sql("SELECT id, name, salary FROM person WHERE salary >= YYY AND salary <= XXX").collectAsList()

I launch a standalone cluster with 4 workers. Each worker runs on a machine with 8 CPUs and 15 GB of memory. When I run the query in this environment over an RDD containing 1 million persons, it takes 1 minute.

Can somebody tell me how to tune the performance?
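
For reference, here is a minimal sketch of what the elided part of the Person bean might look like. Everything beyond the three fields shown above is an assumption for illustration: the constructors, the getters and setters, the field order in writeExternal/readExternal, and the roundTrip helper are hypothetical and not taken from my real code. The two points it relies on are that Externalizable needs a public no-arg constructor, and that createDataFrame(rdds, Person.class) infers the columns from the bean's getters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class Person implements Externalizable {
    private int id;
    private String name;
    private double salary;

    // Externalizable requires a public no-arg constructor for deserialization.
    public Person() {}

    // Hypothetical convenience constructor, not part of the original post.
    public Person(int id, String name, double salary) {
        this.id = id;
        this.name = name;
        this.salary = salary;
    }

    // Bean-style accessors: schema inference from a JavaBean class reads these.
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public double getSalary() { return salary; }
    public void setSalary(double salary) { this.salary = salary; }

    // Write the fields in a fixed order; readExternal must mirror this order.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(id);
        out.writeObject(name); // writeObject tolerates a null name
        out.writeDouble(salary);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        id = in.readInt();
        name = (String) in.readObject();
        salary = in.readDouble();
    }

    // Hypothetical helper: serialize to a byte array and read the object back.
    static Person roundTrip(Person p) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(p);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Person) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Person copy = roundTrip(new Person(1, "Alice", 1000.0));
        System.out.println(copy.getId() + " " + copy.getName() + " " + copy.getSalary());
    }
}
```

The round trip in main only checks that the custom serialization is symmetric; it says nothing about where the minute is spent in the cluster.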