Hi,
I have a Spark SQL performance issue. My code contains a simple JavaBean:

    public class Person implements Externalizable {
        private int id;
        private String name;
        private double salary;
        ....................
    }
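For anyone who wants to reproduce this, here is a minimal self-contained sketch of such a bean. The constructors, the getters/setters and the bodies of writeExternal/readExternal are assumptions for illustration, not the elided part of the original class. As far as I understand, createDataFrame(rdd, Person.class) derives the columns from the bean's getters, so the class needs them.

```java
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class Person implements Externalizable {
    private int id;
    private String name;
    private double salary;

    // Externalizable requires a public no-argument constructor.
    public Person() {
    }

    // Convenience constructor (assumed, not in the original post).
    public Person(int id, String name, double salary) {
        this.id = id;
        this.name = name;
        this.salary = salary;
    }

    // JavaBean accessors; one getter/setter pair per field.
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public double getSalary() { return salary; }
    public void setSalary(double salary) { this.salary = salary; }

    // Write the fields in a fixed order; writeObject also copes with a null name.
    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(id);
        out.writeObject(name);
        out.writeDouble(salary);
    }

    // Read the fields back in exactly the order they were written.
    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        id = in.readInt();
        name = (String) in.readObject();
        salary = in.readDouble();
    }
}
```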


I apply a schema to an RDD and register it as a table:

    JavaRDD<Person> rdds = ...
    rdds.cache();

    DataFrame dataFrame = sqlContext.createDataFrame(rdds, Person.class);
    dataFrame.registerTempTable("person");

    sqlContext.cacheTable("person");


Then I run a SQL query:

    sqlContext.sql("SELECT id, name, salary FROM person WHERE salary >= YYY AND salary <= XXX").collectAsList();


I launched a standalone cluster with 4 workers. Each worker runs on a
machine with 8 CPUs and 15 GB of memory. When I run the query in this
environment against an RDD of 1 million persons, it takes 1 minute. Can
somebody tell me how to tune the performance?
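To narrow down where the minute goes, it may help to time several consecutive runs of the same query: if the table cache is only populated on first use, the first run would include that cost and later runs would not. The helper below is a rough sketch in plain Java; the class name QueryTimer and the method timeRuns are hypothetical names introduced here, not part of Spark.

```java
import java.util.concurrent.TimeUnit;

public class QueryTimer {

    /**
     * Runs the task the given number of times, one after another, and
     * returns the wall-clock time of each run in milliseconds.
     */
    public static long[] timeRuns(int runs, Runnable task) {
        long[] elapsedMs = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsedMs[i] = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        }
        return elapsedMs;
    }
}
```

The Runnable passed in would wrap the sqlContext.sql(...).collectAsList() call from above. Comparing the first entry of the returned array with the later ones shows whether the time is a one-off cost or is paid on every query.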



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-performance-issue-tp22627.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
