Hi Michael,

I think arthur.hk.chan isn't here right now, so I can share some details:

1) My Spark version is 1.0.1.

2) When I use a multiple join, like this:

sql("SELECT * FROM youhao_data left join youhao_age on (youhao_data.rowkey=youhao_age.rowkey) left join youhao_totalKiloMeter on (youhao_age.rowkey=youhao_totalKiloMeter.rowkey)")

youhao_data, youhao_age, and youhao_totalKiloMeter were all registered via registerAsTable.
I get this exception:

Exception in thread "main" java.lang.RuntimeException: [1.90] failure: ``UNION'' expected but `left' found

SELECT * FROM youhao_data left join youhao_age on (youhao_data.rowkey=youhao_age.rowkey) left join youhao_totalKiloMeter on (youhao_age.rowkey=youhao_totalKiloMeter.rowkey)
                                                                                         ^
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:69)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:181)
        at org.apache.spark.examples.sql.SparkSQLHBaseRelation$.main(SparkSQLHBaseRelation.scala:140)
        at org.apache.spark.examples.sql.SparkSQLHBaseRelation.main(SparkSQLHBaseRelation.scala)

boyingk...@163.com

From: Michael Armbrust
Date: 2014-09-11 00:28
To: arthur.hk.c...@gmail.com
CC: arunshell87; u...@spark.incubator.apache.org
Subject: Re: Spark SQL -- more than two tables for join

What version of Spark SQL are you running here? I think a lot of your concerns have likely been addressed in more recent versions of the code / documentation. (Spark 1.1 should be published in the next few days.)

In particular, for serious applications you should use a HiveContext and HiveQL, as this is a much more complete implementation of a SQL parser. The one in SQLContext is only suggested if the Hive dependencies conflict with your application.

1) Spark SQL does not support multiple joins

This is not true. What problem were you running into?

2) Spark left join has performance issues

Can you describe your data and query more?

3) Spark SQL's cache table does not support two-tier queries

I'm not sure what you mean here.

4) Spark SQL does not support repartition

You can repartition SchemaRDDs in the same way as normal RDDs.
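Following Michael's suggestion, the failing multi-join can be run through a HiveContext instead of the basic SQLContext. This is a minimal sketch against the Spark 1.0.x API, assuming `sc` is an existing SparkContext and the three tables have already been registered; the table and column names are taken from the query above.

```scala
// Sketch for Spark 1.0.x: HiveQL's parser accepts chained LEFT JOINs,
// which the simple SqlParser behind SQLContext.sql rejects in 1.0.1.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// In Spark 1.0.x the HiveQL entry point is hql(...)
val joined = hiveContext.hql("""
  SELECT *
  FROM youhao_data
  LEFT JOIN youhao_age
    ON youhao_data.rowkey = youhao_age.rowkey
  LEFT JOIN youhao_totalKiloMeter
    ON youhao_age.rowkey = youhao_totalKiloMeter.rowkey
""")

joined.collect().foreach(println)
```

Note that this requires building Spark with Hive support (the spark-hive module); if the Hive dependencies conflict with your application, the SQLContext parser is the fallback, but in 1.0.1 it does not handle this query shape.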
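On point 4, since a SchemaRDD is also an RDD[Row], the ordinary RDD repartitioning methods apply directly. A small sketch, assuming `joined` is a SchemaRDD produced by a query as above and the partition counts are arbitrary examples:

```scala
// Redistribute into 16 partitions (incurs a full shuffle).
val repartitioned = joined.repartition(16)

// Shrink to 4 partitions without a full shuffle where possible.
val coalesced = joined.coalesce(4)
```

In Spark 1.0.x these return a plain RDD[Row] rather than a SchemaRDD, so re-register the result if you want to query it again by name.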