Hi Michael,

I think arthur.hk.chan isn't here right now, so I can share some details:

1) My Spark version is 1.0.1.

2) When I use a multiple join, like this:

sql("SELECT * FROM youhao_data left join youhao_age on (youhao_data.rowkey=youhao_age.rowkey) left join youhao_totalKiloMeter on (youhao_age.rowkey=youhao_totalKiloMeter.rowkey)")

youhao_data, youhao_age, and youhao_totalKiloMeter were all registered via registerAsTable.
I get this exception:

Exception in thread "main" java.lang.RuntimeException: [1.90] failure: ``UNION'' expected but `left' found

SELECT * FROM youhao_data left join youhao_age on (youhao_data.rowkey=youhao_age.rowkey) left join youhao_totalKiloMeter on (youhao_age.rowkey=youhao_totalKiloMeter.rowkey)
                                                                                         ^
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.catalyst.SqlParser.apply(SqlParser.scala:60)
        at org.apache.spark.sql.SQLContext.parseSql(SQLContext.scala:69)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:181)
        at org.apache.spark.examples.sql.SparkSQLHBaseRelation$.main(SparkSQLHBaseRelation.scala:140)
        at org.apache.spark.examples.sql.SparkSQLHBaseRelation.main(SparkSQLHBaseRelation.scala)

boyingk...@163.com

From: Michael Armbrust
Date: 2014-09-11 00:28
To: arthur.hk.c...@gmail.com
CC: arunshell87; u...@spark.incubator.apache.org
Subject: Re: Spark SQL -- more than two tables for join

What version of Spark SQL are you running here? I think a lot of your concerns have likely been addressed in more recent versions of the code / documentation. (Spark 1.1 should be published in the next few days.)

In particular, for serious applications you should use a HiveContext and HiveQL, as this is a much more complete implementation of a SQL parser. The one in SQLContext is only suggested if the Hive dependencies conflict with your application.

1) Spark SQL does not support multiple joins

This is not true. What problem were you running into?

2) Spark left join has performance issues

Can you describe your data and query more?

3) Spark SQL's cache table does not support two-tier queries

I'm not sure what you mean here.

4) Spark SQL does not support repartition

You can repartition SchemaRDDs in the same way as normal RDDs.
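Following Michael's suggestion, the failing multi-join can be run through a HiveContext instead of the basic SQLContext. This is a minimal sketch against the Spark 1.0.x API, assuming `sc` is an existing SparkContext and the three tables have already been registered; the table and column names are taken from the query above.

```scala
// Sketch for Spark 1.0.x: HiveQL's parser accepts chained LEFT JOINs,
// which the simple SqlParser behind SQLContext.sql rejects in 1.0.1.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// In Spark 1.0.x the HiveQL entry point is hql(...)
val joined = hiveContext.hql("""
  SELECT *
  FROM youhao_data
  LEFT JOIN youhao_age
    ON youhao_data.rowkey = youhao_age.rowkey
  LEFT JOIN youhao_totalKiloMeter
    ON youhao_age.rowkey = youhao_totalKiloMeter.rowkey
""")

joined.collect().foreach(println)
```

Note that this requires building Spark with Hive support (the spark-hive module); if the Hive dependencies conflict with your application, the SQLContext parser is the fallback, but in 1.0.1 it does not handle this query shape.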
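On point 4, since a SchemaRDD is also an RDD[Row], the ordinary RDD repartitioning methods apply directly. A small sketch, assuming `joined` is a SchemaRDD produced by a query as above and the partition counts are arbitrary examples:

```scala
// Redistribute into 16 partitions (incurs a full shuffle).
val repartitioned = joined.repartition(16)

// Shrink to 4 partitions without a full shuffle where possible.
val coalesced = joined.coalesce(4)
```

In Spark 1.0.x these return a plain RDD[Row] rather than a SchemaRDD, so re-register the result if you want to query it again by name.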