Would you mind sharing the DDLs of all the tables involved? What format are
these tables stored in? Is this issue specific to this query? I guess
Hive, Shark and Spark SQL all read from the same HDFS dataset?
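In case it is useful, here is a rough sketch of what could be run on each engine to collect that information. It uses the table and column names from your query; joined_rows is only an illustrative alias, and I am assuming DESCRIBE FORMATTED is accepted by all three engines in your setup.

```sql
-- Column types and storage format of each table involved
DESCRIBE FORMATTED tblA;
DESCRIBE FORMATTED tblB;
DESCRIBE FORMATTED tbldate;

-- Row count per year after the same joins. If these counts differ
-- between engines, the joins are producing different rows, and the
-- difference is not in SUM itself.
SELECT c.theyear, COUNT(*) AS joined_rows
FROM tblA a
JOIN tblB b ON a.number = b.number
JOIN tbldate c ON a.dateid = c.dateid
GROUP BY c.theyear
ORDER BY theyear;
```

Comparing joined_rows for 2004 across Hive, Shark and Spark SQL should show whether the discrepancy comes from the joins or from the aggregation.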
On 10/27/14 3:45 PM, lyf刘钰帆 wrote:
Hi,
I have recently been using SparkSQL 1.1.0 with CDH 4.6.0. However,
SparkSQL returns weird results that differ from those of Hive (Shark).
The SQL:
select c.theyear, sum(b.amount) from tblA a join tblB b on a.number =
b.number join tbldate c on a.dateid = c.dateid group by c.theyear
order by theyear;
where the column types are:
dateID string,
theyear string,
amount int,
number string
Result by Shark:
+----------+-----------+
| theyear | _c1 |
+----------+-----------+
| 2004 | *1403018* |
| 2005 | 5557850 |
| 2006 | 7203061 |
| 2007 | 11300432 |
| 2008 | 12109328 |
| 2009 | 5365447 |
| 2010 | 188944 |
+----------+-----------+
Result by Hive:
theyear _c1
2004 *1403018*
2005 5557850
2006 7203061
2007 11300432
2008 12109328
2009 5365447
2010 188944
Result by SparkSQL:
+----------+-----------+
| theyear | c_1 |
+----------+-----------+
| 2004 | *3265696* |
| 2005 | 13247234 |
| 2006 | 13670416 |
| 2007 | 16711974 |
| 2008 | 14670698 |
| 2009 | 6322137 |
| 2010 | 210924 |
+----------+-----------+
Best regards
Patrick Liu刘 钰帆| 1#3F122