Would you mind sharing the DDLs of all the tables involved? What format are these tables stored in? Is this issue specific to this query? I assume Hive, Shark and Spark SQL all read from the same HDFS dataset?
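
For reference, here is a minimal sketch of how that information could be collected from the Hive CLI. It assumes the table names used in the quoted query and a Hive version that supports `SHOW CREATE TABLE` (0.10 or later). The last two statements are only a suggestion for narrowing the problem down by comparing the intermediate joins across the three engines; they are not a known fix.

```sql
-- Print the full DDL of each table involved in the query
-- (table names are taken from the query in the quoted message).
SHOW CREATE TABLE tblA;
SHOW CREATE TABLE tblB;
SHOW CREATE TABLE tbldate;

-- Show storage details (InputFormat, SerDe, location) for a table;
-- repeat for tblB and tbldate.
DESCRIBE FORMATTED tblA;

-- Narrower queries to run on Hive, Shark and Spark SQL:
-- if the row counts already differ here, the join is the suspect
-- rather than the aggregation.
SELECT count(*) FROM tblA a JOIN tblB b ON a.number = b.number;
SELECT count(*) FROM tblA a JOIN tbldate c ON a.dateid = c.dateid;
```

If the counts match across engines but the sums do not, the types of `amount` and of the join keys would be the next thing to look at.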

On 10/27/14 3:45 PM, lyf刘钰帆 wrote:

Hi,

I have recently been using Spark SQL 1.1.0 with CDH 4.6.0. However, Spark SQL returns weird results that differ from those of Hive (Shark).

The SQL:

select c.theyear, sum(b.amount) from tblA a join tblB b on a.number = b.number join tbldate c on a.dateid = c.dateid group by c.theyear order by theyear;

where

dateID string,
theyear string,
amount int,
number string

Result by Shark:

+----------+-----------+
| theyear  |    _c1    |
+----------+-----------+
| 2004     | *1403018* |
| 2005     | 5557850   |
| 2006     | 7203061   |
| 2007     | 11300432  |
| 2008     | 12109328  |
| 2009     | 5365447   |
| 2010     | 188944    |
+----------+-----------+

Result by Hive:

theyear      _c1
2004         *1403018*
2005         5557850
2006         7203061
2007         11300432
2008         12109328
2009         5365447
2010         188944

Result by SparkSQL:

+----------+-----------+
| theyear  |    c_1    |
+----------+-----------+
| 2004     | *3265696* |
| 2005     | 13247234  |
| 2006     | 13670416  |
| 2007     | 16711974  |
| 2008     | 14670698  |
| 2009     | 6322137   |
| 2010     | 210924    |
+----------+-----------+

Best regards

Patrick Liu (刘钰帆) | 1#3F122

