Spark SQL with a large SQL statement: job failed with OutOfMemory error and "grows beyond 64 KB" warning

2016-04-24 Thread FangFang Chen
Hi all, with a large SQL command the job failed with the following error. Please give your suggestions on how to resolve it. Thanks. SQL file size: 676 KB. Log: 16/04/25 10:55:00 WARN TaskSetManager: Lost task 84.0 in stage 0.0 (TID 6, BJHC-HADOOP-HERA-17493.jd.local):
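The "grows beyond 64 KB" warning typically comes from Spark's code generation: a single expression in the 676 KB statement (for example a huge IN list or a long CASE WHEN chain) gets compiled into one JVM method that exceeds the 64 KB bytecode limit. A minimal sketch of one common workaround, splitting the oversized predicate into batches and unioning the partial results; all names here (sqlContext, events, id, idList) are placeholders, not taken from the original post:

    // Split a huge IN (...) list into batches so no single generated
    // JVM method grows past the 64 KB bytecode limit.
    val idList: Seq[String] = (1 to 50000).map(i => s"id_$i")  // placeholder data

    val partials = idList.grouped(1000).map { batch =>
      val inList = batch.map(id => s"'$id'").mkString(",")
      sqlContext.sql(s"SELECT * FROM events WHERE id IN ($inList)")
    }.toSeq

    // Stitch the per-batch results back into one DataFrame
    // (unionAll is the Spark 1.x name; Spark 2.x renamed it to union).
    val result = partials.reduce(_ unionAll _)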

Re: Re: Re: Spark SQL and Hive give different results with the same SQL

2016-04-21 Thread FangFang Chen
type from decimal to decimal with precision. Thanks. Sent from NetEase Mail Master. On 2016-04-20 20:47, Ted Yu wrote: Do you mind trying out a build from the master branch? 1.5.3 is a bit old. On Wed, Apr 20, 2016 at 5:25 AM, FangFang Chen <lulynn_2015_sp...@163.com> wrote: I found Spark SQL lost precision,

Re: Re: Spark SQL and Hive give different results with the same SQL

2016-04-20 Thread FangFang Chen
int data >= 0.5 then to 1. Is this a bug or some configuration thing? Please give some suggestions. Thanks. Sent from NetEase Mail Master. On 2016-04-20 18:45, FangFang Chen wrote: The output is: Spark SQL: 6828127, Hive: 6980574.1269. Sent from NetEase Mail Master. On 2016-04-20 18:06, FangFang Chen wrote: Hi all, please give some suggestions.

Re: Spark SQL and Hive give different results with the same SQL

2016-04-20 Thread FangFang Chen
The output is: Spark SQL: 6828127, Hive: 6980574.1269. Sent from NetEase Mail Master. On 2016-04-20 18:06, FangFang Chen wrote: Hi all, please give some suggestions. Thanks. With the same SQL below, Spark SQL and Hive give different results. The SQL sums a decimal(38,18) column: SELECT SUM(column) FROM table; column

Spark SQL and Hive give different results with the same SQL

2016-04-20 Thread FangFang Chen
Hi all, please give some suggestions. Thanks. With the same SQL below, Spark SQL and Hive give different results. The SQL sums a decimal(38,18) column: SELECT SUM(column) FROM table; column is defined as decimal(38,18). Spark version: 1.5.3. Hive version: 2.0.0. Sent from NetEase Mail Master.
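A minimal sketch of how one might narrow down where the precision is dropped, assuming a Hive-backed table t with a column amount declared decimal(38,18) (placeholder names, not from the thread):

    // Inspect the result type Spark plans for the aggregate; if the sum
    // is carried in a narrower decimal or falls back to double, digits
    // can be lost before the final result is produced.
    sqlContext.sql("SELECT SUM(amount) FROM t").printSchema()

    // Pin both the input and the result to the declared precision, then
    // compare against Hive's answer for the same statement:
    //   hive> SELECT SUM(amount) FROM t;
    sqlContext.sql(
      "SELECT CAST(SUM(CAST(amount AS decimal(38,18))) AS decimal(38,18)) FROM t"
    ).show()

The thread reports Spark 1.5.3 returning 6828127 against Hive 2.0.0's 6980574.1269 for the same data, and Ted Yu's suggestion to try a build from master hints that the behavior may already be fixed in newer code.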

dataframe.groupBy.agg vs sql("select ... from ... group by ...")

2016-03-10 Thread FangFang Chen
Hi, based on my testing the memory cost is very different for: 1. sql("select * from ...").groupBy.agg; 2. sql("select ... from ... group by ..."). For a table partition larger than 500 GB, variant 2 runs fine while variant 1 fails with OutOfMemoryError, using the same Spark configuration for both. Could somebody
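A minimal sketch of the two variants being compared, with placeholder names (t, k, v). In principle Catalyst can prune unused columns in both plans, so when memory behavior diverges this sharply at ~500 GB the plans are worth inspecting with explain:

    import org.apache.spark.sql.functions.sum

    // Variant 1: DataFrame API layered on SELECT * — the starting plan
    // nominally selects every column of the table.
    val v1 = sqlContext.sql("SELECT * FROM t").groupBy("k").agg(sum("v"))

    // Variant 2: the whole aggregation expressed as one SQL statement.
    val v2 = sqlContext.sql("SELECT k, SUM(v) AS total FROM t GROUP BY k")

    // Print the analyzed/optimized/physical plans to see where they diverge.
    v1.explain(true)
    v2.explain(true)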