See
https://databricks.com/blog/2016/05/19/approximate-algorithms-in-apache-spark-hyperloglog-and-quantiles.html
You most probably do not require exact counts.
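For example, if the counts you need are distinct counts per group, a sketch of an approximate version (group_col, id, and temp_view are illustrative names, not from your code):

    // 0.05 = maximum relative error allowed (5%)
    Dataset<Row> approx = sparkSession.sql(
        "SELECT group_col, approx_count_distinct(id, 0.05) AS approx_uniques "
      + "FROM temp_view GROUP BY group_col");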
On Tue, Dec 11, 2018 at 02:09, 15313776907 <15313776...@163.com> wrote:
> I think you can add executor memory
>
> 15313776907
>
I think you can add executor memory.
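For example (the 8g value and application names here are illustrative, not from this thread):

    spark-submit --executor-memory 8g --class com.example.MyJob my-job.jar

or set spark.executor.memory in your Spark configuration.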
15313776907
Email: 15313776...@163.com
On 12/11/2018 08:28, lsn24 wrote:
Hello,
I have a requirement where I need to get the total count of rows and the
total count of failedRows, based on a grouping.
The code looks like this:
myDataset.createOrReplaceTempView("temp_view");
Dataset countDataset = sparkSession.sql("Select
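The original statement is cut off above; a sketch of what such a query could look like, where group_col and the boolean failed flag are assumptions, not the poster's actual schema:

    // total rows plus a conditional count of failed rows, per group
    Dataset<Row> countDataset = sparkSession.sql(
        "SELECT group_col, "
      + "COUNT(*) AS total_rows, "
      + "SUM(CASE WHEN failed THEN 1 ELSE 0 END) AS failed_rows "
      + "FROM temp_view GROUP BY group_col");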
I found out the problem. Grouping by a constant column value is indeed
impossible.
The reason it was working in my project is that I gave the constant
column an alias that exists in the schema of the dataframe. The dataframe
contained a data_timestamp representing an hour, and I added to the
Hi,
I'm facing a bug with GROUP BY in Spark SQL (version 1.4).
I registered a JavaRDD of objects containing integer fields as a table.
Then I'm trying to do a group by with a constant value in the group-by
fields:
SELECT primary_one, primary_two, 10 as num, SUM(measure) as total_measures
FROM tbl
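A sketch of the usual workaround (the statement above is cut off before its GROUP BY; the fix is to keep the literal in the select list but group only by the real columns):

    SELECT primary_one, primary_two, 10 AS num, SUM(measure) AS total_measures
    FROM tbl
    GROUP BY primary_one, primary_two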
Hi,
I am trying to issue a SQL query against a Parquet file and am getting
errors; I would like some help figuring out what is going on.
The SQL:
select timestamp, count(rid), qi.clientname from records where timestamp >
0 group by qi.clientname
I am getting the following error:
You can't use columns (timestamp) that aren't in the GROUP BY clause.
Spark 1.2+ gives you a better error message for this case.
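So either add timestamp to the GROUP BY or drop it from the select list; e.g. a sketch of the latter (rid_count is an illustrative alias):

    select qi.clientname, count(rid) as rid_count
    from records
    where timestamp > 0
    group by qi.clientname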
On Fri, Feb 6, 2015 at 3:12 PM, Mohnish Kodnani mohnish.kodn...@gmail.com
wrote:
Hi,
I am trying to issue a SQL query against a Parquet file and am getting
errors
Doh :) Thanks.. seems like brain freeze.
On Fri, Feb 6, 2015 at 3:22 PM, Michael Armbrust mich...@databricks.com
wrote:
You can't use columns (timestamp) that aren't in the GROUP BY clause.
Spark 1.2+ gives you a better error message for this case.
On Fri, Feb 6, 2015 at 3:12 PM, Mohnish