[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2019-05-20 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-19102:
-
Labels: bulk-closed  (was: )

> Accuracy error of spark SQL results
> ---
>
> Key: SPARK-19102
> URL: https://issues.apache.org/jira/browse/SPARK-19102
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0, 1.6.1
> Environment: Spark 1.6.0, Hadoop 2.6.0,JDK 1.8,CentOS6.6
>Reporter: XiaodongCui
>Priority: Major
>  Labels: bulk-closed
> Attachments: a.zip
>
>
> The problem: cube6's second column, sumprice, is twice as large as cube5's
> second column, sumprice, but they should be equal. The first SQL is "SELECT
> areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY
> areacode1"; the second SQL is "SELECT areacode1, SUM(quantity*unitprice) AS
> sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1".
> The sumprice results should be equal, but in practice they are not.
> code:
> 
>   DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
>   df1.registerTempTable("hd_salesflat");
>   DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
>   DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
>   cube5.select("sumprice").show(50);
>   cube6.select("sumprice").show(50);
> 
> my data (in the attached file) has only one row and four columns:
> transno | quantity | unitprice | areacode1
> 76317828|  1.  |  25.  |  HDCN
> data schema:
>  |-- areacode1: string (nullable = true)
>  |-- quantity: decimal(20,4) (nullable = true)
>  |-- unitprice: decimal(20,4) (nullable = true)
>  |-- transno: string (nullable = true)
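For reference, the expected sumprice for the single sample row can be checked outside Spark with plain BigDecimal arithmetic (a standalone sketch, not from the original report; it assumes the truncated row values shown above are 1 and 25, and the class and method names are illustrative):

```java
import java.math.BigDecimal;

public class SumPriceCheck {
    // SUM(quantity * unitprice) over the single sample row reduces to one
    // multiplication; both cube5 and cube6 should return exactly this value.
    static BigDecimal expectedSumPrice(BigDecimal quantity, BigDecimal unitPrice) {
        return quantity.multiply(unitPrice);
    }

    public static void main(String[] args) {
        // Assumed row values, stored as decimal(20,4) per the schema above.
        BigDecimal quantity = new BigDecimal("1.0000");
        BigDecimal unitPrice = new BigDecimal("25.0000");
        System.out.println(expectedSumPrice(quantity, unitPrice)); // prints 25.00000000
    }
}
```

BigDecimal multiplication adds the operands' scales (4 + 4 = 8), so the exact expected sum is 25.00000000; cube6 returning double this is the reported bug.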

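One way to sidestep the mismatch while the bug is open (a sketch, not from the original report, and untested against this data) is to compute the two aggregates in separate subqueries and join them on the group key, so SUM(quantity*unitprice) never shares a single aggregation plan with COUNT(DISTINCT transno):

```sql
SELECT s.areacode1, s.sumprice, d.transcount
FROM (SELECT areacode1, SUM(quantity * unitprice) AS sumprice
      FROM hd_salesflat
      GROUP BY areacode1) s
JOIN (SELECT areacode1, COUNT(DISTINCT transno) AS transcount
      FROM hd_salesflat
      GROUP BY areacode1) d
  ON s.areacode1 = d.areacode1
```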


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-18 Thread XiaodongCui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:

Description: 
The problem: cube6's second column, sumprice, is twice as large as cube5's
second column, sumprice, but they should be equal. The first SQL is "SELECT
areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY
areacode1"; the second SQL is "SELECT areacode1, SUM(quantity*unitprice) AS
sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1".
The sumprice results should be equal, but in practice they are not.

code:

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.select("sumprice").show(50);
cube6.select("sumprice").show(50);

my data (in the attached file) has only one row and four columns:
transno | quantity | unitprice | areacode1
76317828|  1.  |  25.  |  HDCN

data schema:
 |-- areacode1: string (nullable = true)
 |-- quantity: decimal(20,4) (nullable = true)
 |-- unitprice: decimal(20,4) (nullable = true)
 |-- transno: string (nullable = true)




[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-18 Thread XiaodongCui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:

Description: 
The problem: cube6's second column, sumprice, is twice as large as cube5's
second column, sumprice, but they should be equal. The first SQL is "SELECT
areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY
areacode1"; the second SQL is "SELECT areacode1, SUM(quantity*unitprice) AS
sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1".
The sumprice results should be equal, but in practice they are not.

code:

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.select("sumprice").show(50);
cube6.select("sumprice").show(50);

my  data:
transno | quantity | unitprice | areacode1
76317828|  1.  |  25.  |  HDCN

data schema:
 |-- areacode1: string (nullable = true)
 |-- quantity: decimal(20,4) (nullable = true)
 |-- unitprice: decimal(20,4) (nullable = true)
 |-- transno: string (nullable = true)






[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-10 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-19102:
-
Component/s: (was: Spark Core)







[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-08 Thread XiaodongCui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:

Description: 
The problem: cube6's second column, sumprice, is twice as large as cube5's
second column, sumprice, but they should be equal. The bug only reproduces
when the query combines the pattern SUM(a * b) with COUNT(DISTINCT c).

code:

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.select("sumprice").show(50);
cube6.select("sumprice").show(50);

my  data:
transno | quantity | unitprice | areacode1
76317828|  1.  |  25.  |  HDCN

data schema:
 |-- areacode1: string (nullable = true)
 |-- quantity: decimal(20,4) (nullable = true)
 |-- unitprice: decimal(20,4) (nullable = true)
 |-- transno: string (nullable = true)









[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-06 Thread XiaodongCui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:

Description: 
The problem: cube6's second column, sumprice, is twice as large as cube5's
second column, sumprice, but they should be equal. The bug only reproduces
when the query combines the pattern SUM(a * b) with COUNT(DISTINCT c).

code:

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.show(50);
cube6.show(50);

my  data:
transno | quantity | unitprice | areacode1
76317828|  1.  |  25.  |  HDCN

data schema:
 |-- areacode1: string (nullable = true)
 |-- quantity: decimal(20,4) (nullable = true)
 |-- unitprice: decimal(20,4) (nullable = true)
 |-- transno: string (nullable = true)









[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-06 Thread XiaodongCui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:

Description: 
The problem: cube6's second column, sumprice, is twice as large as cube5's
second column, sumprice, but they should be equal. The bug only reproduces
when the query combines the pattern SUM(a * b) with COUNT(DISTINCT c).

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.show(50);
cube6.show(50);

my  data:
transno | quantity | unitprice | areacode1
76317828|  1.  |  25.  |  HDCN

data schema:
 |-- areacode1: string (nullable = true)
 |-- quantity: decimal(20,4) (nullable = true)
 |-- unitprice: decimal(20,4) (nullable = true)
 |-- transno: string (nullable = true)









[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-06 Thread XiaodongCui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:

Attachment: a.zip

The attached file contains my data; the data is in Parquet format.







[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-06 Thread XiaodongCui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:

Description: 
The problem, shown by the code below, is that the second column's value is
not the same: the second SQL's result is twice the first SQL's result. The
bug only reproduces when the query combines the pattern SUM(a * b) with
COUNT(DISTINCT c).

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.show(50);
cube6.show(50);

my  data:
transno | quantity | unitprice | areacode1
76317828|  1.  |  25.  |  HDCN

data schema:
 |-- areacode1: string (nullable = true)
 |-- quantity: decimal(20,4) (nullable = true)
 |-- unitprice: decimal(20,4) (nullable = true)
 |-- transno: string (nullable = true)









[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-06 Thread XiaodongCui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:

Description: 
The problem, shown by the code below, is that the second column's value is
not the same: the second SQL's result is twice the first SQL's result. The
bug only reproduces when the query combines the pattern SUM(a * b) with
COUNT(DISTINCT c).

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.show(50);
cube6.show(50);

my  data:
transno | quantity | unitprice | areacode1
76317828|  1.  |  25.  |  HDCN









[jira] [Updated] (SPARK-19102) Accuracy error of spark SQL results

2017-01-06 Thread XiaodongCui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiaodongCui updated SPARK-19102:

Description: 
The problem is that the second column's value in the results of the code below differs: the second SQL result is twice as large as the first. The bug only reproduces with the pattern SUM(a * b) combined with COUNT(DISTINCT c).

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.show(50);
cube6.show(50);

  was:
The problem is that the second column's value in the results of the code below differs: the second SQL result is twice as large as the first. The bug only reproduces with the pattern SUM(a * b) combined with COUNT(DISTINCT c).

DataFrame df1 = sqlContext.read().parquet("hdfs://cdh01:8020/sandboxdata_A/test/a");
df1.registerTempTable("hd_salesflat");
DataFrame cube5 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice FROM hd_salesflat GROUP BY areacode1");
DataFrame cube6 = sqlContext.sql("SELECT areacode1, SUM(quantity*unitprice) AS sumprice, COUNT(DISTINCT transno) FROM hd_salesflat GROUP BY areacode1");
cube5.show(50);
cube6.show(50);

my data: a single row from hd_salesflat (the table has 100+ columns; only the relevant fields are shown):
transno  | quantity | unitprice | areacode1
76317828 | 1.       | 25.       | HDCN

