[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.

2014-03-17 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938196#comment-13938196
 ] 

Jitendra Nath Pandey commented on HIVE-6664:


Ran tests locally. The only failures were show_create_table_serde.q and 
metadata_only_queries_with_filters.q, which are unrelated to this patch.

 Vectorized variance computation differs from row mode computation.
 --

 Key: HIVE-6664
 URL: https://issues.apache.org/jira/browse/HIVE-6664
 Project: Hive
  Issue Type: Bug
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-6664.1.patch, HIVE-6664.1.patch, HIVE-6664.1.patch


 The following query shows the difference:
 select var_samp(ss_sales_price), var_pop(ss_sales_price),
 stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales;
 The difference arises because, when computing variance, row mode converts each
 decimal value to double up front and sums the values as doubles, whereas vector
 mode computes the local aggregate sum as a decimal and converts it to double
 only at flush.
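A minimal Python sketch of the two summation strategies described above, reduced to the sum step only (the real variance aggregation involves more state). The helper names `row_mode_sum` and `vector_mode_sum` and the sample prices are hypothetical, chosen only to illustrate why the two orders of conversion can disagree.

```python
from decimal import Decimal

def row_mode_sum(values):
    # Hypothetical model of row mode: convert each decimal to double first,
    # then accumulate in double precision (rounding error can build up).
    total = 0.0
    for v in values:
        total += float(v)
    return total

def vector_mode_sum(values):
    # Hypothetical model of vector mode before the fix: accumulate exactly
    # as a decimal and convert to double only once, at flush.
    total = Decimal(0)
    for v in values:
        total += v
    return float(total)

# Illustrative ss_sales_price values (not real data).
prices = [Decimal("0.10")] * 10
print(row_mode_sum(prices))     # 0.9999999999999999
print(vector_mode_sum(prices))  # 1.0
```

Any downstream statistic built from these sums (var_pop, var_samp, and the stddev variants) inherits the discrepancy, which is why the two execution modes can report different results.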



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.

2014-03-17 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938407#comment-13938407
 ] 

Jitendra Nath Pandey commented on HIVE-6664:


I have committed this to trunk.

[~rhbutani] This bug affects hive-0.13 and produces results that differ from 
row-mode execution. It should be fixed in branch-0.13 as well.




[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.

2014-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936074#comment-13936074
 ] 

Hive QA commented on HIVE-6664:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12634667/HIVE-6664.1.patch

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1803/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1803/console

Messages:
{noformat}
 This message was trimmed, see log for full details 
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory 
/data/hive-ptest/working/apache-svn-trunk-source/hwi/src/test/resources
[INFO] Copying 3 resources
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-hwi ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/hwi/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf
 [copy] Copying 5 files to 
/data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ 
hive-hwi ---
[INFO] Compiling 2 source files to 
/data/hive-ptest/working/apache-svn-trunk-source/hwi/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-hwi ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-hwi ---
[INFO] Building jar: 
/data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-hwi ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-hwi ---
[INFO] Installing 
/data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar
 to 
/data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.jar
[INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hwi/pom.xml 
to 
/data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive ODBC 0.14.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-odbc ---
[INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/odbc (includes 
= [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-odbc ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-odbc ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-odbc ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/odbc/target/warehouse
[mkdir] Created dir: 
/data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf
 [copy] Copying 5 files to 
/data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ 
hive-odbc ---
[INFO] 
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-odbc ---
[INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/odbc/pom.xml 
to 
/data/hive-ptest/working/maven/org/apache/hive/hive-odbc/0.14.0-SNAPSHOT/hive-odbc-0.14.0-SNAPSHOT.pom
[INFO] 
[INFO] 
[INFO] Building Hive Shims Aggregator 0.14.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-shims-aggregator 
---
[INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/shims 
(includes = [datanucleus.log, derby.log], excludes = [])
[INFO] 
[INFO] --- maven-remote-resources-plugin:1.5:process (default) @ 
hive-shims-aggregator ---
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ 
hive-shims-aggregator ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ 
hive-shims-aggregator ---
[INFO] Executing tasks

main:
[mkdir] Created dir: 

[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.

2014-03-15 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936320#comment-13936320
 ] 

Hive QA commented on HIVE-6664:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12634938/HIVE-6664.1.patch

Test results: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1843/testReport
Console output: 
http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1843/console

Messages:
{noformat}
 This message was trimmed, see log for full details 

[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.

2014-03-14 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934793#comment-13934793
 ] 

Jitendra Nath Pandey commented on HIVE-6664:


Review Board: https://reviews.apache.org/r/19216/

 Vectorized variance computation differs from row mode computation.
 --

 Key: HIVE-6664
 URL: https://issues.apache.org/jira/browse/HIVE-6664
 Project: Hive
  Issue Type: Bug
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Attachments: HIVE-6664.1.patch




[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.

2014-03-14 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935852#comment-13935852
 ] 

Eric Hanson commented on HIVE-6664:
---

+1



[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.

2014-03-14 Thread Eric Hanson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935860#comment-13935860
 ] 

Eric Hanson commented on HIVE-6664:
---

In general, sum/avg/variance aggregate results that involve floating-point 
arithmetic in the sum calculation can return different answers depending on 
execution order. This is due to the nature of floating-point arithmetic, where it 
is easy to show examples where (a + b) + c != a + (b + c). So it is probably 
not critical that row mode and vector mode produce results that agree to the 
last decimal place. However, the change here is simple and improves 
compatibility without any serious performance drawbacks, so I 
think this is fine.
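A short Python illustration of the point above: floating-point addition is not associative, so summing the same values in a different grouping (for example, partial sums per batch that are merged later) can change the last digits. The helper `sum_in_batches` and the batch size of 5 are hypothetical, used only to mimic partial aggregation; they do not reflect Hive's actual batching.

```python
# Floating-point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

def sum_in_batches(values, batch_size):
    # Hypothetical partial aggregation: sum each batch separately,
    # then merge the partial sums.
    partials = []
    for i in range(0, len(values), batch_size):
        partial = 0.0
        for v in values[i:i + batch_size]:
            partial += v
        partials.append(partial)
    total = 0.0
    for p in partials:
        total += p
    return total

values = [0.1] * 10
sequential = 0.0
for v in values:
    sequential += v                    # 0.9999999999999999
batched = sum_in_batches(values, 5)    # 1.0
print(left == right, sequential == batched)  # False False
```

Neither answer is "wrong"; they are both valid double-precision results for different evaluation orders, which is why exact agreement between execution modes is a compatibility nicety rather than a correctness requirement.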
