[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.
[ https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938196#comment-13938196 ] Jitendra Nath Pandey commented on HIVE-6664: Ran tests locally. Only failures were show_create_table_serde.q and metadata_only_queries_with_filters.q which are unrelated to this patch. Vectorized variance computation differs from row mode computation. -- Key: HIVE-6664 URL: https://issues.apache.org/jira/browse/HIVE-6664 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6664.1.patch, HIVE-6664.1.patch, HIVE-6664.1.patch Following query can show the difference: select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales. The reason for the difference is that row mode converts the decimal value to double upfront to calculate sum of values, when computing variance. But the vector mode performs local aggregate sum as decimal and converts into double only at flush. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.
[ https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13938407#comment-13938407 ] Jitendra Nath Pandey commented on HIVE-6664: I have committed this to trunk. [~rhbutani] This bug affects hive-0.13 and causes different results than row-mode execution. This should be fixed in branch-0.13 as well. Vectorized variance computation differs from row mode computation. -- Key: HIVE-6664 URL: https://issues.apache.org/jira/browse/HIVE-6664 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6664.1.patch, HIVE-6664.1.patch, HIVE-6664.1.patch Following query can show the difference: select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales. The reason for the difference is that row mode converts the decimal value to double upfront to calculate sum of values, when computing variance. But the vector mode performs local aggregate sum as decimal and converts into double only at flush. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.
[ https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936074#comment-13936074 ] Hive QA commented on HIVE-6664: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12634667/HIVE-6664.1.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1803/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1803/console Messages: {noformat} This message was trimmed, see log for full details [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/hwi/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-hwi --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-hwi --- [INFO] Compiling 2 source files to /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/test-classes [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-hwi --- [INFO] Tests are skipped. [INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-hwi --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-hwi --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-hwi --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hwi/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive ODBC 0.14.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-odbc --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/odbc (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-odbc --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-odbc --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-odbc --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-odbc --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-odbc --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/odbc/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-odbc/0.14.0-SNAPSHOT/hive-odbc-0.14.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Shims Aggregator 0.14.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-shims-aggregator --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/shims (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-shims-aggregator --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-shims-aggregator --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims-aggregator --- [INFO] Executing tasks main: [mkdir] Created dir:
[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.
[ https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13936320#comment-13936320 ] Hive QA commented on HIVE-6664: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12634938/HIVE-6664.1.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1843/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/1843/console Messages: {noformat} This message was trimmed, see log for full details [INFO] Using 'UTF-8' encoding to copy filtered resources. [INFO] skip non existing resourceDirectory /data/hive-ptest/working/apache-svn-trunk-source/hwi/src/test/resources [INFO] Copying 3 resources [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-hwi --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @ hive-hwi --- [INFO] Compiling 2 source files to /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/test-classes [INFO] [INFO] --- maven-surefire-plugin:2.16:test (default-test) @ hive-hwi --- [INFO] Tests are skipped. [INFO] [INFO] --- maven-jar-plugin:2.2:jar (default-jar) @ hive-hwi --- [INFO] Building jar: /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-hwi --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-hwi --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hwi/target/hive-hwi-0.14.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/hwi/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-hwi/0.14.0-SNAPSHOT/hive-hwi-0.14.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive ODBC 0.14.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-odbc --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/odbc (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-odbc --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-odbc --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-odbc --- [INFO] Executing tasks main: [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/warehouse [mkdir] Created dir: /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf [copy] Copying 5 files to /data/hive-ptest/working/apache-svn-trunk-source/odbc/target/tmp/conf [INFO] Executed tasks [INFO] [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @ hive-odbc --- [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-odbc --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/odbc/pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-odbc/0.14.0-SNAPSHOT/hive-odbc-0.14.0-SNAPSHOT.pom [INFO] [INFO] [INFO] Building Hive Shims Aggregator 0.14.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-shims-aggregator --- [INFO] Deleting /data/hive-ptest/working/apache-svn-trunk-source/shims (includes = [datanucleus.log, derby.log], excludes = []) [INFO] [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ hive-shims-aggregator --- [INFO] [INFO] --- maven-antrun-plugin:1.7:run (define-classpath) @ hive-shims-aggregator --- [INFO] Executing tasks main: [INFO] Executed tasks [INFO] [INFO] --- maven-antrun-plugin:1.7:run (setup-test-dirs) @ hive-shims-aggregator --- [INFO] Executing tasks main: [mkdir] Created dir:
[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.
[ https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934793#comment-13934793 ] Jitendra Nath Pandey commented on HIVE-6664: Review board : https://reviews.apache.org/r/19216/ Vectorized variance computation differs from row mode computation. -- Key: HIVE-6664 URL: https://issues.apache.org/jira/browse/HIVE-6664 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6664.1.patch Following query can show the difference: select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales. The reason for the difference is that row mode converts the decimal value to double upfront to calculate sum of values, when computing variance. But the vector mode performs local aggregate sum as decimal and converts into double only at flush. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.
[ https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935852#comment-13935852 ] Eric Hanson commented on HIVE-6664: --- +1 Vectorized variance computation differs from row mode computation. -- Key: HIVE-6664 URL: https://issues.apache.org/jira/browse/HIVE-6664 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6664.1.patch Following query can show the difference: select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales. The reason for the difference is that row mode converts the decimal value to double upfront to calculate sum of values, when computing variance. But the vector mode performs local aggregate sum as decimal and converts into double only at flush. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6664) Vectorized variance computation differs from row mode computation.
[ https://issues.apache.org/jira/browse/HIVE-6664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13935860#comment-13935860 ] Eric Hanson commented on HIVE-6664: --- In general, sum/avg/variance aggregate results that involve floating point arithmetic in the sum calculation will return different answers depending on execution order. This is due the nature of floating point arithmetic, where it is easy to show examples where (a + b) + c a + (b + c). So it is probably not critical that row-mode and vector mode have results that are compatible to the last decimal place. However, the change here is simple enough and it makes for better compatibility without any serious drawbacks for performance, so I think this is fine. Vectorized variance computation differs from row mode computation. -- Key: HIVE-6664 URL: https://issues.apache.org/jira/browse/HIVE-6664 Project: Hive Issue Type: Bug Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Attachments: HIVE-6664.1.patch Following query can show the difference: select var_samp(ss_sales_price), var_pop(ss_sales_price), stddev_pop(ss_sales_price), stddev_samp(ss_sales_price) from store_sales. The reason for the difference is that row mode converts the decimal value to double upfront to calculate sum of values, when computing variance. But the vector mode performs local aggregate sum as decimal and converts into double only at flush. -- This message was sent by Atlassian JIRA (v6.2#6252)