[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse
    [ https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884594#comment-15884594 ]

Teddy Choi commented on HIVE-15743:
-----------------------------------

[~sershe], please proceed. Thank you.

> vectorized text parsing: speed up double parse
> ----------------------------------------------
>
>                 Key: HIVE-15743
>                 URL: https://issues.apache.org/jira/browse/HIVE-15743
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Teddy Choi
>         Attachments: HIVE-15743.1.patch, HIVE-15743.2.patch,
>                      HIVE-15743.3.patch, HIVE-15743.4.patch, tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
>     new String(bytes, fieldStart, fieldLength,
>         StandardCharsets.UTF_8));{noformat}
> This takes ~25% of the query time in some cases.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse
    [ https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881743#comment-15881743 ]

Sergey Shelukhin commented on HIVE-15743:
-----------------------------------------

Hmm... should this be committed?
[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse
    [ https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854630#comment-15854630 ]

Sergey Shelukhin commented on HIVE-15743:
-----------------------------------------

+1 cc [~gopalv]
[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse
    [ https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853106#comment-15853106 ]

Hive QA commented on HIVE-15743:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851060/HIVE-15743.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10225 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=230)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption (batchId=277)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3381/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3381/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3381/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851060 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse
    [ https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851921#comment-15851921 ]

Hive QA commented on HIVE-15743:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850813/HIVE-15743.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10212 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver (batchId=162)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption (batchId=278)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3357/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3357/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3357/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850813 - PreCommit-HIVE-Build
[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse
    [ https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849604#comment-15849604 ]

Teddy Choi commented on HIVE-15743:
-----------------------------------

[~gopalv], that's a practical idea!
[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse
    [ https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15842355#comment-15842355 ]

Gopal V commented on HIVE-15743:
--------------------------------

The CPU cache counters for this issue are available here:
http://people.apache.org/~gopalv/llap-perf.tar.bz2
[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse
    [ https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840955#comment-15840955 ]

Matt McCline commented on HIVE-15743:
-------------------------------------

Wow, that is a high percentage!

Yeah, I looked at some of those Java runtime classes recently (when I was doing FastDecimal) and it is doable. All numeric-related characters are single-byte (ASCII) in UTF-8.

There is considerable magic code around exponents, etc. in them, though, and because of that I doubt many people are brave enough to change it. So our transformed version would be "relatively safe".
[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse
    [ https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840937#comment-15840937 ]

Sergey Shelukhin commented on HIVE-15743:
-----------------------------------------

cc [~mmccline]
We can probably just copy parts of FloatingDecimal: merge the parsing and doubleValue steps, and change them to operate on a byte array. It only needs to recognize about 6-8 letters aside from the normal numeric characters, so we should be safe since we always use UTF-8.
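
Editor's note: the idea of parsing directly from the byte array can be illustrated with a minimal sketch. This is not the code from the attached patches and not a port of FloatingDecimal; it only shows a simpler, well-known fast path. The class name FastDoubleParser and its method names are hypothetical. The sketch assumes the field holds plain ASCII digits with an optional sign and decimal point; for anything else (exponents, NaN, Infinity, whitespace, more than 15 digits) it falls back to the original Double.parseDouble(new String(...)) call, so results should match the existing behaviour. The fast path relies on the fact that an integer below 2^53 and a power of ten up to 10^22 are both exact as doubles, so a single IEEE 754 division yields the correctly rounded value.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; class and method names are illustrative only. */
final class FastDoubleParser {
    // Powers of ten up to 10^15; each one is exactly representable as a double.
    private static final double[] POW10 = {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7,
        1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15
    };
    // At most 15 digits keeps the mantissa below 2^53, so it is exact as a double.
    private static final int MAX_FAST_DIGITS = 15;

    private FastDoubleParser() {}

    static double parseDouble(byte[] bytes, int start, int length) {
        final int end = start + length;
        int i = start;
        boolean negative = false;
        if (i < end && (bytes[i] == '-' || bytes[i] == '+')) {
            negative = bytes[i] == '-';
            i++;
        }
        long mantissa = 0;       // all digits seen so far, decimal point ignored
        int digits = 0;          // total number of digits consumed
        int fractionDigits = 0;  // digits consumed after the decimal point
        boolean seenDot = false;
        for (; i < end; i++) {
            final byte b = bytes[i];
            if (b >= '0' && b <= '9') {
                if (++digits > MAX_FAST_DIGITS) {
                    return slowPath(bytes, start, length); // too long to stay exact
                }
                mantissa = mantissa * 10 + (b - '0');
                if (seenDot) {
                    fractionDigits++;
                }
            } else if (b == '.' && !seenDot) {
                seenDot = true;
            } else {
                // Exponent, NaN/Infinity, whitespace, type suffix, or garbage.
                return slowPath(bytes, start, length);
            }
        }
        if (digits == 0) {
            return slowPath(bytes, start, length); // "", "-", "." and similar
        }
        // Both operands are exact, so one division gives the correctly rounded result.
        final double value = (double) mantissa / POW10[fractionDigits];
        return negative ? -value : value;
    }

    // Original approach from the issue description: allocate a String and delegate.
    private static double slowPath(byte[] bytes, int start, int length) {
        return Double.parseDouble(new String(bytes, start, length, StandardCharsets.UTF_8));
    }
}
```

A caller would pass the same arguments as in the issue description, e.g. FastDoubleParser.parseDouble(bytes, fieldStart, fieldLength). The fallback keeps the harder cases with the JDK; whether the fast path covers enough of a real workload to matter would need to be measured.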