[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-02-25 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884594#comment-15884594
 ] 

Teddy Choi commented on HIVE-15743:
---

[~sershe], please proceed. Thank you.

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-15743.1.patch, HIVE-15743.2.patch, 
> HIVE-15743.3.patch, HIVE-15743.4.patch, tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
>     new String(bytes, fieldStart, fieldLength, StandardCharsets.UTF_8));
> {noformat}
> This takes ~25% of the query time in some cases.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-02-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15881743#comment-15881743
 ] 

Sergey Shelukhin commented on HIVE-15743:
-

Hmm... should this be committed?



[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-02-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15854630#comment-15854630
 ] 

Sergey Shelukhin commented on HIVE-15743:
-

+1 cc [~gopalv]



[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-02-04 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853106#comment-15853106
 ] 

Hive QA commented on HIVE-15743:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851060/HIVE-15743.4.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10225 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_varchar_simple] (batchId=153)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=230)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption (batchId=277)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3381/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3381/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3381/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851060 - PreCommit-HIVE-Build



[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-02-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15851921#comment-15851921
 ] 

Hive QA commented on HIVE-15743:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12850813/HIVE-15743.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10212 tests executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) (batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver (batchId=162)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=223)
org.apache.hadoop.hive.llap.daemon.impl.TestTaskExecutorService.testWaitQueuePreemption (batchId=278)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3357/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3357/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3357/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12850813 - PreCommit-HIVE-Build

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-15743.1.patch, HIVE-15743.2.patch, 
> HIVE-15743.3.patch, tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
>     new String(bytes, fieldStart, fieldLength, StandardCharsets.UTF_8));
> {noformat}
> This takes ~25% of the query time in some cases.





[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-02-01 Thread Teddy Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15849604#comment-15849604
 ] 

Teddy Choi commented on HIVE-15743:
---

[~gopalv], that's a practical idea!

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Teddy Choi
> Attachments: HIVE-15743.1.patch, HIVE-15743.2.patch, tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
>     new String(bytes, fieldStart, fieldLength, StandardCharsets.UTF_8));
> {noformat}
> This takes ~25% of the query time in some cases.





[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-01-26 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15842355#comment-15842355
 ] 

Gopal V commented on HIVE-15743:


The CPU cache counters for this issue are available here:

http://people.apache.org/~gopalv/llap-perf.tar.bz2

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
>     new String(bytes, fieldStart, fieldLength, StandardCharsets.UTF_8));
> {noformat}
> This takes ~25% of the query time in some cases.





[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-01-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840955#comment-15840955
 ] 

Matt McCline commented on HIVE-15743:
-

Wow, that is a high percentage!

Yes, I looked at some of those Java runtime classes recently (when I was doing 
FastDecimal) and it is doable. All numeric-related characters are single-byte 
ASCII in UTF-8. There is considerable magic code around exponents, etc. in 
them, though, and because of that I doubt many people are brave enough to 
change it. So our transformed version would be "relatively safe".
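
The point about UTF-8 can be checked with a small sketch. The class name, the 
method name, and the character list below are assumptions made for 
illustration; they are not taken from any patch on this issue. The check 
confirms that each character a double literal may contain encodes to a single 
byte in UTF-8 with the same value as its ASCII code, which is what makes 
byte-level comparisons safe.

```java
import java.nio.charset.StandardCharsets;

final class NumericAsciiCheck {
    // Hypothetical list: digits, sign, decimal point, exponent and type
    // suffix letters, hex markers, plus the letters of "NaN" and "Infinity".
    static final String NUMERIC_CHARS =
        "0123456789+-.eEdDfFxXpPabcABCNaInfinity";

    // Returns true when every character of s encodes to exactly one UTF-8
    // byte whose value equals the character's code point (i.e. plain ASCII).
    static boolean allSingleByte(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length != s.length()) {
            return false; // at least one character needed multiple bytes
        }
        for (int i = 0; i < utf8.length; i++) {
            if (utf8[i] < 0 || utf8[i] != (byte) s.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```

Non-ASCII digits, such as the fullwidth forms, encode to several bytes each 
and fail this check, so a byte-oriented parser can reject them, or hand them 
to the existing String-based path, without decoding.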

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
>     new String(bytes, fieldStart, fieldLength, StandardCharsets.UTF_8));
> {noformat}
> This takes ~25% of the query time in some cases.





[jira] [Commented] (HIVE-15743) vectorized text parsing: speed up double parse

2017-01-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15840937#comment-15840937
 ] 

Sergey Shelukhin commented on HIVE-15743:
-

cc [~mmccline]

We can probably just copy/paste parts of FloatingDecimal: merge the parsing 
and doubleValue, and change them to operate on a byte array. Aside from the 
normal numeric characters it only needs to recognize about 6-8 letters, so we 
should be safe since we always use UTF-8.
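
A minimal sketch of parsing straight from the byte array follows. It is not 
the FloatingDecimal port proposed above and is not taken from any attached 
patch. It swaps in a simpler technique: a fast path for plain decimals whose 
digits fit exactly in a double, with everything else (exponents, NaN, 
Infinity, hex, whitespace, very long inputs) handed back to the original 
String-based call. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class ByteDoubleParser {
    // 10^0 .. 10^18 are all exactly representable as doubles.
    private static final double[] POW10 = {
        1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,
        1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18
    };
    // Integers up to 2^53 convert to double without rounding.
    private static final long MAX_EXACT = 1L << 53;

    static double parseDouble(byte[] bytes, int start, int length) {
        int end = start + length;
        int i = start;
        boolean negative = false;
        if (i < end && (bytes[i] == '-' || bytes[i] == '+')) {
            negative = bytes[i] == '-';
            i++;
        }
        long mantissa = 0;   // all digits read so far, as one integer
        int digits = 0;      // number of digits read
        int fracDigits = 0;  // number of digits after the decimal point
        boolean seenDot = false;
        for (; i < end; i++) {
            byte b = bytes[i];
            if (b >= '0' && b <= '9') {
                if (digits == 18) {
                    // One more digit could overflow the long accumulator.
                    return slowPath(bytes, start, length);
                }
                mantissa = mantissa * 10 + (b - '0');
                digits++;
                if (seenDot) {
                    fracDigits++;
                }
            } else if (b == '.' && !seenDot) {
                seenDot = true;
            } else {
                // Exponent, NaN, Infinity, suffix, whitespace, non-ASCII...
                return slowPath(bytes, start, length);
            }
        }
        if (digits == 0 || mantissa > MAX_EXACT) {
            return slowPath(bytes, start, length);
        }
        // Both operands are exact, so this single IEEE 754 division is
        // correctly rounded.
        double value = (double) mantissa / POW10[fracDigits];
        return negative ? -value : value;
    }

    // Fallback: the original String-based parse from the issue description.
    private static double slowPath(byte[] bytes, int start, int length) {
        return Double.parseDouble(
            new String(bytes, start, length, StandardCharsets.UTF_8));
    }
}
```

The fast path allocates nothing, and malformed input still raises 
NumberFormatException through the fallback. Whether this covers enough of a 
given workload to matter is an assumption that would need measuring.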

> vectorized text parsing: speed up double parse
> --
>
> Key: HIVE-15743
> URL: https://issues.apache.org/jira/browse/HIVE-15743
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
> Attachments: tpch-without.png
>
>
> {noformat}
> Double.parseDouble(
>     new String(bytes, fieldStart, fieldLength, StandardCharsets.UTF_8));
> {noformat}
> This takes ~25% of the query time in some cases.


