[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-09-07 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471496#comment-15471496
 ] 

Matt McCline commented on HIVE-14451:
-

Committed to master.

> Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
> --
>
> Key: HIVE-14451
> URL: https://issues.apache.org/jira/browse/HIVE-14451
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14451.01.patch, HIVE-14451.02.patch, 
> HIVE-14451.03.patch, HIVE-14451.04.patch
>
>
> In a majority of cases, when using the OptimizedHashMap, the references to 
> the byte[] are immutable. 
> The hashmap result always allocates on boundary conditions, but never mutates 
> a previous buffer.
> Copying Strings out of the hashtable is entirely wasteful and it would be 
> easy to know when the currentBytes is a borrowed slice from the original 
> input.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-09-06 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468868#comment-15468868
 ] 

Matt McCline commented on HIVE-14451:
-

[~sershe] Thank you for your review!



[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-09-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468086#comment-15468086
 ] 

Sergey Shelukhin commented on HIVE-14451:
-

+1



[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-09-03 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15462332#comment-15462332
 ] 

Matt McCline commented on HIVE-14451:
-

Test failures are unrelated.



[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-09-03 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15462291#comment-15462291
 ] 

Hive QA commented on HIVE-14451:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826959/HIVE-14451.04.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10438 tests 
executed
*Failed tests:*
{noformat}
TestBeeLineWithArgs - did not produce a TEST-*.xml file
TestHiveCli - did not produce a TEST-*.xml file
TestSparkNegativeCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1107/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1107/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1107/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826959 - PreCommit-HIVE-MASTER-Build



[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-09-02 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15460028#comment-15460028
 ] 

Matt McCline commented on HIVE-14451:
-

Thank you for the review comments.

Yes, this will apply to both the BytesBytesMultiHashMap (so-called Optimized) 
and the fast hash map since the native Vector MapJoin operators are general to 
both.

> Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
> --
>
> Key: HIVE-14451
> URL: https://issues.apache.org/jira/browse/HIVE-14451
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14451.01.patch, HIVE-14451.02.patch, 
> HIVE-14451.03.patch
>
>
> In a majority of cases, when using the OptimizedHashMap, the references to 
> the byte[] are immutable. 
> The hashmap result always allocates on boundary conditions, but never mutates 
> a previous buffer.
> Copying Strings out of the hashtable is entirely wasteful and it would be 
> easy to know when the currentBytes is a borrowed slice from the original 
> input.





[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-09-02 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459958#comment-15459958
 ] 

Sergey Shelukhin commented on HIVE-14451:
-

Some comments on RB, mostly about documentation.
One question - is this supposed to also apply to BytesBytes... hashtable? 



[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-09-01 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455285#comment-15455285
 ] 

Hive QA commented on HIVE-14451:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826543/HIVE-14451.02.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10436 tests 
executed
*Failed tests:*
{noformat}
TestBeeLineWithArgs - did not produce a TEST-*.xml file
TestHiveCli - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1069/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1069/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1069/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826543 - PreCommit-HIVE-MASTER-Build

> Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
> --
>
> Key: HIVE-14451
> URL: https://issues.apache.org/jira/browse/HIVE-14451
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14451.01.patch, HIVE-14451.02.patch
>
>
> In a majority of cases, when using the OptimizedHashMap, the references to 
> the byte[] are immutable. 
> The hashmap result always allocates on boundary conditions, but never mutates 
> a previous buffer.
> Copying Strings out of the hashtable is entirely wasteful and it would be 
> easy to know when the currentBytes is a borrowed slice from the original 
> input.





[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-08-31 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454112#comment-15454112
 ] 

Matt McCline commented on HIVE-14451:
-

There are 2 improvements in the patch.

First, when the input bytes being deserialized are immutable and it is safe to 
retain references (e.g. a hash table entry), VectorDeserializeRow has an 
alternate deserializeByRef method that can be called.  This avoids an 
unnecessary buffer copy operation.

Also, when BinarySortable and LazySimple have to "unescape" data in the input 
buffer to produce the string/char/varchar/binary result, a preallocation scheme 
is used where the (scratch) buffer in BytesColumnVector is made available to be 
used directly as the target buffer.  This avoids an extra buffer copy operation.
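
The difference between copying and borrowing can be illustrated with a minimal sketch. It assumes a simplified, hypothetical BytesColumn class standing in for a bytes column vector; the class, its fields, and its method bodies are illustrative only and are not the code from the patch. The by-reference path is only safe under the condition stated above: the source bytes are never mutated while the column still points at them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a bytes column; NOT Hive's actual implementation.
final class ByRefSketch {

  static final class BytesColumn {
    final byte[][] vector;
    final int[] start;
    final int[] length;

    BytesColumn(int rows) {
      vector = new byte[rows][];
      start = new int[rows];
      length = new int[rows];
    }

    // By-value: copy the slice into an array owned by the column.
    void setVal(int row, byte[] src, int srcStart, int len) {
      vector[row] = Arrays.copyOfRange(src, srcStart, srcStart + len);
      start[row] = 0;
      length[row] = len;
    }

    // By-reference: borrow the caller's array and record only offset and length.
    // Safe only if src is immutable for as long as the column is in use.
    void setRef(int row, byte[] src, int srcStart, int len) {
      vector[row] = src;
      start[row] = srcStart;
      length[row] = len;
    }

    String get(int row) {
      return new String(vector[row], start[row], length[row], StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) {
    // Pretend this array is an immutable hash table entry.
    byte[] entry = "key=hive".getBytes(StandardCharsets.UTF_8);
    BytesColumn col = new BytesColumn(2);
    col.setVal(0, entry, 4, 4); // allocates and copies 4 bytes
    col.setRef(1, entry, 4, 4); // no allocation, no copy
    System.out.println(col.get(0) + " " + col.get(1));
    System.out.println(col.vector[1] == entry);
  }
}
```

Both rows read back the same value; only the by-reference row shares the original array, which is where the saved copy comes from.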



[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-08-31 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454101#comment-15454101
 ] 

Matt McCline commented on HIVE-14451:
-

This should improve performance for TEXT (LazySimple) and non-TEXT 
(BinarySortable) [~gopalv] [~ndembla]

Thank you, Gopal, for observing this improvement possibility.





[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-08-26 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440776#comment-15440776
 ] 

Hive QA commented on HIVE-14451:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12825631/HIVE-14451.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10463 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1014/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1014/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1014/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12825631 - PreCommit-HIVE-MASTER-Build

> Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
> --
>
> Key: HIVE-14451
> URL: https://issues.apache.org/jira/browse/HIVE-14451
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Reporter: Gopal V
>Assignee: Matt McCline
> Attachments: HIVE-14451.01.patch
>
>
> In a majority of cases, when using the OptimizedHashMap, the references to 
> the byte[] are immutable. 
> The hashmap result always allocates on boundary conditions, but never mutates 
> a previous buffer.
> Copying Strings out of the hashtable is entirely wasteful and it would be 
> easy to know when the currentBytes is a borrowed slice from the original 
> input.





[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-08-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439667#comment-15439667
 ] 

Matt McCline commented on HIVE-14451:
-

Instead of writing the new bytes data to a Text object, I think I'll add an 
option to append that data to a caller-provided byte[] buffer at a nextFree 
offset.
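
A rough sketch of what appending to a caller-provided buffer could look like follows. The class name, the unescapeInto method, and the single-byte backslash escape scheme are all assumptions made for illustration; the actual escape rules of LazySimple and BinarySortable differ and are not reproduced here. The method writes unescaped bytes starting at nextFree and returns the updated offset, so the caller can keep packing values into one buffer without an intermediate object.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names and the escape scheme are illustrative assumptions.
final class UnescapeSketch {

  static final byte ESCAPE = '\\';

  // Unescapes src[start, start + len) into dest beginning at nextFree.
  // Returns the new nextFree offset. The caller must ensure dest has at
  // least len free bytes, since unescaped output is never longer than input.
  static int unescapeInto(byte[] src, int start, int len, byte[] dest, int nextFree) {
    int end = start + len;
    int out = nextFree;
    for (int i = start; i < end; i++) {
      byte b = src[i];
      if (b == ESCAPE && i + 1 < end) {
        b = src[++i]; // drop the escape byte, keep the byte it protects
      }
      dest[out++] = b;
    }
    return out;
  }

  public static void main(String[] args) {
    byte[] scratch = new byte[16];
    int nextFree = 0;

    byte[] first = "a\\,b".getBytes(StandardCharsets.UTF_8);   // bytes: a \ , b
    byte[] second = "c\\\\d".getBytes(StandardCharsets.UTF_8); // bytes: c \ \ d

    nextFree = unescapeInto(first, 0, first.length, scratch, nextFree);
    nextFree = unescapeInto(second, 0, second.length, scratch, nextFree);

    System.out.println(new String(scratch, 0, nextFree, StandardCharsets.UTF_8));
  }
}
```

Returning the new offset rather than a new object keeps the write position under the caller's control, which is what lets a shared scratch buffer serve as the direct target.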



[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-08-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438648#comment-15438648
 ] 

Matt McCline commented on HIVE-14451:
-

Includes logic described by HIVE-14452.



[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow

2016-08-26 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438646#comment-15438646
 ] 

Matt McCline commented on HIVE-14451:
-

Giving this a shot.

Ran: mvn test -Dtest=TestVectorSerDeRow

The tests probably need to add escaped strings, and they should call the new 
deserializeByRef method.
