[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15471496#comment-15471496 ]

Matt McCline commented on HIVE-14451:

Committed to master.

> Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
>
> Key: HIVE-14451
> URL: https://issues.apache.org/jira/browse/HIVE-14451
> Project: Hive
> Issue Type: Improvement
> Components: Vectorization
> Reporter: Gopal V
> Assignee: Matt McCline
> Attachments: HIVE-14451.01.patch, HIVE-14451.02.patch, HIVE-14451.03.patch, HIVE-14451.04.patch
>
> In a majority of cases, when using the OptimizedHashMap, the references to the byte[] are immutable.
> The hashmap result always allocates on boundary conditions, but never mutates a previous buffer.
> Copying Strings out of the hashtable is entirely wasteful and it would be easy to know when the currentBytes is a borrowed slice from the original input.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468868#comment-15468868 ]

Matt McCline commented on HIVE-14451:

[~sershe] Thank you for your review!
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15468086#comment-15468086 ]

Sergey Shelukhin commented on HIVE-14451:

+1
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15462332#comment-15462332 ]

Matt McCline commented on HIVE-14451:

Test failures are unrelated.
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15462291#comment-15462291 ]

Hive QA commented on HIVE-14451:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826959/HIVE-14451.04.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10438 tests executed
*Failed tests:*
{noformat}
TestBeeLineWithArgs - did not produce a TEST-*.xml file
TestHiveCli - did not produce a TEST-*.xml file
TestSparkNegativeCliDriver - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1107/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1107/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1107/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826959 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15460028#comment-15460028 ]

Matt McCline commented on HIVE-14451:

Thank you for the review comments.

Yes, this will apply to both the BytesBytesMultiHashMap (so-called Optimized) and the fast hash map since the native Vector MapJoin operators are general to both.
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15459958#comment-15459958 ]

Sergey Shelukhin commented on HIVE-14451:

Some comments on RB, mostly about documentation. One question - is this supposed to also apply to BytesBytes... hashtable?
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455285#comment-15455285 ]

Hive QA commented on HIVE-14451:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12826543/HIVE-14451.02.patch

{color:green}SUCCESS:{color} +1 due to 4 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10436 tests executed
*Failed tests:*
{noformat}
TestBeeLineWithArgs - did not produce a TEST-*.xml file
TestHiveCli - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testAddJarConstructorUnCaching
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1069/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1069/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1069/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12826543 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454112#comment-15454112 ]

Matt McCline commented on HIVE-14451:

There are 2 improvements in the patch.

First, when the input bytes being deserialized are immutable and it is safe to retain references (e.g. a hash table entry), VectorDeserializeRow has an alternate deserializeByRef method that can be called. This avoids an unnecessary buffer copy operation.

Also, when BinarySortable and LazySimple have to "unescape" data in the input buffer to produce the string/char/varchar/binary result, a preallocation scheme is used where the (scratch) buffer in BytesColumnVector is made available to be used directly as the target buffer. This avoids an extra buffer copy operation.
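Editorial sketch of the copy versus by-reference distinction described in the comment above. This is not code from the patch: `ByRefSketch` is a hypothetical, simplified stand-in for a bytes column, and its `setVal`/`setRef` methods only loosely mirror the copy and reference setters of a bytes column vector. It assumes, as the issue description states, that the source buffer is never mutated after being borrowed.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical, simplified stand-in for a bytes column; not Hive's own class. */
final class ByRefSketch {
    final byte[][] vector;  // backing array for each row
    final int[] start;      // offset of the value inside vector[row]
    final int[] length;     // length of the value in bytes

    ByRefSketch(int size) {
        vector = new byte[size][];
        start = new int[size];
        length = new int[size];
    }

    /** Copy mode: safe even if the source buffer is later reused or overwritten. */
    void setVal(int row, byte[] src, int off, int len) {
        byte[] copy = new byte[len];
        System.arraycopy(src, off, copy, 0, len);
        vector[row] = copy;
        start[row] = 0;
        length[row] = len;
    }

    /** byRef mode: borrow a slice of src; valid only while src is never mutated. */
    void setRef(int row, byte[] src, int off, int len) {
        vector[row] = src;
        start[row] = off;
        length[row] = len;
    }

    String get(int row) {
        return new String(vector[row], start[row], length[row], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Pretend this array is an immutable hash table entry.
        byte[] entry = "key=value".getBytes(StandardCharsets.UTF_8);
        ByRefSketch col = new ByRefSketch(2);
        col.setVal(0, entry, 4, 5);  // allocates and copies 5 bytes
        col.setRef(1, entry, 4, 5);  // no allocation, no copy
        System.out.println(col.get(0) + " " + col.get(1));
        System.out.println("row 1 shares the entry buffer: " + (col.vector[1] == entry));
    }
}
```

The by-reference path saves one allocation and one copy per value, at the cost of requiring the caller to guarantee that the borrowed buffer outlives the column and is not modified.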
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454101#comment-15454101 ]

Matt McCline commented on HIVE-14451:

This should improve performance for TEXT (LazySimple) and non-TEXT (BinarySortable).

[~gopalv] [~ndembla]

Thank you, Gopal, for observing this improvement possibility.
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440776#comment-15440776 ]

Hive QA commented on HIVE-14451:

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12825631/HIVE-14451.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10463 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.org.apache.hadoop.hive.cli.TestCliDriver
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_mapjoin]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ctas]
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_join_part_col_char]
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[acid_bucket_pruning]
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3]
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1014/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/1014/console
Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-1014/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12825631 - PreCommit-HIVE-MASTER-Build
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15439667#comment-15439667 ]

Matt McCline commented on HIVE-14451:

Instead of writing the new bytes data to a Text object, I think I'll add an option to append that data to a caller-provided byte[] buffer and nextFree offset.
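Editorial sketch of the idea in the comment above: unescaping directly into a caller-provided byte[] at a nextFree offset, rather than into an intermediate object that then has to be copied. This is not the patch's code. `UnescapeSketch` and `unescapeInto` are hypothetical names, and the backslash escape rule used here is an assumed, simplified scheme chosen for illustration; it is not the actual BinarySortable or LazySimple encoding.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; the backslash escape rule is an assumption for illustration. */
final class UnescapeSketch {

    /**
     * Unescapes src[off, off + len) into dest starting at nextFree and returns the
     * new nextFree. The output is never longer than the input, so the caller only
     * needs to guarantee len free bytes in dest.
     */
    static int unescapeInto(byte[] src, int off, int len, byte[] dest, int nextFree) {
        int end = off + len;
        for (int i = off; i < end; i++) {
            byte b = src[i];
            if (b == '\\' && i + 1 < end) {
                b = src[++i];  // drop the escape byte, keep the byte that follows it
            }
            dest[nextFree++] = b;
        }
        return nextFree;
    }

    public static void main(String[] args) {
        byte[] scratch = new byte[16];  // stands in for a preallocated scratch buffer
        byte[] first = "a\\,b".getBytes(StandardCharsets.US_ASCII);     // a\,b
        byte[] second = "c\\\\d".getBytes(StandardCharsets.US_ASCII);   // c\\d

        int next = unescapeInto(first, 0, first.length, scratch, 0);
        int next2 = unescapeInto(second, 0, second.length, scratch, next);

        // Each value is a slice [start, nextFree) of the same scratch buffer.
        System.out.println(new String(scratch, 0, next, StandardCharsets.US_ASCII));
        System.out.println(new String(scratch, next, next2 - next, StandardCharsets.US_ASCII));
    }
}
```

Returning the updated offset lets successive values be packed back to back in one buffer, so each unescaped value costs a single write pass and no per-value allocation.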
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438648#comment-15438648 ]

Matt McCline commented on HIVE-14451:

Includes logic described by HIVE-14452.
[jira] [Commented] (HIVE-14451) Vectorization: Add byRef mode for borrowed Strings in VectorDeserializeRow
[ https://issues.apache.org/jira/browse/HIVE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438646#comment-15438646 ]

Matt McCline commented on HIVE-14451:

Giving this a shot.

Ran: mvn test -Dtest=TestVectorSerDeRow

The tests probably need to add escaped strings, and they should call the new deserializeByRef method.