[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: PS7, Line 575: // (in this case the tuple contains only a nullable double) : // define void @SetNotNull({ i8, double }* %tuple) { : // entry: : // %null_byte_ptr = getelementptr inbounds { i8, double }* %tuple, i32 0, i32 0 : // %null_byte = load i8* %null_byte_ptr : // %0 = and i8 %null_byte, -2 : // store i8 %0, i8* %null_byte_ptr : // ret void > this looks outdated. Thanks, I'll post a follow-on to update this. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Dan Hecht has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: PS7, Line 575: // (in this case the tuple contains only a nullable double) : // define void @SetNotNull({ i8, double }* %tuple) { : // entry: : // %null_byte_ptr = getelementptr inbounds { i8, double }* %tuple, i32 0, i32 0 : // %null_byte = load i8* %null_byte_ptr : // %0 = and i8 %null_byte, -2 : // store i8 %0, i8* %null_byte_ptr : // ret void this looks outdated. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. IMPALA-2789: More compact mem layout with null bits at the end. There are two motivations for this change: 1. Reduce memory consumption. 2. Pave the way for full memory layout compatibility between Impala and Kudu to eventually enable zero-copy scans. This patch is a only first step towards that goal. New Memory Layout Slots are placed in descending order by size with trailing bytes to store null flags. Null flags are omitted for non-nullable slots. There is no padding between tuples when stored back-to-back in a row batch. Example: select bool_col, int_col, string_col, smallint_col from functional.alltypes Slots: string_col|int_col|smallint_col|bool_col|null_byte Offsets: 0 16 20 22 23 The main change is to move the null indicators to the end of tuples. The new memory layout is fully packed with no padding in between slots or tuples. Performance: Our standard cluster perf tests showed no significant difference in query response times as well as consumed cycles, and a slight reduction in peak memory consumption. Testing: An exhaustive test run passed. Ran a few select tests like TPC-H/DS with ASAN locally. These follow-on changes are planned: 1. Planner needs to mark slots non-nullable if they correspond to a non-nullable Kudu column. 2. Update Kudu scan node to copy tuples with memcpy. 3. Kudu client needs to support transferring ownership of the tuple memory (maybe do direct and indirect buffers separately). 4. Update Kudu scan node to use memory transfer instead of copy Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Reviewed-on: http://gerrit.cloudera.org:8080/4673 Reviewed-by: Alex Behm Tested-by: Internal Jenkins --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/row-batch-list-test.cc M be/src/exec/text-converter.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/collection-value-builder-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch-test.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/desc-tbl-builder.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 26 files changed, 401 insertions(+), 311 deletions(-) Approvals: Internal Jenkins: Verified Alex Behm: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 603: builder.CreateInBoundsGEP(tuple_int8_ptr, null_byte_offset, "null_byte_ptr"); > i understand that, but i was wondering what the motivation was. If we wanted to continue to access the null byte as a struct field, then we would need to compute and store that index somewhere because it's not the same as the null_byte_offset. It just happened to be the same before (but now is different). Only keeping track of one offset seemed simpler. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 603: builder.CreateInBoundsGEP(tuple_int8_ptr, null_byte_offset, "null_byte_ptr"); > Before we were accessing the null byte as a struct field via GetStructGEP() i understand that, but i was wondering what the motivation was. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 8: Code-Review+2 Carry +2 from Marcel -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Hello Marcel Kornacker, Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4673 to look at the new patch set (#8). Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. IMPALA-2789: More compact mem layout with null bits at the end. There are two motivations for this change: 1. Reduce memory consumption. 2. Pave the way for full memory layout compatibility between Impala and Kudu to eventually enable zero-copy scans. This patch is a only first step towards that goal. New Memory Layout Slots are placed in descending order by size with trailing bytes to store null flags. Null flags are omitted for non-nullable slots. There is no padding between tuples when stored back-to-back in a row batch. Example: select bool_col, int_col, string_col, smallint_col from functional.alltypes Slots: string_col|int_col|smallint_col|bool_col|null_byte Offsets: 0 16 20 22 23 The main change is to move the null indicators to the end of tuples. The new memory layout is fully packed with no padding in between slots or tuples. Performance: Our standard cluster perf tests showed no significant difference in query response times as well as consumed cycles, and a slight reduction in peak memory consumption. Testing: An exhaustive test run passed. Ran a few select tests like TPC-H/DS with ASAN locally. These follow-on changes are planned: 1. Planner needs to mark slots non-nullable if they correspond to a non-nullable Kudu column. 2. Update Kudu scan node to copy tuples with memcpy. 3. Kudu client needs to support transferring ownership of the tuple memory (maybe do direct and indirect buffers separately). 4. Update Kudu scan node to use memory transfer instead of copy Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/row-batch-list-test.cc M be/src/exec/text-converter.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/collection-value-builder-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch-test.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/desc-tbl-builder.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 26 files changed, 401 insertions(+), 311 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/8 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 603: builder.CreateInBoundsGEP(tuple_int8_ptr, null_byte_offset, "null_byte_ptr"); > why is this change necessary? Before we were accessing the null byte as a struct field via GetStructGEP(), i.e. assuming that null_indicator_offset_.byte_offset indicated the field of the tuple struct we want to access. I've changed this to cast the tuple to an int8_t* and then access the bull byte via the offset (and not as a field in the tuple struct) using CreateInboundsGEP(). -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 7: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 603: builder.CreateInBoundsGEP(tuple_int8_ptr, null_byte_offset, "null_byte_ptr"); why is this change necessary? -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 7: Code-Review+1 Thanks! -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Hello Matthew Jacobs, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4673 to look at the new patch set (#7). Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. IMPALA-2789: More compact mem layout with null bits at the end. There are two motivations for this change: 1. Reduce memory consumption. 2. Pave the way for full memory layout compatibility between Impala and Kudu to eventually enable zero-copy scans. This patch is a only first step towards that goal. New Memory Layout Slots are placed in descending order by size with trailing bytes to store null flags. Null flags are omitted for non-nullable slots. There is no padding between tuples when stored back-to-back in a row batch. Example: select bool_col, int_col, string_col, smallint_col from functional.alltypes Slots: string_col|int_col|smallint_col|bool_col|null_byte Offsets: 0 16 20 22 23 The main change is to move the null indicators to the end of tuples. The new memory layout is fully packed with no padding in between slots or tuples. Performance: Our standard cluster perf tests showed no significant difference in query response times as well as consumed cycles, and a slight reduction in peak memory consumption. Testing: An exhaustive test run passed. Ran a few select tests like TPC-H/DS with ASAN locally. These follow-on changes are planned: 1. Planner needs to mark slots non-nullable if they correspond to a non-nullable Kudu column. 2. Update Kudu scan node to copy tuples with memcpy. 3. Kudu client needs to support transferring ownership of the tuple memory (maybe do direct and indirect buffers separately). 4. Update Kudu scan node to use memory transfer instead of copy Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/row-batch-list-test.cc M be/src/exec/text-converter.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/collection-value-builder-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch-test.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/desc-tbl-builder.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 26 files changed, 402 insertions(+), 312 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/7 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 6: (2 comments) http://gerrit.cloudera.org:8080/#/c/4673/6//COMMIT_MSG Commit Message: Line 9: The main motivation of this change is to pave the way for full > There are 2 main motivations of this change, no? Done Line 12: that goal. > Can you mention what is left after this? Done -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 6: Code-Review+1 (2 comments) Thanks! LGTM, just a few comments for the commit msg. I didn't look at the codegen code changes in detail. http://gerrit.cloudera.org:8080/#/c/4673/6//COMMIT_MSG Commit Message: Line 9: The main motivation of this change is to pave the way for full There are 2 main motivations of this change, no? 1) More compact memory layout. 2) Zero copy (or at least avoiding conversions) with Kudu scans... (as you already have here) Line 12: that goal. Can you mention what is left after this? My understanding is that it is (correct me if I'm wrong): 1) planner needs to set null flags for nullable columns 2) kudu scan node can now _copy_ mem, i.e. rip out conversion code 2) kudu client needs to support mem xfer 3) kudu scan node updated to use mem xfer instead of copying (KUDU-1695) -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 6: Code-Review+1 Carry Tim's +1 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/4673/5/be/src/benchmarks/row-batch-serialize-benchmark.cc File be/src/benchmarks/row-batch-serialize-benchmark.cc: Line 97: scoped_ptr fe; > static? Done http://gerrit.cloudera.org:8080/#/c/4673/5/be/src/runtime/row-batch-test.cc File be/src/runtime/row-batch-test.cc: Line 35: scoped_ptr fe; > static Done http://gerrit.cloudera.org:8080/#/c/4673/5/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: Line 753: // For creating a test descriptor table. The tuples and their > Comment wrapping seems weird. Done http://gerrit.cloudera.org:8080/#/c/4673/5/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 205:* Creates a thrift descriptor table for testing. Each entry in 'slotTypes' > Comment wrapping seems weird. Done -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4673 to look at the new patch set (#6). Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. IMPALA-2789: More compact mem layout with null bits at the end. The main motivation of this change is to pave the way for full memory layout compatibility between Impala and Kudu to eventually enable zero-copy scans. This patch is a only first step towards that goal. New Memory Layout Slots are placed in descending order by size with trailing bytes to store null flags. Null flags are omitted for non-nullable slots. There is no padding between tuples when stored back-to-back in a row batch. Example: select bool_col, int_col, string_col, smallint_col from functional.alltypes Slots: string_col|int_col|smallint_col|bool_col|null_byte Offsets: 0 16 20 22 23 The main change is to move the null indicators to the end of tuples. The new memory layout is fully packed with no padding in between slots or tuples. Performance: Our standard cluster perf tests showed no significant difference in query response times as well as consumed cycles, and a slight reduction in peak memory consumption. Testing: An exhaustive test run passed. Ran a few select tests like TPC-H/DS with ASAN locally. Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-table-sink-test.cc M be/src/exec/kudu-testutil.h M be/src/exec/row-batch-list-test.cc M be/src/exec/text-converter.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/collection-value-builder-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch-test.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/desc-tbl-builder.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 28 files changed, 407 insertions(+), 315 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/6 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 5: Code-Review+1 (4 comments) http://gerrit.cloudera.org:8080/#/c/4673/5/be/src/benchmarks/row-batch-serialize-benchmark.cc File be/src/benchmarks/row-batch-serialize-benchmark.cc: Line 97: scoped_ptr fe; static? http://gerrit.cloudera.org:8080/#/c/4673/5/be/src/runtime/row-batch-test.cc File be/src/runtime/row-batch-test.cc: Line 35: scoped_ptr fe; static http://gerrit.cloudera.org:8080/#/c/4673/5/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: Line 753: // For creating a test descriptor table. The tuples and their Comment wrapping seems weird. http://gerrit.cloudera.org:8080/#/c/4673/5/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java: Line 205:* Creates a thrift descriptor table for testing. Each entry in 'slotTypes' Comment wrapping seems weird. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has uploaded a new patch set (#5). Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. IMPALA-2789: More compact mem layout with null bits at the end. The main motivation of this change is to pave the way for full memory layout compatibility between Impala and Kudu to eventually enable zero-copy scans. This patch is a only first step towards that goal. New Memory Layout Slots are placed in descending order by size with trailing bytes to store null flags. Null flags are omitted for non-nullable slots. There is no padding between tuples when stored back-to-back in a row batch. Example: select bool_col, int_col, string_col, smallint_col from functional.alltypes Slots: string_col|int_col|smallint_col|bool_col|null_byte Offsets: 0 16 20 22 23 The main change is to move the null indicators to the end of tuples. The new memory layout is fully packed with no padding in between slots or tuples. Performance: Our standard cluster perf tests showed no significant difference in query response times as well as consumed cycles, and a slight reduction in peak memory consumption. Testing: An exhaustive test run passed. Ran a few select tests like TPC-H/DS with ASAN locally. Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-table-sink-test.cc M be/src/exec/kudu-testutil.h M be/src/exec/row-batch-list-test.cc M be/src/exec/text-converter.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/collection-value-builder-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch-test.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/desc-tbl-builder.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 28 files changed, 407 insertions(+), 315 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/5 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: Line 303: // TODO: Before committing this patch, modify the other example IR functions. > IMO we shouldn't feel obliged to keep the sample IR consistent everywhere, Works for me. I agree that updating all example IR is a pretty time consuming task. http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/exec/row-batch-list-test.cc File be/src/exec/row-batch-list-test.cc: Line 44: scoped_ptr fe; > Let's make this a member of RowBatchListTest and lazy-initialise in SetUp() Done http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/runtime/collection-value-builder-test.cc File be/src/runtime/collection-value-builder-test.cc: Line 30: scoped_ptr fe; > See my comment in the other test. Made this one static as discussed. http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/runtime/row-batch-serialize-test.cc File be/src/runtime/row-batch-serialize-test.cc: Line 44: scoped_ptr fe; > Same here Moved into RowBatchSerializeTest http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/testutil/desc-tbl-builder.cc File be/src/testutil/desc-tbl-builder.cc: Line 47: DescriptorTbl* DescriptorTblBuilder::Build() { > Thanks, I think this helps with test coverage too. Seems generally saner to do it this way, thanks for the suggestion. http://gerrit.cloudera.org:8080/#/c/4673/4/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: PS4, Line 755: TTestDescriptorTable > TTestDescriptorTableParams? Made this TBuildTestTableDescriptorParams to be consistent with out other naming. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 4: (6 comments) Looking pretty good http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: Line 303: // TODO: Before committing this patch, modify the other example IR functions. > Yes. The tuple struct changed, so I'd assume there are other example IR tha IMO we shouldn't feel obliged to keep the sample IR consistent everywhere, just if we're making a change to the function that the comment belongs to. Otherwise the overhead is pretty unreasonable. E.g. for the LLVM 3.8.0 upgrade I would have had to update every comment. http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/exec/row-batch-list-test.cc File be/src/exec/row-batch-list-test.cc: Line 44: scoped_ptr fe; Let's make this a member of RowBatchListTest and lazy-initialise in SetUp(). It's not worth the potential problems with global destructors. http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/runtime/collection-value-builder-test.cc File be/src/runtime/collection-value-builder-test.cc: Line 30: scoped_ptr fe; See my comment in the other test. http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/runtime/row-batch-serialize-test.cc File be/src/runtime/row-batch-serialize-test.cc: Line 44: scoped_ptr fe; Same here http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/testutil/desc-tbl-builder.cc File be/src/testutil/desc-tbl-builder.cc: Line 47: DescriptorTbl* DescriptorTblBuilder::Build() { Thanks, I think this helps with test coverage too. http://gerrit.cloudera.org:8080/#/c/4673/4/common/thrift/Frontend.thrift File common/thrift/Frontend.thrift: PS4, Line 755: TTestDescriptorTable TTestDescriptorTableParams? I initially thought this was the descriptor table itself. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has uploaded a new patch set (#4). Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. IMPALA-2789: More compact mem layout with null bits at the end. The main motivation of this change is to pave the way for full memory layout compatibility between Impala and Kudu to eventually enable zero-copy scans. This patch is a only first step towards that goal. New Memory Layout Slots are placed in descending order by size with trailing bytes to store null flags. Null flags are omitted for non-nullable slots. There is no padding between tuples when stored back-to-back in a row batch. Example: select bool_col, int_col, string_col, smallint_col from functional.alltypes Slots: string_col|int_col|smallint_col|bool_col|null_byte Offsets: 0 16 20 22 23 The main change is to move the null indicators to the end of tuples. The new memory layout is fully packed with no padding in between slots or tuples. Performance: Our standard cluster perf tests showed no significant difference in query response times as well as consumed cycles, and a slight reduction in peak memory consumption. Testing: An exhaustive test run passed. Ran a few select tests like TPC-H/DS with ASAN locally. Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-table-sink-test.cc M be/src/exec/kudu-testutil.h M be/src/exec/row-batch-list-test.cc M be/src/exec/text-converter.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/collection-value-builder-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch-test.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/desc-tbl-builder.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 28 files changed, 408 insertions(+), 315 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/4 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: Line 661: // necessary because the fields are already aligned, so LLVM should not > My understanding is that here we only care about whether LLVM will add padd Updated comment. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has uploaded a new patch set (#3). Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. IMPALA-2789: More compact mem layout with null bits at the end. The main motivation of this change is to pave the way for full memory layout compatibility between Impala and Kudu to eventually enable zero-copy scans. This patch is a only first step towards that goal. New Memory Layout Slots are placed in descending order by size with trailing bytes to store null flags. Null flags are omitted for non-nullable slots. There is no padding between tuples when stored back-to-back in a row batch. Example: select bool_col, int_col, string_col, smallint_col from functional.alltypes Slots: string_col|int_col|smallint_col|bool_col|null_byte Offsets: 0 16 20 22 23 The main change is to move the null indicators to the end of tuples. The new memory layout is fully packed with no padding in between slots or tuples. Performance: Our standard cluster perf tests showed no significant difference in query response times as well as consumed cycles, and a slight reduction in peak memory consumption. Testing: An exhaustive test run passed. Ran a few select tests like TPC-H/DS with ASAN locally. Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 --- M be/src/benchmarks/row-batch-serialize-benchmark.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/kudu-table-sink-test.cc M be/src/exec/kudu-testutil.h M be/src/exec/row-batch-list-test.cc M be/src/exec/text-converter.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/collection-value-builder-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/row-batch-serialize-test.cc M be/src/runtime/row-batch-test.cc M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/service/frontend.cc M be/src/service/frontend.h M be/src/testutil/desc-tbl-builder.cc M be/src/testutil/desc-tbl-builder.h M common/thrift/Frontend.thrift M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/main/java/org/apache/impala/service/JniFrontend.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 28 files changed, 406 insertions(+), 315 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/3 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: Line 303: // TODO: Before committing this patch, modify the other example IR functions. > Still relevant? Yes. The tuple struct changed, so I'd assume there are other example IR that need to be corrected. http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: Line 148: uint8_t* ptr = reinterpret_cast(tuple); > I feel like this code is being overly clever with the pointer arithmetic. W Done Line 214: uint8_t* ptr = reinterpret_cast(tuple); > Same here Done http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: PS2, Line 603: CreateGEP > We should actually use CreateInBoundsGEP() here (code elsewhere doesn't do Done Line 661: // necessary because the fields are already aligned, so LLVM should not > Is this true? The commit message says that there's no padding, which implie My understanding is that here we only care about whether LLVM will add padding to force alignment within a single struct (not across structs). Since we order the slots by descending size and only have power-of-two slot sizes I think that all slots should be aligned. Or am I missing something? We only lose the alignment once we store several tuples back-to-back. Accesses to the 2nd tuple's slots may not be aligned. http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/testutil/desc-tbl-builder.cc File be/src/testutil/desc-tbl-builder.cc: Line 112: int null_bit = i % 8; > So this allocates NULL bits for non-nullable tuples but doesn't actually se This builder is just for BE unit tests and has no way of declaring non-nullable slots. Anyway, I made the change to have the FE compute the mem layouts. Either solution is fine with me, so let me know which version you prefer. http://gerrit.cloudera.org:8080/#/c/4673/2/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: Line 61: * Slots are placed in descending order by size with trailing bytes to store null flags. > Can you mention that non-nullable slots don't get null bits allocated? Done Line 250: numNullBytes_ = (numNullableSlots + 7) / 8; > This doesn't match the logic in the backend, which includes all slots in th Always computed in FE now. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/exec/hdfs-scanner.cc File be/src/exec/hdfs-scanner.cc: Line 303: // TODO: Before committing this patch, modify the other example IR functions. Still relevant? http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/buffered-tuple-stream-test.cc File be/src/runtime/buffered-tuple-stream-test.cc: Line 148: uint8_t* ptr = reinterpret_cast(tuple); I feel like this code is being overly clever with the pointer arithmetic. We should probably just use slot_desc->tuple_offset(). Line 214: uint8_t* ptr = reinterpret_cast(tuple); Same here http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/descriptors.cc File be/src/runtime/descriptors.cc: PS2, Line 603: CreateGEP We should actually use CreateInBoundsGEP() here (code elsewhere doesn't do that, but it should). CreateGEP() prevent LLVM from assuming much about the resulting pointer. I.e. can't assume that it is non-NULL or points to the same memory allocation. Line 661: // necessary because the fields are already aligned, so LLVM should not Is this true? The commit message says that there's no padding, which implies that the fields can't always be aligned. http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/testutil/desc-tbl-builder.cc File be/src/testutil/desc-tbl-builder.cc: Line 112: int null_bit = i % 8; So this allocates NULL bits for non-nullable tuples but doesn't actually set them? This doesn't seem to match the frontend. It would be nice actually if we just got the null bit/byte layout info from the frontend so they logic wasn't duplicated. http://gerrit.cloudera.org:8080/#/c/4673/2/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java: Line 61: * Slots are placed in descending order by size with trailing bytes to store null flags. Can you mention that non-nullable slots don't get null bits allocated? Line 250: numNullBytes_ = (numNullableSlots + 7) / 8; This doesn't match the logic in the backend, which includes all slots in the calculation. -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.
Alex Behm has uploaded a new patch set (#2). Change subject: IMPALA-2789: More compact mem layout with null bits at the end. .. IMPALA-2789: More compact mem layout with null bits at the end. The main motivation of this change is to pave the way for full memory layout compatibility between Impala and Kudu to eventually enable zero-copy scans. This patch is a only first step towards that goal. New Memory Layout Slots are placed in descending order by size with trailing bytes to store null flags. There is no padding between tuples when stored back-to-back in a row batch. Example: select bool_col, int_col, string_col, smallint_col from functional.alltypes Slots: string_col|int_col|smallint_col|bool_col|null_byte Offsets: 0 16 20 22 23 The main change is to move the null indicators to the end of tuples. The new memory layout is fully packed with no padding in between slots or tuples. Performance: Our standard cluster perf tests showed no significant difference in query response times as well as consumed cycles, and a slight reduction in peak memory consumption. Testing: An exhaustive test run passed. Ran a few select tests like TPC-H/DS with ASAN locally. Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-scanner.h M be/src/exec/kudu-scanner.cc M be/src/exec/kudu-scanner.h M be/src/exec/text-converter.cc M be/src/runtime/buffered-tuple-stream-test.cc M be/src/runtime/descriptors.cc M be/src/runtime/descriptors.h M be/src/runtime/tuple.cc M be/src/runtime/tuple.h M be/src/testutil/desc-tbl-builder.cc M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java 16 files changed, 194 insertions(+), 186 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/2 -- To view, visit http://gerrit.cloudera.org:8080/4673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm