[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-17 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

PS7, Line 575: // (in this case the tuple contains only a nullable double)
 : // define void @SetNotNull({ i8, double }* %tuple) {
 : // entry:
 : //   %null_byte_ptr = getelementptr inbounds { i8, double }* 
%tuple, i32 0, i32 0
 : //   %null_byte = load i8* %null_byte_ptr
 : //   %0 = and i8 %null_byte, -2
 : //   store i8 %0, i8* %null_byte_ptr
 : //   ret void
> this looks outdated.
Thanks, I'll post a follow-on to update this.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-17 Thread Dan Hecht (Code Review)
Dan Hecht has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

PS7, Line 575: // (in this case the tuple contains only a nullable double)
 : // define void @SetNotNull({ i8, double }* %tuple) {
 : // entry:
 : //   %null_byte_ptr = getelementptr inbounds { i8, double }* 
%tuple, i32 0, i32 0
 : //   %null_byte = load i8* %null_byte_ptr
 : //   %0 = and i8 %null_byte, -2
 : //   store i8 %0, i8* %null_byte_ptr
 : //   ret void
this looks outdated.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dan Hecht 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 8: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-16 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


IMPALA-2789: More compact mem layout with null bits at the end.

There are two motivations for this change:
1. Reduce memory consumption.
2. Pave the way for full memory layout compatibility between
   Impala and Kudu to eventually enable zero-copy scans. This
   patch is a only first step towards that goal.

New Memory Layout
Slots are placed in descending order by size with trailing bytes to
store null flags. Null flags are omitted for non-nullable slots. There
is no padding between tuples when stored back-to-back in a row batch.

Example: select bool_col, int_col, string_col, smallint_col
 from functional.alltypes
Slots:   string_col|int_col|smallint_col|bool_col|null_byte
Offsets: 0  16  20   22   23

The main change is to move the null indicators to the end of tuples.
The new memory layout is fully packed with no padding in between
slots or tuples.

Performance:
Our standard cluster perf tests showed no significant difference in
query response times as well as consumed cycles, and a slight
reduction in peak memory consumption.

Testing:
An exhaustive test run passed. Ran a few select tests like TPC-H/DS
with ASAN locally.

These follow-on changes are planned:
1. Planner needs to mark slots non-nullable if they correspond
   to a non-nullable Kudu column.
2. Update Kudu scan node to copy tuples with memcpy.
3. Kudu client needs to support transferring ownership of the
   tuple memory (maybe do direct and indirect buffers separately).
4. Update Kudu scan node to use memory transfer instead of copy

Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Reviewed-on: http://gerrit.cloudera.org:8080/4673
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/row-batch-list-test.cc
M be/src/exec/text-converter.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/collection-value-builder-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch-test.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/desc-tbl-builder.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
26 files changed, 401 insertions(+), 311 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 603:   builder.CreateInBoundsGEP(tuple_int8_ptr, null_byte_offset, 
"null_byte_ptr");
> i understand that, but i was wondering what the motivation was.
If we wanted to continue to access the null byte as a struct field, then we 
would need to compute and store that index somewhere because it's not the same 
as the null_byte_offset. It just happened to be the same before (but now is 
different). Only keeping track of one offset seemed simpler.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-16 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 603:   builder.CreateInBoundsGEP(tuple_int8_ptr, null_byte_offset, 
"null_byte_ptr");
> Before we were accessing the null byte as a struct field via GetStructGEP()
i understand that, but i was wondering what the motivation was.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 8: Code-Review+2

Carry +2 from Marcel

-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-16 Thread Alex Behm (Code Review)
Hello Marcel Kornacker, Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4673

to look at the new patch set (#8).

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..

IMPALA-2789: More compact mem layout with null bits at the end.

There are two motivations for this change:
1. Reduce memory consumption.
2. Pave the way for full memory layout compatibility between
   Impala and Kudu to eventually enable zero-copy scans. This
   patch is a only first step towards that goal.

New Memory Layout
Slots are placed in descending order by size with trailing bytes to
store null flags. Null flags are omitted for non-nullable slots. There
is no padding between tuples when stored back-to-back in a row batch.

Example: select bool_col, int_col, string_col, smallint_col
 from functional.alltypes
Slots:   string_col|int_col|smallint_col|bool_col|null_byte
Offsets: 0  16  20   22   23

The main change is to move the null indicators to the end of tuples.
The new memory layout is fully packed with no padding in between
slots or tuples.

Performance:
Our standard cluster perf tests showed no significant difference in
query response times as well as consumed cycles, and a slight
reduction in peak memory consumption.

Testing:
An exhaustive test run passed. Ran a few select tests like TPC-H/DS
with ASAN locally.

These follow-on changes are planned:
1. Planner needs to mark slots non-nullable if they correspond
   to a non-nullable Kudu column.
2. Update Kudu scan node to copy tuples with memcpy.
3. Kudu client needs to support transferring ownership of the
   tuple memory (maybe do direct and indirect buffers separately).
4. Update Kudu scan node to use memory transfer instead of copy

Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/row-batch-list-test.cc
M be/src/exec/text-converter.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/collection-value-builder-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch-test.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/desc-tbl-builder.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
26 files changed, 401 insertions(+), 311 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-16 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 603:   builder.CreateInBoundsGEP(tuple_int8_ptr, null_byte_offset, 
"null_byte_ptr");
> why is this change necessary?
Before we were accessing the null byte as a struct field via GetStructGEP(), 
i.e. assuming that null_indicator_offset_.byte_offset indicated the field of 
the tuple struct we want to access.

I've changed this to cast the tuple to an int8_t* and then access the bull byte 
via the offset (and not as a field in the tuple struct) using 
CreateInboundsGEP().


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-16 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4673/7/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 603:   builder.CreateInBoundsGEP(tuple_int8_ptr, null_byte_offset, 
"null_byte_ptr");
why is this change necessary?


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-12 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 7: Code-Review+1

Thanks!

-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-12 Thread Alex Behm (Code Review)
Hello Matthew Jacobs, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4673

to look at the new patch set (#7).

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..

IMPALA-2789: More compact mem layout with null bits at the end.

There are two motivations for this change:
1. Reduce memory consumption.
2. Pave the way for full memory layout compatibility between
   Impala and Kudu to eventually enable zero-copy scans. This
   patch is a only first step towards that goal.

New Memory Layout
Slots are placed in descending order by size with trailing bytes to
store null flags. Null flags are omitted for non-nullable slots. There
is no padding between tuples when stored back-to-back in a row batch.

Example: select bool_col, int_col, string_col, smallint_col
 from functional.alltypes
Slots:   string_col|int_col|smallint_col|bool_col|null_byte
Offsets: 0  16  20   22   23

The main change is to move the null indicators to the end of tuples.
The new memory layout is fully packed with no padding in between
slots or tuples.

Performance:
Our standard cluster perf tests showed no significant difference in
query response times as well as consumed cycles, and a slight
reduction in peak memory consumption.

Testing:
An exhaustive test run passed. Ran a few select tests like TPC-H/DS
with ASAN locally.

These follow-on changes are planned:
1. Planner needs to mark slots non-nullable if they correspond
   to a non-nullable Kudu column.
2. Update Kudu scan node to copy tuples with memcpy.
3. Kudu client needs to support transferring ownership of the
   tuple memory (maybe do direct and indirect buffers separately).
4. Update Kudu scan node to use memory transfer instead of copy

Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/row-batch-list-test.cc
M be/src/exec/text-converter.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/collection-value-builder-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch-test.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/desc-tbl-builder.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
26 files changed, 402 insertions(+), 312 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-12 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4673/6//COMMIT_MSG
Commit Message:

Line 9: The main motivation of this change is to pave the way for full
> There are 2 main motivations of this change, no?
Done


Line 12: that goal.
> Can you mention what is left after this? 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-12 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 6: Code-Review+1

(2 comments)

Thanks! LGTM, just a few comments for the commit msg. I didn't look at the 
codegen code changes in detail.

http://gerrit.cloudera.org:8080/#/c/4673/6//COMMIT_MSG
Commit Message:

Line 9: The main motivation of this change is to pave the way for full
There are 2 main motivations of this change, no?
1) More compact memory layout.
2) Zero copy (or at least avoiding conversions) with Kudu scans... (as you 
already have here)


Line 12: that goal.
Can you mention what is left after this? 
My understanding is that it is (correct me if I'm wrong):
1) planner needs to set null flags for nullable columns
2) kudu scan node can now _copy_ mem, i.e. rip out conversion code
2) kudu client needs to support mem xfer
3) kudu scan node updated to use mem xfer instead of copying
(KUDU-1695)


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 6: Code-Review+1

Carry Tim's +1

-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4673/5/be/src/benchmarks/row-batch-serialize-benchmark.cc
File be/src/benchmarks/row-batch-serialize-benchmark.cc:

Line 97: scoped_ptr fe;
> static?
Done


http://gerrit.cloudera.org:8080/#/c/4673/5/be/src/runtime/row-batch-test.cc
File be/src/runtime/row-batch-test.cc:

Line 35: scoped_ptr fe;
> static
Done


http://gerrit.cloudera.org:8080/#/c/4673/5/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

Line 753: // For creating a test descriptor table. The tuples and their
> Comment wrapping seems weird.
Done


http://gerrit.cloudera.org:8080/#/c/4673/5/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 205:* Creates a thrift descriptor table for testing. Each entry in 
'slotTypes'
> Comment wrapping seems weird.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Alex Behm (Code Review)
Hello Tim Armstrong,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4673

to look at the new patch set (#6).

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..

IMPALA-2789: More compact mem layout with null bits at the end.

The main motivation of this change is to pave the way for full
memory layout compatibility between Impala and Kudu to eventually
enable zero-copy scans. This patch is a only first step towards
that goal.

New Memory Layout
Slots are placed in descending order by size with trailing bytes to
store null flags. Null flags are omitted for non-nullable slots. There
is no padding between tuples when stored back-to-back in a row batch.

Example: select bool_col, int_col, string_col, smallint_col
 from functional.alltypes
Slots:   string_col|int_col|smallint_col|bool_col|null_byte
Offsets: 0  16  20   22   23

The main change is to move the null indicators to the end of tuples.
The new memory layout is fully packed with no padding in between
slots or tuples.

Performance:
Our standard cluster perf tests showed no significant difference in
query response times as well as consumed cycles, and a slight
reduction in peak memory consumption.

Testing:
An exhaustive test run passed. Ran a few select tests like TPC-H/DS
with ASAN locally.

Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink-test.cc
M be/src/exec/kudu-testutil.h
M be/src/exec/row-batch-list-test.cc
M be/src/exec/text-converter.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/collection-value-builder-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch-test.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/desc-tbl-builder.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
28 files changed, 407 insertions(+), 315 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/6
-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 5: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4673/5/be/src/benchmarks/row-batch-serialize-benchmark.cc
File be/src/benchmarks/row-batch-serialize-benchmark.cc:

Line 97: scoped_ptr fe;
static?


http://gerrit.cloudera.org:8080/#/c/4673/5/be/src/runtime/row-batch-test.cc
File be/src/runtime/row-batch-test.cc:

Line 35: scoped_ptr fe;
static


http://gerrit.cloudera.org:8080/#/c/4673/5/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

Line 753: // For creating a test descriptor table. The tuples and their
Comment wrapping seems weird.


http://gerrit.cloudera.org:8080/#/c/4673/5/fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
File fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java:

Line 205:* Creates a thrift descriptor table for testing. Each entry in 
'slotTypes'
Comment wrapping seems weird.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#5).

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..

IMPALA-2789: More compact mem layout with null bits at the end.

The main motivation of this change is to pave the way for full
memory layout compatibility between Impala and Kudu to eventually
enable zero-copy scans. This patch is a only first step towards
that goal.

New Memory Layout
Slots are placed in descending order by size with trailing bytes to
store null flags. Null flags are omitted for non-nullable slots. There
is no padding between tuples when stored back-to-back in a row batch.

Example: select bool_col, int_col, string_col, smallint_col
 from functional.alltypes
Slots:   string_col|int_col|smallint_col|bool_col|null_byte
Offsets: 0  16  20   22   23

The main change is to move the null indicators to the end of tuples.
The new memory layout is fully packed with no padding in between
slots or tuples.

Performance:
Our standard cluster perf tests showed no significant difference in
query response times as well as consumed cycles, and a slight
reduction in peak memory consumption.

Testing:
An exhaustive test run passed. Ran a few select tests like TPC-H/DS
with ASAN locally.

Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink-test.cc
M be/src/exec/kudu-testutil.h
M be/src/exec/row-batch-list-test.cc
M be/src/exec/text-converter.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/collection-value-builder-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch-test.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/desc-tbl-builder.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
28 files changed, 407 insertions(+), 315 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

Line 303: // TODO: Before committing this patch, modify the other example IR 
functions.
> IMO we shouldn't feel obliged to keep the sample IR consistent everywhere, 
Works for me. I agree that updating all example IR is a pretty time consuming 
task.


http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/exec/row-batch-list-test.cc
File be/src/exec/row-batch-list-test.cc:

Line 44: scoped_ptr fe;
> Let's make this a member of RowBatchListTest and lazy-initialise in SetUp()
Done


http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/runtime/collection-value-builder-test.cc
File be/src/runtime/collection-value-builder-test.cc:

Line 30: scoped_ptr fe;
> See my comment in the other test.
Made this one static as discussed.


http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/runtime/row-batch-serialize-test.cc
File be/src/runtime/row-batch-serialize-test.cc:

Line 44: scoped_ptr fe;
> Same here
Moved into RowBatchSerializeTest


http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/testutil/desc-tbl-builder.cc
File be/src/testutil/desc-tbl-builder.cc:

Line 47: DescriptorTbl* DescriptorTblBuilder::Build() {
> Thanks, I think this helps with test coverage too.
Seems generally saner to do it this way, thanks for the suggestion.


http://gerrit.cloudera.org:8080/#/c/4673/4/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

PS4, Line 755: TTestDescriptorTable
> TTestDescriptorTableParams?
Made this TBuildTestTableDescriptorParams to be consistent with out other 
naming.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 4:

(6 comments)

Looking pretty good

http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

Line 303: // TODO: Before committing this patch, modify the other example IR 
functions.
> Yes. The tuple struct changed, so I'd assume there are other example IR tha
IMO we shouldn't feel obliged to keep the sample IR consistent everywhere, just 
if we're making a change to the function that the comment belongs to. Otherwise 
the overhead is pretty unreasonable.

E.g. for the LLVM 3.8.0 upgrade I would have had to update every comment.


http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/exec/row-batch-list-test.cc
File be/src/exec/row-batch-list-test.cc:

Line 44: scoped_ptr fe;
Let's make this a member of RowBatchListTest and lazy-initialise in SetUp(). 
It's not worth the potential problems with global destructors.


http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/runtime/collection-value-builder-test.cc
File be/src/runtime/collection-value-builder-test.cc:

Line 30: scoped_ptr fe;
See my comment in the other test.


http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/runtime/row-batch-serialize-test.cc
File be/src/runtime/row-batch-serialize-test.cc:

Line 44: scoped_ptr fe;
Same here


http://gerrit.cloudera.org:8080/#/c/4673/4/be/src/testutil/desc-tbl-builder.cc
File be/src/testutil/desc-tbl-builder.cc:

Line 47: DescriptorTbl* DescriptorTblBuilder::Build() {
Thanks, I think this helps with test coverage too.


http://gerrit.cloudera.org:8080/#/c/4673/4/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

PS4, Line 755: TTestDescriptorTable
TTestDescriptorTableParams?

I initially thought this was the descriptor table itself.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#4).

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..

IMPALA-2789: More compact mem layout with null bits at the end.

The main motivation of this change is to pave the way for full
memory layout compatibility between Impala and Kudu to eventually
enable zero-copy scans. This patch is a only first step towards
that goal.

New Memory Layout
Slots are placed in descending order by size with trailing bytes to
store null flags. Null flags are omitted for non-nullable slots. There
is no padding between tuples when stored back-to-back in a row batch.

Example: select bool_col, int_col, string_col, smallint_col
 from functional.alltypes
Slots:   string_col|int_col|smallint_col|bool_col|null_byte
Offsets: 0  16  20   22   23

The main change is to move the null indicators to the end of tuples.
The new memory layout is fully packed with no padding in between
slots or tuples.

Performance:
Our standard cluster perf tests showed no significant difference in
query response times as well as consumed cycles, and a slight
reduction in peak memory consumption.

Testing:
An exhaustive test run passed. Ran a few select tests like TPC-H/DS
with ASAN locally.

Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink-test.cc
M be/src/exec/kudu-testutil.h
M be/src/exec/row-batch-list-test.cc
M be/src/exec/text-converter.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/collection-value-builder-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch-test.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/desc-tbl-builder.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
28 files changed, 408 insertions(+), 315 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

Line 661:   // necessary because the fields are already aligned, so LLVM should 
not
> My understanding is that here we only care about whether LLVM will add padd
Updated comment.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#3).

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..

IMPALA-2789: More compact mem layout with null bits at the end.

The main motivation of this change is to pave the way for full
memory layout compatibility between Impala and Kudu to eventually
enable zero-copy scans. This patch is a only first step towards
that goal.

New Memory Layout
Slots are placed in descending order by size with trailing bytes to
store null flags. Null flags are omitted for non-nullable slots. There
is no padding between tuples when stored back-to-back in a row batch.

Example: select bool_col, int_col, string_col, smallint_col
 from functional.alltypes
Slots:   string_col|int_col|smallint_col|bool_col|null_byte
Offsets: 0  16  20   22   23

The main change is to move the null indicators to the end of tuples.
The new memory layout is fully packed with no padding in between
slots or tuples.

Performance:
Our standard cluster perf tests showed no significant difference in
query response times as well as consumed cycles, and a slight
reduction in peak memory consumption.

Testing:
An exhaustive test run passed. Ran a few select tests like TPC-H/DS
with ASAN locally.

Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/kudu-table-sink-test.cc
M be/src/exec/kudu-testutil.h
M be/src/exec/row-batch-list-test.cc
M be/src/exec/text-converter.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/collection-value-builder-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch-test.cc
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/testutil/desc-tbl-builder.cc
M be/src/testutil/desc-tbl-builder.h
M common/thrift/Frontend.thrift
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
28 files changed, 406 insertions(+), 315 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-11 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

Line 303: // TODO: Before committing this patch, modify the other example IR 
functions.
> Still relevant?
Yes. The tuple struct changed, so I'd assume there are other example IR that 
need to be corrected.


http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

Line 148: uint8_t* ptr = reinterpret_cast(tuple);
> I feel like this code is being overly clever with the pointer arithmetic. W
Done


Line 214:   uint8_t* ptr = reinterpret_cast(tuple);
> Same here
Done


http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

PS2, Line 603: CreateGEP
> We should actually use CreateInBoundsGEP() here (code elsewhere doesn't do 
Done


Line 661:   // necessary because the fields are already aligned, so LLVM should 
not
> Is this true? The commit message says that there's no padding, which implie
My understanding is that here we only care about whether LLVM will add padding 
to force alignment within a single struct (not across structs). Since we order 
the slots by descending size and only have power-of-two slot sizes I think that 
all slots should be aligned. Or am I missing something?

We only lose the alignment once we store several tuples back-to-back. Accesses 
to the 2nd tuple's slots may not be aligned.


http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/testutil/desc-tbl-builder.cc
File be/src/testutil/desc-tbl-builder.cc:

Line 112: int null_bit = i % 8;
> So this allocates NULL bits for non-nullable tuples but doesn't actually se
This builder is just for BE unit tests and has no way of declaring non-nullable 
slots.

Anyway, I made the change to have the FE compute the mem layouts. Either 
solution is fine with me, so let me know which version you prefer.


http://gerrit.cloudera.org:8080/#/c/4673/2/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

Line 61:  * Slots are placed in descending order by size with trailing bytes to 
store null flags.
> Can you mention that non-nullable slots don't get null bits allocated?
Done


Line 250: numNullBytes_ = (numNullableSlots + 7) / 8;
> This doesn't match the logic in the backend, which includes all slots in th
Always computed in FE now.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-10 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..


Patch Set 2:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/exec/hdfs-scanner.cc
File be/src/exec/hdfs-scanner.cc:

Line 303: // TODO: Before committing this patch, modify the other example IR 
functions.
Still relevant?


http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/buffered-tuple-stream-test.cc
File be/src/runtime/buffered-tuple-stream-test.cc:

Line 148: uint8_t* ptr = reinterpret_cast(tuple);
I feel like this code is being overly clever with the pointer arithmetic. We 
should probably just use slot_desc->tuple_offset().


Line 214:   uint8_t* ptr = reinterpret_cast(tuple);
Same here


http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/runtime/descriptors.cc
File be/src/runtime/descriptors.cc:

PS2, Line 603: CreateGEP
We should actually use CreateInBoundsGEP() here (code elsewhere doesn't do 
that, but it should). CreateGEP() prevent LLVM from assuming much about the 
resulting pointer. I.e. can't assume that it is non-NULL or points to the same 
memory allocation.


Line 661:   // necessary because the fields are already aligned, so LLVM should 
not
Is this true? The commit message says that there's no padding, which implies 
that the fields can't always be aligned.


http://gerrit.cloudera.org:8080/#/c/4673/2/be/src/testutil/desc-tbl-builder.cc
File be/src/testutil/desc-tbl-builder.cc:

Line 112: int null_bit = i % 8;
So this allocates NULL bits for non-nullable tuples but doesn't actually set 
them?

This doesn't seem to match the frontend. It would be nice actually if we just 
got the null bit/byte layout info from the frontend so they logic wasn't 
duplicated.


http://gerrit.cloudera.org:8080/#/c/4673/2/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

Line 61:  * Slots are placed in descending order by size with trailing bytes to 
store null flags.
Can you mention that non-nullable slots don't get null bits allocated?


Line 250: numNullBytes_ = (numNullableSlots + 7) / 8;
This doesn't match the logic in the backend, which includes all slots in the 
calculation.


-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-2789: More compact mem layout with null bits at the end.

2016-10-10 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new patch set (#2).

Change subject: IMPALA-2789: More compact mem layout with null bits at the end.
..

IMPALA-2789: More compact mem layout with null bits at the end.

The main motivation of this change is to pave the way for full
memory layout compatibility between Impala and Kudu to eventually
enable zero-copy scans. This patch is a only first step towards
that goal.

New Memory Layout
Slots are placed in descending order by size with trailing bytes to
store null flags. There is no padding between tuples when stored
back-to-back in a row batch.

Example: select bool_col, int_col, string_col, smallint_col
 from functional.alltypes
Slots:   string_col|int_col|smallint_col|bool_col|null_byte
Offsets: 0  16  20   22   23

The main change is to move the null indicators to the end of tuples.
The new memory layout is fully packed with no padding in between
slots or tuples.

Performance:
Our standard cluster perf tests showed no significant difference in
query response times as well as consumed cycles, and a slight
reduction in peak memory consumption.

Testing:
An exhaustive test run passed. Ran a few select tests like TPC-H/DS
with ASAN locally.

Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/exec/text-converter.cc
M be/src/runtime/buffered-tuple-stream-test.cc
M be/src/runtime/descriptors.cc
M be/src/runtime/descriptors.h
M be/src/runtime/tuple.cc
M be/src/runtime/tuple.h
M be/src/testutil/desc-tbl-builder.cc
M fe/src/main/java/org/apache/impala/analysis/DescriptorTable.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
16 files changed, 194 insertions(+), 186 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/73/4673/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6510c75d841bddafa6638f1bd2ac6731a7053f6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm