[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-4123: Fast bit unpacking .. IMPALA-4123: Fast bit unpacking Adds utility functions for fast unpacking of batches of bit-packed values. These support reading batches of any number of values, provided that the start of the batch is aligned to a byte boundary. Callers that want to read smaller batches that don't align to byte boundaries will need to implement their own buffering. The unpacking code uses only portable C++ and no SIMD intrinsics, but is fairly efficient because unpacking a full batch of 32 values compiles down to 32-bit loads, shifts by constants, masks by constants, bitwise ORs where a value straddles two 32-bit words, and stores. Further speedups should be possible using SIMD intrinsics. Testing: Added unit tests for unpacking, exhaustively covering different bit widths with additional test dimensions (memory alignment, various input sizes, etc.). Tested under ASAN to ensure the bit unpacking doesn't read past the end of buffers. Perf: Added a microbenchmark that shows on average an 8-9x speedup over the existing BitReader code. 
Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Reviewed-on: http://gerrit.cloudera.org:8080/4494 Reviewed-by: Tim Armstrong Tested-by: Internal Jenkins --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/bswap-benchmark.cc M be/src/exprs/expr-test.cc A be/src/testutil/mem-util.h M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h M be/src/util/bit-util.h M be/src/util/openssl-util-test.cc 13 files changed, 929 insertions(+), 84 deletions(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
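The commit message's claim that unpacking compiles down to 32-bit loads, constant shifts, constant masks, and ORs across word boundaries can be illustrated with a minimal sketch. This is hypothetical code, not Impala's actual BitPacking API; it assumes LSB-first packing and a little-endian target such as x86.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Minimal sketch of bit unpacking (NOT Impala's BitPacking API): unpack
// 'num_values' BIT_WIDTH-bit values whose first value starts on a byte
// boundary. Each value is extracted with a 32-bit load, a shift, and a mask;
// when a value straddles two 32-bit words, the high bits are ORed in from the
// next word. Assumes LSB-first packing and a little-endian target.
template <int BIT_WIDTH>
std::vector<uint32_t> UnpackNaive(const uint8_t* in, int num_values) {
  static_assert(BIT_WIDTH > 0 && BIT_WIDTH < 32, "sketch covers 1..31 bits");
  const uint32_t mask = (1u << BIT_WIDTH) - 1;
  std::vector<uint32_t> out(num_values);
  for (int i = 0; i < num_values; ++i) {
    const int64_t first_bit = static_cast<int64_t>(i) * BIT_WIDTH;
    const int64_t word_idx = first_bit / 32;
    const int shift = static_cast<int>(first_bit % 32);
    uint32_t lo;
    std::memcpy(&lo, in + word_idx * 4, 4);  // unaligned-safe 32-bit load
    uint32_t v = lo >> shift;
    if (shift + BIT_WIDTH > 32) {  // value straddles two 32-bit words
      uint32_t hi;
      std::memcpy(&hi, in + (word_idx + 1) * 4, 4);
      v |= hi << (32 - shift);
    }
    out[i] = v & mask;
  }
  return out;
}
```

A real implementation must guard the trailing 32-bit loads so they never read past the end of the input buffer (the ASAN testing mentioned above targets exactly that class of bug); in this sketch the caller is expected to over-allocate the packed buffer.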
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Internal Jenkins has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 15: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: Line 132: > No seed is the same as a single default seed for this type. Oh ok, this should be ok too then. -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 15: Code-Review+2 Carry +2 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Hello Jim Apple, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4494 to look at the new patch set (#14). Change subject: IMPALA-4123: Fast bit unpacking .. 
Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/bswap-benchmark.cc M be/src/exprs/expr-test.cc A be/src/testutil/mem-util.h M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h M be/src/util/bit-util.h M be/src/util/openssl-util-test.cc 13 files changed, 929 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/14 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 13: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: Line 132: > Done but didn't seed it since flakiness would only happen if we have a bad No seed is the same as a single default seed for this type. http://gerrit.cloudera.org:8080/#/c/4494/13/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS13, Line 137: 0, numeric_limits<uint32_t>::max() these are the default arguments to the constructor. -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#13). Change subject: IMPALA-4123: Fast bit unpacking .. 
Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/bswap-benchmark.cc M be/src/exprs/expr-test.cc A be/src/testutil/mem-util.h M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h M be/src/util/bit-util.h M be/src/util/openssl-util-test.cc 13 files changed, 929 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/13 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 12: (8 comments) http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: Line 19: #include > cstd... Done Line 132: > rand() is only guaranteed to have 15 non-zero bits - the high 17 bits may a Done but didn't seed it since flakiness would only happen if we have a bad product bug with incorrect results. http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS11, Line 74: /// Implementation of Unpack32Values() that uses 32-bit integer loads to : /// unpack values with the > Did you check to see that this benchmarks to be actually faster than if BIT Looks like it might be slightly faster (although within the margin of error) to do the switch on bit_width within Unpack32Values() and UnpackUpTo32Values(). So I'll go with that. http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: Line 57: return std::make_pair(in_pos, values_to_read); > After some reading, I think you can and should throw a static on these cons For classes it makes sense - if you have a non-static constexpr in a class definition then the compiler almost certainly has to allocate storage in the memory layout of the object (since you could pass the object into third-party code, which could take the address of the member). I don't think we want to do that for local variables. I don't see any benefit - the compiler shouldn't have any problem fully propagating the constants, and there's no reason it would allocate storage locally unless we take the address of the variable. Line 140: case i: return Unpack32Values(in, in_bytes, out); > constexpr BYTES_TO_READ, if you mark BitUtil::RoundUpNumBytes constexpr, or Done. Marked a few other functions in BitUtil constexpr for consistency. 
http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-stream-utils.h File be/src/util/bit-stream-utils.h: Line 100: /// 'buffer' is the buffer to read from. The buffer's length is 'buffer_len'. > While you're here, can you add "Does not take ownership of the pointer"? Done Line 142: /// Maximum byte length of a vlq encoded int > While you're here, can you DISALLOW_COPY_AND_ASSIGN? I believe we copy them in a couple of places. It's actually harmless and maybe useful since you can fork the state of the reader. I added a comment to document that it was deliberately left defined. http://gerrit.cloudera.org:8080/#/c/4494/12/be/src/util/openssl-util-test.cc File be/src/util/openssl-util-test.cc: Line 27: using std::uniform_int_distribution; Switched from boost for consistency -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#12). Change subject: IMPALA-4123: Fast bit unpacking .. 
Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/bswap-benchmark.cc M be/src/exprs/expr-test.cc A be/src/testutil/mem-util.h M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h M be/src/util/bit-util.h M be/src/util/openssl-util-test.cc 13 files changed, 928 insertions(+), 84 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/12 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: Line 57: constexpr int BATCH_SIZE = 32; After some reading, I think you can and should throw a static on these constexprs. For instance, http://stackoverflow.com/questions/26152096/when-and-why-would-you-use-static-with-constexpr -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: Line 140: const int bytes_to_read = BitUtil::RoundUpNumBytes(32 * BIT_WIDTH); constexpr BYTES_TO_READ, if you mark BitUtil::RoundUpNumBytes constexpr, or if you write BIT_WIDTH*32/CHAR_BIT instead -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 11: (5 comments) http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: Line 19: #include cstd... Line 132: std::generate(std::begin(in), std::end(in), rand); rand() is only guaranteed to have 15 non-zero bits - the high 17 bits may all be 0 each time. More modern rand, and also seeded to prevent flaky tests: std::mt19937 pseudorandom(0); std::uniform_int_distribution<uint32_t> uniform; std::generate(std::begin(in), std::end(in), [&pseudorandom, &uniform]() { return uniform(pseudorandom); }); http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS11, Line 74: /// Templated versions of UnpackValues() that can be used if BIT_WIDTH is a : /// compile-time constant. Did you check to see that this benchmarks to be actually faster than if BIT_WIDTH is not a template parameter? http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-stream-utils.h File be/src/util/bit-stream-utils.h: Line 100: /// 'buffer' is the buffer to read from. The buffer's length is 'buffer_len'. While you're here, can you add "Does not take ownership of the pointer"? Line 142: const uint8_t* buffer_; While you're here, can you DISALLOW_COPY_AND_ASSIGN? -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
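Jim's suggestion above, expanded into a compilable, self-contained form. The distribution's value type of uint32_t is an assumption for this sketch; a default-constructed uniform_int_distribution spans the type's full range, and std::rand() is only required to cover [0, 32767].

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <random>

// Seeded-PRNG fill along the lines Jim suggests: std::rand() guarantees only
// 15 random bits (RAND_MAX may be as small as 32767), while std::mt19937
// yields full 32-bit values; the fixed seed keeps test inputs reproducible.
std::array<uint32_t, 64> FillRandom() {
  std::mt19937 pseudorandom(0);  // fixed seed: deterministic, non-flaky tests
  // Default ctor is equivalent to (0, numeric_limits<uint32_t>::max()).
  std::uniform_int_distribution<uint32_t> uniform;
  std::array<uint32_t, 64> in;
  std::generate(std::begin(in), std::end(in),
      [&pseudorandom, &uniform]() { return uniform(pseudorandom); });
  return in;
}
```

Because the generator is seeded with a constant, repeated runs see identical inputs, so a failure reproduces instead of flaking.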
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/4494/10/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS10, Line 39: it > them Done PS10, Line 39: Both > Both buffers Done Line 60: uint8_t* packed = storage.data() + aligned; > If aligned is true, then packed is not aligned Done http://gerrit.cloudera.org:8080/#/c/4494/10/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: Line 41: #define UNPACK_VALUES_CASE(ignore1, i, ignore2) \ > Here is a way to do this without any macros, but it's a bit heavyweight fro IMO the macro is easier to understand because the macro bears some relation to the code being stamped out. Also you have a better chance of debugging it by looking at the preprocessed output. I think the macro makes sense in this case since we're really just trying to stamp out repetitive code instead of build an abstraction. -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
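The macro-versus-template trade-off debated above can be made concrete with a tiny standalone sketch. All names here are invented; Impala's UNPACK_VALUES_CASE takes (ignore1, i, ignore2), the callback shape used by preprocessor repetition utilities. The macro stamps out one switch case per bit width, each dispatching to a template instantiation where the width is a compile-time constant.

```cpp
#include <cassert>
#include <cstdint>

// Compile-time-constant bit width: the compiler can fold the mask.
template <int BIT_WIDTH>
uint32_t MaskValue(uint32_t v) {
  // BIT_WIDTH % 32 keeps the shift in range in the (unused) 32-bit branch.
  return v & ((BIT_WIDTH == 32) ? 0xFFFFFFFFu : ((1u << (BIT_WIDTH % 32)) - 1u));
}

// The macro stamps out repetitive switch cases, mapping a runtime width to
// the matching template instantiation (hypothetical, simplified analogue of
// UNPACK_VALUES_CASE).
#define MASK_VALUE_CASE(i) \
  case i: return MaskValue<i>(v);

uint32_t MaskValueRuntime(int bit_width, uint32_t v) {
  switch (bit_width) {
    MASK_VALUE_CASE(0) MASK_VALUE_CASE(1) MASK_VALUE_CASE(2)
    MASK_VALUE_CASE(3) MASK_VALUE_CASE(4) MASK_VALUE_CASE(5)
    MASK_VALUE_CASE(6) MASK_VALUE_CASE(7) MASK_VALUE_CASE(8)
    default: return 0;  // sketch only handles widths 0..8
  }
}
```

As Tim argues, the stamped-out cases stay close to the code they generate and remain visible in preprocessed output; the macro-free alternative Jim posts trades this for heavier template syntax.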
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#11). Change subject: IMPALA-4123: Fast bit unpacking .. 
Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/bswap-benchmark.cc A be/src/testutil/mem-util.h M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 10 files changed, 886 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/11 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 10: (4 comments) http://gerrit.cloudera.org:8080/#/c/4494/10/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS10, Line 39: it them PS10, Line 39: Both Both buffers Line 60: uint8_t* packed = storage.data() + aligned; If aligned is true, then packed is not aligned http://gerrit.cloudera.org:8080/#/c/4494/10/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: Line 41: #define UNPACK_VALUES_CASE(ignore1, i, ignore2) \ Here is a way to do this without any macros, but it's a bit heavyweight from a syntax POV. I found it was roughly the same speed as the macro approach:
--- a/be/src/util/bit-packing.inline.h
+++ b/be/src/util/bit-packing.inline.h
@@ -31,10 +31,33 @@
 namespace impala {
 
+template <typename T, int... I>
+constexpr std::array<decltype(&T::template f<0>), sizeof...(I)> MakeArray(
+    std::integer_sequence<int, I...>) {
+  return {&T::template f<I>...};
+}
+
+template <typename OutType>
+struct BitPackStruct {
+  template <int BIT_WIDTH>
+  static auto f(const uint8_t* __restrict__ in, int64_t in_bytes, int64_t num_values,
+      OutType* __restrict__ out) {
+    return BitPacking::UnpackValues<BIT_WIDTH>(in, in_bytes, num_values, out);
+  }
+};
+
 template <typename OutType>
 std::pair<const uint8_t*, int64_t> BitPacking::UnpackValues(int bit_width,
     const uint8_t* __restrict__ in, int64_t in_bytes, int64_t num_values,
     OutType* __restrict__ out) {
+  static constexpr auto UNPACKERS =
+      MakeArray<BitPackStruct<OutType>>(std::make_integer_sequence<int, 33>());
+  if (bit_width < 0 || bit_width > 32) {
+    DCHECK(false);
+    return {nullptr, 0};
+  }
+  return UNPACKERS[bit_width](in, in_bytes, num_values, out);
Similar things can be done below. Line 146 can be done without std::make_integer_sequence. 
-- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
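Jim's diff builds a constexpr table of function-pointer instantiations indexed by the runtime bit width. Here is a stripped-down, self-contained illustration of that technique; all names (Masker, MaskDispatch) are invented for the sketch, and it requires C++14 for std::integer_sequence.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

// A struct wrapping a templated static function, so its instantiations can
// be collected into an array (illustrative stand-in for BitPackStruct).
struct Masker {
  template <int BIT_WIDTH>
  static uint32_t f(uint32_t v) {
    return v & ((BIT_WIDTH == 32) ? 0xFFFFFFFFu : ((1u << (BIT_WIDTH % 32)) - 1u));
  }
};

// Expand an integer_sequence into a constexpr function-pointer table:
// entry i points at T::f<i>.
template <typename T, int... I>
constexpr std::array<uint32_t (*)(uint32_t), sizeof...(I)> MakeArray(
    std::integer_sequence<int, I...>) {
  return {{&T::template f<I>...}};
}

// Runtime dispatch is then a bounds check plus one indexed indirect call.
uint32_t MaskDispatch(int bit_width, uint32_t v) {
  static constexpr auto kTable =
      MakeArray<Masker>(std::make_integer_sequence<int, 33>());
  if (bit_width < 0 || bit_width > 32) return 0;
  return kTable[bit_width](v);
}
```

Compared with the macro-stamped switch, this replaces 33 hand-written cases with one pack expansion, at the cost of the template machinery Tim calls heavyweight.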
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#10). Change subject: IMPALA-4123: Fast bit unpacking .. 
Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/bswap-benchmark.cc A be/src/testutil/mem-util.h M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 10 files changed, 884 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/10 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 7: (3 comments) Sorry about that - I wish there were a good way to see whether I've responded to all of the comments on different patch set versions. http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS7, Line 42: misaligned > I think it would help the reader here to spell out that misaligned is with Done PS7, Line 132: min(length, 1) > Still, if we're only allowing those two you could switch it to a bool 'misa Made the change, then switched to 'aligned' to avoid having to think about a double negative. http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS7, Line 69: in_bytes -= batch_size * (BIT_WIDTH / CHAR_BIT); > Should there be an additional test that would demonstrate that bug? Running the test under ASAN caught it once I fixed the bug in the test code. The bug was exactly the kind of bug I was trying to catch with the test, so we do have the coverage. It seems weird (although not entirely without merit) to add a regression test for a test bug. -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 9: > (11 comments) It looks like you might have missed my comments on Base and PS7 in this group of comments. -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 8: (11 comments) http://gerrit.cloudera.org:8080/#/c/4494/8//COMMIT_MSG Commit Message: PS8, Line 17: 64 > 32, now? Done PS8, Line 18: ors at 64-bit boundaries > What does it mean to have an or at a k-bit boundary? Reworded to be hopefully clearer. http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 273: #include "common/names.h" > This isn't needed if using namespace std;, yes? I think I prefer keeping this and removing the "using namespace" declarations below. http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bswap-benchmark.cc File be/src/benchmarks/bswap-benchmark.cc: Line 123 > Thanks for fixing this. When this is committed, can you file a bug against Sure, will do. http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/testutil/mem-util.h File be/src/testutil/mem-util.h: PS8, Line 31: posix_memalign > #include <stdlib.h>. I'd say cstdlib, but posix_memalign doesn't seem to be p Added cstdlib. I think in practice it's reasonable to assume that cstdlib includes the same things as stdlib.h in some form. The glibc one from the toolchain literally has #include <stdlib.h> in it. Line 45: private: > DISALLOW_COPY_AND_ASSIGN Done http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: Line 39: void UnpackSubset(const uint32_t* in, const uint8_t* packed, int num_in_values, > I think it would aid clarity to add a short description of what the meaning Added descriptions and elaborated a little on what the function is doing. PS8, Line 45: uint32_t > const uint32_t * in Done PS8, Line 72: INFO > Was this just for debugging, or did you mean to leave it in? Didn't mean to leave it in - removed. PS8, Line 84: 8 > Why 8 and not 4? Used CHAR_BIT and added an intermediate variable to make it a bit clearer. 
http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS8, Line 64: //LOG(INFO) << "bit_width " << BIT_WIDTH << "\n" : // << "in_bytes " << in_bytes << " num_values " << num_values << " max_input_values " << max_input_values << "\n" :// << "values_to_read " << values_to_read << " batches_to_read " << batches_to_read << "\n" :// << "remainder_values " << remainder_values; > remove Oops, sorry missed this. -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
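The mem-util.h helper under review is not shown in this thread; here is a rough sketch of what an aligned-allocation RAII class along the lines discussed could look like. The class name and shape are guesses from the review comments, not the actual Impala file; it uses the posix_memalign call and the copy prohibition mentioned above.

```cpp
#include <cstdint>
#include <cstdlib>  // posix_memalign and free; POSIX, not ISO C

// Hypothetical sketch of an aligned-allocation RAII helper in the spirit of
// the reviewed be/src/testutil/mem-util.h (not the actual file).
class AlignedAllocation {
 public:
  AlignedAllocation(size_t bytes, size_t alignment) {
    // 'alignment' must be a power of two and a multiple of sizeof(void*);
    // posix_memalign returns non-zero on failure.
    if (posix_memalign(&data_, alignment, bytes) != 0) data_ = nullptr;
  }
  ~AlignedAllocation() { free(data_); }  // free(nullptr) is a no-op

  // Copying is disallowed, per the DISALLOW_COPY_AND_ASSIGN review comment.
  AlignedAllocation(const AlignedAllocation&) = delete;
  AlignedAllocation& operator=(const AlignedAllocation&) = delete;

  uint8_t* data() const { return static_cast<uint8_t*>(data_); }

 private:
  void* data_ = nullptr;
};
```

Deleting the copy operations prevents double-free bugs that a default memberwise copy of the raw pointer would invite.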
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#9). Change subject: IMPALA-4123: Fast bit unpacking .. 
Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/bswap-benchmark.cc A be/src/testutil/mem-util.h M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 10 files changed, 883 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/9
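The commit message describes unpacking a full batch of 32 values compiling down to 32-bit loads, shifts by constants, masks by constants, and bitwise ors when a value straddles two 32-bit words. A minimal scalar sketch of that technique (illustrative code, not Impala's actual BitPacking implementation; the byte-wise Load32 and the bit-by-bit packer are only there to keep the sketch portable and testable):

```cpp
#include <cstdint>
#include <vector>

// Simplified sketch of scalar bit-unpacking with 32-bit loads: each value is
// 'bit_width' bits, values are packed LSB-first, and a value straddling two
// 32-bit words is assembled with a shift and a bitwise or. Assumes the packed
// buffer has a few bytes of padding past the data so whole-word loads are
// safe (the real code avoids over-reads with more careful handling).
static uint32_t Load32(const uint8_t* in, int64_t word_idx) {
  uint32_t v = 0;
  for (int b = 0; b < 4; ++b) v |= uint32_t(in[word_idx * 4 + b]) << (8 * b);
  return v;  // little-endian 32-bit load
}

std::vector<uint32_t> UnpackScalar(const uint8_t* in, int bit_width, int num_values) {
  const uint32_t mask = bit_width == 32 ? ~0u : (1u << bit_width) - 1u;
  std::vector<uint32_t> out(num_values);
  int64_t bit_pos = 0;
  for (int i = 0; i < num_values; ++i, bit_pos += bit_width) {
    const int64_t word_idx = bit_pos / 32;
    const int shift = bit_pos % 32;
    uint32_t v = Load32(in, word_idx) >> shift;
    // Value straddles two words: or in the leading bits from the next word.
    if (shift + bit_width > 32) v |= Load32(in, word_idx + 1) << (32 - shift);
    out[i] = v & mask;
  }
  return out;
}

// Bit-by-bit reference packer, used only to exercise the unpacker.
std::vector<uint8_t> PackScalar(const std::vector<uint32_t>& vals, int bit_width) {
  const int64_t total_bits = int64_t(vals.size()) * bit_width;
  std::vector<uint8_t> out((total_bits + 7) / 8 + 4, 0);  // + padding for word loads
  int64_t bit_pos = 0;
  for (uint32_t v : vals) {
    for (int b = 0; b < bit_width; ++b, ++bit_pos) {
      if ((v >> b) & 1) out[bit_pos / 8] |= uint8_t(1u << (bit_pos % 8));
    }
  }
  return out;
}
```

Round-tripping values through PackScalar/UnpackScalar at every width from 1 to 32 is essentially the exhaustive bit-width dimension the patch's unit test covers.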
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 8: (14 comments) http://gerrit.cloudera.org:8080/#/c/4494/8//COMMIT_MSG Commit Message: PS8, Line 17: 64 32, now? PS8, Line 18: ors at 64-bit boundaries What does it mean to have an or at a k-bit boundary? http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 273: #include "common/names.h" This isn't needed if using namespace std;, yes? http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bswap-benchmark.cc File be/src/benchmarks/bswap-benchmark.cc: Line 123 Thanks for fixing this. When this is committed, can you file a bug against me to fix the other posix_memalign locations? http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/testutil/mem-util.h File be/src/testutil/mem-util.h: PS8, Line 31: posix_memalign #include stdlib.h. I'd say cstdlib, but posix_memalign doesn't seem to be part of the C standard, so I don't know if it is technically in that header. Line 45: private: DISALLOW_COPY_AND_ASSIGN http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS7, Line 42: values' va > In practice it should be 64-bit aligned with TCMalloc beyond a certain allo I think it would help the reader here to spell out that misaligned is with respect to 64. 
PS7, Line 132: > Other ones should work but I don't think improve coverage - can't see a sce Still, if we're only allowing those two you could switch it to a bool 'misaligned' http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: Line 39: void UnpackSubset(const uint32_t* in, const uint8_t* packed, int num_in_values, I think it would aid clarity to add a short description of what the meanings are of 'in' and 'packed' PS8, Line 45: uint32_t const uint32_t * in PS8, Line 72: INFO Was this just for debugging, or did you mean to leave it in? PS8, Line 84: 8 Why 8 and not 4? Can you add a more exact ASSERT here about num_to_unpack and it's relationship with writer.bytes_written()? http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS7, Line 69: // First unpack as many full batches as possible. > Yep, that was a bug. Should there be an additional test that would demonstrate that bug? http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS8, Line 64: //LOG(INFO) << "bit_width " << BIT_WIDTH << "\n" : // << "in_bytes " << in_bytes << " num_values " << num_values << " max_input_values " << max_input_values << "\n" :// << "values_to_read " << values_to_read << " batches_to_read " << batches_to_read << "\n" :// << "remainder_values " << remainder_values; remove -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
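The mem-util.h comments above (include the header that declares posix_memalign; add DISALLOW_COPY_AND_ASSIGN) suggest what the aligned-allocation test helper looks like. A hypothetical sketch of such an RAII helper, with illustrative names that are not the actual mem-util.h API:

```cpp
#include <cstdint>
#include <cstdlib>  // posix_memalign is POSIX, declared via <stdlib.h>/<cstdlib> on most platforms

// Hypothetical RAII aligned-buffer helper in the spirit of the one added in
// be/src/testutil/mem-util.h. Tests can take data() + 1 to get a deliberately
// misaligned pointer with a known offset from a known alignment.
class AlignedBuffer {
 public:
  AlignedBuffer(size_t bytes, size_t alignment) {
    // posix_memalign requires alignment to be a power of two and a multiple
    // of sizeof(void*); it leaves the pointer untouched on failure.
    if (posix_memalign(&data_, alignment, bytes) != 0) data_ = nullptr;
  }
  ~AlignedBuffer() { free(data_); }
  uint8_t* data() const { return static_cast<uint8_t*>(data_); }

  // Non-copyable, mirroring DISALLOW_COPY_AND_ASSIGN.
  AlignedBuffer(const AlignedBuffer&) = delete;
  AlignedBuffer& operator=(const AlignedBuffer&) = delete;

 private:
  void* data_ = nullptr;
};
```

Using posix_memalign rather than relying on TCMalloc's behavior makes the alignment guarantee explicit, which is the point raised in the review.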
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 7: (28 comments) http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: PS7, Line 330: int argc, char **argv > unused It seems weird to me to omit the arguments to main() PS7, Line 336: int64_t > const, or just make this the param to the data constructor and use data.siz Done PS7, Line 343: suite.AddBenchmark(Substitute("UnpackScalar", bit_width), UnpackBenchmark, &params); > This line at bit_width 32 is not in the output you display in the comment a Oops, fat finger error http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS7, Line 35: a subset of : /// 'num_in_values' > I still find this wording confusing - Which is the to buffer, and which the I reworded to make it clearer that it's a subarray. PS7, Line 42: misaligned > When misalignment is 0, what alignment is the memory guaranteed to have? 0 In practice it should be 64-bit aligned with TCMalloc beyond a certain allocation size. I switched to using posix_memalign so that it's explicitly guaranteed. PS7, Line 49: uint32_t > const Done Line 93: uint32_t* in, uint8_t* packed, int num_in_values, int bit_width, int misalignment) { > const T* Done PS7, Line 94: int > const Done PS7, Line 99: vector > std::aligned_storage? Ended up adding a helper class. PS7, Line 102: sizeof(uint32_t) > Why is this needed? Fixed. It really should have been uint8_t. It turned out this was masking a bug where we overran the buffer with 64-bit loads for odd bitwidths. I ended up just switching to doing 32-bit loads always. There wasn't a significant performance hit (if anything it looked a bit more consistent). PS7, Line 132: min(length, 1) > so the only misalignments allowed are 0 and 1? Why so restrictive?
Other ones should work but I don't think improve coverage - can't see a scenario where it would work for 1 byte-aligned and 8 byte-aligned but not 2-byte or 4-byte aligned. http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.h File be/src/util/bit-packing.h: Line 22: #ifndef IMPALA_UTIL_BIT_PACKING_H > These usually go above the #includes Done PS7, Line 30: values > 'value' Done PS7, Line 48: BitPacking > Why not put this into use in this patch? In other words, why only the micro The BitReader need some rework to be able to make use of the batched interface - I thought it would be better to break it up into smaller patches. It also makes it easier to compare again the older implementation before reworking that. PS7, Line 67: bits of data > You can elide this phrase Done http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS4, Line 125: const uint32_t trailing_bits = in_words[IN_WORD_IDX + 1] % > I'm not so sure. Let's consider reading 10 bit ints using 32-bit loads. To That doesn't seem right - the nonoverlapping aligned loads will always minimise the number of loads required. If you have a series of overlapping unaligned loads if you shift each load along by enough bytes that it doesn't overlap with the previous you get a series of nonoverlapping aligned loads that read all the bits you need. For batches of 32, the input is always a multiple of 32 bits, so the aligned loads always exactly cover the input bytes. So you can't shift one of the loads without creating a gap in the input array that you need to add another load to read. The example has a bug somewhere - those loads only get you bits 0 to 112, which only contains the first 11 values. Reading 11 10-bit values would require 4 aligned loads, so that might be a case where you could slightly improve things by using unaligned loads. 
It does depend on the # of input values though so it's unclear how you'd exploit the # of input values without adding runtime overhead. Anyway, from what I've seen it would be more impactful to put later optimisation effort into adding AVX2, which could be 4x faster. http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS7, Line 45: make_ > make_ can be omitted. Done PS7, Line 57: const > constexpr BATCH_SIZE Done PS7, Line 69: in_bytes -= batch_size * (BIT_WIDTH / CHAR_BIT); > so in_bytes does not move if BIT_WIDTH is 7? Yep, that was a bug. PS7, Line 90: onstants > constants Done PS7, Line 91: should > Should this be "must", since VALUE_IDX is now a template param? Not that t Yes, that's true - updated. PS7, Line 115: else > if branch returns, so you can drop the "else" here. I found it clearer to have the if/else if/else just to make it obvious that the three branches are mutually exclusive. Normally I'd use the early retur
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#8). Change subject: IMPALA-4123: Fast bit unpacking .. IMPALA-4123: Fast bit unpacking Adds utility functions for fast unpacking of batches of bit-packed values. These support reading batches of any number of values provided that the start of the batch is aligned to a byte boundary. Callers that want to read smaller batches that don't align to byte boundaries will need to implement their own buffering. The unpacking code uses only portable C++ and no SIMD intrinsics, but is fairly efficient because unpacking a full batch of 32 values compiles down to 64-bit loads, shifts by constants, masks by constants, bitwise ors at 64-bit boundaries, and stores. Further speedups should be possible using SIMD intrinsics. Testing: Added unit tests for unpacking, exhaustively covering different bitwidths with additional test dimensions (memory alignment, various input sizes, etc). Tested under ASAN to ensure the bit unpacking doesn't read past the end of buffers. Perf: Added microbenchmark that shows on average an 8-9x speedup over the existing BitReader code. 
Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/benchmarks/bswap-benchmark.cc A be/src/testutil/mem-util.h M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 10 files changed, 879 insertions(+), 39 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/8
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 7: (10 comments) found with help of clang-tidy http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: PS7, Line 330: int argc, char **argv unused http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.h File be/src/util/bit-packing.h: Line 22: #ifndef IMPALA_UTIL_BIT_PACKING_H These usually go above the #includes PS7, Line 48: BitPacking Why not put this into use in this patch? In other words, why only the microbenchmark? http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS7, Line 45: make_ make_ can be omitted. PS7, Line 90: onstants constants PS7, Line 115: else if branch returns, so you can drop the "else" here. PS7, Line 118: else else if branch returns, so you can drop the else here, if you want to. I'm ambivalent. Line 120: DCHECK_LT(VALUE_IDX, 31) << "Trailing bits after last word."; Can you explain more why VALUE_IDX < 31? Can you add a static_assert at the top with the maximum value it could possibly be? PS7, Line 126: warning What template values cause that warning to trigger? PS7, Line 184: BIT_WIDTH if BIT_WIDTH is 0, I don't think this is technically allowed.
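The VALUE_IDX static_assert request and the BIT_WIDTH == 0 question above both circle a classic pitfall in this kind of code: in C++, shifting a 32-bit value by 32 or more is undefined behavior, so the all-ones mask for the widest case needs a special case. A sketch of the usual guard (my example, not the patch's code):

```cpp
#include <cstdint>

// (1u << 32) is undefined behavior in C++ (shift count equals the type
// width), so the mask for bit_width == 32 must be special-cased rather than
// computed as (1u << 32) - 1. bit_width == 0 falls out naturally as mask 0.
uint32_t BitWidthMask(int bit_width) {
  return bit_width == 32 ? ~0u : (1u << bit_width) - 1u;
}
```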
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 7: (20 comments) Thanks for your patience http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: PS7, Line 336: int64_t const, or just make this the param to the data constructor and use data.size() as the last argument to the params initializer list. PS7, Line 343: suite.AddBenchmark(Substitute("UnpackScalar", bit_width), UnpackBenchmark, &params); This line at bit_width 32 is not in the output you display in the comment at the top. http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS7, Line 35: a subset of : /// 'num_in_values' I still find this wording confusing - Which is the to buffer, and which the from? What does it mean to have a subset of an int? PS7, Line 42: misaligned When misalignment is 0, what alignment is the memory guaranteed to have? 0 mod 64? PS7, Line 49: uint32_t const Line 93: uint32_t* in, uint8_t* packed, int num_in_values, int bit_width, int misalignment) { const T* PS7, Line 94: int const PS7, Line 99: vector std::aligned_storage? PS7, Line 102: sizeof(uint32_t) Why is this needed? PS7, Line 132: min(length, 1) so the only misalignments allowed are 0 and 1? Why so restrictive? http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS2, Line 57: ast one return > 8 is the smallest number such that (3 * 8) % 8 == 0. I.e. where the last bi OK, but you said "you have to choose" 8. A batch size of 32 would still work, right?
http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS7, Line 30: values 'value' PS7, Line 67: bits of data You can elide this phrase http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS4, Line 125: const uint32_t trailing_bits = in_words[IN_WORD_IDX + 1] % > That's true, but there's a downside - if the loads are aligned relative to I'm not so sure. Let's consider reading 10 bit ints using 32-bit loads. To unpack 16 of them with your method, you need to read 160 bits, which means 5 (aligned) loads. 12 of the unpacked values can be read without using this third (more expensive) branch. In this branch, there are two more ALU ops, so that's 4*2 = 8 extra ALU ops. OTOH, using un-aligned loads, you could do only 4 loads and (at bits 0, 24, 48, and 80) and no more than two ALU ops for each extraction. http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS7, Line 57: const constexpr BATCH_SIZE PS7, Line 69: in_bytes -= batch_size * (BIT_WIDTH / CHAR_BIT); so in_bytes does not move if BIT_WIDTH is 7? PS7, Line 91: should Should this be "must", since VALUE_IDX is now a template param? Not that the explanation is wrong, but now you have forced the caller into efficient behavior with your choice. PS7, Line 178: const constexpr MAX_BATCH_SIZE PS7, Line 179: int const PS7, Line 183: // Copy into padded temporary buffer to avoid reading past the end of 'in'. Can this be avoided sometimes? For instance, if BIT_WIDTH is 6 and num_values is 16, can we just use 'in'? 
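The load-count arithmetic in the 10-bit example above is easy to check mechanically. Assuming a word-aligned input, the number of aligned 32-bit loads needed is ceil(bit_width * num_values / 32); the helper below (mine, not patch code) reproduces the figures from the discussion: 5 aligned loads for 16 ten-bit values, 10 for a full batch of 32, and 4 for 11 values, while four loads ending at bit 112 cover only the first 11 ten-bit values.

```cpp
#include <cstdint>

// Minimum number of word-aligned 32-bit loads that cover the first
// 'num_values' values of 'bit_width' bits each, starting from a word-aligned
// input buffer. A quick check of the arithmetic in the review discussion.
int AlignedLoads(int bit_width, int num_values) {
  const int64_t bits = int64_t(bit_width) * num_values;
  return static_cast<int>((bits + 31) / 32);  // ceil(bits / 32)
}
```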
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#7). Change subject: IMPALA-4123: Fast bit unpacking. (Commit message identical to patch set 8 above.) Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 8 files changed, 807 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/7
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 5: (15 comments) http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: PS4, Line 296: bool > const Done PS4, Line 303: UnpackValues > Unpack32Values Done PS4, Line 312: + offset > This could go off the end of out_buffer if NUM_OUT_VALUES is not a multiple Yeah, that's an assumption to simplify the benchmark code. Added a static_assert up the top. PS4, Line 322: int64_t > const Done http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS4, Line 30: compute_mask > CamelCaseName Done PS4, Line 35: smaller > smaller than what? clarified. Line 116: const int num_in_values = 64 * 1024; > constexpr, and all caps name Done Line 119: in[i] = rand(); > std::generate Done http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS2, Line 57: ast one return > I don't understand yet why you have to choose a batch size of 8 for a bit w 8 is the smallest number such that (3 * 8) % 8 == 0. I.e. where the last bit of the batch aligns to a byte boundary. http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS4, Line 66: bytes > bits? Done PS4, Line 66: values > values of type OutType Done PS4, Line 70: in_bytes > Can you add to the comment a description of how this parameter is used? Ended up rewording the comment a bit to fit this in. http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS3, Line 143: undef > https://gcc.gnu.org/onlinedocs/gcc/Push_002fPop-Macro-Pragmas.html#Push_002 got you - done. 
http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS4, Line 125: return lower_bits | (trailing_bits << TRAILING_BITS_SHIFT); > This comes out to two shifts, a bitwise or, and a bitwise and. If you did u That's true, but there's a downside - if the loads are aligned relative to the start of the buffer, the optimiser can reuse the results of the loads. Doing the unaligned load adds an extra load, which is probably comparable in cost to the two ALU ops that were saved. Line 142: BOOST_PP_REPEAT_FROM_TO(0, 33, UNPACK_VALUE_CALL, ignore); > So, even after turning this into a templated function, a loop with static b Even with #pragma unroll and ALWAYS_INLINE. Inlining happens but the generated code is bad. I think either the unrolling or inlining happens too late in GCC's optimisation pipeline for the rest of the optimisation to be fully effective. I think part of it is that we have to demote the template argument VALUE_IDX to an argument, so the template doesn't fully specialise the code when it's instantiated. Here's an example of the generated assembly. Note that there are a lot of jmps and jes in the loop version https://gist.github.com/timarmstrong/4f68a74054cfd0df91a89a824ed04961 I added a note in the UnpackValue() comment to explain how it should be stamped out. -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
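The gist linked above shows why the patch stamps out UnpackValue calls with BOOST_PP_REPEAT_FROM_TO rather than a runtime loop: each call must be a full specialization so the word index, shift, and mask become compile-time constants. In C++17 the same stamping-out can be sketched without the preprocessor using an index_sequence and a fold expression (illustrative names, not Impala's code):

```cpp
#include <cstdint>
#include <utility>

// One fully-specialized extraction per value index: with I a template
// parameter, the word index, shift, and mask are all compile-time constants,
// which is the effect the patch gets from BOOST_PP_REPEAT_FROM_TO.
template <int BIT_WIDTH, int I>
uint32_t UnpackValue(const uint32_t* in) {
  constexpr int BIT_POS = BIT_WIDTH * I;
  constexpr int WORD = BIT_POS / 32;
  constexpr int SHIFT = BIT_POS % 32;
  constexpr uint32_t MASK = BIT_WIDTH == 32 ? ~0u : (1u << BIT_WIDTH) - 1u;
  uint32_t v = in[WORD] >> SHIFT;
  // 'if constexpr' keeps the cross-word or (and its shift) out of
  // instantiations where the value never straddles a word boundary.
  if constexpr (SHIFT + BIT_WIDTH > 32) v |= in[WORD + 1] << (32 - SHIFT);
  return v & MASK;
}

template <int BIT_WIDTH, std::size_t... I>
void Unpack32Values(const uint32_t* in, uint32_t* out, std::index_sequence<I...>) {
  // Fold expansion: one store per index, no runtime loop or branch.
  ((out[I] = UnpackValue<BIT_WIDTH, int(I)>(in)), ...);
}

template <int BIT_WIDTH>
void Unpack32Values(const uint32_t* in, uint32_t* out) {
  Unpack32Values<BIT_WIDTH>(in, out, std::make_index_sequence<32>());
}
```

Whether this optimizes better than the macro expansion under a given GCC version is exactly the kind of thing the generated-assembly comparison in the gist would have to settle.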
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#6). Change subject: IMPALA-4123: Fast bit unpacking. (Commit message identical to patch set 8 above.) Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 8 files changed, 807 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/6
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 4: (15 comments) http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: PS4, Line 296: bool const PS4, Line 303: UnpackValues Unpack32Values PS4, Line 312: + offset This could go off the end of out_buffer if NUM_OUT_VALUES is not a multiple of 32, right? PS4, Line 322: int64_t const http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS4, Line 30: compute_mask CamelCaseName PS4, Line 35: smaller smaller than what? Line 116: const int num_in_values = 64 * 1024; constexpr, and all caps name Line 119: in[i] = rand(); std::generate http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS2, Line 57: ast one return > A batch size of 32 is the smallest that works for all supported bit widths. I don't understand yet why you have to choose a batch size of 8 for a bit width of 3. http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS4, Line 66: bytes bits? PS4, Line 66: values values of type OutType PS4, Line 70: in_bytes Can you add to the comment a description of how this parameter is used? http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS3, Line 143: undef > I'm not quite sure that I understand the intent or how I'd do that, aside f https://gcc.gnu.org/onlinedocs/gcc/Push_002fPop-Macro-Pragmas.html#Push_002fPop-Macro-Pragmas http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS4, Line 125: return lower_bits | (trailing_bits << TRAILING_BITS_SHIFT); This comes out to two shifts, a bitwise or, and a bitwise and. 
If you did unaligned 64-bit reads to fit the full value inside of a single word, it could be done in one shift and one bitwise and, yes? Line 142: BOOST_PP_REPEAT_FROM_TO(0, 33, UNPACK_VALUE_CALL, ignore); So, even after turning this into a templated function, a loop with static bounds doesn't cause inlining? Maybe leave a note on why you do it this way, in that case.
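The one-shift-one-mask extraction suggested above can be sketched with an unaligned 64-bit load, written portably as memcpy (illustrative, not the patch's code; it assumes a little-endian host, bit_width <= 57, and padding past the end of the buffer, which is the very over-read hazard the review discusses):

```cpp
#include <cstdint>
#include <cstring>

// Extract the 'idx'-th bit_width-bit value with a single unaligned 64-bit
// load: one shift and one mask, no cross-word or. Requires bit_width <= 57 so
// the value always fits in the 8 loaded bytes regardless of the in-byte bit
// offset, and requires readable padding past the packed data.
uint64_t ExtractUnaligned(const uint8_t* in, int bit_width, int64_t idx) {
  const int64_t bit_pos = idx * bit_width;
  uint64_t word;
  std::memcpy(&word, in + bit_pos / 8, sizeof(word));  // unaligned load (little-endian host)
  const uint64_t mask = (uint64_t{1} << bit_width) - 1;
  return (word >> (bit_pos % 8)) & mask;
}
```

The trade-off Tim raises still applies: the unaligned load cannot be reused across adjacent values the way aligned word loads can, so fewer ALU ops are paid for with extra loads.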
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#5). Change subject: IMPALA-4123: Fast bit unpacking. (Commit message identical to patch set 8 above.) Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 8 files changed, 799 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/5
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 2: (33 comments) http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 296: reader.Reset(p->data, p->data_len); > Should offset advance in this loop, then? The code was a bit obfuscated. The loop only really retries once, so there's no need to advance offset. Changed it to be more obvious. Offset is advanced once per value read by virtue of the offset calculation above. http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 270: #include "common/names.h" > Why the blank line here? We separate out common/names.h like this most places because it needs to come after all the other includes. PS3, Line 277: > Maybe NUM_OUT_VALUES Done PS3, Line 305: * p > const uint8_t * const data_end ... Done Line 334: for (int i = 0; i < data_len; ++i) { > std::iota Done PS3, Line 336: > I think you can even elide the parens Done http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS3, Line 29: compute_mask > ComputeMask, and in an anonymous namespace Done PS3, Line 33: UnpackSubset > With a comment here, if I'm going to use it in PackUnpack. Done PS3, Line 37: with memory that is > this phrase can be omitted Done PS3, Line 39: int > unsigned? DCHECK <= something? I don't think there's a particular limit on the number of values we could use here. PS3, Line 39: num_values > num_in_values? Done PS3, Line 39: int > also unsigned? I think int should be ok - I normally prefer using signed types by default even for variables that logically should always be positive because it's easier to reason about the number going negative compared with underflowing (also easier to catch the out-of-range value with dchecks). 
PS3, Line 45: int > const, here and elsewhere Done PS3, Line 51: & compute_mask(bit_width) > AKA mask; already hoisted Done PS3, Line 58: becase > because Done PS3, Line 58: the input buffer size > "the input buffer size (num_values)" Done PS3, Line 60: // Size buffer exactly so that ASAN can detect buffer overruns. > I think manual canaray checks might be called for here We rely on ASAN to detect read overruns anyway (which I was more concerned about since they're tricky to avoid in the code), so it doesn't seem worth it to me on balance to add the extra test code. PS3, Line 65: ASSERT_EQ > Can you add more " << error message " to your ASSERTs? Done http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: Line 36: static const std::pair UnpackValues(int bit_width, > Yeah, the verbosity is frustrating. However, cstdint doesn't necessarily pt It seems like most implementations of cstdint put the typedefs in the global namespace anyway, so I think we're ok for now. If that turns out to be a problem I think we should create a wrapper header with the using declarations. PS2, Line 57: Unpack32Values > So, this boundary condition happens for any multiple of 8, and you chose 32 A batch size of 32 is the smallest that works for all supported bit widths. Otherwise you have to choose different batch sizes for different bit widths (e.g 32 for bit width 31, 8 for bit width 3, etc), which gets more complicated. Lemire's bitpacking libraries use 32 everywhere (https://github.com/lemire/) so it seemed like a reasonable choice - he obviously put some effort into validating the performance. I could benchmark different numbers but I think at this point I'd be optimising for the microbenchmark rather than a real workload. http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS3, Line 55: IT_WIDTH from 'in' to 'out'. 
> But the last byte of 'in' that was read might only have been partially used That's correct. I deliberately made the decision that these functions shouldn't deal with resuming reads in the middle of bytes, because it makes things way more complicated (and probably slower). I expect that callers will either read in batches or do their own buffering if they need to read a value at a time. Expanded on the comment to make this explicit and explain what the caller can do. http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS3, Line 56: >(in, > can be omitted if the branches are swapped Done Line 56: case 21: return UnpackValues(in, in_bytes, num_values, out); > long line Done PS3, Line 90: : if (r > I think you accidentally a word I ended up retrying the template function thing and it works ok this time around. I think something else was confounding it before
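[Editorial aside] The batch-size and byte-boundary replies above come down to one arithmetic property: any batch whose size is a multiple of 8 values ends exactly on a byte boundary regardless of bit width, which is what lets these functions avoid resuming reads mid-byte. A minimal check of that property (the helper name is ours for illustration, not part of the patch):

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper, not the patch's API: a batch of 'num_values' values
// packed at 'bit_width' bits each ends on a byte boundary iff the total bit
// count is a whole number of bytes.
constexpr bool EndsOnByteBoundary(int num_values, int bit_width) {
  return (num_values * bit_width) % CHAR_BIT == 0;
}
```

For every supported width 0..32, EndsOnByteBoundary(32, bit_width) holds, so consecutive 32-value batches always start byte-aligned; a 4-value batch at width 3 (12 bits) does not, which is the mid-byte case the functions deliberately avoid.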
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#4). Change subject: IMPALA-4123: Fast bit unpacking .. IMPALA-4123: Fast bit unpacking Adds utility functions for fast unpacking of batches of bit-packed values. These support reading batches of any number of values provided that the start of the batch is aligned to a byte boundary. Callers that want to read smaller batches that don't align to byte boundaries will need to implement their own buffering. The unpacking code uses only portable C++ and no SIMD intrinsics, but is fairly efficient because unpacking a full batch of 32 values compiles down to 64-bit loads, shifts by constants, masks by constants, bitwise ors at 64-bit boundaries, and stores. Further speedups should be possible using SIMD intrinsics. Testing: Added unit tests for unpacking, exhaustively covering different bitwidths with additional test dimensions (memory alignment, various input sizes, etc). Tested under ASAN to ensure the bit unpacking doesn't read past the end of buffers. Perf: Added microbenchmark that shows on average an 8-9x speedup over the existing BitReader code. Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 8 files changed, 799 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/4 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong
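[Editorial aside] The commit message above says callers that read values one at a time should implement their own buffering on top of the batch interface. A minimal sketch of that pattern, assuming hypothetical names throughout ('unpack_batch' stands in for the patch's Unpack32Values, and bit_width is assumed > 0):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical caller-side buffering: always unpack a whole 32-value batch
// (which starts and ends on a byte boundary) into a scratch buffer, then
// hand out one value per GetNext() call. Illustrative only, not the patch.
class BufferedReader {
 public:
  using UnpackBatchFn =
      std::function<void(const uint8_t* in, int bit_width, uint32_t* out)>;

  BufferedReader(const uint8_t* data, size_t len, int bit_width,
                 UnpackBatchFn unpack_batch)
      : data_(data), len_(len), bit_width_(bit_width),
        unpack_batch_(std::move(unpack_batch)) {}

  // Returns false once no complete 32-value batch remains in 'data'.
  bool GetNext(uint32_t* val) {
    if (pos_ == 32) {
      // A full batch of 32 values occupies exactly 4 * bit_width bytes.
      const size_t batch_bytes = 4 * static_cast<size_t>(bit_width_);
      if (offset_ + batch_bytes > len_) return false;
      unpack_batch_(data_ + offset_, bit_width_, buffer_);
      offset_ += batch_bytes;
      pos_ = 0;
    }
    *val = buffer_[pos_++];
    return true;
  }

 private:
  const uint8_t* data_;
  size_t len_;
  int bit_width_;
  UnpackBatchFn unpack_batch_;
  uint32_t buffer_[32];
  size_t offset_ = 0;
  int pos_ = 32;  // start "empty" so the first GetNext() refills
};
```

Because every refill consumes a whole number of bytes, the reader never has to track a bit offset between batches, which is the simplification the patch's interface is designed around.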
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 3: (34 comments) http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 296: } > We need to do this to start the reader reading from the start of the buffer Should offset advance in this loop, then? http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: Line 270: Why the blank line here? PS3, Line 277: NUM_VALUES Maybe NUM_OUT_VALUES PS3, Line 305: * d const uint8_t * const data_end ... Line 334: data[i] = static_cast(i); std::iota PS3, Line 336: ( I think you can even elide the parens http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing-test.cc File be/src/util/bit-packing-test.cc: PS3, Line 29: compute_mask ComputeMask, and in an anonymous namespace PS3, Line 33: UnpackSubset With a comment here, if I'm going to use it in PackUnpack. PS3, Line 37: with memory that is this phrase can be omitted PS3, Line 39: int also unsigned? PS3, Line 39: int unsigned? DCHECK <= something? PS3, Line 39: num_values num_in_values? PS3, Line 45: int const, here and elsewhere PS3, Line 51: & compute_mask(bit_width) AKA mask; already hoisted PS3, Line 58: becase because PS3, Line 58: the input buffer size "the input buffer size (num_values)" PS3, Line 60: // Size buffer exactly so that ASAN can detect buffer overruns. I think manual canary checks might be called for here PS3, Line 65: ASSERT_EQ Can you add more " << error message " to your ASSERTs? http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: Line 36: /// little-endian order). E.g. the values 1, 2, 3, 4 packed with bit width 4 results > We don't do this anywhere else, seems unnecessarily verbose. Yeah, the verbosity is frustrating.
However, cstdint doesn't necessarily put these in the global namespace, and stdint.h is deprecated. What color do you think we should paint this bikeshed? Maybe a using declaration at the top? PS2, Line 57: Type> > Added some explanation to the class comment for the magic number. It works So, this boundary condition happens for any multiple of 8, and you chose 32 to amortize the loop some more? Did you do any testing of 16 and 64 to see how they compare? http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.h File be/src/util/bit-packing.h: PS3, Line 55: after the last byte of 'in' that was read But the last byte of 'in' that was read might only have been partially used, right? http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS2, Line 108: \ : if (end_bit_offset < load_bit_width) { > Reworded this. I did some experiments early on looking at the assembly and Did you try always_inline and template parameters? If yes, what else did you try? http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: PS3, Line 56: == 0 can be omitted if the branches are swapped Line 56: const int64_t max_input_values = BIT_WIDTH == 0 ? num_values : (in_bytes * CHAR_BIT) / BIT_WIDTH; long line PS3, Line 90: early : // with I think you accidentally a word PS3, Line 93: LOAD_TYPE I think "LOAD_TYPE" could be "LoadType" to more clearly match the lexical standard for other type names PS3, Line 93: i If 'i' is a compile-time constant, maybe call it "I" or "NTH"? PS3, Line 96: load_bit_width I think constexprs should get the static const treatment of all caps PS3, Line 106: shifted Can you add a comment describing what this is for? PS3, Line 108: 32 How is this 32 derived? PS3, Line 109: end_bit_offset so, in this case all the bits we need are in one word? PS3, Line 119: Make non-negative to avoid spurious compiler warning Maybe give it an unsigned type?
PS3, Line 124: Special-case impossible BIT_WIDTH 32 to avoid spurious compiler warning this could use a bit more explanation PS3, Line 143: define Check not already defined? -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
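[Editorial aside] The questions above about end_bit_offset, shifted values, and bits split across words are easier to follow against a scalar reference version. This is our own naive single-value sketch, not the patch's Unpack32Values: it gathers the bytes covering one value, shifts the value down, and masks, which is the load/shift/mask/or pattern the commit message describes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference routine: extract the value at 'index' from a
// little-endian bit-packed stream. At most 5 bytes cover a value of width
// <= 32 (7 bits of leading offset + 32 value bits), so a uint64_t
// accumulator always suffices.
uint32_t UnpackOne(const uint8_t* in, int bit_width, int64_t index) {
  const int64_t first_bit = index * bit_width;
  const int64_t first_byte = first_bit / 8;
  const int bit_offset = static_cast<int>(first_bit % 8);
  const int bytes_needed = (bit_offset + bit_width + 7) / 8;
  uint64_t bits = 0;
  for (int b = 0; b < bytes_needed; ++b) {
    bits |= static_cast<uint64_t>(in[first_byte + b]) << (8 * b);
  }
  // A bit width of 0 means every value is zero.
  if (bit_width == 0) return 0;
  const uint64_t mask = ~uint64_t{0} >> (64 - bit_width);
  return static_cast<uint32_t>((bits >> bit_offset) & mask);
}
```

In the patch's batched code BIT_WIDTH is a template parameter, so the shift amounts and masks that are computed at runtime here become compile-time constants there, which is where the speedup comes from.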
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 2: (34 comments) http://gerrit.cloudera.org:8080/#/c/4494/2//COMMIT_MSG Commit Message: PS2, Line 12: was > "want" Done PS2, Line 18: possible > long line Done http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: PS2, Line 259: include > I think cmath, cstdlib, cstdio are the non-deprecated versions for C++ Done Line 260: #include > Can you arrange these in three blocks: Done Line 276: #define NUM_VALUES 1024 * 1024 > constexpr int Done PS2, Line 281: BenchmarkParams > You can leave the constructor out and use aggregate initialization: Done. I didn't realise you could use this with new Line 290: BenchmarkParams* p = reinterpret_cast(data); > maybe const auto p = reinterpret_cast I added the const. It doesn't seem verbose enough to require auto. PS2, Line 294: int64_t > const Done PS2, Line 294: i * 32 % NUM_VALUES + j; > Shouldn't this whole thing be modulo NUM_VALUES, not just the non-+j part? Done Line 296: reader.Reset(p->data, p->data_len); > Why Reset? We need to do this to start the reader reading from the start of the buffer again. PS2, Line 333: uint8_t* data = new uint8_t[data_len]; > unique_ptr to clean up after Changed to a vector PS2, Line 334: int > int64_t Done PS2, Line 337: BenchmarkParams > This can be a local var, right? Done http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: Line 18: #include > cstdint Done PS2, Line 29: Unpack bit-packed values > What does this phrase mean? I added an explanation to the class comment, probably needed to make the rest of the code understandable. PS2, Line 30: outputted > unpacked Done PS2, Line 30: is > are Done PS2, Line 30: 'out' must have : /// enough space for 'num_values'. > And in must point to at least in_bytes of addressable memory?
Done PS2, Line 31: 0 > What does a zero bit_width mean in this context? Added an explanation to the class comment. It means every value is zero. PS2, Line 33: plus > "And also" Done PS2, Line 32: utType. : /// Retu > extra line break here, please Done PS2, Line 33: the byte > a pointer to the byte Done PS2, Line 33: the number of : /// values that were read. > I can infer that by just subtracting, right? I think the pair might be over It's the number of logical values that were read. The calculation is a function of the input arguments: min(num_values, in_bytes * 8 / bit_width) but it seems like the kind of thing that is easy to get wrong - that was the reason for making in_bytes a parameter rather than letting the caller do the calculation. Also division by bit_width is slower here than inside the function where we're dispatching to bit_width-specialized code anyway. PS2, Line 36: const > This const has no effect, I believe. Done Line 36: static const std::pair UnpackValues(int bit_width, > std::uint8_t, etc. We don't do this anywhere else, seems unnecessarily verbose. PS2, Line 57: Unpack32Values > Why not up to 64? Added some explanation to the class comment for the magic number. It works out because for batches of 32 packed values, the end of the packed batch always falls on a byte boundary, so there's no need to deal with partial bytes between batches. If we went to larger batches I think the additional benefit from amortising loop overheads is fairly minimal and the code will get a lot bigger. http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: Line 35: case 0: return UnpackValues(in, in_bytes, num_values, out); > BOOST_PP_REPEAT_FROM_TO would be worth it here, I think Done here and elsewhere. I'm a little concerned that it makes it hard to debug any compile errors, but hopefully the code shouldn't change much. PS2, Line 70: > can be omitted In this case it can't be inferred from the arguments.
PS2, Line 70: NULL > nullptr Done. Have we actually standardised on this? PS2, Line 78: DCHECK_GE > static_assert Done here and elsewhere. This seems to have made the debug build a lot faster, which is great. PS2, Line 79: 8 > CHAR_BIT, here and below Done PS2, Line 80: in_bytes * 8 / BIT_WIDTH > I think this line would be clearer with more parens Done PS2, Line 99: unsigned integer type > static_check with http://en.cppreference.com/w/cpp/types/is_unsigned? Done. PS2, Line 108: inlined as : // soon as possible and subject to all optimizations > Can you elaborate on this? Are there benchmarks on that? Reworded this. I did some experiments early on looking at the assembly and wasn't able to convince gcc to generate the optimal code without resorting to the macro approach.
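[Editorial aside] The BOOST_PP_REPEAT_FROM_TO and static_assert replies above both concern the same idiom: a switch on the runtime bit width dispatches to a function templated on BIT_WIDTH, so shifts and masks become compile-time constants in each instantiation. A stripped-down sketch of the idiom (the function names and bodies are ours, not the patch's):

```cpp
#include <cassert>
#include <cstdint>

// With BIT_WIDTH a template parameter, the mask folds to a constant and
// impossible widths are rejected at compile time rather than via a DCHECK.
template <int BIT_WIDTH>
uint32_t MaskForWidth() {
  static_assert(BIT_WIDTH >= 0 && BIT_WIDTH <= 32, "unsupported bit width");
  // Compute in 64 bits so BIT_WIDTH == 32 doesn't shift by the full word
  // width (which would be undefined behaviour on uint32_t).
  return static_cast<uint32_t>((uint64_t{1} << BIT_WIDTH) - 1);
}

// Runtime-to-compile-time dispatch; the real patch generates the full set
// of cases with BOOST_PP_REPEAT_FROM_TO instead of writing them by hand.
uint32_t DispatchMask(int bit_width) {
  switch (bit_width) {
    case 0: return MaskForWidth<0>();
    case 1: return MaskForWidth<1>();
    case 2: return MaskForWidth<2>();
    case 3: return MaskForWidth<3>();
    // ... cases 4 through 31 elided in this sketch ...
    case 32: return MaskForWidth<32>();
    default: return 0;  // unreachable for valid widths
  }
}
```

The static_assert fires per instantiation at compile time, which is why swapping it in for DCHECK_GE made the debug build faster: no runtime check remains in the hot loop.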
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-4123: Fast bit unpacking .. IMPALA-4123: Fast bit unpacking Adds utility functions for fast unpacking of batches of bit-packed values. These support reading batches of any number of values provided that the start of the batch is aligned to a byte boundary. Callers that want to read smaller batches that don't align to byte boundaries will need to implement their own buffering. The unpacking code uses only portable C++ and no SIMD intrinsics, but is fairly efficient because unpacking a full batch of 32 values compiles down to 64-bit loads, shifts by constants, masks by constants, bitwise ors at 64-bit boundaries, and stores. Further speedups should be possible using SIMD intrinsics. Testing: Added unit tests for unpacking, exhaustively covering different bitwidths with additional test dimensions (memory alignment, various input sizes, etc). Tested under ASAN to ensure the bit unpacking doesn't read past the end of buffers. Perf: Added microbenchmark that shows on average an 8-9x speedup over the existing BitReader code. Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h M be/src/util/bit-stream-utils.inline.h 8 files changed, 788 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/3 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Jim Apple has posted comments on this change. Change subject: IMPALA-4123: Fast bit unpacking .. Patch Set 2: (34 comments) First pass, didn't get to all of it yet but I thought you might want to see what was done http://gerrit.cloudera.org:8080/#/c/4494/2//COMMIT_MSG Commit Message: PS2, Line 12: was "want" PS2, Line 18: possible long line http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc File be/src/benchmarks/bit-packing-benchmark.cc: PS2, Line 259: include I think cmath, cstdlib, cstdio are the non-deprecated versions for C++ Line 260: #include Can you arrange these in three blocks: 1. C standard headers 2. C++ standard headers 3. Other alphabetized in each block Line 276: #define NUM_VALUES 1024 * 1024 constexpr int PS2, Line 281: BenchmarkParams You can leave the constructor out and use aggregate initialization: BenchmarkParams bp {p,q,r}; Line 290: BenchmarkParams* p = reinterpret_cast(data); maybe const auto p = reinterpret_cast PS2, Line 294: int64_t const PS2, Line 294: i * 32 % NUM_VALUES + j; Shouldn't this whole thing be modulo NUM_VALUES, not just the non-+j part? Line 296: reader.Reset(p->data, p->data_len); Why Reset? PS2, Line 333: uint8_t* data = new uint8_t[data_len]; unique_ptr to clean up after PS2, Line 334: int int64_t PS2, Line 337: BenchmarkParams This can be a local var, right? http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h File be/src/util/bit-packing.h: Line 18: #include cstdint PS2, Line 29: Unpack bit-packed values What does this phrase mean? PS2, Line 30: outputted unpacked PS2, Line 30: is are PS2, Line 31: 0 What does a zero bit_width mean in this context? PS2, Line 30: 'out' must have : /// enough space for 'num_values'. And in must point to at least in_bytes of addressable memory? PS2, Line 33: the byte a pointer to the byte PS2, Line 33: plus "And also" PS2, Line 32: utType. : /// Retu extra line break here, please PS2, Line 33: the number of : /// values that were read.
I can infer that by just subtracting, right? I think the pair might be overkill PS2, Line 36: const This const has no effect, I believe. Line 36: static const std::pair UnpackValues(int bit_width, std::uint8_t, etc. PS2, Line 57: Unpack32Values Why not up to 64? http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.inline.h File be/src/util/bit-packing.inline.h: Line 35: case 0: return UnpackValues(in, in_bytes, num_values, out); BOOST_PP_REPEAT_FROM_TO would be worth it here, I think PS2, Line 70: NULL nullptr PS2, Line 70: can be omitted PS2, Line 78: DCHECK_GE static_assert PS2, Line 79: 8 CHAR_BIT, here and below PS2, Line 80: in_bytes * 8 / BIT_WIDTH I think this line would be clearer with more parens PS2, Line 99: unsigned integer type static_check with http://en.cppreference.com/w/cpp/types/is_unsigned? PS2, Line 108: inlined as : // soon as possible and subject to all optimizations Can you elaborate on this? Are there benchmarks on that? -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Jim Apple Gerrit-HasComments: Yes
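[Editorial aside] The returned count questioned above ("I can infer that by just subtracting, right?") is the quantity Tim's reply describes: the caller's request capped by how many whole values fit in the input buffer. A hedged sketch of that calculation (illustrative name, not the patch's API):

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdint>

// Number of values an unpack call can actually produce: capped both by the
// request and by the whole values available in 'in_bytes' of input. A width
// of 0 consumes no input, so only the request caps it.
int64_t MaxUnpackableValues(int bit_width, int64_t in_bytes, int64_t num_values) {
  if (bit_width == 0) return num_values;
  return std::min(num_values, in_bytes * CHAR_BIT / bit_width);
}
```

For example, 3 bytes of input at width 3 hold 24 / 3 = 8 whole values, so a request for 100 yields 8. Keeping this division inside the library, where the code is already specialized per bit width, is the rationale the reply gives for returning the count instead of letting callers derive it.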
[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-4123: Fast bit unpacking .. IMPALA-4123: Fast bit unpacking Adds utility functions for fast unpacking of batches of bit-packed values. These support reading batches of any number of values provided that the start of the batch is aligned to a byte boundary. Callers that was to read smaller batches that don't align to byte boundaries will need to implement their own buffering. The unpacking code uses only portable C++ and no SIMD intrinsics, but is fairly efficient because unpacking a full batch of 32 values compiles down to 64-bit loads, shifts by constants, masks by constants, bitwise ors at 64-bit boundaries, and stores. Further speedups should be possible using SIMD intrinsics. Testing: Added unit tests for unpacking, exhaustively covering different bitwidths with additional test dimensions (memory alignment, various input sizes, etc). Tested under ASAN to ensure the bit unpacking doesn't read past the end of buffers. Perf: Added microbenchmark that shows on average an 8-9x speedup over the existing BitReader code. Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 --- M be/src/benchmarks/CMakeLists.txt A be/src/benchmarks/bit-packing-benchmark.cc M be/src/util/CMakeLists.txt A be/src/util/bit-packing-test.cc A be/src/util/bit-packing.h A be/src/util/bit-packing.inline.h M be/src/util/bit-stream-utils.h 7 files changed, 862 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/2 -- To view, visit http://gerrit.cloudera.org:8080/4494 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong