[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-27 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has submitted this change and it was merged.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Reviewed-on: http://gerrit.cloudera.org:8080/6623
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon 
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 200 insertions(+), 48 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 11
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-27 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 10: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-20 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6623

to look at the new patch set (#10).

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..

Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 200 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-20 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6623/9/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

PS9, Line 261: as
> nit: from
Done


PS9, Line 273: direct.ToString
> maybe HexDump here and below?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-20 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 9:

(2 comments)

lgtm aside from a couple nits

http://gerrit.cloudera.org:8080/#/c/6623/9/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

PS9, Line 261: as
nit: from


PS9, Line 273: direct.ToString
maybe HexDump here and below?


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-18 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6623/8/src/kudu/common/wire_protocol.h
File src/kudu/common/wire_protocol.h:

Line 130: void SerializeRowBlock(const RowBlock& block, RowwiseRowBlockPB* 
rowblock_pb,
> warning: function 'kudu::SerializeRowBlock' has a definition with different
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-18 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6623

to look at the new patch set (#9).

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..

Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 199 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-18 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6623/7/src/kudu/common/wire_protocol.h
File src/kudu/common/wire_protocol.h:

PS7, Line 130: const RowBlock& block, RowwiseRowBlockPB* r
> Do we need to add docs for this here if it's only being used by Impala?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-18 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6623

to look at the new patch set (#8).

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..

Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 199 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-18 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6623/7/src/kudu/common/wire_protocol.h
File src/kudu/common/wire_protocol.h:

PS7, Line 130: bool pad_unixtime_micros_for_impala = false
Do we need to add docs for this here if it's only being used by Impala?


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-17 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6623

to look at the new patch set (#7).

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..

Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 173 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-17 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

PS4, Line 248: *reinterpret_cast(row.mutable_cell_ptr(0)) = i;
 : Slice col1;
 : // See: FillRowBlockWithTestRows() for the reason why we 
relocate these
 : // to 'test_data_arena_'.
 : CHECK(test_data_arena_.RelocateSlice("hello world col1", 
));
 : *reinterpret_cast(row.mutable_cell_ptr(1)) = col1;
 : *reinterpret_cast(row.mutable_cell_ptr(2)) = i;
 : *reinterpret_cast(row.mutable_cell_ptr(3)) = i;
 : row.cell(3).set_null(false);
 : *reinterpret_cast(row.mutable_cell_ptr(4)) = i;
 : row.cell(4).set_null(true);
> would using RowBuilder be simpler here?
this is the method we use for the other tests, so keeping it similar helps to 
understand this one a bit better IMO. plus rowbuilder has  its own arena 
inside, per row, and requires copies.


PS4, Line 298: col3
> col2
Done


PS4, Line 301: bytes.
 : EXPECT_FALSE(BitmapTest(base_data + 68, 3));
 : // 'col4' is supposed to be null, but should also read 0 
since we memsetted the
 : // memory to 0. It should come at 52 byt
> I think these comments should be merged, since the second gives a bit more 
I moved the null bitmap pointer calculation to the top.


http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

PS4, Line 487:   size_t row_size = ContiguousRowHelper::row_size(schema);
> I was actually planning to do this on the next patch up, since this is clie
Done


PS4, Line 691: has_nullable_cols
> Right, but has_nullable_cols only gets set within this if block, which is p
Done


Line 700:   size_t new_base_size = row_stride * num_rows;
> hrm, I would expect this variable to be equal to old_size + row_stride * nu
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-17 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

PS4, Line 691: has_nullable_cols
> Right, but has_nullable_cols only gets set within this if block, which is p
ah, you're right. indeed good catch. thanks


http://gerrit.cloudera.org:8080/#/c/6623/6/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

Line 479:   // change any indirect data pointers back to "real" pointers 
instead of
> warning: missing username/bug in TODO [google-readability-todo]
Done


http://gerrit.cloudera.org:8080/#/c/6623/6/src/kudu/common/wire_protocol.h
File src/kudu/common/wire_protocol.h:

Line 127: void SerializeRowBlock(const RowBlock& block, RowwiseRowBlockPB* 
rowblock_pb,
> warning: function 'kudu::SerializeRowBlock' has a definition with different
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-17 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6623

to look at the new patch set (#6).

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..

Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 166 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-17 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

PS4, Line 691:fa
> We're doing that below, right? if padding is not 0 or if there are nullable
Right, but has_nullable_cols only gets set within this if block, which is 
predicated on pad_unixtime_micros_for_impala, so if that's false, even if there 
are nullable columns, has_nullable_cols will never be set!


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-17 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 5:

(2 comments)

MJ: Still need to address some comments, but included the client-side fix for 
the pointer re-writing.

http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

PS4, Line 487:   if (pad_unixtime_micros_for_impala) {
> ...and the logic below
I was actually planning to do this on the next patch up, since this is client 
side, but maybe it makes more sense to have it here. I'll move it.


PS4, Line 691:fa
> good catch, agree we need to zero the columns in the case of nulls too.
We're doing that below, right? if padding is not 0 or if there are nullable 
cols we memset the whole thing (that's what the benchmarks on the commit 
message refer to)


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-17 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6623

to look at the new patch set (#5).

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..

Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 165 insertions(+), 33 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

PS4, Line 487:   size_t row_size = ContiguousRowHelper::row_size(schema);
> I think this will be wrong if the new mode is set.
...and the logic below

I'm hacking it up now to try to get this working end to end.


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-14 Thread Matthew Jacobs (Code Review)
Matthew Jacobs has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

PS4, Line 487:   size_t row_size = ContiguousRowHelper::row_size(schema);
I think this will be wrong if the new mode is set.


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-14 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

PS4, Line 248: *reinterpret_cast(row.mutable_cell_ptr(0)) = i;
 : Slice col1;
 : // See: FillRowBlockWithTestRows() for the reason why we 
relocate these
 : // to 'test_data_arena_'.
 : CHECK(test_data_arena_.RelocateSlice("hello world col1", 
));
 : *reinterpret_cast(row.mutable_cell_ptr(1)) = col1;
 : *reinterpret_cast(row.mutable_cell_ptr(2)) = i;
 : *reinterpret_cast(row.mutable_cell_ptr(3)) = i;
 : row.cell(3).set_null(false);
 : *reinterpret_cast(row.mutable_cell_ptr(4)) = i;
 : row.cell(4).set_null(true);
would using RowBuilder be simpler here?


http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

PS4, Line 691: has_nullable_cols
> Do we only care about zeroing out memory if we're padding for Impala?
good catch, agree we need to zero the columns in the case of nulls too.


Line 700:   size_t new_base_size = row_stride * num_rows;
hrm, I would expect this variable to be equal to old_size + row_stride * 
num_rows (the new size) rather than just the change in size. Maybe call it 
"additional_size" or something?


http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.h
File src/kudu/common/wire_protocol.h:

Line 130:bool pad_unixtime_micros_for_impala = false);
doc


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-14 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

PS4, Line 298: col3
col2


PS4, Line 301: bytes.
 : EXPECT_FALSE(BitmapTest(base_data + 68, 3));
 : // 'col4' is supposed to be null, but should also read 0 
since we memsetted the
 : // memory to 0. It should come at 52 byt
I think these comments should be merged, since the second gives a bit more 
context for why it's 68 bytes. Or explain why it's 68 bytes explicitly.


http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

PS4, Line 691: has_nullable_cols
Do we only care about zeroing out memory if we're padding for Impala?


-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-12 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6623

to look at the new patch set (#4).

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..

Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 134 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-12 Thread David Ribeiro Alves (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6623

to look at the new patch set (#3).

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..

Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 134 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results

2017-04-12 Thread David Ribeiro Alves (Code Review)
David Ribeiro Alves has uploaded a new patch set (#2).

Change subject: Allow to pad UNIXTIME_MICROS slots in scan results
..

Allow to pad UNIXTIME_MICROS slots in scan results

This changes the wire protocol to, upon request, pad
the slots that contain values for UNIXTIME_MICROS columns
by 8 bytes. This allows Impala to adopt the result of a
Kudu scan as a whole while still having space to transform
UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala
representation of timestamp, which occupies 16, in place
and without copying memory.

This patch doesn't include any de-serialization logic
the reasoning being that Impala will have knowledge of
the data format in order to perform de-serialization
directly on the "raw" direct and indirect data.

This patch doesn't introduce any branching in the
serialization patch. It does move the memset() call
that was performed once per nullable column, per row,
to be performed on the whole block instead. While less
cache friendly, this is also executed less times. The
net gain is not significant, but it does not regress
in the normal case.

Results of running the wire_protocol-test benchmark
in slow mode:

Before (avg 3 runs): 3.076
After  (avg 3 runs): 3.000

The difference is around -2%, which might be in the noise
but demostrates no significant regression.

Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
---
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.h
3 files changed, 134 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6623
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins