[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
David Ribeiro Alves has submitted this change and it was merged. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Reviewed-on: http://gerrit.cloudera.org:8080/6623 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon--- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 200 insertions(+), 48 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Todd Lipcon has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 10: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6623 to look at the new patch set (#10). Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 --- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 200 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/10 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
David Ribeiro Alves has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/6623/9/src/kudu/common/wire_protocol-test.cc File src/kudu/common/wire_protocol-test.cc: PS9, Line 261: as > nit: from Done PS9, Line 273: direct.ToString > maybe HexDump here and below? Done -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Todd Lipcon has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 9: (2 comments) lgtm aside from a couple nits http://gerrit.cloudera.org:8080/#/c/6623/9/src/kudu/common/wire_protocol-test.cc File src/kudu/common/wire_protocol-test.cc: PS9, Line 261: as nit: from PS9, Line 273: direct.ToString maybe HexDump here and below? -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
David Ribeiro Alves has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/6623/8/src/kudu/common/wire_protocol.h File src/kudu/common/wire_protocol.h: Line 130: void SerializeRowBlock(const RowBlock& block, RowwiseRowBlockPB* rowblock_pb, > warning: function 'kudu::SerializeRowBlock' has a definition with different Done -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6623 to look at the new patch set (#9). Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 --- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 199 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/9 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
David Ribeiro Alves has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/6623/7/src/kudu/common/wire_protocol.h File src/kudu/common/wire_protocol.h: PS7, Line 130: const RowBlock& block, RowwiseRowBlockPB* r > Do we need to add docs for this here if it's only being used by Impala? Done -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6623 to look at the new patch set (#8). Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 --- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 199 insertions(+), 48 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/8 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Andrew Wong has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/6623/7/src/kudu/common/wire_protocol.h File src/kudu/common/wire_protocol.h: PS7, Line 130: bool pad_unixtime_micros_for_impala = false Do we need to add docs for this here if it's only being used by Impala? -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6623 to look at the new patch set (#7). Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 --- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 173 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/7 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
David Ribeiro Alves has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol-test.cc File src/kudu/common/wire_protocol-test.cc: PS4, Line 248: *reinterpret_cast(row.mutable_cell_ptr(0)) = i; : Slice col1; : // See: FillRowBlockWithTestRows() for the reason why we relocate these : // to 'test_data_arena_'. : CHECK(test_data_arena_.RelocateSlice("hello world col1", )); : *reinterpret_cast (row.mutable_cell_ptr(1)) = col1; : *reinterpret_cast (row.mutable_cell_ptr(2)) = i; : *reinterpret_cast (row.mutable_cell_ptr(3)) = i; : row.cell(3).set_null(false); : *reinterpret_cast (row.mutable_cell_ptr(4)) = i; : row.cell(4).set_null(true); > would using RowBuilder be simpler here? this is the method we use for the other tests, so keeping it similar helps to understand this one a bit better IMO. plus rowbuilder has its own arena inside, per row, and requires copies. PS4, Line 298: col3 > col2 Done PS4, Line 301: bytes. : EXPECT_FALSE(BitmapTest(base_data + 68, 3)); : // 'col4' is supposed to be null, but should also read 0 since we memsetted the : // memory to 0. It should come at 52 byt > I think these comments should be merged, since the second gives a bit more I moved the null bitmap pointer calculation to the top. http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: PS4, Line 487: size_t row_size = ContiguousRowHelper::row_size(schema); > I was actually planning to do this on the next patch up, since this is clie Done PS4, Line 691: has_nullable_cols > Right, but has_nullable_cols only gets set within this if block, which is p Done Line 700: size_t new_base_size = row_stride * num_rows; > hrm, I would expect this variable to be equal to old_size + row_stride * nu Done -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
David Ribeiro Alves has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: PS4, Line 691: has_nullable_cols > Right, but has_nullable_cols only gets set within this if block, which is p ah, you're right. indeed good catch. thanks http://gerrit.cloudera.org:8080/#/c/6623/6/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: Line 479: // change any indirect data pointers back to "real" pointers instead of > warning: missing username/bug in TODO [google-readability-todo] Done http://gerrit.cloudera.org:8080/#/c/6623/6/src/kudu/common/wire_protocol.h File src/kudu/common/wire_protocol.h: Line 127: void SerializeRowBlock(const RowBlock& block, RowwiseRowBlockPB* rowblock_pb, > warning: function 'kudu::SerializeRowBlock' has a definition with different Done -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6623 to look at the new patch set (#6). Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 --- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 166 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/6 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Andrew Wong has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: PS4, Line 691:fa > We're doing that below, right? if padding is not 0 or if there are nullable Right, but has_nullable_cols only gets set within this if block, which is predicated on pad_unixtime_micros_for_impala, so if that's false, even if there are nullable columns, has_nullable_cols will never be set! -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
David Ribeiro Alves has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 5: (2 comments) MJ: Still need to address some comments, but included the client-side fix for the pointer re-writing. http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: PS4, Line 487: if (pad_unixtime_micros_for_impala) { > ...and the logic below I was actually planning to do this on the next patch up, since this is client side, but maybe it makes more sense to have it here. I'll move it. PS4, Line 691:fa > good catch, agree we need to zero the columns in the case of nulls too. We're doing that below, right? if padding is not 0 or if there are nullable cols we memset the whole thing (that's what the benchmarks on the commit message refer to) -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6623 to look at the new patch set (#5). Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 --- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 165 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/5 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Matthew Jacobs has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: PS4, Line 487: size_t row_size = ContiguousRowHelper::row_size(schema); > I think this will be wrong if the new mode is set. ...and the logic below I'm hacking it up now to try to get this working end to end. -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Matthew Jacobs has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: PS4, Line 487: size_t row_size = ContiguousRowHelper::row_size(schema); I think this will be wrong if the new mode is set. -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Todd Lipcon has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol-test.cc File src/kudu/common/wire_protocol-test.cc: PS4, Line 248: *reinterpret_cast(row.mutable_cell_ptr(0)) = i; : Slice col1; : // See: FillRowBlockWithTestRows() for the reason why we relocate these : // to 'test_data_arena_'. : CHECK(test_data_arena_.RelocateSlice("hello world col1", )); : *reinterpret_cast (row.mutable_cell_ptr(1)) = col1; : *reinterpret_cast (row.mutable_cell_ptr(2)) = i; : *reinterpret_cast (row.mutable_cell_ptr(3)) = i; : row.cell(3).set_null(false); : *reinterpret_cast (row.mutable_cell_ptr(4)) = i; : row.cell(4).set_null(true); would using RowBuilder be simpler here? http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: PS4, Line 691: has_nullable_cols > Do we only care about zeroing out memory if we're padding for Impala? good catch, agree we need to zero the columns in the case of nulls too. Line 700: size_t new_base_size = row_stride * num_rows; hrm, I would expect this variable to be equal to old_size + row_stride * num_rows (the new size) rather than just the change in size. Maybe call it "additional_size" or something? http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.h File src/kudu/common/wire_protocol.h: Line 130:bool pad_unixtime_micros_for_impala = false); doc -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Andrew Wong has posted comments on this change. Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol-test.cc File src/kudu/common/wire_protocol-test.cc: PS4, Line 298: col3 col2 PS4, Line 301: bytes. : EXPECT_FALSE(BitmapTest(base_data + 68, 3)); : // 'col4' is supposed to be null, but should also read 0 since we memsetted the : // memory to 0. It should come at 52 byt I think these comments should be merged, since the second gives a bit more context for why it's 68 bytes. Or explain why it's 68 bytes explicitly. http://gerrit.cloudera.org:8080/#/c/6623/4/src/kudu/common/wire_protocol.cc File src/kudu/common/wire_protocol.cc: PS4, Line 691: has_nullable_cols Do we only care about zeroing out memory if we're padding for Impala? -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6623 to look at the new patch set (#4). Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 --- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 134 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/4 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6623 to look at the new patch set (#3). Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 --- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 134 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/3 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow to pad UNIXTIME MICROS slots in scan results
David Ribeiro Alves has uploaded a new patch set (#2). Change subject: Allow to pad UNIXTIME_MICROS slots in scan results .. Allow to pad UNIXTIME_MICROS slots in scan results This changes the wire protocol to, upon request, pad the slots that contain values for UNIXTIME_MICROS columns by 8 bytes. This allows Impala to adopt the result of a Kudu scan as a whole while still having space to transform UNIXTIME_MICROS values, which occupy 8 bytes, to the Impala representation of timestamp, which occupies 16, in place and without copying memory. This patch doesn't include any de-serialization logic the reasoning being that Impala will have knowledge of the data format in order to perform de-serialization directly on the "raw" direct and indirect data. This patch doesn't introduce any branching in the serialization patch. It does move the memset() call that was performed once per nullable column, per row, to be performed on the whole block instead. While less cache friendly, this is also executed less times. The net gain is not significant, but it does not regress in the normal case. Results of running the wire_protocol-test benchmark in slow mode: Before (avg 3 runs): 3.076 After (avg 3 runs): 3.000 The difference is around -2%, which might be in the noise but demostrates no significant regression. Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 --- M src/kudu/common/wire_protocol-test.cc M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.h 3 files changed, 134 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/6623/2 -- To view, visit http://gerrit.cloudera.org:8080/6623 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I99fc6d3be089d19ebe2e70c938f2405c381578b4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Kudu Jenkins