[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()
Todd Lipcon has submitted this change and it was merged. Change subject: bshuf_block: fix GetFirstKey() and GetLastKey() .. bshuf_block: fix GetFirstKey() and GetLastKey() This fixes a bug where the bitshuffle encoding would return the wrong value for GetFirstKey() and GetLastKey(). The issue was that, in the case of UINT32s, the data_ array was being inspected after 'Finish()' had already collapsed down the elements to their minimum width. Since this collapsing is only done on UINT32 types, this would only affect these functions on dictionary-encoded binary blocks, which use bitshuffle internally for storing the UINT32 code words. However, the "cleanup" commit prior to this one also fixed some incorrect indexing that would affect BIT_SHUFFLE blocks outside the context of UINT32. These functions are only used in the case that the column in question is a non-composite primary key, in order to build the value index as well as to determine the min/max bounds of a written CFile. The result of returning the wrong value for GetLastKey() was that the resulting index blocks would be incorrect, and seeks based on key would end up at the wrong row. This could cause spurious "row not found" errors or even crashes in some cases. Apparently real users have not made use of BITSHUFFLE or DICT_ENCODING on such columns, since we haven't seen reports of this in the wild; I instead discovered this issue while working on KUDU-1751 (changing those encodings to be the default). The patch fixes the issue and also expands test coverage to cover the GetFirstKey/GetLastKey calls. Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580 Reviewed-on: http://gerrit.cloudera.org:8080/5192 Tested-by: Kudu Jenkins Reviewed-by: Dan Burkert--- M src/kudu/cfile/bshuf_block.cc M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/encoding-test.cc 3 files changed, 49 insertions(+), 11 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/5192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()
Dan Burkert has posted comments on this change. Change subject: bshuf_block: fix GetFirstKey() and GetLastKey() .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/5192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()
Hello David Ribeiro Alves, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5192 to look at the new patch set (#5). Change subject: bshuf_block: fix GetFirstKey() and GetLastKey() .. bshuf_block: fix GetFirstKey() and GetLastKey() This fixes a bug where the bitshuffle encoding would return the wrong value for GetFirstKey() and GetLastKey(). The issue was that, in the case of UINT32s, the data_ array was being inspected after 'Finish()' had already collapsed down the elements to their minimum width. Since this collapsing is only done on UINT32 types, this would only affect these functions on dictionary-encoded binary blocks, which use bitshuffle internally for storing the UINT32 code words. However, the "cleanup" commit prior to this one also fixed some incorrect indexing that would affect BIT_SHUFFLE blocks outside the context of UINT32. These functions are only used in the case that the column in question is a non-composite primary key, in order to build the value index as well as to determine the min/max bounds of a written CFile. The result of returning the wrong value for GetLastKey() was that the resulting index blocks would be incorrect, and seeks based on key would end up at the wrong row. This could cause spurious "row not found" errors or even crashes in some cases. Apparently real users have not made use of BITSHUFFLE or DICT_ENCODING on such columns, since we haven't seen reports of this in the wild; I instead discovered this issue while working on KUDU-1751 (changing those encodings to be the default). The patch fixes the issue and also expands test coverage to cover the GetFirstKey/GetLastKey calls. Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580 --- M src/kudu/cfile/bshuf_block.cc M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/encoding-test.cc 3 files changed, 49 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/5192/5 -- To view, visit http://gerrit.cloudera.org:8080/5192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] bshuf block: fix GetFirstKey() and GetLastKey()
Hello David Ribeiro Alves, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/5192 to look at the new patch set (#4). Change subject: bshuf_block: fix GetFirstKey() and GetLastKey() .. bshuf_block: fix GetFirstKey() and GetLastKey() This fixes a bug where the bitshuffle encoding would return the wrong value for GetFirstKey() and GetLastKey(). The issue was that, in the case of UINT32s, the data_ array was being inspected after 'Finish()' had already collapsed down the elements to their minimum width. Since this collapsing is only done on UINT32 types, this would only affect these functions on dictionary-encoded binary blocks, which use bitshuffle internally for storing the UINT32 code words. However, the "cleanup" commit prior to this one also fixed some incorrect indexing that would affect BIT_SHUFFLE blocks outside the context of UINT32. These functions are only used in the case that the column in question is a non-composite primary key, in order to build the value index as well as to determine the min/max bounds of a written CFile. The result of returning the wrong value for GetLastKey() was that the resulting index blocks would be incorrect, and seeks based on key would end up at the wrong row. This could cause spurious "row not found" errors or even crashes in some cases. Apparently real users have not made use of BITSHUFFLE or DICT_ENCODING on such columns, since we haven't seen reports of this in the wild; I instead discovered this issue while working on KUDU-1751 (changing those encodings to be the default). The patch fixes the issue and also expands test coverage to cover the GetFirstKey/GetLastKey calls. Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580 --- M src/kudu/cfile/bshuf_block.cc M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/encoding-test.cc 3 files changed, 49 insertions(+), 11 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/5192/4 -- To view, visit http://gerrit.cloudera.org:8080/5192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I83dbc01ebf7a10e69b7fff6b7973967ebf6b4580 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins