viktorluc-db commented on code in PR #47771:
URL: https://github.com/apache/spark/pull/47771#discussion_r1729576212
##########
common/unsafe/src/main/java/org/apache/spark/unsafe/UTF8StringBuilder.java:
##########
@@ -93,7 +93,38 @@ public void appendBytes(Object base, long offset, int
length) {
cursor += length;
}
+ public void appendByte(byte singleByte) {
Review Comment:
I put it as a public method in case anybody needs to append a single byte
without having to allocate any objects on the heap (just as I would have to in
the 'appendCodePoint' method if i decided to use the already existing
'appendBytes' method). I suppose that it is unnecessary to expose it with the
assumption that somebody will need it at some point, as it can be exposed later
if needed. I will change it to private.
For the second part, if i understood correctly, you proposed to make four
new methods, where they respectively take 1,2,3,4 bytes that we want to append
as parameters?
If thats the case, that would be much more efficient with respect to number
of method calls (it would be 2 calls always), than the current implementation
(which has 2n calls, where n is the number of appended bytes). In the current
implementation, the number of buffer reallocations(due to calling 'grow') will
be usually at most 1 per call of 'appendCodePoint', whereas in your proposed
implementation it would be always at most 1. I had this in mind but I traded
efficiency for code conciseness in this place. I think your proposal sounds
good since my tradeoff was bad. Will implement it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]