Re: [RFC] Time to talk about StringNG merge again?
Now I see. I have worked on this too mich - I keep seeing what I believe it is instead of what it actually is. Fixed in both rawSpace and reserveSpace. On Wed, Jul 31, 2013 at 7:12 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/31/2013 11:11 AM, Alex Rousskov wrote: On 07/31/2013 10:12 AM, Kinkie wrote: Not if the math overflowed down to a smaller value before it even got passed to reserveCapacity(). Ok. I'm going to check minSpace. maxSize+minSpace is definitely not enough to overflow size_type minSpace is controlled completely by the unknown caller code. It may be UINT_MAX or something equally capable of overflowing when you add to it. What is currently done is: reserveSpace(minSpace) { reserveCapacity(length()+minSpace); } Kinkie, I am afraid you are missing the point here. Consider the case where minSpace is the maximum value that size_type can hold and length() is 100. The above sum overflows _before_ any of your checks in reserveCapacity() happen. The correct check in reserveSpace() is: Must(minSpace = 0 length() maximum size_type value - minSpace) Must(minSpace = 0 length() = maximum size_type value - minSpace) Alex. -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
Another small bug in that code is that arguments or even fmt might actually end with a valid \0 byte that should be appended and counted (not a c-string terminating byte). This check should be done once, with controlled fmt and arguments so that we can detect problems reliably. You can do that at module initialization time, for example (./configure might be too early because libraries might change) and set a static SnprintfCountsTerminator variable accordingly. char buf[16]; SprintfCountsTerminator = snprintf(buf, sizeof(buf), %s, 1) == 2; Clever! Done. -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/30/2013 03:56 AM, Kinkie wrote: lp:~squid/squid/stringng-cherrypick has the most uptodate code. I did not finish the review before running out of time, but I found one or two bugs in r12761 (in the code I have not looked at before). There are also a few polishing issues: Thanks. typedef int32_t size_type; I think we should make it unsigned. I agree with your earlier observation that it is better to do that now rather than later or never. Yes, it is agreed. It's however something I'd prefer to do as a separate change, after everything else has been defined in order to have a clear impact. void reserveSpace(size_type minSpace) {reserveCapacity(length()+minSpace);} As I mentioned before, this lacks overflow control if the sum of length() and minSpace exceeds size_type maximum. Please add a check. reserve* calls cow() which - if a resizing is needed - calls reAlloc which throws SBufTooBigException if the requested size is too big. Isn't that good enough? cow(minSpace+length()); Same here, in rawSpace(). /** Request to extend the SBuf's free store space. /** Request to resize the SBuf's store capacity The caller does not really request to extend or resize the store because they do not normally know what the store size is. I suggest replacing extend and resize with guarantee. Done. * This method can also be used to prepare the SBuf by preallocating a * predefined amount of free space; this may help to optimize subsequent * calls to SBuf::append or similar methods. In this case the returned * pointer should be ignored. ... char *rawSpace(size_type minSize); Please remove that part of the comment. You can say See also: reserveSpace() if you want. Done. However, I would _add_ something like this: Unlike reserveSpace(), this method does not guarantee exclusive buffer ownership. It is instead optimized for a one writer (appender), many readers scenario by avoiding unnecessary copying and allocations. This is perfect; thanks! //reserve twice the format-string size, it's a likely heuristic rawSpace(strlen(fmt)*2); while (length() = maxSize) { ... sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap); The above does not illustrate how to use rawSpace() well. We do not have to provide good examples in lower level code, but I think it is a good idea to eat our own dog food. The above code should probably be rewritten as // heuristic: we will need twice as much space as the format const size_type minSpace = strlen(fmt)*2; while (length() = maxSize) { ... const char *space = rawSpace(minSpace); sz = vsnprintf(space, spaceSize(), fmt, ap); Done (but space must be a char *) BTW, please add SBuf::spaceSize() as a public method so that optimization like the above will work. Otherwise, non-privileged users will have to write foobar(buf.rawSpace(minSize), minSize); instead of getting an advantage of the actual space size being larger than the requested minimum. Done, but I'm quite wary of this: it implies a knowledge about internal store positioning which I would prefer to hide from clients. if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics I would use reserveSpace() here instead of ignoring rawSpace() result because we know the buffer needs to be reallocated at this point, right? Hand-unrolled code doesn't ignore that return value anymore (see below) /* snprintf on Linux returns -1 on overflows */ ... if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics else if (sz 0) // output error in vsnprintf throw TextException(output error in vsnprintf,__FILE__, __LINE__); If snprintf on Linux returns -1 on overflows, we should not throw when sz is -1. We should grow the buffer instead, right? Man snprintf(3) says If an output error is encountered, a negative value is returned. I guess the comment is broken. vsnprintf is standard as of C99; it should be therefore quite consistent in our build environment. while (length() = maxSize) { sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap); ... /* snprintf on FreeBSD returns at least free_space on overflows */ if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics else if (sz 0) // output error in vsnprintf throw TextException(output error in vsnprintf,__FILE__, __LINE__); else break; } Nothing in that loop grows length(), which is your guard against an infinite loop on FreeBSD (at least). Can you think of a better loop guard? This loop will be executed at most twice via break clause, as on the second go we
Re: [RFC] Time to talk about StringNG merge again?
On 1/08/2013 12:18 a.m., Kinkie wrote: On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/30/2013 03:56 AM, Kinkie wrote: lp:~squid/squid/stringng-cherrypick has the most uptodate code. I did not finish the review before running out of time, but I found one or two bugs in r12761 (in the code I have not looked at before). There are also a few polishing issues: Thanks. typedef int32_t size_type; I think we should make it unsigned. I agree with your earlier observation that it is better to do that now rather than later or never. Yes, it is agreed. It's however something I'd prefer to do as a separate change, after everything else has been defined in order to have a clear impact. Yes and no. It will probably cause a lot of little changes all over the classes involved. So doing it at a point of stability as a separate commit to stringng is good but not a reason to void reserveSpace(size_type minSpace) {reserveCapacity(length()+minSpace);} As I mentioned before, this lacks overflow control if the sum of length() and minSpace exceeds size_type maximum. Please add a check. reserve* calls cow() which - if a resizing is needed - calls reAlloc which throws SBufTooBigException if the requested size is too big. Isn't that good enough? Not if the math overflowed down to a smaller value before it even got passed to reserveCapacity(). cow(minSpace+length()); Same here, in rawSpace(). BTW, please add SBuf::spaceSize() as a public method so that optimization like the above will work. Otherwise, non-privileged users will have to write foobar(buf.rawSpace(minSize), minSize); instead of getting an advantage of the actual space size being larger than the requested minimum. Done, but I'm quite wary of this: it implies a knowledge about internal store positioning which I would prefer to hide from clients. I think we may find a reason in future to extend this futther to allow caller handling of the cases where realloc is not possible to the new minSize. But for now that should be enough. if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics I would use reserveSpace() here instead of ignoring rawSpace() result because we know the buffer needs to be reallocated at this point, right? Hand-unrolled code doesn't ignore that return value anymore (see below) /* snprintf on Linux returns -1 on overflows */ ... if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics else if (sz 0) // output error in vsnprintf throw TextException(output error in vsnprintf,__FILE__, __LINE__); If snprintf on Linux returns -1 on overflows, we should not throw when sz is -1. We should grow the buffer instead, right? Man snprintf(3) says If an output error is encountered, a negative value is returned. I guess the comment is broken. vsnprintf is standard as of C99; it should be therefore quite consistent in our build environment. What other types of outt error than buffer overflow can occur? Might be worth a little test to see what overflow produces exactly. And I agree with Alex that growing is the better way to handle it, on the proviso that we can reliably detect that overflow. while (length() = maxSize) { sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap); ... /* snprintf on FreeBSD returns at least free_space on overflows */ if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics else if (sz 0) // output error in vsnprintf throw TextException(output error in vsnprintf,__FILE__, __LINE__); else break; } Nothing in that loop grows length(), which is your guard against an infinite loop on FreeBSD (at least). Can you think of a better loop guard? This loop will be executed at most twice via break clause, as on the second go we _know_ how much we have to write, the first attempt is . I'm hand-unrolling it. len_ += sz; // TODO: this does NOT belong here, but to class-init or autoconf /* on Linux and FreeBSD, '\0' is not counted in return value */ /* on XXX it might be counted */ /* check that '\0' is appended and not counted */ if (operator[](len_-1) == '\0') { --sz; --len_; } There is no guarantee that len_ is positive. We could start with a zero len_ and an empty pattern would result in zero sz. The above code may then access character at -1 offset of the raw storage array... vprintf() is probably so slow that any code using it is not optimized. We should use at(len_-1) instead here. Ok. This will go away by itself when we move to unsigned size_type. /* check that '\0' is appended and not counted */ if (operator[](len_-1) == '\0') { Perhaps this day was too long, but I do not think the
Re: [RFC] Time to talk about StringNG merge again?
On Wed, Jul 31, 2013 at 2:46 PM, Amos Jeffries squ...@treenet.co.nz wrote: On 1/08/2013 12:18 a.m., Kinkie wrote: On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/30/2013 03:56 AM, Kinkie wrote: lp:~squid/squid/stringng-cherrypick has the most uptodate code. I did not finish the review before running out of time, but I found one or two bugs in r12761 (in the code I have not looked at before). There are also a few polishing issues: Thanks. typedef int32_t size_type; I think we should make it unsigned. I agree with your earlier observation that it is better to do that now rather than later or never. Yes, it is agreed. It's however something I'd prefer to do as a separate change, after everything else has been defined in order to have a clear impact. Yes and no. It will probably cause a lot of little changes all over the classes involved. So doing it at a point of stability as a separate commit to stringng is good but not a reason to How about doing it after this round of reviews is done? That should be a reasonable compromise. void reserveSpace(size_type minSpace) {reserveCapacity(length()+minSpace);} As I mentioned before, this lacks overflow control if the sum of length() and minSpace exceeds size_type maximum. Please add a check. reserve* calls cow() which - if a resizing is needed - calls reAlloc which throws SBufTooBigException if the requested size is too big. Isn't that good enough? Not if the math overflowed down to a smaller value before it even got passed to reserveCapacity(). Ok. I'm going to check minSpace. maxSize+minSpace is definitely not enough to overflow size_type cow(minSpace+length()); Same here, in rawSpace(). BTW, please add SBuf::spaceSize() as a public method so that optimization like the above will work. Otherwise, non-privileged users will have to write foobar(buf.rawSpace(minSize), minSize); instead of getting an advantage of the actual space size being larger than the requested minimum. Done, but I'm quite wary of this: it implies a knowledge about internal store positioning which I would prefer to hide from clients. I think we may find a reason in future to extend this futther to allow caller handling of the cases where realloc is not possible to the new minSize. But for now that should be enough. Ok. if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics I would use reserveSpace() here instead of ignoring rawSpace() result because we know the buffer needs to be reallocated at this point, right? Hand-unrolled code doesn't ignore that return value anymore (see below) /* snprintf on Linux returns -1 on overflows */ ... if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics else if (sz 0) // output error in vsnprintf throw TextException(output error in vsnprintf,__FILE__, __LINE__); If snprintf on Linux returns -1 on overflows, we should not throw when sz is -1. We should grow the buffer instead, right? Man snprintf(3) says If an output error is encountered, a negative value is returned. I guess the comment is broken. vsnprintf is standard as of C99; it should be therefore quite consistent in our build environment. What other types of outt error than buffer overflow can occur? Might be worth a little test to see what overflow produces exactly. And I agree with Alex that growing is the better way to handle it, on the proviso that we can reliably detect that overflow. Overflow is signaled by returning a size greater than the buffer. The freebsd man page mentions possible errors in write(2) ( does not apply here) but also ENOMEM or EILSEQ (illegal wide character). Kinkie
Re: [RFC] Time to talk about StringNG merge again?
On 1/08/2013 12:58 a.m., Kinkie wrote: On Wed, Jul 31, 2013 at 2:46 PM, Amos Jeffries squ...@treenet.co.nz wrote: On 1/08/2013 12:18 a.m., Kinkie wrote: On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/30/2013 03:56 AM, Kinkie wrote: lp:~squid/squid/stringng-cherrypick has the most uptodate code. I did not finish the review before running out of time, but I found one or two bugs in r12761 (in the code I have not looked at before). There are also a few polishing issues: Thanks. typedef int32_t size_type; I think we should make it unsigned. I agree with your earlier observation that it is better to do that now rather than later or never. Yes, it is agreed. It's however something I'd prefer to do as a separate change, after everything else has been defined in order to have a clear impact. Yes and no. It will probably cause a lot of little changes all over the classes involved. So doing it at a point of stability as a separate commit to stringng is good but not a reason to How about doing it after this round of reviews is done? That should be a reasonable compromise. void reserveSpace(size_type minSpace) {reserveCapacity(length()+minSpace);} As I mentioned before, this lacks overflow control if the sum of length() and minSpace exceeds size_type maximum. Please add a check. reserve* calls cow() which - if a resizing is needed - calls reAlloc which throws SBufTooBigException if the requested size is too big. Isn't that good enough? Not if the math overflowed down to a smaller value before it even got passed to reserveCapacity(). Ok. I'm going to check minSpace. maxSize+minSpace is definitely not enough to overflow size_type minSpace is controlled completely by the unknown caller code. It may be UINT_MAX or something equally capable of overflowing when you add to it. I think the right check for here is: if (maxSize - length() minSpace || minSpace 0) abort(). The math being done on the two values we are confident about already and ensuring that the leftover space is enough for the caller. The extra 0 check disappearing when we move to unsigned of course. Amos
Re: [RFC] Time to talk about StringNG merge again?
Not if the math overflowed down to a smaller value before it even got passed to reserveCapacity(). Ok. I'm going to check minSpace. maxSize+minSpace is definitely not enough to overflow size_type minSpace is controlled completely by the unknown caller code. It may be UINT_MAX or something equally capable of overflowing when you add to it. What is currently done is: reserveSpace(minSpace) { reserveCapacity(length()+minSpace); } reserveCapacity(minCapacity) { Must(0 = minCapacity = maxSize); // maxSize is 0xfff, 256Mb cow(minCapacity); } cow(newsize) //it's private { if(!need_to_resize()) return; reAlloc(newsize); } reAlloc(newsize) { // it's private Must (newsize maxSize); } I think the right check for here is: if (maxSize - length() minSpace || minSpace 0) abort(). The math being done on the two values we are confident about already and ensuring that the leftover space is enough for the caller. I'm currently checking 0=requestedSpace=maxSize; worst case is that if someone requests reserveSpace(maxSize); the exception will be triggered in reserveCapacity() instead of reserveSpace() as the length()+requestedSpace will be checked there. I am convinced that a few of these checks are redundant, but CPU power is cheaper than brainpower. The extra 0 check disappearing when we move to unsigned of course. Yes. -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
On 07/31/2013 06:18 AM, Kinkie wrote: typedef int32_t size_type; I think we should make it unsigned. I agree with your earlier observation that it is better to do that now rather than later or never. Yes, it is agreed. It's however something I'd prefer to do as a separate change, after everything else has been defined in order to have a clear impact. I think that is a good approach. Alex.
Re: [RFC] Time to talk about StringNG merge again?
On 07/31/2013 10:12 AM, Kinkie wrote: Not if the math overflowed down to a smaller value before it even got passed to reserveCapacity(). Ok. I'm going to check minSpace. maxSize+minSpace is definitely not enough to overflow size_type minSpace is controlled completely by the unknown caller code. It may be UINT_MAX or something equally capable of overflowing when you add to it. What is currently done is: reserveSpace(minSpace) { reserveCapacity(length()+minSpace); } Kinkie, I am afraid you are missing the point here. Consider the case where minSpace is the maximum value that size_type can hold and length() is 100. The above sum overflows _before_ any of your checks in reserveCapacity() happen. The correct check in reserveSpace() is: Must(minSpace = 0 length() maximum size_type value - minSpace) The negative check must come first (but will be deleted later). HTH, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On 07/31/2013 11:11 AM, Alex Rousskov wrote: On 07/31/2013 10:12 AM, Kinkie wrote: Not if the math overflowed down to a smaller value before it even got passed to reserveCapacity(). Ok. I'm going to check minSpace. maxSize+minSpace is definitely not enough to overflow size_type minSpace is controlled completely by the unknown caller code. It may be UINT_MAX or something equally capable of overflowing when you add to it. What is currently done is: reserveSpace(minSpace) { reserveCapacity(length()+minSpace); } Kinkie, I am afraid you are missing the point here. Consider the case where minSpace is the maximum value that size_type can hold and length() is 100. The above sum overflows _before_ any of your checks in reserveCapacity() happen. The correct check in reserveSpace() is: Must(minSpace = 0 length() maximum size_type value - minSpace) Must(minSpace = 0 length() = maximum size_type value - minSpace) Alex.
Re: [RFC] Time to talk about StringNG merge again?
On 07/31/2013 06:18 AM, Kinkie wrote: On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov wrote: On 07/30/2013 03:56 AM, Kinkie wrote: len_ += sz; // TODO: this does NOT belong here, but to class-init or autoconf /* on Linux and FreeBSD, '\0' is not counted in return value */ /* on XXX it might be counted */ /* check that '\0' is appended and not counted */ if (operator[](len_-1) == '\0') { --sz; --len_; } There is no guarantee that len_ is positive. We could start with a zero len_ and an empty pattern would result in zero sz. The above code may then access character at -1 offset of the raw storage array... Ok. This will go away by itself when we move to unsigned size_type. I do not see how it will go away. If an unsigned len_ is zero, you would be accessing a non-existent byte at a very large (unsigned)-1 offset, which may even be worse than accessing a non-existent byte at -1 offset. Another small bug in that code is that arguments or even fmt might actually end with a valid \0 byte that should be appended and counted (not a c-string terminating byte). This check should be done once, with controlled fmt and arguments so that we can detect problems reliably. You can do that at module initialization time, for example (./configure might be too early because libraries might change) and set a static SnprintfCountsTerminator variable accordingly. char buf[16]; SprintfCountsTerminator = snprintf(buf, sizeof(buf), %s, 1) == 2; Cheers, Alex.
Re: [RFC] Time to talk about StringNG merge again?
Ok, it's now done, including documentation. lp:~squid/squid/stringng-cherrypick has the most uptodate code. On Mon, Jul 29, 2013 at 11:55 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/29/2013 01:38 PM, Kinkie wrote: So to implement this I should: - rename current reserveSpace to rawSpace, add returning the pointer as in current rawSpace. - reimplement reserveSpace as a convenience calling reserveCapacity - adjust clients of reserveSpace to use rawSpace instead. If this is correct, I'm OK with it. Yes, I think the above plan will result in the implementation I am rooting for. Thank you, Alex. -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
On 07/30/2013 03:56 AM, Kinkie wrote: lp:~squid/squid/stringng-cherrypick has the most uptodate code. I did not finish the review before running out of time, but I found one or two bugs in r12761 (in the code I have not looked at before). There are also a few polishing issues: typedef int32_t size_type; I think we should make it unsigned. I agree with your earlier observation that it is better to do that now rather than later or never. void reserveSpace(size_type minSpace) {reserveCapacity(length()+minSpace);} As I mentioned before, this lacks overflow control if the sum of length() and minSpace exceeds size_type maximum. Please add a check. cow(minSpace+length()); Same here, in rawSpace(). /** Request to extend the SBuf's free store space. /** Request to resize the SBuf's store capacity The caller does not really request to extend or resize the store because they do not normally know what the store size is. I suggest replacing extend and resize with guarantee. * This method can also be used to prepare the SBuf by preallocating a * predefined amount of free space; this may help to optimize subsequent * calls to SBuf::append or similar methods. In this case the returned * pointer should be ignored. ... char *rawSpace(size_type minSize); Please remove that part of the comment. You can say See also: reserveSpace() if you want. However, I would _add_ something like this: Unlike reserveSpace(), this method does not guarantee exclusive buffer ownership. It is instead optimized for a one writer (appender), many readers scenario by avoiding unnecessary copying and allocations. //reserve twice the format-string size, it's a likely heuristic rawSpace(strlen(fmt)*2); while (length() = maxSize) { ... sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap); The above does not illustrate how to use rawSpace() well. We do not have to provide good examples in lower level code, but I think it is a good idea to eat our own dog food. The above code should probably be rewritten as // heuristic: we will need twice as much space as the format const size_type minSpace = strlen(fmt)*2; while (length() = maxSize) { ... const char *space = rawSpace(minSpace); sz = vsnprintf(space, spaceSize(), fmt, ap); BTW, please add SBuf::spaceSize() as a public method so that optimization like the above will work. Otherwise, non-privileged users will have to write foobar(buf.rawSpace(minSize), minSize); instead of getting an advantage of the actual space size being larger than the requested minimum. if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics I would use reserveSpace() here instead of ignoring rawSpace() result because we know the buffer needs to be reallocated at this point, right? /* snprintf on Linux returns -1 on overflows */ ... if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics else if (sz 0) // output error in vsnprintf throw TextException(output error in vsnprintf,__FILE__, __LINE__); If snprintf on Linux returns -1 on overflows, we should not throw when sz is -1. We should grow the buffer instead, right? while (length() = maxSize) { sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap); ... /* snprintf on FreeBSD returns at least free_space on overflows */ if (sz = static_castint(store_-spaceSize())) rawSpace(sz*2); // TODO: tune heuristics else if (sz 0) // output error in vsnprintf throw TextException(output error in vsnprintf,__FILE__, __LINE__); else break; } Nothing in that loop grows length(), which is your guard against an infinite loop on FreeBSD (at least). Can you think of a better loop guard? len_ += sz; // TODO: this does NOT belong here, but to class-init or autoconf /* on Linux and FreeBSD, '\0' is not counted in return value */ /* on XXX it might be counted */ /* check that '\0' is appended and not counted */ if (operator[](len_-1) == '\0') { --sz; --len_; } There is no guarantee that len_ is positive. We could start with a zero len_ and an empty pattern would result in zero sz. The above code may then access character at -1 offset of the raw storage array... vprintf() is probably so slow that any code using it is not optimized. We should use at(len_-1) instead here. /* check that '\0' is appended and not counted */ if (operator[](len_-1) == '\0') { Perhaps this day was too long, but I do not think the above checks whether \0 was appended but not counted. It checks whether \0 was appended and counted. Which is probably what you want to check, so the comment should be adjusted to remove the word not. HTH, Alex. On Mon, Jul
Re: [RFC] Time to talk about StringNG merge again?
On Sun, Jul 28, 2013 at 1:03 AM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/27/2013 02:57 PM, Kinkie wrote: On Sat, Jul 27, 2013 at 10:32 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/27/2013 01:03 PM, Kinkie wrote: On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote: On 07/27/2013 12:00 PM, Kinkie wrote: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. // 1a. void reserveCapacity(size_type minCap) { cow(minCap); } // 1b. void reserveSpace(size_type minSpace) { // check that the sum below does not exceed size_type Must(size() = size_type's limit - minSpace); reserveCapacity(size() + minSpace); } // 2. char *rawSpace(size_type minSpace) { ... your optimization goes here ... return pointer to space; } Current call chain (and functions for this purpose, outlined): append (various flavors) calls lowAppend which calls reserveSpace which checks if we're at the tail of the used portion of the store (via store-canAppend()) reAlloc() if needed // reAlloc is the low-level, unconditional part of cow() (back to lowAppend): tell the MemBlob to append the user-supplied buffer and adjust internal data Yes, I know. The current code is inconsistent/confusing because reserveSpace does not guarantee exclusive buffer ownership but reserveCapacity does. In the fixed design, that inconsistency is removed. I agree with you that it is clearer that both reserve* methods have the same semantics. Let's consider it done: they do the same thing and one is an inline wrapper of the other. The way I envisioned the key guarantee difference between reserve* and cow() is that: reserve* guarantees that it is safe to APPEND after the currently-used area and up to the requested size/space. cow() slso guarantees that it is safe to WRITE over the currently-used area, up to the requested size. The former is a weaker guarantee, and a possible optimization. What I understand about your request is that you want the stronger guarantee to be always used. It's not a big deal, as the cost is at most some unneeded copies here and there, and code-wise it's very simple. It'd just alias reserveCapacity to cow, and reserveSpace to wrap on top of that. Did I understand correctly? If so, I can do that right away. The other size is rawSpace(). For that, I'd like to adopt Amos's suggestion and enforce that the requester specify how much storage he needs. If he wants raw access to the data, he better know what he's doing. I don't like the idea of using rawSpace for append; MemBlob::append also does some housekeeping within MemBlob and I'm not keen on the idea shuffling that outside the class itself. It is however immaterial to this discussion, as it's an implementation detail from the SBuf user's point of view. The current design is inconsistent (similar-sounding methods behave very differently) and more dangerous (reserveSpace() does not return volatile raw space to the caller, increasing the changes that the caller will do something bad before extracting that [now stale] space from the SBuf). The proposed design solves both problems. It may be useful to document that reserve* methods are meant to be used only in preparation for other SBuf methods (assign, append, setAt etc., not for direct manipulation by the caller. This is implied by the documentation of rawSpace: (\warning Use with EXTREME caution, this is a dangerous operation.). To wrap it up, if you agre I will: - alias reserveCapacity(size) to cow(size) - reserveSpace(size) - reserveCapacity(size+length()) - rawSpace(size_type size=npos) - rawSpace(size_type size) And leave everything else as is. Do you agree? -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
On 07/29/2013 03:02 AM, Kinkie wrote: On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. // 1a. void reserveCapacity(size_type minCap) { cow(minCap); } // 1b. void reserveSpace(size_type minSpace) { // check that the sum below does not exceed size_type Must(size() = size_type's limit - minSpace); reserveCapacity(size() + minSpace); } // 2. char *rawSpace(size_type minSpace) { ... your optimization goes here ... return pointer to space; } The way I envisioned the key guarantee difference between reserve* and cow() is that: reserve* guarantees that it is safe to APPEND after the currently-used area and up to the requested size/space. cow() slso guarantees that it is safe to WRITE over the currently-used area, up to the requested size. I do not think it is a good idea to limit reserveCapacity() to append because absolutely nothing in its name and in the semantics of standard STL methods with similar names restrict them to append. If you call it reserve*() it has to apply to buffer capacity for any subsequent operation. cow() is a private method. It's primary goal is ensuring single ownership. Its capacity parameter is just a performance optimization: When we have to copy, it saves time to enlarge the buffer as needed as well. Perhaps that internal optimization creates some confusion, prompting these discussions. From the API point of view, you may ignore that optimization and think of cow() as a method that ensures single ownership by creating a dedicated buffer and copying current content (if needed). The former is a weaker guarantee, and a possible optimization. What I understand about your request is that you want the stronger guarantee to be always used. It's not a big deal, as the cost is at most some unneeded copies here and there, and code-wise it's very simple. It'd just alias reserveCapacity to cow, and reserveSpace to wrap on top of that. Did I understand correctly? If so, I can do that right away. I think you understand what I am suggesting, but I am not sure you understand why it should be done that way and why there is no performance penalty associated with that solution (if other methods are implemented and used correctly). The other size is rawSpace(). For that, I'd like to adopt Amos's suggestion and enforce that the requester specify how much storage he needs. If he wants raw access to the data, he better know what he's doing. Fine with me. I don't like the idea of using rawSpace for append; which is rather absurd since rawSpace() is designed to be used for appending and can only be used safely for appending. MemBlob::append also does some housekeeping within MemBlob and I'm not keen on the idea shuffling that outside the class itself. It is however immaterial to this discussion, as it's an implementation detail from the SBuf user's point of view. I do not see any inappropriate housekeeping in MemBlob::append(). It just copies data and increments the used buffer size, as it should. The current design is inconsistent (similar-sounding methods behave very differently) and more dangerous (reserveSpace() does not return volatile raw space to the caller, increasing the changes that the caller will do something bad before extracting that [now stale] space from the SBuf). The proposed design solves both problems. It may be useful to document that reserve* methods are meant to be used only in preparation for other SBuf methods (assign, append, setAt etc., not for direct manipulation by the caller. There is no need to document that because reserve*() methods return void, as they should. To wrap it up, if you agre I will: - alias reserveCapacity(size) to cow(size) - reserveSpace(size) - reserveCapacity(size+length()) Yes. - rawSpace(size_type size=npos) - rawSpace(size_type size) OK (but irrelevant to this discussion). And leave everything else as is. No, you also need to move your optimization from reserveSpace() to rawSpace() and use rawSpace() in append(). There is no need to drop that very useful optimization! Cheers, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On Mon, Jul 29, 2013 at 5:32 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/29/2013 03:02 AM, Kinkie wrote: On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. // 1a. void reserveCapacity(size_type minCap) { cow(minCap); } // 1b. void reserveSpace(size_type minSpace) { // check that the sum below does not exceed size_type Must(size() = size_type's limit - minSpace); reserveCapacity(size() + minSpace); } // 2. char *rawSpace(size_type minSpace) { ... your optimization goes here ... return pointer to space; } The way I envisioned the key guarantee difference between reserve* and cow() is that: reserve* guarantees that it is safe to APPEND after the currently-used area and up to the requested size/space. cow() slso guarantees that it is safe to WRITE over the currently-used area, up to the requested size. I do not think it is a good idea to limit reserveCapacity() to append because absolutely nothing in its name and in the semantics of standard STL methods with similar names restrict them to append. If you call it reserve*() it has to apply to buffer capacity for any subsequent operation. Changing names is not unthinkable. After all we are not bound to a standard API. cow() is a private method. It's primary goal is ensuring single ownership. Its capacity parameter is just a performance optimization: When we have to copy, it saves time to enlarge the buffer as needed as well. Perhaps that internal optimization creates some confusion, prompting these discussions. From the API point of view, you may ignore that optimization and think of cow() as a method that ensures single ownership by creating a dedicated buffer and copying current content (if needed). Er.. no (again, it may be misnamed, let's focus on its purpose, we'll see if we need to change the purpose or the name at the end of this discussion). The purpose of cow is to ensure writeability, thus no store sharing (thus single ownership). It is a way for the client to say I'm about to write, please copy if needed. Nowhere the name talks about single ownership, although it is a necessary consequence. The responsibility to reAlloc's purpose is to grow the SBuf (BTW: it currently doesn't enforce that the requested size be bigger than the SBuf's data; it should. This is a bug). The former is a weaker guarantee, and a possible optimization. What I understand about your request is that you want the stronger guarantee to be always used. It's not a big deal, as the cost is at most some unneeded copies here and there, and code-wise it's very simple. It'd just alias reserveCapacity to cow, and reserveSpace to wrap on top of that. Did I understand correctly? If so, I can do that right away. I think you understand what I am suggesting, but I am not sure you understand why it should be done that way and why there is no performance penalty associated with that solution (if other methods are implemented and used correctly). Yes, you are right. The other size is rawSpace(). For that, I'd like to adopt Amos's suggestion and enforce that the requester specify how much storage he needs. If he wants raw access to the data, he better know what he's doing. Fine with me. I don't like the idea of using rawSpace for append; which is rather absurd since rawSpace() is designed to be used for appending and can only be used safely for appending. Doh! You're right, I hardwired my brain to think of an empty SBuf when calling it, and wasn't seeing what's written in the code. MemBlob::append also does some housekeeping within MemBlob and I'm not keen on the idea shuffling that outside the class itself. It is however immaterial to this discussion, as it's an implementation detail from the SBuf user's point of view. I do not see any inappropriate housekeeping in MemBlob::append(). It just copies data and increments the used buffer size, as it should. The current design is inconsistent (similar-sounding methods behave very differently) and more dangerous (reserveSpace() does not return volatile raw space to the caller, increasing the changes that the caller will do something bad before extracting that [now stale] space from the SBuf). The proposed design solves both problems. It may be useful to document that reserve* methods are meant to be used only in preparation for other SBuf methods (assign, append, setAt etc., not for direct manipulation by the caller. There is no need to document that because reserve*() methods return void, as they should. To wrap it up, if you agre I will: - alias reserveCapacity(size) to cow(size) - reserveSpace(size) - reserveCapacity(size+length()) Yes. - rawSpace(size_type
Re: [RFC] Time to talk about StringNG merge again?
On 07/29/2013 01:38 PM, Kinkie wrote: So to implement this I should: - rename current reserveSpace to rawSpace, add returning the pointer as in current rawSpace. - reimplement reserveSpace as a convenience calling reserveCapacity - adjust clients of reserveSpace to use rawSpace instead. If this is correct, I'm OK with it. Yes, I think the above plan will result in the implementation I am rooting for. Thank you, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On 27/07/2013 11:38 a.m., Alex Rousskov wrote: On 07/26/2013 03:27 PM, Kinkie wrote: Resending due to MUA insisting on using HTML email.. On Thu, Jul 18, 2013 at 1:06 AM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/04/2013 11:51 PM, Kinkie wrote: void SBuf::reserveCapacity(size_type minCapacity) { Must(0 = minCapacity minCapacity = maxSize); reserveSpace(minCapacity-length()); } This does not look right. For example, if minCapacity is smaller than length(), we should do nothing, but will throw instead. I think this method should look something like this: Must(0 = minCapacity minCapacity = maxSize); cow(minCapacity); if (store_-canAppend(off_+len_, minSpace)) { debugs(24, 7, not growing); return; } AFAICT, reserveSpace() should be implemented using something like this: Must(0 = minSpace minSpace = maxSize - length()); reserveCapacity(length() + minSpace); The comment is badly worded but the code is correct: I am not sure which code you are referring to here, but I do not think it is correct for SBuf::reserveCapacity(minCapacity) to throw when minCapacity is smaller than length(). Decoupled the methods, but the other way around: reserveCapacity(totalCapacity) ensures unique ownership, reserveSpace(capacityToAdd) is an optimization to prime the SBuf to receive data. The upper bound in that Must() clause is actually redundant, as in case of a needed reallocation reserveSpace() - cow() - reAlloc() which throws a SBufTooBigException if size is too big. As a consequence, I'm removing the upper bound check. Once we change size_type to be unsigned, the lower bound check will be redundant too. A decision needs to be made if these two methods have different purposes or are to be simply different expressions of the same underlying concept (as it is now). Sorry, this question is too abstract for me to answer it -- it feels like both statements are true: The methods have different purpose but are manipulating the same underlying concept of available buffer area. One method deals with the total size of the buffer area, while the other is concerned with the size of the yet unused buffer area. As they are (were) one of the two is redundant, an in fact it was two lines of code. With the new semantics, reserveSpace is meant to announce to the SBuf I'm planning to append lots of data, please be prepared, while reserveCapacity means I'm going to meddle with your internal storage, and I'm going to need this space. Get ready. I do not see the difference because lots of data is going to be appended to your internal storage so both are meddling with that storage. To me, the difference was simply in argument math: Sometimes it is easier to specify the total capacity, sometimes the available space. The way the two functions are implemented now the difference is more than just math. One does not guarantee exclusive ownership and the other one does. Your semantics description above does not seem to convey that key difference. Here is the use case I am thinking about: sbuf.reserveSpace(ioSize); bytesRead = read(sbuf.rawSpace(), ioSize); sbuf.forceSize(sbuf.size() + bytesRead); The above case is already handled by rawSpace(), but now I am confused why rawSpace() is implemented using unconditional cow() while reserveSpace() appears to be optimizing something. That optimization seems to be the key here. Perhaps rawSpace() should be deleted and reserveCapacity() adjusted to use the same optimization? reserveSpace(n) ought to be reserveCapacity(content size + n) To summarize, 1) We need to agree whether the optimization currently residing in reserveSpace() should apply to reserveCapacity(). This is the key difference between the two methods. Documentation and code should be adjusted accordingly. 2) We probably should not have both reserveSpace() and rawSpace(). What about char* getReservedRawSpace(size) instead? The reservation mandatory when fetching the raw Space, * ensure any cow() is done before returning the buffer, * ensure it is 0-safe; the space starts with 0-terminator Amos
Re: [RFC] Time to talk about StringNG merge again?
On 28/07/2013 12:00 a.m., Kinkie wrote: Here is the use case I am thinking about: sbuf.reserveSpace(ioSize); bytesRead = read(sbuf.rawSpace(), ioSize); sbuf.forceSize(sbuf.size() + bytesRead); This explains it all, we're thinking about two different use cases. The other use case is: sbuf.reserveCapacity(newCapacity); sbuf.append(something).append(somethingelse).append(verylongstring).append(whoknows). This could be used e.g. to instantiate error pages from their templates (pseudo-code): Sbuf template(.); Sbuf errorpage; errorpage.reserveCapacity(template * reasonable_scaling_factor); SBufTokenizer t(template); while (SBuf parsed=t.nextToken(%)) { errorpage.append(parsed).append(handleCode(t.peek()); } errorpage.append(t.whatever_remains()); In cases such as this it may not be needed to cow(), so why do it? The above case is already handled by rawSpace(), but now I am confused why rawSpace() is implemented using unconditional cow() while reserveSpace() appears to be optimizing something. That optimization seems to be the key here. Perhaps rawSpace() should be deleted and reserveCapacity() adjusted to use the same optimization? reserveSpace(n) ought to be reserveCapacity(content size + n) Apart from this, I see the benefit of Amos' suggestion of having rawSpace also absorb the function of reserveCapacity. This also covers you observation that both reserveSpace and rawSpace are maybe not needed, one can do the job of both. No. I think what Alex is getting at is that they *do* have potential uses but those uses have very different semantic requirements, from each other and from what some (one?) of them do right now. There is a case for each of them: reserveCapacity(size) - in code needing and being able to hint a specific total capacity (minimum) to exist in the buffer. reserveSpace(size) - in code which only knows how much is going to be _added_, to simplify the caller code. This can therefore be an inline wrapper for reserveCapacity(currentSize+newSize). * both of the above are about optimizing cow() and ensuring extra space is available. Nothing more. rawSpace() - in code needing to play with the raw buffer for some reason. Adding size to rawSpace() adds the multiple benefits of not needing to explicitly run reserve*(), thus preventing mistakes with selecting _which_ reserve*(), enforcing that the space available is guaranteed to be a fixed known size in advance of that use, thus nobody can play with the buffer directly unless they know how much of it they will be playing with or at least can (over-)estimate an amount. Amos
Re: [RFC] Time to talk about StringNG merge again?
On 07/27/2013 05:35 AM, Amos Jeffries wrote: * ensure it is 0-safe; the space starts with 0-terminator Please no. We have enough bugs with implicit zero terminator being forced into buffers that do not need one, screwing up buffer size management and related optimizations/assertions. I suspects it all starts with an innocent 0-terminator added where it does not seem to hurt and might help. The space(0) call will not work correctly if we start adding implicit 0-terminators. Alex.
Re: [RFC] Time to talk about StringNG merge again?
On Sat, Jul 27, 2013 at 5:57 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/27/2013 05:35 AM, Amos Jeffries wrote: * ensure it is 0-safe; the space starts with 0-terminator Please no. We have enough bugs with implicit zero terminator being forced into buffers that do not need one, screwing up buffer size management and related optimizations/assertions. I suspects it all starts with an innocent 0-terminator added where it does not seem to hurt and might help. The space(0) call will not work correctly if we start adding implicit 0-terminators. I'm fine with this. If you agree to Amos' proposed API (as modified by your proposal of not auto-terminating), I'll implement them immediately. -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
On 07/27/2013 06:00 AM, Kinkie wrote: Here is the use case I am thinking about: sbuf.reserveSpace(ioSize); bytesRead = read(sbuf.rawSpace(), ioSize); sbuf.forceSize(sbuf.size() + bytesRead); The other use case is: sbuf.reserveCapacity(newCapacity); sbuf.append(something).append(somethingelse).append(verylongstring).append(whoknows). Sure, that is another use case for another method. This could be used e.g. to instantiate error pages from their templates (pseudo-code): Sbuf template(.); Sbuf errorpage; errorpage.reserveCapacity(template * reasonable_scaling_factor); SBufTokenizer t(template); while (SBuf parsed=t.nextToken(%)) { errorpage.append(parsed).append(handleCode(t.peek()); } errorpage.append(t.whatever_remains()); In cases such as this it may not be needed to cow(), so why do it? AFAICT, your example immediately above will not cause CoW, with or without the current optimization in reserveSpace() because errorpage SBuf is not shared with other users. There will be no reason to Copy errorpage buffer. The following sketch may be a better example: SBuf buf; buf.reserveCapacity(...); while (there is more data and there is buffer space) { buf.append(data); SBuf token = nextToken(buf); give token to somebody else } In the above example, many tokens and a single buf share the same underlying buffer. A straight cow()-based implementation of reserveSpace() called by append() would result in one cow() per iteration. My understanding is that your current optimization in reserveSpace() will remove all cow()s in this case because while tokens and buf share buffer content, they do not share buffer space and so no CoW is really needed when appending to a shared buffer. Is my understanding correct? Do we want this optimization? I think we do. The above case is already handled by rawSpace(), but now I am confused why rawSpace() is implemented using unconditional cow() while reserveSpace() appears to be optimizing something. That optimization seems to be the key here. Perhaps rawSpace() should be deleted and reserveCapacity() adjusted to use the same optimization? reserveSpace(n) ought to be reserveCapacity(content size + n) Apart from this, I see the benefit of Amos' suggestion of having rawSpace also absorb the function of reserveCapacity. This also covers you observation that both reserveSpace and rawSpace are maybe not needed, one can do the job of both. *If the optimization discussed above works*, then we should consider having two or tree methods: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. 1b is optional but nice to have and costs only one line of code. I will focus on (1a) vs. (2) below. As you can see, (1a) and (2) semantics are very different. In some contexts, the difference may not be important at all. In some, it may be critical for good performance (if the optimization discussed above works). I see only one potential problem with this approach: We cannot be certain that some code is not going to first get space twice and only then use the obtained space twice, as illustrated below: char *buf1 = bigBuffer.rawSpace(...); char *buf2 = bigBuffer.rawSpace(...); ... read into buf1; read into buf2; // corrupts buf1 content Needless to say that the above buggy code would probably be spread over several functions so that the bug will not be clearly visible. I think we can try to combat the above by requiring that the pointer returned by the rawSpace() call is used immediately. HTH, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On Sat, Jul 27, 2013 at 6:39 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/27/2013 06:00 AM, Kinkie wrote: Here is the use case I am thinking about: sbuf.reserveSpace(ioSize); bytesRead = read(sbuf.rawSpace(), ioSize); sbuf.forceSize(sbuf.size() + bytesRead); The other use case is: sbuf.reserveCapacity(newCapacity); sbuf.append(something).append(somethingelse).append(verylongstring).append(whoknows). Sure, that is another use case for another method. This could be used e.g. to instantiate error pages from their templates (pseudo-code): Sbuf template(.); Sbuf errorpage; errorpage.reserveCapacity(template * reasonable_scaling_factor); SBufTokenizer t(template); while (SBuf parsed=t.nextToken(%)) { errorpage.append(parsed).append(handleCode(t.peek()); } errorpage.append(t.whatever_remains()); In cases such as this it may not be needed to cow(), so why do it? AFAICT, your example immediately above will not cause CoW, with or without the current optimization in reserveSpace() because errorpage SBuf is not shared with other users. There will be no reason to Copy errorpage buffer. The following sketch may be a better example: SBuf buf; buf.reserveCapacity(...); while (there is more data and there is buffer space) { buf.append(data); SBuf token = nextToken(buf); give token to somebody else } In the above example, many tokens and a single buf share the same underlying buffer. A straight cow()-based implementation of reserveSpace() called by append() would result in one cow() per iteration. My understanding is that your current optimization in reserveSpace() will remove all cow()s in this case because while tokens and buf share buffer content, they do not share buffer space and so no CoW is really needed when appending to a shared buffer. Is my understanding correct? Yes; that's exactly the point. Append doesn't require cow. It only does when there is not enough unused space in the memblob OR the SBuf being appended to is not at the end of the used shared storage ( sorry for the multiple negatives) Do we want this optimization? I think we do. Good :) Then we agree. The above case is already handled by rawSpace(), but now I am confused why rawSpace() is implemented using unconditional cow() while reserveSpace() appears to be optimizing something. That optimization seems to be the key here. Perhaps rawSpace() should be deleted and reserveCapacity() adjusted to use the same optimization? reserveSpace(n) ought to be reserveCapacity(content size + n) Apart from this, I see the benefit of Amos' suggestion of having rawSpace also absorb the function of reserveCapacity. This also covers you observation that both reserveSpace and rawSpace are maybe not needed, one can do the job of both. *If the optimization discussed above works*, then we should consider having two or tree methods: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. 1b is optional but nice to have and costs only one line of code. I will focus on (1a) vs. (2) below. As you can see, (1a) and (2) semantics are very different. In some contexts, the difference may not be important at all. In some, it may be critical for good performance (if the optimization discussed above works). I see only one potential problem with this approach: We cannot be certain that some code is not going to first get space twice and only then use the obtained space twice, as illustrated below: char *buf1 = bigBuffer.rawSpace(...); char *buf2 = bigBuffer.rawSpace(...); ... read into buf1; read into buf2; // corrupts buf1 content Needless to say that the above buggy code would probably be spread over several functions so that the bug will not be clearly visible. I think we can try to combat the above by requiring that the pointer returned by the rawSpace() call is used immediately. Yes. It is the caller's risk and responsibility not to mess with the raw storage, and it should be avoided whenever possible. Several best practices are going to be helpful avoid issues, but they can't be technically enforced, only documented and reviewed. Given this, I expect that the common case is going to be 2. This is why I like Amos's suggestion more (two methods for the append-friendly optimization, and one for accessing the internals). -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
On 07/27/2013 12:00 PM, Kinkie wrote: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. I like Amos's suggestion more (two methods for the append-friendly optimization, and one for accessing the internals). There should be no need for two methods for append-friendly optimization because append only cares about the space size, not total size. This is method #2 in the above list. Methods #1a and #1b ensure exclusive ownership. They do not return a pointer. They must not contain that append-friendly optimization! AFAICT, Amos explanation of what I was getting at is correct. His email did not discuss the optimization trick which is one more reason why we want rawSpace(), but I think we both suggest the same set of three methods: // 1a. void reserveCapacity(size_type minCap) { cow(minCap); } // 1b. void reserveSpace(size_type minSpace) { // check that the sum below does not exceed size_type Must(size() = size_type's limit - minSpace); reserveCapacity(size() + minSpace); } // 2. char *rawSpace(size_type minSpace) { ... your optimization goes here ... return pointer to space; } Needless to say, it is critical to document these correctly and clearly so that we do not have to come back to this discussion again. HTH, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/27/2013 12:00 PM, Kinkie wrote: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. I like Amos's suggestion more (two methods for the append-friendly optimization, and one for accessing the internals). There should be no need for two methods for append-friendly optimization because append only cares about the space size, not total size. This is method #2 in the above list. Methods #1a and #1b ensure exclusive ownership. They do not return a pointer. They must not contain that append-friendly optimization! AFAICT, Amos explanation of what I was getting at is correct. His email did not discuss the optimization trick which is one more reason why we want rawSpace(), but I think we both suggest the same set of three methods: // 1a. void reserveCapacity(size_type minCap) { cow(minCap); } // 1b. void reserveSpace(size_type minSpace) { // check that the sum below does not exceed size_type Must(size() = size_type's limit - minSpace); reserveCapacity(size() + minSpace); } // 2. char *rawSpace(size_type minSpace) { ... your optimization goes here ... return pointer to space; } Needless to say, it is critical to document these correctly and clearly so that we do not have to come back to this discussion again. I don't understand how this helps with the append() use-case. The most likely useage pattern would be 2, and that forces a potentially unneeded cow().. IMO the three methods would be: void reserveCapacity(size_type minCap) { if (needed()) cow(); } void reserveSpace(size_type minSpace) { reserveCapacity(minSpace+length()); } char *rawSpace(size_type minSpace) { cow(minSpace+length()); return *freespace; } -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
On 07/27/2013 01:03 PM, Kinkie wrote: On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote: On 07/27/2013 12:00 PM, Kinkie wrote: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. // 1a. void reserveCapacity(size_type minCap) { cow(minCap); } // 1b. void reserveSpace(size_type minSpace) { // check that the sum below does not exceed size_type Must(size() = size_type's limit - minSpace); reserveCapacity(size() + minSpace); } // 2. char *rawSpace(size_type minSpace) { ... your optimization goes here ... return pointer to space; } I don't understand how this helps with the append() use-case. append() should call rawSpace(). rawSpace() is optimized. The most likely useage pattern would be 2, and that forces a potentially unneeded cow().. rawSpace() does not force that because your optimization should be placed there. IMO the three methods would be: void reserveCapacity(size_type minCap) { if (needed()) cow(); } I assume that cow() already implements the if needed part. void reserveSpace(size_type minSpace) { reserveCapacity(minSpace+length()); } Yes, plus the size_type overflow check. char *rawSpace(size_type minSpace) { cow(minSpace+length()); return *freespace; } No, this needs your no-cow optimization instead. That optimization is currently in SBuf::reserveSpace() but should be in rawSpace(). HTH, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On Sat, Jul 27, 2013 at 10:32 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/27/2013 01:03 PM, Kinkie wrote: On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote: On 07/27/2013 12:00 PM, Kinkie wrote: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. // 1a. void reserveCapacity(size_type minCap) { cow(minCap); } // 1b. void reserveSpace(size_type minSpace) { // check that the sum below does not exceed size_type Must(size() = size_type's limit - minSpace); reserveCapacity(size() + minSpace); } // 2. char *rawSpace(size_type minSpace) { ... your optimization goes here ... return pointer to space; } I don't understand how this helps with the append() use-case. append() should call rawSpace(). rawSpace() is optimized. Current call chain (and functions for this purpose, outlined): append (various flavors) calls lowAppend which calls reserveSpace which checks if we're at the tail of the used portion of the store (via store-canAppend()) reAlloc() if needed // reAlloc is the low-level, unconditional part of cow() (back to lowAppend): tell the MemBlob to append the user-supplied buffer and adjust internal data (typing as I think) using rawSpace as you suggest would shuffle the responsibility of copying data into the MemBlob out of MemBlob itself - a layering breakage IMO cow() has a different contract from reserveSpace: it doesn't check if the MemBlob is append-able. It promises to reAlloc unless 1. there is one sole handle to the MemBlob and 2. there is enough space in the Blob for whatever the user requests to add. I'm not saying that the same results can't be achieved with the approach you suggest: it's just designed differently, IMO simply with a stronger layering and different contracts. The most likely useage pattern would be 2, and that forces a potentially unneeded cow().. rawSpace() does not force that because your optimization should be placed there. Yes. IMO the three methods would be: void reserveCapacity(size_type minCap) { if (needed()) cow(); } I assume that cow() already implements the if needed part. It makes a different promise, see above. reserveCapacity(minSpace+length()); } Yes, plus the size_type overflow check. Sure. char *rawSpace(size_type minSpace) { cow(minSpace+length()); return *freespace; } No, this needs your no-cow optimization instead. That optimization is currently in SBuf::reserveSpace() but should be in rawSpace(). I now get it. I fail to see how this is better than what's currently done, however :( -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
On 07/27/2013 02:57 PM, Kinkie wrote: On Sat, Jul 27, 2013 at 10:32 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/27/2013 01:03 PM, Kinkie wrote: On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote: On 07/27/2013 12:00 PM, Kinkie wrote: 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership. 1b. Reserve buffer space. Ensure exclusive buffer ownership. 2. Reserve N space bytes for the caller to append to. No guarantees regarding buffer ownership are provided. // 1a. void reserveCapacity(size_type minCap) { cow(minCap); } // 1b. void reserveSpace(size_type minSpace) { // check that the sum below does not exceed size_type Must(size() = size_type's limit - minSpace); reserveCapacity(size() + minSpace); } // 2. char *rawSpace(size_type minSpace) { ... your optimization goes here ... return pointer to space; } Current call chain (and functions for this purpose, outlined): append (various flavors) calls lowAppend which calls reserveSpace which checks if we're at the tail of the used portion of the store (via store-canAppend()) reAlloc() if needed // reAlloc is the low-level, unconditional part of cow() (back to lowAppend): tell the MemBlob to append the user-supplied buffer and adjust internal data Yes, I know. The current code is inconsistent/confusing because reserveSpace does not guarantee exclusive buffer ownership but reserveCapacity does. In the fixed design, that inconsistency is removed. (typing as I think) using rawSpace as you suggest would shuffle the responsibility of copying data into the MemBlob out of MemBlob itself - a layering breakage IMO The change is essentially moving one method body into another. It does not introduce any new layer breakage AFAICT. Cow() continues to copy data if needed, as it does now, via reAlloc(). cow() has a different contract from reserveSpace: Not sure why you are discussing reserveSpace(). It is just a convenience wrapper for reserveCapacity(). it doesn't check if the MemBlob is append-able. I do not know what appendable buffer means exactly in this context, but reserveCapacity() is not specific to append in the fixed design. It is about taking control over the buffer and ensuring its minimal size. I believe that is what your current cow() does as well. It promises to reAlloc unless 1. there is one sole handle to the MemBlob and 2. there is enough space in the Blob for whatever the user requests to add. Exactly. This is what reserveCapacity() needs. I'm not saying that the same results can't be achieved with the approach you suggest: it's just designed differently, IMO simply with a stronger layering and different contracts. The current design is inconsistent (similar-sounding methods behave very differently) and more dangerous (reserveSpace() does not return volatile raw space to the caller, increasing the changes that the caller will do something bad before extracting that [now stale] space from the SBuf). The proposed design solves both problems. I now get it. I fail to see how this is better than what's currently done, however :( If there is no difference to you, please change it because there is a significant difference to me (which I am apparently failing to convey). Thank you, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On 07/26/2013 03:27 PM, Kinkie wrote: Resending due to MUA insisting on using HTML email.. On Thu, Jul 18, 2013 at 1:06 AM, Alex Rousskov rouss...@measurement-factory.com wrote: On 07/04/2013 11:51 PM, Kinkie wrote: void SBuf::reserveCapacity(size_type minCapacity) { Must(0 = minCapacity minCapacity = maxSize); reserveSpace(minCapacity-length()); } This does not look right. For example, if minCapacity is smaller than length(), we should do nothing, but will throw instead. I think this method should look something like this: Must(0 = minCapacity minCapacity = maxSize); cow(minCapacity); if (store_-canAppend(off_+len_, minSpace)) { debugs(24, 7, not growing); return; } AFAICT, reserveSpace() should be implemented using something like this: Must(0 = minSpace minSpace = maxSize - length()); reserveCapacity(length() + minSpace); The comment is badly worded but the code is correct: I am not sure which code you are referring to here, but I do not think it is correct for SBuf::reserveCapacity(minCapacity) to throw when minCapacity is smaller than length(). Decoupled the methods, but the other way around: reserveCapacity(totalCapacity) ensures unique ownership, reserveSpace(capacityToAdd) is an optimization to prime the SBuf to receive data. The upper bound in that Must() clause is actually redundant, as in case of a needed reallocation reserveSpace() - cow() - reAlloc() which throws a SBufTooBigException if size is too big. As a consequence, I'm removing the upper bound check. Once we change size_type to be unsigned, the lower bound check will be redundant too. A decision needs to be made if these two methods have different purposes or are to be simply different expressions of the same underlying concept (as it is now). Sorry, this question is too abstract for me to answer it -- it feels like both statements are true: The methods have different purpose but are manipulating the same underlying concept of available buffer area. One method deals with the total size of the buffer area, while the other is concerned with the size of the yet unused buffer area. As they are (were) one of the two is redundant, an in fact it was two lines of code. With the new semantics, reserveSpace is meant to announce to the SBuf I'm planning to append lots of data, please be prepared, while reserveCapacity means I'm going to meddle with your internal storage, and I'm going to need this space. Get ready. I do not see the difference because lots of data is going to be appended to your internal storage so both are meddling with that storage. To me, the difference was simply in argument math: Sometimes it is easier to specify the total capacity, sometimes the available space. The way the two functions are implemented now the difference is more than just math. One does not guarantee exclusive ownership and the other one does. Your semantics description above does not seem to convey that key difference. Here is the use case I am thinking about: sbuf.reserveSpace(ioSize); bytesRead = read(sbuf.rawSpace(), ioSize); sbuf.forceSize(sbuf.size() + bytesRead); The above case is already handled by rawSpace(), but now I am confused why rawSpace() is implemented using unconditional cow() while reserveSpace() appears to be optimizing something. That optimization seems to be the key here. Perhaps rawSpace() should be deleted and reserveCapacity() adjusted to use the same optimization? reserveSpace(n) ought to be reserveCapacity(content size + n) To summarize, 1) We need to agree whether the optimization currently residing in reserveSpace() should apply to reserveCapacity(). This is the key difference between the two methods. Documentation and code should be adjusted accordingly. 2) We probably should not have both reserveSpace() and rawSpace(). HTH, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On 07/04/2013 11:51 PM, Kinkie wrote: void SBuf::reserveCapacity(size_type minCapacity) { Must(0 = minCapacity minCapacity = maxSize); reserveSpace(minCapacity-length()); } This does not look right. For example, if minCapacity is smaller than length(), we should do nothing, but will throw instead. I think this method should look something like this: Must(0 = minCapacity minCapacity = maxSize); cow(minCapacity); if (store_-canAppend(off_+len_, minSpace)) { debugs(24, 7, not growing); return; } AFAICT, reserveSpace() should be implemented using something like this: Must(0 = minSpace minSpace = maxSize - length()); reserveCapacity(length() + minSpace); The comment is badly worded but the code is correct: I am not sure which code you are referring to here, but I do not think it is correct for SBuf::reserveCapacity(minCapacity) to throw when minCapacity is smaller than length(). A decision needs to be made if these two methods have different purposes or are to be simply different expressions of the same underlying concept (as it is now). Sorry, this question is too abstract for me to answer it -- it feels like both statements are true: The methods have different purpose but are manipulating the same underlying concept of available buffer area. One method deals with the total size of the buffer area, while the other is concerned with the size of the yet unused buffer area. IMO, thinking about it, it does make sense that reserveSpace also guarantees that after the call the MemBlob will be exclusively owned by this SBuf (aka, blindly cow()), while reserveCapacity can mostly be meant as an optimization, to hint the SBuf memory management system that an append operation of a known size is about to happen, so that only one cow is performed. What do you think? I think that both methods should be adjusted, possibly as sketched above. They should not throw when the required space or capacity is available. The documentation should tell users that both reserve*() methods ensure single-ownership of the buffer by the caller. HTH, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On 9/07/2013 8:16 p.m., Kinkie wrote: Also, I suggest splitting this into two methods, one with a required (first?) SBufCaseSensitive parameter and one without it. This will allow callers to specify n without specifying isCaseSensitive and vice versa. The shorter, inlined method will simply call the longer one with caseSensitive, of course, so the implementation will not require more code or performance overhead. Ok. I'm calling the two shorthand versions cmp and casecmp respectively (please let me know if you'd prefer the naming-convention compliant caseCmp instead) Please. We need semantic similarity to std::string not symbolic. I take this as a please yes. Correct. Sorry for not being clear. Code is as before at lp:~squid/squid/stringng-cherrypick, ready and in sync with trunk. test-suite runs OK on eu and ubuntu raring. I'll be on holiday in a couple of days for a couple of weeks. In the meantime, I've found quite a few issues with \0-cleanliness and parameter auto-promotion; I've also noticed that I've messed up the commits crossing things with an older branch (where things were silently failing as it didn't contain Alex' extensive unit tests)., I'll have to tidy things up, current status is not done :( How long is that going to take do you think? Anything I can do to it while you are on holiday? Amos
Re: [RFC] Time to talk about StringNG merge again?
That comment is probably stale because canAppend() is currently not related to ownership of the buffer: It will happily return true for shared buffers. AFAICT, reserveSpace() should be implemented using something like this: Must(0 = minSpace minSpace = maxSize - length()); reserveCapacity(length() + minSpace); The comment is badly worded but the code is correct: canAppend() tells us if the end of our buffer is at the end of the used portion of the MemBlob, and there is enough space in the unused MemBlob portion to contain the data we plan to use. If the buffer is shared, this guarantee may become stale at the next append operation to any SBuf sharing the same MemBlob. A decision needs to be made if these two methods have different purposes or are to be simply different expressions of the same underlying concept (as it is now). IMO, thinking about it, it does make sense that reserveSpace also guarantees that after the call the MemBlob will be exclusively owned by this SBuf (aka, blindly cow()), while reserveCapacity can mostly be meant as an optimization, to hint the SBuf memory management system that an append operation of a known size is about to happen, so that only one cow is performed. What do you think? Sounds good. The above two changes imply that reserve*() methods ensure single-ownership of the buffer by the caller. This is not something documentation of those methods currently guarantees, but I think it is logical to assume that anybody calling one of these two methods wants to modify the string and, hence, will need exclusive control over that string. I cannot think of any reason to call those methods otherwise. See above. typedef int32_t size_type; Would it be better to make this unsigned? I do not think string sizes can go negative. std::string is using an unsigned type, right? FWIW, whenever I try to use MemBlob that has a signed size_type as well, I am forced to do casts that would not exist otherwise. I think both types should be changed to reflect what they are representing (a size of a buffer). I have the same doubt. I'm just a bit afraid to change this late. Will try to do as an isolated change. So far this is NOT done. On another note, would it be better to use MemBlob::size_type instead of int32_t? currently they're both int32_t. I believe it's more readable with the extra level of indirection, at the cost of having to manually keep them in sync. Added a comment to highlight this fact. So long as SBuf is backed by MemBlob defining SBuf::size_type to be MemBlob::size_type makes more sense. One who needs to know can always look up both (probably should). The compiler will tell about signedness errors in new/modified code very quickly. Ok, changed to reference MemBlob::size_type instead. The problem will be converting from signed to unsigned at this stage. I think get the rest to a milestone we can agree on merging before attempting please. Then if everthing goes sour in the change we can still revert and merge with a TODO about perfecting the polish. When you do the change ensure you have a latest GCC or clang you can use, they will be far more pedantic about signedness comparisons and iterator ranges etc where the issues are most likey to occur. Ok. Most of my development is done on Ubuntu Raring, that should be fresh enough. Also, I suggest splitting this into two methods, one with a required (first?) SBufCaseSensitive parameter and one without it. This will allow callers to specify n without specifying isCaseSensitive and vice versa. The shorter, inlined method will simply call the longer one with caseSensitive, of course, so the implementation will not require more code or performance overhead. Ok. I'm calling the two shorthand versions cmp and casecmp respectively (please let me know if you'd prefer the naming-convention compliant caseCmp instead) Please. We need semantic similarity to std::string not symbolic. I take this as a please yes. Code is as before at lp:~squid/squid/stringng-cherrypick, ready and in sync with trunk. test-suite runs OK on eu and ubuntu raring. I'll be on holiday in a couple of days for a couple of weeks. In the meantime, I've found quite a few issues with \0-cleanliness and parameter auto-promotion; I've also noticed that I've messed up the commits crossing things with an older branch (where things were silently failing as it didn't contain Alex' extensive unit tests)., I'll have to tidy things up, current status is not done :( -- /kinkie
Re: [RFC] Time to talk about StringNG merge again?
Snipped out the Done bits. On 5/07/2013 5:51 p.m., Kinkie wrote: On Tue, Apr 16, 2013 at 12:49 AM, Alex Rousskov rouss...@measurement-factory.com wrote: Hi Kinkie, Here is my review of https://code.launchpad.net/~squid/squid/stringng-cherrypick r12753. I found approximately three bugs. I am also concerned about size_type type, after starting to use MemBlob which has the same problem. The rest is polishing. Ok, thanks. I have some doubts about size_type as well, I am a bit worried about touching it at this stage in development, but it may be better now than never. if (InitialStore == NULL) { static char lowPrototype[] = ; InitialStore = new MemBlob(lowPrototype, 0); } If possible, please use new MemBlob(0) and remove lowPrototype. void SBuf::reserveCapacity(size_type minCapacity) { Must(0 = minCapacity minCapacity = maxSize); reserveSpace(minCapacity-length()); } This does not look right. For example, if minCapacity is smaller than length(), we should do nothing, but will throw instead. I think this method should look something like this: Must(0 = minCapacity minCapacity = maxSize); cow(minCapacity); // we're not concerned about RefCounts here, // the store knows the last-used portion. If // it's available, we're effectively claiming ownership // of it. If it's not, we need to go away (realloc) if (store_-canAppend(off_+len_, minSpace)) { debugs(24, 7, not growing); return; } That comment is probably stale because canAppend() is currently not related to ownership of the buffer: It will happily return true for shared buffers. AFAICT, reserveSpace() should be implemented using something like this: Must(0 = minSpace minSpace = maxSize - length()); reserveCapacity(length() + minSpace); The comment is badly worded but the code is correct: canAppend() tells us if the end of our buffer is at the end of the used portion of the MemBlob, and there is enough space in the unused MemBlob portion to contain the data we plan to use. If the buffer is shared, this guarantee may become stale at the next append operation to any SBuf sharing the same MemBlob. A decision needs to be made if these two methods have different purposes or are to be simply different expressions of the same underlying concept (as it is now). IMO, thinking about it, it does make sense that reserveSpace also guarantees that after the call the MemBlob will be exclusively owned by this SBuf (aka, blindly cow()), while reserveCapacity can mostly be meant as an optimization, to hint the SBuf memory management system that an append operation of a known size is about to happen, so that only one cow is performed. What do you think? Sounds good. The above two changes imply that reserve*() methods ensure single-ownership of the buffer by the caller. This is not something documentation of those methods currently guarantees, but I think it is logical to assume that anybody calling one of these two methods wants to modify the string and, hence, will need exclusive control over that string. I cannot think of any reason to call those methods otherwise. See above. typedef int32_t size_type; Would it be better to make this unsigned? I do not think string sizes can go negative. std::string is using an unsigned type, right? FWIW, whenever I try to use MemBlob that has a signed size_type as well, I am forced to do casts that would not exist otherwise. I think both types should be changed to reflect what they are representing (a size of a buffer). I have the same doubt. I'm just a bit afraid to change this late. Will try to do as an isolated change. So far this is NOT done. On another note, would it be better to use MemBlob::size_type instead of int32_t? currently they're both int32_t. I believe it's more readable with the extra level of indirection, at the cost of having to manually keep them in sync. Added a comment to highlight this fact. So long as SBuf is backed by MemBlob defining SBuf::size_type to be MemBlob::size_type makes more sense. One who needs to know can always look up both (probably should). The compiler will tell about signedness errors in new/modified code very quickly. The problem will be converting from signed to unsigned at this stage. I think get the rest to a milestone we can agree on merging before attempting please. Then if everthing goes sour in the change we can still revert and merge with a TODO about perfecting the polish. When you do the change ensure you have a latest GCC or clang you can use, they will be far more pedantic about signedness comparisons and iterator ranges etc where the issues are most likey to occur. SBuf::append(const std::string str, SBuf::size_type pos, SBuf::size_type n) { return append(str.data(), pos, n); //bounds checked in append() } This will break when n is npos because str.data() is not 0-terminated. The low-lever c-string append you call
Re: [RFC] Time to talk about StringNG merge again?
On Tue, Apr 16, 2013 at 12:49 AM, Alex Rousskov rouss...@measurement-factory.com wrote: Hi Kinkie, Here is my review of https://code.launchpad.net/~squid/squid/stringng-cherrypick r12753. I found approximately three bugs. I am also concerned about size_type type, after starting to use MemBlob which has the same problem. The rest is polishing. Ok, thanks. I have some doubts about size_type as well, I am a bit worried about touching it at this stage in development, but it may be better now than never. fgrep npos SBuf.cc ... if (pos == npos || pos length() || n == 0) { if (n == npos || (pos+n) length()) if (startPos == SBuf::npos) return SBuf::npos; For consistency sake inside SBUf methods, please use either npos (preferred because it is shorter?) or SBuf::npos. ok, npos it is. if (InitialStore == NULL) { static char lowPrototype[] = ; InitialStore = new MemBlob(lowPrototype, 0); } If possible, please use new MemBlob(0) and remove lowPrototype. static MemBlob::Pointer InitialStore = NULL; if (InitialStore == NULL) { static char lowPrototype[] = ; InitialStore = new MemBlob(lowPrototype, 0); } I believe the above can be written simply as static MemBlob::Pointer InitialStore = new MemBlob(0); Yes, it can. Done. /** Request to resize the SBuf's store * * After this method is called, the SBuf is guaranteed to have at least * minCapacity bytes of total space, including the currently-used portion s/total space/total buffer size/ we use the term space to denote free/usable space. See reserveSpace(), for example. Ok. void SBuf::reserveCapacity(size_type minCapacity) { Must(0 = minCapacity minCapacity = maxSize); reserveSpace(minCapacity-length()); } This does not look right. For example, if minCapacity is smaller than length(), we should do nothing, but will throw instead. I think this method should look something like this: Must(0 = minCapacity minCapacity = maxSize); cow(minCapacity); // we're not concerned about RefCounts here, // the store knows the last-used portion. If // it's available, we're effectively claiming ownership // of it. If it's not, we need to go away (realloc) if (store_-canAppend(off_+len_, minSpace)) { debugs(24, 7, not growing); return; } That comment is probably stale because canAppend() is currently not related to ownership of the buffer: It will happily return true for shared buffers. AFAICT, reserveSpace() should be implemented using something like this: Must(0 = minSpace minSpace = maxSize - length()); reserveCapacity(length() + minSpace); The comment is badly worded but the code is correct: canAppend() tells us if the end of our buffer is at the end of the used portion of the MemBlob, and there is enough space in the unused MemBlob portion to contain the data we plan to use. If the buffer is shared, this guarantee may become stale at the next append operation to any SBuf sharing the same MemBlob. A decision needs to be made if these two methods have different purposes or are to be simply different expressions of the same underlying concept (as it is now). IMO, thinking about it, it does make sense that reserveSpace also guarantees that after the call the MemBlob will be exclusively owned by this SBuf (aka, blindly cow()), while reserveCapacity can mostly be meant as an optimization, to hint the SBuf memory management system that an append operation of a known size is about to happen, so that only one cow is performed. What do you think? The above two changes imply that reserve*() methods ensure single-ownership of the buffer by the caller. This is not something documentation of those methods currently guarantees, but I think it is logical to assume that anybody calling one of these two methods wants to modify the string and, hence, will need exclusive control over that string. I cannot think of any reason to call those methods otherwise. See above. typedef int32_t size_type; Would it be better to make this unsigned? I do not think string sizes can go negative. std::string is using an unsigned type, right? FWIW, whenever I try to use MemBlob that has a signed size_type as well, I am forced to do casts that would not exist otherwise. I think both types should be changed to reflect what they are representing (a size of a buffer). I have the same doubt. I'm just a bit afraid to change this late. Will try to do as an isolated change. So far this is NOT done. On another note, would it be better to use MemBlob::size_type instead of int32_t? currently they're both int32_t. I believe it's more readable with the extra level of indirection, at the cost of having to manually keep them in sync. Added a comment to highlight this fact. u_int64_t alloc; ///number of calls to SBuf constructors u_int64_t allocCopy;
Re: [RFC] Time to talk about StringNG merge again?
Hi Kinkie, Here is my review of https://code.launchpad.net/~squid/squid/stringng-cherrypick r12753. I found approximately three bugs. I am also concerned about size_type type, after starting to use MemBlob which has the same problem. The rest is polishing. fgrep npos SBuf.cc ... if (pos == npos || pos length() || n == 0) { if (n == npos || (pos+n) length()) if (startPos == SBuf::npos) return SBuf::npos; For consistency sake inside SBUf methods, please use either npos (preferred because it is shorter?) or SBuf::npos. if (InitialStore == NULL) { static char lowPrototype[] = ; InitialStore = new MemBlob(lowPrototype, 0); } If possible, please use new MemBlob(0) and remove lowPrototype. static MemBlob::Pointer InitialStore = NULL; if (InitialStore == NULL) { static char lowPrototype[] = ; InitialStore = new MemBlob(lowPrototype, 0); } I believe the above can be written simply as static MemBlob::Pointer InitialStore = new MemBlob(0); /** Request to resize the SBuf's store * * After this method is called, the SBuf is guaranteed to have at least * minCapacity bytes of total space, including the currently-used portion s/total space/total buffer size/ we use the term space to denote free/usable space. See reserveSpace(), for example. void SBuf::reserveCapacity(size_type minCapacity) { Must(0 = minCapacity minCapacity = maxSize); reserveSpace(minCapacity-length()); } This does not look right. For example, if minCapacity is smaller than length(), we should do nothing, but will throw instead. I think this method should look something like this: Must(0 = minCapacity minCapacity = maxSize); cow(minCapacity); // we're not concerned about RefCounts here, // the store knows the last-used portion. If // it's available, we're effectively claiming ownership // of it. If it's not, we need to go away (realloc) if (store_-canAppend(off_+len_, minSpace)) { debugs(24, 7, not growing); return; } That comment is probably stale because canAppend() is currently not related to ownership of the buffer: It will happily return true for shared buffers. AFAICT, reserveSpace() should be implemented using something like this: Must(0 = minSpace minSpace = maxSize - length()); reserveCapacity(length() + minSpace); The above two changes imply that reserve*() methods ensure single-ownership of the buffer by the caller. This is not something documentation of those methods currently guarantees, but I think it is logical to assume that anybody calling one of these two methods wants to modify the string and, hence, will need exclusive control over that string. I cannot think of any reason to call those methods otherwise. typedef int32_t size_type; Would it be better to make this unsigned? I do not think string sizes can go negative. std::string is using an unsigned type, right? FWIW, whenever I try to use MemBlob that has a signed size_type as well, I am forced to do casts that would not exist otherwise. I think both types should be changed to reflect what they are representing (a size of a buffer). On another note, would it be better to use MemBlob::size_type instead of int32_t? u_int64_t alloc; ///number of calls to SBuf constructors u_int64_t allocCopy; ///number of calls to SBuf copy-constructor u_int64_t allocFromString; ///number of copy-allocations from Strings u_int64_t allocFromCString; ///number of copy-allocations from c-strings u_int64_t assignFast; ///number of no-copy assignment operations For consistency sake, please use uint64_t like the rest of Squid code does. /** Append operation for std::string * * Append the supplied std::string to the SBuf; extend storage as needed. * * \param string the std::string to be copied. * \param pos how many bytes to skip at the beginning of the c-string * \param n how many bytes to import into the SBuf. If it is SBuf::npos * or unspecified, imports to end-of-cstring */ SBuf append(const std::string str, size_type pos = 0, size_type n = npos); Please adjust the docs to talk about std::string, not c-string. str is not 0-terminated and its end is determined by str.size(). SBuf::append(const std::string str, SBuf::size_type pos, SBuf::size_type n) { return append(str.data(), pos, n); //bounds checked in append() } This will break when n is npos because str.data() is not 0-terminated. The low-lever c-string append you call here assumes that the string parameter is 0-terminated if n is npos. Please add the corresponding missing test case. //we can assume that we'll need to append at least strlen(fmt) bytes, reserveSpace(strlen(fmt)*2); The comment does not match the code which assumes we will need to append at least twice the format length (a
Re: [RFC] Time to talk about StringNG merge again?
-- This is not a review, but I wanted to clarify or emphasize some of the points in Amos' review. Thank you Amos for working on this! On 03/30/2013 02:32 AM, Amos Jeffries wrote: in src/SBuf.cc: * SBuf::chop() - the TODO optimization is not possible. We cannot be completely sure we are teh only user of the I think it is possible, but the MemBlob cooperation would be required to ensure correctness. * SBuf::operator []() documentation is incorrect. It *does* check access bounds, but returns \0 for any bytes outside instead of throwing exception. The documentation is correct. The implementation is no longer correct. Please revert to the recent code version that did none of the above. The [] operator is meant for performance-sensitive loops and should not do any checks. Even the stats collection should be removed from it IMO. We provide at() that trades speed for safety. std::string does the same. in src/SBuf.cci BTW, IIRC, we discussed moving all .cci stuff into .h. Or was that a different project? * Sbuf::bufEnd() is lacking any checks about whether the space returned is safe to use, no cow() or whatever. bufEnd() is a private convenience method. It is unsafe and non-cowing by design. The callers should check and cow, as needed. If the documentation does not reflect that, it should be fixed. Thank you, Alex.
Re: [RFC] Time to talk about StringNG merge again?
Hi, I forgot to mention another couple of changes: - I added an explicit copy-constructor to import std::string - I relaxed the behavior of the chop() method for out-of-bounds cases: instead of throwing, it'll now ignore the part of the specified limits that is out of bounds. On Mon, Apr 1, 2013 at 7:43 PM, Kinkie gkin...@gmail.com wrote: My audit, I'm sure Alex will have more. +1 for the src/MemBlob.cc and src/MemBlob.h operator addition to go into trunk immediately. The rest has a lot of polishing still needed. Ok, I'll wait for Alex' review before merging. Design Question: with all the stats being collected is it worth collecting an assign-to-self counter? to see if there are any obvious logic bugs in usage which we could avoid? I tried to measure most operations clusters, to measure the effectiveness of the shared-storage approach during the transition phase. Once we have a clearer picture, we can gradually remove the meaningless statistics. in src/SBuf.cc: * can we consider making it a guideline to place constructor lists with the colon after the constructor definition and the initializer list one to a line following, with indentation? It makes it so much clearer to see what member is being initialized when there is any kind of complex parameters for it. NP: astyle enforces only the indentation. ie: +SBuf::SBuf() : +store_(GetStorePrototype()), +off_(0), +len_(0) +{ It uses lots of vertical space, and unintialized members can get caught by Coverity anyway. This said, I have no strong opinion. ** along with this can we ensure that the constructor arguments are all on one line. If it breaks the 80-character line style preference we need to reconsider what its doing. - case in point there being the constructor for OutOfBoundsException. Well, OutOfBoundsException is 19 chars long; add the class name and half the vertical screen estate has been used. Again, no strong opinion. I don't think we should put constraints on length of class names etc. Besides, implementation classes are not really meant to be read often are they? * SBuf::vappendf() - comment //times 1.2 for instance... does nt match the reserveSpace() call it is about. The second line of comment can be erased. Done. - according to the comments about Linux vsnprintf() return of -1 is guaranteed to violate the Must() condition when reserveSpace(-1*2) is called. The if-statement sz0 condition ensures it. Changed the two cases, now throws TextException in case of output error - I'd assume it's unrecoverable. - please avoid C-style casting in new code whenever possible. May occur elsewhere also. Ok. - the line +if (operator[](len_-1) == 0) { should be comparing against '\0' instead of 0 to ensure byte/word alignment changes by the compiler don't involve magic type castings. Ok, done. * SBuf::compare() please move the stats.compareSlow counting up top. Just under the Must() keeps it in line with the pther stats collections. Ok. * SBuf::startsWith() lacks the length debugs() statement used in operatror==() immediately below it. Added. * SBuf::copy() why not do: const SBuf::size_type toexport = min(n, length()); Done. * SBuf::forceSize() - we need to consider what happens (or already has happened) when a size greater than capacity is forced. Right. What about throwing an OutOfBoundsException? Overflow has likely happened. Not changed yet. * SBuf::chop() - the TODO optimization is not possible. We cannot be completely sure we are teh only user of the Removed the TODO. It's a very unlikely case anyway. * SBuf::rfind() - typo in weridness comment memrhr. Fixed. * Please move operator () to an inline function. Done (via .cci for now. I recall Alex was suggesting to get rid of .cci, will move to .h if I can find confirmation of that). * SBuf::reAlloc() comment needs to be doxygen format. Done. in SBuf.h * Debug.h is not needed here. Please move to SBuf.cci instead. - probably also SquidString.h can also be removed, but check that with a build test. Needed for String import/export functions. Hopefully String will just go in a relatively short time after merging. * several places saying * \note bounds is 0 = pos length() are incorrect. SBuf::checkAccessBounds() tests for: 0 = pos = length() Then checkAccessBounds is wrong. Changed unit test accordingly. * s/minCapcity/minCapacity/ Fixed. * SBuf::operator []() documentation is incorrect. It *does* check access bounds, but returns \0 for any bytes outside instead of throwing exception. Gah! Changed to match documentation. * SBuf::vappendf() documentation lacking empty line separate from earlier method declaration above. Added. * SBuf::chop() - documentation contains HTML tags, Please avoid that. 'i' and 'n' (quoted) is a better written style of emphasis. Fixed. - documentatin of n==0 case lacks the info that SBuf is
Re: [RFC] Time to talk about StringNG merge again?
On 04/01/2013 11:43 AM, Kinkie wrote: Attached is a bundle containing the changes from this review. Will you commit that to your stringng branch on lp? Or should I patch that branch before reviewing/testing instead? Thank you, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On 2/04/2013 6:57 a.m., Alex Rousskov wrote: -- This is not a review, but I wanted to clarify or emphasize some of the points in Amos' review. Thank you Amos for working on this! On 03/30/2013 02:32 AM, Amos Jeffries wrote: in src/SBuf.cc: * SBuf::chop() - the TODO optimization is not possible. We cannot be completely sure we are teh only user of the I think it is possible, but the MemBlob cooperation would be required to ensure correctness. * SBuf::operator []() documentation is incorrect. It *does* check access bounds, but returns \0 for any bytes outside instead of throwing exception. The documentation is correct. The implementation is no longer correct. Please revert to the recent code version that did none of the above. The [] operator is meant for performance-sensitive loops and should not do any checks. Even the stats collection should be removed from it IMO. We provide at() that trades speed for safety. std::string does the same. in src/SBuf.cci BTW, IIRC, we discussed moving all .cci stuff into .h. Or was that a different project? We had. However since it was not done yet I reviewed in-place. Kinkie; I have a little time in the next few hours to try my hand at this as a test of the shared-access branch. * Sbuf::bufEnd() is lacking any checks about whether the space returned is safe to use, no cow() or whatever. bufEnd() is a private convenience method. It is unsafe and non-cowing by design. The callers should check and cow, as needed. If the documentation does not reflect that, it should be fixed. Thank you, Alex.
Re: [RFC] Time to talk about StringNG merge again?
On 30/03/2013 7:05 a.m., Kinkie wrote: Hi, it was easier than expected. Here's a merge bundle; it intentionally forgets commit history, it can remain in the branch. This bundle passes a full testsuite build on my kubuntu quantal. It contains only what was audited so far: - SBuf - SBufExceptions And I think it was audited, but I'm not sure: - SBufStream Left out but ready for audit are: - cachemgr integration - tokenizer Should be ready but I'd like to doublecheck - SBufList (intended to replace StrList) - SBufUtils Thanks! My audit, I'm sure Alex will have more. +1 for the src/MemBlob.cc and src/MemBlob.h operator addition to go into trunk immediately. The rest has a lot of polishing still needed. Design Question: with all the stats being collected is it worth collecting an assign-to-self counter? to see if there are any obvious logic bugs in usage which we could avoid? in src/SBuf.cc: * can we consider making it a guideline to place constructor lists with the colon after the constructor definition and the initializer list one to a line following, with indentation? It makes it so much clearer to see what member is being initialized when there is any kind of complex parameters for it. NP: astyle enforces only the indentation. ie: +SBuf::SBuf() : +store_(GetStorePrototype()), +off_(0), +len_(0) +{ ** along with this can we ensure that the constructor arguments are all on one line. If it breaks the 80-character line style preference we need to reconsider what its doing. - case in point there being the constructor for OutOfBoundsException. * SBuf::vappendf() - comment //times 1.2 for instance... does nt match the reserveSpace() call it is about. The second line of comment can be erased. - according to the comments about Linux vsnprintf() return of -1 is guaranteed to violate the Must() condition when reserveSpace(-1*2) is called. The if-statement sz0 condition ensures it. - please avoid C-style casting in new code whenever possible. May occur elsewhere also. - the line +if (operator[](len_-1) == 0) { should be comparing against '\0' instead of 0 to ensure byte/word alignment changes by the compiler don't involve magic type castings. * SBuf::compare() please move the stats.compareSlow counting up top. Just under the Must() keeps it in line with the pther stats collections. * SBuf::startsWith() lacks the length debugs() statement used in operatror==() immediately below it. * SBuf::copy() why not do: const SBuf::size_type toexport = min(n, length()); * SBuf::forceSize() - we need to consider what happens (or already has happened) when a size greater than capacity is forced. * SBuf::chop() - the TODO optimization is not possible. We cannot be completely sure we are teh only user of the * SBuf::rfind() - typo in weridness comment memrhr. * Please move operator () to an inline function. * SBuf::reAlloc() comment needs to be doxygen format. in SBuf.h * Debug.h is not needed here. Please move to SBuf.cci instead. - probably also SquidString.h can also be removed, but check that with a build test. * several places saying * \note bounds is 0 = pos length() are incorrect. SBuf::checkAccessBounds() tests for: 0 = pos = length() * s/minCapcity/minCapacity/ * SBuf::operator []() documentation is incorrect. It *does* check access bounds, but returns \0 for any bytes outside instead of throwing exception. * SBuf::vappendf() documentation lacking empty line separate from earlier method declaration above. * SBuf::chop() - documentation contains HTML tags, Please avoid that. 'i' and 'n' (quoted) is a better written style of emphasis. - documentatin of n==0 case lacks the info that SBuf is cleared, like the poslength() case does. * SBuf::rawContent() does not call cow(), so please call it a read-only pointer in the documentation. * SBuf::print() - one-liner doxygen uses ///. - it could document better what the method does to differentiate itself from dump() other than print which is obvious from the name. * SBuf::operator ==() has a useless doxygen comment. Please make it a normal code comment boolean conditionsls or just remove. We have no style guideline around this but the existing comment is simply useless. * please mark the SBuf::toString() converter as \deprecated. * SBuf::GetStats() doxygen one-liner. same elsewhere ie SBuf::length(). * SBuf::rawSpace() - the line This call forces a cow() duplicates and contradicts the above comments about COW *if necessary*. IMO it should say somethign like guarantees the space is safe to write to. * SBuf::c_str() s/will remain valid if the caller keeps holding/will remain valid only if the caller keeps holding/ * if you are going to make TODO comments doxygen ones, please use \todo tag instead of our developers TODO: * SBuf::ReserveSpace() ... + * least minSpace bytes of append-able backing store (on top of the + * currently-used
Re: [RFC] Time to talk about StringNG merge again?
On 29/03/2013 11:46 p.m., Kinkie wrote: Hi all, it's been a while since the last StringNG merge checkpoint , and several improvements were made in the meantime. I have briefly reviewed the previous requests for changes, I don't think there's any outstanding change requests; the new unit testing gives encouraging results (thanks Alex Amos!). Feature branch is at lp:~squid/squid/stringng Questions were raised about the merge strategy - should we include the Tokenizer and additional stuff? My opinion: in order to minimize the effort, I'd like to merge everything, but marking everything but SBuf as experimental-do-not-touch or #ifdef-d out. It'd mean either a bit of unused shipped code or unused shipped files. Comments? If the other stuff has not been reviewed, or is not able to pass review quickly then NO please. I am happy to spend some time assisting the cherry-picking process for SBuf alone to go in as soon as it passes the audit if time and effort is the worry. Amos
Re: [RFC] Time to talk about StringNG merge again?
On 03/29/2013 04:46 AM, Kinkie wrote: it's been a while since the last StringNG merge checkpoint , and several improvements were made in the meantime. I have briefly reviewed the previous requests for changes, I don't think there's any outstanding change requests; the new unit testing gives encouraging results (thanks Alex Amos!). Feature branch is at lp:~squid/squid/stringng Well, request for changes follow requests for merge. I do not recall recent requests for StringNG merge. Let's treat your email as the latest request for merge. FWIW, I should be able to work on this next week. Questions were raised about the merge strategy - should we include the Tokenizer and additional stuff? My opinion: in order to minimize the effort, I'd like to merge everything, but marking everything but SBuf as experimental-do-not-touch or #ifdef-d out. It'd mean either a bit of unused shipped code or unused shipped files. My opinion on that has not changed (just like the relevant code?): We should not include that code because it is of insufficient quality and has not been reviewed. Removing it just before commit after patching (or merging into) trunk is a minor overhead because nothing uses it. The code may still remain in the StringNG branch, of course. Cheers, Alex.
Re: [RFC] Time to talk about StringNG merge again?
Ok, it's agreed then, I'll cherrypick. File-wise it's not hard. I'm a bit worried about automake , but I'll manage. On Fri, Mar 29, 2013 at 4:14 PM, Alex Rousskov rouss...@measurement-factory.com wrote: On 03/29/2013 04:46 AM, Kinkie wrote: it's been a while since the last StringNG merge checkpoint , and several improvements were made in the meantime. I have briefly reviewed the previous requests for changes, I don't think there's any outstanding change requests; the new unit testing gives encouraging results (thanks Alex Amos!). Feature branch is at lp:~squid/squid/stringng Well, request for changes follow requests for merge. I do not recall recent requests for StringNG merge. Let's treat your email as the latest request for merge. FWIW, I should be able to work on this next week. Questions were raised about the merge strategy - should we include the Tokenizer and additional stuff? My opinion: in order to minimize the effort, I'd like to merge everything, but marking everything but SBuf as experimental-do-not-touch or #ifdef-d out. It'd mean either a bit of unused shipped code or unused shipped files. My opinion on that has not changed (just like the relevant code?): We should not include that code because it is of insufficient quality and has not been reviewed. Removing it just before commit after patching (or merging into) trunk is a minor overhead because nothing uses it. The code may still remain in the StringNG branch, of course. Cheers, Alex. -- /kinkie