Re: [RFC] Time to talk about StringNG merge again?

2013-08-01 Thread Kinkie
Now I see.
I have worked on this too mich - I keep seeing what I believe it is
instead of what it actually is.
Fixed in both rawSpace and reserveSpace.

On Wed, Jul 31, 2013 at 7:12 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 07/31/2013 11:11 AM, Alex Rousskov wrote:
 On 07/31/2013 10:12 AM, Kinkie wrote:
 Not if the math overflowed down to a smaller value before it even got
 passed
 to reserveCapacity().

 Ok. I'm going to check minSpace. maxSize+minSpace is definitely not
 enough to overflow size_type


 minSpace is controlled completely by the unknown caller code. It may be
 UINT_MAX or something equally capable of overflowing when you add to it.


 What is currently done is:

 reserveSpace(minSpace) {
   reserveCapacity(length()+minSpace);
 }


 Kinkie,

 I am afraid you are missing the point here. Consider the case where
 minSpace is the maximum value that size_type can hold and length() is
 100. The above sum overflows _before_ any of your checks in
 reserveCapacity() happen.

 The correct check in reserveSpace() is:

 Must(minSpace = 0  length()  maximum size_type value - minSpace)


 Must(minSpace = 0  length() = maximum size_type value - minSpace)

 Alex.




-- 
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-08-01 Thread Kinkie
 Another small bug in that code is that arguments or even fmt might
 actually end with a valid \0 byte that should be appended and counted
 (not a c-string terminating byte).

 This check should be done once, with controlled fmt and arguments so
 that we can detect problems reliably. You can do that at module
 initialization time, for example (./configure might be too early because
 libraries might change) and set a static SnprintfCountsTerminator
 variable accordingly.

 char buf[16];
 SprintfCountsTerminator = snprintf(buf, sizeof(buf), %s, 1) == 2;

Clever!
Done.


-- 
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-31 Thread Kinkie
On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 07/30/2013 03:56 AM, Kinkie wrote:

 lp:~squid/squid/stringng-cherrypick has the most uptodate code.


 I did not finish the review before running out of time, but I found one
 or two bugs in r12761 (in the code I have not looked at before). There
 are also a few polishing issues:

Thanks.

 typedef int32_t size_type;

 I think we should make it unsigned. I agree with your earlier
 observation that it is better to do that now rather than later or never.

Yes, it is agreed. It's however something I'd prefer to do as a
separate change, after everything else has been defined in order to
have a clear impact.

 void reserveSpace(size_type minSpace) 
 {reserveCapacity(length()+minSpace);}

 As I mentioned before, this lacks overflow control if the sum of
 length() and minSpace exceeds size_type maximum. Please add a check.

reserve* calls cow() which - if a resizing is needed - calls reAlloc
which throws SBufTooBigException if the requested size is too big.
Isn't that good enough?

 cow(minSpace+length());

 Same here, in rawSpace().
 /** Request to extend the SBuf's free store space.
 /** Request to resize the SBuf's store capacity

 The caller does not really request to extend or resize the store because
 they do not normally know what the store size is. I suggest replacing
 extend and resize with guarantee.

Done.

  * This method can also be used to prepare the SBuf by preallocating a
  * predefined amount of free space; this may help to optimize subsequent
  * calls to SBuf::append or similar methods. In this case the returned
  * pointer should be ignored.
 ...
 char *rawSpace(size_type minSize);


 Please remove that part of the comment. You can say See also:
 reserveSpace() if you want.

Done.


 However, I would _add_ something like this:
 Unlike reserveSpace(), this method does not guarantee exclusive
 buffer ownership. It is instead optimized for a one writer
 (appender), many readers scenario by avoiding unnecessary
 copying and allocations.

This is perfect; thanks!


 //reserve twice the format-string size, it's a likely heuristic
 rawSpace(strlen(fmt)*2);

 while (length() = maxSize) {
 ...
 sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap);


 The above does not illustrate how to use rawSpace() well. We do not have
 to provide good examples in lower level code, but I think it is a good
 idea to eat our own dog food. The above code should probably be rewritten as

  // heuristic: we will need twice as much space as the format
  const size_type minSpace = strlen(fmt)*2;

  while (length() = maxSize) {
 ...
  const char *space = rawSpace(minSpace);
  sz = vsnprintf(space, spaceSize(), fmt, ap);

Done (but space must be a char *)

 BTW, please add SBuf::spaceSize() as a public method so that
 optimization like the above will work. Otherwise, non-privileged users
 will have to write

 foobar(buf.rawSpace(minSize), minSize);

 instead of getting an advantage of the actual space size being larger
 than the requested minimum.

Done, but I'm quite wary of this: it implies a knowledge about
internal store positioning which I would prefer to hide from clients.

 if (sz = static_castint(store_-spaceSize()))
 rawSpace(sz*2); // TODO: tune heuristics
 I would use reserveSpace() here instead of ignoring rawSpace() result
 because we know the buffer needs to be reallocated at this point, right?


Hand-unrolled code doesn't ignore that return value anymore (see below)

 /* snprintf on Linux returns -1 on overflows */
 ...
 if (sz = static_castint(store_-spaceSize()))
 rawSpace(sz*2); // TODO: tune heuristics
 else if (sz  0) // output error in vsnprintf
 throw TextException(output error in vsnprintf,__FILE__, 
 __LINE__);

 If snprintf on Linux returns -1 on overflows, we should not throw when
 sz is -1. We should grow the buffer instead, right?

Man snprintf(3) says
If an output error is encountered, a negative value is returned.

I guess the comment is broken.
vsnprintf is standard as of C99; it should be therefore quite
consistent in our build environment.

 while (length() = maxSize) {
 sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap);
 ...
 /* snprintf on FreeBSD returns at least free_space on overflows */

 if (sz = static_castint(store_-spaceSize()))
 rawSpace(sz*2); // TODO: tune heuristics
 else if (sz  0) // output error in vsnprintf
 throw TextException(output error in vsnprintf,__FILE__, 
 __LINE__);
 else
 break;
 }

 Nothing in that loop grows length(), which is your guard against an
 infinite loop on FreeBSD (at least). Can you think of a better loop guard?

This loop will be executed at most twice via break clause, as on the
second go we 

Re: [RFC] Time to talk about StringNG merge again?

2013-07-31 Thread Amos Jeffries

On 1/08/2013 12:18 a.m., Kinkie wrote:

On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:

On 07/30/2013 03:56 AM, Kinkie wrote:


lp:~squid/squid/stringng-cherrypick has the most uptodate code.


I did not finish the review before running out of time, but I found one
or two bugs in r12761 (in the code I have not looked at before). There
are also a few polishing issues:

Thanks.


 typedef int32_t size_type;

I think we should make it unsigned. I agree with your earlier
observation that it is better to do that now rather than later or never.

Yes, it is agreed. It's however something I'd prefer to do as a
separate change, after everything else has been defined in order to
have a clear impact.


Yes and no. It will probably cause a lot of little changes all over the 
classes involved. So doing it at a point of stability as a separate 
commit to stringng is good but not a reason to






 void reserveSpace(size_type minSpace) {reserveCapacity(length()+minSpace);}

As I mentioned before, this lacks overflow control if the sum of
length() and minSpace exceeds size_type maximum. Please add a check.

reserve* calls cow() which - if a resizing is needed - calls reAlloc
which throws SBufTooBigException if the requested size is too big.
Isn't that good enough?


Not if the math overflowed down to a smaller value before it even got 
passed to reserveCapacity().





 cow(minSpace+length());

Same here, in rawSpace().





BTW, please add SBuf::spaceSize() as a public method so that
optimization like the above will work. Otherwise, non-privileged users
will have to write

 foobar(buf.rawSpace(minSize), minSize);

instead of getting an advantage of the actual space size being larger
than the requested minimum.

Done, but I'm quite wary of this: it implies a knowledge about
internal store positioning which I would prefer to hide from clients.


I think we may find a reason in future to extend this futther to allow 
caller handling of the cases where realloc is not possible to the new 
minSize.

But for now that should be enough.




 if (sz = static_castint(store_-spaceSize()))
 rawSpace(sz*2); // TODO: tune heuristics

I would use reserveSpace() here instead of ignoring rawSpace() result
because we know the buffer needs to be reallocated at this point, right?


Hand-unrolled code doesn't ignore that return value anymore (see below)


 /* snprintf on Linux returns -1 on overflows */

...

 if (sz = static_castint(store_-spaceSize()))
 rawSpace(sz*2); // TODO: tune heuristics
 else if (sz  0) // output error in vsnprintf
 throw TextException(output error in vsnprintf,__FILE__, 
__LINE__);

If snprintf on Linux returns -1 on overflows, we should not throw when
sz is -1. We should grow the buffer instead, right?

Man snprintf(3) says
If an output error is encountered, a negative value is returned.

I guess the comment is broken.
vsnprintf is standard as of C99; it should be therefore quite
consistent in our build environment.


What other types of outt error than buffer overflow can occur?
Might be worth a little test to see what overflow produces exactly. And 
I agree with Alex that growing is the better way to handle it, on the 
proviso that we can reliably detect that overflow.






 while (length() = maxSize) {
 sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap);

...

 /* snprintf on FreeBSD returns at least free_space on overflows */

 if (sz = static_castint(store_-spaceSize()))
 rawSpace(sz*2); // TODO: tune heuristics
 else if (sz  0) // output error in vsnprintf
 throw TextException(output error in vsnprintf,__FILE__, 
__LINE__);
 else
 break;
 }

Nothing in that loop grows length(), which is your guard against an
infinite loop on FreeBSD (at least). Can you think of a better loop guard?

This loop will be executed at most twice via break clause, as on the
second go we _know_ how much we have to write, the first attempt is .
I'm hand-unrolling it.


 len_ += sz;
 // TODO: this does NOT belong here, but to class-init or autoconf
 /* on Linux and FreeBSD, '\0' is not counted in return value */
 /* on XXX it might be counted */
 /* check that '\0' is appended and not counted */

 if (operator[](len_-1) == '\0') {
 --sz;
 --len_;
 }

There is no guarantee that len_ is positive. We could start with a zero
len_ and an empty  pattern would result in zero sz. The above code may
then access character at -1 offset of the raw storage array...

vprintf() is probably so slow that any code using it is not optimized.
We should use at(len_-1) instead here.

Ok.
This will go away by itself when we move to unsigned size_type.


 /* check that '\0' is appended and not counted */

 if (operator[](len_-1) == '\0') {

Perhaps this day was too long, but I do not think the 

Re: [RFC] Time to talk about StringNG merge again?

2013-07-31 Thread Kinkie
On Wed, Jul 31, 2013 at 2:46 PM, Amos Jeffries squ...@treenet.co.nz wrote:
 On 1/08/2013 12:18 a.m., Kinkie wrote:

 On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov
 rouss...@measurement-factory.com wrote:

 On 07/30/2013 03:56 AM, Kinkie wrote:

 lp:~squid/squid/stringng-cherrypick has the most uptodate code.


 I did not finish the review before running out of time, but I found one
 or two bugs in r12761 (in the code I have not looked at before). There
 are also a few polishing issues:

 Thanks.

  typedef int32_t size_type;

 I think we should make it unsigned. I agree with your earlier
 observation that it is better to do that now rather than later or never.

 Yes, it is agreed. It's however something I'd prefer to do as a
 separate change, after everything else has been defined in order to
 have a clear impact.


 Yes and no. It will probably cause a lot of little changes all over the
 classes involved. So doing it at a point of stability as a separate commit
 to stringng is good but not a reason to

How about doing it after this round of reviews is done?
That should be a reasonable compromise.

  void reserveSpace(size_type minSpace)
 {reserveCapacity(length()+minSpace);}

 As I mentioned before, this lacks overflow control if the sum of
 length() and minSpace exceeds size_type maximum. Please add a check.

 reserve* calls cow() which - if a resizing is needed - calls reAlloc
 which throws SBufTooBigException if the requested size is too big.
 Isn't that good enough?


 Not if the math overflowed down to a smaller value before it even got passed
 to reserveCapacity().

Ok. I'm going to check minSpace. maxSize+minSpace is definitely not
enough to overflow size_type

  cow(minSpace+length());

 Same here, in rawSpace().



 BTW, please add SBuf::spaceSize() as a public method so that
 optimization like the above will work. Otherwise, non-privileged users
 will have to write

  foobar(buf.rawSpace(minSize), minSize);

 instead of getting an advantage of the actual space size being larger
 than the requested minimum.

 Done, but I'm quite wary of this: it implies a knowledge about
 internal store positioning which I would prefer to hide from clients.


 I think we may find a reason in future to extend this futther to allow
 caller handling of the cases where realloc is not possible to the new
 minSize.
 But for now that should be enough.

Ok.




  if (sz = static_castint(store_-spaceSize()))
  rawSpace(sz*2); // TODO: tune heuristics

 I would use reserveSpace() here instead of ignoring rawSpace() result
 because we know the buffer needs to be reallocated at this point, right?


 Hand-unrolled code doesn't ignore that return value anymore (see below)

  /* snprintf on Linux returns -1 on overflows */

 ...

  if (sz = static_castint(store_-spaceSize()))
  rawSpace(sz*2); // TODO: tune heuristics
  else if (sz  0) // output error in vsnprintf
  throw TextException(output error in vsnprintf,__FILE__,
 __LINE__);

 If snprintf on Linux returns -1 on overflows, we should not throw when
 sz is -1. We should grow the buffer instead, right?

 Man snprintf(3) says
 If an output error is encountered, a negative value is returned.

 I guess the comment is broken.
 vsnprintf is standard as of C99; it should be therefore quite
 consistent in our build environment.


 What other types of outt error than buffer overflow can occur?
 Might be worth a little test to see what overflow produces exactly. And I
 agree with Alex that growing is the better way to handle it, on the proviso
 that we can reliably detect that overflow.

Overflow is signaled by returning a size greater than the buffer. The
freebsd man page mentions possible errors in write(2) ( does not apply
here) but also ENOMEM or EILSEQ (illegal wide character).

  Kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-31 Thread Amos Jeffries

On 1/08/2013 12:58 a.m., Kinkie wrote:

On Wed, Jul 31, 2013 at 2:46 PM, Amos Jeffries squ...@treenet.co.nz wrote:

On 1/08/2013 12:18 a.m., Kinkie wrote:

On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:

On 07/30/2013 03:56 AM, Kinkie wrote:


lp:~squid/squid/stringng-cherrypick has the most uptodate code.


I did not finish the review before running out of time, but I found one
or two bugs in r12761 (in the code I have not looked at before). There
are also a few polishing issues:

Thanks.


  typedef int32_t size_type;

I think we should make it unsigned. I agree with your earlier
observation that it is better to do that now rather than later or never.

Yes, it is agreed. It's however something I'd prefer to do as a
separate change, after everything else has been defined in order to
have a clear impact.


Yes and no. It will probably cause a lot of little changes all over the
classes involved. So doing it at a point of stability as a separate commit
to stringng is good but not a reason to

How about doing it after this round of reviews is done?
That should be a reasonable compromise.


  void reserveSpace(size_type minSpace)
{reserveCapacity(length()+minSpace);}

As I mentioned before, this lacks overflow control if the sum of
length() and minSpace exceeds size_type maximum. Please add a check.

reserve* calls cow() which - if a resizing is needed - calls reAlloc
which throws SBufTooBigException if the requested size is too big.
Isn't that good enough?


Not if the math overflowed down to a smaller value before it even got passed
to reserveCapacity().

Ok. I'm going to check minSpace. maxSize+minSpace is definitely not
enough to overflow size_type


minSpace is controlled completely by the unknown caller code. It may be 
UINT_MAX or something equally capable of overflowing when you add to it.


I think the right check for here is:
  if (maxSize - length()  minSpace || minSpace  0) abort().

The math being done on the two values we are confident about already and 
ensuring that the leftover space is enough for the caller.

The extra 0 check disappearing when we move to unsigned of course.

Amos


Re: [RFC] Time to talk about StringNG merge again?

2013-07-31 Thread Kinkie
 Not if the math overflowed down to a smaller value before it even got
 passed
 to reserveCapacity().

 Ok. I'm going to check minSpace. maxSize+minSpace is definitely not
 enough to overflow size_type


 minSpace is controlled completely by the unknown caller code. It may be
 UINT_MAX or something equally capable of overflowing when you add to it.

What is currently done is:

reserveSpace(minSpace) {
  reserveCapacity(length()+minSpace);
}

reserveCapacity(minCapacity)
{
Must(0 = minCapacity = maxSize); // maxSize is 0xfff, 256Mb
cow(minCapacity);
}
cow(newsize) //it's private
{
  if(!need_to_resize()) return;
  reAlloc(newsize);
}
reAlloc(newsize) { // it's private
  Must (newsize  maxSize);
}

 I think the right check for here is:
   if (maxSize - length()  minSpace || minSpace  0) abort().

 The math being done on the two values we are confident about already and
 ensuring that the leftover space is enough for the caller.

I'm currently checking 0=requestedSpace=maxSize;
worst case is that if someone requests
reserveSpace(maxSize);
the exception will be triggered in reserveCapacity() instead of
reserveSpace() as the length()+requestedSpace will be checked there.
I am convinced that a few of these checks are redundant, but CPU power
is cheaper than brainpower.

 The extra 0 check disappearing when we move to unsigned of course.

Yes.

-- 
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-31 Thread Alex Rousskov
On 07/31/2013 06:18 AM, Kinkie wrote:
  typedef int32_t size_type;
 
  I think we should make it unsigned. I agree with your earlier
  observation that it is better to do that now rather than later or never.

 Yes, it is agreed. It's however something I'd prefer to do as a
 separate change, after everything else has been defined in order to
 have a clear impact.

I think that is a good approach.

Alex.





Re: [RFC] Time to talk about StringNG merge again?

2013-07-31 Thread Alex Rousskov
On 07/31/2013 10:12 AM, Kinkie wrote:
 Not if the math overflowed down to a smaller value before it even got
 passed
 to reserveCapacity().

 Ok. I'm going to check minSpace. maxSize+minSpace is definitely not
 enough to overflow size_type


 minSpace is controlled completely by the unknown caller code. It may be
 UINT_MAX or something equally capable of overflowing when you add to it.


 What is currently done is:
 
 reserveSpace(minSpace) {
   reserveCapacity(length()+minSpace);
 }


Kinkie,

I am afraid you are missing the point here. Consider the case where
minSpace is the maximum value that size_type can hold and length() is
100. The above sum overflows _before_ any of your checks in
reserveCapacity() happen.

The correct check in reserveSpace() is:

Must(minSpace = 0  length()  maximum size_type value - minSpace)

The negative check must come first (but will be deleted later).


HTH,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-31 Thread Alex Rousskov
On 07/31/2013 11:11 AM, Alex Rousskov wrote:
 On 07/31/2013 10:12 AM, Kinkie wrote:
 Not if the math overflowed down to a smaller value before it even got
 passed
 to reserveCapacity().

 Ok. I'm going to check minSpace. maxSize+minSpace is definitely not
 enough to overflow size_type


 minSpace is controlled completely by the unknown caller code. It may be
 UINT_MAX or something equally capable of overflowing when you add to it.
 
 
 What is currently done is:

 reserveSpace(minSpace) {
   reserveCapacity(length()+minSpace);
 }
 
 
 Kinkie,
 
 I am afraid you are missing the point here. Consider the case where
 minSpace is the maximum value that size_type can hold and length() is
 100. The above sum overflows _before_ any of your checks in
 reserveCapacity() happen.
 
 The correct check in reserveSpace() is:
 
 Must(minSpace = 0  length()  maximum size_type value - minSpace)


Must(minSpace = 0  length() = maximum size_type value - minSpace)

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-31 Thread Alex Rousskov
On 07/31/2013 06:18 AM, Kinkie wrote:
 On Wed, Jul 31, 2013 at 2:53 AM, Alex Rousskov wrote:
 On 07/30/2013 03:56 AM, Kinkie wrote:
 len_ += sz;
 // TODO: this does NOT belong here, but to class-init or autoconf
 /* on Linux and FreeBSD, '\0' is not counted in return value */
 /* on XXX it might be counted */
 /* check that '\0' is appended and not counted */

 if (operator[](len_-1) == '\0') {
 --sz;
 --len_;
 }

 There is no guarantee that len_ is positive. We could start with a zero
 len_ and an empty  pattern would result in zero sz. The above code may
 then access character at -1 offset of the raw storage array...

 Ok.
 This will go away by itself when we move to unsigned size_type.


I do not see how it will go away. If an unsigned len_ is zero, you would
be accessing a non-existent byte at a very large (unsigned)-1 offset,
which may even be worse than accessing a non-existent byte at -1 offset.

Another small bug in that code is that arguments or even fmt might
actually end with a valid \0 byte that should be appended and counted
(not a c-string terminating byte).

This check should be done once, with controlled fmt and arguments so
that we can detect problems reliably. You can do that at module
initialization time, for example (./configure might be too early because
libraries might change) and set a static SnprintfCountsTerminator
variable accordingly.

char buf[16];
SprintfCountsTerminator = snprintf(buf, sizeof(buf), %s, 1) == 2;


Cheers,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-30 Thread Kinkie
Ok, it's now done, including documentation.
lp:~squid/squid/stringng-cherrypick has the most uptodate code.

On Mon, Jul 29, 2013 at 11:55 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 07/29/2013 01:38 PM, Kinkie wrote:

 So to implement this I should:
 - rename current reserveSpace to rawSpace, add returning the pointer
 as in current rawSpace.
 - reimplement reserveSpace as a convenience calling reserveCapacity
 - adjust clients of reserveSpace to use rawSpace instead.

 If this is correct, I'm OK with it.


 Yes, I think the above plan will result in the implementation I am
 rooting for.


 Thank you,

 Alex.




-- 
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-30 Thread Alex Rousskov
On 07/30/2013 03:56 AM, Kinkie wrote:

 lp:~squid/squid/stringng-cherrypick has the most uptodate code.


I did not finish the review before running out of time, but I found one
or two bugs in r12761 (in the code I have not looked at before). There
are also a few polishing issues:


 typedef int32_t size_type;

I think we should make it unsigned. I agree with your earlier
observation that it is better to do that now rather than later or never.


 void reserveSpace(size_type minSpace) 
 {reserveCapacity(length()+minSpace);}

As I mentioned before, this lacks overflow control if the sum of
length() and minSpace exceeds size_type maximum. Please add a check.

 cow(minSpace+length());

Same here, in rawSpace().


 /** Request to extend the SBuf's free store space.
 /** Request to resize the SBuf's store capacity

The caller does not really request to extend or resize the store because
they do not normally know what the store size is. I suggest replacing
extend and resize with guarantee.


  * This method can also be used to prepare the SBuf by preallocating a
  * predefined amount of free space; this may help to optimize subsequent
  * calls to SBuf::append or similar methods. In this case the returned
  * pointer should be ignored.
...
 char *rawSpace(size_type minSize);


Please remove that part of the comment. You can say See also:
reserveSpace() if you want.

However, I would _add_ something like this:

Unlike reserveSpace(), this method does not guarantee exclusive
buffer ownership. It is instead optimized for a one writer
(appender), many readers scenario by avoiding unnecessary
copying and allocations.

 //reserve twice the format-string size, it's a likely heuristic
 rawSpace(strlen(fmt)*2);
 
 while (length() = maxSize) {
...
 sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap);


The above does not illustrate how to use rawSpace() well. We do not have
to provide good examples in lower level code, but I think it is a good
idea to eat our own dog food. The above code should probably be rewritten as

 // heuristic: we will need twice as much space as the format
 const size_type minSpace = strlen(fmt)*2;

 while (length() = maxSize) {
...
 const char *space = rawSpace(minSpace);
 sz = vsnprintf(space, spaceSize(), fmt, ap);


BTW, please add SBuf::spaceSize() as a public method so that
optimization like the above will work. Otherwise, non-privileged users
will have to write

foobar(buf.rawSpace(minSize), minSize);

instead of getting an advantage of the actual space size being larger
than the requested minimum.



 if (sz = static_castint(store_-spaceSize()))
 rawSpace(sz*2); // TODO: tune heuristics

I would use reserveSpace() here instead of ignoring rawSpace() result
because we know the buffer needs to be reallocated at this point, right?


 /* snprintf on Linux returns -1 on overflows */
...
 if (sz = static_castint(store_-spaceSize()))
 rawSpace(sz*2); // TODO: tune heuristics
 else if (sz  0) // output error in vsnprintf
 throw TextException(output error in vsnprintf,__FILE__, 
 __LINE__);

If snprintf on Linux returns -1 on overflows, we should not throw when
sz is -1. We should grow the buffer instead, right?


 while (length() = maxSize) {
 sz = vsnprintf(bufEnd(), store_-spaceSize(), fmt, ap);
...
 /* snprintf on FreeBSD returns at least free_space on overflows */

 if (sz = static_castint(store_-spaceSize()))
 rawSpace(sz*2); // TODO: tune heuristics
 else if (sz  0) // output error in vsnprintf
 throw TextException(output error in vsnprintf,__FILE__, 
 __LINE__);
 else
 break;
 }

Nothing in that loop grows length(), which is your guard against an
infinite loop on FreeBSD (at least). Can you think of a better loop guard?


 len_ += sz;
 // TODO: this does NOT belong here, but to class-init or autoconf
 /* on Linux and FreeBSD, '\0' is not counted in return value */
 /* on XXX it might be counted */
 /* check that '\0' is appended and not counted */
 
 if (operator[](len_-1) == '\0') {
 --sz;
 --len_;
 }

There is no guarantee that len_ is positive. We could start with a zero
len_ and an empty  pattern would result in zero sz. The above code may
then access character at -1 offset of the raw storage array...

vprintf() is probably so slow that any code using it is not optimized.
We should use at(len_-1) instead here.


 /* check that '\0' is appended and not counted */
 
 if (operator[](len_-1) == '\0') {

Perhaps this day was too long, but I do not think the above checks
whether \0 was appended but not counted. It checks whether \0 was
appended and counted. Which is probably what you want to check, so the
comment should be adjusted to remove the word not.


HTH,

Alex.



 On Mon, Jul 

Re: [RFC] Time to talk about StringNG merge again?

2013-07-29 Thread Kinkie
On Sun, Jul 28, 2013 at 1:03 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 07/27/2013 02:57 PM, Kinkie wrote:
 On Sat, Jul 27, 2013 at 10:32 PM, Alex Rousskov
 rouss...@measurement-factory.com wrote:
 On 07/27/2013 01:03 PM, Kinkie wrote:
 On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote:
 On 07/27/2013 12:00 PM, Kinkie wrote:
 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.

 1b. Reserve buffer space. Ensure exclusive buffer ownership.

 2. Reserve N space bytes for the caller to append to. No guarantees
 regarding buffer ownership are provided.

 // 1a.
 void reserveCapacity(size_type minCap) {
 cow(minCap);
 }

 // 1b.
 void reserveSpace(size_type minSpace) {
 // check that the sum below does not exceed size_type
 Must(size() = size_type's limit - minSpace);
 reserveCapacity(size() + minSpace);
 }

 // 2.
 char *rawSpace(size_type minSpace) {
 ... your optimization goes here ...
 return pointer to space;
 }

 Current call chain (and functions for this purpose, outlined):
 append (various flavors) calls
   lowAppend which calls
 reserveSpace which
   checks if we're at the tail of the used portion of the store
 (via store-canAppend())
   reAlloc() if needed // reAlloc is the low-level, unconditional
 part of cow()
   (back to lowAppend): tell the MemBlob to append the user-supplied
 buffer and adjust internal data

 Yes, I know. The current code is inconsistent/confusing because
 reserveSpace does not guarantee exclusive buffer ownership but
 reserveCapacity does. In the fixed design, that inconsistency is removed.

I agree with you that it is clearer that both reserve* methods have
the same semantics. Let's consider it done: they do the same thing and
one is an inline wrapper of the other.

The way I envisioned the key guarantee difference between reserve* and
cow() is that: reserve* guarantees that it is safe to APPEND after the
currently-used area and up to the requested size/space. cow() slso
guarantees that it is safe to WRITE over the currently-used area, up
to the requested size.

The former is a weaker guarantee, and a possible optimization. What I
understand about your request is that you want the stronger guarantee
to be always used. It's not a big deal, as the cost is at most some
unneeded copies here and there, and code-wise it's very simple. It'd
just alias reserveCapacity to cow, and reserveSpace to wrap on top of
that.
Did I understand correctly? If so, I can do that right away.

The other size is rawSpace(). For that, I'd like to adopt Amos's
suggestion and enforce that the requester specify how much storage he
needs. If he wants raw access to the data, he better know what he's
doing.

I don't like the idea of using rawSpace for append; MemBlob::append
also does some housekeeping within MemBlob and I'm not keen on the
idea shuffling that outside the class itself. It is however immaterial
to this discussion, as it's an implementation detail from the SBuf
user's point of view.

 The current design is inconsistent (similar-sounding methods behave very
 differently) and more dangerous (reserveSpace() does not return volatile
 raw space to the caller, increasing the changes that the caller will do
 something bad before extracting that [now stale] space from the SBuf).
 The proposed design solves both problems.

It may be useful to document that reserve* methods are meant to be
used only in preparation for other SBuf methods (assign, append, setAt
etc., not for direct manipulation by the caller.
This is implied by the documentation of rawSpace: (\warning Use with
EXTREME caution, this is a dangerous operation.).


To wrap it up, if you agre I will:
- alias reserveCapacity(size) to cow(size)
- reserveSpace(size) - reserveCapacity(size+length())
- rawSpace(size_type size=npos) - rawSpace(size_type size)

And leave everything else as is.
Do you agree?

-- 
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-29 Thread Alex Rousskov
On 07/29/2013 03:02 AM, Kinkie wrote:
 On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote:
 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.

 1b. Reserve buffer space. Ensure exclusive buffer ownership.

 2. Reserve N space bytes for the caller to append to. No guarantees
 regarding buffer ownership are provided.

 // 1a.
 void reserveCapacity(size_type minCap) {
 cow(minCap);
 }

 // 1b.
 void reserveSpace(size_type minSpace) {
 // check that the sum below does not exceed size_type
 Must(size() = size_type's limit - minSpace);
 reserveCapacity(size() + minSpace);
 }

 // 2.
 char *rawSpace(size_type minSpace) {
 ... your optimization goes here ...
 return pointer to space;
 }


 The way I envisioned the key guarantee difference between reserve* and
 cow() is that: reserve* guarantees that it is safe to APPEND after the
 currently-used area and up to the requested size/space. cow() slso
 guarantees that it is safe to WRITE over the currently-used area, up
 to the requested size.

I do not think it is a good idea to limit reserveCapacity() to append
because absolutely nothing in its name and in the semantics of standard
STL methods with similar names restrict them to append. If you call it
reserve*() it has to apply to buffer capacity for any subsequent operation.

cow() is a private method. It's primary goal is ensuring single
ownership. Its capacity parameter is just a performance optimization:
When we have to copy, it saves time to enlarge the buffer as needed as
well. Perhaps that internal optimization creates some confusion,
prompting these discussions. From the API point of view, you may ignore
that optimization and think of cow() as a method that ensures single
ownership by creating a dedicated buffer and copying current content (if
needed).


 The former is a weaker guarantee, and a possible optimization. What I
 understand about your request is that you want the stronger guarantee
 to be always used. It's not a big deal, as the cost is at most some
 unneeded copies here and there, and code-wise it's very simple. It'd
 just alias reserveCapacity to cow, and reserveSpace to wrap on top of
 that.
 Did I understand correctly? If so, I can do that right away.

I think you understand what I am suggesting, but I am not sure you
understand why it should be done that way and why there is no
performance penalty associated with that solution (if other methods are
implemented and used correctly).


 The other size is rawSpace(). For that, I'd like to adopt Amos's
 suggestion and enforce that the requester specify how much storage he
 needs. If he wants raw access to the data, he better know what he's
 doing.

Fine with me.


 I don't like the idea of using rawSpace for append;

which is rather absurd since rawSpace() is designed to be used for
appending and can only be used safely for appending.


 MemBlob::append
 also does some housekeeping within MemBlob and I'm not keen on the
 idea shuffling that outside the class itself. It is however immaterial
 to this discussion, as it's an implementation detail from the SBuf
 user's point of view.

I do not see any inappropriate housekeeping in MemBlob::append(). It
just copies data and increments the used buffer size, as it should.


 The current design is inconsistent (similar-sounding methods behave very
 differently) and more dangerous (reserveSpace() does not return volatile
 raw space to the caller, increasing the changes that the caller will do
 something bad before extracting that [now stale] space from the SBuf).
 The proposed design solves both problems.
 
 It may be useful to document that reserve* methods are meant to be
 used only in preparation for other SBuf methods (assign, append, setAt
 etc., not for direct manipulation by the caller.

There is no need to document that because reserve*() methods return
void, as they should.


 To wrap it up, if you agre I will:
 - alias reserveCapacity(size) to cow(size)
 - reserveSpace(size) - reserveCapacity(size+length())

Yes.


 - rawSpace(size_type size=npos) - rawSpace(size_type size)

OK (but irrelevant to this discussion).


 And leave everything else as is.

No, you also need to move your optimization from reserveSpace() to
rawSpace() and use rawSpace() in append(). There is no need to drop that
very useful optimization!


Cheers,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-29 Thread Kinkie
On Mon, Jul 29, 2013 at 5:32 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 07/29/2013 03:02 AM, Kinkie wrote:
 On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote:
 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.

 1b. Reserve buffer space. Ensure exclusive buffer ownership.

 2. Reserve N space bytes for the caller to append to. No guarantees
 regarding buffer ownership are provided.

 // 1a.
 void reserveCapacity(size_type minCap) {
 cow(minCap);
 }

 // 1b.
 void reserveSpace(size_type minSpace) {
 // check that the sum below does not exceed size_type
 Must(size() = size_type's limit - minSpace);
 reserveCapacity(size() + minSpace);
 }

 // 2.
 char *rawSpace(size_type minSpace) {
 ... your optimization goes here ...
 return pointer to space;
 }


 The way I envisioned the key guarantee difference between reserve* and
 cow() is that: reserve* guarantees that it is safe to APPEND after the
 currently-used area and up to the requested size/space. cow() slso
 guarantees that it is safe to WRITE over the currently-used area, up
 to the requested size.

 I do not think it is a good idea to limit reserveCapacity() to append
 because absolutely nothing in its name and in the semantics of standard
 STL methods with similar names restrict them to append. If you call it
 reserve*() it has to apply to buffer capacity for any subsequent operation.

Changing names is not unthinkable. After all we are not bound to a standard API.

 cow() is a private method. It's primary goal is ensuring single
 ownership. Its capacity parameter is just a performance optimization:
 When we have to copy, it saves time to enlarge the buffer as needed as
 well. Perhaps that internal optimization creates some confusion,
 prompting these discussions. From the API point of view, you may ignore
 that optimization and think of cow() as a method that ensures single
 ownership by creating a dedicated buffer and copying current content (if
 needed).

Er.. no (again, it may be misnamed, let's focus on its purpose, we'll
see if we need to change the purpose or the name at the end of this
discussion).
The purpose of cow is to ensure writeability, thus no store sharing
(thus single ownership).
It is a way for the client to say I'm about to write, please copy if
needed. Nowhere the name talks about single ownership, although it is
a necessary consequence.
The responsibility to reAlloc's purpose is to grow the SBuf (BTW: it
currently doesn't enforce that the requested size be bigger than the
SBuf's data; it should. This is a bug).

 The former is a weaker guarantee, and a possible optimization. What I
 understand about your request is that you want the stronger guarantee
 to be always used. It's not a big deal, as the cost is at most some
 unneeded copies here and there, and code-wise it's very simple. It'd
 just alias reserveCapacity to cow, and reserveSpace to wrap on top of
 that.
 Did I understand correctly? If so, I can do that right away.

 I think you understand what I am suggesting, but I am not sure you
 understand why it should be done that way and why there is no
 performance penalty associated with that solution (if other methods are
 implemented and used correctly).

Yes, you are right.

 The other size is rawSpace(). For that, I'd like to adopt Amos's
 suggestion and enforce that the requester specify how much storage he
 needs. If he wants raw access to the data, he better know what he's
 doing.

 Fine with me.


 I don't like the idea of using rawSpace for append;

 which is rather absurd since rawSpace() is designed to be used for
 appending and can only be used safely for appending.

Doh! You're right, I hardwired my brain to think of an empty SBuf when
calling it, and wasn't seeing what's written in the code.

 MemBlob::append
 also does some housekeeping within MemBlob and I'm not keen on the
 idea shuffling that outside the class itself. It is however immaterial
 to this discussion, as it's an implementation detail from the SBuf
 user's point of view.

 I do not see any inappropriate housekeeping in MemBlob::append(). It
 just copies data and increments the used buffer size, as it should.

 The current design is inconsistent (similar-sounding methods behave very
 differently) and more dangerous (reserveSpace() does not return volatile
 raw space to the caller, increasing the changes that the caller will do
 something bad before extracting that [now stale] space from the SBuf).
 The proposed design solves both problems.

 It may be useful to document that reserve* methods are meant to be
 used only in preparation for other SBuf methods (assign, append, setAt
 etc., not for direct manipulation by the caller.

 There is no need to document that because reserve*() methods return
 void, as they should.


 To wrap it up, if you agre I will:
 - alias reserveCapacity(size) to cow(size)
 - reserveSpace(size) - reserveCapacity(size+length())

 Yes.


 - rawSpace(size_type 

Re: [RFC] Time to talk about StringNG merge again?

2013-07-29 Thread Alex Rousskov
On 07/29/2013 01:38 PM, Kinkie wrote:

 So to implement this I should:
 - rename current reserveSpace to rawSpace, add returning the pointer
 as in current rawSpace.
 - reimplement reserveSpace as a convenience calling reserveCapacity
 - adjust clients of reserveSpace to use rawSpace instead.
 
 If this is correct, I'm OK with it.


Yes, I think the above plan will result in the implementation I am
rooting for.


Thank you,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Amos Jeffries

On 27/07/2013 11:38 a.m., Alex Rousskov wrote:

On 07/26/2013 03:27 PM, Kinkie wrote:

Resending due to MUA insisting on using HTML email..

On Thu, Jul 18, 2013 at 1:06 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:

On 07/04/2013 11:51 PM, Kinkie wrote:

void

SBuf::reserveCapacity(size_type minCapacity)
{
 Must(0 = minCapacity  minCapacity = maxSize);
 reserveSpace(minCapacity-length());
}

This does not look right. For example, if minCapacity is smaller than
length(), we should do nothing, but will throw instead. I think this
method should look something like this:

 Must(0 = minCapacity  minCapacity = maxSize);
 cow(minCapacity);



 if (store_-canAppend(off_+len_, minSpace)) {
 debugs(24, 7, not growing);
 return;
 }

AFAICT, reserveSpace() should be implemented using
something like this:

 Must(0 = minSpace  minSpace = maxSize - length());
 reserveCapacity(length() + minSpace);



The comment is badly worded but the code is correct:

I am not sure which code you are referring to here, but I do not think
it is correct for SBuf::reserveCapacity(minCapacity) to throw when
minCapacity is smaller than length().


Decoupled the methods, but the other way around:
reserveCapacity(totalCapacity) ensures unique ownership,
reserveSpace(capacityToAdd) is an optimization to prime the SBuf to
receive data.
The upper bound in that Must() clause is actually redundant, as in
case of a needed reallocation
reserveSpace() - cow() - reAlloc() which throws a
SBufTooBigException if size is too big.
As a consequence, I'm removing the upper bound check.
Once we change size_type to be unsigned, the lower bound check will be
redundant too.





A decision needs to be made if these two methods have different
purposes or are to be simply different expressions of the same
underlying concept (as it is now).

Sorry, this question is too abstract for me to answer it -- it feels
like both statements are true: The methods have different purpose but
are manipulating the same underlying concept of available buffer area.
One method deals with the total size of the buffer area, while the other
is concerned with the size of the yet unused buffer area.


As they are (were) one of the two is redundant, an in fact it was two
lines of code.
With the new semantics, reserveSpace is meant to announce to the SBuf
I'm planning to append lots of data, please be prepared, while
reserveCapacity means I'm going to meddle with your internal storage,
and I'm going to need this space. Get ready.

I do not see the difference because lots of data is going to be
appended to your internal storage so both are meddling with that
storage. To me, the difference was simply in argument math: Sometimes it
is easier to specify the total capacity, sometimes the available space.

The way the two functions are implemented now the difference is more
than just math. One does not guarantee exclusive ownership and the other
one does. Your semantics description above does not seem to convey that
key difference.

Here is the use case I am thinking about:

 sbuf.reserveSpace(ioSize);
 bytesRead = read(sbuf.rawSpace(), ioSize);
 sbuf.forceSize(sbuf.size() + bytesRead);

The above case is already handled by rawSpace(), but now I am confused
why rawSpace() is implemented using unconditional cow() while
reserveSpace() appears to be optimizing something. That optimization
seems to be the key here. Perhaps rawSpace() should be deleted and
reserveCapacity() adjusted to use the same optimization?

   reserveSpace(n) ought to be reserveCapacity(content size + n)


To summarize,

1) We need to agree whether the optimization currently residing in
reserveSpace() should apply to reserveCapacity(). This is the key
difference between the two methods. Documentation and code should be
adjusted accordingly.

2) We probably should not have both reserveSpace() and rawSpace().


What about   char* getReservedRawSpace(size)  instead?

The reservation mandatory when fetching the raw Space,
 * ensure any cow() is done before returning the buffer,
 * ensure it is 0-safe; the space starts with 0-terminator

Amos


Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Amos Jeffries

On 28/07/2013 12:00 a.m., Kinkie wrote:

Here is the use case I am thinking about:

 sbuf.reserveSpace(ioSize);
 bytesRead = read(sbuf.rawSpace(), ioSize);
 sbuf.forceSize(sbuf.size() + bytesRead);

This explains it all, we're thinking about two different use cases.
The other use case is:
sbuf.reserveCapacity(newCapacity);
sbuf.append(something).append(somethingelse).append(verylongstring).append(whoknows).

This could be used e.g. to instantiate error pages from their
templates (pseudo-code):

Sbuf template(.);
Sbuf errorpage;
errorpage.reserveCapacity(template * reasonable_scaling_factor);
SBufTokenizer t(template);
while (SBuf parsed=t.nextToken(%)) {
errorpage.append(parsed).append(handleCode(t.peek());
}
errorpage.append(t.whatever_remains());

In cases such as this it may not be needed to cow(), so why do it?


The above case is already handled by rawSpace(), but now I am confused
why rawSpace() is implemented using unconditional cow() while
reserveSpace() appears to be optimizing something. That optimization
seems to be the key here. Perhaps rawSpace() should be deleted and
reserveCapacity() adjusted to use the same optimization?
   reserveSpace(n) ought to be reserveCapacity(content size + n)

Apart from this, I see the benefit of Amos' suggestion of having
rawSpace also absorb the function of reserveCapacity.
This also covers you observation that both reserveSpace and rawSpace
are maybe not needed, one can do the job of both.


No. I think what Alex is getting at is that they *do* have potential 
uses but those uses have very different semantic requirements, from each 
other and from what some (one?) of them do right now.


There is a case for each of them:

  reserveCapacity(size) - in code needing and being able to hint a 
specific total capacity (minimum) to exist in the buffer.


  reserveSpace(size)  - in code which only knows how much is going to 
be _added_, to simplify the caller code. This can therefore be an inline 
wrapper for reserveCapacity(currentSize+newSize).


  * both of the above are about optimizing cow() and ensuring extra 
space is available. Nothing more.


  rawSpace() - in code needing to play with the raw buffer for some reason.

Adding size to rawSpace() adds the multiple benefits of not needing to 
explicitly run reserve*(), thus preventing mistakes with selecting 
_which_ reserve*(), enforcing that the space available is guaranteed to 
be a fixed known size in advance of that use, thus nobody can play with 
the buffer directly unless they know how much of it they will be playing 
with or at least can (over-)estimate an amount.


Amos


Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Alex Rousskov
On 07/27/2013 05:35 AM, Amos Jeffries wrote:
  * ensure it is 0-safe; the space starts with 0-terminator

Please no. We have enough bugs with implicit zero terminator being
forced into buffers that do not need one, screwing up buffer size
management and related optimizations/assertions. I suspects it all
starts with an innocent 0-terminator added where it does not seem to
hurt and might help.

The space(0) call will not work correctly if we start adding implicit
0-terminators.

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Kinkie
On Sat, Jul 27, 2013 at 5:57 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 07/27/2013 05:35 AM, Amos Jeffries wrote:
  * ensure it is 0-safe; the space starts with 0-terminator

 Please no. We have enough bugs with implicit zero terminator being
 forced into buffers that do not need one, screwing up buffer size
 management and related optimizations/assertions. I suspects it all
 starts with an innocent 0-terminator added where it does not seem to
 hurt and might help.

 The space(0) call will not work correctly if we start adding implicit
 0-terminators.

I'm fine with this.
If you agree to Amos' proposed API (as modified by your proposal of
not auto-terminating), I'll implement them immediately.

-- 
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Alex Rousskov
On 07/27/2013 06:00 AM, Kinkie wrote:
 Here is the use case I am thinking about:

 sbuf.reserveSpace(ioSize);
 bytesRead = read(sbuf.rawSpace(), ioSize);
 sbuf.forceSize(sbuf.size() + bytesRead);


 The other use case is:
 sbuf.reserveCapacity(newCapacity);
 sbuf.append(something).append(somethingelse).append(verylongstring).append(whoknows).

Sure, that is another use case for another method.


 This could be used e.g. to instantiate error pages from their
 templates (pseudo-code):
 
 Sbuf template(.);
 Sbuf errorpage;
 errorpage.reserveCapacity(template * reasonable_scaling_factor);
 SBufTokenizer t(template);
 while (SBuf parsed=t.nextToken(%)) {
errorpage.append(parsed).append(handleCode(t.peek());
 }
 errorpage.append(t.whatever_remains());
 
 In cases such as this it may not be needed to cow(), so why do it?

AFAICT, your example immediately above will not cause CoW, with or
without the current optimization in reserveSpace() because errorpage
SBuf is not shared with other users. There will be no reason to Copy
errorpage buffer.

The following sketch may be a better example:

SBuf buf;
buf.reserveCapacity(...);
while (there is more data and there is buffer space) {
buf.append(data);
SBuf token = nextToken(buf);
give token to somebody else
}

In the above example, many tokens and a single buf share the same
underlying buffer. A straight cow()-based implementation of
reserveSpace() called by append() would result in one cow() per iteration.

My understanding is that your current optimization in reserveSpace()
will remove all cow()s in this case because while tokens and buf share
buffer content, they do not share buffer space and so no CoW is really
needed when appending to a shared buffer.

Is my understanding correct?

Do we want this optimization? I think we do.


 The above case is already handled by rawSpace(), but now I am confused
 why rawSpace() is implemented using unconditional cow() while
 reserveSpace() appears to be optimizing something. That optimization
 seems to be the key here. Perhaps rawSpace() should be deleted and
 reserveCapacity() adjusted to use the same optimization?
   reserveSpace(n) ought to be reserveCapacity(content size + n)
 
 Apart from this, I see the benefit of Amos' suggestion of having
 rawSpace also absorb the function of reserveCapacity.
 This also covers you observation that both reserveSpace and rawSpace
 are maybe not needed, one can do the job of both.


*If the optimization discussed above works*, then we should consider
having two or tree methods:

1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.

1b. Reserve buffer space. Ensure exclusive buffer ownership.

2. Reserve N space bytes for the caller to append to. No guarantees
regarding buffer ownership are provided.

1b is optional but nice to have and costs only one line of code. I will
focus on (1a) vs. (2) below.

As you can see, (1a) and (2) semantics are very different. In some
contexts, the difference may not be important at all. In some, it may be
critical for good performance (if the optimization discussed above works).

I see only one potential problem with this approach: We cannot be
certain that some code is not going to first get space twice and only
then use the obtained space twice, as illustrated below:

char *buf1 = bigBuffer.rawSpace(...);
char *buf2 = bigBuffer.rawSpace(...);
...
read into buf1;
read into buf2; // corrupts buf1 content

Needless to say that the above buggy code would probably be spread over
several functions so that the bug will not be clearly visible.

I think we can try to combat the above by requiring that the pointer
returned by the rawSpace() call is used immediately.


HTH,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Kinkie
On Sat, Jul 27, 2013 at 6:39 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 07/27/2013 06:00 AM, Kinkie wrote:
 Here is the use case I am thinking about:

 sbuf.reserveSpace(ioSize);
 bytesRead = read(sbuf.rawSpace(), ioSize);
 sbuf.forceSize(sbuf.size() + bytesRead);


 The other use case is:
 sbuf.reserveCapacity(newCapacity);
 sbuf.append(something).append(somethingelse).append(verylongstring).append(whoknows).

 Sure, that is another use case for another method.


 This could be used e.g. to instantiate error pages from their
 templates (pseudo-code):

 Sbuf template(.);
 Sbuf errorpage;
 errorpage.reserveCapacity(template * reasonable_scaling_factor);
 SBufTokenizer t(template);
 while (SBuf parsed=t.nextToken(%)) {
errorpage.append(parsed).append(handleCode(t.peek());
 }
 errorpage.append(t.whatever_remains());

 In cases such as this it may not be needed to cow(), so why do it?

 AFAICT, your example immediately above will not cause CoW, with or
 without the current optimization in reserveSpace() because errorpage
 SBuf is not shared with other users. There will be no reason to Copy
 errorpage buffer.

 The following sketch may be a better example:

 SBuf buf;
 buf.reserveCapacity(...);
 while (there is more data and there is buffer space) {
 buf.append(data);
 SBuf token = nextToken(buf);
 give token to somebody else
 }

 In the above example, many tokens and a single buf share the same
 underlying buffer. A straight cow()-based implementation of
 reserveSpace() called by append() would result in one cow() per iteration.

 My understanding is that your current optimization in reserveSpace()
 will remove all cow()s in this case because while tokens and buf share
 buffer content, they do not share buffer space and so no CoW is really
 needed when appending to a shared buffer.

 Is my understanding correct?

Yes; that's exactly the point. Append doesn't require cow. It only
does when there is not enough unused space in the memblob OR the SBuf
being appended to is not at the end of the used shared storage ( sorry
for the multiple negatives)

 Do we want this optimization? I think we do.

Good :) Then we agree.

 The above case is already handled by rawSpace(), but now I am confused
 why rawSpace() is implemented using unconditional cow() while
 reserveSpace() appears to be optimizing something. That optimization
 seems to be the key here. Perhaps rawSpace() should be deleted and
 reserveCapacity() adjusted to use the same optimization?
   reserveSpace(n) ought to be reserveCapacity(content size + n)

 Apart from this, I see the benefit of Amos' suggestion of having
 rawSpace also absorb the function of reserveCapacity.
 This also covers you observation that both reserveSpace and rawSpace
 are maybe not needed, one can do the job of both.


 *If the optimization discussed above works*, then we should consider
 having two or tree methods:

 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.

 1b. Reserve buffer space. Ensure exclusive buffer ownership.

 2. Reserve N space bytes for the caller to append to. No guarantees
 regarding buffer ownership are provided.

 1b is optional but nice to have and costs only one line of code. I will
 focus on (1a) vs. (2) below.

 As you can see, (1a) and (2) semantics are very different. In some
 contexts, the difference may not be important at all. In some, it may be
 critical for good performance (if the optimization discussed above works).

 I see only one potential problem with this approach: We cannot be
 certain that some code is not going to first get space twice and only
 then use the obtained space twice, as illustrated below:

 char *buf1 = bigBuffer.rawSpace(...);
 char *buf2 = bigBuffer.rawSpace(...);
 ...
 read into buf1;
 read into buf2; // corrupts buf1 content

 Needless to say that the above buggy code would probably be spread over
 several functions so that the bug will not be clearly visible.

 I think we can try to combat the above by requiring that the pointer
 returned by the rawSpace() call is used immediately.

Yes. It is the caller's risk and responsibility not to mess with the
raw storage, and it should be avoided whenever possible. Several best
practices are going to be helpful avoid issues, but they can't be
technically enforced, only documented and reviewed.
Given this, I expect that the common case is going to be 2. This is
why I like Amos's suggestion more (two methods for the append-friendly
optimization, and one for accessing the internals).


-- 
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Alex Rousskov
On 07/27/2013 12:00 PM, Kinkie wrote:
  1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.
 
  1b. Reserve buffer space. Ensure exclusive buffer ownership.
 
  2. Reserve N space bytes for the caller to append to. No guarantees
  regarding buffer ownership are provided.

 I like Amos's suggestion more (two methods for the append-friendly
 optimization, and one for accessing the internals).


There should be no need for two methods for append-friendly optimization
because append only cares about the space size, not total size. This is
method #2 in the above list.

Methods #1a and #1b ensure exclusive ownership. They do not return a
pointer. They must not contain that append-friendly optimization!

AFAICT, Amos explanation of what I was getting at is correct. His email
did not discuss the optimization trick which is one more reason why we
want rawSpace(), but I think we both suggest the same set of three methods:

// 1a.
void reserveCapacity(size_type minCap) {
cow(minCap);
}

// 1b.
void reserveSpace(size_type minSpace) {
// check that the sum below does not exceed size_type
Must(size() = size_type's limit - minSpace);
reserveCapacity(size() + minSpace);
}

// 2.
char *rawSpace(size_type minSpace) {
... your optimization goes here ...
return pointer to space;
}

Needless to say, it is critical to document these correctly and clearly
so that we do not have to come back to this discussion again.


HTH,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Kinkie
On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 07/27/2013 12:00 PM, Kinkie wrote:
  1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.
 
  1b. Reserve buffer space. Ensure exclusive buffer ownership.
 
  2. Reserve N space bytes for the caller to append to. No guarantees
  regarding buffer ownership are provided.

 I like Amos's suggestion more (two methods for the append-friendly
 optimization, and one for accessing the internals).


 There should be no need for two methods for append-friendly optimization
 because append only cares about the space size, not total size. This is
 method #2 in the above list.

 Methods #1a and #1b ensure exclusive ownership. They do not return a
 pointer. They must not contain that append-friendly optimization!

 AFAICT, Amos explanation of what I was getting at is correct. His email
 did not discuss the optimization trick which is one more reason why we
 want rawSpace(), but I think we both suggest the same set of three methods:

 // 1a.
 void reserveCapacity(size_type minCap) {
 cow(minCap);
 }

 // 1b.
 void reserveSpace(size_type minSpace) {
 // check that the sum below does not exceed size_type
 Must(size() = size_type's limit - minSpace);
 reserveCapacity(size() + minSpace);
 }

 // 2.
 char *rawSpace(size_type minSpace) {
 ... your optimization goes here ...
 return pointer to space;
 }

 Needless to say, it is critical to document these correctly and clearly
 so that we do not have to come back to this discussion again.

I don't understand how this helps with the append() use-case.
The most likely useage pattern would be 2, and that forces a
potentially unneeded cow()..

IMO the three methods would be:

void reserveCapacity(size_type minCap) {
  if (needed())
cow();
}

void reserveSpace(size_type minSpace) {
  reserveCapacity(minSpace+length());
}

char *rawSpace(size_type minSpace) {
  cow(minSpace+length());
  return *freespace;
}

-- 
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Alex Rousskov
On 07/27/2013 01:03 PM, Kinkie wrote:
 On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote:
 On 07/27/2013 12:00 PM, Kinkie wrote:
 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.

 1b. Reserve buffer space. Ensure exclusive buffer ownership.

 2. Reserve N space bytes for the caller to append to. No guarantees
 regarding buffer ownership are provided.

 // 1a.
 void reserveCapacity(size_type minCap) {
 cow(minCap);
 }

 // 1b.
 void reserveSpace(size_type minSpace) {
 // check that the sum below does not exceed size_type
 Must(size() = size_type's limit - minSpace);
 reserveCapacity(size() + minSpace);
 }

 // 2.
 char *rawSpace(size_type minSpace) {
 ... your optimization goes here ...
 return pointer to space;
 }

 I don't understand how this helps with the append() use-case.

append() should call rawSpace(). rawSpace() is optimized.


 The most likely useage pattern would be 2, and that forces a
 potentially unneeded cow()..

rawSpace() does not force that because your optimization should be
placed there.


 IMO the three methods would be:
 
 void reserveCapacity(size_type minCap) {
   if (needed())
 cow();
 }

I assume that cow() already implements the if needed part.


 void reserveSpace(size_type minSpace) {
   reserveCapacity(minSpace+length());
 }

Yes, plus the size_type overflow check.


 char *rawSpace(size_type minSpace) {
   cow(minSpace+length());
   return *freespace;
 }

No, this needs your no-cow optimization instead. That optimization is
currently in SBuf::reserveSpace() but should be in rawSpace().


HTH,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Kinkie
On Sat, Jul 27, 2013 at 10:32 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 07/27/2013 01:03 PM, Kinkie wrote:
 On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote:
 On 07/27/2013 12:00 PM, Kinkie wrote:
 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.

 1b. Reserve buffer space. Ensure exclusive buffer ownership.

 2. Reserve N space bytes for the caller to append to. No guarantees
 regarding buffer ownership are provided.

 // 1a.
 void reserveCapacity(size_type minCap) {
 cow(minCap);
 }

 // 1b.
 void reserveSpace(size_type minSpace) {
 // check that the sum below does not exceed size_type
 Must(size() = size_type's limit - minSpace);
 reserveCapacity(size() + minSpace);
 }

 // 2.
 char *rawSpace(size_type minSpace) {
 ... your optimization goes here ...
 return pointer to space;
 }

 I don't understand how this helps with the append() use-case.

 append() should call rawSpace(). rawSpace() is optimized.

Current call chain (and functions for this purpose, outlined):
append (various flavors) calls
  lowAppend which calls
reserveSpace which
  checks if we're at the tail of the used portion of the store
(via store-canAppend())
  reAlloc() if needed // reAlloc is the low-level, unconditional
part of cow()
  (back to lowAppend): tell the MemBlob to append the user-supplied
buffer and adjust internal data


(typing as I think) using rawSpace as you suggest would shuffle the
responsibility of copying data into the MemBlob out of MemBlob itself
- a layering breakage IMO
cow() has a different contract from reserveSpace: it doesn't check if
the MemBlob is append-able. It promises to reAlloc unless 1. there is
one sole handle to the MemBlob and 2. there is enough space in the
Blob for whatever the user requests to add.
I'm not saying that the same results can't be achieved with the
approach you suggest: it's just designed differently, IMO simply with
a stronger layering and different contracts.

 The most likely useage pattern would be 2, and that forces a
 potentially unneeded cow()..

 rawSpace() does not force that because your optimization should be
 placed there.

Yes.

 IMO the three methods would be:

 void reserveCapacity(size_type minCap) {
   if (needed())
 cow();
 }

 I assume that cow() already implements the if needed part.

It makes a different promise, see above.

   reserveCapacity(minSpace+length());
 }

 Yes, plus the size_type overflow check.

Sure.

 char *rawSpace(size_type minSpace) {
   cow(minSpace+length());
   return *freespace;
 }

 No, this needs your no-cow optimization instead. That optimization is
 currently in SBuf::reserveSpace() but should be in rawSpace().

I now get it. I fail to see how this is better than what's currently
done, however :(

-- 
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-27 Thread Alex Rousskov
On 07/27/2013 02:57 PM, Kinkie wrote:
 On Sat, Jul 27, 2013 at 10:32 PM, Alex Rousskov
 rouss...@measurement-factory.com wrote:
 On 07/27/2013 01:03 PM, Kinkie wrote:
 On Sat, Jul 27, 2013 at 8:31 PM, Alex Rousskov wrote:
 On 07/27/2013 12:00 PM, Kinkie wrote:
 1a. Reserve total buffer capacity. Ensure exclusive buffer ownership.

 1b. Reserve buffer space. Ensure exclusive buffer ownership.

 2. Reserve N space bytes for the caller to append to. No guarantees
 regarding buffer ownership are provided.

 // 1a.
 void reserveCapacity(size_type minCap) {
 cow(minCap);
 }

 // 1b.
 void reserveSpace(size_type minSpace) {
 // check that the sum below does not exceed size_type
 Must(size() = size_type's limit - minSpace);
 reserveCapacity(size() + minSpace);
 }

 // 2.
 char *rawSpace(size_type minSpace) {
 ... your optimization goes here ...
 return pointer to space;
 }

 Current call chain (and functions for this purpose, outlined):
 append (various flavors) calls
   lowAppend which calls
 reserveSpace which
   checks if we're at the tail of the used portion of the store
 (via store-canAppend())
   reAlloc() if needed // reAlloc is the low-level, unconditional
 part of cow()
   (back to lowAppend): tell the MemBlob to append the user-supplied
 buffer and adjust internal data

Yes, I know. The current code is inconsistent/confusing because
reserveSpace does not guarantee exclusive buffer ownership but
reserveCapacity does. In the fixed design, that inconsistency is removed.


 (typing as I think) using rawSpace as you suggest would shuffle the
 responsibility of copying data into the MemBlob out of MemBlob itself
 - a layering breakage IMO

The change is essentially moving one method body into another. It does
not introduce any new layer breakage AFAICT. Cow() continues to copy
data if needed, as it does now, via reAlloc().


 cow() has a different contract from reserveSpace: 

Not sure why you are discussing reserveSpace(). It is just a convenience
wrapper for reserveCapacity().


 it doesn't check if the MemBlob is append-able.

I do not know what appendable buffer means exactly in this context,
but reserveCapacity() is not specific to append in the fixed design. It
is about taking control over the buffer and ensuring its minimal size. I
believe that is what your current cow() does as well.


 It promises to reAlloc unless 1. there is
 one sole handle to the MemBlob and 2. there is enough space in the
 Blob for whatever the user requests to add.

Exactly. This is what reserveCapacity() needs.


 I'm not saying that the same results can't be achieved with the
 approach you suggest: it's just designed differently, IMO simply with
 a stronger layering and different contracts.

The current design is inconsistent (similar-sounding methods behave very
differently) and more dangerous (reserveSpace() does not return volatile
raw space to the caller, increasing the changes that the caller will do
something bad before extracting that [now stale] space from the SBuf).
The proposed design solves both problems.


 I now get it. I fail to see how this is better than what's currently
 done, however :(

If there is no difference to you, please change it because there is a
significant difference to me (which I am apparently failing to convey).


Thank you,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-26 Thread Alex Rousskov
On 07/26/2013 03:27 PM, Kinkie wrote:
 Resending due to MUA insisting on using HTML email..
 
 On Thu, Jul 18, 2013 at 1:06 AM, Alex Rousskov
 rouss...@measurement-factory.com wrote:

 On 07/04/2013 11:51 PM, Kinkie wrote:
 void
 SBuf::reserveCapacity(size_type minCapacity)
 {
 Must(0 = minCapacity  minCapacity = maxSize);
 reserveSpace(minCapacity-length());
 }

 This does not look right. For example, if minCapacity is smaller than
 length(), we should do nothing, but will throw instead. I think this
 method should look something like this:

 Must(0 = minCapacity  minCapacity = maxSize);
 cow(minCapacity);


 if (store_-canAppend(off_+len_, minSpace)) {
 debugs(24, 7, not growing);
 return;
 }

 AFAICT, reserveSpace() should be implemented using
 something like this:

 Must(0 = minSpace  minSpace = maxSize - length());
 reserveCapacity(length() + minSpace);


 The comment is badly worded but the code is correct:

 I am not sure which code you are referring to here, but I do not think
 it is correct for SBuf::reserveCapacity(minCapacity) to throw when
 minCapacity is smaller than length().
 
 
 Decoupled the methods, but the other way around:
 reserveCapacity(totalCapacity) ensures unique ownership,
 reserveSpace(capacityToAdd) is an optimization to prime the SBuf to
 receive data.
 The upper bound in that Must() clause is actually redundant, as in
 case of a needed reallocation
 reserveSpace() - cow() - reAlloc() which throws a
 SBufTooBigException if size is too big.
 As a consequence, I'm removing the upper bound check.
 Once we change size_type to be unsigned, the lower bound check will be
 redundant too.
 



 A decision needs to be made if these two methods have different
 purposes or are to be simply different expressions of the same
 underlying concept (as it is now).

 Sorry, this question is too abstract for me to answer it -- it feels
 like both statements are true: The methods have different purpose but
 are manipulating the same underlying concept of available buffer area.
 One method deals with the total size of the buffer area, while the other
 is concerned with the size of the yet unused buffer area.
 
 
 As they are (were) one of the two is redundant, an in fact it was two
 lines of code.
 With the new semantics, reserveSpace is meant to announce to the SBuf
 I'm planning to append lots of data, please be prepared, while
 reserveCapacity means I'm going to meddle with your internal storage,
 and I'm going to need this space. Get ready.

I do not see the difference because lots of data is going to be
appended to your internal storage so both are meddling with that
storage. To me, the difference was simply in argument math: Sometimes it
is easier to specify the total capacity, sometimes the available space.

The way the two functions are implemented now the difference is more
than just math. One does not guarantee exclusive ownership and the other
one does. Your semantics description above does not seem to convey that
key difference.

Here is the use case I am thinking about:

sbuf.reserveSpace(ioSize);
bytesRead = read(sbuf.rawSpace(), ioSize);
sbuf.forceSize(sbuf.size() + bytesRead);

The above case is already handled by rawSpace(), but now I am confused
why rawSpace() is implemented using unconditional cow() while
reserveSpace() appears to be optimizing something. That optimization
seems to be the key here. Perhaps rawSpace() should be deleted and
reserveCapacity() adjusted to use the same optimization?

  reserveSpace(n) ought to be reserveCapacity(content size + n)


To summarize,

1) We need to agree whether the optimization currently residing in
reserveSpace() should apply to reserveCapacity(). This is the key
difference between the two methods. Documentation and code should be
adjusted accordingly.

2) We probably should not have both reserveSpace() and rawSpace().


HTH,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-17 Thread Alex Rousskov
On 07/04/2013 11:51 PM, Kinkie wrote:
 void
  SBuf::reserveCapacity(size_type minCapacity)
  {
  Must(0 = minCapacity  minCapacity = maxSize);
  reserveSpace(minCapacity-length());
  }
 
  This does not look right. For example, if minCapacity is smaller than
  length(), we should do nothing, but will throw instead. I think this
  method should look something like this:
 
  Must(0 = minCapacity  minCapacity = maxSize);
  cow(minCapacity);


  if (store_-canAppend(off_+len_, minSpace)) {
  debugs(24, 7, not growing);
  return;
  }

  AFAICT, reserveSpace() should be implemented using
  something like this:
 
  Must(0 = minSpace  minSpace = maxSize - length());
  reserveCapacity(length() + minSpace);


 The comment is badly worded but the code is correct:

I am not sure which code you are referring to here, but I do not think
it is correct for SBuf::reserveCapacity(minCapacity) to throw when
minCapacity is smaller than length().


 A decision needs to be made if these two methods have different
 purposes or are to be simply different expressions of the same
 underlying concept (as it is now).

Sorry, this question is too abstract for me to answer it -- it feels
like both statements are true: The methods have different purpose but
are manipulating the same underlying concept of available buffer area.
One method deals with the total size of the buffer area, while the other
is concerned with the size of the yet unused buffer area.


 IMO, thinking about it, it does make sense that reserveSpace also
 guarantees that after the call the MemBlob will be exclusively owned
 by this SBuf (aka, blindly cow()), while reserveCapacity can mostly be
 meant as an optimization, to hint the SBuf memory management system
 that an append operation of a known size is about to happen, so that
 only one cow is performed.
 What do you think?

I think that both methods should be adjusted, possibly as sketched
above. They should not throw when the required space or capacity is
available. The documentation should tell users that both reserve*()
methods ensure single-ownership of the buffer by the caller.


HTH,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-07-17 Thread Amos Jeffries

On 9/07/2013 8:16 p.m., Kinkie wrote:

Also, I suggest splitting this into two methods, one with a required
(first?) SBufCaseSensitive parameter and one without it. This will allow
callers to specify n without specifying isCaseSensitive and vice versa.
The shorter, inlined method will simply call the longer one with
caseSensitive, of course, so the implementation will not require more
code or performance overhead.

Ok. I'm calling the two shorthand versions cmp and casecmp respectively
(please let me know if you'd prefer the naming-convention compliant
caseCmp instead)

Please. We need semantic similarity to std::string not symbolic.

I take this as a please yes.


Correct. Sorry for not being clear.






Code is as before at lp:~squid/squid/stringng-cherrypick, ready and in
sync with trunk.
test-suite runs OK on eu and ubuntu raring.

I'll be on holiday in a couple of days for a couple of weeks.

In the meantime, I've found quite a few issues with \0-cleanliness and
parameter auto-promotion; I've also noticed that I've messed up the
commits crossing things with an older branch (where things were
silently failing as it didn't contain Alex' extensive unit tests).,

I'll have to tidy things up, current status is not done :(


How long is that going to take do you think?
Anything I can do to it while you are on holiday?


Amos



Re: [RFC] Time to talk about StringNG merge again?

2013-07-09 Thread Kinkie
 That comment is probably stale because canAppend() is currently not
 related to ownership of the buffer: It will happily return true for
 shared buffers. AFAICT, reserveSpace() should be implemented using
 something like this:

  Must(0 = minSpace  minSpace = maxSize - length());
  reserveCapacity(length() + minSpace);

 The comment is badly worded but the code is correct: canAppend() tells
 us if the end of our buffer is at the end of the used portion of the
 MemBlob, and there is enough space in the unused MemBlob portion to
 contain the data we plan to use. If the buffer is shared, this
 guarantee may become stale at the next append operation to any SBuf
 sharing the same MemBlob.

 A decision needs to be made if these two methods have different
 purposes or are to be simply different expressions of the same
 underlying concept (as it is now).
 IMO, thinking about it, it does make sense that reserveSpace also
 guarantees that after the call the MemBlob will be exclusively owned
 by this SBuf (aka, blindly cow()), while reserveCapacity can mostly be
 meant as an optimization, to hint the SBuf memory management system
 that an append operation of a known size is about to happen, so that
 only one cow is performed.
 What do you think?


 Sounds good.



 The above two changes imply that reserve*() methods ensure
 single-ownership of the buffer by the caller. This is not something
 documentation of those methods currently guarantees, but I think it is
 logical to assume that anybody calling one of these two methods wants to
 modify the string and, hence, will need exclusive control over that
 string. I cannot think of any reason to call those methods otherwise.

 See above.

  typedef int32_t size_type;

 Would it be better to make this unsigned? I do not think string sizes
 can go negative. std::string is using an unsigned type, right? FWIW,
 whenever I try to use MemBlob that has a signed size_type as well, I am
 forced to do casts that would not exist otherwise. I think both types
 should be changed to reflect what they are representing (a size of a
 buffer).

 I have the same doubt. I'm just a bit afraid to change this late. Will
 try to do as an isolated change. So far this is NOT done.

 On another note, would it be better to use MemBlob::size_type instead of
 int32_t?

 currently they're both int32_t. I believe it's more readable with the
 extra level of indirection, at the cost of having to manually keep
 them in sync.
 Added a comment to highlight this fact.


 So long as SBuf is backed by MemBlob defining SBuf::size_type to be
 MemBlob::size_type makes more sense. One who needs to know can always look
 up both (probably should). The compiler will tell about signedness errors in
 new/modified code very quickly.

Ok, changed to reference MemBlob::size_type instead.

 The problem will be converting from signed to unsigned at this stage. I
 think get the rest to a milestone we can agree on merging before attempting
 please. Then if everthing goes sour in the change we can still revert and
 merge with a TODO about perfecting the polish.
 When you do the change ensure you have a latest GCC or clang you can use,
 they will be far more pedantic about signedness comparisons and iterator
 ranges etc where the issues are most likey to occur.

Ok.
Most of my development is done on Ubuntu Raring, that should be fresh enough.


 Also, I suggest splitting this into two methods, one with a required
 (first?) SBufCaseSensitive parameter and one without it. This will allow
 callers to specify n without specifying isCaseSensitive and vice versa.
 The shorter, inlined method will simply call the longer one with
 caseSensitive, of course, so the implementation will not require more
 code or performance overhead.

 Ok. I'm calling the two shorthand versions cmp and casecmp respectively
 (please let me know if you'd prefer the naming-convention compliant
 caseCmp instead)

 Please. We need semantic similarity to std::string not symbolic.

I take this as a please yes.



 Code is as before at lp:~squid/squid/stringng-cherrypick, ready and in
 sync with trunk.
 test-suite runs OK on eu and ubuntu raring.

 I'll be on holiday in a couple of days for a couple of weeks.

In the meantime, I've found quite a few issues with \0-cleanliness and
parameter auto-promotion; I've also noticed that I've messed up the
commits crossing things with an older branch (where things were
silently failing as it didn't contain Alex' extensive unit tests).,

I'll have to tidy things up, current status is not done :(

--
/kinkie


Re: [RFC] Time to talk about StringNG merge again?

2013-07-05 Thread Amos Jeffries

Snipped out the Done bits.

On 5/07/2013 5:51 p.m., Kinkie wrote:

On Tue, Apr 16, 2013 at 12:49 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:

Hi Kinkie,

 Here is my review of
https://code.launchpad.net/~squid/squid/stringng-cherrypick r12753. I
found approximately three bugs. I am also concerned about size_type
type, after starting to use MemBlob which has the same problem. The rest
is polishing.

Ok, thanks. I have some doubts about size_type as well, I am a bit
worried about touching it at this stage in development, but it may be
better now than never.


 if (InitialStore == NULL) {
 static char lowPrototype[] = ;
 InitialStore = new MemBlob(lowPrototype, 0);
 }

If possible, please use new MemBlob(0) and remove lowPrototype.




void
SBuf::reserveCapacity(size_type minCapacity)
{
 Must(0 = minCapacity  minCapacity = maxSize);
 reserveSpace(minCapacity-length());
}

This does not look right. For example, if minCapacity is smaller than
length(), we should do nothing, but will throw instead. I think this
method should look something like this:

 Must(0 = minCapacity  minCapacity = maxSize);
 cow(minCapacity);



 // we're not concerned about RefCounts here,
 // the store knows the last-used portion. If
 // it's available, we're effectively claiming ownership
 // of it. If it's not, we need to go away (realloc)
 if (store_-canAppend(off_+len_, minSpace)) {
 debugs(24, 7, not growing);
 return;
 }

That comment is probably stale because canAppend() is currently not
related to ownership of the buffer: It will happily return true for
shared buffers. AFAICT, reserveSpace() should be implemented using
something like this:

 Must(0 = minSpace  minSpace = maxSize - length());
 reserveCapacity(length() + minSpace);

The comment is badly worded but the code is correct: canAppend() tells
us if the end of our buffer is at the end of the used portion of the
MemBlob, and there is enough space in the unused MemBlob portion to
contain the data we plan to use. If the buffer is shared, this
guarantee may become stale at the next append operation to any SBuf
sharing the same MemBlob.

A decision needs to be made if these two methods have different
purposes or are to be simply different expressions of the same
underlying concept (as it is now).
IMO, thinking about it, it does make sense that reserveSpace also
guarantees that after the call the MemBlob will be exclusively owned
by this SBuf (aka, blindly cow()), while reserveCapacity can mostly be
meant as an optimization, to hint the SBuf memory management system
that an append operation of a known size is about to happen, so that
only one cow is performed.
What do you think?


Sounds good.




The above two changes imply that reserve*() methods ensure
single-ownership of the buffer by the caller. This is not something
documentation of those methods currently guarantees, but I think it is
logical to assume that anybody calling one of these two methods wants to
modify the string and, hence, will need exclusive control over that
string. I cannot think of any reason to call those methods otherwise.

See above.


 typedef int32_t size_type;

Would it be better to make this unsigned? I do not think string sizes
can go negative. std::string is using an unsigned type, right? FWIW,
whenever I try to use MemBlob that has a signed size_type as well, I am
forced to do casts that would not exist otherwise. I think both types
should be changed to reflect what they are representing (a size of a
buffer).

I have the same doubt. I'm just a bit afraid to change this late. Will
try to do as an isolated change. So far this is NOT done.


On another note, would it be better to use MemBlob::size_type instead of
int32_t?

currently they're both int32_t. I believe it's more readable with the
extra level of indirection, at the cost of having to manually keep
them in sync.
Added a comment to highlight this fact.


So long as SBuf is backed by MemBlob defining SBuf::size_type to be 
MemBlob::size_type makes more sense. One who needs to know can always 
look up both (probably should). The compiler will tell about signedness 
errors in new/modified code very quickly.


The problem will be converting from signed to unsigned at this stage. I 
think get the rest to a milestone we can agree on merging before 
attempting please. Then if everthing goes sour in the change we can 
still revert and merge with a TODO about perfecting the polish.
When you do the change ensure you have a latest GCC or clang you can 
use, they will be far more pedantic about signedness comparisons and 
iterator ranges etc where the issues are most likey to occur.



SBuf::append(const std::string str, SBuf::size_type pos, SBuf::size_type n)
{
 return append(str.data(), pos, n); //bounds checked in append()
}

This will break when n is npos because str.data() is not 0-terminated.
The low-lever c-string append you call 

Re: [RFC] Time to talk about StringNG merge again?

2013-07-04 Thread Kinkie
On Tue, Apr 16, 2013 at 12:49 AM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 Hi Kinkie,

 Here is my review of
 https://code.launchpad.net/~squid/squid/stringng-cherrypick r12753. I
 found approximately three bugs. I am also concerned about size_type
 type, after starting to use MemBlob which has the same problem. The rest
 is polishing.

Ok, thanks. I have some doubts about size_type as well, I am a bit
worried about touching it at this stage in development, but it may be
better now than never.

 fgrep npos SBuf.cc
 ...
 if (pos == npos || pos  length() || n == 0) {
 if (n == npos || (pos+n)  length())
 if (startPos == SBuf::npos)
 return SBuf::npos;

 For consistency sake inside SBUf methods, please use either npos
 (preferred because it is shorter?) or SBuf::npos.

ok, npos it is.

 if (InitialStore == NULL) {
 static char lowPrototype[] = ;
 InitialStore = new MemBlob(lowPrototype, 0);
 }

 If possible, please use new MemBlob(0) and remove lowPrototype.


 static MemBlob::Pointer InitialStore = NULL;
 if (InitialStore == NULL) {
 static char lowPrototype[] = ;
 InitialStore = new MemBlob(lowPrototype, 0);
 }

 I believe the above can be written simply as

 static MemBlob::Pointer InitialStore = new MemBlob(0);

Yes, it can. Done.

 /** Request to resize the SBuf's store
  *
  * After this method is called, the SBuf is guaranteed to have at least
  * minCapacity bytes of total space, including the currently-used portion

 s/total space/total buffer size/

 we use the term space to denote free/usable space. See reserveSpace(),
 for example.

Ok.

 void
 SBuf::reserveCapacity(size_type minCapacity)
 {
 Must(0 = minCapacity  minCapacity = maxSize);
 reserveSpace(minCapacity-length());
 }

 This does not look right. For example, if minCapacity is smaller than
 length(), we should do nothing, but will throw instead. I think this
 method should look something like this:

 Must(0 = minCapacity  minCapacity = maxSize);
 cow(minCapacity);


 // we're not concerned about RefCounts here,
 // the store knows the last-used portion. If
 // it's available, we're effectively claiming ownership
 // of it. If it's not, we need to go away (realloc)
 if (store_-canAppend(off_+len_, minSpace)) {
 debugs(24, 7, not growing);
 return;
 }

 That comment is probably stale because canAppend() is currently not
 related to ownership of the buffer: It will happily return true for
 shared buffers. AFAICT, reserveSpace() should be implemented using
 something like this:

 Must(0 = minSpace  minSpace = maxSize - length());
 reserveCapacity(length() + minSpace);

The comment is badly worded but the code is correct: canAppend() tells
us if the end of our buffer is at the end of the used portion of the
MemBlob, and there is enough space in the unused MemBlob portion to
contain the data we plan to use. If the buffer is shared, this
guarantee may become stale at the next append operation to any SBuf
sharing the same MemBlob.

A decision needs to be made if these two methods have different
purposes or are to be simply different expressions of the same
underlying concept (as it is now).
IMO, thinking about it, it does make sense that reserveSpace also
guarantees that after the call the MemBlob will be exclusively owned
by this SBuf (aka, blindly cow()), while reserveCapacity can mostly be
meant as an optimization, to hint the SBuf memory management system
that an append operation of a known size is about to happen, so that
only one cow is performed.
What do you think?

 The above two changes imply that reserve*() methods ensure
 single-ownership of the buffer by the caller. This is not something
 documentation of those methods currently guarantees, but I think it is
 logical to assume that anybody calling one of these two methods wants to
 modify the string and, hence, will need exclusive control over that
 string. I cannot think of any reason to call those methods otherwise.

See above.

 typedef int32_t size_type;

 Would it be better to make this unsigned? I do not think string sizes
 can go negative. std::string is using an unsigned type, right? FWIW,
 whenever I try to use MemBlob that has a signed size_type as well, I am
 forced to do casts that would not exist otherwise. I think both types
 should be changed to reflect what they are representing (a size of a
 buffer).

I have the same doubt. I'm just a bit afraid to change this late. Will
try to do as an isolated change. So far this is NOT done.

 On another note, would it be better to use MemBlob::size_type instead of
 int32_t?

currently they're both int32_t. I believe it's more readable with the
extra level of indirection, at the cost of having to manually keep
them in sync.
Added a comment to highlight this fact.

 u_int64_t alloc; ///number of calls to SBuf constructors
 u_int64_t allocCopy; 

Re: [RFC] Time to talk about StringNG merge again?

2013-04-15 Thread Alex Rousskov
Hi Kinkie,

Here is my review of
https://code.launchpad.net/~squid/squid/stringng-cherrypick r12753. I
found approximately three bugs. I am also concerned about size_type
type, after starting to use MemBlob which has the same problem. The rest
is polishing.


 fgrep npos SBuf.cc
...
 if (pos == npos || pos  length() || n == 0) {
 if (n == npos || (pos+n)  length())
 if (startPos == SBuf::npos)
 return SBuf::npos;

For consistency sake inside SBUf methods, please use either npos
(preferred because it is shorter?) or SBuf::npos.


 if (InitialStore == NULL) {
 static char lowPrototype[] = ;
 InitialStore = new MemBlob(lowPrototype, 0);
 }

If possible, please use new MemBlob(0) and remove lowPrototype.


 static MemBlob::Pointer InitialStore = NULL;
 if (InitialStore == NULL) {
 static char lowPrototype[] = ;
 InitialStore = new MemBlob(lowPrototype, 0);
 }

I believe the above can be written simply as

static MemBlob::Pointer InitialStore = new MemBlob(0);


 /** Request to resize the SBuf's store
  *
  * After this method is called, the SBuf is guaranteed to have at least
  * minCapacity bytes of total space, including the currently-used portion

s/total space/total buffer size/

we use the term space to denote free/usable space. See reserveSpace(),
for example.


 void
 SBuf::reserveCapacity(size_type minCapacity)
 {
 Must(0 = minCapacity  minCapacity = maxSize);
 reserveSpace(minCapacity-length());
 }

This does not look right. For example, if minCapacity is smaller than
length(), we should do nothing, but will throw instead. I think this
method should look something like this:

Must(0 = minCapacity  minCapacity = maxSize);
cow(minCapacity);


 // we're not concerned about RefCounts here,
 // the store knows the last-used portion. If
 // it's available, we're effectively claiming ownership
 // of it. If it's not, we need to go away (realloc)
 if (store_-canAppend(off_+len_, minSpace)) {
 debugs(24, 7, not growing);
 return;
 }

That comment is probably stale because canAppend() is currently not
related to ownership of the buffer: It will happily return true for
shared buffers. AFAICT, reserveSpace() should be implemented using
something like this:

Must(0 = minSpace  minSpace = maxSize - length());
reserveCapacity(length() + minSpace);


The above two changes imply that reserve*() methods ensure
single-ownership of the buffer by the caller. This is not something
documentation of those methods currently guarantees, but I think it is
logical to assume that anybody calling one of these two methods wants to
modify the string and, hence, will need exclusive control over that
string. I cannot think of any reason to call those methods otherwise.


 typedef int32_t size_type;

Would it be better to make this unsigned? I do not think string sizes
can go negative. std::string is using an unsigned type, right? FWIW,
whenever I try to use MemBlob that has a signed size_type as well, I am
forced to do casts that would not exist otherwise. I think both types
should be changed to reflect what they are representing (a size of a
buffer).

On another note, would it be better to use MemBlob::size_type instead of
int32_t?


 u_int64_t alloc; ///number of calls to SBuf constructors
 u_int64_t allocCopy; ///number of calls to SBuf copy-constructor
 u_int64_t allocFromString; ///number of copy-allocations from Strings
 u_int64_t allocFromCString; ///number of copy-allocations from c-strings
 u_int64_t assignFast; ///number of no-copy assignment operations

For consistency sake, please use uint64_t like the rest of Squid code does.


 /** Append operation for std::string
  *
  * Append the supplied std::string to the SBuf; extend storage as needed.
  *
  * \param string the std::string to be copied.
  * \param pos how many bytes to skip at the beginning of the c-string
  * \param n how many bytes to import into the SBuf. If it is SBuf::npos
  *  or unspecified, imports to end-of-cstring
  */
 SBuf append(const std::string str, size_type pos = 0, size_type n = 
 npos);

Please adjust the docs to talk about std::string, not c-string. str is
not 0-terminated and its end is determined by str.size().


 SBuf::append(const std::string str, SBuf::size_type pos, SBuf::size_type n)
 {
 return append(str.data(), pos, n); //bounds checked in append()
 }

This will break when n is npos because str.data() is not 0-terminated.
The low-lever c-string append you call here assumes that the string
parameter is 0-terminated if n is npos. Please add the corresponding
missing test case.



 //we can assume that we'll need to append at least strlen(fmt) bytes,
 reserveSpace(strlen(fmt)*2);

The comment does not match the code which assumes we will need to append
at least twice the format length (a 

Re: [RFC] Time to talk about StringNG merge again?

2013-04-01 Thread Alex Rousskov
-- This is not a review, but I wanted to clarify or emphasize some of
the points in Amos' review. Thank you Amos for working on this!


On 03/30/2013 02:32 AM, Amos Jeffries wrote:
 in src/SBuf.cc:
 
 * SBuf::chop() - the TODO optimization is not possible. We cannot be
 completely sure we are teh only user of the

I think it is possible, but the MemBlob cooperation would be required to
ensure correctness.


 * SBuf::operator []() documentation is incorrect. It *does* check access
 bounds, but returns \0 for any bytes outside instead of throwing exception.

The documentation is correct. The implementation is no longer correct.
Please revert to the recent code version that did none of the above. The
[] operator is meant for performance-sensitive loops and should not do
any checks. Even the stats collection should be removed from it IMO. We
provide at() that trades speed for safety. std::string does the same.



 in src/SBuf.cci

BTW, IIRC, we discussed moving all .cci stuff into .h. Or was that a
different project?


 * Sbuf::bufEnd() is lacking any checks about whether the space returned
 is safe to use, no cow() or whatever.

bufEnd() is a private convenience method. It is unsafe and non-cowing by
design. The callers should check and cow, as needed. If the
documentation does not reflect that, it should be fixed.


Thank you,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-04-01 Thread Kinkie
Hi, I forgot to mention another couple of changes:
- I added an explicit copy-constructor to import std::string
- I relaxed the behavior of the chop() method for out-of-bounds cases:
instead of throwing, it'll now ignore the part of the specified limits
that is out of bounds.


On Mon, Apr 1, 2013 at 7:43 PM, Kinkie gkin...@gmail.com wrote:
 My audit, I'm sure Alex will have more.

 +1 for the src/MemBlob.cc and src/MemBlob.h operator addition to go into
 trunk immediately. The rest has a lot of polishing still needed.

 Ok, I'll wait for Alex' review before merging.

 Design Question:  with all the stats being collected is it worth collecting
 an assign-to-self counter? to see if there are any obvious logic bugs in
 usage which we could avoid?

 I tried to measure most operations clusters, to  measure the
 effectiveness of the shared-storage approach during the transition
 phase.
 Once we have a clearer picture, we can gradually remove the
 meaningless statistics.

 in src/SBuf.cc:

  * can we consider making it a guideline to place constructor lists with the
 colon after the constructor definition and the initializer list one to a
 line following, with indentation? It makes it so much clearer to see what
 member is being initialized when there is any kind of complex parameters for
 it. NP: astyle enforces only the indentation.
   ie:
 +SBuf::SBuf() :
 +store_(GetStorePrototype()),
 +off_(0),
 +len_(0)
 +{

 It uses lots of vertical space, and unintialized members can get
 caught by Coverity anyway.
 This said, I have no strong opinion.

  ** along with this can we ensure that the constructor arguments are all on
 one line. If it breaks the 80-character line style preference we need to
 reconsider what its doing.
  - case in point there being the constructor for OutOfBoundsException.

 Well, OutOfBoundsException is 19 chars long; add the class name and
 half the vertical screen estate has been used.
 Again, no strong opinion. I don't think we should put constraints on
 length of class names etc. Besides, implementation classes are not
 really meant to be read often are they?

 * SBuf::vappendf()
  - comment //times 1.2 for instance...  does nt match the reserveSpace()
 call it is about. The second line of comment can be erased.

 Done.

  - according to the comments about Linux vsnprintf() return of -1 is
 guaranteed to violate the Must() condition when reserveSpace(-1*2) is
 called. The if-statement sz0 condition ensures it.

 Changed the two cases, now throws TextException in case of output
 error - I'd assume it's unrecoverable.

  - please avoid C-style casting in new code whenever possible. May occur
 elsewhere also.

 Ok.

  - the line +if (operator[](len_-1) == 0) { should be comparing
 against '\0' instead of 0 to ensure byte/word alignment changes by the
 compiler don't involve magic type castings.

 Ok, done.

 * SBuf::compare() please move the stats.compareSlow counting up top. Just
 under the Must() keeps it in line with the pther stats collections.

 Ok.

 * SBuf::startsWith() lacks the length debugs() statement used in
 operatror==() immediately below it.

 Added.


 * SBuf::copy() why not do:   const SBuf::size_type toexport = min(n,
 length());

 Done.

 * SBuf::forceSize() - we need to consider what happens (or already has
 happened) when a size greater than capacity is forced.

 Right.
 What about throwing an OutOfBoundsException? Overflow has likely happened.
 Not changed yet.

 * SBuf::chop() - the TODO optimization is not possible. We cannot be
 completely sure we are teh only user of the

 Removed the TODO. It's a very unlikely case anyway.

 * SBuf::rfind() - typo in weridness comment memrhr.

 Fixed.

 * Please move operator ()  to an inline function.

 Done (via .cci for now. I recall Alex was suggesting to get rid of
 .cci, will move to .h if I can find confirmation of that).

 * SBuf::reAlloc() comment needs to be doxygen format.

 Done.

 in SBuf.h

 * Debug.h is not needed here. Please move to SBuf.cci instead.
  - probably also SquidString.h can also be removed, but check that with a
 build test.

 Needed for String import/export functions.
 Hopefully String will just go in a relatively short time after merging.

 * several places saying * \note bounds is 0 = pos  length() are
 incorrect. SBuf::checkAccessBounds() tests for: 0 = pos = length()

 Then checkAccessBounds is wrong. Changed unit test accordingly.

 * s/minCapcity/minCapacity/

 Fixed.

 * SBuf::operator []() documentation is incorrect. It *does* check access
 bounds, but returns \0 for any bytes outside instead of throwing exception.

 Gah! Changed to match documentation.

 * SBuf::vappendf() documentation lacking empty line separate from earlier
 method declaration above.

 Added.

 * SBuf::chop()
  - documentation contains HTML tags, Please avoid that. 'i' and 'n' (quoted)
 is a better written style of emphasis.

 Fixed.

  - documentatin of n==0 case lacks the info that SBuf is 

Re: [RFC] Time to talk about StringNG merge again?

2013-04-01 Thread Alex Rousskov
On 04/01/2013 11:43 AM, Kinkie wrote:
 Attached is a bundle containing the changes from this review.

Will you commit that to your stringng branch on lp? Or should I patch
that branch before reviewing/testing instead?


Thank you,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-04-01 Thread Amos Jeffries

On 2/04/2013 6:57 a.m., Alex Rousskov wrote:

-- This is not a review, but I wanted to clarify or emphasize some of
the points in Amos' review. Thank you Amos for working on this!


On 03/30/2013 02:32 AM, Amos Jeffries wrote:

in src/SBuf.cc:

* SBuf::chop() - the TODO optimization is not possible. We cannot be
completely sure we are teh only user of the

I think it is possible, but the MemBlob cooperation would be required to
ensure correctness.



* SBuf::operator []() documentation is incorrect. It *does* check access
bounds, but returns \0 for any bytes outside instead of throwing exception.

The documentation is correct. The implementation is no longer correct.
Please revert to the recent code version that did none of the above. The
[] operator is meant for performance-sensitive loops and should not do
any checks. Even the stats collection should be removed from it IMO. We
provide at() that trades speed for safety. std::string does the same.




in src/SBuf.cci

BTW, IIRC, we discussed moving all .cci stuff into .h. Or was that a
different project?


We had. However since it was not done yet I reviewed in-place.

Kinkie; I have a little time in the next few hours to try my hand at 
this as a test of the shared-access branch.





* Sbuf::bufEnd() is lacking any checks about whether the space returned
is safe to use, no cow() or whatever.

bufEnd() is a private convenience method. It is unsafe and non-cowing by
design. The callers should check and cow, as needed. If the
documentation does not reflect that, it should be fixed.


Thank you,

Alex.





Re: [RFC] Time to talk about StringNG merge again?

2013-03-30 Thread Amos Jeffries

On 30/03/2013 7:05 a.m., Kinkie wrote:

Hi,
   it was easier than expected. Here's a merge bundle; it intentionally
forgets commit history, it can remain in the branch. This bundle
passes a full testsuite build on my kubuntu quantal.
It contains only what was audited so far:
- SBuf
- SBufExceptions
And I think it was audited, but I'm not sure:
- SBufStream

Left out but ready for audit are:
- cachemgr integration
- tokenizer

Should be ready but I'd like to doublecheck
- SBufList (intended to replace StrList)
- SBufUtils

Thanks!



My audit, I'm sure Alex will have more.

+1 for the src/MemBlob.cc and src/MemBlob.h operator addition to go into 
trunk immediately. The rest has a lot of polishing still needed.



Design Question:  with all the stats being collected is it worth 
collecting an assign-to-self counter? to see if there are any obvious 
logic bugs in usage which we could avoid?



in src/SBuf.cc:

 * can we consider making it a guideline to place constructor lists 
with the colon after the constructor definition and the initializer list 
one to a line following, with indentation? It makes it so much clearer 
to see what member is being initialized when there is any kind of 
complex parameters for it. NP: astyle enforces only the indentation.

  ie:
+SBuf::SBuf() :
+store_(GetStorePrototype()),
+off_(0),
+len_(0)
+{

 ** along with this can we ensure that the constructor arguments are 
all on one line. If it breaks the 80-character line style preference we 
need to reconsider what its doing.

 - case in point there being the constructor for OutOfBoundsException.


* SBuf::vappendf()
 - comment //times 1.2 for instance...  does nt match the 
reserveSpace() call it is about. The second line of comment can be erased.
 - according to the comments about Linux vsnprintf() return of -1 is 
guaranteed to violate the Must() condition when reserveSpace(-1*2) is 
called. The if-statement sz0 condition ensures it.
 - please avoid C-style casting in new code whenever possible. May 
occur elsewhere also.
 - the line +if (operator[](len_-1) == 0) { should be comparing 
against '\0' instead of 0 to ensure byte/word alignment changes by the 
compiler don't involve magic type castings.


* SBuf::compare() please move the stats.compareSlow counting up top. 
Just under the Must() keeps it in line with the pther stats collections.


* SBuf::startsWith() lacks the length debugs() statement used in 
operatror==() immediately below it.


* SBuf::copy() why not do:   const SBuf::size_type toexport = min(n, 
length());


* SBuf::forceSize() - we need to consider what happens (or already has 
happened) when a size greater than capacity is forced.


* SBuf::chop() - the TODO optimization is not possible. We cannot be 
completely sure we are teh only user of the


* SBuf::rfind() - typo in weridness comment memrhr.

* Please move operator ()  to an inline function.

* SBuf::reAlloc() comment needs to be doxygen format.


in SBuf.h

* Debug.h is not needed here. Please move to SBuf.cci instead.
 - probably also SquidString.h can also be removed, but check that with 
a build test.


* several places saying * \note bounds is 0 = pos  length() are 
incorrect. SBuf::checkAccessBounds() tests for: 0 = pos = length()


* s/minCapcity/minCapacity/

* SBuf::operator []() documentation is incorrect. It *does* check access 
bounds, but returns \0 for any bytes outside instead of throwing exception.


* SBuf::vappendf() documentation lacking empty line separate from 
earlier method declaration above.


* SBuf::chop()
 - documentation contains HTML tags, Please avoid that. 'i' and 'n' 
(quoted) is a better written style of emphasis.
 - documentatin of n==0 case lacks the info that SBuf is cleared, like 
the poslength() case does.


* SBuf::rawContent() does not call cow(), so please call it a read-only 
pointer in the documentation.


* SBuf::print()
 - one-liner doxygen uses ///.
 - it could document better what the method does to differentiate 
itself from dump() other than print which is obvious from the name.


* SBuf::operator ==() has a useless doxygen comment. Please make it a 
normal code comment boolean conditionsls or just remove. We have no 
style guideline around this but the existing comment is simply useless.


* please mark the SBuf::toString() converter as \deprecated.

* SBuf::GetStats() doxygen one-liner. same elsewhere ie SBuf::length().

* SBuf::rawSpace() - the line This call forces a cow() duplicates and 
contradicts the above comments about COW *if necessary*. IMO it should 
say somethign like guarantees the space is safe to write to.


* SBuf::c_str() s/will remain valid if the caller keeps holding/will 
remain valid only if the caller keeps holding/


* if you are going to make TODO comments doxygen ones, please use \todo 
tag instead of our developers TODO:


* SBuf::ReserveSpace()
...
+ * least minSpace bytes of append-able backing store (on top of the
+ * currently-used 

Re: [RFC] Time to talk about StringNG merge again?

2013-03-29 Thread Amos Jeffries

On 29/03/2013 11:46 p.m., Kinkie wrote:

Hi all,
   it's been a while since the last StringNG merge checkpoint , and
several improvements were made in the meantime.

I have briefly reviewed the previous requests for changes, I don't
think there's any outstanding change requests; the new unit testing
gives encouraging results (thanks Alex  Amos!).

Feature branch is at lp:~squid/squid/stringng

Questions were raised about the merge strategy - should we include the
Tokenizer and additional stuff?
My opinion: in order to minimize the effort, I'd like to merge
everything, but marking everything but SBuf as
experimental-do-not-touch or #ifdef-d out. It'd mean either a bit of
unused shipped code or unused shipped files.

Comments?


If the other stuff has not been reviewed, or is not able to pass review 
quickly then NO please.


I am happy to spend some time assisting the cherry-picking process for 
SBuf alone to go in as soon as it passes the audit if time and effort is 
the worry.


Amos


Re: [RFC] Time to talk about StringNG merge again?

2013-03-29 Thread Alex Rousskov
On 03/29/2013 04:46 AM, Kinkie wrote:

   it's been a while since the last StringNG merge checkpoint , and
 several improvements were made in the meantime.
 
 I have briefly reviewed the previous requests for changes, I don't
 think there's any outstanding change requests; the new unit testing
 gives encouraging results (thanks Alex  Amos!).
 
 Feature branch is at lp:~squid/squid/stringng

Well, request for changes follow requests for merge. I do not recall
recent requests for StringNG merge. Let's treat your email as the latest
request for merge. FWIW, I should be able to work on this next week.


 Questions were raised about the merge strategy - should we include the
 Tokenizer and additional stuff?
 My opinion: in order to minimize the effort, I'd like to merge
 everything, but marking everything but SBuf as
 experimental-do-not-touch or #ifdef-d out. It'd mean either a bit of
 unused shipped code or unused shipped files.

My opinion on that has not changed (just like the relevant code?): We
should not include that code because it is of insufficient quality and
has not been reviewed. Removing it just before commit after patching (or
merging into) trunk is a minor overhead because nothing uses it. The
code may still remain in the StringNG branch, of course.


Cheers,

Alex.



Re: [RFC] Time to talk about StringNG merge again?

2013-03-29 Thread Kinkie
Ok, it's agreed then, I'll cherrypick.
File-wise it's not hard. I'm a bit worried about automake , but I'll manage.

On Fri, Mar 29, 2013 at 4:14 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 On 03/29/2013 04:46 AM, Kinkie wrote:

   it's been a while since the last StringNG merge checkpoint , and
 several improvements were made in the meantime.

 I have briefly reviewed the previous requests for changes, I don't
 think there's any outstanding change requests; the new unit testing
 gives encouraging results (thanks Alex  Amos!).

 Feature branch is at lp:~squid/squid/stringng

 Well, request for changes follow requests for merge. I do not recall
 recent requests for StringNG merge. Let's treat your email as the latest
 request for merge. FWIW, I should be able to work on this next week.


 Questions were raised about the merge strategy - should we include the
 Tokenizer and additional stuff?
 My opinion: in order to minimize the effort, I'd like to merge
 everything, but marking everything but SBuf as
 experimental-do-not-touch or #ifdef-d out. It'd mean either a bit of
 unused shipped code or unused shipped files.

 My opinion on that has not changed (just like the relevant code?): We
 should not include that code because it is of insufficient quality and
 has not been reviewed. Removing it just before commit after patching (or
 merging into) trunk is a minor overhead because nothing uses it. The
 code may still remain in the StringNG branch, of course.


 Cheers,

 Alex.




-- 
/kinkie