On Thu, Jan 10, 2002 at 03:06:38PM +0000, Alex Gough wrote:
> Also, I'm a bit concerned that our null termination games:
>
>     s->bufstart = mem_sys_allocate(buflen+1);
>     ...
>     memset((char *)s->bufstart+s->bufused,0,1);
>
> Are going to lead to an eternity of OBO errors.  Also if our encoding or
> output does not require termination, this is a waste of time.  Also
> I don't think all the string functions update the termination when
> they add characters beyond bufused but below buflen, which means that
> any output function that needs null termination will have to check
> for this itself anyway.  All in all, I think it would make more sense to
> keep the code that cares about termination away from the general string
> code as it's being a bit obscuring at present.

I would strongly recommend that perl6 mandates that buffers are not nul
terminated. Anything that needs a nul should arrange for one to be appended
[eg by ensuring that the buffer is writable, extending it by one byte if
needs be, and writing that nul, or by copying out the contents].

If there are no explicit nul bytes, and none needs to be added, then as you
say:

0: It avoids lots of off by one errors.
1: It saves time adding a nul byte.
   [1 call to memcpy to set a buffer, rather than
    (a call of given length && an explicit write of nul) or
    (add one to length && call memcpy) if you know source has a nul byte]
2: It's more robust. If your code knows that it needs a nul byte, you don't
   need to ponder whether you can trust everyone else (eg the string
   functions) not to shaft you by buggily forgetting nul bytes.

and also

3: Substrings can point into the buffers of other things.

The last one lets you mmap a file (or however VMS does it faster still) and
then make your <> scalars point into the mmap buffer, flagged copy on write.
And if your grep-as-perl doesn't actually modify the buffer, there's no
copying. This assumes that the housekeeping cost of copy-on-write is less
than the time spent copying.

Nicholas Clark
-- 
ENOJOB http://www.ccl4.org/~nick/CV.html
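
A minimal C sketch of the scheme argued for in the message above, for illustration only. Everything here is an assumption: the names counted_str, cs_set, cs_cstr, cs_view and cs_substr are hypothetical, plain realloc stands in for whatever allocator the real code uses, and this is not the actual Parrot string structure or API. It shows a buffer that is never assumed to be nul terminated, a terminator written only when a caller asks for one, and a substring that points into another buffer without copying.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string; field names echo the quoted snippet only. */
typedef struct {
    char  *bufstart;  /* start of storage, never assumed nul terminated */
    size_t buflen;    /* bytes allocated */
    size_t bufused;   /* bytes in use */
} counted_str;

/* Hypothetical read-only substring that borrows another buffer's bytes. */
typedef struct {
    const char *start;
    size_t      len;
} cs_view;

/* Set the contents with a single memcpy; no terminator is written.
 * Returns 0 on success, -1 if memory could not be obtained. */
static int cs_set(counted_str *s, const char *src, size_t len)
{
    if (len > s->buflen) {
        char *p = realloc(s->bufstart, len);
        if (p == NULL)
            return -1;
        s->bufstart = p;
        s->buflen   = len;
    }
    if (len > 0)
        memcpy(s->bufstart, src, len);
    s->bufused = len;
    return 0;
}

/* Only a caller that needs a C string pays for the nul: extend the buffer
 * by one byte if it is full, then write the nul just past bufused.
 * bufused itself is unchanged. Returns NULL if the buffer cannot grow. */
static const char *cs_cstr(counted_str *s)
{
    if (s->bufused == s->buflen) {
        char *p = realloc(s->bufstart, s->buflen + 1);
        if (p == NULL)
            return NULL;
        s->bufstart = p;
        s->buflen  += 1;
    }
    s->bufstart[s->bufused] = '\0';
    return s->bufstart;
}

/* A substring is just a pointer and a length into the owner's buffer,
 * which only works because nothing expects a nul after it. */
static cs_view cs_substr(const counted_str *s, size_t off, size_t len)
{
    cs_view v;
    assert(off <= s->bufused && len <= s->bufused - off);
    v.start = s->bufstart ? s->bufstart + off : NULL;
    v.len   = len;
    return v;
}
```

A view returned by cs_substr stays valid only until the owning buffer is reallocated or freed; a real implementation would need the copy-on-write flag mentioned above, or some other ownership tracking, to make such borrowing safe.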