Re: Buffer/String split, take2

2009-01-22 Thread Kinkie
On Wed, Jan 21, 2009 at 7:19 PM, Tsantilas Christos
chtsa...@users.sourceforge.net wrote:
 Hi all,
  I believe that the DC design is better than Universal Buffer for many
 reasons. My sense is that the Universal Buffer will be very complex and
 will not have the desired results, because the real problems exist in other
 subsystems (e.g. parsers).

 But if choosing the DC design means waiting 2 or more years for it to be
 implemented because of a lack of development time, maybe it is better to
 accept the Universal Buffer design. It will not solve all the problems,
 but it is not bad; it is an improvement.

 But again, I like the idea of a well-designed Buffers API, where buffer
 classes handle different cases and String is a class (or classes) which
 operates on Buffer (sub-)regions.

Avoiding data copying and the numerous xstrndup()s we have
lying around is in any case the key win here.
The discussion on the topic went ahead on IRC yesterday. I've updated
the wiki to include it, please see
http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17.


-- 
/kinkie


Re: Buffer/String split, take2

2009-01-22 Thread Tsantilas Christos

Kinkie wrote:

On Wed, Jan 21, 2009 at 7:19 PM, Tsantilas Christos
chtsa...@users.sourceforge.net wrote:

Hi all,
 I believe that the DC design is better than Universal Buffer for many
reasons. My sense is that the Universal Buffer will be very complex and
will not have the desired results, because the real problems exist in other
subsystems (e.g. parsers).

But if choosing the DC design means waiting 2 or more years for it to be
implemented because of a lack of development time, maybe it is better to
accept the Universal Buffer design. It will not solve all the problems,
but it is not bad; it is an improvement.

But again, I like the idea of a well-designed Buffers API, where buffer
classes handle different cases and String is a class (or classes) which
operates on Buffer (sub-)regions.


Avoiding data copying and the numerous xstrndup()s we have
lying around is in any case the key win here.

OK.


The discussion on the topic went ahead on IRC yesterday. I've updated
the wiki to include it, please see
http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17.



Long but very interesting discussions :-). It remains to see StringNg
in practice!




Re: Buffer/String split, take2

2009-01-21 Thread Kinkie
On Tue, Jan 20, 2009 at 10:24 PM, Alex Rousskov
rouss...@measurement-factory.com wrote:
 Hello,

Kinkie has finished another round of his String NG project. The code
 is available at https://code.launchpad.net/~kinkie/squid/stringng
 During code review and subsequent IRC discussion archived at
 http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17 it became
 apparent that the current design makes all participating developers
 unhappy (for different reasons).

 We have to revisit the discussion we had in the beginning of this
 project[1] and put this issue to rest, at last.
 [1] http://thread.gmane.org/gmane.comp.web.squid.devel/8188


 There were not enough developers on the IRC to come to a consensus
 regarding the best direction, but it was clear that the current design
 is the worst one considered as it tries to mix at least two incompatible
 designs together.

 This email summarizes a few design options we can choose from (none of
 them matches the current code for the above-mentioned reasons).

 Please voice your opinion: which design would be best for Squid 3.2 and
 the foreseeable future.


 * Universal Buffer:

 Blob = low-level raw chunk of RAM invisible to general code
allocates, holds, frees raw RAM buffer
can grow the buffer and write to the buffer
the memory allocation strategy can change w/o affecting others
does not have a notion of content, just allocated RAM

 Buffer = all-purpose user-level buffer
allows users to safely share a Blob instance via COW
search, compare, consume, append, truncate, import, export, etc.
has (offset, length) to maintain an area of Blob used by this Buffer

 This design is very similar to std::string. The code gets a universal
 buffer that can do everything. This is probably the simplest design
 possible.

 The primary drawback here is that it would be difficult and messy to
 optimize different buffering needs in a single Buffer class.

 For example, I/O buffers usually need to track appended/consumed size
 and want to optimize (or eliminate) copying when it is time to do the
 next I/O while some strings are pointing to the old buffer content.
 Adding that tracking logic and optimizations to generic Buffer would be
 wrong because it will pollute Buffers used like strings.

 Similarly, general strings may want to keep encoding information or
 perform heavy search optimizations. Adding those to generic Buffer would
 be wrong because it will pollute I/O buffers code.

 Another example is adding simple but efficient vector I/O support. With
 a single Buffer, it would be difficult to support vectors because it
 will clash with string-like usage needs.



 * Divide and Conquer (DC):

 Blob = low-level raw chunk of RAM invisible to general code
same as Blob in the Universal Buffer approach

 Buffer = shareable Blob
allows users to safely share a Blob instance via COW
works with Blob as a whole: not areas (see note below)
used as exchange interface between specialized buffers

 IoBuffer = buffer optimized for I/O needs
perhaps should be called IoStream
uses Buffer
has (appended, consumed) to track I/O progress and area
exports available data as a Buffer instance
may eventually support vector I/O by using multiple Buffers

 String = buffer optimized for content manipulation
uses Buffer
has (offset, length) to maintain a Buffer content area
search, compare, replace, append, truncate, import, export, etc.
may eventually store content encoding information

 The killer idea here is that the interpretation of a piece of allocated
 and shareable RAM (i.e., Buffer) is left to classes that specialize in
 certain memory manipulations (e.g., I/O or string search). Optimizing or
 changing one class does not have to affect the other.

 More specialized classes can be added as needed. Buffer is used to share
 info between classes. Conversions are explicit and easier to track. We
 could also add an Area class that makes it possible to store content
 offset and length when importing or exporting a Buffer.

 (note) A possible variation of the same design would be to move area
 manipulation to Buffer. This will free String from area code but force
 IoBuffers and others to use the same area model instead of
 appended/consumed counters or whatever they need. This will probably
 make migration to vectored I/O more complex, but we can deal with it. If
 DC approach is chosen, we will decide where to put area manipulation:
 Buffer, String, or perhaps a separate Area class.


 * Other

 There are probably other options.


 I still think we should implement one good design, commit it, and work
 on converting the code to use it rather than starting with massaging the
 old code to be easier to convert to something in the future. If you
 would like to discuss the choice between those two strategies, please
 start your own thread :-)!


 So far, my _personal_ interpretation of votes based on the recent IRC
 discussions 

Re: Buffer/String split, take2

2009-01-21 Thread Adrian Chadd
2009/1/21 Kinkie gkin...@gmail.com:

 What I fear from the DC approach is that we'll end up with lots of
 duplicate code between the 'buffer' classes, to gain a tiny little bit
 of efficiency and semantic clarity. If that approach has to be taken,
 then I'd rather take the variant of the note - in fact that's quite in
 line with what the current (agreeably ugly) code does.

The trouble is that the current, agreeably ugly code actually works
(for values of "works") right now, and the last thing the project
needs is for that "works" bit to be disturbed too much.

 In my opinion the 'universal buffer' model can be adapted quite easily
 to address different uses by extending its allocation strategy - it's
 a self-contained function of code exactly for this purpose, and it
 could be extended again by using Strategy patterns to do whatever the
 caller wishes. It would be trivial for instance for users to request
 that the underlying memory be allocated by the pageful, or to request
 preallocation of a certain amount of memory if they know they'll be
 using it, etc.
 Having a wide interface is a drawback of the Universal approach,

But you don't know how that memory should be arranged. If it's just for
strings, then you know the memory should be arranged in whatever makes
sense to minimise memory allocator overheads. In the parsing codepath,
that involves parsing and creating references to an already-allocated
large chunk of RAM, instead of copying into separately allocated
areas. For things like disk IO (and later on, network IO too!) this
may not be as obvious a case. In fact, based on the -provider-
(anonymous? disk? network? some peer module?) you may want to request
pages from -them- to put data into for various reasons, as simply
grabbing an anonymous page from the system allocator and filling it
with data may need -another- copy step later on.

This is why I'm saying that right now, focusing on -just- the String
stuff and the minimum required to do copy-free parsing and copying in
and out of the store is probably the best bet. A universal buffer
method is probably over-reaching. There's a lot of code in
Squid which needs tidying up whatever we come up with, and -all- of it
-has- to happen -regardless- of what buffer abstraction(s) we choose.

 Regarding vector i/o, it's almost a no-brainer at first glance:
 given UniversalBuffer, implement UniversalBufferList and make MemBuf
 use the latter to implement producer-consumer semantics. Then use this
 for writev(). produce and consume then become extremely lightweight
 calls. Let me remind you that currently MemBuf happily memmoves
 contents at each consume, and other producer-consumer classes I could
 find (BodyPipe and StoreEntry) are entirely different beasts, which
 would benefit from having their interfaces changed to use
 UniversalBuffers, but probably not their innards.

And again, what I'm saying here is that a conservative, cautious
approach now is likely to save a lot of risk in the development path
forward.

 Regarding Adrian's proposal, he and I discussed the issue extensively.
 I don't agree with him that the current String will give us the best
 long-term benefits. My expectation is (but we can only know after we
 have at least some extensive use of it) that the cheap substringing
 features of the current UniversalBuffer implementation will give us
 substantial benefits in the long term.
 I agree with him that fixing the most broken parts of the String
 interface is a sensible strategy for merging whatever String
 implementation we end up choosing.

 I fear that if we focus too much on the long-term, we may end up
 losing sight of the medium-term, and thus we risk reaching neither
 because short-term no one does anything. EVERYONE keeps on asserting
 that squid (2 and 3) has low-level issues to be fixed, yet at the same
 time only Adrian does something in squid-2, and I feel I'm the only
 one trying to do something in squid-3 - PLEASE correct me and prove me
 wrong.

*shrug* I think people keep choosing the wrong bits to bite off. I'm
not specifically talking about you Kinkie, this certainly isn't the
only instance where the problem isn't really fully understood.

The problem in my eyes is that no one understands the entire Squid-3
codebase enough to start to understand what needs to happen and begin
engineering an actual path forward. Everyone knows their little
corner of the codebase. Squid-3 seems to be plagued by little
mini-projects which focus on specific areas without much knowledge of
how it all holds together, and all kinds of busted behaviour ensues.

 There's another issue which worries me: the current implementation has
 been in the works for 5 months; there have been two extensive reviews,
 two half-rewrites and endless discussions. Now the issue crops up that
 the basic design - whose blueprint has also been available for 5
 months in the wiki - is not good, and that we may end up having to
 basically start from scratch. How can we as 

Re: Buffer/String split, take2

2009-01-21 Thread Tsantilas Christos

Hi all,
 I believe that the DC design is better than Universal Buffer for many
reasons. My sense is that the Universal Buffer will be very complex and
will not have the desired results, because the real problems exist in
other subsystems (e.g. parsers).


But if choosing the DC design means waiting 2 or more years for it to be
implemented because of a lack of development time, maybe it is better to
accept the Universal Buffer design. It will not solve all the problems,
but it is not bad; it is an improvement.


But again, I like the idea of a well-designed Buffers API, where buffer
classes handle different cases and String is a class (or classes) which
operates on Buffer (sub-)regions.


Regards,
   Christos.



Buffer/String split, take2

2009-01-20 Thread Alex Rousskov
Hello,

Kinkie has finished another round of his String NG project. The code
is available at https://code.launchpad.net/~kinkie/squid/stringng
During code review and subsequent IRC discussion archived at
http://wiki.squid-cache.org/MeetUps/IrcMeetup-2009-01-17 it became
apparent that the current design makes all participating developers
unhappy (for different reasons). 

We have to revisit the discussion we had in the beginning of this
project[1] and put this issue to rest, at last.
[1] http://thread.gmane.org/gmane.comp.web.squid.devel/8188


There were not enough developers on the IRC to come to a consensus
regarding the best direction, but it was clear that the current design
is the worst one considered as it tries to mix at least two incompatible
designs together.

This email summarizes a few design options we can choose from (none of
them matches the current code for the above-mentioned reasons).

Please voice your opinion: which design would be best for Squid 3.2 and
the foreseeable future.


* Universal Buffer:

Blob = low-level raw chunk of RAM invisible to general code
allocates, holds, frees raw RAM buffer
can grow the buffer and write to the buffer
the memory allocation strategy can change w/o affecting others
does not have a notion of content, just allocated RAM

Buffer = all-purpose user-level buffer
allows users to safely share a Blob instance via COW
search, compare, consume, append, truncate, import, export, etc.
has (offset, length) to maintain an area of Blob used by this Buffer

This design is very similar to std::string. The code gets a universal
buffer that can do everything. This is probably the simplest design
possible.
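
As a rough illustration of the Universal Buffer option, here is a
minimal C++ sketch. It is not the StringNG code: the class bodies, the
method names (substr, append, toString, sharesBlobWith) and the
doubling growth policy are all assumptions made for this example. It
only shows the two properties listed above: a Buffer is an (offset,
length) window onto a shared Blob, and a write to a shared Blob
triggers a copy first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical Blob: owns raw RAM and has no notion of content.
class Blob {
public:
    explicit Blob(std::size_t capacity)
        : mem_(new char[capacity]), capacity_(capacity) {}
    char *raw() { return mem_.get(); }
    std::size_t capacity() const { return capacity_; }
private:
    std::unique_ptr<char[]> mem_;
    std::size_t capacity_;
};

// Hypothetical all-purpose Buffer: an (offset, length) area of a shared Blob.
class Buffer {
public:
    explicit Buffer(const std::string &text)
        : blob_(std::make_shared<Blob>(text.size())),
          offset_(0), length_(text.size()) {
        std::memcpy(blob_->raw(), text.data(), text.size());
    }

    // Cheap substring: shares the Blob, only the area changes.
    Buffer substr(std::size_t pos, std::size_t n) const {
        assert(pos <= length_ && n <= length_ - pos);
        Buffer result(*this); // copies the shared_ptr, not the bytes
        result.offset_ += pos;
        result.length_ = n;
        return result;
    }

    // Copy-on-write append: take a private Blob if ours is shared or full.
    void append(const std::string &tail) {
        const std::size_t needed = offset_ + length_ + tail.size();
        if (blob_.use_count() > 1 || needed > blob_->capacity()) {
            auto fresh = std::make_shared<Blob>(2 * (length_ + tail.size()));
            std::memcpy(fresh->raw(), blob_->raw() + offset_, length_);
            blob_ = fresh;
            offset_ = 0;
        }
        std::memcpy(blob_->raw() + offset_ + length_, tail.data(), tail.size());
        length_ += tail.size();
    }

    std::string toString() const {
        return std::string(blob_->raw() + offset_, length_);
    }
    bool sharesBlobWith(const Buffer &other) const {
        return blob_ == other.blob_;
    }

private:
    std::shared_ptr<Blob> blob_;
    std::size_t offset_;
    std::size_t length_;
};
```

With this sketch, taking substr(0, 3) of a request line costs no copy,
and appending to the substring leaves the original Buffer untouched
because the append first moves the substring into its own Blob.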

The primary drawback here is that it would be difficult and messy to
optimize different buffering needs in a single Buffer class. 

For example, I/O buffers usually need to track appended/consumed size
and want to optimize (or eliminate) copying when it is time to do the
next I/O while some strings are pointing to the old buffer content.
Adding that tracking logic and optimizations to generic Buffer would be
wrong because it will pollute Buffers used like strings.

Similarly, general strings may want to keep encoding information or
perform heavy search optimizations. Adding those to generic Buffer would
be wrong because it will pollute I/O buffers code.

Another example is adding simple but efficient vector I/O support. With
a single Buffer, it would be difficult to support vectors because it
will clash with string-like usage needs.



* Divide and Conquer (DC):

Blob = low-level raw chunk of RAM invisible to general code
same as Blob in the Universal Buffer approach

Buffer = shareable Blob
allows users to safely share a Blob instance via COW
works with Blob as a whole: not areas (see note below)
used as exchange interface between specialized buffers

IoBuffer = buffer optimized for I/O needs
perhaps should be called IoStream
uses Buffer
has (appended, consumed) to track I/O progress and area
exports available data as a Buffer instance
may eventually support vector I/O by using multiple Buffers

String = buffer optimized for content manipulation
uses Buffer
has (offset, length) to maintain a Buffer content area
search, compare, replace, append, truncate, import, export, etc.
may eventually store content encoding information

The killer idea here is that the interpretation of a piece of allocated
and shareable RAM (i.e., Buffer) is left to classes that specialize in
certain memory manipulations (e.g., I/O or string search). Optimizing or
changing one class does not have to affect the other. 

More specialized classes can be added as needed. Buffer is used to share
info between classes. Conversions are explicit and easier to track. We
could also add an Area class that makes it possible to store content
offset and length when importing or exporting a Buffer.
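
To make the division of labour concrete, here is a minimal C++ sketch
of the DC option. Everything in it is hypothetical: Buffer is reduced
to a shared_ptr alias, Area is the optional helper mentioned above, and
the method names (append, consume, exportAvailable, find, toStdString)
are invented for the example. The point is only that IoBuffer keeps
(appended, consumed) counters, String keeps an (offset, length) area,
and the two exchange data through a Buffer without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical Buffer: a shareable chunk of RAM with no notion of areas.
using Buffer = std::shared_ptr<std::vector<char>>;

// Hypothetical Area: content offset and length passed along with a Buffer.
struct Area {
    std::size_t offset = 0;
    std::size_t length = 0;
};

// Hypothetical String: content manipulation over an area of a Buffer.
class String {
public:
    String(Buffer buf, const Area &area) : buf_(std::move(buf)), area_(area) {
        assert(area_.offset + area_.length <= buf_->size());
    }
    // Position of c relative to the start of this String, or npos.
    std::size_t find(char c) const {
        const char *start = buf_->data() + area_.offset;
        const void *hit = std::memchr(start, c, area_.length);
        if (!hit)
            return std::string::npos;
        return static_cast<std::size_t>(static_cast<const char *>(hit) - start);
    }
    std::string toStdString() const {
        return std::string(buf_->data() + area_.offset, area_.length);
    }
private:
    Buffer buf_;
    Area area_;
};

// Hypothetical IoBuffer: tracks I/O progress, knows nothing about searching.
class IoBuffer {
public:
    explicit IoBuffer(std::size_t capacity)
        : buf_(std::make_shared<std::vector<char>>(capacity)) {}
    // Stands in for data arriving from a read().
    void append(const char *data, std::size_t n) {
        assert(appended_ + n <= buf_->size());
        std::memcpy(buf_->data() + appended_, data, n);
        appended_ += n;
    }
    // Consuming only advances a counter; nothing is moved in memory.
    void consume(std::size_t n) {
        assert(n <= appended_ - consumed_);
        consumed_ += n;
    }
    // Explicit conversion point: share the Buffer and report the live area.
    Buffer exportAvailable(Area &area) const {
        area.offset = consumed_;
        area.length = appended_ - consumed_;
        return buf_;
    }
private:
    Buffer buf_;
    std::size_t appended_ = 0;
    std::size_t consumed_ = 0;
};
```

In this sketch a parser would call exportAvailable() on the IoBuffer
and build a String from the returned Buffer and Area; the String then
points into the same memory the I/O code filled. Whether the area
lives in String, in Buffer, or in a separate Area class is exactly the
open question of the note below.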

(note) A possible variation of the same design would be to move area
manipulation to Buffer. This will free String from area code but force
IoBuffers and others to use the same area model instead of
appended/consumed counters or whatever they need. This will probably
make migration to vectored I/O more complex, but we can deal with it. If
DC approach is chosen, we will decide where to put area manipulation:
Buffer, String, or perhaps a separate Area class.


* Other

There are probably other options.


I still think we should implement one good design, commit it, and work
on converting the code to use it rather than starting with massaging the
old code to be easier to convert to something in the future. If you
would like to discuss the choice between those two strategies, please
start your own thread :-)!


So far, my _personal_ interpretation of votes based on the recent IRC
discussions and that earlier squid-dev thread[1] is:

  Universal String: Kinkie, Amos
  Divide and Conquer: Adrian, Henrik, Alex

Do you prefer a Universal 

Re: Buffer/String split, take2

2009-01-20 Thread Adrian Chadd
2009/1/20 Alex Rousskov rouss...@measurement-factory.com:

 Please voice your opinion: which design would be best for Squid 3.2 and
 the foreseeable future.

[snip]

I'm about 2/3rds of the way along the actual implementation path of
this in Cacheboy so I can provide an opinion based on increasing
amounts of experience. :)

[Warning: long, somewhat rambly post follows, from said experience.]

The thing I'm looking at right now is what buffer design is required
to adequately handle the problem set. There are a few things which we
currently do very stupidly in any Squid-related codebase:

* storeClientCopy - which Squid-2.HEAD and Cacheboy avoid the copy on,
but it exposes issues (see below);
* storeAppend - the majority of data coming -into- the cache (ie,
anything from an upstream server, very applicable today for forward
proxies, not as applicable for high-hit-rate reverse proxies) is still
memcpy()'ed, and this can use up a whole lot of bus time;
* creating strings - most strings are created during parsing; few are
generated themselves, and those which are, are at least half static
data which shouldn't be re-generated over and over and over again;
* duplicating strings - httpHeaderClone() and friends - dup'ing
happens quite often, and making it cheap for the read only copies
which are made would be fantastic
* later on, being able to use it for disk buffers, see below
* later on, being able to properly use it for the memory cache, again see below

The biggest problems I've hit thus far stem from the data pipeline
from server - memstore - store client - client side. At the moment,
the storeClientCopy() call aggregates data across the 4k stmem page
size (at least in squid-2/cacheboy, I think it's still 4k in squid-3)
and thus if your last access gave you half a page, your next access
can get data from both the other half of the page and whatever is in
the next buffer. Just referencing the stmem pages in 2.HEAD/Cacheboy
means that you can (and do) end up with a large number of small reads
from the memory store. You save on the referencing, but fail on the
work chunk size. You end up having to have a sensible reference
counted buffer design -and- a vector list to operate on it with.
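
The page-boundary effect described above can be shown with a little
arithmetic. The following C++ sketch is not Squid code; PageSlice and
slicePages are hypothetical names, and the 4096-byte page size is the
stmem page size mentioned above. It splits a logical read into the
per-page pieces that a reference-only (copy-free) approach would have
to hand out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one piece of a read that stays within a page.
struct PageSlice {
    std::size_t page;   // index of the page in the memory store
    std::size_t offset; // where the piece starts inside that page
    std::size_t length; // how many bytes of that page are referenced
};

// Split the logical range [start, start + length) at page boundaries.
std::vector<PageSlice> slicePages(std::size_t start, std::size_t length,
                                  std::size_t pageSize = 4096) {
    assert(pageSize > 0);
    std::vector<PageSlice> slices;
    while (length > 0) {
        const std::size_t offsetInPage = start % pageSize;
        const std::size_t chunk = std::min(length, pageSize - offsetInPage);
        slices.push_back({start / pageSize, offsetInPage, chunk});
        start += chunk;
        length -= chunk;
    }
    return slices;
}
```

For example, if the previous access stopped at byte 2048, a 4096-byte
request comes back as two 2048-byte slices (the second half of page 0
and the first half of page 1), which is why a list of slices is needed
alongside the reference counting.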

The string type right now makes sense if it references a contiguous,
linear block of memory (i.e., a sub-region of a contiguous buffer). This is
how it's treated today. For almost all of the lifting inside Squid
proper, that may be enough. There may however be a need later on for
string-like and buffer-like operations on buffer -vectors- - for
example, if you're doing some kind of content scanning over incoming
data, you may wish to buffer your incoming data until you have enough
data to match that string which is straddling two buffers - and the
current APIs don't support it. Well, nothing in Squid supports it
currently, but I think it's worth thinking about for the longer term.
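
A minimal C++ sketch of the straddling case, under stated assumptions:
chunks are plain std::string objects standing in for incoming buffers,
and findAcross is a hypothetical helper, not an existing Squid
function. It buffers only the tail of earlier chunks that could still
begin a match, so a string split across two buffers is found without
concatenating the whole stream.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the offset of the first match of needle in the logical
// concatenation of chunks, or std::string::npos if there is none.
std::size_t findAcross(const std::vector<std::string> &chunks,
                       const std::string &needle) {
    assert(!needle.empty());
    std::string window;          // carried-over tail plus the current chunk
    std::size_t windowStart = 0; // logical offset of window[0]
    for (const std::string &chunk : chunks) {
        window += chunk;
        const std::size_t hit = window.find(needle);
        if (hit != std::string::npos)
            return windowStart + hit;
        // Keep at most needle.size() - 1 bytes: the longest possible
        // prefix of a match that has not been completed yet.
        const std::size_t keep = std::min(window.size(), needle.size() - 1);
        windowStart += window.size() - keep;
        window.erase(0, window.size() - keep);
    }
    return std::string::npos;
}
```

This version copies each chunk into the window, which is the "buffer
until you have enough data" approach; a real buffer-vector API could
instead compare in place across the boundary.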

Certainly though, I think that picking a sensible string API with
absolutely no direct buffer access out of a few controlled areas (eg,
translating a list of strings or list of buffers into an iovec for
writev(), for example) is the way to go. That will equip Squid with a
decent enough set of tools to start converting everything else which
currently uses C strings over to using Squid Strings and eventually
reap the benefits of the zero-cost string duplication.
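
One way to picture such a controlled area is a single helper that turns
a list of read-only regions into an iovec array for writev(). The C++
sketch below assumes a POSIX system; Region, toIovec and writeRegions
are hypothetical names, and the sketch ignores partial writes and the
IOV_MAX limit for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#include <sys/uio.h> // POSIX writev() and struct iovec
#include <unistd.h>

// Hypothetical read-only region of some string or buffer.
struct Region {
    const char *data;
    std::size_t length;
};

// The only place where raw pointers leave the string/buffer API.
std::vector<iovec> toIovec(const std::vector<Region> &regions) {
    std::vector<iovec> iov;
    iov.reserve(regions.size());
    for (const Region &r : regions) {
        iovec v;
        v.iov_base = const_cast<char *>(r.data); // writev() only reads it
        v.iov_len = r.length;
        iov.push_back(v);
    }
    return iov;
}

// Hands all regions to the kernel in one system call.
// Returns the number of bytes written, or -1 on error.
ssize_t writeRegions(int fd, const std::vector<Region> &regions) {
    assert(fd >= 0);
    std::vector<iovec> iov = toIovec(regions);
    return writev(fd, iov.data(), static_cast<int>(iov.size()));
}
```

The rest of the code would pass Regions (or Strings) around and never
touch iov_base directly, so the const_cast stays confined to one
function.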

Ok, to summarise, and this may not exactly be liked by the majority of
fellow developers:

I think that augmenting/fixing the current SquidString API and
tidying up all the bad places where it's used right now is
going to give you the maximum long-term benefit. There's a lot of
legacy code right now which absolutely needs to be trashed and
rewritten. I think the smartest path forward is to ignore 95% of the
decision about which buffering method to use for now, fix the
current String API and all the code which uses it so it's sensible (and
fixing it so it's sensible won't take long; fixing the code which
uses it will take longer) and at that point the codebase will be in
much better shape to decide which will be the better path forward.

Now, just so people don't think I'm stirring trouble, I've gone
through this myself in both a squid-2 branch and Cacheboy, and here's
what I found:

* there's a lot of code which uses C strings created from Strings;
* there's a lot of code which init'ed strings from C strings, where
the length was already known and thrown out;
* there's a lot of code which init'ed strings from C strings which
were once Strings;
* there's even code which init's strings -from- a string, but only by
using strBuf(s) (I'm pointing at the http header related code here,
ugh)
* all the stuff which directly accesses the string buffer code can and
should be tossed, immediately - unfortunately there's a lot of it, the
majority being in what I gather is very long-lived code in
src/client_side.c (and what it became in squid-3)

So what I'm sort of doing now in Cacheboy-head, combined