Hi everyone,

I've made progress with my COSS work:

* COSS now seems to work fine with 64-bit file offsets; I'm doing all of my testing with 10 gigabyte stripes and I'll test it with larger ones once the code is stable.
* I've fixed the object reading race condition: this needed me to flesh out the internals a little to look like a 'real' filesystem. A read operation now creates a 'readop', which is either completed immediately or put on hold pending the completion of another read (the object relocate read). This seems to work fine.
* I've removed the xmalloc/xfree stuff Eric did for object reads; all object reads are now copied straight out of a membuf. I had to change the membuf semantics to allow them to hang around after a write has completed.
* Little fixes here and there.

What's left:

* The 'other' race condition - object reads from disk being scheduled from an area that is just about to be written over - needs to be fixed. This doesn't happen often at all, but I'd prefer it to be fixed for completeness.
* Fix dirty/clean rebuilds to work properly. My local tree writes the object size out, if known, when writing out the TLV swap metadata. I've hacked up a little utility which lets me read a COSS stripe - since stripes now begin at fixed multiples in the COSS file, I can parse the store a stripe at a time. This gives me some hope of figuring out how to complete a COSS dirty rebuild.
* POSIX AIO scheduling: the default 128 pending aiops get used up really quickly when the cache is full; they're almost always going to be read events. Something 'better' needs to be dreamt up to schedule the AIO calls. I'll leave this until after I've fixed everything else; this way I can ask for testers and I won't have to worry (too much, I hope!) about fixing random crashes.
* COSS only caches objects that specify their size up-front; this doesn't happen very often. I haven't yet done any analysis to get exact numbers, but I think the store layer might need a little tweaking to allow some kind of "delayed" swap. Again, I have no idea if this has been implemented in Squid-3, but I'll do a proof of concept in my local 2.5 tree, test it out and provide feedback here so we can all discuss it.

Adrian
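
P.S. As a rough illustration of the fixed-multiple stripe layout mentioned above, here is a minimal sketch in C. It assumes stripe n simply starts at byte n * stripe_size. The struct and function names are made up for this example and are not taken from the COSS code, and the stripe size is a free parameter rather than the real value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout description: every stripe has the same size and
 * stripe n begins at byte n * stripe_size. Not the real COSS structures. */
typedef struct {
    uint64_t stripe_size;   /* bytes per stripe (illustrative value only) */
} stripe_layout;

/* Byte offset at which stripe `n` begins. The multiplication is done in
 * 64 bits so the result does not wrap once the file grows past 4 GiB. */
uint64_t stripe_start(const stripe_layout *l, uint64_t n)
{
    assert(l->stripe_size > 0);
    return n * l->stripe_size;
}

/* Index of the stripe that contains byte offset `off`. */
uint64_t stripe_index(const stripe_layout *l, uint64_t off)
{
    assert(l->stripe_size > 0);
    return off / l->stripe_size;
}

/* Position of byte offset `off` relative to the start of its stripe. */
uint64_t stripe_within(const stripe_layout *l, uint64_t off)
{
    assert(l->stripe_size > 0);
    return off % l->stripe_size;
}
```

With an arbitrary 1 MiB stripe size, stripe 5000 starts at byte 5242880000, which is past the 4 GiB mark, so a 32-bit offset would no longer be enough. A stripe-at-a-time parser could walk n = 0, 1, 2, ... and seek to stripe_start(n) each time.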
