[EMAIL PROTECTED] wrote on Wed, 01 Nov 2006 09:36 -0800:
>> The error handling may not do the right thing.  If you get a read
>> error on your issue_pages, you go through and error out the entire
>> page_list, even though the previously cached ones are probably fine.
> 
> So, the only ones that are in the page_list are the ones that were not in
> the page-cache and were newly allocated/added
> to the page-cache. So we want those pages to be marked as erroneous in case
> another I/O happens to hit on those pages, no?

I must have misunderstood then.  I thought that "pages" was a list
of all the ones needed for this particular access, while
"issue_pages" was the (possibly empty) subset that needed to have IO
started on them.  Then if IO failed, you would just want to kill off
the issue_pages, not the entire set.  But sounds like you know
what's going on.

>> Little stuff:  no need to cast return value of kmalloc().  There's
>> also kzalloc() if you're going to do a memset(,0,) next.
> 
> Is kzalloc() available on older kernels or should I add a configure time
> option for that?
> Will change this too..

True, it's somewhat recent.  Ho hum.  Could write your own kzalloc()
and use the config option, or just not bother and use the older
interfaces.  I don't care.  Probably either is easier than
submitting the code to the kernel so they maintain future changes
like this.  :)

> Yep. this is the question I am grappling with.
> Even with close-to-open semantics, write buffering seems like it will be
> prone to correctness issues.
> There is no equivalent to IS_IMMUTABLE, but we could do it for IS_CTOO or
> something like that to go through the write buffering
> path.

Or some magic per-directory opt-in xattr flag.  I like that
thinking, as it parallels the immutable flag usage nicely.

> In order to do write buffering efficiently, we should try to avoid doing a
> read before we modify the page and also keep track of the dirty
> ranges, but that seems like a hard thing to do from a module without vfs
> changes. Aggregating writebacks for efficiency is also
> an issue, but I suspect we could look at the ->writepages callback and see
> if that will work for us.

"Avoid doing a read before we modify the page?"  Wouldn't it go
something like this?  I likely misunderstand again:

    read:
        if IS_CTOO || IS_IMMUTABLE
            if in page cache
                return data, dirty or not
            else
                do IO
    write:
        if IS_CTOO
            if in page cache
                update data, mark dirty
            else
                do IO, then update and mark dirty

and just let pagecache writeback take care of its business as normal.

Dirty ranges?  Nah, don't bother.  Just do full page granularity.

And efficiency may just happen with the new writepages.  If it
doesn't, I don't think we should go out of our way to make this
usage model particularly fast.

> BTW: Do not try the patch just yet. I have done zero testing on it and it
> might crash your machine.

Heh.  I'm too busy crashing it myself with SCSI patches.

                -- Pete

P.S.  Your gmail produces ugly mail.  Can you turn off HTML, and
maybe get it to do quoting right?

_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

Reply via email to