Hi Pete,

I must have misunderstood then.  I thought that "pages" was a list
of all the ones needed for this particular access, while
"issue_pages" was the (possibly empty) subset that needed to have IO
started on them.  Then if IO failed, you would just want to kill off
the issue_pages, not the entire set.  But sounds like you know
what's going on.

pages -> array of all pages needed for the access
issue_pages -> array of only those pages for which I/O needs to be done
page_list -> linked list of pages for which I/O is needed. this is
required by the read_cache_pages() function and is drained after the
cb finishes for each page.
Hopefully, I have done the right thing there. I will double check today.

>> Little stuff:  no need to cast return value of kmalloc().  There's
>> also kzalloc() if you're going to do a memset(,0,) next.
>
> Is kzalloc() available on older kernels or should I add a configure time
> option for that?
> Will change this too..

True, it's somewhat recent.  Ho hum.  Could write your own kzalloc()
and use the config option, or just not bother and use the older
interfaces.  I don't care.  Probably either is easier than
submitting the code to the kernel so they maintain future changes
like this.  :)

yeah.. I think I will add the configure option and kzalloc stuff.

Or some magic per-directory opt-in xattr flag.  I like that
thinking, as it parallels the immutable flag usage nicely.

Exactly.

> In order to do write buffering efficiently, we should try to avoid doing a
> read before we modify the page and also keep track of the dirty
> ranges, but that seems like a hard thing to do from a module without vfs
> changes. Aggregating writebacks for efficiency is also
> an issue, but I suspect we could look at the ->writepages callback and see
> if that will work for us.

"Avoid doing a read before we modify the page?"  Wouldn't it go
something like this?  I likely misunderstand again:

    read:
        if IS_CTOO || IS_IMMUTABLE
            if in page cache
                return data, dirty or not
            else
                do IO
    write:
        if IS_CTOO
            if in page cache
                update data, mark dirty
            else
                do IO, then update and mark dirty

and just let pagecache writeback take care of its business as normal.

Dirty ranges?  Nah, don't bother.  Just do full page granularity.

Oh ok.. I guess if we do whole page granularity we might be ok.. not
sure. think of the following scenario,
I am mostly worried about correctness issues with caching even in the
presence of non-overlapping data.
Initial file size is 0.
Client 1 on node 1 writes to file offsets 0 to 3000 and client 2 on
node 2 from 3001 to 6000. With a page size of 4096, they both share
file block 0.. What ends up on disk for file block 0 with write
buffering? Sadly, we cannot punt and say user error since the program
was operating with nonoverlapping offsets.
Perhaps, we could ask such users/programs from not using the write
buffering algorithm..
Will MPI-IO programs be able to prevent this kind of thing from
happening magically?


And efficiency may just happen with the new writepages.  If it
doesn't, I don't think we should go out of our way to make this
usage model particularly fast.

sounds good to me.


> BTW: Do not try the patch just yet. I have done zero testing on it and it
> might crash your machine.

Heh.  I'm too busy crashing it myself with SCSI patches.

Awesome. I am doing the same at work here :)

P.S.  Your gmail produces ugly mail.  Can you turn off HTML, and
maybe get it to do quoting right?

I turned off html. But i have no clue how to do the latter.
Thanks as always for the insightful reviews..
Murali
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers

Reply via email to