Hi guys,
I agree. I had no intention of checking this in at all. I only wanted
feedback on the patch.
I usually add comments only after I have tested things out, prior to
checking things in.
Perhaps I should have added them before sending it out for review.
Sorry about that.
Thanks,
Murali

On 11/2/06, Sam Lang <[EMAIL PROTECTED]> wrote:

On Nov 2, 2006, at 9:12 AM, Walter B. Ligon III wrote:

> Is there a particular reason this isn't documented IN THE CODE?
> There's this thing called comments, that's what it's for!
>
Hi Walt,

I completely agree that documentation should be a high priority, but
aren't we all guilty of not doing it as much as we should?  I've
slogged through some pretty hairy bits of request processing and
state machine code that I wished had a few more comments.

In terms of patches, it's nice to have comments in there, but in the
end vetting should be done on the code, not the comments.  I would
argue that this should be handled on a case by case basis.  If you
get a patch you don't understand, ask for clarification as Pete has
done in this case.

-sam


> After slogging through some of the more recent code (like the
> client-core) I'm becoming of the opinion that we should not accept
> patches or new code that isn't properly commented.  The original
> code base was much better (though not perfectly) commented.
> Letting thousands of lines of code creep in uncommented like that
> is a bad trend.
>
> Just a thought ...
>
> Walt
>
> Murali Vilayannur wrote:
>> Hi Pete,
>>> I must have misunderstood then.  I thought that "pages" was a list
>>> of all the ones needed for this particular access, while
>>> "issue_pages" was the (possibly empty) subset that needed to have IO
>>> started on them.  Then if IO failed, you would just want to kill off
>>> the issue_pages, not the entire set.  But sounds like you know
>>> what's going on.
>> pages -> array of all pages needed for the access
>> issue_pages -> array of only those pages for which I/O needs to be
>> done
>> page_list -> linked list of pages for which I/O is needed. This is
>> required by the read_cache_pages() function and is drained after the
>> callback finishes for each page.
>> Hopefully, I have done the right thing there. I will double check
>> today.
>>>
>>> >> Little stuff:  no need to cast return value of kmalloc().
>>> There's
>>> >> also kzalloc() if you're going to do a memset(,0,) next.
>>> >
>>> > Is kzalloc() available on older kernels or should I add a
>>> configure time
>>> > option for that?
>>> > Will change this too..
>>>
>>> True, it's somewhat recent.  Ho hum.  Could write your own kzalloc()
>>> and use the config option, or just not bother and use the older
>>> interfaces.  I don't care.  Probably either is easier than
>>> submitting the code to the kernel so they maintain future changes
>>> like this.  :)
>> yeah.. I think I will add the configure option and kzalloc stuff.
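A compat shim for older kernels could look roughly like the userspace model below, assuming a configure-time HAVE_KZALLOC test; the name `compat_kzalloc` is hypothetical. In-kernel, the fallback would be kmalloc(size, flags) followed by memset(), which is all kzalloc() does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model of a kzalloc() compatibility shim.  In the module
 * this would be compiled only when configure fails to find kzalloc(). */
#ifndef HAVE_KZALLOC
static void *compat_kzalloc(size_t size)
{
    void *p = malloc(size);     /* kmalloc(size, GFP_KERNEL) in-kernel */
    if (p)
        memset(p, 0, size);     /* kzalloc()'s only extra behavior */
    return p;
}
#endif
```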
>>> Or some magic per-directory opt-in xattr flag.  I like that
>>> thinking, as it parallels the immutable flag usage nicely.
>> Exactly.
>>>
>>> > In order to do write buffering efficiently, we should try to
>>> avoid doing a
>>> > read before we modify the page and also keep track of the dirty
>>> > ranges, but that seems like a hard thing to do from a module
>>> without vfs
>>> > changes. Aggregating writebacks for efficiency is also
>>> > an issue, but I suspect we could look at the ->writepages
>>> callback and see
>>> > if that will work for us.
>>>
>>> "Avoid doing a read before we modify the page?"  Wouldn't it go
>>> something like this?  I likely misunderstand again:
>>>
>>>     read:
>>>         if IS_CTOO || IS_IMMUTABLE
>>>             if in page cache
>>>                 return data, dirty or not
>>>             else
>>>                 do IO
>>>     write:
>>>         if IS_CTOO
>>>             if in page cache
>>>                 update data, mark dirty
>>>             else
>>>                 do IO, then update and mark dirty
>>>
>>> and just let pagecache writeback take care of its business as
>>> normal.
>>>
>>> Dirty ranges?  Nah, don't bother.  Just do full page granularity.
>> Oh ok.. I guess if we do whole-page granularity we might be ok.. not
>> sure. Think of the following scenario:
>> I am mostly worried about correctness issues with caching even in the
>> presence of non-overlapping data.
>> Initial file size is 0.
>> Client 1 on node 1 writes to file offsets 0 to 3000 and client 2 on
>> node 2 from 3001 to 6000. With a page size of 4096, they both share
>> file block 0.. What ends up on disk for file block 0 with write
>> buffering? Sadly, we cannot punt and call it user error, since the
>> program was operating with non-overlapping offsets.
>> Perhaps we could ask such users/programs not to use the write
>> buffering algorithm..
>> Will MPI-IO programs be able to prevent this kind of thing from
>> happening magically?
>>>
>>> And efficiency may just happen with the new writepages.  If it
>>> doesn't, I don't think we should go out of our way to make this
>>> usage model particularly fast.
>> sounds good to me.
>>>
>>> > BTW: Do not try the patch just yet. I have done zero testing on
>>> it and it
>>> > might crash your machine.
>>>
>>> Heh.  I'm too busy crashing it myself with SCSI patches.
>> Awesome. I am doing the same at work here :)
>>> P.S.  Your gmail produces ugly mail.  Can you turn off HTML, and
>>> maybe get it to do quoting right?
>> I turned off HTML, but I have no clue how to do the latter.
>> Thanks as always for the insightful reviews..
>> Murali
>> _______________________________________________
>> Pvfs2-developers mailing list
>> [email protected]
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
>
> --
> Dr. Walter B. Ligon III
> Associate Professor
> ECE Department
> Clemson University
>

