Re: [PATCH V3 17/18] Btrfs: Full direct I/O and AIO read implementation.

2010-03-25 Thread Andi Kleen
On Wed, Mar 24, 2010 at 11:08:07PM -0400, jim owens wrote:
 Andi Kleen wrote:
   On Tue, Mar 23, 2010 at 05:40:00PM -0400, jim owens wrote:
   Andi Kleen wrote:
  
   With checksums enabled, uncompressed reads aligned on the 4k block
   are classic direct IO to user memory except at EOF.
   
   Hmm, but what happens if the user modifies the memory in parallel?
   Would spurious checksum failures be reported then?
 
 It does put a warning in the log but it does not fail the read
 because I circumvent that by doing the failed-checksum-retry as
 a buffered read and retest.  The checksum passes and we copy
 the data to the user memory (where they can then trash it again).

Ok. That will work I guess.


 I was going to put a comment about that but felt my comment
 density was already over the btrfs style guide limit.  :) 

Hehe.

 
   Same for writing I guess (data end up on disk with wrong checksum)?
 
 Well we don't have any code done yet for writing and that was
 just one interesting challenge that needed to be solved.
 
   Those both would seem like serious flaws to me.
 
 Agree, so the write design needs to prevent bad checksums.

How? Do you have a plan for that?

 
 Read is already correct and if people do not want a log warning
 that the application is misbehaving that can be eliminated.

I guess if it's strictly rate limitted it might be ok.
-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 17/18] Btrfs: Full direct I/O and AIO read implementation.

2010-03-25 Thread jim owens
Andi Kleen wrote:
 On Wed, Mar 24, 2010 at 11:08:07PM -0400, jim owens wrote:
 Agree, so the write design needs to prevent bad checksums.
 
 How? Do you have a plan for that?

Yes... have Josef do it. ;)

The options I considered are:
1 - buffer always for uncompressed, the same as compressed.
2 - checksum before bio_add_page and again before page_cache_release
and fail or do buffering if checksum mismatch.
3 - write protect pages and block them in fault handler.

#3 is done in other operating systems but is painful, so I would
choose #2 because I think we can do more I/O that way at almost
the same cpu cost as #1 and we don't double the memory use.

jim
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 17/18] Btrfs: Full direct I/O and AIO read implementation.

2010-03-24 Thread jim owens
sorry, this time with cc to list

Andi Kleen wrote:
  On Tue, Mar 23, 2010 at 05:40:00PM -0400, jim owens wrote:
  Andi Kleen wrote:
 
  With checksums enabled, uncompressed reads aligned on the 4k block
  are classic direct IO to user memory except at EOF.
  
  Hmm, but what happens if the user modifies the memory in parallel?
  Would spurious checksum failures be reported then?

It does put a warning in the log but it does not fail the read
because I circumvent that by doing the failed-checksum-retry as
a buffered read and retest.  The checksum passes and we copy
the data to the user memory (where they can then trash it again).
I was going to put a comment about that but felt my comment
density was already over the btrfs style guide limit.  :) 

  Same for writing I guess (data end up on disk with wrong checksum)?

Well we don't have any code done yet for writing and that was
just one interesting challenge that needed to be solved.

  Those both would seem like serious flaws to me.

Agree, so the write design needs to prevent bad checksums.

Read is already correct and if people do not want a log warning
that the application is misbehaving that can be eliminated.

jim


--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 17/18] Btrfs: Full direct I/O and AIO read implementation.

2010-03-23 Thread jim owens
Andi Kleen wrote:

 One thing that stroke me while reading is that, except for the out of line no 
 data
 checksum case, this isn't real classical zero copy direct IO because
 you always have to copy through some buffer.

Uh no, unless I really messed up or don't understand what you mean.

Uncompressed data with no checksums only buffers on an error or EOF. 

With checksums enabled, uncompressed reads aligned on the 4k block
are classic direct IO to user memory except at EOF.

With checksums, unaligned reads still go direct to user memory, I
just have to read the extra head and tail to kernel buffers to
make the start and end 4k aligned.  This is efficient for large
reads but maybe not so efficient for small ones.

The special no-checksum EOF buffering is only for consistency, we
could choose to read the whole disk block like classic direct IO.

With checksums, unaligned reads  4K always have some buffered part
for the (4k - user_length) so that may be what you mean.

 It's more like uncached IO
 
 I was wondering that at least for those cases wouldn't it be simpler
 to use the normal page cache IO path and use new hints that disable
 prefetch/write-behind/caching in the page cache after the IO operation?

Maybe.

 Is there any particular reason this wasn't done? Was it because
 of aio?
 
 I know the page cache currently doesn't support that today, but
 presumably it wouldn't be too hard to add.

The only reason I did not do something like that is:
1) I did not want to disturb the page cache with throw-away pages.
2) uncached IO makes it even less like classic direct IO.
3) Writing that page cache code might not be simpler.

As further argument against uncached IO, Chris sent a very simple
patch up to read into page cache then purge it for btrfs direct IO
reads and it was NACKed.
 
 I guess the code would be much simpler if it only did the no checksum
 case.

yes, yes, yes :)

jim
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 17/18] Btrfs: Full direct I/O and AIO read implementation.

2010-03-23 Thread Andi Kleen
On Tue, Mar 23, 2010 at 05:40:00PM -0400, jim owens wrote:
 Andi Kleen wrote:
 
  One thing that stroke me while reading is that, except for the out of line 
  no data
  checksum case, this isn't real classical zero copy direct IO because
  you always have to copy through some buffer.
 
 Uh no, unless I really messed up or don't understand what you mean.

No I misread the code: you don't set temp_pages in that path. 

 
 Uncompressed data with no checksums only buffers on an error or EOF. 
 
 With checksums enabled, uncompressed reads aligned on the 4k block
 are classic direct IO to user memory except at EOF.

Hmm, but what happens if the user modifies the memory in parallel?
Would spurious checksum failures be reported then?

Same for writing I guess (data end up on disk with wrong checksum)?

Those both would seem like serious flaws to me.

  Is there any particular reason this wasn't done? Was it because
  of aio?
  
  I know the page cache currently doesn't support that today, but
  presumably it wouldn't be too hard to add.
 
 The only reason I did not do something like that is:

Ok.

 1) I did not want to disturb the page cache with throw-away pages.
 2) uncached IO makes it even less like classic direct IO.
 3) Writing that page cache code might not be simpler.
4) aio support (although It would be cool if someone finally did
proper aio page cache code)

 As further argument against uncached IO, Chris sent a very simple
 patch up to read into page cache then purge it for btrfs direct IO
 reads and it was NACKed.

I see.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V3 17/18] Btrfs: Full direct I/O and AIO read implementation.

2010-03-22 Thread Andi Kleen
jim owens owens6...@gmail.com writes:

Hi Jim,

I read through large chunks of that patch. I don't claim to fully
understand all the btrfs infrastructure details enough, so it wasn't
really serious code review.

One thing that stroke me while reading is that, except for the out of line no 
data
checksum case, this isn't real classical zero copy direct IO because
you always have to copy through some buffer.

It's more like uncached IO

I was wondering that at least for those cases wouldn't it be simpler
to use the normal page cache IO path and use new hints that disable
prefetch/write-behind/caching in the page cache after the IO operation?

Is there any particular reason this wasn't done? Was it because
of aio?

I know the page cache currently doesn't support that today, but
presumably it wouldn't be too hard to add.

I guess the code would be much simpler if it only did the no checksum
case.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html