Re: [PATCH V3 0/8] Cleancache: overview

2010-07-23 Thread Nitin Gupta
On 06/22/2010 04:48 AM, Dan Magenheimer wrote:
 [PATCH V3 0/8] Cleancache: overview
 
snip
 
  Documentation/ABI/testing/sysfs-kernel-mm-cleancache |   11 +
  Documentation/vm/cleancache.txt  |  194 +++
  fs/btrfs/extent_io.c |9 
  fs/btrfs/super.c |2 
  fs/buffer.c  |5 
  fs/ext3/super.c  |2 
  fs/ext4/super.c  |2 
  fs/mpage.c   |7 
  fs/ocfs2/super.c |3 
  fs/super.c   |7 
  include/linux/cleancache.h   |   88 
  include/linux/fs.h   |5 
  mm/Kconfig   |   22 ++
  mm/Makefile  |1 
  mm/cleancache.c  |  169 
  mm/filemap.c |   11 +
  mm/truncate.c|   10 
  17 files changed, 548 insertions(+)
 
 (following is a copy of Documentation/vm/cleancache.txt)
 
 MOTIVATION
 
 Cleancache can be thought of as a page-granularity victim cache for clean
 pages that the kernel's pageframe replacement algorithm (PFRA) would like
 to keep around, but can't since there isn't enough memory.  So when the
 PFRA evicts a page, it first attempts to put it into a synchronous
 concurrency-safe page-oriented pseudo-RAM device (such as Xen's Transcendent
 Memory, aka tmem, or in-kernel compressed memory, aka zmem, or other
 RAM-like devices) which is not directly accessible or addressable by the
 kernel and is of unknown and possibly time-varying size.  And when a
 cleancache-enabled filesystem wishes to access a page in a file on disk,
 it first checks cleancache to see if it already contains it; if it does,
 the page is copied into the kernel and a disk access is avoided.
 


Since zcache is now one of its use cases, I think the major objection that
remains against cleancache is its intrusiveness -- in particular, the need to
change individual filesystems (even though the changes are one-liners). The
changes below should help avoid these per-fs modifications and make cleancache
more self-contained. I haven't tested these changes myself, so there might be
missed cases or other mysterious problems:

1. Cleancache requires filesystem-specific changes primarily to call
cleancache init and to store a (per-fs-instance) pool_id. I think we can get
rid of these by directly passing the 'struct super_block' pointer, which is
also sufficient to identify the FS instance a page belongs to. The
cleancache_ops provider would then use it as a 'handle' to find the
corresponding memory pool, or create a new pool when a new handle is
encountered.

This leaves out the case of ocfs2, for which cleancache needs a 'uuid' to
decide whether a shared pool should be created. IMHO, this case (and
cleancache.init_shared_fs) should be removed from cleancache_ops since it is
applicable only to Xen's cleancache_ops provider.
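
For illustration, the handle-based ops might look something like this
(a rough, untested sketch -- names and exact signatures are my own
guesses, not taken from the posted series):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/types.h>

/*
 * Sketch only: every operation is keyed by the superblock pointer, so
 * the backend can look up -- or lazily create -- a pool the first time
 * it sees a given sb, and no filesystem has to store a pool_id.
 */
struct cleancache_ops {
	int  (*get_page)(struct super_block *sb, ino_t ino,
			 pgoff_t index, struct page *page);
	void (*put_page)(struct super_block *sb, ino_t ino,
			 pgoff_t index, struct page *page);
	void (*flush_page)(struct super_block *sb, ino_t ino, pgoff_t index);
	void (*flush_inode)(struct super_block *sb, ino_t ino);
	void (*flush_fs)(struct super_block *sb);
};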

2. I think the change in btrfs can be avoided by moving cleancache_get_page()
from do_mpage_readpage() to filemap_fault(), and this should work for all
filesystems. See:

handle_pte_fault() -> do_(non)linear_fault() -> __do_fault()
                                                  -> vma->vm_ops->fault()

which is defined as filemap_fault() for all filesystems. If some future
filesystem uses its own custom function (why?), it will have to arrange for a
call to cleancache_get_page() itself, if it wants this feature.
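
Roughly, the hook would sit where filemap_fault() resolves a page-cache
miss. An untested sketch (the helper name is made up, and locking and
readahead details are glossed over):

#include <linux/cleancache.h>
#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Called for a locked page that was just added to the page cache after
 * a fault missed it: try the victim cache before issuing real I/O.
 */
static int fault_read_page(struct file *file, struct page *page)
{
	struct address_space *mapping = file->f_mapping;

	if (cleancache_get_page(page) == 0) {
		/* found in the victim cache: disk read avoided */
		SetPageUptodate(page);
		unlock_page(page);
		return 0;
	}
	return mapping->a_ops->readpage(file, page);
}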

With the above changes, cleancache will be fairly self-contained (rough
sketch below):
 - cleancache_put_page() when a page is removed from the page cache
 - cleancache_get_page() when a page fault occurs (and after the page cache
   is searched)
 - cleancache_flush_*() on truncate_*()
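
A very rough, untested sketch of those generic-only hook sites (heavily
abbreviated -- the real functions obviously do much more, and the exact
call sites in the posted series may differ slightly):

#include <linux/cleancache.h>
#include <linux/fs.h>
#include <linux/pagemap.h>

/* mm/filemap.c: a clean page leaving the page cache is offered to the
 * backend as a "victim" copy. */
void __remove_from_page_cache(struct page *page)
{
	cleancache_put_page(page);
	/* ... existing removal code ... */
}

/* mm/truncate.c: truncated data must also be flushed from the backend
 * so that a later get cannot return stale pages. */
void truncate_inode_pages_range(struct address_space *mapping,
				loff_t lstart, loff_t lend)
{
	cleancache_flush_inode(mapping);
	/* ... existing truncate code ... */
}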

Thanks,
Nitin


Re: [PATCH V3 0/8] Cleancache: overview

2010-07-23 Thread Minchan Kim
On Fri, Jul 23, 2010 at 4:36 PM, Nitin Gupta ngu...@vflare.org wrote:

 2. I think the change in btrfs can be avoided by moving cleancache_get_page()
 from do_mpage_readpage() to filemap_fault(), and this should work for all
 filesystems. See:

 handle_pte_fault() -> do_(non)linear_fault() -> __do_fault()
                                                   -> vma->vm_ops->fault()

 which is defined as filemap_fault() for all filesystems. If some future
 filesystem uses its own custom function (why?), it will have to arrange for a
 call to cleancache_get_page() itself, if it wants this feature.


filemap_fault() works only for file-backed pages that are mapped; it does
not cover page-cache pages that are not mapped.  So if we move the hook
into filemap_fault(), we could miss cache pages brought in by the read
system call.
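
To spell out the two paths (from memory, 2.6.35-era, so details may be
off):

  read(2):  sys_read() -> vfs_read() -> generic_file_aio_read()
              -> do_generic_file_read() -> mapping->a_ops->readpage()

  mmap(2):  page fault -> handle_pte_fault() -> __do_fault()
              -> vma->vm_ops->fault() == filemap_fault()
              -> mapping->a_ops->readpage()

Only the second path goes through filemap_fault(), so a
cleancache_get_page() hook placed only there would never be consulted
for pages read in via read(2).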


-- 
Kind regards,
Minchan Kim


Re: kernel BUG at fs/btrfs/extent-tree.c:1353

2010-07-23 Thread Daniel J Blueman
On 22 July 2010 19:07, Johannes Hirte johannes.hi...@fem.tu-ilmenau.de wrote:
 On Monday, 19 July 2010, 10:01:46, Miao Xie wrote:
 On Thu, 15 Jul 2010 20:14:51 +0200, Johannes Hirte wrote:
  On Thursday, 15 July 2010, 02:11:04, Dave Chinner wrote:
  On Wed, Jul 14, 2010 at 05:25:23PM +0200, Johannes Hirte wrote:
  On Thursday, 8 July 2010, 16:31:09, Chris Mason wrote:
  I'm not sure if btrfs is to blame for this error. After the errors I
  switched to XFS on this system and now got this error:
 
  ls -l .kde4/share/apps/akregator/data/
  ls: cannot access .kde4/share/apps/akregator/data/feeds.opml: Structure
  needs cleaning
  total 4
  ?? ? ?    ?        ?            ? feeds.opml
 
  What is the error reported in dmesg when the XFS filesystem shuts down?
 
  Nothing. I double-checked the logs. There are only the messages from
  mounting the filesystem. No errors are reported other than the
  inaccessible file and the output from xfs_check.

 Is there anything wrong with your disks or memory?
 Sometimes bad memory can break the filesystem. I have run into this kind of
 problem before.

 I don't think that's the case. I've checked the RAM with memtest86+ and got
 no errors. I got the errors with two different disks, the first one with
 btrfs and the second one now with XFS. Before changing to the second disk, I
 ran badblocks on it to be sure it has no errors.

There are also some known-buggy chipsets. One still around is the Nvidia
CK804/MCP55: under certain patterns of spatially local pending reads and
writes to the memory controller, a 64-byte request would occasionally be
returned with the wrong offset. I was hitting it with some 27-Gbit adapters
and managed to capture it on a PCIe protocol analyser. Rsync between network
and local disk would sometimes hit it too.
-- 
Daniel J Blueman


Re: kernel BUG at fs/btrfs/extent-tree.c:1353

2010-07-23 Thread Bob Copeland
On Thu, Jul 22, 2010 at 2:07 PM, Johannes Hirte
johannes.hi...@fem.tu-ilmenau.de wrote:
 Is there anything wrong with your disks or memory?
 Sometimes bad memory can break the filesystem. I have run into this kind of
 problem before.

 I don't think that's the case. I've checked the RAM with memtest86+ and got
 no errors. I got the errors with two different disks, the first one with
 btrfs and the second one now with XFS. Before changing to the second disk, I
 ran badblocks on it to be sure it has no errors.

You might also try kmemcheck.  There's a good chance that a bug
that scribbles random memory shows up as FS corruption.  I have had
ext4 corruption due to an inotify bug, which kmemcheck found on the
first try.

-- 
Bob Copeland %% www.bobcopeland.com


Re: [PATCH V3 0/8] Cleancache: overview

2010-07-23 Thread Christoph Hellwig
On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
 CHRISTOPH AND ANDREW, if you disagree and your concerns have
 not been resolved, please speak up.

Anything that needs modification of a normal non-shared fs is utterly
broken and you'll get a clear NAK, so the proposal before is a good
one.  There are a couple more issues, like the still-weird prototypes;
e.g. i_ino might not be enough to uniquely identify an inode on
several filesystems that use 64-bit inode numbers on 32-bit
systems.  Also, making the ops vector global is just a bad idea.
There is nothing making this sort of caching inherently global.



RE: [PATCH V3 0/8] Cleancache: overview

2010-07-23 Thread Dan Magenheimer
 From: Christoph Hellwig [mailto:h...@infradead.org]
 Subject: Re: [PATCH V3 0/8] Cleancache: overview
 
 On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
  CHRISTOPH AND ANDREW, if you disagree and your concerns have
  not been resolved, please speak up.

Hi Christoph --

Thanks very much for the quick (instantaneous?) reply!

 Anything that needs modification of a normal non-shared fs is utterly
 broken and you'll get a clear NAK, so the proposal before is a good
 one.

Unless/until all filesystems are 100% built on top of VFS,
I have to disagree.  Abstractions (e.g. VFS) are never perfect.
And the relevant filesystem maintainers have acked, so I'm
wondering who you are NAK'ing for?

Nitin's proposal attempts to move the VFS hooks around to fix
usage for one fs (btrfs) that, for whatever reason, has
chosen to not layer itself completely on top of VFS; this
sounds to me like a recipe for disaster.
I think Minchan's reply quickly pointed out one issue...
which other filesystems that haven't been changed might
encounter a rare data corruption issue because cleancache is
transparently enabled for their page cache pages?

It also requires support to be dropped entirely for
another fs (ocfs2) which one user (zcache) can't use, but
the other (tmem) makes very good use of.

No, the per-fs opt-in is very sensible; and its design is
very minimal.

Could you please explain your objection further?

 There are a couple more issues, like the still-weird prototypes;
 e.g. i_ino might not be enough to uniquely identify an inode
 on several filesystems that use 64-bit inode numbers on 32-bit
 systems.

This reinforces my per-fs opt-in point.  Such filesystems
should not enable cleancache (or enable them only on the
appropriate systems).

 Also making the ops vector global is just a bad idea.
 There is nothing making this sort of caching inherently global.

I'm not sure I understand your point, but two very different
users of cleancache have been provided, and more will be
discussed at the MM summit next month.

Do you have a suggestion on how to avoid a global ops
vector while still serving the needs of both existing
users?

Thanks,
Dan



RE: [PATCH V3 0/8] Cleancache: overview

2010-07-23 Thread Dan Magenheimer
 From: Nitin Gupta [mailto:ngu...@vflare.org]
 Sent: Friday, July 23, 2010 9:05 AM
 To: Dan Magenheimer
 Cc: Christoph Hellwig; a...@linux-foundation.org; Chris Mason;
 v...@zeniv.linux.org.uk; adil...@sun.com; ty...@mit.edu;
 mfas...@suse.com; Joel Becker; matt...@wil.cx; linux-
 bt...@vger.kernel.org; linux-ker...@vger.kernel.org; linux-
 fsde...@vger.kernel.org; linux-e...@vger.kernel.org; ocfs2-
 de...@oss.oracle.com; linux...@kvack.org; jer...@goop.org;
 jbeul...@novell.com; Kurt Hackel; npig...@suse.de; Dave Mccracken;
 r...@redhat.com; a...@redhat.com; Konrad Wilk
 Subject: Re: [PATCH V3 0/8] Cleancache: overview
 
 On 07/23/2010 08:14 PM, Dan Magenheimer wrote:
  From: Christoph Hellwig [mailto:h...@infradead.org]
 
 
  Also making the ops vector global is just a bad idea.
  There is nothing making this sort of caching inherently global.
 
  I'm not sure I understand your point, but two very different
  users of cleancache have been provided, and more will be
  discussed at the MM summit next month.
 
  Do you have a suggestion on how to avoid a global ops
  vector while still serving the needs of both existing
  users?
 
 Maybe introduce cleancache_register(struct cleancache_ops *ops)?
 This will allow making cleancache_ops non-global. No value add
 but maybe that's cleaner?

Oh, OK, that seems reasonable.
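
Something like this, I suppose (untested sketch; exact name and
semantics to be worked out):

#include <linux/cleancache.h>
#include <linux/errno.h>
#include <linux/module.h>

/* mm/cleancache.c: ops vector becomes private; a backend (tmem, zcache)
 * hands in its ops at init time instead of writing to a global. */
static struct cleancache_ops *cleancache_ops;

int cleancache_register(struct cleancache_ops *ops)
{
	if (cleancache_ops)
		return -EBUSY;		/* only one backend at a time */
	cleancache_ops = ops;
	return 0;
}
EXPORT_SYMBOL(cleancache_register);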

Dan


Re: [PATCH V3 0/8] Cleancache: overview

2010-07-23 Thread Nitin Gupta
On 07/23/2010 11:07 PM, Dan Magenheimer wrote:
 From: Dan Magenheimer
 Subject: RE: [PATCH V3 0/8] Cleancache: overview

 From: Christoph Hellwig [mailto:h...@infradead.org]
 Subject: Re: [PATCH V3 0/8] Cleancache: overview

 On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
 CHRISTOPH AND ANDREW, if you disagree and your concerns have
 not been resolved, please speak up.

 Hi Christoph --

 Thanks very much for the quick (instantaneous?) reply!

 Anything that needs modification of a normal non-shared fs is utterly
 broken and you'll get a clear NAK, so the proposal before is a good
 one.

 Unless/until all filesystems are 100% built on top of VFS,
 I have to disagree.  Abstractions (e.g. VFS) are never perfect.
 
 After thinking about this some more, I can see a way
 to enforce opt-in in the cleancache backend without
 any changes to non-generic fs code.   I think it's a horrible
 hack and we can try it, but I expect fs maintainers
 would prefer the explicit one-line-patch opt-in.
 
 1) Cleancache backend maintains a list of known working
filesystems (those that have been tested).

A check against a known-working list does indeed look horrible.
Isn't there any way to identify pagecache/disk-I/O boundaries
which every filesystem obeys? I'm not yet sure, but if this is
doable, then we won't require such hacks.

 
 2) Nitin's proposed changes pass the *sb as a parameter.
   The string name of the filesystem type is available via
   sb->s_type->name.  This can be compared against
   the known working list.


sb->s_magic could also be used, or better, we could somehow
get rid of these checks altogether :)
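
i.e., something along these lines (untested sketch; the list contents
are purely illustrative, not a claim about which filesystems are safe):

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/string.h>

static const char * const cleancache_known_good[] = {
	"ext3", "ext4", "btrfs", "ocfs2",	/* illustrative only */
};

/* Backend refuses to create a pool for any fs not explicitly tested. */
static bool cleancache_fs_supported(struct super_block *sb)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(cleancache_known_good); i++)
		if (!strcmp(sb->s_type->name, cleancache_known_good[i]))
			return true;
	return false;
}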

 Using the sb pointer as a handle requires an extra
 table search on every cleancache get/put/flush,
 and fs/super.c changes are required for fs unmount
 notification anyway (e.g. to call cleancache_flush_fs)
 so I'd prefer to keep the cleancache_poolid addition
 to the sb.  I'll assume this is OK since this is in generic
 fs code.
 


I will also try making changes to cleancache so that it does not
touch any fs-specific code. IMHO, one-liners in fs code should really
be acceptable, but unfortunately this doesn't seem to be the case.
Maybe a generic cleancache will have better chances.

Thanks,
Nitin


btrfsctl returns bad exit status

2010-07-23 Thread K. Richard Pixley
Using btrfsctl from Ubuntu Maverick (built and running on
Ubuntu 10.04), I get:


r...@eisenhower btrfsctl -s snap1 /home || echo failed
operation complete
Btrfs Btrfs v0.19
failed
r...@eisenhower ls -las snap1
total 4
4 drwxr-xr-x 1 root    root      32 2010-07-20 18:25 .
0 drwxr-xr-x 1 rich    rich    1324 2010-07-21 12:47 ..
0 drwxr-xr-x 1 checker build     22 2010-07-20 18:21 checker
0 drwxr-xr-x 1 rich    rich    1322 2010-07-21 12:44 rich
0 drwxr-xr-x 1 root    root      36 2010-07-21 10:47 za-cb

This would seem to indicate a non-zero exit status despite the fact that
the snapshotting operation appears to have succeeded.  This makes it
impossible to programmatically check whether a snapshot was created
successfully, since I would need to explicitly discard the exit status
from btrfsctl.


I get the same results when built from git, except that it identifies 
itself as v0.19-16-g075587c.


I would expect that btrfsctl would return zero exit status when it was 
capable of doing what it was asked and non-zero exit status only when it 
could not.


--rich


Re: btrfsctl returns bad exit status

2010-07-23 Thread C Anthony Risinger
Try using the btrfs tool; btrfsctl is (or will be) deprecated, I think...
and it's better anyway.

C Anthony [mobile]

On Jul 23, 2010, at 3:04 PM, K. Richard Pixley r...@noir.com wrote:

 Using btrfsctl from ubuntu maverick, (built and running on
 ubuntu-10.04), I get:

 r...@eisenhower btrfsctl -s snap1 /home || echo failed
 operation complete
 Btrfs Btrfs v0.19
 failed
 r...@eisenhower ls -las snap1
 total 4
 4 drwxr-xr-x 1 root    root      32 2010-07-20 18:25 .
 0 drwxr-xr-x 1 rich    rich    1324 2010-07-21 12:47 ..
 0 drwxr-xr-x 1 checker build     22 2010-07-20 18:21 checker
 0 drwxr-xr-x 1 rich    rich    1322 2010-07-21 12:44 rich
 0 drwxr-xr-x 1 root    root      36 2010-07-21 10:47 za-cb

 This would seem to indicate a non-zero exit status despite the fact
 that the snapshotting operation appears to have succeeded.  This
 makes it impossible to programmatically check whether a snapshot was
 created successfully or not as I will need to explicitly discard the
 exit status from btrfsctl.

 I get the same results when built from git, except that it
 identifies itself as v0.19-16-g075587c.

 I would expect that btrfsctl would return zero exit status when it
 was capable of doing what it was asked and non-zero exit status only
 when it could not.

 --rich


RE: [PATCH V3 0/8] Cleancache: overview

2010-07-23 Thread Dan Magenheimer
  On Fri, Jul 23, 2010 at 06:58:03AM -0700, Dan Magenheimer wrote:
   CHRISTOPH AND ANDREW, if you disagree and your concerns have
   not been resolved, please speak up.
 
 Hi Christoph --
 
 Thanks very much for the quick (instantaneous?) reply!
 
  Anything that needs modification of a normal non-shared fs is utterly
  broken and you'll get a clear NAK, so the proposal before is a good
  one.
 
 No, the per-fs opt-in is very sensible; and its design is
 very minimal.

Not to belabor the point, but maybe the right way to think about
this is:

Cleancache is a new optional feature provided by the VFS layer
that potentially dramatically increases page cache effectiveness
for many workloads in many environments at a negligible cost.

Filesystems that are well-behaved and conform to certain restrictions
can utilize cleancache simply by making a call to cleancache_init_fs
at mount time.  Unusual, misbehaving, or poorly layered filesystems
must either add additional hooks and/or undergo extensive additional
testing... or should just not enable the optional cleancache.
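
Concretely, the opt-in amounts to a single call in the filesystem's
fill_super path, something like the following (sketch; the exact
signature in the V3 series may differ, and examplefs is hypothetical):

#include <linux/cleancache.h>
#include <linux/fs.h>

static int examplefs_fill_super(struct super_block *sb, void *data,
				int silent)
{
	/* ... normal mount-time setup ... */

	cleancache_init_fs(sb);		/* opt in; no-op if no backend */

	return 0;
}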