Bug#569598: disk caches?
On 2010-02-16 23:12, Christoph Hellwig wrote:
> On Tue, Feb 16, 2010 at 03:42:25PM -0500, Philipp Weis wrote:
> > How could it end up in the page cache for this setup? There is no
> > filesystem layer below the loop device. Any write that reaches the
> > loop device is passed through to lvm, then md and finally the disk.
> > Are you saying that there is extra caching going on at the lvm or md
> > layers?
>
> Yes, the block device node is a cached access mode, and uses the same
> pagecache as a filesystem uses.

Ugh, thanks for clearing that up. Is there a way to disable the page
cache for lvm and md? Or is it just not possible to have this work
reliably as long as there's no write barrier support in loop? From what
I understand, 2.6.33 will have full barrier support for both md and lvm.

Thanks,
Philipp
Bug#569598: disk caches?
On Tue, Feb 16, 2010 at 03:42:25PM -0500, Philipp Weis wrote:
> How could it end up in the page cache for this setup? There is no
> filesystem layer below the loop device. Any write that reaches the
> loop device is passed through to lvm, then md and finally the disk.
> Are you saying that there is extra caching going on at the lvm or md
> layers?

Yes, the block device node is a cached access mode, and uses the same
pagecache as a filesystem uses.
Bug#569598: disk caches?
Hi,

On 2010-02-16 20:18, Christoph Hellwig wrote:
> Hmm, it really does look like the typical write cache corruption.
>
> Looking a bit at the loop driver as suspect I think I know what the
> problem is:
>
>  - loop writes data either using the address_space operations, or
>    using ->write, but it never does the O_SYNC processing
>    (just ->fsync in modern kernels, but a little different in 2.6.26)
>
> So when using loop data simply gets written into the page cache, but
> there is no guarantee it ever goes out to disk.

How could it end up in the page cache for this setup? There is no
filesystem layer below the loop device. Any write that reaches the
loop device is passed through to lvm, then md and finally the disk.
Are you saying that there is extra caching going on at the lvm or md
layers?

> This bug is still present in latest upstream - a patch to introduce
> barrier support calling ->fsync went in a while ago but caused
> mysterious lockups and was reverted, with the patch author never
> coming back to try to figure it out.

That's really unfortunate.

Philipp
Bug#569598: disk caches?
On Tue, Feb 16, 2010 at 10:58:06AM -0500, Philipp Weis wrote:
> On 2010-02-16 11:21, Christoph Hellwig wrote:
> > Do you have disk write caches disabled on the machine? Neither md, nor
> > dm, nor loop pass through barrier requests. Without disabling the
> > volatile write cache on the disks you will lose data every time the
> > machine is not shut down cleanly. The messages you see are typical
> > for that kind of corruption.
>
> The caches are all off.

Hmm, it really does look like the typical write cache corruption.

Looking a bit at the loop driver as suspect I think I know what the
problem is:

 - loop writes data either using the address_space operations, or
   using ->write, but it never does the O_SYNC processing
   (just ->fsync in modern kernels, but a little different in 2.6.26)

So when using loop data simply gets written into the page cache, but
there is no guarantee it ever goes out to disk.

This bug is still present in latest upstream - a patch to introduce
barrier support calling ->fsync went in a while ago but caused
mysterious lockups and was reverted, with the patch author never
coming back to try to figure it out.
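
For readers of the archive, the distinction drawn above (data that has only
been written into the page cache versus data that has been forced out to the
device) can be illustrated from user space. This is a minimal Python sketch
and not the loop driver's code: the function name write_durably and the
temporary file are hypothetical, chosen only for illustration, and it assumes
a POSIX system where os.fsync is available.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path, then ask the kernel to flush it to storage.

    Without the os.fsync() call, os.write() may return as soon as the
    data sits in the page cache; nothing guarantees it has been written out.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(data):
            # os.write() may accept fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        # Counterpart of the ->fsync step that the message says is missing.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.bin")
        print(write_durably(target, b"hello, disk\n"))
```

Note that fsync is only as strong as the layers underneath it: as discussed in
this thread, if a layer in the stack does not pass barrier requests and the
disk's volatile write cache is enabled, data can still be lost on an unclean
shutdown.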
Bug#569598: disk caches?
On 2010-02-16 11:21, Christoph Hellwig wrote:
> Do you have disk write caches disabled on the machine? Neither md, nor
> dm, nor loop pass through barrier requests. Without disabling the
> volatile write cache on the disks you will lose data every time the
> machine is not shut down cleanly. The messages you see are typical
> for that kind of corruption.

The caches are all off.

| # hdparm -W /dev/hd[abeg]
|
| /dev/hda:
|  write-caching = 0 (off)
|
| /dev/hdb:
|  write-caching = 0 (off)
|
| /dev/hde:
|  write-caching = 0 (off)
|
| /dev/hdg:
|  write-caching = 0 (off)

I learned about this and deactivated the caches about half a year back,
and luckily didn't run into any problems before then. The machine has
been rebooted since then several times without any issues, so I don't
think this could be a left-over artifact from back then.

Thanks,
Philipp
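
A check like the one above can be automated. The following Python sketch
parses text in the layout shown in this message and reports, per device,
whether write caching is on. It assumes the output always looks like the
sample quoted here (a "/dev/...:" line followed by a "write-caching = N"
line); the function name parse_write_caching is hypothetical, and the sketch
does not run hdparm itself.

```python
import re


def parse_write_caching(output):
    """Map each device in `hdparm -W` style output to True if caching is on."""
    result = {}
    device = None
    for raw in output.splitlines():
        # Tolerate the "| " quoting prefix used in the mail above.
        line = raw.strip().lstrip("|").strip()
        if line.startswith("/dev/") and line.endswith(":"):
            device = line[:-1]
            continue
        match = re.match(r"write-caching\s*=\s*(\d+)", line)
        if match and device is not None:
            result[device] = match.group(1) != "0"
    return result


if __name__ == "__main__":
    sample = (
        "| /dev/hda:\n"
        "|  write-caching = 0 (off)\n"
        "|\n"
        "| /dev/hdb:\n"
        "|  write-caching = 0 (off)\n"
    )
    enabled = [dev for dev, on in parse_write_caching(sample).items() if on]
    print(enabled)
```

Any device listed in the result with a true value would still have its
volatile write cache enabled and would be worth re-checking.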
Bug#569598: disk caches?
Do you have disk write caches disabled on the machine? Neither md, nor
dm, nor loop pass through barrier requests. Without disabling the
volatile write cache on the disks you will lose data every time the
machine is not shut down cleanly. The messages you see are typical
for that kind of corruption.