Re: [RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-21 Thread David Chinner
On Sun, Oct 21, 2007 at 02:24:46PM +1000, Nick Piggin wrote:
> On Saturday 20 October 2007 07:27, Eric W. Biederman wrote:
> > Currently only 
> > metadata is more or less in sync with the contents of /dev/hda1.
> 
> It either is or it isn't, right? And it is, isn't it? (at least
> for the common filesystems).

It is not true for XFS - its metadata is not in sync with /dev/
at all, as all the cached metadata is kept in a different address space
from the raw block device.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


Re: [RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-21 Thread Eric W. Biederman
Nick Piggin <[EMAIL PROTECTED]> writes:

> On Sunday 21 October 2007 14:53, Eric W. Biederman wrote:
>> Nick Piggin <[EMAIL PROTECTED]> writes:
>> > On Saturday 20 October 2007 07:27, Eric W. Biederman wrote:
>> >> Andrew Morton <[EMAIL PROTECTED]> writes:
>> >> > I don't think we little angels want to tread here.  There are so many
>> >> > weirdo things out there which will break if we bust the coherence
>> >> > between the fs and /dev/hda1.
>> >>
>> >> We broke coherence between the fs and /dev/hda1 when we introduced
>> >> the page cache years ago,
>> >
>> > Not for metadata. And I wouldn't expect many filesystem analysis
>> > tools to care about data.
>>
>> Well tools like dump certainly weren't happy when we made the change.
>
> Doesn't that give you any suspicion that other tools mightn't
> be happy if we make this change, then?

I read a representative sample of the relevant tools before replying
to Andrew.

>> >> and weird hacky cases like
>> >> unmap_underlying_metadata don't change that.
>> >
>> > unmap_underlying_metadata isn't about raw block device access at
>> > all, though (if you write to the filesystem via the blockdevice
>> > when it isn't expecting it, it's going to blow up regardless).
>>
>> Well my goal with separating things is so that we could decouple two
>> pieces of code that have different usage scenarios, and where
>> supporting both scenarios simultaneously appears to me to needlessly
>> complicate the code.
>>
>> Added to that we could then tune the two pieces of code for their
>> different users.
>
> I don't see too much complication from it. If we can actually
> simplify things or make useful tuning, maybe it will be worth
> doing.

That was my feeling that we could simplify things.  The block layer
page cache operations certainly.

I know that in the filesystems that use the buffer cache, like reiser and
JBD, they could stop worrying about the buffers becoming mysteriously
dirty.  Beyond that I think there is a lot of opportunity; I just
haven't looked much yet.

>> >> Currently only
>> >> metadata is more or less in sync with the contents of /dev/hda1.
>> >
>> > It either is or it isn't, right? And it is, isn't it? (at least
>> > for the common filesystems).
>>
>> ext2 doesn't store directories in the buffer cache.
>  
> Oh that's what you mean. OK, agreed there. But for the filesystems
> and types of metadata that can now expect to have coherency, doing
> this will break that expectation.
>
> Again, I have no opinions either way on whether we should do that
> in the long run. But doing it as a kneejerk response to braindead
> rd.c code is wrong because of what *might* go wrong that we don't
> know about.

The rd.c code would be perfectly valid if someone weren't forcing buffer
heads onto its pages.  It is a conflict of expectations.

Regardless, I didn't do it as a kneejerk reaction, and I don't think that
patch should be merged at this time.  I proposed it because, as I
see it, it starts untangling the mess that is the buffer cache.
rd.c was just my entry point into understanding how all of those
pieces work.  I was doing my best to completely explore my options
and what the code was doing before settling on the fix for rd.c.

>> Journaling filesystems and filesystems that do ordered writes
>> game the buffer cache.  Putting in data that should not yet
>> be written to disk.  That gaming is where reiserfs goes BUG
>> and where JBD moves the dirty bit to a different dirty bit.
>
> Filesystems really want better control of writeback, I think.
> This isn't really a consequence of the unified blockdev pagecache
> / metadata buffer cache, it is just that most of the important
> things they do are with metadata.

Yes.

> If they have their own metadata inode, then they'll need to game
> the cache for it, or the writeback code for that inode somehow
> too.

Yes.  Although they will at least get the guarantee that no one
else is dirtying their pages at strange times. 


>> So as far as I can tell what is in the buffer cache is not really
>> in sync with what should be on disk at any given moment except
>> when everything is clean.
>
> Naturally. It is a writeback cache.

Not that so much as that the order in which things go into the cache
does not match the order in which the blocks go to disk.

>> My suspicion is that actually reading from disk is likely to
>> give a more coherent view of things.  Because there at least
>> we have the writes as they are expected to be seen by fsck
>> to recover the data, and a snapshot there should at least
>> be recoverable.  Whereas a snapshot provides no such guarantees.
>
> I don't think ext3 fsck is supposed to be run on a read/write
> filesystem, so it's going to explode if you do that regardless.

Yes.  I was thinking of dump or something like that here, where
we simply read out the data and try to make some coherent sense
of it.  If we see a version of the metadata that points to things
that have not been finished yet, or that are in the process of being
written, that could be a problem.

Re: [RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-20 Thread Nick Piggin
On Sunday 21 October 2007 14:53, Eric W. Biederman wrote:
> Nick Piggin <[EMAIL PROTECTED]> writes:
> > On Saturday 20 October 2007 07:27, Eric W. Biederman wrote:
> >> Andrew Morton <[EMAIL PROTECTED]> writes:
> >> > I don't think we little angels want to tread here.  There are so many
> >> > weirdo things out there which will break if we bust the coherence
> >> > between the fs and /dev/hda1.
> >>
> >> We broke coherence between the fs and /dev/hda1 when we introduced
> >> the page cache years ago,
> >
> > Not for metadata. And I wouldn't expect many filesystem analysis
> > tools to care about data.
>
> Well tools like dump certainly weren't happy when we made the change.

Doesn't that give you any suspicion that other tools mightn't
be happy if we make this change, then?


> >> and weird hacky cases like
> >> unmap_underlying_metadata don't change that.
> >
> > unmap_underlying_metadata isn't about raw block device access at
> > all, though (if you write to the filesystem via the blockdevice
> > when it isn't expecting it, it's going to blow up regardless).
>
> Well my goal with separating things is so that we could decouple two
> pieces of code that have different usage scenarios, and where
> supporting both scenarios simultaneously appears to me to needlessly
> complicate the code.
>
> Added to that we could then tune the two pieces of code for their
> different users.

I don't see too much complication from it. If we can actually
simplify things or make useful tuning, maybe it will be worth
doing.


> >> Currently only
> >> metadata is more or less in sync with the contents of /dev/hda1.
> >
> > It either is or it isn't, right? And it is, isn't it? (at least
> > for the common filesystems).
>
> ext2 doesn't store directories in the buffer cache.

Oh that's what you mean. OK, agreed there. But for the filesystems
and types of metadata that can now expect to have coherency, doing
this will break that expectation.

Again, I have no opinions either way on whether we should do that
in the long run. But doing it as a kneejerk response to braindead
rd.c code is wrong because of what *might* go wrong that we don't
know about.


> Journaling filesystems and filesystems that do ordered writes
> game the buffer cache.  Putting in data that should not yet
> be written to disk.  That gaming is where reiserfs goes BUG
> and where JBD moves the dirty bit to a different dirty bit.

Filesystems really want better control of writeback, I think.
This isn't really a consequence of the unified blockdev pagecache
/ metadata buffer cache, it is just that most of the important
things they do are with metadata.

If they have their own metadata inode, then they'll need to game
the cache for it, or the writeback code for that inode somehow
too.


> So as far as I can tell what is in the buffer cache is not really
> in sync with what should be on disk at any given moment except
> when everything is clean.

Naturally. It is a writeback cache.


> My suspicion is that actually reading from disk is likely to
> give a more coherent view of things.  Because there at least
> we have the writes as they are expected to be seen by fsck
> to recover the data, and a snapshot there should at least
> be recoverable.  Whereas a snapshot provides no such guarantees.

I don't think ext3 fsck is supposed to be run on a read/write
filesystem, so it's going to explode if you do that regardless.


Re: [RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-20 Thread Eric W. Biederman
Nick Piggin <[EMAIL PROTECTED]> writes:

> On Saturday 20 October 2007 07:27, Eric W. Biederman wrote:
>> Andrew Morton <[EMAIL PROTECTED]> writes:
>> > I don't think we little angels want to tread here.  There are so many
>> > weirdo things out there which will break if we bust the coherence between
>> > the fs and /dev/hda1.
>>
>> We broke coherence between the fs and /dev/hda1 when we introduced
>> the page cache years ago,
>
> Not for metadata. And I wouldn't expect many filesystem analysis
> tools to care about data.

Well tools like dump certainly weren't happy when we made the change.

>> and weird hacky cases like 
>> unmap_underlying_metadata don't change that.
>
> unmap_underlying_metadata isn't about raw block device access at
> all, though (if you write to the filesystem via the blockdevice
> when it isn't expecting it, it's going to blow up regardless).

Well my goal with separating things is so that we could decouple two
pieces of code that have different usage scenarios, and where
supporting both scenarios simultaneously appears to me to needlessly
complicate the code.

Added to that we could then tune the two pieces of code for their
different users.

>> Currently only 
>> metadata is more or less in sync with the contents of /dev/hda1.
>
> It either is or it isn't, right? And it is, isn't it? (at least
> for the common filesystems).

ext2 doesn't store directories in the buffer cache.
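
(For illustration, this is roughly what the alternative looks like: ext2
reads directory blocks through the directory inode's own page cache, much
as fs/ext2/dir.c does, rather than through buffer_heads attached to the
block device inode.  A simplified sketch, error handling trimmed:)

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/highmem.h>
#include <linux/pagemap.h>

/* Read directory block 'n' via the directory inode's mapping; the cached
 * copy lives in that mapping, not in the block device inode's page cache. */
static struct page *example_get_dir_page(struct inode *dir, unsigned long n)
{
	struct page *page = read_mapping_page(dir->i_mapping, n, NULL);

	if (!IS_ERR(page))
		kmap(page);	/* directory entries are walked via the kmap'd page */
	return page;
}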

Journaling filesystems and filesystems that do ordered writes
game the buffer cache.  Putting in data that should not yet
be written to disk.  That gaming is where reiserfs goes BUG
and where JBD moves the dirty bit to a different dirty bit.

So as far as I can tell what is in the buffer cache is not really
in sync with what should be on disk at any given moment except
when everything is clean.

My suspicion is that actually reading from disk is likely to
give a more coherent view of things.  Because there at least
we have the writes as they are expected to be seen by fsck
to recover the data, and a snapshot there should at least
be recoverable.  Whereas a snapshot provides no such guarantees.

Eric


Re: [RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-20 Thread Nick Piggin
On Saturday 20 October 2007 07:27, Eric W. Biederman wrote:
> Andrew Morton <[EMAIL PROTECTED]> writes:
> > I don't think we little angels want to tread here.  There are so many
> > weirdo things out there which will break if we bust the coherence between
> > the fs and /dev/hda1.
>
> We broke coherence between the fs and /dev/hda1 when we introduced
> the page cache years ago,

Not for metadata. And I wouldn't expect many filesystem analysis
tools to care about data.


> and weird hacky cases like 
> unmap_underlying_metadata don't change that.

unmap_underlying_metadata isn't about raw block device access at
all, though (if you write to the filesystem via the blockdevice
when it isn't expecting it, it's going to blow up regardless).


> Currently only 
> metadata is more or less in sync with the contents of /dev/hda1.

It either is or it isn't, right? And it is, isn't it? (at least
for the common filesystems).


Re: [RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-19 Thread Eric W. Biederman
Nick Piggin <[EMAIL PROTECTED]> writes:

>
> [*] The ramdisk code is simply buggy, right? (and not the buffer
> cache)

From the perspective of the ramdisk, the buffer cache is expected to
simply be a user of the page cache, and thus the buffer cache
is horribly buggy.

From the perspective of the buffer cache, it is still the block device
cache in the kernel, the way it works defines the rules for how
caching should be done in the kernel, and it doesn't pay any
mind to this newfangled page cache thingy.

> The idea of your patch in theory is OK, but Andrew raises valid
> points about potential coherency problems, I think.

There are certainly implementation issues in various filesystems
to overcome before remounting read-write after doing a fsck
on a read-only filesystem will work as it does today.  So my patch
is incomplete.

Eric


Re: [RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-19 Thread Eric W. Biederman
Andrew Morton <[EMAIL PROTECTED]> writes:

>
> I don't think we little angels want to tread here.  There are so many
> weirdo things out there which will break if we bust the coherence between
> the fs and /dev/hda1.

We broke coherence between the fs and /dev/hda1 when we introduced
the page cache years ago, and weird hacky cases like
unmap_underlying_metadata don't change that.  Currently only
metadata is more or less in sync with the contents of /dev/hda1.

> Online resize, online fs checkers, various local
> tools which people have hacked up to look at metadata in a live fs,
> direct-io access to the underlying fs, heaven knows how many boot loader
> installers, etc.  Certainly I couldn't enumerate them all.

Well, I took a look at ext3.  For online resize all of the writes are
done by the fs, not by the user space tool.  For e2fsck of a read-only
filesystem we do currently cache the buffers for the super block and
reexamine those blocks when we remount read-write.

Which makes my patch by itself unsafe.  If, however, ext3 and anyone
else who does things like that were to reread the data rather than
merely reexamine it, we should be fine.

Fundamentally doing anything like this requires some form of
synchronization, and if that synchronization does not exist
today there will be bugs.  Further decoupling things only makes that
requirement clearer.

Unfortunately because of things like the ext3 handling of remounting
from ro to rw this doesn't fall into the quick trivial fix category :(

> I don't actually see what the conceptual problem is with the existing
> implementation.  The buffer_head is a finer-grained view onto the
> blockdev's pagecache: it provides additional states and additional locking
> against a finer-grained section of the page.   It works well.

The buffer_head itself seems to be a reasonable entity.

The buffer cache is a monster.  It does not follow the ordinary rules
of the page cache, making it extremely hard to reason about.

Currently in the buffer cache there are buffer_heads we are not
allowed to make dirty which hold dirty data.  Some filesystems
panic the kernel when they notice this.  Others like ext3 use a
different bit to remember that the buffer is dirty.
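
(The "different bit" is BH_JBDDirty.  A minimal sketch of the pattern
being described, using the jbd buffer-state helpers from <linux/jbd.h>;
an illustration of the idea, not the actual JBD code path:)

#include <linux/buffer_head.h>
#include <linux/jbd.h>

/* The journal does not want the VM writing this buffer out behind its
 * back, so the ordinary dirty bit is cleared and the dirtiness is
 * remembered in a journal-private bit instead. */
static void example_take_over_dirty_bit(struct buffer_head *bh)
{
	if (test_clear_buffer_dirty(bh))
		set_buffer_jbddirty(bh);
}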

Because of ordering considerations the buffer cache does not hold a
consistent view of what has been scheduled for being written to disk.
It instead holds partially complete pages.

The only place we should ever clear the dirty bit is just before
calling ->writepage, but try_to_free_buffers clears the dirty bit!
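
(By contrast, the legitimate pattern looks roughly like what
ll_rw_block() and sync_dirty_buffer() do: the dirty bit is cleared only
at the moment the write is actually submitted.  Sketch only:)

#include <linux/buffer_head.h>
#include <linux/fs.h>

/* Clear the dirty bit only at the point where writeback of this buffer
 * is submitted; end_buffer_write_sync() unlocks the buffer and drops
 * the extra reference when the I/O completes. */
static void example_write_one_buffer(struct buffer_head *bh)
{
	lock_buffer(bh);
	if (test_clear_buffer_dirty(bh)) {
		get_bh(bh);
		bh->b_end_io = end_buffer_write_sync;
		submit_bh(WRITE, bh);
	} else {
		unlock_buffer(bh);	/* already clean, nothing to write */
	}
}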

We have buffers on pages without a mapping!

In general the buffer cache violates a primary rule of comprehensible
programming: it does not have a clear enough definition to make it
obvious which things are bugs and which are features.

99% of the strange behavior in rd.c is because of the buffer
cache not following the normal rules.

> Yeah, the highmem thing is a bit of a problem (but waning in importance). 
> But we can fix that by teaching individual filesystems about kmap and then
> tweak the blockdev's caching policy with mapping_set_gfp_mask() at mount
> time.  If anyone cares, which they don't.

This presumes I want to use a filesystem on my block device.  Where I
would care most is when I am doing things like fsck or mkfs on an
unmounted filesystem, where having buffer_heads is just extra memory
pressure slowing things down, and similarly for highmem.  We have
to sync the filesystem before mounting, but we have to do that anyway
for all of the non-metadata so that isn't new.

Anyway my main objective was to get a good grasp on the buffer cache
and the mm layer again.  Which I now more or less have.  While I think
the buffer cache needs a bunch of tender loving care before it becomes
sane I have other projects that I intend to complete before I try
anything in this area.

Eric


Re: [RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-17 Thread Nick Piggin
On Thursday 18 October 2007 13:59, Eric W. Biederman wrote:
> If filesystems care at all they want absolute control over the buffer
> cache.  Controlling which buffers are dirty and when.  Because we
> keep the buffer cache in the page cache for the block device we have
> not quite been giving filesystems that control leading to really weird
> bugs.

Mmm. Like I said, when a live filesystem is mounted on a bdev,
it isn't like you want userspace to go dancing around on it without
knowing exactly what it is doing.

The kernel more or less does the right thing here with respect to
the *state* of the data[*] (that is, buffer heads and pagecache).
It's when you actually start changing the data itself around that
you'll blow up the filesystem.

[*] The ramdisk code is simply buggy, right? (and not the buffer
cache)

The idea of your patch in theory is OK, but Andrew raises valid
points about potential coherency problems, I think.



Re: [RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-17 Thread Andrew Morton
On Wed, 17 Oct 2007 21:59:02 -0600 [EMAIL PROTECTED] (Eric W. Biederman) wrote:

> If filesystems care at all they want absolute control over the buffer
> cache.  Controlling which buffers are dirty and when.  Because we
> keep the buffer cache in the page cache for the block device we have
> not quite been giving filesystems that control leading to really weird
> bugs.
> 
> In addition this tying of the implementation of block device caching
> and the buffer cache has resulted in a much more complicated and
> limited implementation than necessary.  Block devices for example
> don't need buffer_heads, and it is perfectly reasonable to cache
> block devices in high memory.
> 
> To start untangling the worst of this mess this patch introduces a
> second block device inode for the buffer cache.  All buffer cache
> operations are diverted to use the new bd_metadata_inode, which
> keeps the weirdness of the metadata requirements isolated in their
> own little world.

I don't think we little angels want to tread here.  There are so many
weirdo things out there which will break if we bust the coherence between
the fs and /dev/hda1.  Online resize, online fs checkers, various local
tools which people have hacked up to look at metadata in a live fs,
direct-io access to the underlying fs, heaven knows how many boot loader
installers, etc.  Certainly I couldn't enumerate them all.

The mere thought of all this scares the crap out of me.


I don't actually see what the conceptual problem is with the existing
implementation.  The buffer_head is a finer-grained view onto the
blockdev's pagecache: it provides additional states and additional locking
against a finer-grained section of the page.   It works well.

Yeah, the highmem thing is a bit of a problem (but waning in importance). 
But we can fix that by teaching individual filesystems about kmap and then
tweak the blockdev's caching policy with mapping_set_gfp_mask() at mount
time.  If anyone cares, which they don't.
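
(What that tweak would amount to, roughly.  A hypothetical mount-time
hook - the helper name is made up - assuming the filesystem really does
kmap()/kunmap() around every access to blockdev metadata:)

#include <linux/fs.h>
#include <linux/gfp.h>
#include <linux/pagemap.h>

/* Hypothetical helper: relax the blockdev mapping's allocation policy so
 * its page cache pages may come from highmem.  Today those pages stay in
 * lowmem because buffer_head users dereference bh->b_data directly. */
static void example_allow_highmem_bdev_cache(struct block_device *bdev)
{
	mapping_set_gfp_mask(bdev->bd_inode->i_mapping, GFP_HIGHUSER);
}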


[RFC][PATCH] block: Isolate the buffer cache in its own mappings.

2007-10-17 Thread Eric W. Biederman

If filesystems care at all they want absolute control over the buffer
cache.  Controlling which buffers are dirty and when.  Because we
keep the buffer cache in the page cache for the block device we have
not quite been giving filesystems that control leading to really weird
bugs.

In addition this tying of the implementation of block device caching
and the buffer cache has resulted in a much more complicated and
limited implementation than necessary.  Block devices for example
don't need buffer_heads, and it is perfectly reasonable to cache
block devices in high memory.

To start untangling the worst of this mess this patch introduces a
second block device inode for the buffer cache.  All buffer cache
operations are diverted to use the new bd_metadata_inode, which
keeps the weirdness of the metadata requirements isolated in their
own little world.

This should enable future cleanups to diverge and simplify the
address_space_operations of the buffer cache and block device
page cache.

As a side effect of this cleanup the current ramdisk code should
be safe from dropping pages because we never place any buffer heads
on ramdisk pages.

Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]>
---
 fs/block_dev.c     |   45 -
 fs/buffer.c        |   17 -
 fs/ext3/dir.c      |    2 +-
 fs/ext4/dir.c      |    2 +-
 fs/fat/inode.c     |    2 +-
 include/linux/fs.h |    1 +
 6 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 379a446..87a5760 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -59,10 +59,12 @@ static sector_t max_block(struct block_device *bdev)
 /* Kill _all_ buffers and pagecache , dirty or not.. */
 static void kill_bdev(struct block_device *bdev)
 {
-   if (bdev->bd_inode->i_mapping->nrpages == 0)
-   return;
-   invalidate_bh_lrus();
-   truncate_inode_pages(bdev->bd_inode->i_mapping, 0);
+   if (bdev->bd_inode->i_mapping->nrpages) 
+   truncate_inode_pages(bdev->bd_inode->i_mapping, 0);
+   if (bdev->bd_metadata_inode->i_mapping->nrpages) {
+   truncate_inode_pages(bdev->bd_metadata_inode->i_mapping, 0);
+   invalidate_bh_lrus();
+   }
 }  
 
 int set_blocksize(struct block_device *bdev, int size)
@@ -80,6 +82,7 @@ int set_blocksize(struct block_device *bdev, int size)
sync_blockdev(bdev);
bdev->bd_block_size = size;
bdev->bd_inode->i_blkbits = blksize_bits(size);
+   bdev->bd_metadata_inode->i_blkbits = blksize_bits(size);
kill_bdev(bdev);
}
return 0;
@@ -114,7 +117,7 @@ static int
 blkdev_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create)
 {
-   if (iblock >= max_block(I_BDEV(inode))) {
+   if (iblock >= max_block(inode->i_bdev)) {
if (create)
return -EIO;
 
@@ -126,7 +129,7 @@ blkdev_get_block(struct inode *inode, sector_t iblock,
 */
return 0;
}
-   bh->b_bdev = I_BDEV(inode);
+   bh->b_bdev = inode->i_bdev;
bh->b_blocknr = iblock;
set_buffer_mapped(bh);
return 0;
@@ -136,7 +139,7 @@ static int
 blkdev_get_blocks(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create)
 {
-   sector_t end_block = max_block(I_BDEV(inode));
+   sector_t end_block = max_block(inode->i_bdev);
unsigned long max_blocks = bh->b_size >> inode->i_blkbits;
 
if ((iblock + max_blocks) > end_block) {
@@ -152,7 +155,7 @@ blkdev_get_blocks(struct inode *inode, sector_t iblock,
}
}
 
-   bh->b_bdev = I_BDEV(inode);
+   bh->b_bdev = inode->i_bdev;
bh->b_blocknr = iblock;
bh->b_size = max_blocks << inode->i_blkbits;
if (max_blocks)
@@ -167,7 +170,7 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
struct file *file = iocb->ki_filp;
struct inode *inode = file->f_mapping->host;
 
-   return blockdev_direct_IO_no_locking(rw, iocb, inode, I_BDEV(inode),
+   return blockdev_direct_IO_no_locking(rw, iocb, inode, inode->i_bdev,
iov, offset, nr_segs, blkdev_get_blocks, NULL);
 }
 
@@ -244,7 +247,7 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,
 loff_t pos, unsigned long nr_segs)
 {
struct inode *inode = iocb->ki_filp->f_mapping->host;
-   unsigned blkbits = blksize_bits(bdev_hardsect_size(I_BDEV(inode)));
+   unsigned blkbits = blksize_bits(bdev_hardsect_size(inode->i_bdev));
unsigned blocksize_mask = (1 << blkbits) - 1;
unsigned long seg = 0;  /* iov segment iterator */
unsigned long nvec; /* number of bio vec needed */
@@ -292,7 +295,7 @@ blkdev_direct_IO(int rw, struct kiocb *iocb, const struct iovec *iov,