Re: [Cluster-devel] [PATCH] mm/filemap: do not allocate cache pages beyond end of file at read

2019-11-25 Thread Linus Torvalds
On Mon, Nov 25, 2019 at 2:53 AM Steven Whitehouse  wrote:
>
> Linus, is that roughly what you were thinking of?

So the concept looks ok, but I don't really like the new flags as they
seem to be gfs2-specific rather than generic.

That said, I don't _gate_ them either, since they aren't in any
critical code sequence, and it's not like they do anything really odd.

I still think the whole gfs2 approach is broken. You're magically ok
with using stale data from the cache just because it's cached, even if
another client might have truncated the file or something.

So you're ok with saying "the file used to be X bytes in size, so
we'll just give you this data because we trust that the X is correct".

But you're not ok to say "oh, the file used to be X bytes in size, but
we don't want to give you a short read because it might not be correct
any more".

See the disconnect? You trust the size in one situation, but not in another one.

I also don't really see that you *need* the new flag at all. Since
you're doing to do a speculative read and then a real read anyway, and
since the only thing that you seem to care about is the file size
(because the *data* you will trust if it is cached), then why don't
you just use the *existing* generic read, and *IFF* you get a
truncated return value, then you go and double-check that the file
hasn't changed in size?

See what I'm saying? I think gfs2 is being very inconsistent in when
it trusts the file size, and I don't see that you even need the new
behavior that patch gives, because you might as well just use the
existing code (just move the i_size check earlier, and then teach gfs2
to double-check the "I didn't get as much as I expected" case).

 Linus




Re: [Cluster-devel] [PATCH] mm/filemap: do not allocate cache pages beyond end of file at read

2019-11-25 Thread Steven Whitehouse

Hi,

On 22/11/2019 23:59, Andreas Grünbacher wrote:

Hi,

Am Do., 31. Okt. 2019 um 12:43 Uhr schrieb Steven Whitehouse
:

Andreas, Bob, have I missed anything here?

I've looked into this a bit, and it seems that there's a reasonable
way to get rid of the lock taking in ->readpage and ->readpages
without a lot of code duplication. My proposal for that consists of
multiple patches, so I've posted it separately:

https://lore.kernel.org/linux-fsdevel/20191122235324.17245-1-agrue...@redhat.com/T/#t

Thanks,
Andreas


Andreas, thanks for taking a look at this.

Linus, is that roughly what you were thinking of?

Ronnie, Steve, can the same approach perhaps work for CIFS?

Steve.







Re: [Cluster-devel] [RFC PATCH 0/3] Rework the gfs2 read and page fault locking

2019-11-25 Thread Kirill A. Shutemov
On Sat, Nov 23, 2019 at 12:53:21AM +0100, Andreas Gruenbacher wrote:
> Hello,
> 
> this patch series moves the glock lock taking in gfs2 from the
> ->readpage and ->readpages inode operations to the ->read_iter file and
> ->fault vm operations.  To achieve that, we add flags to the
> generic_file_read_iter and filemap_fault generic helpers.
> 
> This proposal was triggered by the following discussion:
> 
> https://lore.kernel.org/linux-fsdevel/157225677483.3442.4227193290486305330.stgit@buzz/
> 
> In that thread, Linus argued that filesystems should make sure the inode
> size is sufficiently up-to-date before calling the generic helpers, and
> that filesystems can do it themselves if they want more than that.
> That's surely doable.  However, implementing those operations properly
> at the filesystem level quickly becomes complicated when it gets to
> things like readahead.  In addition, those slightly modified copies of
> those helpers would surely diverge from their originals over time, and
> maintaining them properly would become hard.  So I hope the relatively
> small changes to make the original helpers slightly more flexible will
> be acceptable instead.
> 
> With the IOCB_CACHED flag added by one of the patches in this series,
> the code that Konstantin's initial patch adds to
> generic_file_buffered_read could be made conditional on the IOCB_CACHED
> flag being cleared.  That way, it won't misfire on filesystems that
> allow a stale inode size.  (I'm not sure if any filesystems other than
> gfs2 are actually affected.)
> 
> Some additional explanation:
> 
> The cache consistency model of filesystems like gfs2 is such that if
> pages are found in an inode's address space, those pages as well as the
> inode size are up to date and can be used without taking any filesystem
> locks.  If a page is not cached, filesystem locks must be taken before
> the page can be read; this will also bring the inode size up to date.
> 
> Thus far, gfs2 has taken the filesystem locks inside the ->readpage and
> ->readpages address space operations.  A better approach seems to be to
> take those locks earlier, in the ->read_iter file and ->fault vm
> operations.  This would also avoid a lock inversion in ->readpages.
> 
> We obviously want to avoid taking the filesystem locks unnecessarily
> when the pages we are looking for are already cached; otherwise, we
> would cripple performance.  So we need to check if those pages are
> present first.  That's actually exactly what the generic_file_read_iter
> and filemap_fault helpers do already anyway, except that they will call
> into ->readpage and ->readpages when they find pages missing.  Instead
> of that, we'd like those helpers to return with an error code that
> allows us to retry the operation after taking the filesystem locks.

Do you see IOCB_CACHED/FAULT_FLAG_CACHED semantics being usable for
anyting beyond gfs2?

-- 
 Kirill A. Shutemov




Re: [Cluster-devel] [RFC PATCH 3/3] gfs2: Rework read and page fault locking

2019-11-25 Thread Kirill A. Shutemov
On Sat, Nov 23, 2019 at 12:53:24AM +0100, Andreas Gruenbacher wrote:
> @@ -778,15 +804,51 @@ static ssize_t gfs2_file_direct_write(struct kiocb 
> *iocb, struct iov_iter *from)
>  
>  static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>  {
> + struct gfs2_inode *ip;
> + struct gfs2_holder gh;
> + size_t written = 0;

'written' in a read routine?

-- 
 Kirill A. Shutemov