On Tue 24-11-20 11:28:14, Borislav Petkov wrote:
> On Tue, Nov 24, 2020 at 11:20:33AM +0100, Jan Kara wrote:
> > On Tue 24-11-20 09:45:07, Borislav Petkov wrote:
> > > On Mon, Nov 23, 2020 at 11:46:51PM +0100, Paweł Jasiak wrote:
> > > > On 23/11/20, Jan Kara wr
points.
>
> Acked-by: Tejun Heo
>
> Andrew, can you please route this one?
I'll queue it to my tree and push it to Linus on Friday since I sometimes
handle writeback stuff myself anyway...
Honza
--
Jan Kara
SUSE Labs, CR
free page
reallocate page for something else
we can even dirty & start to
writeback 'page'
wake_up_page(page)
and we have a "spurious" wake up on 'page'.
Honza
--
Jan Kara
SUSE Labs, CR
On Tue 24-11-20 09:45:07, Borislav Petkov wrote:
> On Mon, Nov 23, 2020 at 11:46:51PM +0100, Paweł Jasiak wrote:
> > On 23/11/20, Jan Kara wrote:
> > > OK, with the help of Boris Petkov I think I have a fix that looks correct
> > > (attached). Can you please try wheth
const char __user *, pathname)
> {
> return do_fanotify_mark(fanotify_fd, flags, mask, dfd, pathname);
> }
> +#endif
>
> #ifdef CONFIG_COMPAT
> COMPAT_SYSCALL_DEFINE6(fanotify_mark,
>
>
> --
>
> Paweł Jasiak
--
Jan Kara
SUSE Labs, CR
>From f
ext2_free_branches(inode, &nr, &nr+1, 3);
> }
> + break;
> case EXT2_TIND_BLOCK:
> ;
> }
> --
> 2.27.0
>
--
Jan Kara
SUSE Labs, CR
pinning
users might indeed be rare and only those would show regressions in THP
pinning performance...
Honza
--
Jan Kara
SUSE Labs, CR
| ^~~
>
>
> Caused by commit
>
> 32559cea1f55 ("fs/ext2: Use ext2_put_page")
>
> Presumably some missing includes :-(
>
> I have used the ext3 tree from next-20201112 for today.
Yeah, sorry for that. Should be fixed now.
Honza
--
Jan Kara
SUSE Labs, CR
for the kernel
> to make the information about DAX availability accessible somewhere.
> >
> > And all this comes about because DAX is a property of the block
> > device, not the filesystem. Hence the only time a DAX capable
> > filesystem on a block device that is DAX capable will not be DAX
> > capable is if the dax=never is set...
> See, it is not a property of the block device. It is a property of the mount
> point. The availability on the device is one requirement but the
> filesystem options affect availability to the user in the end.
No, it is not really a property of the mountpoint either. If anything, it is
a property of the inode. Of two different inodes on the very same filesystem,
one may support DAX while the other does not (think for example of XFS real-time
volumes, or simply inodes with / without the S_DAX flag set). And we are back
at what Dave is trying to get across. As inconvenient as it is,
statx(STATX_ATTR_DAX) is the only way to tell.
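The per-inode check described above can be sketched from userspace roughly as follows. This is only a minimal illustration, assuming glibc 2.28 or later for the statx() wrapper; the helper name file_uses_dax() is made up for this example, and the fallback definition of STATX_ATTR_DAX is there only for older userspace headers.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <sys/stat.h>   /* statx(), struct statx (glibc >= 2.28) */

/* Fallback for older userspace headers; value as in linux/stat.h. */
#ifndef STATX_ATTR_DAX
#define STATX_ATTR_DAX 0x00200000U
#endif

/*
 * Hypothetical helper, not part of any library.
 * Returns 1 if the inode at 'path' is currently in DAX mode, 0 if it is
 * not (or the filesystem does not report the attribute), and -1 on error
 * with errno set by statx().
 */
int file_uses_dax(const char *path)
{
	struct statx stx;

	if (statx(AT_FDCWD, path, 0, STATX_BASIC_STATS, &stx) != 0)
		return -1;

	/* The mask says which attribute bits this filesystem can report. */
	if (!(stx.stx_attributes_mask & STATX_ATTR_DAX))
		return 0;

	return (stx.stx_attributes & STATX_ATTR_DAX) ? 1 : 0;
}
```

Because the answer is per inode, an application would call this on the actual data file it intends to map, not on the mount point.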
> > Of course, this is just encoding how existing filesystems behave -
> > it's not a requirement for future filesystems so they may use other
> > mechanisms for enabling/disabling DAX. Which leaves you with the
> > only reliable mechanism of creating filesystem and checking
> > statx(STATX_ATTR_DAX)
> Or the kernel could just tell the user. But right, information is power,
> and keeping the user in the dark is much more entertaining.
I think it would be more productive if you actually answered Ted's
question: Exactly which application got broken by the change? I know for a
fact that one large DB vendor was parsing mount options in /proc/mounts to
determine whether their DB can use DAX or not (and this was already a
"cleaned up" method because before this they were parsing VMA flags in
/proc/<pid>/smaps which is even worse). But in this case they also seemed
OK to switch to statx() once it is available...
Honza
--
Jan Kara
SUSE Labs, CR
e, new_dir, 0);
> - else {
> - kunmap(dir_page);
> - put_page(dir_page);
> - }
> + else
> + ext2_put_page(dir_page);
> inode_dec_link_count(old_dir);
> }
> return 0;
>
>
> out_dir:
> - if (dir_de) {
> - kunmap(dir_page);
> - put_page(dir_page);
> - }
> + if (dir_de)
> + ext2_put_page(dir_page);
> out_old:
> - kunmap(old_page);
> - put_page(old_page);
> + ext2_put_page(old_page);
> out:
> return err;
> }
> --
> 2.28.0.rc0.12.gb6a658bd00c9
>
--
Jan Kara
SUSE Labs, CR
On Wed 04-11-20 14:12:35, Jan Kara wrote:
> On Tue 03-11-20 09:16:19, Costa Sapuntzakis wrote:
> > Jan, does this fixup from Hillf look ok to you? You originally argued for
> > lock_buffer/unlock_buffer.
> >
> > I think the problem here is that the ext4 code assumes
76].
> > +*/
> > + watches_max = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) /
> > + INOTIFY_WATCH_COST;
> > + watches_max = clamp(watches_max, 8192UL, 1048576UL);
> > +
> > BUILD_BUG_ON(IN_ACCESS != FS_ACCESS);
> > BUILD_BUG_ON(IN_MODIFY != FS_MODIFY);
> > BUILD_BUG_ON(IN_ATTRIB != FS_ATTRIB);
> > @@ -827,7 +848,7 @@ static int __init inotify_user_setup(void)
> >
> > inotify_max_queued_events = 16384;
> > init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
> > - init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192;
> > + init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max;
> >
> > return 0;
> > }
> > --
> > 2.18.1
> >
--
Jan Kara
SUSE Labs, CR
bbb162e20..c2fce22cfd035 100644
> --- a/Documentation/filesystems/ext2.rst
> +++ b/Documentation/filesystems/ext2.rst
> @@ -1,6 +1,7 @@
> .. SPDX-License-Identifier: GPL-2.0
>
>
> +==============================
> The Second Extended Filesystem
> ==============================
>
> --
> 2.28.0
>
--
Jan Kara
SUSE Labs, CR
t;
> Signed-off-by: Yang Shi
Thanks! Looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> mm/migrate.c | 2 +-
> mm/vmscan.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> di
o far I have only been able to reproduce on this Intel platform:
>
> HPE DL560 gen10
> Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz
> 131072 MB memory, 1000 GB disk space (smartpqi nvme)
Did you try running with the debug patch Matthew sent? Any results?
4_superblock_csum(sb, es);
> > - unlock_buffer(EXT4_SB(sb)->s_sbh);
> > + spin_unlock_irqrestore(>s_cs_lock, flags);
> > }
> >
> > ext4_fsblk_t ext4_block_bitmap(struct super_block *sb,
> > --- a/fs/ext4/mballoc.c
> > +++ b/fs/ext4/mballoc.c
> > @@ -2868,6 +2868,7 @@ int ext4_mb_init(struct super_block *sb)
> > i++;
> > } while (i <= sb->s_blocksize_bits + 1);
> >
> > + spin_lock_init(&sbi->s_cs_lock);
> > spin_lock_init(&sbi->s_md_lock);
> > spin_lock_init(&sbi->s_bal_lock);
> > sbi->s_mb_free_pending = 0;
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -1439,6 +1439,7 @@ struct ext4_sb_info {
> > loff_t s_bitmap_maxbytes; /* max bytes for bitmap files */
> > struct buffer_head * s_sbh; /* Buffer containing the super
> > block */
> > struct ext4_super_block *s_es; /* Pointer to the super block in
> > the buffer */
> > + spinlock_t s_cs_lock; /* SB checksum lock */
> > struct buffer_head * __rcu *s_group_desc;
> > unsigned int s_mount_opt;
> > unsigned int s_mount_opt2;
> >
--
Jan Kara
SUSE Labs, CR
eading the value
> > of the old CPU, which is no longer 0.
> >
> > I already fixed a bunch of that in:
> >
> > baffd723e44d ("lockdep: Revert "lockdep: Use raw_cpu_*() for per-cpu
> > variables"")
> >
> > but clearly this one got crossed.
> >
> > Still, that leaves me puzzled over you seeing this on x86 :/
>
> Hi Peter,
>
> I still get the same issue with 5.10-rc2.
> Is there any non-merged patch I should try, or anything I can help with?
BTW, I've just hit the same deadlock issue with ext4 on generic/390 so I
can confirm this isn't a btrfs-specific issue (as we already knew from the
analysis but still it's good to have that confirmed).
Honza
--
Jan Kara
SUSE Labs, CR
more sanity checking into quota code to verify quota
file headers are not corrupted. Because these corrupted headers cause bogus
return values from get_free_blk() and possibly other quota functions which
then confuse __dquot_initialize().
Honza
--
Jan Kara
SUSE Labs, CR
d?
Brian, any idea whether your series could regress the fanotify_mark(2) syscall?
Is it documented somewhere which syscalls need compat wrappers and what
they should look like?
Honza
[1] https://lists.linux.it/pipermail/ltp/2020-June/017436.html
[2] https://lore.kernel.org/lkml/20200313195144.164260-1-brge...@gmail.com/
--
Jan Kara
SUSE Labs, CR
On Fri 30-10-20 14:02:26, Jason Gunthorpe wrote:
> On Fri, Oct 30, 2020 at 05:51:05PM +0100, Jan Kara wrote:
> > > @@ -446,6 +447,12 @@ struct mm_struct {
> > >*/
> > > atomic_t has_pinned;
> > >
> > > + /**
> >
_struct.
>
> Fixes: f3c64eda3e50 ("mm: avoid early COW write protect games during fork()")
> Suggested-by: Linus Torvalds
> Link:
> https://lore.kernel.org/r/CAHk-=wi=icnycarbpgjkvju9eyyez13n64tzyldob8cp5q_...@mail.gmail.com
> Reviewed-by: John Hubbard
>
ng its cast
>
> - The handling of ret and nr_pinned can be streamlined a bit
>
> No functional change.
>
> Signed-off-by: Jason Gunthorpe
Looks good to me. You can add:
Reviewed-by: Jan Kara
watches_max = (((si.totalram - si.totalhigh) / 100) << PAGE_SHIFT) /
> + INOTIFY_WATCH_COST;
^^^ So for machines with > 1TB of memory
watches_max would overflow. So you probably need to use ulong for that.
> + watches_max = min(1048576U, max(watches_max, 8192U));
^^^ use clamp() here?
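To illustrate the two remarks above, here is a small userspace sketch of the same computation done in unsigned long and then clamped. It is not the kernel code: the function names, the 4 KiB page size and the 1280-byte per-watch cost used in the usage note are assumptions for the example only, and unsigned long is assumed to be 64 bits wide (LP64).

```c
/* Userspace stand-in for the kernel's clamp() macro (illustrative only). */
static unsigned long clamp_ul(unsigned long v, unsigned long lo,
			      unsigned long hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Hypothetical helper: allow watches to consume about 1% of low memory.
 * 'lowmem_pages' is the number of lowmem pages, 'page_shift' the log2 of
 * the page size and 'watch_cost' the assumed memory cost of one watch.
 */
unsigned long watches_max_for(unsigned long lowmem_pages,
			      unsigned int page_shift,
			      unsigned long watch_cost)
{
	/* Intermediate byte count is kept in unsigned long, not in a u32. */
	unsigned long bytes = (lowmem_pages / 100) << page_shift;

	return clamp_ul(bytes / watch_cost, 8192UL, 1048576UL);
}
```

With 4 KiB pages, 1 TiB of memory is 2^28 pages, so the intermediate byte count is about 11 GB, which no longer fits in 32 bits. With the clamp, such a machine simply ends up at the upper bound of 1048576 watches, while a tiny machine stays at the old default of 8192.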
Honza
--
Jan Kara
SUSE Labs, CR
> > > > And this approximation can be pretty accurate at times.
> > > > For example, on Ubuntu 18.04 kernel 5.4.0:
> > > > inode_cache        608
> > > > nfs_inode_cache   1088
> > > > btrfs_inode       1168
> > > > xfs_inode         1024
> > > > ext4_inode_cache  1096
> > > Just to clarify, does your original 2 * sizeof(struct inode) figure
> > > include the filesystem inode overhead or is there an additional inode
> > > somewhere, so that it needs to go to 4 * sizeof(struct inode)?
> > No additional inode.
> >
> > #define INOTIFY_WATCH_COST (sizeof(struct inotify_inode_mark) + \
> >                             2 * sizeof(struct inode))
> >
> > Not sure if the inotify_inode_mark part matters, but it doesn't hurt.
> > Do note that Jan had a different proposal for fs inode size estimation (1K).
> > I have no objection to this estimation if Jan insists.
> >
> > Thanks,
> > Amir.
> >
> Thanks for the confirmation. 2*sizeof(struct inode) is more than 1k. Besides
> with debugging turned on, the size will increase more. So that figure is
> good enough.
Yeah, the 2*sizeof(struct inode) is fine by me as well. Please don't forget
to update the comment explaining INOTIFY_WATCH_COST. Thanks!
Honza
--
Jan Kara
SUSE Labs, CR
@@ static int __init inotify_user_setup(void)
>
> inotify_max_queued_events = 16384;
> init_user_ns.ucount_max[UCOUNT_INOTIFY_INSTANCES] = 128;
> - init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = 8192;
> + init_user_ns.ucount_max[UCOUNT_INOTIFY_WATCHES] = watches_max;
>
> return 0;
> }
> --
> 2.18.1
>
--
Jan Kara
SUSE Labs, CR
llers that care about partial success. See e.g.
iov_iter_get_pages() usage in fs/direct-io.c:dio_refill_pages() or
bio_iov_iter_get_pages(). These places handle partial success just fine and
not allowing partial success from GUP could regress things...
Honza
--
Jan Kara
SUSE Labs, CR
On Mon 26-10-20 12:17:27, Matthew Wilcox wrote:
> On Mon, Oct 26, 2020 at 11:48:06AM +0100, Jan Kara wrote:
> > > +static inline loff_t page_seek_hole_data(struct page *page,
> > > + loff_t start, loff_t end, bool seek_data)
> > > +{
> > > + if
hirez.programming.kicks-ass.net
>
> Make sure you have commit:
>
> f8e48a3dca06 ("lockdep: Fix preemption WARN for spurious IRQ-enable")
>
> (in Linus' tree by now) and do you have CONFIG_DEBUG_PREEMPT enabled?
Hum, I am at 5.10-rc1 now and the above-mentioned commit doesn't appear to be
there? Also googling for the title doesn't help...
Honza
--
Jan Kara
SUSE Labs, CR
h if a truncate or hole punch is entirely
> within a single page. We can add some more complex logic to restore
> the optimisation if it proves to be worthwhile.
>
> Signed-off-by: Matthew Wilcox (Oracle)
> Reviewed-by: William Kucharski
The patch looks good to me. You c
that this loop forgets to release the page reference it has got
when doing SEEK_HOLE.
> + }
> + rcu_read_unlock();
> +
> + if (seek_data)
> + return -ENXIO;
> + goto out;
> +
> +unlock:
> + rcu_read_unlock();
> + if (!xa_is_value(page))
> + put_page(page);
> +out:
> + if (start > end)
> + return end;
> + return start;
> +}
Honza
--
Jan Kara
SUSE Labs, CR
but so far I failed.
It's good to know it isn't ext4 specific so we should be searching in the
generic code ;). So far I was concentrating more on ext4 bits...
Honza
[1] https://lore.kernel.org/lkml/d3a33205add2f...@google.com/
--
Jan Kara
SUSE Labs, CR
On Sun 25-10-20 23:19:34, Matthew Wilcox wrote:
> On Thu, Oct 01, 2020 at 09:17:28AM +0200, Jan Kara wrote:
> > > I have a followup patch which isn't part of this series which fixes it:
> > >
> > > http://git.infradead.org/users/
fferent names between their
> prototypes and the kernel-doc markup.
>
> Signed-off-by: Mauro Carvalho Chehab
Thanks for the patch. It looks good. You can add:
Reviewed-by: Jan Kara
Honza
>
ll get to your patch in a
week or two.
Honza
>
> -----Original Message-----
> From: Jan Kara [mailto:j...@suse.cz]
> Sent: Wednesday, October 21, 2020 6:25 PM
> To: tianxianting (RD)
> Cc: ty...@mit.edu; adilger.ker...
_rate to 3000
> [ 182.751847] perf: interrupt took too long (80901 > 79856), lowering
> kernel.perf_event_max_sample_rate to 2000
> [ 188.527603] WARNING: missing R10 value at __fsnotify_parent+0x25/0x280
OK, that's an unwinder warning but we don't do anything special in
__fsnotify_parent(). Let's CC the x86 guys in case they have an idea what's
going on.
Honza
--
Jan Kara
SUSE Labs, CR
Hum, Al, did this patch get lost?
Honza
On Thu 24-09-20 16:58:56, Jan Kara wrote:
> On Thu 24-09-20 13:59:58, Hao Li wrote:
> > If DCACHE_REFERENCED is set, fast_dput() will return true, and then
> > retain_dentry()
patch. It looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> fs/ext4/page-io.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
> index defd2e10d.
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID:     d3bb68fa8d43bcd889ce86249f73a70e3ba221aa
Gitweb:        https://git.kernel.org/tip/d3bb68fa8d43bcd889ce86249f73a70e3ba221aa
Author:        Jan Kara
AuthorDate:    Mon, 21 Sep 2020 15:08:50 +02:00
Committer
The following commit has been merged into the perf/urgent branch of tip:
Commit-ID:     061fe185e17a1519a75eee89462f35a5360ece8b
Gitweb:        https://git.kernel.org/tip/061fe185e17a1519a75eee89462f35a5360ece8b
Author:        Jan Kara
AuthorDate:    Wed, 30 Sep 2020 17:08:20 +02:00
Committer
count,
> + sbi->s_group_desc = kmalloc_array(db_count,
> sizeof(struct buffer_head *),
> GFP_KERNEL);
> if (sbi->s_group_desc == NULL) {
> --
> 2.17.1
>
--
Jan Kara
SUSE Labs, CR
ST_DIRTY 0x0800
> */
> enum {
> _DQUOT_USAGE_ENABLED = 0, /* Track disk usage for users */
> --
> 2.7.4
>
--
Jan Kara
SUSE Labs, CR
On Thu 15-10-20 11:08:43, Jan Kara wrote:
> On Thu 15-10-20 08:46:01, NeilBrown wrote:
> > On Wed, Oct 14 2020, Jan Kara wrote:
> >
> > > On Wed 14-10-20 16:47:06, kernel test robot wrote:
> > >> Greeting,
> > >>
> > >> FYI, we noti
On Thu 15-10-20 08:46:01, NeilBrown wrote:
> On Wed, Oct 14 2020, Jan Kara wrote:
>
> > On Wed 14-10-20 16:47:06, kernel test robot wrote:
> >> Greeting,
> >>
> >> FYI, we noticed a -15.3% regression of will-it-scale.per_process_op
o if there's any negative performance impact of these changes, they're
likely due to code alignment changes or something like that... So I don't
think there's much to do here since optimal code alignment is highly specific
to a particular CPU etc.
Honza
--
Jan Kara
SUSE Labs, CR
ell
> Reviewed-by: Christoph Hellwig
> Acked-by: Darrick J. Wong
> Acked-by: Theodore Ts'o # for fs/ext4/inode.c
The patch looks good to me. Feel free to add:
Reviewed-by: Jan Kara
Honza
> ---
>
> Changes in v
t_var_event(&(_page)->_refcount, \
> + dax_layout_is_idle_page(_page), \
> + TASK_INTERRUPTIBLE, 0, 0, _wait_cb(_inode))
> +
> #ifdef CONFIG_DEV_DAX_HMEM_DEVICES
> void hmem_register_device(int target_nid, struct resource *r);
> #else
> --
> 2.20.1
>
--
Jan Kara
SUSE Labs, CR
the loop (to out: label) anyway due to the loop termination
condition and why not return the frames we already have? Furthermore
find_vma_intersection() can return NULL which would oops in your check
then. What am I missing?
Honza
> out:
> if (locked)
> --
> 2.28.0
>
--
Jan Kara
SUSE Labs, CR
correctly to fail when
> upgrading to v5.9-rc2 or later.
>
> Fix this by defaulting block_validity to off when
> EXT4_FEATURE_RO_COMPAT_SHARED_BLOCKS is set.
>
> Signed-off-by: Josh Triplett
> Fixes: e7bfb5c9bb3
On Mon 05-10-20 03:16:41, Josh Triplett wrote:
> On Mon, Oct 05, 2020 at 11:46:01AM +0200, Jan Kara wrote:
> > On Mon 05-10-20 01:14:54, Josh Triplett wrote:
> > > Ran into an ext4 regression when testing upgrades to 5.9-rc kernels:
> > >
> > > Commit e7bfb5c
sed to
work.
Anyway, if you can make this go away, sure go ahead :)
Honza
--
Jan Kara
SUSE Labs, CR
eature is up to you
but I don't think that belongs to the upstream kernel since that is correct
as is...
Honza
--
Jan Kara
SUSE Labs, CR
m this check
directly in reiserfs_xattr_get().
> + }
There's no need for additional braces in this 'if'.
>
> inode_lock_nested(d_inode(privroot), I_MUTEX_XATTR);
Honza
--
Jan Kara
SUSE Labs, CR
On Wed 30-09-20 18:23:21, Matthew Wilcox wrote:
> On Wed, Sep 30, 2020 at 07:08:07PM +0200, Jan Kara wrote:
> > On Wed 30-09-20 13:36:37, Matthew Wilcox wrote:
> > > On Wed, Sep 30, 2020 at 02:15:12PM +0200, Jan Kara wrote:
> > > > On Mon 14-09-20 14:00:42,
On Wed 30-09-20 13:36:37, Matthew Wilcox wrote:
> On Wed, Sep 30, 2020 at 02:15:12PM +0200, Jan Kara wrote:
> > On Mon 14-09-20 14:00:42, Matthew Wilcox (Oracle) wrote:
> > > All callers now expect head (and base) pages, and can handle multiple
> > > head pages
tead of open-coding how pvecs behave. This has the side-effect of
> being able to append to a pagevec with existing contents, although we
> don't make use of that functionality anywhere yet.
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks good to me. You can add:
Reviewed-by: Jan Kara
invalidatepage(page, 0,
> - partial_end);
> - unlock_page(page);
> - put_page(page);
> - }
> +
> + if (index != -1)
> + page = find_lock_head(mapping, index);
Similarly to shmem, the use of index is a bit confusing here, but at least it
gets used in this case, so OK. But I'd still find something like:
if (!tail_page_already_truncated)
page = find_lock_head(mapping, lend >> PAGE_SHIFT);
easier to grasp.
> + if (page) {
> + if (!truncate_inode_partial_page(page, lstart, lend))
> + end = page->index;
> + unlock_page(page);
> + put_page(page);
> }
Honza
--
Jan Kara
SUSE Labs, CR
On Tue 29-09-20 13:48:06, Matthew Wilcox wrote:
> On Tue, Sep 29, 2020 at 10:58:55AM +0200, Jan Kara wrote:
> > On Mon 14-09-20 14:00:35, Matthew Wilcox (Oracle) wrote:
> > > We have three functions (shmem_undo_range(), truncate_inode_pages_range()
> > > and invalidate_
de path.
> It must be a different reason.
Yeah, it seems the bisection got confused because it hit a different error
during the bisection. Looking at the original oops, I think the actual
reason for the crash is that the quota file got corrupted in a particular way.
Quota code is not very paranoid
pages today are in-memory, so there are no tagged huge pages today.
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> mm/filemap.c | 10 +-
> 1 file
On Mon 14-09-20 14:00:40, Matthew Wilcox (Oracle) wrote:
> pagevec_lookup_entries() is now just a wrapper around find_get_entries()
> so remove it and convert all its callers.
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks good. You can add:
Reviewed-
On Mon 14-09-20 14:00:39, Matthew Wilcox (Oracle) wrote:
> All callers of find_get_entries() use a pvec, so pass it directly
> instead of manipulating it in the caller.
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks good. You can add:
Reviewed-
On Mon 14-09-20 14:00:38, Matthew Wilcox (Oracle) wrote:
> All callers want to fetch the full size of the pvec.
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks good. You can add:
Reviewed-by: Jan Kara
Honza
> --
by: Matthew Wilcox (Oracle)
Looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> include/linux/pagevec.h | 5 ++---
> mm/swap.c | 8
>
do_range() which will try
again so what you did might make a difference with performance but not much
else. But still it would be good to at least comment about this in the
changelog...
Honza
--
Jan Kara
SUSE Labs, CR
rcu_read_lock();
> + while ((page = xas_find_get_entry(&xas, max, XA_PRESENT))) {
> + loff_t pos = xas.xa_index * PAGE_SIZE;
OK, but for ordinary filesystems this could be problematic because of
exceptional entries?
Also for shmem you've dropped the PageUptodate check which
is a simpler function to use than find_get_pages(), so use it instead.
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks good to me. BTW, I think I've already reviewed this... You can add:
Reviewed-by: Jan Kara
Honza
> -
e)
Looks good. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> mm/filemap.c | 98 +++-
> 1 file changed, 43 insertions(+), 55 deletions(-)
>
> diff --git a/mm/fi
se this could result from UDF image where sparing table is larger
than a block. I've added check of the sparing table size to the mount path.
Honza
--
Jan Kara
SUSE Labs, CR
G: KMSAN: uninit-value in udf_evict_inode+0x382/0x7d0 fs/udf/inode.c:150
Yeah, easy enough. I'll send a fix.
Honza
--
Jan Kara
SUSE Labs, CR
/open.c:1240
> __ia32_compat_sys_openat+0x56/0x70 fs/open.c:1240
> do_syscall_32_irqs_on arch/x86/entry/common.c:80 [inline]
> __do_fast_syscall_32+0x129/0x180 arch/x86/entry/common.c:139
> do_fast_syscall_32+0x6a/0xc0 arch/x86/entry/common.c:162
> do_SYSENTER_32+0x73/0x90 arch/x86/en
will be
> killed and the inode will be evicted. In this way, if we change per-file
> DAX policy, it will take effects automatically after this file is closed
> by all processes.
>
> I also add some comments to make the code more clear.
>
> Signed
> Acked-by: Coly Li
> Reviewed-by: Johannes Thumshirn
The patch looks good to me now. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> block/blk-settings.c | 18 --
> block/blk-sysfs.c
On Thu 24-09-20 11:02:37, Jason Gunthorpe wrote:
> On Thu, Sep 24, 2020 at 09:44:09AM +0200, Jan Kara wrote:
> > > After the page is pinned it is prevented from being freed and
> > > recycled. After GUP has the pin it must check that the PTE still
> > > points at the
.kernel.org
> Signed-off-by: Eric Biggers
Looks good. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> fs/ext4/super.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
On Thu 17-09-20 16:10:49, Matthew Wilcox (Oracle) wrote:
> The udf inline data readpage implementation was already synchronous,
> so use AOP_UPDATED_PAGE to avoid cycling the page lock.
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks good. You can add:
Reviewed-
On Wed 23-09-20 14:12:07, Jason Gunthorpe wrote:
> On Wed, Sep 23, 2020 at 04:20:03PM +0200, Jan Kara wrote:
>
> > I'd hate to take spinlock in the GUP-fast path. Also I don't think this is
> > quite correct because GUP-fast-only can be called from interrupt context
> &
On Thu 17-09-20 18:57:07, Christoph Hellwig wrote:
> We can only scan for partitions on the whole disk, so move the flag
> from struct block_device to struct gendisk.
>
> Signed-off-by: Christoph Hellwig
Makes sense. You can add:
Reviewed-
r, unsigned long end,
> struct dev_pagemap *pgmap = NULL;
> int nr_start = *nr, ret = 0;
> pte_t *ptep, *ptem;
> + spinlock_t *ptl = NULL;
> +
> + /*
> +* More strict with FOLL_PIN, otherwise it could race with fork(). The
> +* page table lock guarantees that fork() will capture all the pinned
> +* pages when dup_mm() and do proper page copy on them.
> +*/
> + if (flags & FOLL_PIN) {
> + ptl = pte_lockptr(mm, pmd);
> + if (!spin_trylock(ptl))
> + return 0;
> + }
I'd hate to take a spinlock in the GUP-fast path. Also I don't think this is
quite correct because GUP-fast-only can be called from interrupt context
and page table locks are not interrupt safe. That being said, I don't see
what's wrong with the solution Jason proposed of first setting write protection
and then checking page_may_be_dma_pinned() during fork(). That should work
just fine AFAICT... BTW note that the GUP-fast code (and this is deliberate,
because e.g. DAX depends on it) first updates page->_refcount and then
rechecks that the PTE didn't change, and the page->_refcount update is actually
done using atomic_add_unless() so that it cannot be reordered wrt the PTE
check. So the fork() code only needs to add barriers to pair with this.
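The ordering described above (take the reference first, then recheck the PTE) can be modelled in userspace with C11 atomics. This is only a toy sketch under simplifying assumptions: the toy_* types and functions are invented for the illustration and are not kernel APIs, a PTE is reduced to a single atomic word, and sequentially consistent atomics stand in for the kernel's atomic_add_unless() and its ordering guarantees.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for struct page and a page table entry. */
struct toy_page { atomic_int refcount; };
struct toy_pte  { _Atomic uintptr_t val; };	/* 0 means "not present" */

/* Take a reference unless the count is already zero (page being freed). */
static bool toy_get_page_unless_zero(struct toy_page *page)
{
	int old = atomic_load(&page->refcount);

	while (old != 0) {
		/* On failure 'old' is reloaded and the loop retries. */
		if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Lockless pin attempt: read the PTE, grab the reference, then verify the
 * PTE is unchanged. If it changed in between, drop the reference and fail
 * so that the caller can fall back to a slow path.
 */
bool toy_gup_fast(struct toy_pte *pte, struct toy_page *page)
{
	uintptr_t seen = atomic_load(&pte->val);

	if (seen == 0)
		return false;
	if (!toy_get_page_unless_zero(page))
		return false;
	if (atomic_load(&pte->val) != seen) {
		atomic_fetch_sub(&page->refcount, 1);
		return false;
	}
	return true;
}
```

In this model, a fork()-like writer that first changes the PTE and only then inspects the reference count either sees the elevated count or causes the recheck in toy_gup_fast() to fail, which is the pairing the barriers are meant to provide.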
Honza
--
Jan Kara
SUSE Labs, CR
On Wed 23-09-20 09:50:04, Peter Xu wrote:
> On Wed, Sep 23, 2020 at 11:22:05AM +0200, Jan Kara wrote:
> > On Tue 22-09-20 13:01:13, John Hubbard wrote:
> > > On 9/22/20 3:33 AM, Jan Kara wrote:
> > > > On Mon 21-09-20 23:41:16, John Hubbard wrote:
> > >
On Wed 23-09-20 01:12:31, Hui Su wrote:
> the struct name was modified long ago, but the comment still
> use struct handle_s.
>
> Signed-off-by: Hui Su
Thanks for the patch. It looks good to me. You can add:
Reviewed-
sb, data.part_descs_loc[i].rec.block);
> if (ret < 0)
> - return ret;
> + goto out;
> }
>
> - return 0;
> + ret = 0;
> +out:
> + kfree(data.part_descs_loc);
> + return ret;
> }
>
> /*
> --
> 2.25.1
>
--
Jan Kara
SUSE Labs, CR
rkloads but you have to have
some way to recover from crashes so it's mostly used for scratch
filesystems (e.g. in build systems, Google uses this feature a lot for some
of their infrastructure as well).
Honza
--
Jan Kara
SUSE Labs, CR
ees with large_dir feature (mkfs.ext4 -O large_dir). Does
that help?
Honza
--
Jan Kara
SUSE Labs, CR
On Tue 22-09-20 13:01:13, John Hubbard wrote:
> On 9/22/20 3:33 AM, Jan Kara wrote:
> > On Mon 21-09-20 23:41:16, John Hubbard wrote:
> > > On 9/21/20 2:20 PM, Peter Xu wrote:
> > > ...
> > > > + if (unlikely(READ_ONCE(src_mm->has_pinned) &&
s. For file pages mm->has_pinned does not work because the page may
still be pinned by a completely unrelated process, as Jann already properly
pointed out earlier in the thread. So maybe anon_page_likely_pinned()?
Possibly also assert PageAnon(page) in it if we want to be paranoid...
Honza
--
Jan Kara
SUSE Labs, CR
> struct timestamp *ts;
>
> outstr = kmalloc(128, GFP_NOFS);
> --
> 2.17.1
>
--
Jan Kara
SUSE Labs, CR
00 00 00 00 0f 1f 44 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 0f
> 83 fd 89 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:7fff59216328 EFLAGS: 0246 ORIG_RAX: 00a6
> RAX: RBX: 00076035 RCX: 00460027
> RDX: 00403188 RSI: 0002 RDI: 7fff592163d0
> RBP: 0333 R08: R09: 000b
> R10: 0005 R11: 0246 R12: 7fff59217460
> R13: 02df2a60 R14: R15: 7fff59217460
>
>
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
--
Jan Kara
SUSE Labs, CR
> device's (%lu -> %lu)\n",
> - q->backing_dev_info->ra_pages,
> - b->backing_dev_info->ra_pages);
> - q->backing_dev_info->ra_pages =
> - b->backing_dev_info->ra_pages;
> - }
> - }
> fixup_discard_if_not_supported(q);
> fixup_write_zeroes(device, q);
> }
--
Jan Kara
SUSE Labs, CR
ood to me. You can add:
Reviewed-by: Jan Kara
I'd just prefer if the changelog explicitly mentioned that this patch
results in enabling readahead for coda, ecryptfs, and orangefs... Just in
case someone bisects some issue down to this patch :).
ig
Looks good. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> drivers/block/aoe/aoeblk.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/aoe/aoeblk.c b/driv
On Mon 21-09-20 10:07:24, Christoph Hellwig wrote:
> Inherit the optimal I/O size setting just like the readahead window,
> as any reason to do larger I/O does not apply to just readahead.
>
> Signed-off-by: Christoph Hellwig
The patch looks good to me. You can add:
Reviewed-
On Mon 21-09-20 18:59:43, Matthew Wilcox wrote:
> On Mon, Sep 21, 2020 at 09:20:25AM -0700, Linus Torvalds wrote:
> > On Mon, Sep 21, 2020 at 2:11 AM Jan Kara wrote:
> > >
> > > Except that on truncate, we have to unmap these
> > > anonymous pages in private f
On Mon 21-09-20 11:23:07, Naresh Kamboju wrote:
> On Fri, 18 Sep 2020 at 11:18, Dan Williams wrote:
> >
> > From: Jan Kara
> >
> > DM was calling generic_fsdax_supported() to determine whether a device
> > referenced in the DM table supports DAX. However this i
> all my local builds are breaking now too with this :(
>
> Was there a proposed patch anywhere for this?
Attached patch should fix the build breakage. I'm sorry for that.
Honza
--
Jan Kara
SUSE Labs, CR
>From 8b8c7d6148b
ous page for that offset, copy to it current
contents of the corresponding file page, and from that moment on it behaves
as an anonymous page. Except that on truncate, we have to unmap these
anonymous pages in private file mappings as well...
Honza
--
Jan Kara
SUSE Labs, CR
f even
ordinary threaded FOLL_PIN users would not have to be that careful about
fork(2) and possible data loss due to COW - we certainly had reports of
O_DIRECT IO losing data due to fork(2) and COW exactly because it is very
subtle how it behaves... But as I wrote above this is not urgent since that
problematic behavior has existed since the beginning of O_DIRECT IO in Linux.
Honza
--
Jan Kara
SUSE Labs, CR
ly to matter. The lock hold times there are long enough
that it would be just lost in the noise.
For other stuff using them like get_online_cpus() or get_online_mems() I'm
not so sure...
Honza
--
Jan Kara
SUSE Labs, CR
ge basis for buffered
IO.
Honza
--
Jan Kara
SUSE Labs, CR