er-page basis for buffered
IO.
Honza
--
Jan Kara
SUSE Labs, CR
be seen in random writes (overwrite). Also because this optimizes
> away the spinlock contention during jbd2 slab cache allocation
> (jbd2_journal_handle). On x86 VM, ~2x perf improvement was observed.
>
> Reported-by: Dan Williams
> Suggested-by: Jan Kara
> Signed-off-by: Ritesh
ake a lot of driver patches to sort it all
> out.
I somewhat fear that some of the users of pin_user_pages() don't bother
with pinned_vm accounting exactly because they don't have mm_struct on
unpin...
Honza
--
Jan Kara
SUSE Labs, CR
need to
unaccount on unpin. And that can happen from a different task context (e.g.
IRQ handler for direct IO) so we won't have proper mm_struct available.
> Could we move pinned_vm out of the drivers/rdma subsystem?
I'd love to because IMO it's a mess...
Honza
--
Jan Kara
SUSE Labs, CR
hat we could add 'read_count_in_irq' to
percpu_rw_semaphore. So callers in normal context would use read_count and
callers in irq context would use read_count_in_irq. And the writer side
would sum over both but we don't care about performance of that one much.
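A minimal single-threaded sketch of the two-counter idea, written as plain userspace C rather than kernel code. All names here (percpu_rwsem_sketch, NR_CPUS_SKETCH, the sketch_* helpers) are hypothetical and only model the bookkeeping: readers bump one of two per-CPU counters depending on context, and the writer side sums both arrays. It does not model the actual locking or memory ordering of percpu_rw_semaphore.

```c
#include <stddef.h>

#define NR_CPUS_SKETCH 4 /* hypothetical fixed CPU count for the model */

/* Hypothetical model: one reader counter per CPU and per context. */
struct percpu_rwsem_sketch {
    long read_count[NR_CPUS_SKETCH];        /* updated only from normal context */
    long read_count_in_irq[NR_CPUS_SKETCH]; /* updated only from IRQ context */
};

/* Reader side: pick the counter that matches the calling context. */
static void sketch_read_acquire(struct percpu_rwsem_sketch *sem, int cpu, int in_irq)
{
    if (in_irq)
        sem->read_count_in_irq[cpu]++;
    else
        sem->read_count[cpu]++;
}

/*
 * Release may run on a different CPU than the acquire, so a single
 * per-CPU value can go negative; only the total is meaningful.
 */
static void sketch_read_release(struct percpu_rwsem_sketch *sem, int cpu, int in_irq)
{
    if (in_irq)
        sem->read_count_in_irq[cpu]--;
    else
        sem->read_count[cpu]--;
}

/* Writer side: slow path that sums both counter arrays over all CPUs. */
static long sketch_readers_active(const struct percpu_rwsem_sketch *sem)
{
    long sum = 0;

    for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        sum += sem->read_count[cpu] + sem->read_count_in_irq[cpu];
    return sum;
}
```

The point of the split is that a normal-context update can never be interrupted by an IRQ-context update of the same counter, so presumably neither side would need to disable interrupts; the cost moves to the writer, which has to walk twice as many counters.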
Honza
--
Jan Kara
SUSE Labs, CR
/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1427,10 +1427,6 @@ static int cached_dev_init(struct cached_dev *dc,
> unsigned int block_size)
> if (ret)
> return ret;
>
> - dc->disk.disk->queue->backing_dev_info->ra_pages =
> - max(dc->disk.disk->queue->backing_dev_info->ra_pages,
> - q->backing_dev_info->ra_pages);
> -
So bcache is basically stacking readahead here on top of underlying cache
device. I don't see this being replicated by your patch so it is lost now?
Probably this should be replaced by properly inheriting optimal IO size?
Honza
--
Jan Kara
SUSE Labs, CR
't this be more logical in bdi_init() than in bdi_alloc()?
Honza
--
Jan Kara
SUSE Labs, CR
..6a8286132751df 100644
> --- a/include/linux/drbd.h
> +++ b/include/linux/drbd.h
> @@ -94,7 +94,6 @@ enum drbd_read_balancing {
> RB_PREFER_REMOTE,
> RB_ROUND_ROBIN,
> RB_LEAST_PENDING,
> - RB_CONGESTED_REMOTE,
> RB_32K_STRIPING,
> RB_64K_STRIPING,
> RB_128K_STRIPING,
> --
> 2.28.0
>
--
Jan Kara
SUSE Labs, CR
On Thu 10-09-20 16:48:22, Christoph Hellwig wrote:
> Ever since the switch to blk-mq, a lower device not used for VM
> writeback will not be marked congested, so the check will never
> trigger.
>
> Signed-off-by: Christoph Hellwig
Looks good to me. You can add:
Review
On Thu 10-09-20 16:48:21, Christoph Hellwig wrote:
> The last user of SB_I_MULTIROOT disappeared with commit f2aedb713c28
> ("NFS: Add fs_context support.")
>
> Signed-off-by: Christoph Hellwig
> Reviewed-by: Johannes Thumshirn
Nice. You can ad
which always supports cgroup writeback.
>
> Signed-off-by: Christoph Hellwig
> Reviewed-by: Johannes Thumshirn
Makes sense. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> block/blk-core.c| 1 -
> f
ond
> set of block_device_operations as it can switch between modes that
> actually support ->rw_page and those who don't.
>
> Signed-off-by: Christoph Hellwig
The patch looks good to me. You can add:
Reviewed-by: Jan Kara
Hon
eck the flag.
>
> Signed-off-by: Christoph Hellwig
Looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> fs/9p/vfs_file.c| 2 +-
> fs/fs-writeback.c | 7 +++---
> in
wig
The patch looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> fs/fuse/inode.c | 3 ++-
> include/linux/backing-dev.h | 13 +++--
> mm/backing-dev.c| 1 +
> mm
ble_pages_required attribute is not nice but probably it
isn't widely used. Maybe the deprecation message can even mention to use
the queue attribute? Otherwise the patch looks good to me so feel free to
add:
Reviewed-by: Jan Kara
Honza
--
Jan Kara
SUSE Labs, CR
On Thu 10-09-20 16:48:29, Christoph Hellwig wrote:
> There is no point in trying to call bdev_read_page if SWP_SYNCHRONOUS_IO
> is not set, as the device won't support it.
>
> Signed-off-by: Christoph Hellwig
Looks good to me. You can add:
Revi
On Thu 17-09-20 08:37:17, Nikolay Borisov wrote:
> On 17.09.20 at 4:44, Dave Chinner wrote:
> > On Wed, Sep 16, 2020 at 05:58:51PM +0200, Jan Kara wrote:
> >> On Sat 12-09-20 09:19:11, Amir Goldstein wrote:
> >>> On Tue, Jun 23, 2020 at 8:21 AM Dave Chinne
handler from fs POV and does not need
protection from hole punching (current serialization on page lock and
checking of page->mapping is enough).
That being said I agree this is subtle and the moment someone adds e.g. a
readahead call into filemap_map_pages() we have a real problem. I'm not
sure how to prevent this risk...
Honza
--
Jan Kara
SUSE Labs, CR
ot+187510916eb6a1459...@syzkaller.appspotmail.com
> > > Signed-off-by: Eric Biggers
> >
> > Anyone interested in taking this patch?
>
> Jan, you seem to be taking some reiserfs patches... Any interest in
> taking this one?
Sure, the patch looks good to me so I've added it to my tree. Thanks!
Honza
--
Jan Kara
SUSE Labs, CR
zation between page cache and various fs operations is just
too complex with too many special corner cases. But that's difficult to
change while keeping all the features and performance. So the best
realistic answer I have (and this is not meant to discourage anybody from
trying to implement a simpler scheme of page-cache - filesystem interaction
:) is that we should have added an fstest when the XFS fix landed, which would
then hopefully catch the attention of other fs maintainers (at least those that
do run fstests).
Honza
--
Jan Kara
SUSE Labs, CR
window which includes the goal, or the previous one
> @@ -859,7 +859,7 @@ static int find_next_reservable_window(
> *
> * failed: we failed to find a reservation window in this group
> *
> - * @rsv: the reservation
> + * @my_rsv: the reservation
> *
> * @grp_goal: The goal (group-relative). It is where the search for a
> * free reservable space should start from.
> --
> 2.17.1
>
--
Jan Kara
SUSE Labs, CR
On Wed 09-09-20 19:03:07, Amir Goldstein wrote:
> On Wed, Sep 9, 2020 at 2:11 PM Jan Kara wrote:
> >
> > On Wed 09-09-20 10:36:57, Amir Goldstein wrote:
> > > On Wed, Sep 9, 2020 at 10:00 AM Xiaoming Ni wrote:
> > > >
> > > > On 2020/9/9 11:44, Ami
On Wed 09-09-20 13:58:50, Michael Kerrisk (man-pages) wrote:
> [CC += Neil, since he wrote the text we're talking about]
>
> Hello Jan,
>
> On 9/9/20 1:21 PM, Jan Kara wrote:
> > On Wed 09-09-20 12:52:48, Michael Kerrisk (man-pages) wrote:
> >>> So the err
snotify_open() event (most notably
io_uring, exec, or do_handle_open) and there are others as Xiaoming found
which just don't bother. I'm not sure filp_open() should unconditionally
generate fsnotify_open() event as IMO some of those notifications would be
more confusing than useful.
OTOH it is true that e.g. for core dumping we will generate other fsnotify
events such as FSNOTIFY_CLOSE (which is generated in __fput()) so missing
FSNOTIFY_OPEN is somewhat confusing. So having some consistency in this
(either by generating FSNOTIFY_OPEN or by not generating FSNOTIFY_CLOSE)
would be IMO desirable.
Honza
--
Jan Kara
SUSE Labs, CR
all of the
> open file descriptions connected to the inode? Your thoughts?
The error gets reported once for each "open file description" of the file
(inode) where the error happened. If there are multiple file descriptors
pointing to the same open file description, then only one of those fil
atomic_long_t f_count;
> } __randomize_layout
>__attribute__((aligned(4))); /* lest something weird decides that 2
> is OK */
>
> --
> 2.7.4
>
--
Jan Kara
SUSE Labs, CR
on" in manpages) and
so EIO / ENOSPC is reported once for each file description of the file that
was open before the error happened. Not sure if we want to be so precise in
the manpages or if it just confuses people. Anyway your takeaway that no
error on subsequent fsync() does not mean data was written is correct.
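The "reported once per open file description" behaviour can be modelled with a small userspace C sketch. This is a deliberate simplification under stated assumptions, not the kernel's implementation: the types mapping_sketch and open_file_sketch and the sketch_* helpers are hypothetical, and the model ignores details such as how the kernel treats errors nobody has observed yet.

```c
#include <errno.h>

/* Hypothetical per-inode state: last writeback error and a sequence counter. */
struct mapping_sketch {
    unsigned int seq; /* bumped each time a new error is recorded */
    int err;          /* positive errno value of the latest error */
};

/* Hypothetical per-open-file-description state (shared by dup()ed fds). */
struct open_file_sketch {
    unsigned int seen; /* value of mapping->seq last observed by this description */
};

/* open(): sample the current sequence, so older errors are not reported. */
static void sketch_open(const struct mapping_sketch *m, struct open_file_sketch *f)
{
    f->seen = m->seq;
}

/* Writeback failure: remember the error and advance the sequence. */
static void sketch_record_error(struct mapping_sketch *m, int err)
{
    m->err = err;
    m->seq++;
}

/* fsync(): report an error only if one was recorded since the last check. */
static int sketch_fsync(const struct mapping_sketch *m, struct open_file_sketch *f)
{
    if (f->seen == m->seq)
        return 0;
    f->seen = m->seq;
    return -m->err;
}
```

In this model, two descriptions opened before the error each get -EIO from their first sketch_fsync() and 0 from the next one, while a description opened after the error gets 0 straight away. In all three cases the later 0 says nothing about whether the data reached the disk.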
Honza
--
Jan Kara
SUSE Labs, CR
->proc_handler")
> Cc: Christoph Hellwig
> Cc: Al Viro
> Signed-off-by: Tobias Klauser
Thanks! The patch looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> fs/fs-writeback.c | 2 +-
ext2_dax_fault(struct
> ret = dax_iomap_fault(vmf, PE_SIZE_PTE, NULL, NULL, &ext2_iomap_ops);
>
> up_read(&ei->dax_sem);
> - if (vmf->flags & FAULT_FLAG_WRITE)
> + if (write)
> sb_end_pagefault(inode->i_sb);
> return ret;
> }
>
--
Jan Kara
SUSE Labs, CR
or this and have Jan do one for
> ext2, I just applied these two directly as "ObviouslyCorrect(tm)".
OK, thanks!
Honza
--
Jan Kara
SUSE Labs, CR
On Fri 28-08-20 12:07:55, Jan Kara wrote:
> On Wed 26-08-20 19:48:16, syzbot wrote:
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit:c3d8f220 Merge tag 'kbuild-fixes-v5.9' of git://git.kernel..
> > g
(sizeof(struct buffer_head *) * nr_groups);
> > -
> > - if (size <= PAGE_SIZE)
> > - bitmap = kzalloc(size, GFP_KERNEL);
> > - else
> > - bitmap = vzalloc(size); /* TODO: get rid of vzalloc */
> > + int nr_groups = udf_compute_nr_groups(sb, index);
> >
> > + bitmap = kvzalloc(struct_size(bitmap, s_block_bitmap, nr_groups),
> > + GFP_KERNEL);
> > if (!bitmap)
> > return NULL;
> >
> >
>
--
Jan Kara
SUSE Labs, CR
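For readers unfamiliar with the pattern in the quoted UDF change, here is a rough userspace C sketch of sizing a structure that ends in a flexible array with an overflow check, which is what the kernel's struct_size() helper is for. The names bitmap_sketch, sketch_struct_size() and sketch_alloc_bitmap() are hypothetical, calloc() stands in for kvzalloc(), and the real UDF structure layout is not reproduced here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure with a trailing flexible array member. */
struct bitmap_sketch {
    int nr_groups;
    void *block_bitmap[]; /* one slot per group */
};

/*
 * Compute base + elem * count, saturating to SIZE_MAX on overflow so
 * that a following allocation fails instead of being undersized.
 */
static size_t sketch_struct_size(size_t base, size_t elem, size_t count)
{
    if (elem != 0 && count > (SIZE_MAX - base) / elem)
        return SIZE_MAX;
    return base + elem * count;
}

/* Allocate a zeroed structure with room for nr_groups trailing slots. */
static struct bitmap_sketch *sketch_alloc_bitmap(size_t nr_groups)
{
    size_t size = sketch_struct_size(sizeof(struct bitmap_sketch),
                                     sizeof(void *), nr_groups);
    struct bitmap_sketch *bitmap;

    if (size == SIZE_MAX)
        return NULL;
    bitmap = calloc(1, size);
    if (bitmap)
        bitmap->nr_groups = (int)nr_groups;
    return bitmap;
}
```

A single zeroing allocation sized this way replaces the open-coded size computation and the kzalloc()/vzalloc() branch that the quoted patch removes.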
ps://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches
--
Jan Kara
SUSE Labs, CR
care about performance, and want to see a shiny random
> number generator, by all means, use io_schedule().
Honza
--
Jan Kara
SUSE Labs, CR
p_desc and gdb_bh are already
referenced from the superblock so you cannot free them, iloc.bh has been
released in ext4_mark_iloc_dirty().
Honza
--
Jan Kara
SUSE Labs, CR
fio
> 16294 root 20 0 272404 3624 1872 S 1.0 0.0 0:03.60 fio
> 16296 root 20 0 272412 3564 1864 S 1.0 0.0 0:03.60 fio
> 16299 root 20 0 272424 3540 1840 S 1.0 0.0 0:03.62 fio
> 16301 root 20 0 272432 3568
develop new interface like
> io_wait_event_hrtimeout(), then we can use it instead of
> wait_event_interruptible_hrtimeout()?
Yes, that's what I'd do.
Honza
>
> On 08/27/2020 15:55, Jan Kara wrote:
> Hello!
r patch is
rather pointless.
Honza
> On 08/26/2020 21:23, Jan Kara wrote:
> On Wed 05-08-20 09:35:51, Xianting Tian wrote:
> > When waiting for the completion of io, we need to account iowait time. As
> > wait_for_completion() calls schedule_timeout(), which doesn't account
>
ee = 0,
> };
>
> if (!bdi_has_dirty_io(bdi) || bdi == &noop_backing_dev_info)
> @@ -2538,6 +2536,7 @@ void sync_inodes_sb(struct super_block *sb)
> .done = &done,
> .reason = WB_REASON_SYNC,
> .
tx *ctx, long
> min_nr, long nr,
>* is destroyed.
>*/
> if (!ret)
> - wait_for_completion(&wait.comp);
> + wait_for_completion_io(&wait.comp);
>
> return ret;
> }
> --
> 1.8.3.1
>
--
Jan Kara
SUSE Labs, CR
On Tue 25-08-20 14:28:14, Matthew Wilcox wrote:
> On Tue, Aug 25, 2020 at 02:33:24PM +0200, Jan Kara wrote:
> > On Mon 24-08-20 18:36:39, Matthew Wilcox wrote:
> > > We already have functions in filemap which take a pagevec, eg
> > > page_cache_delete_batch() and d
On Mon 24-08-20 18:36:39, Matthew Wilcox wrote:
> On Mon, Aug 24, 2020 at 06:16:20PM +0200, Jan Kara wrote:
> > On Wed 19-08-20 16:05:54, Matthew Wilcox (Oracle) wrote:
> > > All callers of find_get_entries() use a pvec, so pass it directly
> > > instead of ma
-1)
> break;
> diff --git a/mm/swap.c b/mm/swap.c
> index d4e3ba4c967c..40b23300d353 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -1060,9 +1060,7 @@ unsigned pagevec_lookup_entries(struct pagevec *pvec,
> struct address_space *mapping, pgoff_t start, pgoff_t end,
> pgoff_t *indices)
> {
> - pvec->nr = find_get_entries(mapping, start, end, PAGEVEC_SIZE,
> - pvec->pages, indices);
> - return pagevec_count(pvec);
> + return find_get_entries(mapping, start, end, pvec, indices);
> }
>
> /**
> --
> 2.28.0
>
--
Jan Kara
SUSE Labs, CR
On Wed 19-08-20 16:05:53, Matthew Wilcox (Oracle) wrote:
> All callers want to fetch the full size of the pvec.
>
> Signed-off-by: Matthew Wilcox (Oracle)
Looks good to me. You can add:
Reviewed-by: Jan Kara
On Wed 19-08-20 16:05:52, Matthew Wilcox (Oracle) wrote:
> Simplifies the callers and uses the existing functionality of
> find_get_entries().
>
> Signed-off-by: Matthew Wilcox (Oracle)
The patch looks good to me. You can add:
Reviewed-
> -*/
> > - page_move_anon_rmap(vmf->page, vma);
> > - }
> > - unlock_page(vmf->page);
> > - wp_page_reuse(vmf);
> > - return VM_FAULT_WRITE;
> > }
> > - unlock_page(vmf->page);
> > + /*
> > +* Ok, we've got the only map reference, and the only
> > +* page count reference, and the page is locked,
> > +* it's dark out, and we're wearing sunglasses. Hit it.
> > +*/
> > + wp_page_reuse(vmf);
> > + unlock_page(page);
> > + return VM_FAULT_WRITE;
> > } else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> > (VM_WRITE|VM_SHARED))) {
> > return wp_page_shared(vmf);
> >
>
--
Jan Kara
SUSE Labs, CR
ut someone else
> acquired the mmap_sem and the vma is gone.
>
> Releasing mmap_sem after accessing vma should fix the problem.
>
> Fixes: 692fe62433d4c ("mm: Handle MADV_WILLNEED through vfs_fadvise()")
> Reported-by: syzbot+b90df26038d1d5d85...@syzkaller.appspotmai
On Fri 21-08-20 17:33:06, Matthew Wilcox wrote:
> On Fri, Aug 21, 2020 at 06:07:59PM +0200, Jan Kara wrote:
> > On Wed 19-08-20 16:05:51, Matthew Wilcox (Oracle) wrote:
> > > This simplifies the callers and leads to a more efficient implementation
> > > since the XA
+++ b/mm/swap.c
> @@ -1060,7 +1060,7 @@ unsigned pagevec_lookup_entries(struct pagevec *pvec,
> pgoff_t start, unsigned nr_entries,
> pgoff_t *indices)
> {
> - pvec->nr = find_get_entries(mapping, start, nr_entries,
> + pvec->nr = find_get_entries(mapping, start, ULONG_MAX, nr_entries,
> pvec->pages, indices);
> return pagevec_count(pvec);
> }
> --
> 2.28.0
>
--
Jan Kara
SUSE Labs, CR
> is a simpler function to use than find_get_pages(), so use it instead.
>
> Signed-off-by: Matthew Wilcox (Oracle)
This looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> mm/shmem.c | 11 +--
On Fri 21-08-20 05:27:40, Linus Torvalds wrote:
> On Fri, Aug 21, 2020 at 3:13 AM Jan Kara wrote:
> >
> > > + if (page_mapcount(page) != 1 && page_count(page) != 1) {
> >
> > So this condition looks strange to me... Did you mean:
> >
>
/*
> + * Ok, we've got the only map reference, and the only
> + * page count reference, and the page is locked,
> + * it's dark out, and we're wearing sunglasses. Hit it.
> + */
> + wp_page_reuse(vmf);
> + unlock_page(page);
> + return VM_FAULT_WRITE;
> } else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
> (VM_WRITE|VM_SHARED))) {
> return wp_page_shared(vmf);
> --
> 2.28.0.218.gc12ef3d349
>
--
Jan Kara
SUSE Labs, CR
e of pages do not have any
> references (and don't want to unmap all the pages of inode).
>
> Hence, create a range version of this function named
> dax_layout_busy_page_range() which can be used to pass a range which
> needs to be unmapped.
>
> Cc: Dan Williams
>
set + length - 1) >> blkbits,
> EXT4_MAX_LOGICAL_BLOCK) - map.m_lblk + 1;
>
> - if (flags & IOMAP_WRITE)
> + if ((flags & IOMAP_WRITE) && !overwrite)
> ret = ext4_iomap_alloc(inode, &map, flags);
> else
>
/ext2/inode.c
> > index 80662e1f7889..de6b97612410 100644
> > --- a/fs/ext2/inode.c
> > +++ b/fs/ext2/inode.c
> > @@ -36,7 +36,6 @@
> > #include
> > #include
> > #include
> > -#include
> > #include "ext2.h"
> > #include "acl.h"
> > #include "xattr.h"
> >
>
--
Jan Kara
SUSE Labs, CR
ned-off-by: Stefan Hajnoczi
> Signed-off-by: Vivek Goyal
> Cc: Christoph Hellwig
> Cc: Dan Williams
> Cc: linux-nvd...@lists.01.org
This patch looks OK to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
>
har *s_mount_opts = kstrndup(sbi->s_es->s_mount_opts,
> + char *s_mount_opts = kmemdup_nul(sbi->s_es->s_mount_opts,
> sizeof(sbi->s_es->s_mount_opts),
> GFP_KERNEL);
> if (!s_mount_opts)
> --
> 1.8.3.1
>
--
Jan Kara
SUSE Labs, CR
or other
> + * get_user_pages() usages.
> + *
> + * It is expected that the filesystem is holding locks to block the
> + * establishment of new mappings in this address_space. I.e. it expects
> + * to be able to run unmap_mapping_range() and subsequently not race
> + * mapping_mapped() becoming true.
> + */
> +struct page *dax_layout_busy_page(struct address_space *mapping)
> +{
> + return dax_layout_busy_page_range(mapping, 0, 0);
Should the 'end' rather be LLONG_MAX?
Otherwise the patch looks good to me.
Honza
--
Jan Kara
SUSE Labs, CR
ade because of errors.
> make[1]: *** [/linux/Makefile:1006: fs] Error 2
> make[1]: Target 'Image' not remade because of errors.
> make: *** [Makefile:152: sub-make] Error 2
> make: Target 'Image' not remade because of errors.
>
> --
> Linaro LKFT
> https://lkft.linaro.org
--
Jan Kara
SUSE Labs, CR
On Wed 15-07-20 11:00:44, brookxu wrote:
> Fix spelling typos in ext4_mb_initialize_context.
>
> Signed-off-by: Chunguang Xu
Looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> fs/ext4/mballoc.c |
ine is loading the previous index file and is
> processing the search request, it can not use buffer io that may squeeze
> the previous index file in use from pagecache, so the search service must
> use direct I/O read.
>
> Please apply this patch on these kernel versions, or please use th
ree to add:
Reviewed-by: Jan Kara
Honza
>
> Sent from my iPhone
>
> > On Jun 29, 2020, at 5:45 PM, Jiang Ying wrote:
> >
> > This patch is used to fix ext4 direct I/O read error when
> > the read size is not ali
drm_kms_helper
> evdev r8169 snd_hda_intel syscopyarea snd_intel_dspcfg realtek
> snd_hda_codec libphy crc32_pclmul sysfillrect serio_raw sysimgblt
> snd_hwdep fb_sys_fops snd_hda_core drm snd_pcm fan thermal
> drm_panel_orientation_quirks snd_timer intel_gtt 8250 agpgart snd
> 8250_base ehci_pci serial_core button ehci_hcd video soundcore
> i2c_i801 lpc_ich mfd_core mei_me mei loop
> [99390.044800] ---[ end trace 2ca57858c52a0ad4 ]---
>
> --
> nirinA
--
Jan Kara
SUSE Labs, CR
struct super_block *sb;
> + struct buffer_head *sbh;
> +
> + sb = handle->h_transaction->t_journal->j_private;
> + sbh = EXT4_SB(sb)->s_sbh;
> + if (unlikely(!buffer_mapped(sbh))) {
> + return -EIO;
> + }
> +
> err = jbd2_journal_get_write_access(handle, bh);
> if (err)
> ext4_journal_abort_handle(where, line, __func__, bh,
> --
> 1.8.3.1
>
--
Jan Kara
SUSE Labs, CR
76,7 @@ static int ext4_create_inline_data(handle_t *handle,
> len = 0;
> }
>
> - /* Insert the the xttr entry. */
> + /* Insert the xttr entry. */
> i.value = value;
> i.value_len = len;
>
> --
> 2.17.1
>
--
Jan Kara
SUSE Labs, CR
On Mon 27-07-20 16:56:16, Al Viro wrote:
> On Mon, Jul 27, 2020 at 02:41:27PM +0200, Jan Kara wrote:
> > On Sun 26-07-20 18:04:01, Christoph Hellwig wrote:
> > > Fold the misaligned u64 workarounds into the main quotactl flow instead
> > > of implementing a sepa
so feel
free to add:
Acked-by: Jan Kara
Honza
> ---
> arch/x86/entry/syscalls/syscall_32.tbl | 2 +-
> fs/quota/Kconfig | 5 --
> fs/quota/Makefile |
c_type);
> +
> goto out;
> }
> iinfo->i_unique = 0;
>
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
Jan Kara
SUSE Labs, CR
fold fsnotify() call into fsnotify_parent()
> 71d734103edf fsnotify: Rearrange fast path to minimise overhead when
> there is no watcher
> 47aaabdedf36 fanotify: Avoid softlockups when reading many events
>
> Not only did I not observe a regression with the reported commit,
> but t
On Sun 19-07-20 17:14:31, Randy Dunlap wrote:
> Drop the repeated word "than" in a comment.
>
> Signed-off-by: Randy Dunlap
> Cc: Jan Kara
> Cc: Jeff Mahoney
> Cc: reiserfs-de...@vger.kernel.org
Thanks! Applied.
On Sun 19-07-20 17:13:27, Randy Dunlap wrote:
> Change the repeated word "the" in "it the the" to "it is the".
> Fix typo "recentl" to "recently".
> Fix verb "give" to "gives".
>
> Signed-of
On Sun 19-07-20 17:14:55, Randy Dunlap wrote:
> Drop the repeated word "struct" in a comment.
>
> Signed-off-by: Randy Dunlap
> Cc: Jan Kara
Thanks! Applied.
Honza
> ---
> fs/udf/osta_udf.h |
more looking at the call stack
__jbd2_journal_insert_checkpoint() already holds the journal_head we are
interested in so it rather looks like we race with invalidation of the
block device buffer cache after NBD device disappeared. There were some
changes in the lifetime of the block devices after 4.14. Can you reproduce
the issue with a more recent kernel? I suspect the problem may already be
fixed. Anyway the right fix is to make sure NBD does not destroy
buffers while the filesystem is still using them...
> jh = bh2jh(bh);
> jh->b_jcount++;
> }
>
Honza
--
Jan Kara
SUSE Labs, CR
On Sat 18-07-20 08:57:37, Xianting Tian wrote:
> Remove unnecessary blank.
>
> Signed-off-by: Xianting Tian
Looks fine. Feel free to add:
Reviewed-by: Jan Kara
Honza
> ---
> fs/jbd2/journal.c | 12 ++-
ks_needed != 1)
> >>kfree(un);
> >>
> >> Because the kcalloc failure falls back to using unf_single,
> >> the if-check for the free is wrong.
> > I think you mean "Because clang's static analysis is limited, it
> > warns incorrectly a
lot. That will probably silence the softlockup for you as well
(although it's not really fixing the underlying issue).
We'll have a look at what we can do about this :)
Honza
--
Jan Kara
SUSE Labs, CR
@@
> * This code is based on version 2.00 of the UDF specification,
> * and revision 3 of the ECMA 167 standard [equivalent to ISO 13346].
> *http://www.osta.org/
> - *http://www.ecma.ch/
> - *http://www.iso.org/
> + *https://www.ecma.ch/
> + *https://www.iso.org/
> *
> * COPYRIGHT
> * This file is distributed under the terms of the GNU General Public
> --
> 2.27.0
>
--
Jan Kara
SUSE Labs, CR
> more
> > +https://lwn.net/Articles/208755/ and http://people.suug.ch/~tgr/libnl/ for
> > more
>
> That other link is 404, no reason to keep it around...
I've already queued a patch that replaces the second link with a working
one...
Honza
--
Jan Kara
SUSE Labs, CR
uota mini-HOWTO, available from
> - <http://www.tldp.org/docs.html#howto>, or the documentation provided
> + <https://www.tldp.org/docs.html#howto>, or the documentation provided
> with the quota tools. Probably the quota support is only useful for
> multi user systems. If unsure, say N.
>
> --
> 2.27.0
>
--
Jan Kara
SUSE Labs, CR
Thanks. The fixes look good. You can add:
Reviewed-by: Jan Kara
Ted, I think this patch has fallen through the cracks...
Honza
> ---
> fs/ext4/extents.c | 10 +-
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
3>] d_alloc+0x21/0xb0 fs/dcache.c:1788
> [<e0349988>] __lookup_hash+0x67/0xc0 fs/namei.c:1441
> [<907d6c36>] filename_create+0xa5/0x1c0 fs/namei.c:3459
> [<25ebf47f>] user_path_create fs/namei.c:3516 [inline]
> [<25ebf47f>] do_symlinkat+0x70/0x180 fs/namei.c:3973
> [<d872d7cc>] do_syscall_64+0x4c/0xe0 arch/x86/entry/common.c:359
> [<5c62d8da>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
--
Jan Kara
SUSE Labs, CR
eudo inodes
> ...
>
> I double checked by reverting this commit on top of v5.8-rc3 and the
> segmentation faults are gone.
We've already reverted that commit today... Thanks for the report!
Honza
--
Jan Kara
SUSE Labs, CR
entry, &anon_ops);
> path.mnt = mntget(mnt);
> d_instantiate(path.dentry, inode);
> - file = alloc_file(&path, flags | FMODE_NONOTIFY, fops);
> + file = alloc_file(&path, flags, fops);
> if (IS_ERR(file)) {
> ihold(inode);
> path_put(&path);
--
Jan Kara
SUSE Labs, CR
could get used (possibly accidentally) and so after this
Chromium experience I think we just have to revert the change and live with
generating notification events for pipes to avoid userspace regressions.
Thoughts?
Honza
--
Jan Kara
SUSE Labs, CR
On Mon 29-06-20 08:17:02, Eric Dumazet wrote:
> On 6/16/20 12:47 AM, Jan Kara wrote:
> > On Mon 15-06-20 19:26:38, Amir Goldstein wrote:
> >>> This patch changes alloc_file_pseudo() to always opt out of fsnotify by
> >>> setting FMODE_NONOTIFY flag so that no ch
f and then back to LRU list would
increase the contention on the LRU list locks and generally cost
performance so for short term pins it is not desirable... Otherwise I agree
that conceptually it would make some sense although I'm not sure some
places wouldn't get confused by e.g. page cache
ck in early
> 2018. Even though we don't have the full file lease + pin_user_pages()
> solution in place.
>
> That's because reclaim is what triggers the problems that we saw. And
> with this patch, we bail out of reclaim early.
I agree that with this change, some races will become much less likely for
some usecases. But as you say, it's not a full solution.
Honza
--
Jan Kara
SUSE Labs, CR
t's outside the noise so
> > while marginal, there is still some small benefit to ignoring fsnotify
> > for files allocated via alloc_file_pseudo in some cases.
> >
> > Signed-off-by: Mel Gorman
>
> Reviewed-by: Amir Goldstein
Thanks for the patch Mel and for review Amir! I've added the patch to my
tree with small amendments to the changelog.
Honza
--
Jan Kara
SUSE Labs, CR
e_sb_err'
>
> Fixes: 3b0311e7ca71 ("vfs: track per-sb writeback errors and report them to
> syncfs")
> Signed-off-by: Mauro Carvalho Chehab
Thanks for the fix! It looks good to me. You can add:
Reviewed-by: Jan Kara
f there is a need for this.
I don't think using fsnotify on pipe inodes is sane in any way. You'd
possibly only get the MODIFY or ACCESS events and even those would not be
quite reliable because with pipes stuff like splicing etc. is much more
common and that currently completely bypasses fsnotify subsystem. So
overall I'm fine with completely ignoring fsnotify on such inodes.
Honza
--
Jan Kara
SUSE Labs, CR
")
> Fixes: 021ada7dff22 ("procfs: switch /proc/self away from proc_dir_entry")
> Fixes: 51f0885e5415 ("vfs,proc: guarantee unique inodes in /proc")
> Signed-off-by: "Eric W. Biederman"
Thanks for analysing this! I agree with the analysis and the patch look
arent(struct dentry *dentry, __u32 mask, const void
> *data,
> +extern int __fsnotify_parent(struct dentry *dentry, __u32 mask, const void
> *data,
> int data_type);
> extern void __fsnotify_inode_delete(struct inode *inode);
> extern void __fsnotify_vfsmount_delete(struct vfsmount *mnt);
> @@ -541,7 +541,7 @@ static inline int fsnotify(struct inode *to_tell, __u32
> mask, const void *data,
> return 0;
> }
>
> -static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
> +static inline int __fsnotify_parent(struct dentry *dentry, __u32 mask,
> const void *data, int data_type)
> {
> return 0;
--
Jan Kara
SUSE Labs, CR
s the reference
you pass to it is somewhat subtle and surprising so I think we are better
off getting rid of that.
Honza
--
Jan Kara
SUSE Labs, CR
459.409415]
> [ 459.409679] Freed by task 1262:
> [ 459.410212] __kasan_slab_free+0x129/0x170
> [ 459.410919] kmem_cache_free+0xb2/0x2a0
> [ 459.411564] rcu_process_callbacks+0xbb2/0x2320
> [ 459.412318] __do_softirq+0x225/0x8ac
>
> Fix this by delaying bdput() to th
inode evictions?
You have to have an equivalent of write access to the file to be able to
trigger d_mark_dontcache(). So you can e.g. delete it. Or you could
fadvise / madvise regarding its page cache. I don't see the ability to push
inode out of cache as stronger than the abilities you already have...
Honza
--
Jan Kara
SUSE Labs, CR
rivers/vhost/vhost.c
> has a "pin, write to page, set page dirty, unpin" case.
>
> Add a fifth case, to help explain that there is a general pattern
> that requires pin_user_pages*() API calls.
>
> Cc: Vlastimil Babka
> Cc: Jan Kara
> Cc: Jérôme Glisse
> Cc:
_user_pages.rst
>
> [2] "Explicit pinning of user-space pages":
> https://lwn.net/Articles/807108/
>
> Cc: Michael S. Tsirkin
> Cc: Jason Wang
> Cc: k...@vger.kernel.org
> Cc: virtualizat...@lists.linux-foundation.org
> Cc: net...@vger.kernel.or
On Fri 29-05-20 21:37:50, Martijn Coenen wrote:
> Hi Jan,
>
> On Fri, May 29, 2020 at 5:20 PM Jan Kara wrote:
> > I understand. I have written a fix (attached). Currently it's under testing
> > together with other cleanups. If everything works fine, I plan to submit
>
Hello Martijn!
On Wed 27-05-20 10:14:09, Martijn Coenen wrote:
> On Mon, May 25, 2020 at 9:31 AM Jan Kara wrote:
> > Well, most importantly filesystems like ext4, xfs, btrfs don't hold i_rwsem
> > when writing back inode and that's deliberate because of performance. W
sert becomes
> + * If kcalloc failed, max_to_insert becomes
>* zero and it means we only have space for
> * one block
>*/
> --
> 2.9.5
>
--
Jan Kara
SUSE Labs, CR