Honza
> [1] sanity check: people actually do this:
> - https://unix.stackexchange.com/a/17613
> - https://www.g-loaded.eu/2005/11/10/packet-writing-on-cdrw-and-dvdrw-media/
> - https://www.altlinux.org/WritingLargeFilesOnDVD
> -
> https://computador.docow.com/como-usair-dvd-rw-udf-tanto-no-windows-quanto-no-linux.html
> (case 4)
--
Jan Kara
SUSE Labs, CR
mem is going to work. Thanks for
explanation! So can I imagine this as guest mmaping the host file and
providing the mapped range as "NVDIMM pages" to the kernel inside the
guest? Or is it more complex?
Honza
--
Jan Kara
SUSE Labs, CR
art
picking patches for fsdax to my tree if you are too busy but I think your
tree is easier as there are less chances for conflicts etc.
In either case this patch looks OK to me so feel free to add
Reviewed-by: Jan Kara
Honza
&g
moves the reference to the next transaction or drops it. So what you
observe rather seems like some bug in reference counting of journal
heads and your patch isn't going to help.
Honza
--
Jan Kara
SUSE Labs, CR
; resources like the page cache.
Right. Thinking about this I would be more concerned about the fact that
guest can effectively pin amount of host's page cache upto size of the
device/file passed to guest as PMEM, can't it Pankaj? Or is there some QEMU
magic that avoids this?
Honza
--
Jan Kara
SUSE Labs, CR
re rather be some generic way of doing this? Having
virtio_pmem_host_cache_enabled() check in filesystem code just looks like
filesystem sniffing into details is should not care about... Maybe just
naming this (or having a wrapper) dax_dev_map_sync_supported()?
Honza
--
Jan Kara
SUSE Labs, CR
On Tue 08-01-19 12:49:08, Dmitry Vyukov wrote:
> On Tue, Jan 8, 2019 at 12:24 PM Jan Kara wrote:
> >
> > On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> > > On 2019/01/03 2:26, Jan Kara wrote:
> > > > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> >
; jh->b_committed_data = NULL;
> @@ -944,7 +945,6 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> jh->b_frozen_triggers = NULL;
> }
>
> - spin_lock(&journal->j_list_lock);
>
for.
> >
> > The function name fanotify_should_send_event() has also been updated so
> > that it's more relevant to what it has been designed to do.
> >
> > Signed-off-by: Matthew Bobrowski
> > Reviewed-by: Amir Goldstein
> > Signed-off-by: Jan Kara
>
On Tue 08-01-19 19:04:06, Tetsuo Handa wrote:
> On 2019/01/03 2:26, Jan Kara wrote:
> > On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> >> On 2019/01/02 23:40, Jan Kara wrote:
> >>> I had a look into this and the only good explanation for this I have is
> >>&
use below change? If
> > > > > agree, I will send v2 patch.
> > > > >
> > > > > @@ -63,7 +64,11 @@ static int fanotify_get_response(struct
> > > > > fsnotify_group *group,
> > > > >
> > > > > pr_debug(&qu
On Wed 02-01-19 20:55:33, Jerome Glisse wrote:
> On Wed, Dec 19, 2018 at 12:08:56PM +0100, Jan Kara wrote:
> > On Tue 18-12-18 21:07:24, Jerome Glisse wrote:
> > > On Tue, Dec 18, 2018 at 03:29:34PM -0800, John Hubbard wrote:
> > > > OK, so let's take another loo
On Thu 03-01-19 01:07:25, Tetsuo Handa wrote:
> On 2019/01/02 23:40, Jan Kara wrote:
> > I had a look into this and the only good explanation for this I have is
> > that sb->s_blocksize is different from (1 <<
> > sb->s_bdev->bd_inode->i_blkbits).
> &g
On Fri 28-12-18 22:34:13, Tetsuo Handa wrote:
> On 2018/08/06 19:09, Jan Kara wrote:
> > On Tue 31-07-18 00:07:22, Tetsuo Handa wrote:
> >> On 2018/07/21 5:06, Andrew Morton wrote:
> >>> On Fri, 20 Jul 2018 19:36:23 +0900 Tetsuo Handa
> >>> wrote:
>
On Thu 20-12-18 09:33:12, Dave Chinner wrote:
> On Wed, Dec 19, 2018 at 12:35:40PM +0100, Jan Kara wrote:
> > On Wed 19-12-18 21:28:25, Dave Chinner wrote:
> > > On Tue, Dec 18, 2018 at 08:03:29PM -0700, Jason Gunthorpe wrote:
> > > > On Wed, Dec 19, 2018 at 10:42:
On Fri 14-12-18 10:56:13, Jan Kara wrote:
> On Thu 13-12-18 01:06:29, Javier Barrio wrote:
> > Commit 1fa5efe3622db58cb8c7b9a50665e9eb9a6c7e97 (ext4: Use generic
> > helpers for quotaon and quotaoff) made possible to call
> > quotactl(Q_XQUOTAON/OFF) on ext4 filesystems with
On Wed 19-12-18 10:42:54, Dave Chinner wrote:
> On Tue, Dec 18, 2018 at 11:33:06AM +0100, Jan Kara wrote:
> > On Mon 17-12-18 08:58:19, Dave Chinner wrote:
> > > On Fri, Dec 14, 2018 at 04:43:21PM +0100, Jan Kara wrote:
> > > > Yes, for filesystem it is too late. B
lving the problems with RDMA... Because
nobody wants to go through those couple hundred get_user_pages() users in
the kernel twice...
Honza
--
Jan Kara
SUSE Labs, CR
again".
You are conflating two things here. Bounce buffer (or a way to stop DMA
from happening) is needed because think what happens when RAID5 computes
its stripe checksum while someone modifies the data through DMA. Checksum
mismatch and all fun arising from that.
Notifying filesystem about th
appily. The how to handle truncate
> is a per existing GUP user discussion to see what they want to do for
> that.
>
> Obviously a bit deeper analysis of all spot that use mapcount is needed
> to check that we are not breaking anything but from the top of my head
> i can not think of anything bad (migrate will abort and other things will
> assume the page is mapped even it is only in hardware page table, ...).
Hum, grepping for page_mapped() and page_mapcount(), this is actually going
to be non-trivial to get right AFAICT.
Honza
--
Jan Kara
SUSE Labs, CR
On Mon 17-12-18 08:58:19, Dave Chinner wrote:
> On Fri, Dec 14, 2018 at 04:43:21PM +0100, Jan Kara wrote:
> > Hi!
> >
> > On Thu 13-12-18 08:46:41, Dave Chinner wrote:
> > > On Wed, Dec 12, 2018 at 10:03:20AM -0500, Jerome Glisse wrote:
> > > > On Mon,
This writeback decision is pretty much independent from the mechanism by
which we are going to identify pinned pages. Whether that's going to be
separate counter in struct page, using page->_mapcount, or separately
allocated data structure as you know promote.
I currently like the most the _mapcount suggestion from Jerome but I'm not
really attached to any solution as long as it performs reasonably and
someone can make it working :) as I don't have time to implement it at
least till January.
Honza
--
Jan Kara
SUSE Labs, CR
e Chinner wrote:
> > >>> On Wed, Dec 12, 2018 at 04:59:31PM -0500, Jerome Glisse wrote:
> > >>>> On Thu, Dec 13, 2018 at 08:46:41AM +1100, Dave Chinner wrote:
> > >>>>> On Wed, Dec 12, 2018 at 10:03:20AM -0500, Jerome Glisse wrote:
> > &
> Signed-off-by: Horst Schirmeier
The patch looks good to me. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> fs/inode.c | 9 +++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/inode.c
Hi!
On Thu 13-12-18 08:46:41, Dave Chinner wrote:
> On Wed, Dec 12, 2018 at 10:03:20AM -0500, Jerome Glisse wrote:
> > On Mon, Dec 10, 2018 at 11:28:46AM +0100, Jan Kara wrote:
> > > On Fri 07-12-18 21:24:46, Jerome Glisse wrote:
> > > So this approach doesn't
d copy old real page contents to
the dummy page contents. Then it will be equivalent to the current behavior
and if the hardware can do the swapping, then I'm fine with such
solution...
Honza
--
Jan Kara
SUSE Labs, CR
- return (cmd == Q_QUOTAON) || (cmd == Q_QUOTAOFF);
> + return (cmd == Q_QUOTAON) || (cmd == Q_QUOTAOFF) ||
> + (cmd == Q_XQUOTAON) || (cmd == Q_XQUOTAOFF);
> }
>
> /*
> -- 2.17.1
>
>
--
Jan Kara
SUSE Labs, CR
we can now return
> VM_FAULT_MAJOR|VM_FAULT_RETRY whereas we used to return ENOMEM.
Yes, but once we've dropped mmap_sem, there's no way safely return -ENOMEM.
So VM_FAULT_RETRY is really the only option to tell the caller that
mmap_sem is not held anymore...
So the patch looks good to me now. You can add:
Reviewed-by: Jan Kara
Honza
--
Jan Kara
SUSE Labs, CR
ched
patch fixes the issue for me.
Unless someone objects, I'm going to merge the patch to my tree and push it
to Linus in the coming merge window.
Honza
--
Jan Kara
SUSE Labs, CR
>From d524d42ca22801418c3f2c1cb1bf6e79c1
permission event FAN_OPEN_EXEC will get properly generated which I
guess is desirable (support for it is sitting in my tree waiting for the
merge window) - adding some audit people involved in FAN_OPEN_EXEC to
CC. Just an idea...
Honza
--
Jan Kara
SUSE Labs, CR
in but an error here causes us to lose track of
> that elevated refcount?
Yeah, that looks like a bug to me as well.
> > }
> >
> > - if (!lock_page_or_retry(page, vmf->vma->vm_mm, vmf->flags)) {
> > + if (!lock_page_maybe_drop_mmap(vmf, page, &fpin)) {
> > put_page(page);
> > return ret | VM_FAULT_RETRY;
> > }
And here can be the same problem. Generally if we went through 'goto
retry_find', we may have file ref already taken but some exit paths don't
drop that ref properly...
Honza
--
Jan Kara
SUSE Labs, CR
ff-by: Josef Bacik
The patch looks good. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> mm/filemap.c | 28 ++--
> 1 file changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/mm
code paths like in task2.
> But that not what I see for many wait_on_page_writeback() users: it usally
> called with the page locked. I see it for truncate, shmem, swapfile,
> splice...
>
> Maybe the problem is within task2 codepath after all?
So ->writepages() methods in filesys
really possible when DAX is in use...
Honza
--
Jan Kara
SUSE Labs, CR
On Tue 11-12-18 11:08:53, Josef Bacik wrote:
> On Tue, Dec 11, 2018 at 10:40:34AM +0100, Jan Kara wrote:
> > > The lock_page_or_retry() case in particular gets hit a lot with
> > > multi-threaded applications that got paged out because of heavy memory
> > > pressure.
On Mon 10-12-18 13:44:39, Josef Bacik wrote:
> On Fri, Dec 07, 2018 at 12:01:38PM +0100, Jan Kara wrote:
> > On Fri 30-11-18 14:58:11, Josef Bacik wrote:
> > > @@ -2433,9 +2458,32 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
> > >
not guaranteed. So, I
> believe the intent, Willy correct me if I am wrong, was to keep all
> waits "exclusive" for some sense of symmetry, but this one can and
> should be a non-exclusive wait.
Agreed. I didn't realize this when reviewing Matthew's patch and
misunderstood his comment to this end.
Honza
--
Jan Kara
SUSE Labs, CR
s no help...
So this approach doesn't look like a win to me over using counter in struct
page and I'd rather try looking into squeezing HMM public page usage of
struct page so that we can fit that gup counter there as well. I know that
it may be easier said than done...
Honza
--
Jan Kara
SUSE Labs, CR
that is not a regular file, right? Directories cannot be
written anyway and for pipes and character devices same logic applies as
for block devices.
Honza
--
Jan Kara
SUSE Labs, CR
On Fri 07-12-18 11:24:47, Alexander Lochmann wrote:
> Am 05.12.18 um 16:32 schrieb Jan Kara:
> >
> > Thinking more about this I'm not sure if this is actually the right
> > solution. Because for example the write(2) can set S_NOSEC flag wrongly
> > when it would
fference. If there's no
> difference do you want me to drop this? Thanks,
If there's no difference, I'd like to drop this as well. It just
complicates the fault state handling which is already complex enough.
Honza
--
Jan Kara
SUSE Labs, CR
is allowed in do_user_addr_fault(). So the second time we get to
filemap_fault(), we will not have FAULT_FLAG_ALLOW_RETRY set and thus do
blocking locking. So I think your code needs to catch common cases you
observe in practice but not those super-rare corner cases...
Honza
--
Jan Kara
SUSE Labs, CR
On Fri 07-12-18 10:57:50, Jan Kara wrote:
> On Fri 30-11-18 14:58:10, Josef Bacik wrote:
> > If we do not have a page at filemap_fault time we'll do this weird
> > forced page_cache_read thing to populate the page, and then drop it
> > again and loop around and find it.
also don't have to unlock and then lock again the page... And you can still
delete all the code you've deleted.
Honza
--
Jan Kara
SUSE Labs, CR
_t fanotify_read(struct file *file, char
> > __user *buf,
> > continue;
> > }
> >
> > - ret = copy_event_to_user(group, kevent, buf);
> > + ret = copy_event_to_user(group, kevent, buf, count);
> > if (unlikely(ret == -EOPENSTALE)) {
> > /*
> > * We cannot report events with stale fd so drop it.
> > --
> > 2.17.1
> >
> >
> > --
> > Kees Cook
--
Jan Kara
SUSE Labs, CR
, count);
> if (unlikely(ret == -EOPENSTALE)) {
> /*
>* We cannot report events with stale fd so drop it.
> --
> 2.17.1
>
>
> --
> Kees Cook
--
Jan Kara
SUSE Labs, CR
applies), and the
> dma_pinned_count.
So single page bit could help you with performance. In 99% of cases there's
just one reference from GUP. So if you could store that info in page flags,
you could safe yourself a relatively expensive removal from LRU and putting
it back to make space in struct page for proper refcount. But since you
report that the performance isn't that horrible, I'd leave this idea on a
backburner. We can always implement it later in case we find in future we
need to improve the performance.
Honza
--
Jan Kara
SUSE Labs, CR
. doesn't shaming conflict with the CoC?
> Just remove this sentence - it adds no valuable information.
> While at it, please fix grammar "a mark".
Agreed, just send a patch to remove that sentence and fixup the article.
Btw, for the fun of it Eric has added this comment himself so I don't think
this would really qualify as a CoC violation ;)
Honza
--
Jan Kara
SUSE Labs, CR
On Tue 27-11-18 17:57:12, PanBian wrote:
> On Tue, Nov 27, 2018 at 10:25:51AM +0100, Jan Kara wrote:
> > On Sun 25-11-18 08:15:23, Pan Bian wrote:
> > > After calling dput(new_dentry), new_dentry is passed to fsnotify_move.
> > > This may result in a use-after-free bu
On Fri 23-11-18 16:30:43, Colin King wrote:
> From: Colin Ian King
>
> There are several lines that are indented too far, clean these
> up by removing the tabs.
>
> Signed-off-by: Colin Ian King
The patch looks good. You can add:
Rev
new_is_dir, NULL, new_dentry);
> }
> }
> + dput(new_dentry);
> release_dentry_name_snapshot(&old_name);
>
> return error;
> --
> 2.7.4
>
>
--
Jan Kara
SUSE Labs, CR
e(bh);
> if (!(bh && header == HDR(bh)))
> kfree(header);
> + brelse(bh);
> up_write(&EXT2_I(inode)->xattr_sem);
>
> return error;
> --
> 2.7.4
>
>
--
Jan Kara
SUSE Labs, CR
n Bian
> Fixes: daf647d2dd5("ext4: add lockdep annotations for i_data_sem")
Thanks for the fix! The patch looks good. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> fs/ext4/super.c | 2 +-
> 1 file changed, 1 insert
On Fri 23-11-18 16:40:53, Colin King wrote:
> From: Colin Ian King
>
> There is a statement that is indented with spaces, replace it with
> a tab.
>
> Signed-off-by: Colin Ian King
Looks good. You can add:
Rev
a proper
> internface. While this has been worked on and it will be fixed properly,
> it seems that our wording could see some refinement and be more vocal
> about semantic aspect of these flags as well.
>
> Cc: Jan Kara
> Cc: Dan Williams
> Cc: David Rientjes
> Signed-off-by:
On Fri 16-11-18 14:33:14, Sudip Mukherjee wrote:
> > From e1a7680b960fe25821f2419b4c0b1215f8ab2f9b Mon Sep 17 00:00:00 2001
> > From: Jan Kara
> > Date: Fri, 16 Nov 2018 13:43:17 +0100
> > Subject: [PATCH] udf: Allow mounting volumes with incorrect identification
>
(s_len >= i_len) {
> - pr_err("incorrect dstring lengths (%d/%d)\n",
> -s_len, i_len);
> - return -EINVAL;
> - }
> - }
>
> return udf_name_from_CS0(sb, utf_o, o_len, o
It creates unneeded negative feelings to those who wanted to be in
> > > > this list, but for any reason they don't. Those reviewers will feel
> > > > "untrusted".
> > >
> > > Yeah, perhaps something like "most active reviewers" would sound
> > > better.
> >
> > I would recommend to remove this section at all.
> > New maintainers won't come out of blue, but will be come
> > from existing community and such individuals for sure will see
> > and judge by themselves to whom they trust and to whom not.
>
> I see your point, but, on the other hand, having a list with the ones
> that are actively doing reviews helps newcomers.
How exactly? Do you expect people to CC patches to these directly? And if
yes, why is not picking patches from the mailing list good enough?
Honza
--
Jan Kara
SUSE Labs, CR
ry.
>
> Cc: Matthew Wilcox
> Cc: Michal Hocko
> Cc: Christopher Lameter
> Cc: Jason Gunthorpe
> Cc: Dan Williams
> Cc: Jan Kara
> Signed-off-by: John Hubbard
> ---
> include/linux/mm.h | 19 +--
> mm/gup.c | 55 +++
On Mon 12-11-18 14:40:06, Andrey Ryabinin wrote:
>
>
> On 11/12/18 2:31 PM, Jan Kara wrote:
> > On Mon 12-11-18 12:57:34, Pavel Tikhomirov wrote:
> >> If all pages are deleted from the mapping by memory reclaim and also
> >> moved to the cleancache:
> >>
already ready for
> nrpages == 0 && nrexceptional == 0 case and just invalidates inode.
>
> Fixes: commit 91b0abe36a7b ("mm + fs: store shadow entries in page cache")
> To: Andrew Morton
> Cc: Johannes Weiner
> Cc: Mel Gorman
> Cc: Jan Kara
> Cc:
On Wed 31-10-18 22:13:00, Vasily Averin wrote:
> Fixes 3f2571c1f91f ("ext4: factor out xattr moving")
> cc: Jan Kara
> however issue was present in original ext4_expand_extra_isize_ea()
> Fixes 6dd4ee7cab7e ("ext4: Expand extra_inodes space per ...") # 2.6.23
>
On Tue 06-11-18 13:47:15, Dave Chinner wrote:
> On Mon, Nov 05, 2018 at 04:26:04PM -0800, John Hubbard wrote:
> > On 11/5/18 1:54 AM, Jan Kara wrote:
> > > Hmm, have you tried larger buffer sizes? Because synchronous 8k IO isn't
> > > going to max-out NVME iops b
tf("Usage: %s \n", argv[0]);
> return 1;
> }
> char *filename = argv[1];
> unsigned iterations = strtoul(argv[2], 0, 0);
>
> /* Not using O_SYNC for now, anyway. */
> int fd = open(filename, O_DIRECT | O_RDWR);
> if (fd < 0) {
> printf("Failed to open %s: %m\n", filename);
> return 2;
> }
>
> printf("File: %s, data size: %u, interations: %u\n",
> filename, FULL_DATA_SIZE, iterations);
>
> for (int count = 0; count < iterations; count++) {
> read_and_write(fd, FULL_DATA_SIZE, buf);
> }
>
> close(fd);
> return 0;
> }
>
>
> thanks,
> --
> John Hubbard
> NVIDIA
--
Jan Kara
SUSE Labs, CR
D(CONFIG_CGROUP_WRITEBACK) check here.
It just looks too arbitrary. Could we perhaps change the code like
struct dirty_throttle_control * const mdtc = &mdtc_stor;
And then replace checks for !mtdc in the function to !mdtc_valid(mdtc)?
That is the same thing as currently and it should make it obvious to the
compiler as well as human what is going on... Tejun?
Honza
--
Jan Kara
SUSE Labs, CR
On Sun 04-11-18 23:17:58, John Hubbard wrote:
> On 10/22/18 12:43 PM, Jason Gunthorpe wrote:
> > On Thu, Oct 11, 2018 at 06:23:24PM -0700, John Hubbard wrote:
> >> On 10/11/18 6:20 AM, Jason Gunthorpe wrote:
> >>> On Thu, Oct 11, 2018 at 10:49:29AM +0200, Jan Kara
l function call would matter (correct me if I'm wrong here). So
I'd rather see these tables and associated functions in some C file.
> +static inline unsigned char fs_dtype(int filetype)
This function name is not very descriptive and consistent with the other
two. It should be like fs_ftype_to_dtype(), right?
> +{
> + if (filetype >= FT_MAX)
> + return DT_UNKNOWN;
> +
> + return fs_dtype_by_ftype[filetype];
> +}
Otherwise the patch looks good to me.
Honza
--
Jan Kara
SUSE Labs, CR
d a look at the patches which is a good sign
;) and if patches are reviewed by respective fs maintainers.
Honza
--
Jan Kara
SUSE Labs, CR
On Thu 11-10-18 20:53:34, John Hubbard wrote:
> On 10/11/18 6:23 PM, John Hubbard wrote:
> > On 10/11/18 6:20 AM, Jason Gunthorpe wrote:
> >> On Thu, Oct 11, 2018 at 10:49:29AM +0200, Jan Kara wrote:
> >>
> >>>> This is a real worry. If someone uses a mis
d not want to change the UAPI _ALL_ constants.
> This is how we plan to solve it:
> https://github.com/amir73il/linux/commit/8c2b1acadb88ee4505ccc8bfdc665863111fb4cc
Yeah, after fanotify experience I find foo_ALL constants useless if not
dangerous for userspace. Kernel internal constants like this are IMO
useful.
Honza
--
Jan Kara
SUSE Labs, CR
end of
> > get_user_pages(),
> > unceremoniously rips the pages out of the LRU, as a prerequisite to using
> > either of the page->dma_pinned_* fields.
>
> How is that safe? If you've ripped the page out of the LRU, it's no
> longer being tracked by the page cache aging and reclaim algorithms.
> Patch 6 doesn't appear to put these pages back in the LRU, either,
> so it looks to me like this just dumps them on the ground after the
> gup reference is dropped. How do we reclaim these page cache pages
> when there is memory pressure if they aren't in the LRU?
Yeah, that's a bug in patch 6/6 (possibly in ClearPageDmaPinned). It should
return the page to the LRU from put_user_page().
Honza
--
Jan Kara
SUSE Labs, CR
Hi Dmirty!
On Mon 15-10-18 14:29:14, Dmitry Vyukov wrote:
> On Mon, Oct 15, 2018 at 2:15 PM, Jan Kara wrote:
> > Hello,
> >
> > On Mon 15-10-18 04:32:02, syzbot wrote:
> >> syzbot found the following crash on:
> >>
> >> HEAD commit:90ad18418c2
3
> Sending NMI from CPU 0 to CPUs 1:
> NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0x6/0x10
> arch/x86/include/asm/irqflags.h:57
>
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
--
Jan Kara
SUSE Labs, CR
t drops below
bare minimum for given user pin count which would be able to catch some
issues but it won't be 100% reliable. So at this point I'm more leaning
towards making get_user_pages() return a different type than just
struct page * to make it much harder for refcount to go wrong...
Honza
--
Jan Kara
SUSE Labs, CR
On Wed 10-10-18 16:23:35, John Hubbard wrote:
> On 10/10/18 1:59 AM, Jan Kara wrote:
> > On Tue 09-10-18 17:42:09, John Hubbard wrote:
> >> On 10/8/18 5:14 PM, Andrew Morton wrote:
> >>> Also, maintainability. What happens if someone now uses put_page() by
> &
ple and pass user_page references all around.
So this way we could maintain reasonable confidence that refcounts didn't
get mixed up. Thoughts?
Honza
--
Jan Kara
SUSE Labs, CR
ser_page(pages[index]);
> +}
> +EXPORT_SYMBOL(put_user_pages);
> +
> /*
> * get_kernel_pages() - pin kernel pages in memory
> * @kiov:An array of struct kvec structures
> --
> 2.19.1
>
--
Jan Kara
SUSE Labs, CR
we'd have to propagate this different type e.g. through the IO path so
that IO completion routines could properly call put_user_pages(). So I'm
not sure it's really worth it.
Honza
--
Jan Kara
SUSE Labs, CR
gt; CC: Doug Ledford
> CC: Jason Gunthorpe
> CC: Mike Marciniszyn
> CC: Dennis Dalessandro
> CC: Christian Benvenuti
>
> CC: linux-r...@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: linux...@kvack.org
> Signed-off-by: John Hubbard
Looks good to me. You can
.cz
> Follow-up discussions.
>
> CC: Matthew Wilcox
> CC: Michal Hocko
> CC: Christopher Lameter
> CC: Jason Gunthorpe
> CC: Dan Williams
> CC: Jan Kara
> CC: Al Viro
> CC: Jerome Glisse
> CC: Christoph Hellwig
> CC: Ralph Campbell
> Signed-of
before we unlock j_list_lock as
otherwise we can race with jbd2_journal_try_to_free_buffers(). I'll send a
fix.
Honza
--
Jan Kara
SUSE Labs, CR
ts);
> + smp_mb__after_atomic();
> + }
> +
> WRITE_ONCE(node->head, head);
> list_add(&node->list, &head->list);
> spin_unlock(&head->lock);
> @@ -212,6 +250,15 @@ void dlock_lists_del(struct dlock_list_node *node)
> s
furthermore it makes the code author think more whether he needs
set_page_dirty() or set_page_dirty_lock(), rather than just passing 'true'
and hoping the function magically does the right thing for him.
Honza
--
Jan Kara
SUSE Labs, CR
ut_user_pages() for symmetry with
put_user_page()? I don't really care too much but it would look natural to
me.
Honza
--
Jan Kara
SUSE Labs, CR
or and what we declare as
unsupported as DAX is new and it was never really used with RDMA and stuff.
> > I'm also pretty sure we already explained this a long time ago when the
> > issue came up last year, so I'm not sure why this is even still
> > contentious.
>
> I suspect that it's simply because these discussions have been
> spread across different groups and not everyone is aware of what the
> other groups are discussing...
Yes, I have the same feeling.
Honza
--
Jan Kara
SUSE Labs, CR
hile reclaiming likely make sense,
> the blocking free buffer maybe not.
Well, as John wrote, these page refcount are fragile (and actually
filesystem dependent as some filesystems hold page reference from their
page->private data and some don't). So I think we really need a new
reliable mechanism for tracking page references from GUP. And John works
towards that.
Honza
--
Jan Kara
SUSE Labs, CR
On Thu 27-09-18 22:44:12, Souptick Joarder wrote:
> These codes can be replaced with new inline vmf_error().
>
> Signed-off-by: Souptick Joarder
Looks good. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> m
fsnotify_marks if they happen to be cache cold.
Or it could be just code layout differences (i.e., compiler is not able to
optimize resulting code as good or the code layout just happens to align
with cache lines in a wrong way or something like that). Anyway, without
being able to reproduce this and do detailed comparison of perf profiles I
don't think we'll be able to tell.
Honza
--
Jan Kara
SUSE Labs, CR
would have not reached
> the extra fsnotify_first_mark() call.
> Can you confirm or disprove the assumption that an fanotify mount mark
> is present during the test?
This would be good to know.
Honza
>
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 422fbc6dffde..8d45d82e09ff 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -246,6 +246,9 @@ static struct fsnotify_mark
> *fsnotify_first_mark(struct fsnotify_mark_connector
> struct fsnotify_mark_connector *conn;
> struct hlist_node *node = NULL;
>
> + if (!*connp)
> + return NULL;
> +
> conn = srcu_dereference(*connp, &fsnotify_mark_srcu);
> if (conn)
> node = srcu_dereference(conn->list.first,
> &fsnotify_mark_srcu);
--
Jan Kara
SUSE Labs, CR
ent_env
> CS: 0010 DS: ES: CR0: 80050033
> kobject: 'loop3' (91672ed1): fill_kobj_path: path =
> '/devices/virtual/block/loop3'
> CR2: 001b3202d000 CR3: 0001cad9f000 CR4: 001406e0
> DR0: DR1: 0000
s it's
currently userspace triggerable Oops if you try hard enough. And with RDMA
you don't even have to try that hard. Properly dealing with private
mappings should not be that hard once the infrastructure is there I hope
but I didn't seriously look into that. I've added Miklos and John to CC as
they are interested as well. John was working on fixing this problem -
https://lkml.org/lkml/2018/7/9/158 - but I didn't hear from him for quite a
while so I'm not sure whether it died off or what's the current situation.
Honza
--
Jan Kara
SUSE Labs, CR
> Signed-off-by: Mauro Carvalho Chehab
Looks good to me. Thanks for fixing this up. You can add:
Reviewed-by: Jan Kara
Honza
> ---
> Documentation/filesystems/dax.txt | 2 +-
> Documentation/filesy
>do_symlinkat+0x242/0x2d0 fs/namei.c:4154
>__do_sys_symlink fs/namei.c:4173 [inline]
>__se_sys_symlink fs/namei.c:4171 [inline]
>__x64_sys_symlink+0x59/0x80 fs/namei.c:4171
>do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>entry_SYSCALL_64_after_hwframe+0x49/0xbe
Honza
--
Jan Kara
SUSE Labs, CR
000246 R12:
>
> [ 24.010839] R13: R14: 7ffdd66b1a58 R15:
>
> [ 24.011020]
> [ 24.011147] Allocated by task 0:
> [ 24.011209] (stack is not available)
> [ 24.011277]
> [ 24.011314] Freed by task 0:
> [ 24.011359] (stack is not available)
> [ 24.011413]
> [ 24.011457] The buggy address belongs to the object at 880067e82100
> [ 24.011457] which belongs to the cache kmalloc-16 of size 16
> [ 24.011662] The buggy address is located 0 bytes inside of
> [ 24.011662] 16-byte region [880067e82100, 880067e82110)
> [ 24.011839] The buggy address belongs to the page:
> [ 24.012064] page:ea00019fa080 count:1 mapcount:0
> mapping:88006c001b40 index:0x0
> [ 24.012318] flags: 0x1000100(slab)
> [ 24.012614] raw: 01000100 dead0100 dead0200
> 88006c001b40
> [ 24.012744] raw: 80800080 0001
>
> [ 24.012991] page dumped because: kasan: bad access detected
> [ 24.013105]
> [ 24.013162] Memory state around the buggy address:
> [ 24.013453] 880067e82000: fb fb fc fc 00 00 fc fc 00 00 fc fc 00 00
> fc fc
> [ 24.013581] 880067e82080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> fc fc
> [ 24.013700] >880067e82100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> fc fc
> [ 24.013851]^
> [ 24.013912] 880067e82180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> fc fc
> [ 24.014012] 880067e82200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc
> fc fc
> [ 24.014132]
> ==
> [ 24.014250] Disabling lock debugging due to kernel taint
> mount: mounting /dev/sda on /mnt failed: Invalid argument
> [ 24.027931] exe (1090) used greatest stack depth: 19824 bytes left
>
>
>
> BusyBox v1.27.2 (Ubuntu 1:1.27.2-2ubuntu3) built-in shell (ash)
> Enter 'help' for a list of built-in commands.
>
> /bin/sh: can't access tty; job control turned off
> / # [6n
--
Jan Kara
SUSE Labs, CR
t error codes back
and you don't know in advance which are interesting.
One solution for passing error codes we could use with vm_fault_t is a
scheme similar to ERR_PTR. So we can store full error code in vm_fault_t
and still have a plenty of space for the special VM_FAULT_ return codes...
With that scheme converting block_page_mkwrite() would be trivial.
Honza
--
Jan Kara
SUSE Labs, CR
On Thu 06-09-18 00:37:06, Souptick Joarder wrote:
> On Wed, Sep 5, 2018 at 2:25 PM Jan Kara wrote:
> >
> > On Wed 05-09-18 00:13:02, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit
On Wed 05-09-18 15:20:16, Souptick Joarder wrote:
> On Wed, Sep 5, 2018 at 2:25 PM Jan Kara wrote:
> >
> > On Wed 05-09-18 00:13:02, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit
now.
Souptick, can I ask you to run 'fstests' for at least common filesystems
like ext4, xfs, btrfs when you change generic filesystem code please? That
would catch a bug like this immediately. Thanks.
Honza
--
Jan Kara
SUSE Labs, CR
ode'
>
> Fixes: 073931017b49d ("posix_acl: Clear SGID bit when setting file
> permissions")
>
> Signed-off-by: Randy Dunlap
> Cc: Jan Kara
> Cc: Andreas Gruenbacher
> Cc: Alexander Viro
> Cc: linux-fsde...@vger.kernel.org
> Acked-by: Andreas Gru
;
>
> ---
> This bug is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkal...@googlegroups.com.
>
> syzbot will keep track of this bug report. See:
> https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with
> syzbot.
> syzbot can test patches for this bug, for details see:
> https://goo.gl/tpsmEJ#testing-patches
>
--
Jan Kara
SUSE Labs, CR
601 - 700 of 2969 matches
Mail list logo