Re: [PATCH] nfs lockd reclaimer: Convert to kthread API

2007-04-19 Thread Dave Hansen
On Thu, 2007-04-19 at 17:19 -0400, Trond Myklebust wrote: > > With pid namespaces all kernel threads will disappear so how do > > we cope with the problem when the sysadmin can not see the kernel > > threads? Do they actually always disappear, or do we keep them in the init_pid_namespace? -- Dave

controlling mmap()'d vs read/write() pages

2007-03-20 Thread Dave Hansen
On Sun, 2007-03-18 at 11:42 -0600, Eric W. Biederman wrote: > Dave Hansen <[EMAIL PROTECTED]> writes: > > To me, a process sitting there doing constant reads of 10 pages has the > > same overhead to the VM as a process sitting there with a 10 page file > > mmaped, and r

Re: [PATCH 1/7] Introduce the pagetable_operations and associated helper macros.

2007-03-20 Thread Dave Hansen
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote: > > +#define has_pt_op(vma, op) \ > + ((vma)->pagetable_ops && (vma)->pagetable_ops->op) > +#define pt_op(vma, call) \ > + ((vma)->pagetable_ops->call) Can you get rid of these macros? I think they make it a wee bit harder to read
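For illustration only: one way to drop the has_pt_op()/pt_op() macros is to open-code the check in small typed helpers. This sketch is hypothetical. The structures are trimmed to a single `fault` hook, the helper names are invented, and it is not the code that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vm_area_struct;

/* Hypothetical, trimmed-down version of the ops table from the patch. */
struct pagetable_operations_struct {
	int (*fault)(struct vm_area_struct *vma, unsigned long address);
};

struct vm_area_struct {
	const struct pagetable_operations_struct *pagetable_ops;
};

/* Open-coded replacement for has_pt_op(vma, fault). */
static inline bool vma_has_pt_fault(const struct vm_area_struct *vma)
{
	return vma->pagetable_ops && vma->pagetable_ops->fault;
}

/* Calls the special fault handler when one is installed, otherwise returns
 * 'fallback' (standing in for the normal fault path). */
static inline int vma_pt_fault(struct vm_area_struct *vma,
			       unsigned long address, int fallback)
{
	if (!vma_has_pt_fault(vma))
		return fallback;
	return vma->pagetable_ops->fault(vma, address);
}

/* Hypothetical handler used only to exercise the helpers. */
static int example_fault(struct vm_area_struct *vma, unsigned long address)
{
	(void)vma;
	(void)address;
	return 0;
}
```

The NULL test stays visible at the call site's level of abstraction, and the compiler type-checks the hook's arguments, which a token-pasting macro does not.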

Re: [PATCH 4/7] unmap_page_range for hugetlb

2007-03-20 Thread Dave Hansen
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote: > Signed-off-by: Adam Litke <[EMAIL PROTECTED]> > --- > > fs/hugetlbfs/inode.c|3 ++- > include/linux/hugetlb.h |4 ++-- > mm/hugetlb.c| 12 > mm/memory.c | 10 -- > 4 files changed,

Re: [PATCH 0/7] [RFC] hugetlb: pagetable_operations API (V2)

2007-03-20 Thread Dave Hansen
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote: > For the common case (vma->pagetable_ops == NULL), we do almost the > same thing as the current code: load and test. The third instruction > is different in that we jump for the common case instead of jumping in > the hugetlb case. I don't thi

Re: controlling mmap()'d vs read/write() pages

2007-03-23 Thread Dave Hansen
On Fri, 2007-03-23 at 04:12 -0600, Eric W. Biederman wrote: > Would any of them work on a system on which every filesystem was on > ramfs, and there was no swap? If not then they are not memory attacks > but I/O attacks. I truly understand your point here. But, I don't think this thought exercis

Re: 2.6.21-rc2-mm2 hang

2007-03-07 Thread Dave Hansen
I'm seeing weird hangs running ltp on 2.6.21-rc2-mm2. It manifests itself by the waitpid06 test in LTP hanging. This is very, very reproducible in about 5 seconds by adding '-s wait' to the ltp command line. I see 4 waitpid06 processes on my 4-way machine spinning in userspace. But, the weird pa

Re: 2.6.21-rc2-mm2 hang

2007-03-07 Thread Dave Hansen
On Wed, 2007-03-07 at 14:16 -0800, Siddha, Suresh B wrote: > On Wed, Mar 07, 2007 at 02:12:16PM -0800, Dave Hansen wrote: > > I'm seeing weird hangs running ltp on 2.6.21-rc2-mm2. It manifests > > itself by the waitpid06 test in LTP hanging. This is very, very > > repro

Re: [PATCH 0/2] resource control file system - aka containers on top of nsproxy!

2007-03-07 Thread Dave Hansen
On Wed, 2007-03-07 at 15:59 -0600, Serge E. Hallyn wrote: > Space saving was the only reason for nsproxy to exist. > > Now of course it also provides the teensiest reduction in # instructions > since every clone results in just one reference count inc for the > nsproxy rather than one for each nam

Re: [patch] add file position info to proc

2007-03-27 Thread Dave Hansen
On Sun, 2007-03-25 at 15:45 -0800, Andrew Morton wrote: > On Sat, 24 Mar 2007 23:04:09 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote: > > > This patch adds support for finding out the current file position, > > open flags and possibly other info in the future. > > > > These new entries are added

Re: [PATCH 1/4] x86_64: Switch to SPARSE_VIRTUAL

2007-04-02 Thread Dave Hansen
On Mon, 2007-04-02 at 08:54 -0700, Christoph Lameter wrote: > > BTW there is no guarantee the node size is a multiple of 128MB so > > you likely need to handle the overlap case. Otherwise we can > > get cache corruptions > > How does sparsemem handle that? It doesn't. :) In practice, this situ
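To make the "overlap case" concrete, the sketch below assumes 128MB sparsemem sections (2^27 bytes, the size mentioned in the quote) and checks whether a node boundary lands inside a section. The helper names are hypothetical; this only illustrates the arithmetic and says nothing about how sparsemem should handle the case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 128MB sections (2^27 bytes), the x86_64 size discussed here. */
#define SECTION_SIZE_BITS 27
#define SECTION_SIZE (UINT64_C(1) << SECTION_SIZE_BITS)

/* Which sparsemem section a physical address falls in. */
static inline uint64_t pa_to_section_nr(uint64_t phys_addr)
{
	return phys_addr >> SECTION_SIZE_BITS;
}

/* True when a boundary between two nodes is not section-aligned, so the
 * section containing it would hold pages from both nodes. */
static inline bool boundary_splits_section(uint64_t boundary)
{
	return (boundary & (SECTION_SIZE - 1)) != 0;
}
```

For example, a node ending at 4GB + 64MB leaves section 32 shared between that node and the next one.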

Re: [PATCH 1/4] x86_64: Switch to SPARSE_VIRTUAL

2007-04-02 Thread Dave Hansen
On Mon, 2007-04-02 at 08:37 -0700, Christoph Lameter wrote: > You want a benchmark to prove that the removal of memory references and > code improves performance? Yes, please. ;) I completely agree, it looks like it should be faster. The code certainly has potential benefits. But, to add this

Re: [PATCH 1/2] Generic Virtual Memmap suport for SPARSEMEM

2007-04-02 Thread Dave Hansen
First of all, nice set of patches. On Sat, 2007-03-31 at 23:10 -0800, Christoph Lameter wrote: > --- linux-2.6.21-rc5-mm2.orig/include/asm-generic/memory_model.h > 2007-03-31 22:47:14.0 -0700 > +++ linux-2.6.21-rc5-mm2/include/asm-generic/memory_model.h 2007-03-31 > 22:59:35.0

Re: [PATCH 1/4] x86_64: Switch to SPARSE_VIRTUAL

2007-04-02 Thread Dave Hansen
On Mon, 2007-04-02 at 13:30 -0700, Christoph Lameter wrote: > On Mon, 2 Apr 2007, Dave Hansen wrote: > > I completely agree, it looks like it should be faster. The code > > certainly has potential benefits. But, to add this neato, apparently > > more performant feature, we

Re: [PATCH 1/2] Generic Virtual Memmap suport for SPARSEMEM

2007-04-02 Thread Dave Hansen
On Mon, 2007-04-02 at 14:00 -0700, Christoph Lameter wrote: > On Mon, 2 Apr 2007, Dave Hansen wrote: > > > + } else > > > + return __alloc_bootmem_node(NODE_DATA(node), size, size, > > > + __pa(MAX_DMA_ADDRESS)); > > >

Re: [PATCH 1/2] Generic Virtual Memmap suport for SPARSEMEM

2007-04-02 Thread Dave Hansen
On Mon, 2007-04-02 at 14:31 -0700, Christoph Lameter wrote: > On Mon, 2 Apr 2007, Dave Hansen wrote: > > > > > Hmmm. Can we combine this with sparse_index_alloc()? Also, why not > > > > just use the slab for this? > > > > > > Use a slab for p

Re: [PATCH 1/4] x86_64: Switch to SPARSE_VIRTUAL

2007-04-02 Thread Dave Hansen
On Mon, 2007-04-02 at 14:28 -0700, Christoph Lameter wrote: > I do not care what its called as long as it > covers all the bases and is not a glaring performance regresssion (like > SPARSEMEM so far). I honestly don't doubt that there are regressions, somewhere. Could you elaborate, and perhap

Re: [PATCH 1/2] Generic Virtual Memmap suport for SPARSEMEM

2007-04-02 Thread Dave Hansen
On Mon, 2007-04-02 at 14:53 -0700, Christoph Lameter wrote: > > > Well think about how to handle the case that the allocatiopn of a page > > > table page or a vmemmap block fails. Once we have that sorted out then we > > > can cleanup the higher layers. > > > > I think it is best to just complet

[PATCH] x86, fpu: do not BUG_ON() in early FPU code

2016-07-20 Thread Dave Hansen
From: Dave Hansen I don't think it is really possible to have a system where CPUID enumerates support for XSAVE but that it does not have FP/SSE (they are "legacy" features and always present). But, I did manage to hit this case in qemu when I enabled its somewhat shaky XSAV

[PATCH] [v2] x86, fpu: do not BUG_ON() in early FPU code

2016-07-20 Thread Dave Hansen
From: Dave Hansen I don't think it is really possible to have a system where CPUID enumerates support for XSAVE but that it does not have FP/SSE (they are "legacy" features and always present). But, I did manage to hit this case in qemu when I enabled its somewhat shaky XSAV

Re: [PATCH] make __section_nr more efficient

2016-07-20 Thread Dave Hansen
On 07/19/2016 09:18 PM, Zhou Chengming wrote: > When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr can get > the section number with a subtraction directly. Does this actually *do* anything? It was a long time ago, but if I remember correctly, the entire loop in __section_nr() goes away beca
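A userspace model of the two approaches, assuming the non-EXTREME layout in which the section table is one statically allocated, contiguous two-dimensional array. The sizes are hypothetical, and the search version is simplified to compare element addresses rather than root ranges. When the array is contiguous, the section number is simply the element offset from the start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes; in the kernel these come from the configuration. */
#define NR_SECTION_ROOTS 4
#define SECTIONS_PER_ROOT 8

struct mem_section {
	unsigned long section_mem_map;
};

/* Without SPARSEMEM_EXTREME, the table is one static, contiguous array. */
static struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT];

/* Search-style lookup: walk the roots until the section is found. */
static long section_nr_search(const struct mem_section *ms)
{
	for (long root = 0; root < NR_SECTION_ROOTS; root++)
		for (long i = 0; i < SECTIONS_PER_ROOT; i++)
			if (&mem_section[root][i] == ms)
				return root * SECTIONS_PER_ROOT + i;
	return -1;
}

/* Subtraction-style lookup: valid only because the array is contiguous. */
static long section_nr_subtract(const struct mem_section *ms)
{
	return (long)(((const char *)ms - (const char *)mem_section) /
		      (ptrdiff_t)sizeof(*ms));
}
```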

Re: [PATCH] make __section_nr more efficient

2016-07-21 Thread Dave Hansen
On 07/20/2016 06:55 PM, zhouchengming wrote: > Thanks for your reply. I don't know the compiler will optimize the loop. > But when I see the assembly code of __section_nr, it seems to still have > the loop in it. Oh, well. I guess it got broken in the last decade or so. Your patch looks good to

Re: Minor PKRU bug?

2016-07-21 Thread Dave Hansen
On 07/12/2016 03:59 PM, Andy Lutomirski wrote: > On Tue, Jul 12, 2016 at 3:55 PM, H. Peter Anvin wrote: >> On 07/12/16 08:32, Dave Hansen wrote: >>> On 07/09/2016 02:27 PM, Andy Lutomirski wrote: >>>> is_prefetch in arch/x86/mm/fault.c can be called on a user addres

Re: Minor PKRU bug?

2016-07-21 Thread Dave Hansen
On 07/21/2016 02:48 PM, H. Peter Anvin wrote: >> >I like it, except that reading just a single byte is a bit silly. >> >OTOH, that's what the current code needs and I see no fundamental >> >reason to change it until there's a real user. >>> > The thing is that we can't actually test this, since th

[RFC][PATCH 2/2] x86, pkeys: allow configuration of init_pkru

2016-07-22 Thread Dave Hansen
As discussed in the previous patch, there is a reliability benefit to allowing an init value for the Protection Keys Rights User register (PKRU) which differs from what the XSAVE hardware provides. But, having PKRU be 0 (its init value) provides some nonzero amount of optimization potential to th

[RFC][PATCH 1/2] x86, pkeys: default to a restrictive init PKRU

2016-07-22 Thread Dave Hansen
Andy Lutomirski brought this up as a potential issue. It's straightforward to fix, but has potential performance implications. This applies on top of the previous pkeys syscall code that I posted, but I think we should probably discuss these on their own and not as a part of the larger series.

[PATCH 2/3] x86: add some better documentation for probe_kernel_address()

2016-07-22 Thread Dave Hansen
From: Dave Hansen probe_kernel_address() has an unfortunate name since it is used to probe kernel *and* userspace addresses. Add a comment explaining some of the situation to help the next developer who might make the silly assumption that it is for probing kernel addresses. Signed-off-by

[PATCH 1/3] x86, tracing: fix x86 exceptions trace header

2016-07-22 Thread Dave Hansen
From: Dave Hansen The various tracing headers pass some variables into the tracing code itself to indicate things like the name of the tracing directory where the tracepoints should be located in debugfs. The general pattern is to #undef them before redefining them. But, if all instances don't

[PATCH 0/3] x86, pkeys: fix prefetch/pkeys interaction

2016-07-22 Thread Dave Hansen
The first two patches here are useful in any case, I think. But, as for the third: There are no known prefetch errata on processors that support memory protection keys. There have not been any that I can find in any recent generations, either. But, if there were a future erratum, we would need

[PATCH 3/3] x86, pkeys: allow instruction fetches in presence of pkeys

2016-07-22 Thread Dave Hansen
From: Dave Hansen Thanks to Andy Lutomirski for pointing out the potential issue here. Memory protection keys only affect data access. They do not affect instruction fetches. So, an instruction may not be readable, while it *is* executable. The fault prefetch checking code directly reads

Re: [PATCH 2/3] x86: add some better documentation for probe_kernel_address()

2016-07-22 Thread Dave Hansen
On 07/22/2016 11:10 AM, Andy Lutomirski wrote: > On Jul 22, 2016 11:03 AM, "Dave Hansen" wrote: >> From: Dave Hansen >> >> probe_kernel_address() has an unfortunate name since it is used >> to probe kernel *and* userspace addresses. Add a comment >> e

Re: [PATCH] memory-hotplug: Fix bad area access on dissolve_free_huge_pages()

2016-09-20 Thread Dave Hansen
On 09/20/2016 07:45 AM, Rui Teng wrote: > On 9/17/16 12:25 AM, Dave Hansen wrote: >> >> That's an interesting data point, but it still doesn't quite explain >> what is going on. >> >> It seems like there might be parts of gigantic pages that have >>

Re: [PATCH] memory-hotplug: Fix bad area access on dissolve_free_huge_pages()

2016-09-20 Thread Dave Hansen
On 09/20/2016 08:52 AM, Rui Teng wrote: > On 9/20/16 10:53 PM, Dave Hansen wrote: ... >> That's good, but aren't we still left with a situation where we've >> offlined and dissolved the _middle_ of a gigantic huge page while the >> head page is still in pl

Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size

2016-09-20 Thread Dave Hansen
On 09/20/2016 10:37 AM, Mike Kravetz wrote: > > Their approach (I believe) would be to fail the offline operation in > this case. However, I could argue that failing the operation, or > dissolving the unused huge page containing the area to be offlined is > the right thing to do. I think the rig

Re: [PATCH] memory-hotplug: Fix bad area access on dissolve_free_huge_pages()

2016-09-21 Thread Dave Hansen
On 09/21/2016 05:05 AM, Michal Hocko wrote: > On Tue 20-09-16 10:43:13, Dave Hansen wrote: >> On 09/20/2016 08:52 AM, Rui Teng wrote: >>> On 9/20/16 10:53 PM, Dave Hansen wrote: >> ... >>>> That's good, but aren't we still left with a situation where

Re: [PATCH] memory-hotplug: Fix bad area access on dissolve_free_huge_pages()

2016-09-21 Thread Dave Hansen
On 09/21/2016 09:27 AM, Michal Hocko wrote: > That was not my point. I wasn't very clear probably. Offlining can fail > which shouldn't be really surprising. There might be a kernel allocation > in the particular block which cannot be migrated so failures are to be > expected. I just do not see how

Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size

2016-09-21 Thread Dave Hansen
On 09/21/2016 11:20 AM, Michal Hocko wrote: > I would even question the per page block offlining itself. Why would > anybody want to offline few blocks rather than the whole node? What is > the usecase here? The original reason was so that you could remove a DIMM or a riser card full of DIMMs, whi

Re: [PATCH v3] mm/hugetlb: fix memory offline with hugepage size > memory block size

2016-09-22 Thread Dave Hansen
On 09/22/2016 09:29 AM, Gerald Schaefer wrote: > static void dissolve_free_huge_page(struct page *page) > { > + struct page *head = compound_head(page); > + struct hstate *h = page_hstate(head); > + int nid = page_to_nid(head); > + > spin_lock(&hugetlb_lock); > - if (PageHug

Re: [PATCH v3 2/2] Documentation/filesystems/proc.txt: Add more description for maps/smaps

2016-09-23 Thread Dave Hansen
On 09/23/2016 06:12 AM, Robert Ho wrote: > +Note: for both /proc/PID/maps and /proc/PID/smaps readings, it's > +possible in race conditions, that the mappings printed may not be that > +up-to-date, because during each read walking, the task's mappings may have > +changed, this typically happens in

Re: [PATCH] mm: warn about allocations which stall for too long

2016-09-23 Thread Dave Hansen
On 09/23/2016 01:15 AM, Michal Hocko wrote: > + /* Make sure we know about allocations which stall for too long */ > + if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + > stall_timeout)) { > + pr_warn("%s: page alloction stalls for %ums: order:%u > mode:%#x(%
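The stall check in the quoted patch relies on time_after(), which stays correct when jiffies wraps. Below is a simplified copy of the comparison without the kernel's typecheck() wrapper. `alloc_stalled()` is a hypothetical helper that mirrors the quoted condition.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Simplified time_after(): true if a is after b, even across wraparound. */
static inline bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Mirrors the quoted check: has the allocation been running for longer
 * than stall_timeout jiffies? */
static inline bool alloc_stalled(unsigned long now, unsigned long alloc_start,
				 unsigned long stall_timeout)
{
	return time_after_ul(now, alloc_start + stall_timeout);
}
```

A naive `now > alloc_start + stall_timeout` gets the wrapped case wrong. With `alloc_start = ULONG_MAX - 10` and a timeout of 100, the deadline wraps to 89, so a naive check would report a stall at `now = ULONG_MAX - 5` even though only 5 jiffies have passed.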

Re: [RFC PATCH] mm/hugetlb: Avoid soft lockup in set_max_huge_pages()

2016-07-27 Thread Dave Hansen
On 07/26/2016 06:39 PM, hejianet wrote: >>> >> and you choose to patch both of the alloc_*() functions. Why not just >> fix it at the common call site? Seems like that >> spin_lock(&hugetlb_lock) could be a cond_resched_lock() which would fix >> both cases. > I agree to move the cond_resched() to

Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process

2016-07-27 Thread Dave Hansen
On 07/26/2016 06:23 PM, Liang Li wrote: > + vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT; > + vb->pfn_limit = min(vb->pfn_limit, get_max_pfn()); > + vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) / > + BITS_PER_BYTE + 2 * sizeof(unsigned long); > + hdr_len = sizeof(str
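The quoted bitmap sizing works out to one bit per pfn, rounded up to whole longs, plus two longs of header. This sketch reproduces that arithmetic. `limit` stands in for VIRTIO_BALLOON_PFNS_LIMIT, whose value is not shown in the excerpt, and `ALIGN_UP` is a local stand-in for the kernel's ALIGN().

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_BYTE CHAR_BIT
#define BITS_PER_LONG (sizeof(unsigned long) * BITS_PER_BYTE)
/* Round x up to a multiple of a (a must be a power of two), like ALIGN(). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Bytes for the balloon bitmap as computed in the quoted patch. */
static size_t balloon_bmap_len(unsigned long limit, unsigned long max_pfn)
{
	unsigned long pfn_limit = limit < max_pfn ? limit : max_pfn;

	return ALIGN_UP(pfn_limit, BITS_PER_LONG) / BITS_PER_BYTE
		+ 2 * sizeof(unsigned long);
}
```

For instance, a 1000-pfn limit rounds up to 1024 bits, which is 128 bytes of bitmap plus 16 bytes of header on a 64-bit kernel.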

Re: [PATCH v2 repost 6/7] mm: add the related functions to get free page info

2016-07-27 Thread Dave Hansen
On 07/26/2016 06:23 PM, Liang Li wrote: > + for_each_migratetype_order(order, t) { > + list_for_each(curr, &zone->free_area[order].free_list[t]) { > + pfn = page_to_pfn(list_entry(curr, struct page, lru)); > + if (pfn >= start_pfn && pfn <= en

Re: [PATCH 1/2] mm: page_alloc.c: Add tracepoints for slowpath

2016-07-27 Thread Dave Hansen
On 07/27/2016 08:23 AM, Steven Rostedt wrote: >> > + >> > + trace_mm_slowpath_end(page); >> > + > I'm thinking you only need one tracepoint, and use function_graph > tracer for the length of the function call. > > # cd /sys/kernel/debug/tracing > # echo __alloc_pages_nodemask > set_ftrace_filte

Re: [PATCH v2 repost 6/7] mm: add the related functions to get free page info

2016-07-27 Thread Dave Hansen
On 07/27/2016 03:05 PM, Michael S. Tsirkin wrote: > On Wed, Jul 27, 2016 at 09:40:56AM -0700, Dave Hansen wrote: >> On 07/26/2016 06:23 PM, Liang Li wrote: >>> + for_each_migratetype_order(order, t) { >>> + list_for_each(curr, &zon

Re: [PATCH v2 repost 3/7] mm: add a function to get the max pfn

2016-07-27 Thread Dave Hansen
On 07/27/2016 03:08 PM, Michael S. Tsirkin wrote: >> > +unsigned long get_max_pfn(void) >> > +{ >> > + return max_pfn; >> > +} >> > +EXPORT_SYMBOL(get_max_pfn); >> > + > > This needs a coment that this can change at any time. > So it's only good as a hint e.g. for sizing data structures. Or, if

[PATCH] x86, pkeys: remove protection keys' XSAVE buffer manipulation

2016-07-27 Thread Dave Hansen
From: Dave Hansen The Memory Protection Keys "rights register" (PKRU) is XSAVE-managed, and is saved/restored along with the FPU state. When kernel code accesses FPU registers, it does a delicate dance with preempt. Otherwise, the context switching code can get confused as to w

Re: [PATCH V2] mm/hugetlb: Avoid soft lockup in set_max_huge_pages()

2016-07-28 Thread Dave Hansen
Looks fine to me. Acked-by: Dave Hansen

Re: [PATCH 0/3] new feature: monitoring page cache events

2016-07-28 Thread Dave Hansen
On 07/25/2016 08:47 PM, George Amvrosiadis wrote: > 21 files changed, 2424 insertions(+), 1 deletion(-) I like the idea, but yikes, that's a lot of code. Have you considered using or augmenting the kernel's existing tracing mechanisms? Have you considered using something like netlink for transp

Re: [PATCH 0/3] new feature: monitoring page cache events

2016-07-29 Thread Dave Hansen
On 07/28/2016 08:47 PM, George Amvrosiadis wrote: > On Thu, Jul 28, 2016 at 02:02:45PM -0700, Dave Hansen wrote: >> On 07/25/2016 08:47 PM, George Amvrosiadis wrote: >>> 21 files changed, 2424 insertions(+), 1 deletion(-) >> >> I like the idea, but yikes, that's

[PATCH 00/10] [v6] System Calls for Memory Protection Keys

2016-07-29 Thread Dave Hansen
qemu64,+pku,+xsave, and make sure to apply this patch[1] to qemu. === diffstat === Dave Hansen (10): x86, pkeys: add fault handling for PF_PK page fault bit mm: implement new pkey_mprotect() system call x86, pkeys: make mprotect_key() mask off additional vm_flags x86, pkeys

[PATCH 05/10] x86: wire up protection keys system calls

2016-07-29 Thread Dave Hansen
From: Dave Hansen This is all that we need to get the new system calls themselves working on x86. Signed-off-by: Dave Hansen Cc: linux-...@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: linux...@kvack.org Cc: x...@kernel.org Cc: torva...@linux-foundation.org Cc: a...@linux-foundation.org

[PATCH 01/10] x86, pkeys: add fault handling for PF_PK page fault bit

2016-07-29 Thread Dave Hansen
From: Dave Hansen PF_PK means that a memory access violated the protection key access restrictions. It is unconditionally an access_error() because the permissions set on the VMA don't matter (the PKRU value overrides it), and we never "resolve" PK faults (like how a COW can
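A simplified, hypothetical model of the rule described above. The page-fault error-code bit positions are the hardware-defined x86 ones (PF_PK is bit 5), but `pk_access_error()` is an invented stand-in for access_error(), reduced to the write check.

```c
#include <assert.h>
#include <stdbool.h>

/* x86 page-fault error code bits (hardware-defined). */
#define PF_PROT  (1u << 0)
#define PF_WRITE (1u << 1)
#define PF_USER  (1u << 2)
#define PF_RSVD  (1u << 3)
#define PF_INSTR (1u << 4)
#define PF_PK    (1u << 5)

/* A protection-key violation is an access error no matter what the VMA
 * permissions say: PKRU overrides them, and unlike a COW write fault
 * there is nothing the kernel can do to "resolve" it. */
static bool pk_access_error(unsigned int error_code, bool vma_writable)
{
	if (error_code & PF_PK)
		return true;
	if (error_code & PF_WRITE)
		return !vma_writable;
	return false;
}
```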

[PATCH 04/10] x86, pkeys: allocation/free syscalls

2016-07-29 Thread Dave Hansen
From: Dave Hansen This patch adds two new system calls: int pkey_alloc(unsigned long flags, unsigned long init_access_rights) int pkey_free(int pkey); These implement an "allocator" for the protection keys themselves, which can be thought of as analogous to the allo
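To illustrate the file-descriptor analogy, here is a hypothetical userspace model of a per-process key allocator. It uses a 16-bit map for the 16 x86 keys, keeps key 0 (the default key for all memory) allocated from the start, and hands out the lowest free key. The names are invented, and the model ignores the `init_access_rights` argument.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PKEYS 16

/* Hypothetical allocation map: bit N set means pkey N is in use. */
struct pkey_map {
	uint16_t allocated;
};

/* Key 0 is the default key for all memory, so it starts out allocated. */
static void pkey_map_init(struct pkey_map *m)
{
	m->allocated = 0x1;
}

/* Like picking the lowest free file descriptor: return the lowest free
 * key, or -1 when all keys are in use. */
static int pkey_map_alloc(struct pkey_map *m)
{
	for (int pkey = 0; pkey < NR_PKEYS; pkey++) {
		if (!(m->allocated & (1u << pkey))) {
			m->allocated |= (uint16_t)(1u << pkey);
			return pkey;
		}
	}
	return -1;
}

/* Returns 0 on success, -1 if the key is out of range or not allocated. */
static int pkey_map_free(struct pkey_map *m, int pkey)
{
	if (pkey < 0 || pkey >= NR_PKEYS || !(m->allocated & (1u << pkey)))
		return -1;
	m->allocated &= (uint16_t)~(1u << pkey);
	return 0;
}
```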

[PATCH 09/10] x86, pkeys: allow configuration of init_pkru

2016-07-29 Thread Dave Hansen
From: Dave Hansen As discussed in the previous patch, there is a reliability benefit to allowing an init value for the Protection Keys Rights User register (PKRU) which differs from what the XSAVE hardware provides. But, having PKRU be 0 (its init value) provides some nonzero amount of

[PATCH 07/10] pkeys: add details of system call use to Documentation/

2016-07-29 Thread Dave Hansen
From: Dave Hansen This spells out all of the pkey-related system calls that we have and provides some example code fragments to demonstrate how we expect them to be used. Signed-off-by: Dave Hansen Cc: linux-...@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: linux...@kvack.org Cc: x

[PATCH 08/10] x86, pkeys: default to a restrictive init PKRU

2016-07-29 Thread Dave Hansen
From: Dave Hansen PKRU is the register that lets you disallow writes or all access to a given protection key. The XSAVE hardware defines an "init state" of 0 for PKRU: its most permissive state, allowing access/writes to everything. Since we start off all new processes with the init
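A small model of the PKRU layout: two bits per key, with Access Disable at bit 2*pkey and Write Disable at bit 2*pkey+1. An all-zero PKRU (the XSAVE init state) permits everything. A restrictive value that denies access to keys 1-15 and leaves key 0 open works out to 0x55555554. The helper names are hypothetical, and the kernel's final default may differ.

```c
#include <assert.h>
#include <stdint.h>

/* PKRU holds two bits per protection key; x86 has 16 keys. */
#define NR_PKEYS 16
#define PKRU_AD_BIT 0x1u
#define PKRU_WD_BIT 0x2u
#define PKRU_BITS_PER_PKEY 2

static inline uint32_t pkru_ad_key(int pkey)
{
	return PKRU_AD_BIT << (pkey * PKRU_BITS_PER_PKEY);
}

static inline uint32_t pkru_wd_key(int pkey)
{
	return PKRU_WD_BIT << (pkey * PKRU_BITS_PER_PKEY);
}

/* Deny all access for every key except key 0, which covers memory that
 * was never assigned a key. */
static uint32_t restrictive_init_pkru(void)
{
	uint32_t pkru = 0;

	for (int pkey = 1; pkey < NR_PKEYS; pkey++)
		pkru |= pkru_ad_key(pkey);
	return pkru;
}

/* Writes need both the AD and WD bits clear for the key. */
static inline int pkru_allows_write(uint32_t pkru, int pkey)
{
	return !(pkru & (pkru_ad_key(pkey) | pkru_wd_key(pkey)));
}
```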

[PATCH 10/10] x86, pkeys: add self-tests

2016-07-29 Thread Dave Hansen
From: Dave Hansen This code should be a good demonstration of how to use the new system calls as well as how to use protection keys in general. This code shows how to: 1. Manipulate the Protection Keys Rights User (PKRU) register 2. Set a protection key on memory 3. Fetch and/or modify PKRU

[PATCH 02/10] mm: implement new pkey_mprotect() system call

2016-07-29 Thread Dave Hansen
From: Dave Hansen pkey_mprotect() is just like mprotect, except it also takes a protection key as an argument. On systems that do not support protection keys, it still works, but requires that key=0. Otherwise it does exactly what mprotect does. I expect it to get used like this, if you want

[PATCH 06/10] generic syscalls: wire up memory protection keys syscalls

2016-07-29 Thread Dave Hansen
From: Dave Hansen These new syscalls are implemented as generic code, so enable them for architectures like arm64 which use the generic syscall table. According to Arnd: Even if the support is x86 specific for the forseeable future, it may be good to reserve the number just in

[PATCH 03/10] x86, pkeys: make mprotect_key() mask off additional vm_flags

2016-07-29 Thread Dave Hansen
From: Dave Hansen Today, mprotect() takes 4 bits of data: PROT_READ/WRITE/EXEC/NONE. Three of those bits: READ/WRITE/EXEC get translated directly into vma->vm_flags by calc_vm_prot_bits(). If a bit is unset in mprotect()'s 'prot' argument then it must be cleared in vma->
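Why the additional masking matters: if only the READ/WRITE/EXEC bits are cleared, the old key's bits survive and get OR'd together with the new key. The flag layout below is assumed for illustration. The VM_READ/WRITE/EXEC values follow the classic ones, and four pkey bits at bits 32-35 are modeled on x86. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumed flag layout, for illustration only. */
#define VM_READ  UINT64_C(0x1)
#define VM_WRITE UINT64_C(0x2)
#define VM_EXEC  UINT64_C(0x4)
#define VM_PKEY_SHIFT 32
#define VM_PKEY_MASK (UINT64_C(0xf) << VM_PKEY_SHIFT)

/* Translate PROT_* bits and a key into vm_flags bits. */
static uint64_t calc_prot_bits(int prot, int pkey)
{
	uint64_t flags = 0;

	if (prot & PROT_READ)
		flags |= VM_READ;
	if (prot & PROT_WRITE)
		flags |= VM_WRITE;
	if (prot & PROT_EXEC)
		flags |= VM_EXEC;
	flags |= ((uint64_t)pkey << VM_PKEY_SHIFT) & VM_PKEY_MASK;
	return flags;
}

/* Clear every bit mprotect() controls, including the old key, before
 * OR-ing in the new ones; other flags are left untouched. */
static uint64_t apply_mprotect(uint64_t old_flags, int prot, int pkey)
{
	const uint64_t mask = VM_READ | VM_WRITE | VM_EXEC | VM_PKEY_MASK;

	return (old_flags & ~mask) | calc_prot_bits(prot, pkey);
}

static int vm_flags_pkey(uint64_t flags)
{
	return (int)((flags & VM_PKEY_MASK) >> VM_PKEY_SHIFT);
}
```

For example, moving a VMA from key 3 to key 4 without masking the key bits would leave key 7 (3 | 4) in vm_flags.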

Re: [PATCH 08/10] x86, pkeys: default to a restrictive init PKRU

2016-07-29 Thread Dave Hansen
On 07/29/2016 10:29 AM, Andy Lutomirski wrote: >> > In the end, this ensures that threads which do not know how to >> > manage their own pkey rights can not do damage to data which is >> > pkey-protected. > I think you missed the fpu__clear() caller in kernel/fpu/signal.c. > > ISTM it might be mor

Re: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up inflate/deflate process

2016-07-29 Thread Dave Hansen
On 07/28/2016 02:51 PM, Michael S. Tsirkin wrote: >> > If 1MB is too big, how about 512K, or 256K? 32K seems too small. >> > > It's only small because it makes you rescan the free list. > So maybe you should do something else. > I looked at it a bit. Instead of scanning the free list, how about >

Re: [PATCH 0/3] new feature: monitoring page cache events

2016-08-01 Thread Dave Hansen
On 07/30/2016 10:31 AM, George Amvrosiadis wrote: > Dave, I can produce a patch that adds the extra two tracepoints and exports > all four tracepoint symbols. This would be a short patch that would just > extend existing tracing functionality. What do you think? Adding those tracepoints is probabl

Re: [PATCH 08/10] x86, pkeys: default to a restrictive init PKRU

2016-08-01 Thread Dave Hansen
On 08/01/2016 07:42 AM, Vlastimil Babka wrote: > On 07/29/2016 06:30 PM, Dave Hansen wrote: >> This does not cause any practical problems with applications >> using protection keys because we require them to specify initial >> permissions for each key when it is allocate

Re: [PATCH V3] mm: Add sysfs interface to dump each node's zonelist information

2016-09-06 Thread Dave Hansen
On 09/06/2016 01:31 AM, Anshuman Khandual wrote: > [NODE (0)] > ZONELIST_FALLBACK > (0) (node 0) (zone DMA c140c000) > (1) (node 1) (zone DMA c001) > (2) (node 2) (zone DMA c002) > (3) (node 3) (zone DMA c003) >

Re: [PATCH] Fix region lost in /proc/self/smaps

2016-09-07 Thread Dave Hansen
On 09/06/2016 11:51 PM, Xiao Guangrong wrote: > In order to fix this bug, we make 'file->version' indicate the next VMA > we want to handle This new approach makes it more likely that we'll skip a new VMA that gets inserted in between the read()s. But, I guess that's OK. We don't exactly claim t

Re: [PATCH] Fix region lost in /proc/self/smaps

2016-09-08 Thread Dave Hansen
On 09/07/2016 08:36 PM, Xiao Guangrong wrote:>> The user will see two VMAs in their output: >> >> A: 0x1000->0x2000 >> C: 0x1000->0x3000 >> >> Will it confuse them to see the same virtual address range twice? Or is >> there something preventing that happening that I'm missing? >> > > You
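A hypothetical model of the scenario in the example. It assumes a partial read resumes at the first VMA ending after the last address already printed, which may not match the patch's exact bookkeeping. If A (0x1000-0x2000) was printed and A then merges with its neighbor into C (0x1000-0x3000), resuming finds C, so 0x1000-0x2000 is reported twice.

```c
#include <assert.h>

/* Hypothetical stand-in for a VMA's address range. */
struct vma_range {
	unsigned long start;
	unsigned long end;
};

/* Resume a partial read: index of the first VMA (sorted by address) that
 * ends after 'last_end', the end of the last VMA already printed, or -1. */
static int resume_index(const struct vma_range *vmas, int n,
			unsigned long last_end)
{
	for (int i = 0; i < n; i++)
		if (vmas[i].end > last_end)
			return i;
	return -1;
}
```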

Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information

2016-09-08 Thread Dave Hansen
On 09/07/2016 07:46 PM, Anshuman Khandual wrote: > after memory or node hot[un]plug is desirable. This change adds one > new sysfs interface (/sys/devices/system/memory/system_zone_details) > which will fetch and dump this information. Doesn't this violate the "one value per file" sysfs rule? Doe

Re: [PATCH v2] mm, proc: Fix region lost in /proc/self/smaps

2016-09-13 Thread Dave Hansen
On 09/13/2016 07:59 AM, Oleg Nesterov wrote: > On 09/12, Michal Hocko wrote: >> > Considering how this all can be tricky and how partial reads can be >> > confusing and even misleading I am really wondering whether we >> > should simply document that only full reads will provide a sensible >> > res

Re: [PATCH] memory-hotplug: Fix bad area access on dissolve_free_huge_pages()

2016-09-13 Thread Dave Hansen
On 09/13/2016 01:39 AM, Rui Teng wrote: > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 87e11d8..64b5f81 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1442,7 +1442,7 @@ static int free_pool_huge_page(struct hstate *h, > nodemask_t *nodes_allowed, > static void dissolve_free_huge_page(

Re: [PATCH v2 07/33] x86/intel_rdt: Add support for Cache Allocation detection

2016-09-13 Thread Dave Hansen
On 09/08/2016 02:57 AM, Fenghua Yu wrote: > --- a/arch/x86/include/asm/disabled-features.h > +++ b/arch/x86/include/asm/disabled-features.h > @@ -57,6 +57,7 @@ > #define DISABLED_MASK15 0 > #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE) > #define DISABLED_MASK17 0 > -#define

Re: [PATCH v2 22/33] x86/intel_rdt.c: Extend RDT to per cache and per resources

2016-09-13 Thread Dave Hansen
On 09/08/2016 02:57 AM, Fenghua Yu wrote: > +static int __init rdt_setup(char *str) > +{ > + char *tok; > + > + while ((tok = strsep(&str, ",")) != NULL) { > + if (!*tok) > + return -EINVAL; > + > + if (strcmp(tok, "simulate_cat_l3") == 0) { > +
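A userspace sketch of the strsep()-based parsing pattern in the quoted rdt_setup(). Only "simulate_cat_l3" comes from the excerpt. The `rdt_opts` struct and the choice to reject unknown tokens are assumptions, because the rest of the function is cut off.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical option holder; only simulate_cat_l3 appears in the patch. */
struct rdt_opts {
	bool simulate_cat_l3;
};

/* Parse a comma-separated option string in place. Returns 0 on success,
 * -1 on an empty token (e.g. "a,,b") or, in this sketch, an unknown one. */
static int parse_rdt_opts(char *str, struct rdt_opts *opts)
{
	char *tok;

	while ((tok = strsep(&str, ",")) != NULL) {
		if (!*tok)
			return -1;
		if (strcmp(tok, "simulate_cat_l3") == 0)
			opts->simulate_cat_l3 = true;
		else
			return -1;
	}
	return 0;
}
```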

Re: [PATCH v2 07/33] x86/intel_rdt: Add support for Cache Allocation detection

2016-09-13 Thread Dave Hansen
On 09/13/2016 03:52 PM, Luck, Tony wrote: > On Tue, Sep 13, 2016 at 03:40:18PM -0700, Dave Hansen wrote: >> Are you sure you don't want to add RDT to disabled-features.h? You have >> a config option for it, so it seems like you should also be able to >> optimize some of

Re: [PATCH v2 26/33] Task fork and exit for rdtgroup

2016-09-13 Thread Dave Hansen
On 09/08/2016 02:57 AM, Fenghua Yu wrote: > +void rdtgroup_fork(struct task_struct *child) > +{ > + struct rdtgroup *rdtgrp; > + > + INIT_LIST_HEAD(&child->rg_list); > + if (!rdtgroup_mounted) > + return; > + > + mutex_lock(&rdtgroup_mutex); > + > + rdtgrp = current-

Re: [PATCH v2 26/33] Task fork and exit for rdtgroup

2016-09-14 Thread Dave Hansen
On 09/13/2016 04:35 PM, Luck, Tony wrote: > On Tue, Sep 13, 2016 at 04:13:04PM -0700, Dave Hansen wrote: >> Yikes, is this a new global lock and possible atomic_inc() on a shared >> variable in the fork() path? Has there been any performance or >> scalability testing done on

Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache

2016-09-14 Thread Dave Hansen
On 09/14/2016 12:19 AM, Juerg Haefliger wrote: > Allocating a page to userspace that was previously allocated to the > kernel requires an expensive TLB shootdown. To minimize this, we only > put non-kernel pages into the hot cache to favor their allocation. Hi, I had some questions about this the

Re: [kernel-hardening] [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache

2016-09-14 Thread Dave Hansen
> On 09/02/2016 10:39 PM, Dave Hansen wrote: >> On 09/02/2016 04:39 AM, Juerg Haefliger wrote: >> Does this >> just mean that kernel allocations usually have to pay the penalty to >> convert a page? > > Only pages that are allocated for userspace (gfp & GFP_HI

Re: [PATCH] memory-hotplug: Fix bad area access on dissolve_free_huge_pages()

2016-09-14 Thread Dave Hansen
On 09/14/2016 09:33 AM, Rui Teng wrote: > > How about return the size of page freed from dissolve_free_huge_page(), > and jump such step on pfn? That would be a nice improvement. But, as far as describing the initial problem, can you explain how the tail pages still ended up being PageHuge()? S

Re: [PATCH v2 1/3] syscalls,x86 Expose arch_prctl on x86-32.

2016-09-14 Thread Dave Hansen
On 09/14/2016 02:01 PM, Kyle Huey wrote: > Signed-off-by: Kyle Huey > --- > arch/x86/entry/syscalls/syscall_32.tbl | 1 + > arch/x86/kernel/process.c | 80 > ++ > arch/x86/kernel/process_64.c | 66 > 3 files cha

Re: [PATCH v2 2/3] x86 Test and expose CPUID faulting capabilities in /proc/cpuinfo

2016-09-14 Thread Dave Hansen
On 09/14/2016 02:01 PM, Kyle Huey wrote: > Xen advertises the underlying support for CPUID faulting but not does pass > through writes to the relevant MSR, nor does it virtualize it, so it does > not actually work. For now mask off the relevant bit on MSR_PLATFORM_INFO. That needs to make it into

Re: [PATCH v2 1/3] syscalls,x86 Expose arch_prctl on x86-32.

2016-09-14 Thread Dave Hansen
On 09/14/2016 02:35 PM, Kyle Huey wrote: > It's not quite a plain move. To leave the existing arch_prctls only > accessible to 64 bit callers, I added the is_32 bit and the four early > returns for each existing ARCH_BLAH. These cases are now > conditionally compiled out in a 32 bit kernel, so we

Re: [RFC PATCH v2 2/3] xpfo: Only put previous userspace pages into the hot cache

2016-09-02 Thread Dave Hansen
On 09/02/2016 04:39 AM, Juerg Haefliger wrote: > Allocating a page to userspace that was previously allocated to the > kernel requires an expensive TLB shootdown. To minimize this, we only > put non-kernel pages into the hot cache to favor their allocation. But kernel allocations do allocate from

Re: [PATCH v3 kernel 0/7] Extend virtio-balloon for fast (de)inflating & fast live migration

2016-08-08 Thread Dave Hansen
On 08/07/2016 11:35 PM, Liang Li wrote: > Dave Hansen suggested a new scheme to encode the data structure, > because of additional complexity, it's not implemented in v3. FWIW, I don't think it takes any additional complexity here, at least in the guest implementation side. The t

[PATCH 09/10] x86, pkeys: allow configuration of init_pkru

2016-08-08 Thread Dave Hansen
From: Dave Hansen As discussed in the previous patch, there is a reliability benefit to allowing an init value for the Protection Keys Rights User register (PKRU) which differs from what the XSAVE hardware provides. But, having PKRU be 0 (its init value) provides some nonzero amount of

[PATCH 00/10] [v6] System Calls for Memory Protection Keys

2016-08-08 Thread Dave Hansen
he series and integrated into kselftests. Folks wishing to run this code can do so with the new PKU support in qemu >=2.6. Just boot with -cpu qemu64,+pku,+xsave, and make sure to apply this patch[1] to qemu. === diffstat === Dave Hansen (10): x86, pkeys: add fault handling for PF_PK page

[PATCH 03/10] x86, pkeys: make mprotect_key() mask off additional vm_flags

2016-08-08 Thread Dave Hansen
From: Dave Hansen Today, mprotect() takes 4 bits of data: PROT_READ/WRITE/EXEC/NONE. Three of those bits: READ/WRITE/EXEC get translated directly into vma->vm_flags by calc_vm_prot_bits(). If a bit is unset in mprotect()'s 'prot' argument then it must be cleared in vma->
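To make the "clear what is unset" rule concrete, here is a small userspace model of the flag update. It is a sketch, not the kernel code: the VM_READ/VM_WRITE/VM_EXEC values mirror include/linux/mm.h, but storing the key in bits 32..35 is an assumed layout chosen for illustration, and prot_to_vm_flags()/apply_mprotect() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>   /* PROT_NONE, PROT_READ, PROT_WRITE, PROT_EXEC */

/* Model copies of the kernel's vm_flags permission bits. */
#define VM_READ  0x1ULL
#define VM_WRITE 0x2ULL
#define VM_EXEC  0x4ULL
/* Assumed layout for this model: a 4-bit protection key at bits 32..35. */
#define VM_PKEY_SHIFT 32
#define VM_PKEY_MASK  (0xfULL << VM_PKEY_SHIFT)

/* Rough analogue of calc_vm_prot_bits(): translate 'prot' into vm_flags bits. */
static uint64_t prot_to_vm_flags(int prot, int pkey)
{
    uint64_t flags = 0;

    assert(pkey >= 0 && pkey < 16);
    if (prot & PROT_READ)
        flags |= VM_READ;
    if (prot & PROT_WRITE)
        flags |= VM_WRITE;
    if (prot & PROT_EXEC)
        flags |= VM_EXEC;
    flags |= (uint64_t)pkey << VM_PKEY_SHIFT;
    return flags;
}

/*
 * Every bit that mprotect() may change (R/W/X plus the key bits) is cleared
 * first, so a permission or an old key that is absent from the new request
 * cannot survive from the previous vm_flags. Unrelated flags are preserved.
 */
static uint64_t apply_mprotect(uint64_t old_flags, int prot, int pkey)
{
    const uint64_t mask = VM_READ | VM_WRITE | VM_EXEC | VM_PKEY_MASK;

    return (old_flags & ~mask) | prot_to_vm_flags(prot, pkey);
}
```

Forgetting the key bits in the mask would let an old key leak into the new flags, which is the kind of bug the patch title refers to.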

[PATCH 07/10] pkeys: add details of system call use to Documentation/

2016-08-08 Thread Dave Hansen
From: Dave Hansen This spells out all of the pkey-related system calls that we have and provides some example code fragments to demonstrate how we expect them to be used. Signed-off-by: Dave Hansen Cc: linux-...@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: linux...@kvack.org Cc: x

[PATCH 02/10] mm: implement new pkey_mprotect() system call

2016-08-08 Thread Dave Hansen
From: Dave Hansen pkey_mprotect() is just like mprotect, except it also takes a protection key as an argument. On systems that do not support protection keys, it still works, but requires that key=0. Otherwise it does exactly what mprotect does. I expect it to get used like this, if you want
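The snippet breaks off before its usage example, so the following is not that example. It is a minimal sketch of the allocate-then-tag sequence, assuming Linux headers that define SYS_pkey_alloc, SYS_pkey_mprotect and SYS_pkey_free (it calls them through syscall() so it does not depend on libc wrappers). protect_with_new_key() is a hypothetical helper name. When the CPU or kernel lacks protection keys, it simply reports failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Allocate a fresh protection key and attach it to [addr, addr+len) along
 * with ordinary PROT_* bits. Returns the key, or -1 with errno set.
 */
static int protect_with_new_key(void *addr, size_t len, int prot)
{
    /* Like mprotect(), the start address must be page-aligned. */
    assert((uintptr_t)addr % (uintptr_t)sysconf(_SC_PAGESIZE) == 0);
#if defined(SYS_pkey_alloc) && defined(SYS_pkey_mprotect) && defined(SYS_pkey_free)
    /* flags = 0, initial access rights = 0 (no PKRU restrictions yet). */
    long pkey = syscall(SYS_pkey_alloc, 0UL, 0UL);
    if (pkey < 0)
        return -1;

    /* Same as mprotect(), but also tags the range with 'pkey'. */
    if (syscall(SYS_pkey_mprotect, addr, len, (unsigned long)prot, pkey) < 0) {
        int saved = errno;
        syscall(SYS_pkey_free, pkey);   /* don't leak the key on failure */
        errno = saved;
        return -1;
    }
    return (int)pkey;
#else
    (void)len;
    (void)prot;
    errno = ENOSYS;
    return -1;
#endif
}
```

Because the key is allocated with access rights of 0, the range stays fully accessible until the thread's PKRU is changed. Tagging alone restricts nothing.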

[PATCH 06/10] generic syscalls: wire up memory protection keys syscalls

2016-08-08 Thread Dave Hansen
From: Dave Hansen These new syscalls are implemented as generic code, so enable them for architectures like arm64 which use the generic syscall table. According to Arnd: Even if the support is x86 specific for the foreseeable future, it may be good to reserve the number just in

[PATCH 08/10] x86, pkeys: default to a restrictive init PKRU

2016-08-08 Thread Dave Hansen
From: Dave Hansen PKRU is the register that lets you disallow writes or all access to a given protection key. The XSAVE hardware defines an "init state" of 0 for PKRU: its most permissive state, allowing access/writes to everything. Since we start off all new processes with the init
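PKRU gives each of the 16 keys two bits: Access-Disable at bit 2*key and Write-Disable at bit 2*key+1. The sketch below (helper names are hypothetical) shows why 0 is the most permissive value and what a restrictive default looks like if it sets Access-Disable for every key except key 0, the key that untagged memory uses.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PKEYS 16

/* Access-Disable bit for a key: blocks all data access through that key. */
static uint32_t pkru_ad_bit(int pkey)
{
    assert(pkey >= 0 && pkey < NR_PKEYS);
    return 1u << (2 * pkey);
}

/* Write-Disable bit for a key: blocks data writes through that key. */
static uint32_t pkru_wd_bit(int pkey)
{
    assert(pkey >= 0 && pkey < NR_PKEYS);
    return 1u << (2 * pkey + 1);
}

/* Start from 0 (the XSAVE init state, everything allowed) and deny access
 * through every key except key 0. */
static uint32_t restrictive_init_pkru(void)
{
    uint32_t pkru = 0;

    for (int pkey = 1; pkey < NR_PKEYS; pkey++)
        pkru |= pkru_ad_bit(pkey);
    return pkru;
}

/* Would a data write through 'pkey' be permitted under this PKRU value? */
static int pkru_allows_write(uint32_t pkru, int pkey)
{
    return !(pkru & (pkru_ad_bit(pkey) | pkru_wd_bit(pkey)));
}
```

Setting the even bits 2 through 30 gives 0x55555554. A new process with that value can still use ordinary (key 0) memory but cannot touch memory tagged with any other key until it explicitly opens that key up.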

[PATCH 10/10] x86, pkeys: add self-tests

2016-08-08 Thread Dave Hansen
From: Dave Hansen This code should be a good demonstration of how to use the new system calls as well as how to use protection keys in general. This code shows how to: 1. Manipulate the Protection Keys Rights User (PKRU) register 2. Set a protection key on memory 3. Fetch and/or modify PKRU

[PATCH 01/10] x86, pkeys: add fault handling for PF_PK page fault bit

2016-08-08 Thread Dave Hansen
From: Dave Hansen PF_PK means that a memory access violated the protection key access restrictions. It is unconditionally an access_error() because the permissions set on the VMA don't matter (the PKRU value overrides it), and we never "resolve" PK faults (like how a COW can
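The ordering described here can be modelled as a small decision function over the x86 page-fault error code. The PF_* bit positions are the hardware-defined ones. The VM_* values and the function body are a simplified sketch of the access_error() logic, not a copy of arch/x86/mm/fault.c (for example, it omits the per-VMA pkey permission check).

```c
#include <assert.h>
#include <stdbool.h>

/* x86 page-fault error code bits. */
#define PF_PROT  (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1)  /* faulting access was a write */
#define PF_USER  (1u << 2)  /* fault taken in user mode */
#define PF_RSVD  (1u << 3)  /* reserved bit set in a paging entry */
#define PF_INSTR (1u << 4)  /* instruction fetch */
#define PF_PK    (1u << 5)  /* protection-key violation */

/* Model VMA permission bits. */
#define VM_READ  0x1u
#define VM_WRITE 0x2u
#define VM_EXEC  0x4u

/* True when the fault must become SIGSEGV instead of being resolved. */
static bool access_error(unsigned int error_code, unsigned int vm_flags)
{
    /* PKRU overrides the VMA permissions, and a PK fault is never fixable. */
    if (error_code & PF_PK)
        return true;

    /* A write is fine if the VMA is writable (it may just need COW). */
    if (error_code & PF_WRITE)
        return !(vm_flags & VM_WRITE);

    /* A read of a present page that still faulted is a real violation. */
    if (error_code & PF_PROT)
        return true;

    /* A read of a not-present page is fine if the VMA is accessible at all. */
    return !(vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
}
```

Testing PF_PK first is the point: a write fault on a writable VMA would otherwise look like a resolvable (e.g. COW) fault.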

[PATCH 04/10] x86, pkeys: allocation/free syscalls

2016-08-08 Thread Dave Hansen
From: Dave Hansen This patch adds two new system calls: int pkey_alloc(unsigned long flags, unsigned long init_access_rights) int pkey_free(int pkey); These implement an "allocator" for the protection keys themselves, which can be thought of as analogous to the allo
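As a rough illustration of what "an allocator for the protection keys themselves" involves, here is a per-process bitmap model. It is hypothetical: struct pkey_map and the model_* functions are invented for this sketch, the model hands out the lowest free key, and it treats key 0 as permanently allocated because it is the default key for all memory. It does not claim to reproduce the kernel's exact error handling.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PKEYS 16

/* Bit N set means key N is in use. */
struct pkey_map {
    uint16_t allocated;
};

static void pkey_map_init(struct pkey_map *m)
{
    m->allocated = 1u << 0;   /* key 0 is the default key and is never free */
}

/* Return the lowest free key, or -1 if every key is in use. */
static int model_pkey_alloc(struct pkey_map *m)
{
    for (int pkey = 0; pkey < NR_PKEYS; pkey++) {
        if (!(m->allocated & (1u << pkey))) {
            m->allocated |= (uint16_t)(1u << pkey);
            return pkey;
        }
    }
    return -1;
}

/* Release a key. This model refuses key 0 and keys that are not allocated. */
static int model_pkey_free(struct pkey_map *m, int pkey)
{
    if (pkey <= 0 || pkey >= NR_PKEYS || !(m->allocated & (1u << pkey)))
        return -1;
    m->allocated &= (uint16_t)~(1u << pkey);
    return 0;
}
```

With only 16 keys, a process can run out quickly, which is why keys need an explicit free operation rather than being a fixed resource.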

[PATCH 05/10] x86: wire up protection keys system calls

2016-08-08 Thread Dave Hansen
From: Dave Hansen This is all that we need to get the new system calls themselves working on x86. Signed-off-by: Dave Hansen Cc: linux-...@vger.kernel.org Cc: linux-a...@vger.kernel.org Cc: linux...@kvack.org Cc: x...@kernel.org Cc: torva...@linux-foundation.org Cc: a...@linux-foundation.org

Re: [PATCH v6] x86/hpet: Reduce HPET counter read contention

2016-08-25 Thread Dave Hansen
On 08/12/2016 05:59 PM, Waiman Long wrote: > + * The lock and the hpet value are stored together and can be read in a > + * single atomic 64-bit read. It is explicitly assumed that arch_spinlock_t > + * is 32 bits in size. This requirement forces us to give up all of the goodness of lockdep. Is th
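The quoted comment depends on packing a 32-bit lock and a 32-bit counter value into one 64-bit word so that a single atomic load returns a consistent pair. Below is a minimal C11 model of that idea. A plain 0/1 lock word stands in for arch_spinlock_t, and every name is hypothetical. The model also shows why the lock must be exactly 32 bits: a larger lock (for example, one carrying lockdep state) would no longer fit beside the value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Low 32 bits: lock word (0 = free, 1 = held). High 32 bits: last value. */
static _Atomic uint64_t hpet_word;

static uint64_t pack(uint32_t lock, uint32_t value)
{
    return ((uint64_t)value << 32) | lock;
}

/* One 64-bit atomic load yields a lock state and value that belong together. */
static void read_pair(uint32_t *lock, uint32_t *value)
{
    uint64_t w = atomic_load(&hpet_word);

    *lock = (uint32_t)w;
    *value = (uint32_t)(w >> 32);
}

/* Take the lock only if the lock half is currently free. */
static int try_lock(void)
{
    uint64_t old = atomic_load(&hpet_word);

    if ((uint32_t)old != 0)
        return 0;
    return atomic_compare_exchange_strong(&hpet_word, &old, old | 1u);
}

/* The lock holder publishes a new value and releases the lock in one store. */
static void publish_and_unlock(uint32_t value)
{
    atomic_store(&hpet_word, pack(0, value));
}
```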

[PATCH] x86, syscalls: use SYSCALL_DEFINE() macros for sys_modify_ldt()

2017-10-17 Thread Dave Hansen
We do not have tracepoints for sys_modify_ldt() because we define it directly instead of using the normal SYSCALL_DEFINEx() macros. However, there is a reason sys_modify_ldt() does not use the macros: it has an 'int' return type instead of 'unsigned long'. This is a bug, but it's a bug cemented

Re: [PATCHv1, RFC 0/8] Boot-time switching between 4- and 5-level paging

2017-05-26 Thread Dave Hansen
On 05/26/2017 11:24 AM, h...@zytor.com wrote: > The only case where that even has any utility is for an application > to want more than 128 TiB address space on a machine with no more > than 64 TiB of RAM. It is kind of a narrow use case, I think. Doesn't more address space increase the effective
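The 128 TiB figure comes directly from the page-table geometry. On x86-64, each paging level translates 9 bits and the page offset is 12 bits, so 4 levels give 48-bit virtual addresses and 5 levels give 57-bit ones. User space gets the lower half (Linux keeps a guard page below the top, so TASK_SIZE is one page smaller). The helpers below are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define TiB (1ULL << 40)
#define PiB (1ULL << 50)

/* Virtual address width: 12 page-offset bits plus 9 index bits per level. */
static unsigned int va_bits(unsigned int levels)
{
    return 12 + 9 * levels;
}

/* Size of the lower (user) half of the canonical address space. */
static uint64_t user_space_bytes(unsigned int levels)
{
    return 1ULL << (va_bits(levels) - 1);
}
```

This gives 2^47 bytes = 128 TiB with 4-level paging and 2^56 bytes = 64 PiB with 5-level paging.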

[PATCH] x86, mm: make alternatives code do stronger TLB flush

2017-10-31 Thread Dave Hansen
From: Dave Hansen local_flush_tlb() does a CR3 write. But, that kind of TLB flush is not guaranteed to invalidate global pages. The entire kernel is mapped with global pages. Also, now that we have PCIDs, local_flush_tlb() will only flush the *current* PCID. It would not flush the entries

[PATCH 00/23] KAISER: unmap most of the kernel from userspace page tables

2017-10-31 Thread Dave Hansen
tl;dr: KAISER makes it harder to defeat KASLR, but makes syscalls and interrupts slower. These patches are based on work from a team at Graz University of Technology posted here[1]. The major addition is support for Intel PCIDs which builds on top of Andy Lutomirski's PCID work merged for 4.14.
