On Thu, 2007-04-19 at 17:19 -0400, Trond Myklebust wrote:
> > With pid namespaces all kernel threads will disappear so how do
> > we cope with the problem when the sysadmin can not see the kernel
> > threads?
Do they actually always disappear, or do we keep them in the
init_pid_namespace?
-- Dave
On Sun, 2007-03-18 at 11:42 -0600, Eric W. Biederman wrote:
> Dave Hansen <[EMAIL PROTECTED]> writes:
> > To me, a process sitting there doing constant reads of 10 pages has the
> > same overhead to the VM as a process sitting there with a 10 page file
> > mmaped, and r
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
>
> +#define has_pt_op(vma, op) \
> + ((vma)->pagetable_ops && (vma)->pagetable_ops->op)
> +#define pt_op(vma, call) \
> + ((vma)->pagetable_ops->call)
Can you get rid of these macros? I think they make it a wee bit harder
to read
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> Signed-off-by: Adam Litke <[EMAIL PROTECTED]>
> ---
>
> fs/hugetlbfs/inode.c    |  3 ++-
> include/linux/hugetlb.h |  4 ++--
> mm/hugetlb.c            | 12
> mm/memory.c             | 10 --
> 4 files changed,
On Mon, 2007-03-19 at 13:05 -0700, Adam Litke wrote:
> For the common case (vma->pagetable_ops == NULL), we do almost the
> same thing as the current code: load and test. The third instruction
> is different in that we jump for the common case instead of jumping in
> the hugetlb case. I don't thi
On Fri, 2007-03-23 at 04:12 -0600, Eric W. Biederman wrote:
> Would any of them work on a system on which every filesystem was on
> ramfs, and there was no swap? If not then they are not memory attacks
> but I/O attacks.
I truly understand your point here. But, I don't think this thought
exercis
I'm seeing weird hangs running ltp on 2.6.21-rc2-mm2. It manifests
itself by the waitpid06 test in LTP hanging. This is very, very
reproducible in about 5 seconds by adding '-s wait' to the ltp command
line.
I see 4 waitpid06 processes on my 4-way machine spinning in userspace.
But, the weird pa
On Wed, 2007-03-07 at 14:16 -0800, Siddha, Suresh B wrote:
> On Wed, Mar 07, 2007 at 02:12:16PM -0800, Dave Hansen wrote:
> > I'm seeing weird hangs running ltp on 2.6.21-rc2-mm2. It manifests
> > itself by the waitpid06 test in LTP hanging. This is very, very
> > repro
On Wed, 2007-03-07 at 15:59 -0600, Serge E. Hallyn wrote:
> Space saving was the only reason for nsproxy to exist.
>
> Now of course it also provides the teensiest reduction in # instructions
> since every clone results in just one reference count inc for the
> nsproxy rather than one for each nam
On Sun, 2007-03-25 at 15:45 -0800, Andrew Morton wrote:
> On Sat, 24 Mar 2007 23:04:09 +0100 Miklos Szeredi <[EMAIL PROTECTED]> wrote:
>
> > This patch adds support for finding out the current file position,
> > open flags and possibly other info in the future.
> >
> > These new entries are added
On Mon, 2007-04-02 at 08:54 -0700, Christoph Lameter wrote:
> > BTW there is no guarantee the node size is a multiple of 128MB so
> > you likely need to handle the overlap case. Otherwise we can
> > get cache corruptions
>
> How does sparsemem handle that?
It doesn't. :)
In practice, this situ
On Mon, 2007-04-02 at 08:37 -0700, Christoph Lameter wrote:
> You want a benchmark to prove that the removal of memory references and
> code improves performance?
Yes, please. ;)
I completely agree, it looks like it should be faster. The code
certainly has potential benefits. But, to add this
First of all, nice set of patches.
On Sat, 2007-03-31 at 23:10 -0800, Christoph Lameter wrote:
> --- linux-2.6.21-rc5-mm2.orig/include/asm-generic/memory_model.h
> 2007-03-31 22:47:14.0 -0700
> +++ linux-2.6.21-rc5-mm2/include/asm-generic/memory_model.h 2007-03-31
> 22:59:35.0
On Mon, 2007-04-02 at 13:30 -0700, Christoph Lameter wrote:
> On Mon, 2 Apr 2007, Dave Hansen wrote:
> > I completely agree, it looks like it should be faster. The code
> > certainly has potential benefits. But, to add this neato, apparently
> > more performant feature, we
On Mon, 2007-04-02 at 14:00 -0700, Christoph Lameter wrote:
> On Mon, 2 Apr 2007, Dave Hansen wrote:
> > > + } else
> > > + return __alloc_bootmem_node(NODE_DATA(node), size, size,
> > > + __pa(MAX_DMA_ADDRESS));
> > >
On Mon, 2007-04-02 at 14:31 -0700, Christoph Lameter wrote:
> On Mon, 2 Apr 2007, Dave Hansen wrote:
>
> > > > Hmmm. Can we combine this with sparse_index_alloc()? Also, why not
> > > > just use the slab for this?
> > >
> > > Use a slab for p
On Mon, 2007-04-02 at 14:28 -0700, Christoph Lameter wrote:
> I do not care what it's called as long as it
> covers all the bases and is not a glaring performance regression (like
> SPARSEMEM so far).
I honestly don't doubt that there are regressions, somewhere. Could you
elaborate, and perhap
On Mon, 2007-04-02 at 14:53 -0700, Christoph Lameter wrote:
> > > Well think about how to handle the case that the allocation of a page
> > > table page or a vmemmap block fails. Once we have that sorted out then we
> > > can cleanup the higher layers.
> >
> > I think it is best to just complet
From: Dave Hansen
I don't think it is really possible to have a system where CPUID
enumerates support for XSAVE but that it does not have FP/SSE
(they are "legacy" features and always present).
But, I did manage to hit this case in qemu when I enabled its
somewhat shaky XSAV
On 07/19/2016 09:18 PM, Zhou Chengming wrote:
> When CONFIG_SPARSEMEM_EXTREME is disabled, __section_nr can get
> the section number with a subtraction directly.
Does this actually *do* anything?
It was a long time ago, but if I remember correctly, the entire loop in
__section_nr() goes away beca
On 07/20/2016 06:55 PM, zhouchengming wrote:
> Thanks for your reply. I don't know the compiler will optimize the loop.
> But when I see the assembly code of __section_nr, it seems to still have
> the loop in it.
Oh, well. I guess it got broken in the last decade or so. Your patch
looks good to
On 07/12/2016 03:59 PM, Andy Lutomirski wrote:
> On Tue, Jul 12, 2016 at 3:55 PM, H. Peter Anvin wrote:
>> On 07/12/16 08:32, Dave Hansen wrote:
>>> On 07/09/2016 02:27 PM, Andy Lutomirski wrote:
>>>> is_prefetch in arch/x86/mm/fault.c can be called on a user addres
On 07/21/2016 02:48 PM, H. Peter Anvin wrote:
>>> I like it, except that reading just a single byte is a bit silly.
>>> OTOH, that's what the current code needs and I see no fundamental
>>> reason to change it until there's a real user.
>>>
> The thing is that we can't actually test this, since th
As discussed in the previous patch, there is a reliability
benefit to allowing an init value for the Protection Keys Rights
User register (PKRU) which differs from what the XSAVE hardware
provides.
But, having PKRU be 0 (its init value) provides some nonzero
amount of optimization potential to th
Andy Lutomirski brought this up as a potential issue. It's
straightforward to fix, but has potential performance
implications.
This applies on top of the previous pkeys syscall code that I
posted, but I think we should probably discuss these on their own
and not as a part of the larger series.
From: Dave Hansen
probe_kernel_address() has an unfortunate name since it is used
to probe kernel *and* userspace addresses. Add a comment
explaining some of the situation to help the next developer who
might make the silly assumption that it is for probing kernel
addresses.
Signed-off-by
From: Dave Hansen
The various tracing headers pass some variables into the tracing
code itself to indicate things like the name of the tracing
directory where the tracepoints should be located in debugfs.
The general pattern is to #undef them before redefining them.
But, if all instances don
The first two patches here are useful in any case, I think.
But, as for the third: There are no known prefetch errata on
processors that support memory protection keys. There have not
been any that I can find in any recent generations, either.
But, if there were a future erratum, we would need
From: Dave Hansen
Thanks to Andy Lutomirski for pointing out the potential issue
here.
Memory protection keys only affect data access. They do not
affect instruction fetches. So, an instruction may not be
readable, while it *is* executable.
The fault prefetch checking code directly reads
On 07/22/2016 11:10 AM, Andy Lutomirski wrote:
> On Jul 22, 2016 11:03 AM, "Dave Hansen" wrote:
>> From: Dave Hansen
>>
>> probe_kernel_address() has an unfortunate name since it is used
>> to probe kernel *and* userspace addresses. Add a comment
>> e
On 09/20/2016 07:45 AM, Rui Teng wrote:
> On 9/17/16 12:25 AM, Dave Hansen wrote:
>>
>> That's an interesting data point, but it still doesn't quite explain
>> what is going on.
>>
>> It seems like there might be parts of gigantic pages that have
>>
On 09/20/2016 08:52 AM, Rui Teng wrote:
> On 9/20/16 10:53 PM, Dave Hansen wrote:
...
>> That's good, but aren't we still left with a situation where we've
>> offlined and dissolved the _middle_ of a gigantic huge page while the
>> head page is still in pl
On 09/20/2016 10:37 AM, Mike Kravetz wrote:
>
> Their approach (I believe) would be to fail the offline operation in
> this case. However, I could argue that failing the operation, or
> dissolving the unused huge page containing the area to be offlined is
> the right thing to do.
I think the rig
On 09/21/2016 05:05 AM, Michal Hocko wrote:
> On Tue 20-09-16 10:43:13, Dave Hansen wrote:
>> On 09/20/2016 08:52 AM, Rui Teng wrote:
>>> On 9/20/16 10:53 PM, Dave Hansen wrote:
>> ...
>>>> That's good, but aren't we still left with a situation where
On 09/21/2016 09:27 AM, Michal Hocko wrote:
> That was not my point. I wasn't very clear probably. Offlining can fail
> which shouldn't be really surprising. There might be a kernel allocation
> in the particular block which cannot be migrated so failures are to be
> expected. I just do not see how
On 09/21/2016 11:20 AM, Michal Hocko wrote:
> I would even question the per page block offlining itself. Why would
> anybody want to offline few blocks rather than the whole node? What is
> the usecase here?
The original reason was so that you could remove a DIMM or a riser card
full of DIMMs, whi
On 09/22/2016 09:29 AM, Gerald Schaefer wrote:
> static void dissolve_free_huge_page(struct page *page)
> {
> + struct page *head = compound_head(page);
> + struct hstate *h = page_hstate(head);
> + int nid = page_to_nid(head);
> +
> spin_lock(&hugetlb_lock);
> - if (PageHug
On 09/23/2016 06:12 AM, Robert Ho wrote:
> +Note: for both /proc/PID/maps and /proc/PID/smaps readings, it's
> +possible in race conditions, that the mappings printed may not be that
> +up-to-date, because during each read walking, the task's mappings may have
> +changed, this typically happens in
On 09/23/2016 01:15 AM, Michal Hocko wrote:
> + /* Make sure we know about allocations which stall for too long */
> + if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start +
> stall_timeout)) {
> pr_warn("%s: page allocation stalls for %ums: order:%u
> mode:%#x(%
On 07/26/2016 06:39 PM, hejianet wrote:
>>>
>> and you choose to patch both of the alloc_*() functions. Why not just
>> fix it at the common call site? Seems like that
>> spin_lock(&hugetlb_lock) could be a cond_resched_lock() which would fix
>> both cases.
> I agree to move the cond_resched() to
On 07/26/2016 06:23 PM, Liang Li wrote:
> + vb->pfn_limit = VIRTIO_BALLOON_PFNS_LIMIT;
> + vb->pfn_limit = min(vb->pfn_limit, get_max_pfn());
> + vb->bmap_len = ALIGN(vb->pfn_limit, BITS_PER_LONG) /
> + BITS_PER_BYTE + 2 * sizeof(unsigned long);
> + hdr_len = sizeof(str
On 07/26/2016 06:23 PM, Liang Li wrote:
> + for_each_migratetype_order(order, t) {
> + list_for_each(curr, &zone->free_area[order].free_list[t]) {
> + pfn = page_to_pfn(list_entry(curr, struct page, lru));
> + if (pfn >= start_pfn && pfn <= en
On 07/27/2016 08:23 AM, Steven Rostedt wrote:
>> > +
>> > + trace_mm_slowpath_end(page);
>> > +
> I'm thinking you only need one tracepoint, and use function_graph
> tracer for the length of the function call.
>
> # cd /sys/kernel/debug/tracing
> # echo __alloc_pages_nodemask > set_ftrace_filte
On 07/27/2016 03:05 PM, Michael S. Tsirkin wrote:
> On Wed, Jul 27, 2016 at 09:40:56AM -0700, Dave Hansen wrote:
>> On 07/26/2016 06:23 PM, Liang Li wrote:
>>> + for_each_migratetype_order(order, t) {
>>> + list_for_each(curr, &zon
On 07/27/2016 03:08 PM, Michael S. Tsirkin wrote:
>> > +unsigned long get_max_pfn(void)
>> > +{
>> > + return max_pfn;
>> > +}
>> > +EXPORT_SYMBOL(get_max_pfn);
>> > +
>
> This needs a comment that this can change at any time.
> So it's only good as a hint e.g. for sizing data structures.
Or, if
From: Dave Hansen
The Memory Protection Keys "rights register" (PKRU) is
XSAVE-managed, and is saved/restored along with the FPU state.
When kernel code accesses FPU registers, it does a delicate
dance with preempt. Otherwise, the context switching code can
get confused as to w
Looks fine to me.
Acked-by: Dave Hansen
On 07/25/2016 08:47 PM, George Amvrosiadis wrote:
> 21 files changed, 2424 insertions(+), 1 deletion(-)
I like the idea, but yikes, that's a lot of code.
Have you considered using or augmenting the kernel's existing tracing
mechanisms? Have you considered using something like netlink for
transp
On 07/28/2016 08:47 PM, George Amvrosiadis wrote:
> On Thu, Jul 28, 2016 at 02:02:45PM -0700, Dave Hansen wrote:
>> On 07/25/2016 08:47 PM, George Amvrosiadis wrote:
>>> 21 files changed, 2424 insertions(+), 1 deletion(-)
>>
>> I like the idea, but yikes, that
qemu64,+pku,+xsave, and make
sure to apply this patch[1] to qemu.
=== diffstat ===
Dave Hansen (10):
x86, pkeys: add fault handling for PF_PK page fault bit
mm: implement new pkey_mprotect() system call
x86, pkeys: make mprotect_key() mask off additional vm_flags
x86, pkeys
From: Dave Hansen
This is all that we need to get the new system calls themselves
working on x86.
Signed-off-by: Dave Hansen
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Cc: x...@kernel.org
Cc: torva...@linux-foundation.org
Cc: a...@linux-foundation.org
From: Dave Hansen
PF_PK means that a memory access violated the protection key
access restrictions. It is unconditionally an access_error()
because the permissions set on the VMA don't matter (the PKRU
value overrides it), and we never "resolve" PK faults (like
how a COW can
From: Dave Hansen
This patch adds two new system calls:
int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
int pkey_free(int pkey);
These implement an "allocator" for the protection keys
themselves, which can be thought of as analogous to the allo
From: Dave Hansen
As discussed in the previous patch, there is a reliability
benefit to allowing an init value for the Protection Keys Rights
User register (PKRU) which differs from what the XSAVE hardware
provides.
But, having PKRU be 0 (its init value) provides some nonzero
amount of
From: Dave Hansen
This spells out all of the pkey-related system calls that we have
and provides some example code fragments to demonstrate how we
expect them to be used.
Signed-off-by: Dave Hansen
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Cc: x
From: Dave Hansen
PKRU is the register that lets you disallow writes or all access
to a given protection key.
The XSAVE hardware defines an "init state" of 0 for PKRU: its
most permissive state, allowing access/writes to everything.
Since we start off all new processes with the init
From: Dave Hansen
This code should be a good demonstration of how to use the new
system calls as well as how to use protection keys in general.
This code shows how to:
1. Manipulate the Protection Keys Rights User (PKRU) register
2. Set a protection key on memory
3. Fetch and/or modify PKRU
From: Dave Hansen
pkey_mprotect() is just like mprotect, except it also takes a
protection key as an argument. On systems that do not support
protection keys, it still works, but requires that key=0.
Otherwise it does exactly what mprotect does.
I expect it to get used like this, if you want
From: Dave Hansen
These new syscalls are implemented as generic code, so enable
them for architectures like arm64 which use the generic syscall
table.
According to Arnd:
> Even if the support is x86 specific for the foreseeable
future, it may be good to reserve the number just in
From: Dave Hansen
Today, mprotect() takes 4 bits of data: PROT_READ/WRITE/EXEC/NONE.
Three of those bits: READ/WRITE/EXEC get translated directly in to
vma->vm_flags by calc_vm_prot_bits(). If a bit is unset in
mprotect()'s 'prot' argument then it must be cleared in vma->vm_flags
On 07/29/2016 10:29 AM, Andy Lutomirski wrote:
>> > In the end, this ensures that threads which do not know how to
>> > manage their own pkey rights can not do damage to data which is
>> > pkey-protected.
> I think you missed the fpu__clear() caller in kernel/fpu/signal.c.
>
> ISTM it might be mor
On 07/28/2016 02:51 PM, Michael S. Tsirkin wrote:
>> > If 1MB is too big, how about 512K, or 256K? 32K seems too small.
>> >
> It's only small because it makes you rescan the free list.
> So maybe you should do something else.
> I looked at it a bit. Instead of scanning the free list, how about
>
On 07/30/2016 10:31 AM, George Amvrosiadis wrote:
> Dave, I can produce a patch that adds the extra two tracepoints and exports
> all four tracepoint symbols. This would be a short patch that would just
> extend existing tracing functionality. What do you think?
Adding those tracepoints is probabl
On 08/01/2016 07:42 AM, Vlastimil Babka wrote:
> On 07/29/2016 06:30 PM, Dave Hansen wrote:
>> This does not cause any practical problems with applications
>> using protection keys because we require them to specify initial
>> permissions for each key when it is allocate
On 09/06/2016 01:31 AM, Anshuman Khandual wrote:
> [NODE (0)]
> ZONELIST_FALLBACK
> (0) (node 0) (zone DMA c140c000)
> (1) (node 1) (zone DMA c001)
> (2) (node 2) (zone DMA c002)
> (3) (node 3) (zone DMA c003)
>
On 09/06/2016 11:51 PM, Xiao Guangrong wrote:
> In order to fix this bug, we make 'file->version' indicate the next VMA
> we want to handle
This new approach makes it more likely that we'll skip a new VMA that
gets inserted in between the read()s. But, I guess that's OK. We don't
exactly claim t
On 09/07/2016 08:36 PM, Xiao Guangrong wrote:
>> The user will see two VMAs in their output:
>>
>> A: 0x1000->0x2000
>> C: 0x1000->0x3000
>>
>> Will it confuse them to see the same virtual address range twice? Or is
>> there something preventing that happening that I'm missing?
>>
>
> You
On 09/07/2016 07:46 PM, Anshuman Khandual wrote:
> after memory or node hot[un]plug is desirable. This change adds one
> new sysfs interface (/sys/devices/system/memory/system_zone_details)
> which will fetch and dump this information.
Doesn't this violate the "one value per file" sysfs rule? Doe
On 09/13/2016 07:59 AM, Oleg Nesterov wrote:
> On 09/12, Michal Hocko wrote:
>> > Considering how this all can be tricky and how partial reads can be
>> > confusing and even misleading I am really wondering whether we
>> > should simply document that only full reads will provide a sensible
>> > res
On 09/13/2016 01:39 AM, Rui Teng wrote:
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87e11d8..64b5f81 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1442,7 +1442,7 @@ static int free_pool_huge_page(struct hstate *h,
> nodemask_t *nodes_allowed,
> static void dissolve_free_huge_page(
On 09/08/2016 02:57 AM, Fenghua Yu wrote:
> --- a/arch/x86/include/asm/disabled-features.h
> +++ b/arch/x86/include/asm/disabled-features.h
> @@ -57,6 +57,7 @@
> #define DISABLED_MASK15 0
> #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE)
> #define DISABLED_MASK17 0
> -#define
On 09/08/2016 02:57 AM, Fenghua Yu wrote:
> +static int __init rdt_setup(char *str)
> +{
> + char *tok;
> +
> + while ((tok = strsep(&str, ",")) != NULL) {
> + if (!*tok)
> + return -EINVAL;
> +
> + if (strcmp(tok, "simulate_cat_l3") == 0) {
> +
On 09/13/2016 03:52 PM, Luck, Tony wrote:
> On Tue, Sep 13, 2016 at 03:40:18PM -0700, Dave Hansen wrote:
>> Are you sure you don't want to add RDT to disabled-features.h? You have
>> a config option for it, so it seems like you should also be able to
>> optimize some of
On 09/08/2016 02:57 AM, Fenghua Yu wrote:
> +void rdtgroup_fork(struct task_struct *child)
> +{
> + struct rdtgroup *rdtgrp;
> +
> + INIT_LIST_HEAD(&child->rg_list);
> + if (!rdtgroup_mounted)
> + return;
> +
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdtgrp = current-
On 09/13/2016 04:35 PM, Luck, Tony wrote:
> On Tue, Sep 13, 2016 at 04:13:04PM -0700, Dave Hansen wrote:
>> Yikes, is this a new global lock and possible atomic_inc() on a shared
>> variable in the fork() path? Has there been any performance or
>> scalability testing done on
On 09/14/2016 12:19 AM, Juerg Haefliger wrote:
> Allocating a page to userspace that was previously allocated to the
> kernel requires an expensive TLB shootdown. To minimize this, we only
> put non-kernel pages into the hot cache to favor their allocation.
Hi, I had some questions about this the
> On 09/02/2016 10:39 PM, Dave Hansen wrote:
>> On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
>> Does this
>> just mean that kernel allocations usually have to pay the penalty to
>> convert a page?
>
> Only pages that are allocated for userspace (gfp & GFP_HI
On 09/14/2016 09:33 AM, Rui Teng wrote:
>
> How about return the size of page freed from dissolve_free_huge_page(),
> and jump such step on pfn?
That would be a nice improvement.
But, as far as describing the initial problem, can you explain how the
tail pages still ended up being PageHuge()? S
On 09/14/2016 02:01 PM, Kyle Huey wrote:
> Signed-off-by: Kyle Huey
> ---
> arch/x86/entry/syscalls/syscall_32.tbl |  1 +
> arch/x86/kernel/process.c              | 80 ++
> arch/x86/kernel/process_64.c           | 66
> 3 files cha
On 09/14/2016 02:01 PM, Kyle Huey wrote:
> Xen advertises the underlying support for CPUID faulting but does not pass
> through writes to the relevant MSR, nor does it virtualize it, so it does
> not actually work. For now mask off the relevant bit on MSR_PLATFORM_INFO.
That needs to make it into
On 09/14/2016 02:35 PM, Kyle Huey wrote:
> It's not quite a plain move. To leave the existing arch_prctls only
> accessible to 64 bit callers, I added the is_32 bit and the four early
> returns for each existing ARCH_BLAH. These cases are now
> conditionally compiled out in a 32 bit kernel, so we
On 09/02/2016 04:39 AM, Juerg Haefliger wrote:
> Allocating a page to userspace that was previously allocated to the
> kernel requires an expensive TLB shootdown. To minimize this, we only
> put non-kernel pages into the hot cache to favor their allocation.
But kernel allocations do allocate from
On 08/07/2016 11:35 PM, Liang Li wrote:
> Dave Hansen suggested a new scheme to encode the data structure,
> because of additional complexity, it's not implemented in v3.
FWIW, I don't think it takes any additional complexity here, at least in
the guest implementation side. The t
From: Dave Hansen
As discussed in the previous patch, there is a reliability
benefit to allowing an init value for the Protection Keys Rights
User register (PKRU) which differs from what the XSAVE hardware
provides.
But, having PKRU be 0 (its init value) provides some nonzero
amount of
he series and integrated into
kselftests.
Folks wishing to run this code can do so with the new PKU support
in qemu >=2.6. Just boot with -cpu qemu64,+pku,+xsave, and make
sure to apply this patch[1] to qemu.
=== diffstat ===
Dave Hansen (10):
x86, pkeys: add fault handling for PF_PK page
From: Dave Hansen
Today, mprotect() takes 4 bits of data: PROT_READ/WRITE/EXEC/NONE.
Three of those bits: READ/WRITE/EXEC get translated directly in to
vma->vm_flags by calc_vm_prot_bits(). If a bit is unset in
mprotect()'s 'prot' argument then it must be cleared in vma->vm_flags
From: Dave Hansen
This spells out all of the pkey-related system calls that we have
and provides some example code fragments to demonstrate how we
expect them to be used.
Signed-off-by: Dave Hansen
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Cc: x
From: Dave Hansen
pkey_mprotect() is just like mprotect, except it also takes a
protection key as an argument. On systems that do not support
protection keys, it still works, but requires that key=0.
Otherwise it does exactly what mprotect does.
I expect it to get used like this, if you want
From: Dave Hansen
These new syscalls are implemented as generic code, so enable
them for architectures like arm64 which use the generic syscall
table.
According to Arnd:
> Even if the support is x86 specific for the foreseeable
future, it may be good to reserve the number just in
From: Dave Hansen
PKRU is the register that lets you disallow writes or all access
to a given protection key.
The XSAVE hardware defines an "init state" of 0 for PKRU: its
most permissive state, allowing access/writes to everything.
Since we start off all new processes with the init
From: Dave Hansen
This code should be a good demonstration of how to use the new
system calls as well as how to use protection keys in general.
This code shows how to:
1. Manipulate the Protection Keys Rights User (PKRU) register
2. Set a protection key on memory
3. Fetch and/or modify PKRU
From: Dave Hansen
PF_PK means that a memory access violated the protection key
access restrictions. It is unconditionally an access_error()
because the permissions set on the VMA don't matter (the PKRU
value overrides it), and we never "resolve" PK faults (like
how a COW can
From: Dave Hansen
This patch adds two new system calls:
int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
int pkey_free(int pkey);
These implement an "allocator" for the protection keys
themselves, which can be thought of as analogous to the allo
From: Dave Hansen
This is all that we need to get the new system calls themselves
working on x86.
Signed-off-by: Dave Hansen
Cc: linux-...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux...@kvack.org
Cc: x...@kernel.org
Cc: torva...@linux-foundation.org
Cc: a...@linux-foundation.org
On 08/12/2016 05:59 PM, Waiman Long wrote:
> + * The lock and the hpet value are stored together and can be read in a
> + * single atomic 64-bit read. It is explicitly assumed that arch_spinlock_t
> + * is 32 bits in size.
This requirement forces us to give up all of the goodness of lockdep.
Is th
We do not have tracepoints for sys_modify_ldt() because we define
it directly instead of using the normal SYSCALL_DEFINEx() macros.
However, there is a reason sys_modify_ldt() does not use the macros:
it has an 'int' return type instead of 'unsigned long'. This is
a bug, but it's a bug cemented
On 05/26/2017 11:24 AM, h...@zytor.com wrote:
> The only case where that even has any utility is for an application
> to want more than 128 TiB address space on a machine with no more
> than 64 TiB of RAM. It is kind of a narrow use case, I think.
Doesn't more address space increase the effective
From: Dave Hansen
local_flush_tlb() does a CR3 write. But, that kind of TLB flush is
not guaranteed to invalidate global pages. The entire kernel is
mapped with global pages.
Also, now that we have PCIDs, local_flush_tlb() will only flush the
*current* PCID. It would not flush the entries
tl;dr:
KAISER makes it harder to defeat KASLR, but makes syscalls and
interrupts slower. These patches are based on work from a team at
Graz University of Technology posted here[1]. The major addition is
support for Intel PCIDs which builds on top of Andy Lutomorski's PCID
work merged for 4.14.