Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
Documentation/memcontrol.txt | 193 +++
1 file changed, 193 insertions(+)
diff -puN /dev/null Documentation/memcontrol.txt
--- /dev/null	2007-06-01 20:42:04.000000000 +0530
+++ linux-2.6.23-r
parameter which was being filled by add_uevent_var() is now gone.
Balbir Singh
Randy Dunlap wrote:
> On Wed, 22 Aug 2007 18:36:12 +0530 Balbir Singh wrote:
>
>> Documentation/memcontrol.txt | 193 +++
>
> Is there some sub-dir that is appropriate for this, such as
> vm/ or accounting/ or containers/?
, &env->buflen);
>> if (!cp)
>> return -ENODEV;
>>
>> _
>>
>> have done?
>
> Does replacing "&length" with "NULL" work? That's what's in the updated
> patch.
>
Hi Kay,
replacing &length
Paul Menage wrote:
> On 8/22/07, Balbir Singh <[EMAIL PROTECTED]> wrote:
>>
>> Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
>> ---
>>
>> Documentation/memcontrol.txt | 193 +++
>> 1
Kay Sievers wrote:
> On Thu, 2007-08-23 at 00:34 +0530, Balbir Singh wrote:
>> Kay Sievers wrote:
>>>> gargh, sorry, that's probably due to my screwed up attempt to fix Kay's
>>>> screwed up
>>>> gregkh-driver-driver-core-change-add_uevent_
o 4k). The log shows that OOM occurred several times.
Kamalesh, how much memory do you have on the system, and what test were
you running when you hit this problem? Is the problem reproducible? What is
the configured swap size?
Balbir Singh
s into
the documentation (I think it will help developers and users alike).
> Writing all above may be too much :)
>
> I'm sorry if I say something pointless.
>
No.. not at all! Thank you for reading the documentation and commenting
on it.
> Thanks,
> -Kame
>
>
t-of-memory
mem-control-choose-rss-vs-rss-and-pagecache
mem-control-per-container-page-referenced
mem-control-documentation
Balbir Singh
PROTECTED]>
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
include/linux/res_counter.h | 102 +
init/Kconfig                |   7 ++
kernel/Makefile             |   1 +
kernel/res_counter.c        | 120 +++
Changelog
1. Use depends instead of select in init/Kconfig
2. Port to v11
3. Clean up the usage of names (container files) for v11
Set up the memory container and add basic hooks and controls to integrate
and work with the container.
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
instead of using
mem_container_from_cont() along with task_container.
Basic setup routines; the mm_struct has a pointer to the container that
it belongs to and the page has a page_container associated with it.
Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]>
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
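A small sketch of the associations described above; the struct and field
names follow the prose ("mem_container", "page_container") and are
illustrative rather than the exact patch code:

struct mem_container;			/* per-container accounting state */

/* Per-page bookkeeping: records which container charged the page. */
struct page_container {
	struct mem_container *mem_container;
};

/* Conceptually, the patch then adds:
 *   to mm_struct:   struct mem_container *mem_container;
 *   to struct page: struct page_container *page_container;
 */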
Srinivasan <[EMAIL PROTECTED]>
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
include/linux/memcontrol.h |  20 +
mm/filemap.c               |  12 ++-
mm/memcontrol.c            | 166 -
mm/memory.c                |  43 +++
Allow tasks to migrate from one container to the other. We migrate
mm_struct's mem_container only when the thread group id migrates.
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
mm/memcontrol.c | 35 +++
1 file changed, 35 insertions(+)
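A hedged sketch of the migration rule above; thread_group_leader() is a
real kernel predicate, but the function name and body here are
illustrative, not the actual patch code:

static void mem_container_move_task(struct task_struct *p,
				    struct mem_container *to)
{
	/* Moves by threads other than the group leader (the tgid) do not
	 * retarget the shared mm's container. */
	if (!thread_group_leader(p))
		return;

	/* ... point p->mm->mem_container at 'to' and move the charges ... */
}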
Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]>
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
include/linux/memcontrol.h  |  12 +++
include/linux/res_counter.h |  23 +++
include/linux/swap.h        |   3
mm/memcontrol.c             | 135 +
ndling.
Signed-off-by: Pavel Emelianov <[EMAIL PROTECTED]>
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
include/linux/memcontrol.h |  1 +
mm/memcontrol.c            |  1 +
mm/oom_kill.c              | 42 ++
3 files changed, 4
Choose whether we want cached pages to be accounted or not. By default both
are accounted for. A new set of tunables is added.
echo -n 1 > mem_control_type
switches the accounting to account for only mapped pages
echo -n 3 > mem_control_type
switches the behaviour back
Signed-off-by:
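A hedged sketch of how the mem_control_type switch might gate page-cache
accounting; the enum values mirror the echo commands above (1 and 3), but
the names and helper are illustrative, not the actual patch code:

enum mem_control_type {
	MEM_CONTROL_TYPE_RSS = 1,	/* account mapped (RSS) pages only */
	MEM_CONTROL_TYPE_ALL = 3,	/* account RSS and page cache (default) */
};

static inline bool mem_container_accounts_cache(enum mem_control_type t)
{
	/* Page-cache charging is skipped in RSS-only mode. */
	return t == MEM_CONTROL_TYPE_ALL;
}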
when they are not actively referenced from the container that brought
them in.
Signed-off-by: Balbir Singh <[EMAIL PROTECTED]>
---
include/linux/memcontrol.h |  6 ++
include/linux/rmap.h       |  5 +++--
mm/memcontrol.c            |  5 +
mm/rmap.c
Changelog since version 1
1. Wording and punctuation comments - Randy Dunlap
2. Differentiate between RSS and Page Cache - Paul Menage
3. Add detailed description of features - KAMEZAWA Hiroyuki
4. Fix a typo (drop_pages should be drop_caches) - YAMAMOTO Takashi
Signed-off-by: Balbir Singh
Guillaume Chazarain wrote:
> On Mon, 20 Aug 2007 22:31:08 +0530,
> Balbir Singh <[EMAIL PROTECTED]> wrote:
>
>>> --- a/kernel/taskstats.c	Sat Aug 18 17:15:17 2007 -0700
>>> +++ b/kernel/taskstats.c	Sun Aug 19 17:20:15 2007 +0200
>>> @@ -246,6
Currently, we treat the mm as owned by the thread group leader.
But this policy can be easily adapted to any other desired policy.
Would you like to see it change to something else?
Balbir Singh
HEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
> ONE("schedstat", S_IRUGO, proc_pid_schedstat),
> #endif
> #ifdef CONFIG_LATENCYTOP
> --
The change looks reasonable; from what I can understand, you want these
changes so that you can use /proc/<pid>/schedstat instead of the
ne
On Thu, Apr 5, 2018 at 8:56 PM, Anshuman Khandual wrote:
> There are certain platforms which would like to use SWIOTLB based DMA API
> for bouncing purpose without actually requiring an IOMMU back end. But the
> virtio core does not allow such mechanism. Right now DMA MAP API is only
> selected fo
> }
>
> +/**
> + * mm_update_memcg - Update the memory cgroup of a mm_struct
> + * @mm: mm struct
> + * @new: new memory cgroup value
> + *
> + * Called whenever mm->memcg needs to change. Consumes a reference
> + * to new (unless new is NULL). The reference to the old memory
> + * cgroup is decreased.
> + */
> +void mm_update_memcg(struct mm_struct *mm, struct mem_cgroup *new)
> +{
> + /* This is the only place where mm->memcg is changed */
> + struct mem_cgroup *old;
> +
> + old = xchg(&mm->memcg, new);
> + if (old)
> + css_put(&old->css);
> +}
> +
> +static void task_update_memcg(struct task_struct *tsk, struct mem_cgroup *new)
> +{
> + struct mm_struct *mm;
> + task_lock(tsk);
> + mm = tsk->mm;
> + if (mm && !(tsk->flags & PF_KTHREAD))
> + mm_update_memcg(mm, new);
> + task_unlock(tsk);
> +}
> +
> +static void mem_cgroup_attach(struct cgroup_taskset *tset)
> +{
> + struct cgroup_subsys_state *css;
> + struct task_struct *tsk;
> +
> + cgroup_taskset_for_each(tsk, css, tset) {
> + struct mem_cgroup *new = mem_cgroup_from_css(css);
> + css_get(css);
> + task_update_memcg(tsk, new);
I'd have to go back and check, and I think your comment refers to this,
but we don't expect non-tgid tasks to show up here? My concern is that I
can't find the guarantee that task_update_memcg(tsk, new) is not:
1. duplicated for each thread in the process or attached to the mm, or
2. updating mm->memcg to point to different places, so that the one
that sticks is the one that updated things last (see the illustration below).
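A standalone userspace analogy of the second concern: xchg() makes each
individual update safe, but with several threads of one process in the
taskset the shared mm sees multiple updates, and whichever runs last
wins, with each displaced reference handed back for release:

#include <stdatomic.h>
#include <stdio.h>

int main(void)
{
	_Atomic(const char *) memcg = NULL;
	/* Both threads of one process appear in the taskset. */
	const char *updates[] = { "css-A", "css-B" };

	for (int i = 0; i < 2; i++) {
		const char *old = atomic_exchange(&memcg, updates[i]);
		if (old)
			printf("css_put(%s)\n", old);	/* drop the old reference */
	}
	printf("mm->memcg finally points at %s\n", memcg);
	return 0;
}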
Balbir Singh
.c.
>
> use-case 3: There is a relocation in the lp that cannot be automatically
> resolved similarly as 2, but no annotation was provided in the
> livepatch, triggering an error during compilation. Reproducible by
> removing the KLP_MODULE_RELOC / KLP_SYMPOS annotation sections in
> lib/
On Tue, Feb 5, 2019 at 10:24 PM Michael Ellerman wrote:
>
> Balbir Singh writes:
> > On Sat, Feb 2, 2019 at 12:14 PM Balbir Singh wrote:
> >>
> >> On Tue, Jan 22, 2019 at 10:57:21AM -0500, Joe Lawrence wrote:
> >> > From: Nicolai Stange
> >>
e looks good to me as well.
>
> Reviewed-by: Alistair Popple
>
I checked the three callers of set_pte_at_notify and the assumption
seems correct.
Reviewed-by: Balbir Singh
On Wed, Feb 6, 2019 at 3:44 PM Michael Ellerman wrote:
>
> Balbir Singh writes:
> > On Tue, Feb 5, 2019 at 10:24 PM Michael Ellerman
> > wrote:
> >> Balbir Singh writes:
> >> > On Sat, Feb 2, 2019 at 12:14 PM Balbir Singh
> >> > wrote:
&g
> this newly-added memory can be selected by its unique NUMA
> node.
NUMA is a distance-based topology; does HMAT solve these problems?
How do we prevent fallback nodes of normal nodes being pmem nodes?
On an unexpected crash/failure, is there a scrubbing mechanism,
or do we rely on the allocator to do the right thing prior to
reallocating any memory? Will frequent zeroing hurt NVDIMM/pmem
lifetimes?
Balbir Singh.
icts in ioctl numbers,
inability to check the types of the parameters passed in and out makes it
not so good. Not to mention versioning issues; with the genl interface we have
the flexibility to version requests. I would really hate to have two ways to
do the same thing.
The overhead is there; do you consider the overhead of 20ms per 10,000 calls
(roughly 2 microseconds per call) significant?
Does it affect your use case significantly?
Balbir Singh
On Sun, Jan 31, 2021 at 05:16:47PM +0800, Weiping Zhang wrote:
> On Wed, Jan 27, 2021 at 7:13 PM Balbir Singh wrote:
> >
> > On Fri, Jan 22, 2021 at 10:07:50PM +0800, Weiping Zhang wrote:
> > > Hello Balbir Singh,
> > >
> > > Could you help review thi
On Thu, Feb 04, 2021 at 10:37:20PM +0800, Weiping Zhang wrote:
> On Thu, Feb 4, 2021 at 6:20 PM Balbir Singh wrote:
> >
> > On Sun, Jan 31, 2021 at 05:16:47PM +0800, Weiping Zhang wrote:
> > > On Wed, Jan 27, 2021 at 7:13 PM Balbir Singh
> > > wrote:
> >
at boot time, second
by the application
- Rename l1d_flush_out/L1D_FLUSH_OUT to l1d_flush/L1D_FLUSH
- Implement other review recommendations
Changelog v3:
- Implement the SIGBUS mechanism
- Update and fix the documentation
Balbir Singh (5):
x86/smp: Add a per-cpu view of SMT state
x86/mm
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
---
arch/x86/include/asm/processor.h |  2 ++
arch/x86/kernel/smpboot.c        | 10 +-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index
Add documentation of l1d flushing, explain the need for the
feature and how it can be used.
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/l1d_flush.rst | 70 +++
.../admin
ivery).
There is also no seccomp integration for the feature.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
---
arch/Kconfig | 4 ++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/nospec-branch.h | 2 +
arc
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
Link: https://lkml.kernel.org/r/20200510014803.12190-4-sbl...@amazon.com
Link: https://lore.kernel.org/r/20200729001103.6450-3-sbl...@amazon.com
---
arch/x86/include/asm/tlbflush.h | 2 +-
arch/x86/mm/tlb.c | 53
called only when HW assisted
flushing is available.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20200729001103.6450-4-sbl...@amazon.com
---
arch/x86/include/asm/cacheflush.h | 8
arch/x86/include/asm
ycles. I guess not all code can be accumulated under a single
hierarchy. May not be worth the effort, just thinking out loud.
Balbir Singh
he caller
> guarantees that the pointer it is passing does not point to a tail page.
>
Is this part of a larger use case, or general cleanup/refactoring where
the split between page and folio simplifies programming?
Balbir Singh.
. Both 64k and 4k pages work. Running as a KVM host works, but
> nothing in arch/powerpc/kvm is instrumented. It's also potentially a bit
> fragile - if any real mode code paths call out to instrumented code, things
> will go boom.
>
The last time I checked, the changes for real mode made the code hard to
review/maintain. I am happy to see that we've decided to leave that off
the table for now; reviewing the series.
Balbir Singh.
VE_ARCH_KASAN_HW_TAGS
> config HAVE_ARCH_KASAN_VMALLOC
> bool
>
> +config ARCH_DISABLE_KASAN_INLINE
> + def_bool n
> +
Some comments on which architectures want to disable KASAN inline
instrumentation, and why, would be helpful.
Balbir Singh.
On Fri, Mar 19, 2021 at 01:25:27AM +, Matthew Wilcox wrote:
> On Fri, Mar 19, 2021 at 10:56:45AM +1100, Balbir Singh wrote:
> > On Fri, Mar 05, 2021 at 04:18:37AM +, Matthew Wilcox (Oracle) wrote:
> > > A struct folio refers to an entire (possibly compound) page. A fu
ny code that runs with translations off after
> booting. Take this approach for now and require outline instrumentation.
>
> Previous attempts allowed inline instrumentation. However, they came with
> some unfortunate restrictions: only physically contiguous memory could be
> used and
On Thu, Feb 25, 2021 at 09:21:25PM +0800, Muchun Song wrote:
> When we free a HugeTLB page to the buddy allocator, we should allocate
> the vmemmap pages associated with it. But we may not be able to allocate
> vmemmap pages when the system is under memory pressure; in this case, we just
> refuse to free t
ake
> them the liability of jobs in the system that DON'T share the same fs.
>
> But again, this is a useful discussion to have, but I don't quite see
> why it's relevant to Muchun's patches. They're purely an optimization.
>
> So I'd like to clear that up first before going further.
>
I suspect a lot of the issue really is the lack of lockstepping
between a page (unmapped page cache) and the corresponding memcg's
lifecycle. When we delete a memcg, we sort of lose accounting
(depending on the inheriting parent), and ideally we want to bring back
the accounting when the page is reused in a different cgroup (almost
like first touch). I would like to look at the patches and see if they
do solve the issue that leads to zombie cgroups hanging around. In my
experience, the combination of namespaces and the number of cgroups
(several of which could be zombies) does not scale well.
Balbir Singh.
c
> > @@ -0,0 +1,124 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * linux/mm/bootmem_info.c
> > + *
> > + * Copyright (C)
>
> Looks like incomplete
>
My earlier comment was unclear; I should have said:
the copyright notice looks very incomplete.
Balbir Singh.
> > > [ASCII diagram elided: a 2MB HugeTLB page alongside its eight
> > > vmemmap page frames (struct pages 0-7); tail frames 2-7 are
> > > remapped onto frame 1, leaving six frames that can be freed.]
> > > When a HugeTLB is freed to the buddy system, we should allocate 6 pages
> > > for vmemmap pages and restore the previous mapping relationship.
> > >
> >
> > Can these 6 pages come from the HugeTLB page itself? When you say 6 pages,
> > I presume you mean 6 pages of PAGE_SIZE.
>
> There was a decent discussion about this in a previous version of the
> series starting here:
>
> https://lore.kernel.org/linux-mm/20210126092942.GA10602@linux/
>
> In this thread various other options were suggested and discussed.
>
Thanks,
Balbir Singh
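For reference, the arithmetic behind the "6 pages", assuming 4K base
pages and a 64-byte struct page as in the series: a 2MB HugeTLB page
covers 512 base pages, so its vmemmap needs 512 * 64B = 32KB, i.e. 8
page frames; keeping the head frame (0) plus one canonical tail frame
(1) and remapping frames 2-7 onto frame 1 frees the remaining 6 frames,
which is what must be allocated back at free time.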
>
> Signed-off-by: Muchun Song
> Reviewed-by: Oscar Salvador
> Acked-by: Mike Kravetz
> Reviewed-by: Miaohe Lin
> ---
Reviewed-by: Balbir Singh
we start at c00e...
> >> + */
> >> +
> >
> > assuming we have
> > #define VMEMMAP_END R_VMEMMAP_END
> > and ditto for hash we probably need
> >
> > BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
>
> Sorry, I'm not sure what this is supposed to be testing? In what
> situation would this trigger?
>
I am a bit concerned that we have hard-coded (IIRC) 0xa80e... in the
config; any changes to VMEMMAP_END or KASAN_SHADOW_OFFSET/END
should be guarded.
Balbir Singh.
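A sketch of the guard being argued for; R_VMEMMAP_END and the KASAN
constants come from the powerpc series under discussion, and the
placement is illustrative:

/* Fail the build if the hard-coded shadow constants ever drift from
 * the actual memory layout. */
#define VMEMMAP_END	R_VMEMMAP_END

static void __init kasan_verify_layout(void)
{
	BUILD_BUG_ON(VMEMMAP_END + KASAN_SHADOW_OFFSET != KASAN_SHADOW_END);
}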
> +	if (__region_intersects(addr, size, 0, IORES_DESC_NONE) != REGION_DISJOINT)
> 		continue;
>
> - if (dev)
> - res = devm_request_mem_region(dev, addr, size, name);
> - else
> - res = request_mem_region(addr, size, name);
> - if (!res)
> - return ERR_PTR(-ENOMEM);
> +	if (!request_region_locked(&iomem_resource, res, addr,
> +				   size, name, 0))
> + break;
> +
> res->desc = IORES_DESC_DEVICE_PRIVATE_MEMORY;
> + if (dev) {
> + dr->parent = &iomem_resource;
> + dr->start = addr;
> + dr->n = size;
> + devres_add(dev, dr);
> + }
> +
> + write_unlock(&resource_lock);
> return res;
> }
>
> + write_unlock(&resource_lock);
> + free_resource(res);
> +
> return ERR_PTR(-ERANGE);
> }
>
Balbir Singh.
On Mon, Mar 29, 2021 at 12:55:15PM +1100, Alistair Popple wrote:
> On Friday, 26 March 2021 4:15:36 PM AEDT Balbir Singh wrote:
> > On Fri, Mar 26, 2021 at 12:20:35PM +1100, Alistair Popple wrote:
> > > +static int __region_intersects(resource_size_t
ret = VM_FAULT_HWPOISON;
> - delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> + delayacct_clear_flag(current, DELAYACCT_PF_SWAPIN);
> goto out_release;
> }
>
> locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
>
> - delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> + delayacct_clear_flag(current, DELAYACCT_PF_SWAPIN);
> if (!locked) {
> ret |= VM_FAULT_RETRY;
> goto out_release;
Acked-by: Balbir Singh
The changes seem reasonable to me. I don't maintain a git tree; Andrew, can we
please queue them up in your tree?
Balbir Singh.
>
> Signed-off-by: Chunguang Xu
The approach seems to make sense, but the test robot has found
a few issues, can you correct those as applicable please?
Balbir Singh.
f: restrict unknown scalars of mixed signed bounds
> for unprivileged")
> Signed-off-by: Samuel Mendoza-Jonas
> Reviewed-by: Frank van der Linden
> Reviewed-by: Ethan Chen
> ---
Thanks for catching it :)
Reviewed-by: Balbir Singh
To verify the SIGBUS behaviour, there needs to be contention on the CPU
where the task that opts into L1D flushing is running in order to see
the SIGBUS being sent to it (the deterministic part is that the task is
killed whenever there is scope for a data leak).
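A hedged userspace sketch of the opt-in being tested here, using the
speculation-control prctl constants that eventually landed in mainline
(PR_SPEC_L1D_FLUSH); the interface in the series under review may have
differed:

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SPEC_L1D_FLUSH
#define PR_SPEC_L1D_FLUSH	2	/* from newer linux/prctl.h */
#endif

int main(void)
{
	/* Ask for an L1D flush on every context switch away from this task. */
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_L1D_FLUSH,
		  PR_SPEC_ENABLE, 0, 0))
		perror("prctl(PR_SPEC_L1D_FLUSH)");

	/* From here on, being scheduled on an SMT-enabled core is treated
	 * as a potential leak and the task receives SIGBUS. */
	return 0;
}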
Balbir Singh (3):
x86/mm: change l1d flush runtime
Detecting task affinities at API opt-in time is not the best approach;
instead, the task is killed if it runs on an SMT-enabled core. This is
better than not flushing the L1D cache when the task moves from a
non-SMT core to an SMT-enabled core.
Signed-off-by: Balbir Singh
---
To be
Add a label to spec_set_ctrl to remove the build warning.
Signed-off-by: Balbir Singh
---
To be applied on top of tip commit id
767d46ab566dd489733666efe48732d523c8c332
Documentation/admin-guide/hw-vuln/l1d_flush.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a
Update the documentation to mention that a SIGBUS will be sent to tasks
that opt into L1D flushing and execute on SMT-enabled cores.
Signed-off-by: Balbir Singh
---
To be applied on top of tip commit id
767d46ab566dd489733666efe48732d523c8c332
Documentation/admin-guide/hw-vuln/l1d_flush.rst | 8
On Thu, Nov 26, 2020 at 05:26:31PM +0800, Li, Aubrey wrote:
> On 2020/11/26 16:32, Balbir Singh wrote:
> > On Thu, Nov 26, 2020 at 11:20:41AM +0800, Li, Aubrey wrote:
> >> On 2020/11/26 6:57, Balbir Singh wrote:
> >>> On Wed, Nov 25, 2020 at 11:12:53AM +0800, Li, Aubr
set cgroup tag to 0 when the loop is done below.
> */
> while ((p = css_task_iter_next(&it))) {
> - p->core_cookie = !!val ? (unsigned long)tg : 0UL;
> -
> - if (sched_core_enqueued(p)) {
> - sched_core_dequeue(task_rq(p), p);
> - if (!p->core_cookie)
> - continue;
> - }
> -
> - if (sched_core_enabled(task_rq(p)) &&
> - p->core_cookie && task_on_rq_queued(p))
> - sched_core_enqueue(task_rq(p), p);
> + unsigned long cookie = !!val ? (unsigned long)tg : 0UL;
>
> + sched_core_tag_requeue(p, cookie, true /* group */);
> }
> css_task_iter_end(&it);
>
> diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
> index 60a922d3f46f..8c452b8010ad 100644
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -1024,6 +1024,10 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
> __PS("clock-delta", t1-t0);
> }
>
> +#ifdef CONFIG_SCHED_CORE
> + __PS("core_cookie", p->core_cookie);
> +#endif
> +
> sched_show_numa(p, m);
> }
>
Balbir Singh.
hat still live) in CDE? Have the most specific tag live. Same with
> that thread stuff.
>
> All this API stuff here is a complete and utter trainwreck. Please just
> delete the patches and start over. Hint: if you use stop_machine(),
> you're doing it wrong.
>
> At best you now have the requirements sorted.
+1, just remove this patch from the series so as to unblock the series.
Balbir Singh.
than just trace_printk()
Balbir Singh.
as we call
> put_prev_task before calling pick_task_fair. But for coresched, we
> call pick_task_fair on siblings while the task is running and would
> not be able to call put_prev_task. So this refactor of the code fixes
> the crash by explicitly passing curr.
>
> Hope this clarifies..
>
Yes, it does!
Thanks,
Balbir Singh.
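A hedged sketch of the refactor described above: when picking on behalf
of a sibling, rq->curr has not been put back via put_prev_task(), so it
must be considered as a candidate explicitly. Illustrative, not the
exact patch:

static struct task_struct *pick_task_fair(struct rq *rq)
{
	struct cfs_rq *cfs_rq = &rq->cfs;
	struct sched_entity *curr = cfs_rq->curr;
	struct sched_entity *se;

	/* curr is still running here (no put_prev_task() yet), so pass
	 * it in rather than assuming it has been dequeued. */
	se = pick_next_entity(cfs_rq, curr);
	return se ? task_of(se) : NULL;
}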
On Mon, Nov 23, 2020 at 07:31:31AM -0500, Vineeth Pillai wrote:
> Hi Balbir,
>
> On 11/22/20 6:44 AM, Balbir Singh wrote:
> >
> > This seems cumbersome, is there no way to track the min_vruntime via
> > rq->core->min_vruntime?
> Do you mean to have a core w
On Mon, Nov 23, 2020 at 11:07:27PM +0800, Li, Aubrey wrote:
> On 2020/11/23 12:38, Balbir Singh wrote:
> > On Tue, Nov 17, 2020 at 06:19:43PM -0500, Joel Fernandes (Google) wrote:
> >> From: Peter Zijlstra
> >>
> >> When a sibling is forced-idle to match the c
On Fri, Dec 04, 2020 at 11:19:17PM +0100, Thomas Gleixner wrote:
>
> Balbir,
>
> On Fri, Nov 27 2020 at 17:59, Balbir Singh wrote:
> > +enum l1d_flush_out_mitigations {
> > + L1D_FLUSH_OUT_OFF,
> > + L1D_FLUSH_OUT_ON,
> > +};
> > +
On Tue, Nov 24, 2020 at 08:32:01AM +0800, Li, Aubrey wrote:
> On 2020/11/24 7:35, Balbir Singh wrote:
> > On Mon, Nov 23, 2020 at 11:07:27PM +0800, Li, Aubrey wrote:
> >> On 2020/11/23 12:38, Balbir Singh wrote:
> >>> On Tue, Nov 17, 2020 at 06:19:43PM -0500,
On Wed, Nov 25, 2020 at 11:12:53AM +0800, Li, Aubrey wrote:
> On 2020/11/24 23:42, Peter Zijlstra wrote:
> > On Mon, Nov 23, 2020 at 12:36:10PM +0800, Li, Aubrey wrote:
> +#ifdef CONFIG_SCHED_CORE
> +/*
> + * Skip this cpu if source task's cookie does
On Tue, Nov 24, 2020 at 01:30:38PM -0500, Joel Fernandes wrote:
> On Mon, Nov 23, 2020 at 09:41:23AM +1100, Balbir Singh wrote:
> > On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> > > From: Peter Zijlstra
> > >
> > > The rationale
On Tue, Nov 24, 2020 at 10:09:55AM +0100, Peter Zijlstra wrote:
> On Tue, Nov 24, 2020 at 10:31:49AM +1100, Balbir Singh wrote:
> > On Mon, Nov 23, 2020 at 07:31:31AM -0500, Vineeth Pillai wrote:
> > > Hi Balbir,
> > >
> > > On 11/22/20 6:44 AM, Balbir Sing
On Fri, Nov 20, 2020 at 11:58:54AM -0500, Joel Fernandes wrote:
> On Fri, Nov 20, 2020 at 10:56:09AM +1100, Singh, Balbir wrote:
> [..]
> > > +#ifdef CONFIG_SMP
> > > +static struct task_struct *pick_task_fair(struct rq *rq)
> > > +{
> > > + struct cfs_rq *cfs_rq = &rq->cfs;
> > > + struct sched_e
able on SMT (provided you did that
> CONFIG_ thing). Even on AMD systems RT tasks might want to claim the
> core exclusively.
Agreed, specifically if we need to have special cgroup tag/association to
enable it.
Balbir Singh.
On Tue, Nov 24, 2020 at 09:16:17AM +0100, Peter Zijlstra wrote:
> On Sun, Nov 22, 2020 at 08:11:52PM +1100, Balbir Singh wrote:
> > On Tue, Nov 17, 2020 at 06:19:34PM -0500, Joel Fernandes (Google) wrote:
> > > From: Peter Zijlstra
> > >
> > > Introduce the
et;
> +
> + raw_spin_lock(rq_lockp(rq));
> + /*
> + * Core-wide nesting counter can never be 0 because we are
> + * still in it on this CPU.
> + */
> + nest = rq->core->core_unsafe_nest;
> + WARN_ON_ONCE(!nest);
> +
> + WRITE_ONCE(rq->core->core_unsafe_nest, nest - 1);
> + /*
> + * The raw_spin_unlock release semantics pairs with the nest counter's
> + * smp_load_acquire() in sched_core_wait_till_safe().
> + */
> + raw_spin_unlock(rq_lockp(rq));
> +ret:
> + local_irq_restore(flags);
> +}
> +
> // XXX fairness/fwd progress conditions
> /*
> * Returns
> @@ -5497,6 +5737,7 @@ static inline void sched_core_cpu_starting(unsigned int cpu)
> rq = cpu_rq(i);
> if (rq->core && rq->core == rq)
> core_rq = rq;
> + init_sched_core_irq_work(rq);
> }
>
> if (!core_rq)
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 615092cb693c..be6691337bbb 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1074,6 +1074,8 @@ struct rq {
> 	unsigned int		core_enabled;
> 	unsigned int		core_sched_seq;
> 	struct rb_root		core_tree;
> +	struct irq_work		core_irq_work; /* To force HT into kernel */
> +	unsigned int		core_this_unsafe_nest;
>
> 	/* shared state */
> 	unsigned int		core_task_seq;
> @@ -1081,6 +1083,7 @@ struct rq {
> 	unsigned long		core_cookie;
> 	unsigned char		core_forceidle;
> 	unsigned int		core_forceidle_seq;
> +	unsigned int		core_unsafe_nest;
> #endif
> };
>
Balbir Singh.
On Thu, Nov 26, 2020 at 11:20:41AM +0800, Li, Aubrey wrote:
> On 2020/11/26 6:57, Balbir Singh wrote:
> > On Wed, Nov 25, 2020 at 11:12:53AM +0800, Li, Aubrey wrote:
> >> On 2020/11/24 23:42, Peter Zijlstra wrote:
> >>> On Mon, Nov 23, 2020 at 12:36:10PM +0800, Li,
On Thu, Feb 25, 2021 at 09:21:26PM +0800, Muchun Song wrote:
> Because we reuse the first tail vmemmap page frame and remap it
> read-only, we cannot set PageHWPoison on some tail pages.
> So we can use the head[4].private (There are at least 128 struct
> page structures associated with th
On Fri, Feb 05, 2021 at 10:43:02AM +0800, Weiping Zhang wrote:
> On Fri, Feb 5, 2021 at 8:08 AM Balbir Singh wrote:
> >
> > On Thu, Feb 04, 2021 at 10:37:20PM +0800, Weiping Zhang wrote:
> > > On Thu, Feb 4, 2021 at 6:20 PM Balbir Singh wrote:
> > > >
> &
On Mon, Dec 28, 2020 at 10:10:03PM +0800, Weiping Zhang wrote:
> Hi David,
>
> Could you help review this patch ?
>
> thanks
I've got it on my review list, thanks for the ping!
You should hear back from me soon.
Balbir Singh.
>
> On Fri, Dec 18, 2020 at 1:24
On Fri, Jan 22, 2021 at 10:07:50PM +0800, Weiping Zhang wrote:
> Hello Balbir Singh,
>
> Could you help review this patch, thanks
>
> On Mon, Dec 28, 2020 at 10:10 PM Weiping Zhang wrote:
> >
> > Hi David,
> >
> > Could you help review this patch ?
> &g
On Thu, Nov 26, 2020 at 09:29:14AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 26, 2020 at 10:05:19AM +1100, Balbir Singh wrote:
> > > @@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> > >
-data-sampling
[3] https://lkml.org/lkml/2020/6/2/1150
[4] https://lore.kernel.org/lkml/20200729001103.6450-1-sbl...@amazon.com/
[5] https://lore.kernel.org/lkml/20201117234934.25985-2-sbl...@amazon.com/
Changelog v3:
- Implement the SIGBUS mechanism
- Update and fix the documentation
Balbir Singh
ivery).
There is also no seccomp integration for the feature.
Suggested-by: Thomas Gleixner
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
---
arch/Kconfig | 4 +++
arch/x86/Kconfig | 1 +
arch/x86/kernel/cpu/bugs.c
Add documentation of l1d flushing, explain the need for the
feature and how it can be used.
Signed-off-by: Balbir Singh
Signed-off-by: Thomas Gleixner
---
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/l1d_flush.rst | 69 +++
.../admin
Is there a way to run these patches for testing? Bochs emulation or anything
else? I presume you've been testing against violations of CET in user space?
Can you share your testing?
Balbir Singh.
g is
> dynamic based on whether core sched is enabled or not (both statically and
> dynamically).
>
My point was that the word game does not do justice to the change; some
details around how this abstraction helps, given the (re)definition of rq
with coresched, might help.
Balbir Singh.
Is it possible to have some
cores with core sched disabled? I don't see a strong use case for it,
but I am wondering if the design will fall apart if that assumption is
broken?
Balbir Singh
P */
>
> +#ifdef CONFIG_SCHED_CORE
> +static inline bool
> +__entity_slice_used(struct sched_entity *se, int min_nr_tasks)
> +{
> + u64 slice = sched_slice(cfs_rq_of(se), se);
I wonder if the definition of sched_slice() should be revisited for core
scheduling?
Should we use sched_slice = sched_slice / cpumask_weight(smt_mask)?
Would that resolve the issue you're seeing? Effectively, we need to answer
whether two SMT siblings of a core should be treated as executing one large
slice.
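A hedged sketch of that adjustment; sched_slice() and cfs_rq_of() are
existing fair-class helpers, but the division itself is only a
suggestion, not code from the series:

static u64 core_sched_slice(struct sched_entity *se)
{
	u64 slice = sched_slice(cfs_rq_of(se), se);
	const struct cpumask *smt_mask = cpu_smt_mask(smp_processor_id());

	/* Treat the SMT siblings of the core as sharing one large slice. */
	return slice / cpumask_weight(smt_mask);
}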
Balbir Singh.
easier. Further, it may make reverting the improvement easier in
> case the improvement causes any regression.
>
This seems cumbersome, is there no way to track the min_vruntime via
rq->core->min_vruntime?
Balbir Singh.
On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote:
> From: Peter Zijlstra
>
> The rationale is as follows. In the core-wide pick logic, even if
> need_sync == false, we need to go look at other CPUs (non-local CPUs) to
> see if they could be running RT.
>
> Say the RQs in a
if core scheduler is not enabled on the CPU. */
> + if (!sched_core_enabled(rq))
> + return true;
> +
> + for_each_cpu(cpu, cpu_smt_mask(cpu_of(rq))) {
> + if (!available_idle_cpu(cpu)) {
I was looking at this snippet and comparing it to is_core_idle(); the
major difference is the check for vcpu_is_preempted(). Do we want to
call the core non-idle if any vCPU was preempted on this CPU?
> + idle_core = false;
> + break;
> + }
> + }
> +
> + /*
> + * A CPU in an idle core is always the best choice for tasks with
> + * cookies.
> + */
> + return idle_core || rq->core->core_cookie == p->core_cookie;
> +}
> +
Balbir Singh.
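A hedged sketch of the distinction being raised: available_idle_cpu() is
idle_cpu() plus a !vcpu_is_preempted() check, so a core-wide test built
on it calls the core non-idle when any sibling's vCPU is preempted:

static bool core_available_idle(int cpu)
{
	int sibling;

	for_each_cpu(sibling, cpu_smt_mask(cpu))
		if (!available_idle_cpu(sibling))
			return false;	/* busy, or its vCPU was preempted */
	return true;
}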
I presume we are looking at either one or two CPUs
to define the core_occupation, and we expect to match it against the
destination CPU.
Balbir Singh.
d by the series to determine if waiting is
> needed or not, during exit to user or guest mode.
>
> Tested-by: Julien Desfossez
> Reviewed-by: Aubrey Li
> Signed-off-by: Joel Fernandes (Google)
> ---
Acked-by: Balbir Singh
On Mon, Feb 10, 2014 at 04:21:30PM +0530, Gautham R Shenoy wrote:
> On Mon, Feb 10, 2014 at 02:45:55PM +0530, Srivatsa S. Bhat wrote:
>
> + cpuhp_lock_acquire_read();
> mutex_lock(&cpu_hotplug.lock);
Don't you want to abstract cpuhp_lock_acquire_read and mutex_lock into a
more useful pr
ms in patch 1 and for symbol table in patch 2.
> 3. perf probe failure with kretprobe when using kallsyms. This was
> failing as we were specifying an offset. This is fixed in patch 1.
>
> A few examples demonstrating the issues and the fix:
>
Given the choices, I think this makes sense
Acked-by: Balbir Singh
vmap
field using arch_* operations? Not sure
Balbir Singh