Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Yajun Deng



On 2023/10/9 22:28, Steven Rostedt wrote:

On Mon, 9 Oct 2023 18:58:27 +0800
Yajun Deng  wrote:


The C compiler decides to inline or not, depending on various factors.

The most efficient (and small) code is generated by the this_cpu_inc()
version, allowing the compiler to inline it.

If you copy/paste this_cpu_inc() twenty times, then the compiler
would not inline the function anymore.

Yes, if you want something to be visible to ftrace, it must not be inlined
(as inlined functions are not function calls by definition). And as Eric
stated, the compiler is perfectly allowed to inline something if it
believes it will be more efficient. I.e. there may be code around the function
call that could be more efficient if it wasn't changed to parameters. If you
want to make sure a function stays out of line, you must explicitly tell
the compiler you want the function not to ever be inlined (hence the
"noinline" attribute).



Thanks for the details.



Got it. Thank you.

Great.

-- Steve




Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Huang, Kai
On Mon, 2023-10-09 at 20:42 -0500, Haitao Huang wrote:
> Hi Sean
> 
> On Mon, 09 Oct 2023 19:23:04 -0500, Sean Christopherson  
>  wrote:
> 
> > On Mon, Oct 09, 2023, Kai Huang wrote:
> > > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> > > > +/**
> > > > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> > > > + * @lru:   LRU that is low
> > > > + *
> > > > + * Return: %true if a victim was found and kicked.
> > > > + */
> > > > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> > > > +{
> > > > +   struct sgx_epc_page *victim;
> > > > +
> > > > +   spin_lock(&lru->lock);
> > > > +   victim = sgx_oom_get_victim(lru);
> > > > +   spin_unlock(&lru->lock);
> > > > +
> > > > +   if (!victim)
> > > > +   return false;
> > > > +
> > > > +   if (victim->flags & SGX_EPC_OWNER_PAGE)
> > > > +   return sgx_oom_encl_page(victim->encl_page);
> > > > +
> > > > +   if (victim->flags & SGX_EPC_OWNER_ENCL)
> > > > +   return sgx_oom_encl(victim->encl);
> > > 
> > > I hate to bring this up, at least at this stage, but I am wondering why  
> > > we need
> > > to put VA and SECS pages to the unreclaimable list, but cannot keep an
> > > "enclave_list" instead?
> > 
> > The motivation for tracking EPC pages instead of enclaves was so that  
> > the EPC
> > OOM-killer could "kill" VMs as well as host-owned enclaves.  The virtual  
> > EPC code
> > didn't actually kill the VM process, it instead just freed all of the  
> > EPC pages
> > and abused the SGX architecture to effectively make the guest recreate  
> > all its
> > enclaves (IIRC, QEMU does the same thing to "support" live migration).
> > 
> > Looks like y'all punted on that with:
> > 
> >   The EPC pages allocated for KVM guests by the virtual EPC driver are  
> > not
> >   reclaimable by the host kernel [5]. Therefore they are not tracked by  
> > any
> >   LRU lists for reclaiming purposes in this implementation, but they are
> >   charged toward the cgroup of the user process (e.g., QEMU) launching  
> > the
> >   guest.  And when the cgroup EPC usage reaches its limit, the virtual  
> > EPC
> >   driver will stop allocating more EPC for the VM, and return SIGBUS to  
> > the
> >   user process which would abort the VM launch.
> > 
> > which IMO is a hack, unless returning SIGBUS is actually enforced  
> > somehow.  Relying
> > on userspace to be kind enough to kill its VMs kinda defeats the purpose  
> > of cgroup
> > enforcement.  E.g. if the hard limit for an EPC cgroup is lowered,  
> > userspace running
> > enclaves in a VM could continue on and refuse to give up its EPC, and  
> > thus run above
> > its limit in perpetuity.
> > 
> Cgroup would refuse to allocate more when the limit is reached so VMs cannot  
> run above the limit.
> 
> IIRC VMs only support static EPC size right now, reaching the limit at launch  
> means the EPC size given in the command line for QEMU is not appropriate. So  
> the VM should not launch, hence the current behavior.
> 
> [all EPC pages in guest are allocated on page fault caused by the  
> sanitization process in the guest kernel during init, which is part of the VM  
> launch process. So SIGBUS will turn into a failed VM launch.]
> 
> Once it is launched, guest kernel would have 'total capacity' given by the  
> static value from QEMU option. And it would start paging when it is used  
> up, never would ask for more from host.
> 
> For future with dynamic EPC for running guests, QEMU could handle  
> allocation failure and pass SIGBUS to the running guest kernel.  Is that  
> correct understanding?
> 
> 
> > I can see userspace wanting to explicitly terminate the VM instead of  
> > "silently"
> > the VM's enclaves, but that seems like it should be a knob in the  
> > virtual EPC
> > code.
> 
> If my understanding above is correct and I am understanding your statement  
> above correctly, then I don't see that we really need a separate knob for vEPC  
> code. Reaching a cgroup limit by a running guest (assuming dynamic  
> allocation is implemented) should not automatically translate to killing the VM.  
> Instead, it's user space's job to work with the guest to handle allocation  
> failure. The guest could page and kill enclaves.
> 

IIUC Sean was talking about changing misc.max _after_ you launch SGX VMs:

1) misc.max = 100M
2) Launch VMs with total virtual EPC size = 100M  <- success
3) misc.max = 50M

3) will also succeed, but nothing will happen; the VMs will still be holding
100M EPC.

You need to somehow track virtual EPC and kill the VM instead.

(Or somehow fail to do 3), if that is also an acceptable option.)



Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Huang, Kai

> > > > 
> > > Later the hosting process could be migrated/reassigned to another cgroup?
> > > What to do when the new cgroup is OOM?
> > > 
> > 
> > You addressed in the documentation, no?
> > 
> > +Migration
> > +---------
> > +
> > +Once an EPC page is charged to a cgroup (during allocation), it
> > +remains charged to the original cgroup until the page is released
> > +or reclaimed.  Migrating a process to a different cgroup doesn't
> > +move the EPC charges that it incurred while in the previous cgroup
> > +to its new cgroup.
> 
> Should we kill the enclave though because some VA pages may be in the new  
> group?
> 

I guess acceptable?

And is there any difference if you keep VA/SECS on the unreclaimable list? If you migrate one
enclave to another cgroup, the old EPC pages stay in the old cgroup while the
new one is charged to the new group IIUC.

I am not cgroup expert, but by searching some old thread it appears this isn't a
supported model:

https://lore.kernel.org/lkml/yeyr9181qgzt+...@mtj.duckdns.org/



Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Haitao Huang

Hi Sean

On Mon, 09 Oct 2023 19:23:04 -0500, Sean Christopherson  
 wrote:



On Mon, Oct 09, 2023, Kai Huang wrote:

On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> +/**
> + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> + * @lru:  LRU that is low
> + *
> + * Return:%true if a victim was found and kicked.
> + */
> +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> +{
> +  struct sgx_epc_page *victim;
> +
> +  spin_lock(&lru->lock);
> +  victim = sgx_oom_get_victim(lru);
> +  spin_unlock(&lru->lock);
> +
> +  if (!victim)
> +  return false;
> +
> +  if (victim->flags & SGX_EPC_OWNER_PAGE)
> +  return sgx_oom_encl_page(victim->encl_page);
> +
> +  if (victim->flags & SGX_EPC_OWNER_ENCL)
> +  return sgx_oom_encl(victim->encl);

I hate to bring this up, at least at this stage, but I am wondering why  
we need

to put VA and SECS pages to the unreclaimable list, but cannot keep an
"enclave_list" instead?


The motivation for tracking EPC pages instead of enclaves was so that  
the EPC
OOM-killer could "kill" VMs as well as host-owned enclaves.  The virtual  
EPC code
didn't actually kill the VM process, it instead just freed all of the  
EPC pages
and abused the SGX architecture to effectively make the guest recreate  
all its

enclaves (IIRC, QEMU does the same thing to "support" live migration).

Looks like y'all punted on that with:

  The EPC pages allocated for KVM guests by the virtual EPC driver are  
not
  reclaimable by the host kernel [5]. Therefore they are not tracked by  
any

  LRU lists for reclaiming purposes in this implementation, but they are
  charged toward the cgroup of the user process (e.g., QEMU) launching  
the
  guest.  And when the cgroup EPC usage reaches its limit, the virtual  
EPC
  driver will stop allocating more EPC for the VM, and return SIGBUS to  
the

  user process which would abort the VM launch.

which IMO is a hack, unless returning SIGBUS is actually enforced  
somehow.  Relying
on userspace to be kind enough to kill its VMs kinda defeats the purpose  
of cgroup
enforcement.  E.g. if the hard limit for an EPC cgroup is lowered,  
userspace running
enclaves in a VM could continue on and refuse to give up its EPC, and  
thus run above

its limit in perpetuity.

Cgroup would refuse to allocate more when the limit is reached so VMs cannot  
run above the limit.


IIRC VMs only support static EPC size right now, reaching limit at launch  
means the EPC size given in command line for QEMU is not appropriate. So  
VM should not launch, hence the current behavior.


[all EPC pages in guest are allocated on page fault caused by the  
sanitization process in the guest kernel during init, which is part of the VM  
launch process. So SIGBUS will turn into a failed VM launch.]


Once it is launched, guest kernel would have 'total capacity' given by the  
static value from QEMU option. And it would start paging when it is used  
up, never would ask for more from host.


For future with dynamic EPC for running guests, QEMU could handle  
allocation failure and pass SIGBUS to the running guest kernel.  Is that  
correct understanding?



I can see userspace wanting to explicitly terminate the VM instead of  
"silently"
the VM's enclaves, but that seems like it should be a knob in the  
virtual EPC

code.


If my understanding above is correct and I am understanding your statement  
above correctly, then I don't see that we really need a separate knob for vEPC  
code. Reaching a cgroup limit by a running guest (assuming dynamic  
allocation is implemented) should not automatically translate to killing the VM.  
Instead, it's user space's job to work with the guest to handle allocation  
failure. The guest could page and kill enclaves.


Haitao


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Haitao Huang

On Mon, 09 Oct 2023 20:18:00 -0500, Huang, Kai  wrote:


On Mon, 2023-10-09 at 20:04 -0500, Haitao Huang wrote:
On Mon, 09 Oct 2023 18:45:06 -0500, Huang, Kai   
wrote:


> On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> > From: Sean Christopherson 
> >
> > Introduce the OOM path for killing an enclave with a reclaimer that  
is

> > no
> > longer able to reclaim enough EPC pages. Find a victim enclave,  
which

> > will be an enclave with only "unreclaimable" EPC pages left in the
> > cgroup LRU lists. Once a victim is identified, mark the enclave as  
OOM
> > and zap the enclave's entire page range, and drain all mm  
references in

> > encl->mm_list. Block allocating any EPC pages in #PF handler, or
> > reloading any pages in all paths, or creating any new mappings.
> >
> > The OOM killing path may race with the reclaimers: in some cases,  
the
> > victim enclave is in the process of reclaiming the last EPC pages  
when

> > OOM happens, that is, all pages other than SECS and VA pages are in
> > RECLAIMING_IN_PROGRESS state. The reclaiming process requires  
access to
> > the enclave backing, VA pages as well as SECS. So the OOM killer  
does

> > not directly release those enclave resources, instead, it lets all
> > reclaiming in progress to finish, and relies (as currently done) on
> > kref_put on encl->refcount to trigger sgx_encl_release() to do the
> > final cleanup.
> >
> > Signed-off-by: Sean Christopherson 
> > Co-developed-by: Kristen Carlson Accardi 
> > Signed-off-by: Kristen Carlson Accardi 
> > Co-developed-by: Haitao Huang 
> > Signed-off-by: Haitao Huang 
> > Cc: Sean Christopherson 
> > ---
> > V5:
> > - Rename SGX_ENCL_OOM to SGX_ENCL_NO_MEMORY
> >
> > V4:
> > - Updates for patch reordering and typo fixes.
> >
> > V3:
> > - Rebased to use the new VMA_ITERATOR to zap VMAs.
> > - Fixed the racing cases by blocking new page allocation/mapping and
> > reloading when enclave is marked for OOM. And do not release any  
enclave

> > resources other than draining mm_list entries, and let pages in
> > RECLAIMING_IN_PROGRESS to be reaped by reclaimers.
> > - Due to above changes, also removed the no-longer needed  
encl->lock in
> > the OOM path which was causing deadlocks reported by the lock  
prover.

> >
>
> [...]
>
> > +
> > +/**
> > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> > + * @lru: LRU that is low
> > + *
> > + * Return:   %true if a victim was found and kicked.
> > + */
> > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> > +{
> > + struct sgx_epc_page *victim;
> > +
> > > + spin_lock(&lru->lock);
> > > + victim = sgx_oom_get_victim(lru);
> > > + spin_unlock(&lru->lock);
> > +
> > + if (!victim)
> > + return false;
> > +
> > + if (victim->flags & SGX_EPC_OWNER_PAGE)
> > + return sgx_oom_encl_page(victim->encl_page);
> > +
> > + if (victim->flags & SGX_EPC_OWNER_ENCL)
> > + return sgx_oom_encl(victim->encl);
>
> I hate to bring this up, at least at this stage, but I am wondering  
why

> we need
> to put VA and SECS pages to the unreclaimable list, but cannot keep an
> "enclave_list" instead?
>
> So by looking the patch (" x86/sgx: Limit process EPC usage with misc
> cgroup
> controller"), if I am not missing anything, the whole "unreclaimable"
> list is
> just used to find the victim enclave when OOM needs to be done.   
Thus, I

> don't
> see why "enclave_list" cannot be used to achieve this.
>
> The reason that I am asking is because it seems using "enclave_list"  
we

> can
> simplify the code.  At least the patches related to track VA/SECS  
pages,

> and the
> SGX_EPC_OWNER_PAGE/SGX_EPC_OWNER_ENCL thing can be eliminated
> completely.
> Using "enclave_list", I guess you just need to put the enclave to the
> current
> EPC cgroup when SECS page is allocated.
>
Later the hosting process could be migrated/reassigned to another cgroup?
What to do when the new cgroup is OOM?



You addressed in the documentation, no?

+Migration
+---------
+
+Once an EPC page is charged to a cgroup (during allocation), it
+remains charged to the original cgroup until the page is released
+or reclaimed.  Migrating a process to a different cgroup doesn't
+move the EPC charges that it incurred while in the previous cgroup
+to its new cgroup.


Should we kill the enclave though because some VA pages may be in the new  
group?


Haitao


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Huang, Kai
On Tue, 2023-10-10 at 00:50 +, Huang, Kai wrote:
> On Mon, 2023-10-09 at 17:23 -0700, Sean Christopherson wrote:
> > On Mon, Oct 09, 2023, Kai Huang wrote:
> > > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> > > > +/**
> > > > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> > > > + * @lru:   LRU that is low
> > > > + *
> > > > + * Return: %true if a victim was found and kicked.
> > > > + */
> > > > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> > > > +{
> > > > +   struct sgx_epc_page *victim;
> > > > +
> > > > +   spin_lock(&lru->lock);
> > > > +   victim = sgx_oom_get_victim(lru);
> > > > +   spin_unlock(&lru->lock);
> > > > +
> > > > +   if (!victim)
> > > > +   return false;
> > > > +
> > > > +   if (victim->flags & SGX_EPC_OWNER_PAGE)
> > > > +   return sgx_oom_encl_page(victim->encl_page);
> > > > +
> > > > +   if (victim->flags & SGX_EPC_OWNER_ENCL)
> > > > +   return sgx_oom_encl(victim->encl);
> > > 
> > > I hate to bring this up, at least at this stage, but I am wondering why 
> > > we need
> > > to put VA and SECS pages to the unreclaimable list, but cannot keep an
> > > "enclave_list" instead?
> > 
> > The motivation for tracking EPC pages instead of enclaves was so that the 
> > EPC
> > OOM-killer could "kill" VMs as well as host-owned enclaves.  
> > 
> 
> Ah this seems a fair argument. :-)
> 
> > The virtual EPC code
> > didn't actually kill the VM process, it instead just freed all of the EPC 
> > pages
> > and abused the SGX architecture to effectively make the guest recreate all 
> > its
> > enclaves (IIRC, QEMU does the same thing to "support" live migration).
> 
> It returns SIGBUS.  SGX VM live migration also requires enough EPC being able 
> to
> be allocated on the destination machine to work AFAICT.
>  
> > 
> > Looks like y'all punted on that with:
> > 
> >   The EPC pages allocated for KVM guests by the virtual EPC driver are not
> >   reclaimable by the host kernel [5]. Therefore they are not tracked by any
> >   LRU lists for reclaiming purposes in this implementation, but they are
> >   charged toward the cgroup of the user process (e.g., QEMU) launching the
> >   guest.  And when the cgroup EPC usage reaches its limit, the virtual EPC
> >   driver will stop allocating more EPC for the VM, and return SIGBUS to the
> >   user process which would abort the VM launch.
> > 
> > which IMO is a hack, unless returning SIGBUS is actually enforced somehow.  
> > 
> 
> "enforced" do you mean?
> 
> Currently the sgx_vepc_fault() returns VM_FAULT_SIGBUS when it cannot allocate
> EPC page.  And when this happens, KVM returns KVM_PFN_ERR_FAULT in 
> hva_to_pfn(),
> which eventually results in KVM returning -EFAULT to userspace in vcpu_run(). 
> And Qemu then kills the VM with some nonsense message:
> 
> error: kvm run failed Bad address
> 
> 
> > Relying
> > on userspace to be kind enough to kill its VMs kinda defeats the purpose of 
> > cgroup
> > enforcement.  E.g. if the hard limit for an EPC cgroup is lowered, userspace 
> > running
> > enclaves in a VM could continue on and refuse to give up its EPC, and thus 
> > run above
> > its limit in perpetuity.
> 
> > 
> > I can see userspace wanting to explicitly terminate the VM instead of 
> > "silently"
> > the VM's enclaves, but that seems like it should be a knob in the virtual 
> > EPC
> > code.

I guess I slightly misunderstood your words.

You mean we want to kill the VM when the limit is set to be lower than the
virtual EPC size.

This patch adds SGX_ENCL_NO_MEMORY.  I guess we can use it for virtual EPC too?

In sgx_vepc_fault(), we check this flag early and return SIGBUS if
it is set.

But this also requires keeping virtual EPC pages in some list, and handles them
in sgx_epc_oom() too.

And for virtual EPC pages, I guess the "young" logic can be applied thus
probably it's better to keep the actual virtual EPC pages to a (separate?) list
instead of keeping the virtual EPC instance.

struct sgx_epc_lru {
	struct list_head reclaimable;
	struct sgx_encl *enclaves;
	struct list_head vepc_pages;
};

Or still tracking VA/SECS and virtual EPC pages in a single unreclaimable list?

I don't know :-)
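
For what it's worth, here is a self-contained model of the early-flag-check
idea above.  SGX_ENCL_NO_MEMORY is the flag name from the patch (its bit value
here is arbitrary); the vepc struct, vepc_fault() and the fault-result enum
are simplified stand-ins, not the real arch/x86/kernel/cpu/sgx/virt.c API:

#include <stdio.h>

#define SGX_ENCL_NO_MEMORY	(1UL << 0)	/* flag name from the patch; value arbitrary */

struct vepc {				/* stand-in for the vEPC instance */
	unsigned long flags;
};

enum fault_result { FAULT_OK, FAULT_SIGBUS };

static enum fault_result vepc_fault(struct vepc *vepc)
{
	/* Bail out early once the OOM path has marked this instance,
	 * instead of relying on userspace to tear the VM down. */
	if (vepc->flags & SGX_ENCL_NO_MEMORY)
		return FAULT_SIGBUS;

	/* ... otherwise allocate and map the backing EPC page ... */
	return FAULT_OK;
}

int main(void)
{
	struct vepc v = { .flags = SGX_ENCL_NO_MEMORY };

	printf("fault -> %s\n",
	       vepc_fault(&v) == FAULT_SIGBUS ? "SIGBUS" : "ok");
	return 0;
}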


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Huang, Kai
On Mon, 2023-10-09 at 20:04 -0500, Haitao Huang wrote:
> On Mon, 09 Oct 2023 18:45:06 -0500, Huang, Kai  wrote:
> 
> > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> > > From: Sean Christopherson 
> > > 
> > > Introduce the OOM path for killing an enclave with a reclaimer that is  
> > > no
> > > longer able to reclaim enough EPC pages. Find a victim enclave, which
> > > will be an enclave with only "unreclaimable" EPC pages left in the
> > > cgroup LRU lists. Once a victim is identified, mark the enclave as OOM
> > > and zap the enclave's entire page range, and drain all mm references in
> > > encl->mm_list. Block allocating any EPC pages in #PF handler, or
> > > reloading any pages in all paths, or creating any new mappings.
> > > 
> > > The OOM killing path may race with the reclaimers: in some cases, the
> > > victim enclave is in the process of reclaiming the last EPC pages when
> > > OOM happens, that is, all pages other than SECS and VA pages are in
> > > RECLAIMING_IN_PROGRESS state. The reclaiming process requires access to
> > > the enclave backing, VA pages as well as SECS. So the OOM killer does
> > > not directly release those enclave resources, instead, it lets all
> > > reclaiming in progress to finish, and relies (as currently done) on
> > > kref_put on encl->refcount to trigger sgx_encl_release() to do the
> > > final cleanup.
> > > 
> > > Signed-off-by: Sean Christopherson 
> > > Co-developed-by: Kristen Carlson Accardi 
> > > Signed-off-by: Kristen Carlson Accardi 
> > > Co-developed-by: Haitao Huang 
> > > Signed-off-by: Haitao Huang 
> > > Cc: Sean Christopherson 
> > > ---
> > > V5:
> > > - Rename SGX_ENCL_OOM to SGX_ENCL_NO_MEMORY
> > > 
> > > V4:
> > > - Updates for patch reordering and typo fixes.
> > > 
> > > V3:
> > > - Rebased to use the new VMA_ITERATOR to zap VMAs.
> > > - Fixed the racing cases by blocking new page allocation/mapping and
> > > reloading when enclave is marked for OOM. And do not release any enclave
> > > resources other than draining mm_list entries, and let pages in
> > > RECLAIMING_IN_PROGRESS to be reaped by reclaimers.
> > > - Due to above changes, also removed the no-longer needed encl->lock in
> > > the OOM path which was causing deadlocks reported by the lock prover.
> > > 
> > 
> > [...]
> > 
> > > +
> > > +/**
> > > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> > > + * @lru: LRU that is low
> > > + *
> > > + * Return:   %true if a victim was found and kicked.
> > > + */
> > > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> > > +{
> > > + struct sgx_epc_page *victim;
> > > +
> > > + spin_lock(&lru->lock);
> > > + victim = sgx_oom_get_victim(lru);
> > > + spin_unlock(&lru->lock);
> > > +
> > > + if (!victim)
> > > + return false;
> > > +
> > > + if (victim->flags & SGX_EPC_OWNER_PAGE)
> > > + return sgx_oom_encl_page(victim->encl_page);
> > > +
> > > + if (victim->flags & SGX_EPC_OWNER_ENCL)
> > > + return sgx_oom_encl(victim->encl);
> > 
> > I hate to bring this up, at least at this stage, but I am wondering why  
> > we need
> > to put VA and SECS pages to the unreclaimable list, but cannot keep an
> > "enclave_list" instead?
> > 
> > So by looking the patch (" x86/sgx: Limit process EPC usage with misc  
> > cgroup
> > controller"), if I am not missing anything, the whole "unreclaimable"  
> > list is
> > just used to find the victim enclave when OOM needs to be done.  Thus, I  
> > don't
> > see why "enclave_list" cannot be used to achieve this.
> > 
> > The reason that I am asking is because it seems using "enclave_list" we  
> > can
> > simplify the code.  At least the patches related to track VA/SECS pages,  
> > and the
> > SGX_EPC_OWNER_PAGE/SGX_EPC_OWNER_ENCL thing can be eliminated  
> > completely.  
> > Using "enclave_list", I guess you just need to put the enclave to the  
> > current
> > EPC cgroup when SECS page is allocated.
> > 
> Later the hosting process could be migrated/reassigned to another cgroup?
> What to do when the new cgroup is OOM?
> 

You addressed in the documentation, no?

+Migration
+---------
+
+Once an EPC page is charged to a cgroup (during allocation), it
+remains charged to the original cgroup until the page is released
+or reclaimed.  Migrating a process to a different cgroup doesn't
+move the EPC charges that it incurred while in the previous cgroup
+to its new cgroup.


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Haitao Huang

On Mon, 09 Oct 2023 18:45:06 -0500, Huang, Kai  wrote:


On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:

From: Sean Christopherson 

Introduce the OOM path for killing an enclave with a reclaimer that is  
no

longer able to reclaim enough EPC pages. Find a victim enclave, which
will be an enclave with only "unreclaimable" EPC pages left in the
cgroup LRU lists. Once a victim is identified, mark the enclave as OOM
and zap the enclave's entire page range, and drain all mm references in
encl->mm_list. Block allocating any EPC pages in #PF handler, or
reloading any pages in all paths, or creating any new mappings.

The OOM killing path may race with the reclaimers: in some cases, the
victim enclave is in the process of reclaiming the last EPC pages when
OOM happens, that is, all pages other than SECS and VA pages are in
RECLAIMING_IN_PROGRESS state. The reclaiming process requires access to
the enclave backing, VA pages as well as SECS. So the OOM killer does
not directly release those enclave resources, instead, it lets all
reclaiming in progress to finish, and relies (as currently done) on
kref_put on encl->refcount to trigger sgx_encl_release() to do the
final cleanup.

Signed-off-by: Sean Christopherson 
Co-developed-by: Kristen Carlson Accardi 
Signed-off-by: Kristen Carlson Accardi 
Co-developed-by: Haitao Huang 
Signed-off-by: Haitao Huang 
Cc: Sean Christopherson 
---
V5:
- Rename SGX_ENCL_OOM to SGX_ENCL_NO_MEMORY

V4:
- Updates for patch reordering and typo fixes.

V3:
- Rebased to use the new VMA_ITERATOR to zap VMAs.
- Fixed the racing cases by blocking new page allocation/mapping and
reloading when enclave is marked for OOM. And do not release any enclave
resources other than draining mm_list entries, and let pages in
RECLAIMING_IN_PROGRESS to be reaped by reclaimers.
- Due to above changes, also removed the no-longer needed encl->lock in
the OOM path which was causing deadlocks reported by the lock prover.



[...]


+
+/**
+ * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
+ * @lru:   LRU that is low
+ *
+ * Return: %true if a victim was found and kicked.
+ */
+bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
+{
+   struct sgx_epc_page *victim;
+
+   spin_lock(&lru->lock);
+   victim = sgx_oom_get_victim(lru);
+   spin_unlock(&lru->lock);
+
+   if (!victim)
+   return false;
+
+   if (victim->flags & SGX_EPC_OWNER_PAGE)
+   return sgx_oom_encl_page(victim->encl_page);
+
+   if (victim->flags & SGX_EPC_OWNER_ENCL)
+   return sgx_oom_encl(victim->encl);


I hate to bring this up, at least at this stage, but I am wondering why  
we need

to put VA and SECS pages to the unreclaimable list, but cannot keep an
"enclave_list" instead?

So by looking the patch (" x86/sgx: Limit process EPC usage with misc  
cgroup
controller"), if I am not missing anything, the whole "unreclaimable"  
list is
just used to find the victim enclave when OOM needs to be done.  Thus, I  
don't

see why "enclave_list" cannot be used to achieve this.

The reason that I am asking is because it seems using "enclave_list" we  
can
simplify the code.  At least the patches related to track VA/SECS pages,  
and the
SGX_EPC_OWNER_PAGE/SGX_EPC_OWNER_ENCL thing can be eliminated  
completely.  
Using "enclave_list", I guess you just need to put the enclave to the  
current

EPC cgroup when SECS page is allocated.


Later the hosting process could be migrated/reassigned to another cgroup?
What to do when the new cgroup is OOM?

Thanks
Haitao


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Huang, Kai
On Mon, 2023-10-09 at 17:23 -0700, Sean Christopherson wrote:
> On Mon, Oct 09, 2023, Kai Huang wrote:
> > On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> > > +/**
> > > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> > > + * @lru: LRU that is low
> > > + *
> > > + * Return:   %true if a victim was found and kicked.
> > > + */
> > > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> > > +{
> > > + struct sgx_epc_page *victim;
> > > +
> > > + spin_lock(&lru->lock);
> > > + victim = sgx_oom_get_victim(lru);
> > > + spin_unlock(&lru->lock);
> > > +
> > > + if (!victim)
> > > + return false;
> > > +
> > > + if (victim->flags & SGX_EPC_OWNER_PAGE)
> > > + return sgx_oom_encl_page(victim->encl_page);
> > > +
> > > + if (victim->flags & SGX_EPC_OWNER_ENCL)
> > > + return sgx_oom_encl(victim->encl);
> > 
> > I hate to bring this up, at least at this stage, but I am wondering why we 
> > need
> > to put VA and SECS pages to the unreclaimable list, but cannot keep an
> > "enclave_list" instead?
> 
> The motivation for tracking EPC pages instead of enclaves was so that the EPC
> OOM-killer could "kill" VMs as well as host-owned enclaves.  
> 

Ah this seems a fair argument. :-)

> The virtual EPC code
> didn't actually kill the VM process, it instead just freed all of the EPC 
> pages
> and abused the SGX architecture to effectively make the guest recreate all its
> enclaves (IIRC, QEMU does the same thing to "support" live migration).

It returns SIGBUS.  SGX VM live migration also requires enough EPC being able to
be allocated on the destination machine to work AFAICT.
 
> 
> Looks like y'all punted on that with:
> 
>   The EPC pages allocated for KVM guests by the virtual EPC driver are not
>   reclaimable by the host kernel [5]. Therefore they are not tracked by any
>   LRU lists for reclaiming purposes in this implementation, but they are
>   charged toward the cgroup of the user process (e.g., QEMU) launching the
>   guest.  And when the cgroup EPC usage reaches its limit, the virtual EPC
>   driver will stop allocating more EPC for the VM, and return SIGBUS to the
>   user process which would abort the VM launch.
> 
> which IMO is a hack, unless returning SIGBUS is actually enforced somehow.  
> 

"enforced" do you mean?

Currently the sgx_vepc_fault() returns VM_FAULT_SIGBUS when it cannot allocate
EPC page.  And when this happens, KVM returns KVM_PFN_ERR_FAULT in hva_to_pfn(),
which eventually results in KVM returning -EFAULT to userspace in vcpu_run(). 
And Qemu then kills the VM with some nonsense message:

error: kvm run failed Bad address


> Relying
> on userspace to be kind enough to kill its VMs kinda defeats the purpose of 
> cgroup
> enforcement.  E.g. if the hard limit for an EPC cgroup is lowered, userspace 
> running
> enclaves in a VM could continue on and refuse to give up its EPC, and thus 
> run above
> its limit in perpetuity.

Agreed.  But it looks like this cannot be resolved until we can reclaim EPC
pages from the VM.

Or in the EPC cgroup code we can refuse to set a maximum which cannot be
supported, e.g., less than the total virtual EPC size.

I guess the second is acceptable for now?

> 
> I can see userspace wanting to explicitly terminate the VM instead of 
> "silently"
> the VM's enclaves, but that seems like it should be a knob in the virtual EPC
> code.

See above for the second option.



Re: [PATCH 01/10] appletalk: remove localtalk and ppp support

2023-10-09 Thread Jakub Kicinski
On Mon,  9 Oct 2023 16:18:59 +0200 Arnd Bergmann wrote:
> The last localtalk driver is gone now, and ppp support was never fully
> merged, so clean up the appletalk code by removing the obvious dead
> code paths.
> 
> Notably, this removes one of the two callers of the old .ndo_do_ioctl()
> callback that was abused for getting device addresses and is now
> only used in the ieee802154 subsystem, which still uses the same trick.
> 
> The include/uapi/linux/if_ltalk.h header might still be required
> for building userspace programs, but I made sure that debian code
> search and the netatalk upstream have no references to it, so it
> should be fine to remove.

Looks like it depends on the ipddp driver removal.
Could you repost once that one is merged (~tomorrow)?
-- 
pw-bot: cr


Re: [PATCH v5 16/18] x86/sgx: Limit process EPC usage with misc cgroup controller

2023-10-09 Thread Huang, Kai

> @@ -332,6 +336,7 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, 
> size_t nr_to_scan,
>   * sgx_reclaim_epc_pages() - Reclaim EPC pages from the consumers
>   * @nr_to_scan:   Number of EPC pages to scan for reclaim
>   * @ignore_age:   Reclaim a page even if it is young
> + * @epc_cg:   EPC cgroup from which to reclaim
>   *
>   * Take a fixed number of pages from the head of the active page pool and
>   * reclaim them to the enclave's private shmem files. Skip the pages, which 
> have
> @@ -345,7 +350,8 @@ void sgx_isolate_epc_pages(struct sgx_epc_lru_lists *lru, 
> size_t nr_to_scan,
>   * problematic as it would increase the lock contention too much, which would
>   * halt forward progress.
>   */
> -size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age)
> +size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool ignore_age,
> +  struct sgx_epc_cgroup *epc_cg)
>  {
>   struct sgx_backing backing[SGX_NR_TO_SCAN_MAX];
>   struct sgx_epc_page *epc_page, *tmp;
> @@ -355,7 +361,15 @@ size_t sgx_reclaim_epc_pages(size_t nr_to_scan, bool 
> ignore_age)
>   LIST_HEAD(iso);
>   size_t ret, i;
>  
> - sgx_isolate_epc_pages(&sgx_global_lru, nr_to_scan, &iso);
> + /*
> +  * If a specific cgroup is not being targeted, take from the global
> +  * list first, even when cgroups are enabled.  If there are
> +  * pages on the global LRU then they should get reclaimed asap.
> +  */
> + if (!IS_ENABLED(CONFIG_CGROUP_SGX_EPC) || !epc_cg)
> + sgx_isolate_epc_pages(&sgx_global_lru, &nr_to_scan, &iso);
> +
> + sgx_epc_cgroup_isolate_pages(epc_cg, &nr_to_scan, &iso);

(I wish such code can be somehow moved to the earlier patches, so that we can
get early idea that how sgx_reclaim_epc_pages() is supposed to be used.)

So here when we are not targeting a specific EPC cgroup, we always reclaim from
the global list first, ...

[...]

>  
>   if (list_empty(&iso))
>   return 0;
> @@ -423,7 +437,7 @@ static bool sgx_should_reclaim(unsigned long watermark)
>  void sgx_reclaim_direct(void)
>  {
>   if (sgx_should_reclaim(SGX_NR_LOW_PAGES))
> - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
> + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);

... and we always try to reclaim from the global list first when direct reclaim is
desired, even if the enclave is within some EPC cgroup.  ... 

>  }
>  
>  static int ksgxd(void *p)
> @@ -446,7 +460,7 @@ static int ksgxd(void *p)
>sgx_should_reclaim(SGX_NR_HIGH_PAGES));
>  
>   if (sgx_should_reclaim(SGX_NR_HIGH_PAGES))
> - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
> + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);

... and in ksgxd() as well, which I guess is somehow acceptable.  ...

>  
>   cond_resched();
>   }
> @@ -600,6 +614,11 @@ int sgx_drop_epc_page(struct sgx_epc_page *page)
>  struct sgx_epc_page *sgx_alloc_epc_page(void *owner, bool reclaim)
>  {
>   struct sgx_epc_page *page;
> + struct sgx_epc_cgroup *epc_cg;
> +
> + epc_cg = sgx_epc_cgroup_try_charge(reclaim);
> + if (IS_ERR(epc_cg))
> + return ERR_CAST(epc_cg);
>  
>   for ( ; ; ) {
>   page = __sgx_alloc_epc_page();
> @@ -608,8 +627,10 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, 
> bool reclaim)
>   break;
>   }
>  
> - if (!sgx_can_reclaim())
> - return ERR_PTR(-ENOMEM);
> + if (!sgx_can_reclaim()) {
> + page = ERR_PTR(-ENOMEM);
> + break;
> + }
>  
>   if (!reclaim) {
>   page = ERR_PTR(-EBUSY);
> @@ -621,10 +642,17 @@ struct sgx_epc_page *sgx_alloc_epc_page(void *owner, 
> bool reclaim)
>   break;
>   }
>  
> - sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false);
> + sgx_reclaim_epc_pages(SGX_NR_TO_SCAN, false, NULL);

... and when an EPC page is allocated, no matter whether the EPC page belongs to
any cgroup or not.

When we are allocating EPC page for one enclave, if that enclave belongs to some
> cgroup, is it more reasonable to reclaim EPC pages from its own group (and the
children under it)?

You already got the current EPC cgroup at the beginning of sgx_alloc_epc_page()
when you want to charge the EPC allocation.

>   cond_resched();
>   }
>  
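
A rough, self-contained model of the ordering question above (prefer the
allocating enclave's own cgroup, fall back to the global LRU).  The types,
counters and helper names are illustrative stand-ins only, not the sgx_*
implementation:

#include <stdbool.h>
#include <stdio.h>

struct epc_cgroup {
	const char *name;
	unsigned int reclaimable;	/* pages on this cgroup's LRU */
};

static struct epc_cgroup global_lru = { "global", 4 };

/* Pretend to reclaim up to nr pages from one LRU; return how many we got. */
static unsigned int reclaim_from(struct epc_cgroup *cg, unsigned int nr)
{
	unsigned int got = cg->reclaimable < nr ? cg->reclaimable : nr;

	cg->reclaimable -= got;
	printf("reclaimed %u page(s) from %s\n", got, cg->name);
	return got;
}

/*
 * Alternative ordering raised in the review: try the allocating enclave's
 * own cgroup first, and only then fall back to the global list.
 */
static bool reclaim_for_alloc(struct epc_cgroup *own, unsigned int nr)
{
	unsigned int got = 0;

	if (own)
		got += reclaim_from(own, nr);
	if (got < nr)
		got += reclaim_from(&global_lru, nr - got);
	return got > 0;
}

int main(void)
{
	struct epc_cgroup qemu_cg = { "qemu.slice", 2 };

	reclaim_for_alloc(&qemu_cg, 3);	/* 2 from own cgroup, 1 from global */
	return 0;
}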


Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Sean Christopherson
On Mon, Oct 09, 2023, Kai Huang wrote:
> On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> > +/**
> > + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> > + * @lru:   LRU that is low
> > + *
> > + * Return: %true if a victim was found and kicked.
> > + */
> > +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> > +{
> > +   struct sgx_epc_page *victim;
> > +
> > +   spin_lock(&lru->lock);
> > +   victim = sgx_oom_get_victim(lru);
> > +   spin_unlock(&lru->lock);
> > +
> > +   if (!victim)
> > +   return false;
> > +
> > +   if (victim->flags & SGX_EPC_OWNER_PAGE)
> > +   return sgx_oom_encl_page(victim->encl_page);
> > +
> > +   if (victim->flags & SGX_EPC_OWNER_ENCL)
> > +   return sgx_oom_encl(victim->encl);
> 
> I hate to bring this up, at least at this stage, but I am wondering why we 
> need
> to put VA and SECS pages to the unreclaimable list, but cannot keep an
> "enclave_list" instead?

The motivation for tracking EPC pages instead of enclaves was so that the EPC
OOM-killer could "kill" VMs as well as host-owned enclaves.  The virtual EPC 
code
didn't actually kill the VM process, it instead just freed all of the EPC pages
and abused the SGX architecture to effectively make the guest recreate all its
enclaves (IIRC, QEMU does the same thing to "support" live migration).

Looks like y'all punted on that with:

  The EPC pages allocated for KVM guests by the virtual EPC driver are not
  reclaimable by the host kernel [5]. Therefore they are not tracked by any
  LRU lists for reclaiming purposes in this implementation, but they are
  charged toward the cgroup of the user process (e.g., QEMU) launching the
  guest.  And when the cgroup EPC usage reaches its limit, the virtual EPC
  driver will stop allocating more EPC for the VM, and return SIGBUS to the
  user process which would abort the VM launch.

which IMO is a hack, unless returning SIGBUS is actually enforced somehow.  
Relying
on userspace to be kind enough to kill its VMs kinda defeats the purpose of 
cgroup
enforcement.  E.g. if the hard limit for an EPC cgroup is lowered, userspace 
running
enclaves in a VM could continue on and refuse to give up its EPC, and thus run 
above
its limit in perpetuity.

I can see userspace wanting to explicitly terminate the VM instead of "silently" killing
the VM's enclaves, but that seems like it should be a knob in the virtual EPC
code.


Re: [PATCH v5 16/18] x86/sgx: Limit process EPC usage with misc cgroup controller

2023-10-09 Thread Huang, Kai

> +static inline struct sgx_epc_lru_lists *epc_cg_lru(struct sgx_epc_cgroup 
> *epc_cg)
> +{
> + if (epc_cg)
> + return &epc_cg->lru;
> + return NULL;
> +}
> 

It's legal to return NULL EPC cgroup for a given EPC page, i.e., when the
enclave isn't assigned to any cgroup.  But ...

>  
>  static inline struct sgx_epc_lru_lists *sgx_lru_lists(struct sgx_epc_page 
> *epc_page)
>  {
> + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
> + return epc_cg_lru(epc_page->epc_cg);
> +
>   return &sgx_global_lru;
>  }

... here is it legal to return a NULL LRU list?

It appears you always want to return a valid LRU list.  That is, if EPC cgroup
is enabled, and when the EPC page doesn't belong to any cgroup, then you want to
return the sgx_global_lru ?
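
A tiny sketch of that fallback: when the page is not charged to any EPC
cgroup, hand back the global LRU rather than NULL.  The struct layouts below
are simplified stand-ins, not the kernel's definitions:

#include <stdio.h>

struct sgx_epc_lru_lists { int placeholder; };	/* body elided */

static struct sgx_epc_lru_lists sgx_global_lru;

struct sgx_epc_cgroup { struct sgx_epc_lru_lists lru; };
struct sgx_epc_page   { struct sgx_epc_cgroup *epc_cg; };

static struct sgx_epc_lru_lists *sgx_lru_lists(struct sgx_epc_page *page)
{
	/* Page charged to a cgroup: use that cgroup's LRU. */
	if (page->epc_cg)
		return &page->epc_cg->lru;

	/* Otherwise fall back to the global LRU instead of returning NULL. */
	return &sgx_global_lru;
}

int main(void)
{
	struct sgx_epc_page orphan = { .epc_cg = NULL };

	printf("orphan uses global LRU: %d\n",
	       sgx_lru_lists(&orphan) == &sgx_global_lru);
	return 0;
}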



Re: [PATCH v5 16/18] x86/sgx: Limit process EPC usage with misc cgroup controller

2023-10-09 Thread Huang, Kai

> +/**
> + * sgx_epc_cgroup_lru_empty() - check if a cgroup tree has no pages on its 
> lrus
> + * @root:root of the tree to check
> + *
> + * Return: %true if all cgroups under the specified root have empty LRU 
> lists.
> + * Used to avoid livelocks due to a cgroup having a non-zero charge count but
> + * no pages on its LRUs, e.g. due to a dead enclave waiting to be released or
> + * because all pages in the cgroup are unreclaimable.
> + */
> +bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root)
> +{
> + struct cgroup_subsys_state *css_root;
> + struct cgroup_subsys_state *pos;
> + struct sgx_epc_cgroup *epc_cg;
> + bool ret = true;
> +
> + /*
> +  * Caller ensure css_root ref acquired
> +  */
> + css_root = root ? &root->cg->css : &(misc_cg_root()->css);
> +
> + rcu_read_lock();
> + css_for_each_descendant_pre(pos, css_root) {
> + if (!css_tryget(pos))
> + break;
> +
> + rcu_read_unlock();
> +
> + epc_cg = sgx_epc_cgroup_from_misc_cg(css_misc(pos));
> +
> + spin_lock(&epc_cg->lru.lock);
> + ret = list_empty(&epc_cg->lru.reclaimable);
> + spin_unlock(&epc_cg->lru.lock);
> +
> + rcu_read_lock();
> + css_put(pos);
> + if (!ret)
> + break;
> + }
> +
> + rcu_read_unlock();
> +
> + return ret;
> +}
> 

[...]

> 
>  static inline bool sgx_can_reclaim(void)
>  {
> + if (IS_ENABLED(CONFIG_CGROUP_SGX_EPC))
> + return !sgx_epc_cgroup_lru_empty(NULL);
> +

Is it better to keep a root sgx_epc_cgroup and pass the root instead of NULL?

>   return !list_empty(&sgx_global_lru.reclaimable);
>  }
>  
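
If it helps, a small sketch of the alternative asked about above: keep an
explicit root EPC cgroup and pass it, so the walk never needs a NULL-means-root
special case.  The declarations are simplified stand-ins; only the function
names echo the patch:

#include <stdbool.h>
#include <stdio.h>

struct sgx_epc_cgroup { bool lru_has_pages; };	/* stand-in */

static struct sgx_epc_cgroup sgx_root_epc_cg;	/* assumed set up at cgroup init */

/* Stub standing in for the patch's hierarchy walk. */
static bool sgx_epc_cgroup_lru_empty(struct sgx_epc_cgroup *root)
{
	return !root->lru_has_pages;
}

static bool sgx_can_reclaim(void)
{
	/* Pass the explicit root instead of NULL. */
	return !sgx_epc_cgroup_lru_empty(&sgx_root_epc_cg);
}

int main(void)
{
	sgx_root_epc_cg.lru_has_pages = true;
	printf("can reclaim: %d\n", sgx_can_reclaim());
	return 0;
}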


[ANNOUNCE] 5.10.197-rt96

2023-10-09 Thread Luis Claudio R. Goncalves
Hello RT-list!

I'm pleased to announce the 5.10.197-rt96 stable release.

This release is just an update to the new stable 5.10.197 version
and no RT-specific changes have been performed.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v5.10-rt
  Head SHA1: 2140f78d1da2b7ba8913fb8d009d6b394bf1b813

Or to build 5.10.197-rt96 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v5.x/linux-5.10.tar.xz

  https://www.kernel.org/pub/linux/kernel/v5.x/patch-5.10.197.xz

  
https://www.kernel.org/pub/linux/kernel/projects/rt/5.10/older/patch-5.10.197-rt96.patch.xz

Signing key fingerprint:

  9354 0649 9972 8D31 D464  D140 F394 A423 F8E6 7C26

All keys used for the above files and repositories can be found on the
following git repository:

   git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git

Enjoy!
Luis



Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC

2023-10-09 Thread Huang, Kai
On Fri, 2023-09-22 at 20:06 -0700, Haitao Huang wrote:
> From: Sean Christopherson 
> 
> Introduce the OOM path for killing an enclave with a reclaimer that is no
> longer able to reclaim enough EPC pages. Find a victim enclave, which
> will be an enclave with only "unreclaimable" EPC pages left in the
> cgroup LRU lists. Once a victim is identified, mark the enclave as OOM
> and zap the enclave's entire page range, and drain all mm references in
> encl->mm_list. Block allocating any EPC pages in #PF handler, or
> reloading any pages in all paths, or creating any new mappings.
> 
> The OOM killing path may race with the reclaimers: in some cases, the
> victim enclave is in the process of reclaiming the last EPC pages when
> OOM happens, that is, all pages other than SECS and VA pages are in
> RECLAIMING_IN_PROGRESS state. The reclaiming process requires access to
> the enclave backing, VA pages as well as SECS. So the OOM killer does
> not directly release those enclave resources, instead, it lets all
> reclaiming in progress to finish, and relies (as currently done) on
> kref_put on encl->refcount to trigger sgx_encl_release() to do the
> final cleanup.
> 
> Signed-off-by: Sean Christopherson 
> Co-developed-by: Kristen Carlson Accardi 
> Signed-off-by: Kristen Carlson Accardi 
> Co-developed-by: Haitao Huang 
> Signed-off-by: Haitao Huang 
> Cc: Sean Christopherson 
> ---
> V5:
> - Rename SGX_ENCL_OOM to SGX_ENCL_NO_MEMORY
> 
> V4:
> - Updates for patch reordering and typo fixes.
> 
> V3:
> - Rebased to use the new VMA_ITERATOR to zap VMAs.
> - Fixed the racing cases by blocking new page allocation/mapping and
> reloading when enclave is marked for OOM. And do not release any enclave
> resources other than draining mm_list entries, and let pages in
> RECLAIMING_IN_PROGRESS to be reaped by reclaimers.
> - Due to above changes, also removed the no-longer needed encl->lock in
> the OOM path which was causing deadlocks reported by the lock prover.
> 

[...]

> +
> +/**
> + * sgx_epc_oom() - invoke EPC out-of-memory handling on target LRU
> + * @lru: LRU that is low
> + *
> + * Return:   %true if a victim was found and kicked.
> + */
> +bool sgx_epc_oom(struct sgx_epc_lru_lists *lru)
> +{
> + struct sgx_epc_page *victim;
> +
> > + spin_lock(&lru->lock);
> > + victim = sgx_oom_get_victim(lru);
> > + spin_unlock(&lru->lock);
> +
> + if (!victim)
> + return false;
> +
> + if (victim->flags & SGX_EPC_OWNER_PAGE)
> + return sgx_oom_encl_page(victim->encl_page);
> +
> + if (victim->flags & SGX_EPC_OWNER_ENCL)
> + return sgx_oom_encl(victim->encl);

I hate to bring this up, at least at this stage, but I am wondering why we need
to put VA and SECS pages to the unreclaimable list, but cannot keep an
"enclave_list" instead?

So by looking at the patch ("x86/sgx: Limit process EPC usage with misc cgroup
controller"), if I am not missing anything, the whole "unreclaimable" list is
just used to find the victim enclave when OOM needs to be done.  Thus, I don't
see why "enclave_list" cannot be used to achieve this.

The reason that I am asking is because it seems using "enclave_list" we can
simplify the code.  At least the patches related to track VA/SECS pages, and the
SGX_EPC_OWNER_PAGE/SGX_EPC_OWNER_ENCL thing can be eliminated completely.  

Using "enclave_list", I guess you just need to put the enclave to the current
EPC cgroup when SECS page is allocated.

In fact, putting "unreclaimable" list to LRU itself is a little bit confusing
because: 1) you cannot really reclaim anything from the list; 2) VA/SECS pages
don't have the concept of "young" at all, thus it makes no sense to annotate them
as LRU.

Thus putting VA/SECS on the "unreclaimable" list, instead of keeping an
"enclave_list", seems to have no benefit and will only make the code more
complicated.

Or am I missing anything?
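
To make the suggestion concrete, a rough, self-contained model of the
"enclave_list" alternative: enclaves are tracked per cgroup when their SECS
page is charged, and the OOM victim is picked from that list, with no per-page
owner flags.  All names and types here are illustrative, not the kernel's:

#include <stdio.h>
#include <stddef.h>

struct encl {
	const char *name;
	struct encl *next;	/* linked into its cgroup's enclave list */
};

struct epc_cgroup {
	struct encl *enclave_list;	/* head == oldest enclave */
};

/* Called when an enclave's SECS page is charged to the cgroup. */
static void cgroup_track_enclave(struct epc_cgroup *cg, struct encl *e)
{
	struct encl **pos = &cg->enclave_list;

	while (*pos)
		pos = &(*pos)->next;
	*pos = e;
}

/* OOM: pick the oldest enclave; no VA/SECS page LRU needed to find it. */
static struct encl *cgroup_oom_victim(struct epc_cgroup *cg)
{
	return cg->enclave_list;
}

int main(void)
{
	struct epc_cgroup cg = { NULL };
	struct encl a = { "encl-a", NULL }, b = { "encl-b", NULL };

	cgroup_track_enclave(&cg, &a);
	cgroup_track_enclave(&cg, &b);
	printf("OOM victim: %s\n", cgroup_oom_victim(&cg)->name);
	return 0;
}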


Re: [PATCH 01/10] appletalk: remove localtalk and ppp support

2023-10-09 Thread Stephen Hemminger
On Mon, 9 Oct 2023 18:49:43 +0200
Rodolfo Zitellini  wrote:

> Hi!
> I’ve been working on a new LocalTalk interface driver for the last couple 
> months, do you think it would be possible to at least postpone the removal of 
> LT a bit?
> 
> It is a driver for an open source device called TashTalk 
> (https://github.com/lampmerchant/tashtalk), which runs on a PIC micro that 
> does all the LT interfacing, and communicates back via serial to the host 
> system. My driver is relatively simple and works very well with netatalk 2.2 
> (which is still maintained and still has support for AppleTalk). The driver 
> is basically complete and tested and I was preparing to submit a patch.
> 
> Still having LocalTalk in my view has many advantages for us enthusiasts that 
> still want to bridge old machines to the current world without modifications, 
> for example for printing on modern printers, netbooting, sharing files and 
> even tcp/ip. All this basically works out of the box via the driver, Linux 
> and available userspace tools (netatalk, macipgw).
> 
> The old ISA cards supported by COPS were basically unobtanium even 20 years 
> ago, but the solution of using a PIC and a serial port is very robust and 
> much more future-proof. We also already have a device that can interface a 
> modern machine directly via USB to LocalTalk.
> 
> The development of the TashTalk has also been extensively discussed on the 
> 68KMLA forum 
> (https://68kmla.org/bb/index.php?threads/modtashtalk-lt0-driver-for-linux.45031/)
> 
> I hope the decision to remove LocalTalk can be reconsidered at least for the 
> time being so there is a chance to submit a new, modern device making use of 
> this stack.
> 
> Many Thanks,
> Rodolfo Zitellini

Does it really need it to be a kernel protocol stack?
What about doing it in userspace or with BPF?


[ANNOUNCE] 4.14.326-rt155

2023-10-09 Thread Luis Claudio R. Goncalves
Hello RT-list!

I'm pleased to announce the 4.14.326-rt155 stable release.

This release is an update to the new stable 4.14.326-rt155
version and no RT-specific changes have been performed.

You can get this release via the git tree at:

  git://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git

  branch: v4.14-rt
  Head SHA1: 1d10d09bd19b5eb5e90c5585afb308207e9c683f

Or to build 4.14.326-rt155 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.14.tar.xz

  https://www.kernel.org/pub/linux/kernel/v4.x/patch-4.14.326.xz

  
https://www.kernel.org/pub/linux/kernel/projects/rt/4.14/older/patch-4.14.326-rt155.patch.xz

Signing key fingerprint:

  9354 0649 9972 8D31 D464  D140 F394 A423 F8E6 7C26

All keys used for the above files and repositories can be found on the
following git repository:

   git://git.kernel.org/pub/scm/docs/kernel/pgpkeys.git

Enjoy!
Luis



Re: [PATCH] KVM: deprecate KVM_WERROR in favor of general WERROR

2023-10-09 Thread Jakub Kicinski
On Mon, 9 Oct 2023 12:33:53 -0700 Sean Christopherson wrote:
> > We do have sympathy for these folks, we are mostly volunteers after
> > all. At the same time someone's under-investment should not be causing
> > pain to those of us who _do_ build test stuff carefully.  
> 
> This is a bit over the top.  Yeah, I need to add W=1 to my build scripts, but 
> that's
> not a lack of investment, just an oversight.  Though in this case it likely 
> wouldn't
> have made any difference since Paolo grabbed the patches directly and might 
> have
> even bypassed linux-next.  But again I would argue that's bad process, not a 
> lack
> of investment.

If you do invest in build testing automation, why can't your automation
count warnings rather than depend on WERROR? I don't understand.

> > Rather than tweak stuff I'd prefer if we could agree that local -Werror
> > is anti-social :(
> > 
> > The global WERROR seems to be a good compromise.  
> 
> I disagree.  WERROR simply doesn't provide the same coverage.  E.g. it can't 
> be
> enabled for i386 without tuning FRAME_WARN, which (a) won't be at all obvious 
> to
> the average contributor and (b) increasing FRAME_WARN effectively reduces the
> test coverage of KVM i386.
> 
> For KVM x86, I want the rules for contributing to be clearly documented, and 
> as
> simple as possible.  I don't see a sane way to achieve that with WERROR=y.

Linus, you created the global WERROR option. Do you have an opinion
on whether random subsystems should create their own WERROR flags?
A W=1 warning got in through KVM, and since they have a KVM_WERROR which
defaults to enabled it broke build testing in networking.
Randomly sprinkled -Werrors are fragile. Can we ask people to stop
using them now that the global WERROR exists?


Re: [PATCH 7/7] arm64: dts: qcom: Add support for Xiaomi Redmi Note 9S

2023-10-09 Thread Konrad Dybcio




On 10/7/23 15:58, David Wronek wrote:

From: Joe Mason 

Add a device tree for the Xiaomi Redmi Note 9S (curtana) phone, based on
sm7125-xiaomi-common.dtsi.

Signed-off-by: Joe Mason 
Signed-off-by: David Wronek 
---
  arch/arm64/boot/dts/qcom/Makefile|  1 +
  .../boot/dts/qcom/sm7125-xiaomi-curtana.dts  | 16 
  2 files changed, 17 insertions(+)
  create mode 100644 arch/arm64/boot/dts/qcom/sm7125-xiaomi-curtana.dts

diff --git a/arch/arm64/boot/dts/qcom/Makefile 
b/arch/arm64/boot/dts/qcom/Makefile
index d6cb840b7050..57974fb0c580 100644
--- a/arch/arm64/boot/dts/qcom/Makefile
+++ b/arch/arm64/boot/dts/qcom/Makefile
@@ -207,6 +207,7 @@ dtb-$(CONFIG_ARCH_QCOM) += 
sm6125-sony-xperia-seine-pdx201.dtb
  dtb-$(CONFIG_ARCH_QCOM)   += sm6125-xiaomi-laurel-sprout.dtb
  dtb-$(CONFIG_ARCH_QCOM)   += sm6350-sony-xperia-lena-pdx213.dtb
  dtb-$(CONFIG_ARCH_QCOM)   += sm6375-sony-xperia-murray-pdx225.dtb
+dtb-$(CONFIG_ARCH_QCOM)+= sm7125-xiaomi-curtana.dtb
  dtb-$(CONFIG_ARCH_QCOM)   += sm7125-xiaomi-joyeuse.dtb
  dtb-$(CONFIG_ARCH_QCOM)   += sm7225-fairphone-fp4.dtb
  dtb-$(CONFIG_ARCH_QCOM)   += sm8150-hdk.dtb
diff --git a/arch/arm64/boot/dts/qcom/sm7125-xiaomi-curtana.dts 
b/arch/arm64/boot/dts/qcom/sm7125-xiaomi-curtana.dts
new file mode 100644
index ..12f517a8492c
--- /dev/null
+++ b/arch/arm64/boot/dts/qcom/sm7125-xiaomi-curtana.dts
@@ -0,0 +1,16 @@
+// SPDX-License-Identifier: GPL-2.0
I have no idea if you can just include a bunch of bsd3 source and slap 
gpl2 atop your changes.. Somebody else could probably chime in


Konrad


Re: [PATCH 6/7] arm64: dts: qcom: sm7125-xiaomi-common: Add UFS nodes

2023-10-09 Thread Konrad Dybcio




On 10/7/23 15:58, David Wronek wrote:

Enable the UFS found on the SM7125 Xiaomi smartphones.

Signed-off-by: David Wronek 
---

Reviewed-by: Konrad Dybcio 

Konrad


[PATCH] remoteproc: st: Use device_get_match_data()

2023-10-09 Thread Rob Herring
Use preferred device_get_match_data() instead of of_match_device() to
get the driver match data. With this, adjust the includes to explicitly
include the correct headers.

Signed-off-by: Rob Herring 
---
 drivers/remoteproc/st_remoteproc.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/remoteproc/st_remoteproc.c 
b/drivers/remoteproc/st_remoteproc.c
index e3ce01d98b4c..b0638f984842 100644
--- a/drivers/remoteproc/st_remoteproc.c
+++ b/drivers/remoteproc/st_remoteproc.c
@@ -16,10 +16,9 @@
 #include 
 #include 
 #include 
-#include 
-#include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -341,7 +340,6 @@ static int st_rproc_parse_dt(struct platform_device *pdev)
 static int st_rproc_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
-   const struct of_device_id *match;
struct st_rproc *ddata;
struct device_node *np = dev->of_node;
struct rproc *rproc;
@@ -349,19 +347,15 @@ static int st_rproc_probe(struct platform_device *pdev)
int enabled;
int ret, i;
 
-   match = of_match_device(st_rproc_match, dev);
-   if (!match || !match->data) {
-   dev_err(dev, "No device match found\n");
-   return -ENODEV;
-   }
-
rproc = rproc_alloc(dev, np->name, &st_rproc_ops, NULL, sizeof(*ddata));
if (!rproc)
return -ENOMEM;
 
rproc->has_iommu = false;
ddata = rproc->priv;
-   ddata->config = (struct st_rproc_config *)match->data;
+   ddata->config = (struct st_rproc_config *)device_get_match_data(dev);
+   if (!ddata->config)
+   goto free_rproc;
 
platform_set_drvdata(pdev, rproc);
 
-- 
2.42.0



Re: [PATCH 5/7] arm64: dts: qcom: sc7180: Add UFS nodes

2023-10-09 Thread Konrad Dybcio




On 10/7/23 15:58, David Wronek wrote:

Add the UFS and QMP PHY nodes for the Qualcomm SC7180 SoC.

Signed-off-by: David Wronek 
---
  arch/arm64/boot/dts/qcom/sc7180.dtsi | 70 
  1 file changed, 70 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi 
b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index 11f353d416b4..9f18be4fd61a 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -1532,6 +1532,76 @@ mmss_noc: interconnect@174 {
qcom,bcm-voters = <&apps_bcm_voter>;
};
  
+		ufs_mem_hc: ufshc@1d84000 {

+   compatible = "qcom,sc7180-ufshc", "qcom,ufshc",
+"jedec,ufs-2.0";
+   reg = <0 0x01d84000 0 0x3000>,
+ <0 0x01d9 0 0x8000>;
+   reg-names = "std", "ice";

Recently the ICE was separated into its own node


+   interrupts = ;
+   phys = <&ufs_mem_phy>;
+   phy-names = "ufsphy";
+   lanes-per-direction = <1>;
+   power-domains = <&gcc UFS_PHY_GDSC>;
+   #reset-cells = <1>;
+   resets = <&gcc GCC_UFS_PHY_BCR>;
+   reset-names = "rst";
+
+   iommus = <&apps_smmu 0xa0 0x0>;
+
+   clock-names =
+   "core_clk",

Remove the newline after the =, here and below


+   "bus_aggr_clk",
+   "iface_clk",
+   "core_clk_unipro",
+   "ref_clk",
+   "tx_lane0_sync_clk",
+   "rx_lane0_sync_clk",
+   "ice_core_clk";
+   clocks =
+   <&gcc GCC_UFS_PHY_AXI_CLK>,
+   <&gcc GCC_AGGRE_UFS_PHY_AXI_CLK>,
+   <&gcc GCC_UFS_PHY_AHB_CLK>,
+   <&gcc GCC_UFS_PHY_UNIPRO_CORE_CLK>,
+   <&rpmhcc RPMH_CXO_CLK>,
+   <&gcc GCC_UFS_PHY_TX_SYMBOL_0_CLK>,
+   <&gcc GCC_UFS_PHY_RX_SYMBOL_0_CLK>,
+   <&gcc GCC_UFS_PHY_ICE_CORE_CLK>;
+   freq-table-hz =
+   <5000 2>,
+   <0 0>,
+   <0 0>,
+   <3750 15000>,
+   <0 0>,
+   <0 0>,
+   <0 0>,
+   <0 3>;
+
+   interconnects = <&aggre1_noc MASTER_UFS_MEM 0 &mc_virt 
SLAVE_EBI1 0>,
+   <&gem_noc MASTER_APPSS_PROC 0 &config_noc 
SLAVE_UFS_MEM_CFG 0>;
Please use the include/dt-bindings/interconnect/qcom,icc.h tags like in sa8775p.dtsi

+   interconnect-names = "ufs-ddr", "cpu-ufs";
+
+   status = "disabled";
+   };
+
+   ufs_mem_phy: phy@1d87000 {
+   compatible = "qcom,sc7180-qmp-ufs-phy";
+   reg = <0 0x01d87000 0 0x1000>;
+
+   clocks = <&gcc GCC_UFS_MEM_CLKREF_CLK>,
+   <&gcc GCC_UFS_PHY_PHY_AUX_CLK>;

Please align the 

Re: [PATCH] KVM: deprecate KVM_WERROR in favor of general WERROR

2023-10-09 Thread Sean Christopherson
On Mon, Oct 09, 2023, Jakub Kicinski wrote:
> On Mon, 9 Oct 2023 10:43:43 -0700 Sean Christopherson wrote:
> > On Fri, Oct 06, 2023, Jakub Kicinski wrote:
> > On a related topic, this is comically stale as WERROR is on by default for 
> > both
> > allmodconfig and allyesconfig, which work because they trigger 64-bit 
> > builds.
> > And KASAN on x86 is 64-bit only.
> > 
> > Rather than yank out KVM_WERROR entirely, what if we make default=n and 
> > trim the
> > depends down to "KVM && EXPERT && !KASAN"?  E.g.
> 
> IMO setting WERROR is a bit perverse. The way I see it WERROR is a
> crutch for people who don't have the time / infra to properly build
> test changes they send to Linus. Or wait for build bots to do their job.

KVM_WERROR reduces the probability of issues in patches being sent to *me*.  The
reality is that most contributors do not have the knowledge and/or resources to
"properly" build test changes without specific guidance on what/how to test, or
what configs to prioritize.

Nor is it realistic to expect that build bots will detect every issue in every
possible configuration in every patch that's posted.

Call -Werror a crutch if you will, but for me it's a crutch that I'm more than
willing to lean on in order to increase the overall quality of KVM x86 
submissions.

> We do have sympathy for these folks, we are mostly volunteers after
> all. At the same time someone's under-investment should not be causing
> pain to those of us who _do_ build test stuff carefully.

This is a bit over the top.  Yeah, I need to add W=1 to my build scripts, but 
that's
not a lack of investment, just an oversight.  Though in this case it likely 
wouldn't
have made any difference since Paolo grabbed the patches directly and might have
even bypassed linux-next.  But again I would argue that's bad process, not a 
lack
of investment.

> Rather than tweak stuff I'd prefer if we could agree that local -Werror
> is anti-social :(
> 
> The global WERROR seems to be a good compromise.

I disagree.  WERROR simply doesn't provide the same coverage.  E.g. it can't be
enabled for i386 without tuning FRAME_WARN, which (a) won't be at all obvious to
the average contributor and (b) increasing FRAME_WARN effectively reduces the
test coverage of KVM i386.
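
To make the trade-off concrete (the value below is illustrative, not taken
from this thread): an i386 build only comes out -Werror clean with something
like

  CONFIG_WERROR=y
  # 32-bit FRAME_WARN defaults to 1024; raising it silences the
  # -Wframe-larger-than warnings, but also hides real frame-size regressions
  CONFIG_FRAME_WARN=2048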

For KVM x86, I want the rules for contributing to be clearly documented, and as
simple as possible.  I don't see a sane way to achieve that with WERROR=y.


Re: [PATCH] KVM: deprecate KVM_WERROR in favor of general WERROR

2023-10-09 Thread Jakub Kicinski
On Mon, 9 Oct 2023 10:43:43 -0700 Sean Christopherson wrote:
> On Fri, Oct 06, 2023, Jakub Kicinski wrote:
> > Setting WERROR for random subsystems make life really hard
> > for subsystems which want to build-test their stuff with W=1.
> > WERROR for the entire kernel now exists and can be used
> > instead. W=1 people probably know how to deal with the global
> > W=1 already, tracking all per-subsystem WERRORs is too much...  
> 
> I assume s/W=1/WERROR=y in this line?

Yes, sorry about that.

> > Link: 
> > https://lore.kernel.org/all/0da9874b6e9fcbaaa5edeb345d7e2a7c859fc818.1696271334.git.thomas.lenda...@amd.com/
> > Signed-off-by: Jakub Kicinski 
> > ---
> >  Documentation/process/maintainer-kvm-x86.rst |  2 +-
> >  arch/x86/kvm/Kconfig | 14 --
> >  arch/x86/kvm/Makefile|  1 -
> >  3 files changed, 1 insertion(+), 16 deletions(-)
> > 
> > diff --git a/Documentation/process/maintainer-kvm-x86.rst 
> > b/Documentation/process/maintainer-kvm-x86.rst
> > index 9183bd449762..cd70c0351108 100644
> > --- a/Documentation/process/maintainer-kvm-x86.rst
> > +++ b/Documentation/process/maintainer-kvm-x86.rst
> > @@ -243,7 +243,7 @@ context and disambiguate the reference.
> >  Testing
> >  ---
> >  At a bare minimum, *all* patches in a series must build cleanly for 
> > KVM_INTEL=m
> > -KVM_AMD=m, and KVM_WERROR=y.  Building every possible combination of 
> > Kconfigs
> > +KVM_AMD=m, and WERROR=y.  Building every possible combination of Kconfigs
> >  isn't feasible, but the more the merrier.  KVM_SMM, KVM_XEN, 
> > PROVE_LOCKING, and
> >  X86_64 are particularly interesting knobs to turn.
> >  
> > diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> > index ed90f148140d..12929324ac3e 100644
> > --- a/arch/x86/kvm/Kconfig
> > +++ b/arch/x86/kvm/Kconfig
> > @@ -63,20 +63,6 @@ config KVM
> >  
> >   If unsure, say N.
> >  
> > -config KVM_WERROR
> > -   bool "Compile KVM with -Werror"
> > -   # KASAN may cause the build to fail due to larger frames
> > -   default y if X86_64 && !KASAN  
> 
> Hrm, I am loath to give up KVM's targeted -Werror as it allows for more 
> aggressive
> enabling, e.g. enabling CONFIG_WERROR for i386 builds with other defaults
> doesn't
> work because of CONFIG_FRAME_WARN=1024.  That in turn means making WERROR=y a
> requirement in maintainer-kvm-x86.rst is likely unreasonable.
> 
> And arguably KVM_WERROR is doing its job by flagging the linked W=1 error.  
> The
> problem there lies more in my build testing, which I'll go fix by adding a W=1
> configuration or three.  As the changelog notes, I highly doubt W=1 builds 
> work
> with WERROR, whereas keeping KVM x86 warning-free even with W=1 is feasible.
> 
> > -   # We use the dependency on !COMPILE_TEST to not be enabled
> > -   # blindly in allmodconfig or allyesconfig configurations
> > -   depends on KVM
> > -   depends on (X86_64 && !KASAN) || !COMPILE_TEST  
> 
> On a related topic, this is comically stale as WERROR is on by default for 
> both
> allmodconfig and allyesconfig, which work because they trigger 64-bit builds.
> And KASAN on x86 is 64-bit only.
> 
> Rather than yank out KVM_WERROR entirely, what if we make default=n and trim 
> the
> depends down to "KVM && EXPERT && !KASAN"?  E.g.

IMO setting WERROR is a bit perverse. The way I see it WERROR is a
crutch for people who don't have the time / infra to properly build
test changes they send to Linus. Or wait for build bots to do their job.
We do have sympathy for these folks, we are mostly volunteers after
all. At the same time someone's under-investment should not be causing
pain to those of us who _do_ build test stuff carefully.

Rather than tweak stuff I'd prefer if we could agree that local -Werror
is anti-social :(

The global WERROR seems to be a good compromise.


Re: [PATCH v2 3/6] ACPI: AC: Replace acpi_driver with platform_driver

2023-10-09 Thread Rafael J. Wysocki
On Mon, Oct 9, 2023 at 3:04 PM Wilczynski, Michal
 wrote:
>
>
>
> On 10/9/2023 2:27 PM, Rafael J. Wysocki wrote:
> > On Mon, Oct 9, 2023 at 10:40 AM Wilczynski, Michal
> >  wrote:
> >>

[cut]

> >> Yeah we could add platform device without removing acpi device, and
> >> yes that would introduce data duplication, like Andy noticed.
> > But he hasn't answered my question regarding what data duplication he
> > meant, exactly.
> >
> > So what data duplication do you mean?
>
> So what I meant was that many drivers would find it useful to have
> a struct device embedded in their 'private structure', and in that case
> there would be a duplication of platform_device and acpi_device as
> both pointers would be needed.

It all depends on how often each of them is going to be used in the
given driver.

This particular driver only needs a struct acpi_device pointer if I'm
not mistaken.

> Mind this if you only have struct device
> it's easy to get acpi device, but it's not the case the other way around.
>
> So for this driver it's just a matter of sticking to convention,

There is no convention in this respect and there is always a tradeoff
between using more memory and using more CPU time in computing in
general, but none of them should be wasted just for the sake of
following a convention.

> for the others like it would be duplication.

So I'm only talking about the driver modified by the patch at hand.

> In my version of this patch we managed to avoid this duplication, thanks
> to the contextual argument introduced before, but look at this patch:
> https://github.com/mwilczy/linux-pm/commit/cc8ef52707341e67a12067d6ead991d56ea017ca
>
> Author of this patch had to introduce platform_device and acpi_device to the 
> struct ac, so
> there was some duplication. That is the case for many drivers to come, so I 
> decided it's better
> to change this and have a pattern with keeping only 'struct device'.

Well, if the only thing you need from a struct device is its
ACPI_COMPANION(), it is better to store a pointer to the latter.
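
A minimal sketch of that pattern (the struct and function names below are
made up for illustration; this is not the actual AC driver code):

	#include <linux/acpi.h>
	#include <linux/platform_device.h>

	struct ac_data {
		struct acpi_device *adev;	/* the only pointer we keep */
	};

	static int ac_probe_sketch(struct platform_device *pdev)
	{
		struct ac_data *ac;

		ac = devm_kzalloc(&pdev->dev, sizeof(*ac), GFP_KERNEL);
		if (!ac)
			return -ENOMEM;

		/* keep only the ACPI companion instead of duplicating both
		 * the platform_device and the acpi_device pointers */
		ac->adev = ACPI_COMPANION(&pdev->dev);
		if (!ac->adev)
			return -ENODEV;

		platform_set_drvdata(pdev, ac);
		return 0;
	}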



Re: [PATCH] KVM: deprecate KVM_WERROR in favor of general WERROR

2023-10-09 Thread Sean Christopherson
On Fri, Oct 06, 2023, Jakub Kicinski wrote:
> Setting WERROR for random subsystems make life really hard
> for subsystems which want to build-test their stuff with W=1.
> WERROR for the entire kernel now exists and can be used
> instead. W=1 people probably know how to deal with the global
> W=1 already, tracking all per-subsystem WERRORs is too much...

I assume s/W=1/WERROR=y in this line?

> Link: 
> https://lore.kernel.org/all/0da9874b6e9fcbaaa5edeb345d7e2a7c859fc818.1696271334.git.thomas.lenda...@amd.com/
> Signed-off-by: Jakub Kicinski 
> ---
>  Documentation/process/maintainer-kvm-x86.rst |  2 +-
>  arch/x86/kvm/Kconfig | 14 --
>  arch/x86/kvm/Makefile|  1 -
>  3 files changed, 1 insertion(+), 16 deletions(-)
> 
> diff --git a/Documentation/process/maintainer-kvm-x86.rst 
> b/Documentation/process/maintainer-kvm-x86.rst
> index 9183bd449762..cd70c0351108 100644
> --- a/Documentation/process/maintainer-kvm-x86.rst
> +++ b/Documentation/process/maintainer-kvm-x86.rst
> @@ -243,7 +243,7 @@ context and disambiguate the reference.
>  Testing
>  ---
>  At a bare minimum, *all* patches in a series must build cleanly for 
> KVM_INTEL=m
> -KVM_AMD=m, and KVM_WERROR=y.  Building every possible combination of Kconfigs
> +KVM_AMD=m, and WERROR=y.  Building every possible combination of Kconfigs
>  isn't feasible, but the more the merrier.  KVM_SMM, KVM_XEN, PROVE_LOCKING, 
> and
>  X86_64 are particularly interesting knobs to turn.
>  
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index ed90f148140d..12929324ac3e 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -63,20 +63,6 @@ config KVM
>  
> If unsure, say N.
>  
> -config KVM_WERROR
> - bool "Compile KVM with -Werror"
> - # KASAN may cause the build to fail due to larger frames
> - default y if X86_64 && !KASAN

Hrm, I am loath to give up KVM's targeted -Werror as it allows for more 
aggressive
enabling, e.g. enabling CONFIG_WERROR for i386 builds with other defaults
doesn't
work because of CONFIG_FRAME_WARN=1024.  That in turn means making WERROR=y a
requirement in maintainer-kvm-x86.rst is likely unreasonable.

And arguably KVM_WERROR is doing its job by flagging the linked W=1 error.  The
problem there lies more in my build testing, which I'll go fix by adding a W=1
configuration or three.  As the changelog notes, I highly doubt W=1 builds work
with WERROR, whereas keeping KVM x86 warning-free even with W=1 is feasible.

> - # We use the dependency on !COMPILE_TEST to not be enabled
> - # blindly in allmodconfig or allyesconfig configurations
> - depends on KVM
> - depends on (X86_64 && !KASAN) || !COMPILE_TEST

On a related topic, this is comically stale as WERROR is on by default for both
allmodconfig and allyesconfig, which work because they trigger 64-bit builds.
And KASAN on x86 is 64-bit only.

Rather than yank out KVM_WERROR entirely, what if we make default=n and trim the
depends down to "KVM && EXPERT && !KASAN"?  E.g.

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8452ed0228cb..c2466304aa6a 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -65,13 +65,12 @@ config KVM
 
 config KVM_WERROR
bool "Compile KVM with -Werror"
-   # KASAN may cause the build to fail due to larger frames
-   default y if X86_64 && !KASAN
-   # We use the dependency on !COMPILE_TEST to not be enabled
-   # blindly in allmodconfig or allyesconfig configurations
-   depends on KVM
-   depends on (X86_64 && !KASAN) || !COMPILE_TEST
-   depends on EXPERT
+   # Disallow KVM's -Werror if KASAN=y, e.g. to guard against randomized
+   # configs from selecting KVM_WERROR=y.  KASAN builds generate warnings
+   # for the default FRAME_WARN, i.e. KVM_WERROR=y with KASAN=y requires
+   # special tuning.  Building KVM with -Werror and KASAN is still doable
+   # via enabling the kernel-wide WERROR=y.
+   depends on KVM && EXPERT && !KASAN
help
  Add -Werror to the build flags for KVM.


Re: [PATCH 01/10] appletalk: remove localtalk and ppp support

2023-10-09 Thread Arnd Bergmann
On Mon, Oct 9, 2023, at 18:49, Rodolfo Zitellini wrote:
>> From: Arnd Bergmann 
>> 
>> The last localtalk driver is gone now, and ppp support was never fully
>> merged, so clean up the appletalk code by removing the obvious dead
>> code paths.
>> 
>> Notably, this removes one of the two callers of the old .ndo_do_ioctl()
>> callback that was abused for getting device addresses and is now
>> only used in the ieee802154 subsystem, which still uses the same trick.
>> 
>> The include/uapi/linux/if_ltalk.h header might still be required
>> for building userspace programs, but I made sure that debian code
>> search and the netatalk upstream have no references to it, so it
>> should be fine to remove.
>> 
>> Signed-off-by: Arnd Bergmann 
>
> Hi!
> I’ve been working on a new LocalTalk interface driver for the last 
> couple months, do you think it would be possible to at least postpone 
> the removal of LT a bit?
>
> It is a driver for an open source device called TashTalk 
> (https://github.com/lampmerchant/tashtalk), which runs on a PIC micro 
> that does all the LT interfacing, and communicates back via serial to 
> the host system. My driver is relatively simple and works very well 
> with netatalk 2.2 (which is still maintained and still has support for 
> AppleTalk). The driver is basically complete and tested and I was 
> preparing to submit a patch.
>
> Still having LocalTalk in my view has many advantages for us 
> enthusiasts that still want to bridge old machines to the current world 
> without modifications, for example for printing on modern printers, 
> netbooting, sharing files and even tcp/ip. All this basically works out 
> of the box via the driver, Linux and available userspace tools 
> (netatalk, macipgw).
>
> The old ISA cards supported by COPS were basically unobtanium even 20 
> years ago, but the solution of using a PIC and a serial port is very 
> robust and much more future-proof. We also already have a device that 
> can interface a modern machine directly via USB to LocalTalk.
>
> The development of the TashTalk has been also extensively discussed on 
> the 68KMLA forum 
> (https://68kmla.org/bb/index.php?threads/modtashtalk-lt0-driver-for-linux.45031/)
>
> I hope the decision to remove LocalTalk can be reconsidered at least 
> for the time being so there is a chance to submit a new, modern device 
> making use of this stack.

Nothing is decided, I'm just proposing my patch as a cleanup
for now. It would be nice to still drop the ndo_do_ioctl function
though, at least in some form. When your driver actually makes
it into the kernel, you can find a different method of communicating
the address between the socket interface and the device driver.

I can see a few ways this could work out:

- add a custom callback pointer to struct atalk_iface to
  get and set the address for phase1 probing instead of going
  through the ioctl

- rewrite the probing logic in aarp.c more widely, and improve
  the userspace interface in the process by introducing a netlink
  interface

- Move your entire driver into userspace and go to the kernel
  using tun/tap. This has the added benefit of avoiding a lot
  of the complexity of the tty line discipline code you have.
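
For the last option, the userspace side is roughly this (untested sketch,
error handling and the serial/TashTalk side omitted):

	#include <fcntl.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <unistd.h>
	#include <linux/if.h>
	#include <linux/if_tun.h>

	/* create a tap interface; frames are then read()/written() on the
	 * returned fd and bridged to the serial device in process context */
	static int open_tap(const char *name)
	{
		struct ifreq ifr = { 0 };
		int fd = open("/dev/net/tun", O_RDWR);

		if (fd < 0)
			return -1;

		ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
		strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

		if (ioctl(fd, TUNSETIFF, &ifr) < 0) {
			close(fd);
			return -1;
		}
		return fd;
	}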

  Arnd


Re: [PATCH] KVM: deprecate KVM_WERROR in favor of general WERROR

2023-10-09 Thread Kees Cook
On Fri, Oct 06, 2023 at 01:54:15PM -0700, Jakub Kicinski wrote:
> Setting WERROR for random subsystems make life really hard
> for subsystems which want to build-test their stuff with W=1.
> WERROR for the entire kernel now exists and can be used
> instead. W=1 people probably know how to deal with the global
> W=1 already, tracking all per-subsystem WERRORs is too much...
> 
> Link: 
> https://lore.kernel.org/all/0da9874b6e9fcbaaa5edeb345d7e2a7c859fc818.1696271334.git.thomas.lenda...@amd.com/
> Signed-off-by: Jakub Kicinski 

Yeah, best to have just the global option.

Reviewed-by: Kees Cook 

-- 
Kees Cook


Re: [PATCH 01/10] appletalk: remove localtalk and ppp support

2023-10-09 Thread Rodolfo Zitellini


> From: Arnd Bergmann 
> 
> The last localtalk driver is gone now, and ppp support was never fully
> merged, so clean up the appletalk code by removing the obvious dead
> code paths.
> 
> Notably, this removes one of the two callers of the old .ndo_do_ioctl()
> callback that was abused for getting device addresses and is now
> only used in the ieee802154 subsystem, which still uses the same trick.
> 
> The include/uapi/linux/if_ltalk.h header might still be required
> for building userspace programs, but I made sure that debian code
> search and the netatalk upstream have no references to it, so it
> should be fine to remove.
> 
> Signed-off-by: Arnd Bergmann 

Hi!
I’ve been working on a new LocalTalk interface driver for the last couple 
months, do you think it would be possible to at least postpone the removal of 
LT a bit?

It is a driver for an open source device called TashTalk 
(https://github.com/lampmerchant/tashtalk), which runs on a PIC micro that does 
all the LT interfacing, and communicates back via serial to the host system. My 
driver is relatively simple and works very well with netatalk 2.2 (which is 
still maintained and still has support for AppleTalk). The driver is basically 
complete and tested and I was preparing to submit a patch.

Still having LocalTalk in my view has many advantages for us enthusiasts that 
still want to bridge old machines to the current world without modifications, 
for example for printing on modern printers, netbooting, sharing files and even 
tcp/ip. All this basically works out of the box via the driver, Linux and 
available userspace tools (netatalk, macipgw).

The old ISA cards supported by COPS were basically unobtanium even 20 years 
ago, but the solution of using a PIC and a serial port is very robust and much 
more future-proof. We also already have a device that can interface a modern 
machine directly via USB to LocalTalk.

The development of the TashTalk has been also extensively discussed on the 
68KMLA forum 
(https://68kmla.org/bb/index.php?threads/modtashtalk-lt0-driver-for-linux.45031/)

I hope the decision to remove LocalTalk can be reconsidered at least for the 
time being so there is a chance to submit a new, modern device making use of 
this stack.

Many Thanks,
Rodolfo Zitellini

Re: [PATCH 5/5] kbuild: unify no-compiler-targets and no-sync-config-targets

2023-10-09 Thread Nathan Chancellor
On Mon, Oct 09, 2023 at 09:42:10PM +0900, Masahiro Yamada wrote:
> Now that vdso_install does not depend on any in-tree build artifact,
> it no longer invokes a compiler, making no-compiler-targets the same
> as no-sync-config-targets.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
>  Makefile | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index 2170d56630e8..982b1ad33287 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -277,10 +277,6 @@ no-dot-config-targets := $(clean-targets) \
>$(version_h) headers headers_% archheaders archscripts 
> \
>%asm-generic kernelversion %src-pkg dt_binding_check \
>outputmakefile rustavailable rustfmt rustfmtcheck
> -# Installation targets should not require compiler. Unfortunately, 
> vdso_install
> -# is an exception where build artifacts may be updated. This must be fixed.
> -no-compiler-targets := $(no-dot-config-targets) install dtbs_install \
> - headers_install modules_install modules_sign 
> kernelrelease image_name
>  no-sync-config-targets := $(no-dot-config-targets) %install modules_sign 
> kernelrelease \
> image_name
>  single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o %.rsi %.s 
> %.symtypes %/
> @@ -288,7 +284,6 @@ single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o 
> %.rsi %.s %.symtypes %
>  config-build :=
>  mixed-build  :=
>  need-config  := 1
> -need-compiler:= 1
>  may-sync-config  := 1
>  single-build :=
>  
> @@ -298,12 +293,6 @@ ifneq ($(filter $(no-dot-config-targets), 
> $(MAKECMDGOALS)),)
>   endif
>  endif
>  
> -ifneq ($(filter $(no-compiler-targets), $(MAKECMDGOALS)),)
> - ifeq ($(filter-out $(no-compiler-targets), $(MAKECMDGOALS)),)
> - need-compiler :=
> - endif
> -endif
> -

MIPS and LoongArch seem to have grown a usage of need-compiler in
4fe4a6374c4d ("MIPS: Only fiddle with CHECKFLAGS if `need-compiler'")
and 54c2c9df083f ("LoongArch: Only fiddle with CHECKFLAGS if
`need-compiler'"). With this removal, should those be updated as well?

>  ifneq ($(filter $(no-sync-config-targets), $(MAKECMDGOALS)),)
>   ifeq ($(filter-out $(no-sync-config-targets), $(MAKECMDGOALS)),)
>   may-sync-config :=
> @@ -675,7 +664,7 @@ endif
>  
>  # Include this also for config targets because some architectures need
>  # cc-cross-prefix to determine CROSS_COMPILE.
> -ifdef need-compiler
> +ifdef may-sync-config
>  include $(srctree)/scripts/Makefile.compiler
>  endif
>  
> -- 
> 2.39.2
> 


Re: [RFC PATCH v2 14/20] x86/kvm: Make kvm_async_pf_enabled __ro_after_init

2023-10-09 Thread Maxim Levitsky
On Thu, 2023-07-20 at 17:30 +0100, Valentin Schneider wrote:
> objtool now warns about it:
> 
>   vmlinux.o: warning: objtool: exc_page_fault+0x2a: Non __ro_after_init 
> static key "kvm_async_pf_enabled" in .noinstr section
> 
> The key can only be enabled (and not disabled) in the __init function
> kvm_guest_init(), so mark it as __ro_after_init.
> 
> Signed-off-by: Valentin Schneider 
> ---
>  arch/x86/kernel/kvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index 1cceac5984daa..319460090a836 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -44,7 +44,7 @@
>  #include 
>  #include 
>  
> -DEFINE_STATIC_KEY_FALSE(kvm_async_pf_enabled);
> +DEFINE_STATIC_KEY_FALSE_RO(kvm_async_pf_enabled);
>  
>  static int kvmapf = 1;
>  
Reviewed-by: Maxim Levitsky 

Best regards,
Maxim Levitsky




Re: [PATCH v1] Ftrace: make sched_wakeup can focus on the target process

2023-10-09 Thread Steven Rostedt
On Mon,  9 Oct 2023 23:37:14 +0800
Jinyu Tang  wrote:

> When we want to know what happened in the kernel when our app
> has more latency than we expect, the larger latency of
> our app may still be lower than that of other processes in the system.
> We feel sad after waiting a long time but only getting other
> processes' sched_wakeup traces.
> 
> This patch lets us trace only the target process's sched-wakeup
> time; other processes' sched-wakeup events will be dropped and won't
> change tracing_max_latency.
> 
> The patch is tested by the following commands:
> 
> $ mount -t tracefs none /sys/kernel/tracing
> $ echo wakeup_rt > /sys/kernel/tracing/current_tracer
> # some other stress-ng options are also tested 
> $ stress-ng --cpu 4 &
> $ cyclictest --mlockall --smp --priority=99 &
> $ cyclictest_pid=$!
> # child thread of cyclictest main process
> $ thread_pid=$((cyclictest_pid + 1))
> 
> $ echo ${thread_pid} > /sys/kernel/tracing/set_wakeup_pid
> 
> $ echo 1 > /sys/kernel/tracing/tracing_on
> $ echo 0 > /sys/kernel/tracing/tracing_max_latency
> $ wait ${cyclictest_pid}
> $ echo 0 > /sys/kernel/tracing/tracing_on
> $ cat /sys/kernel/tracing/trace
> 
> The maximum latency and backtrace recorded in the trace file will only be
> generated by the target process.
> Then we can eliminate interference from other programs, making it easier 
> to identify the cause of latency.
> 
> Tested-by: Jiexun Wang 
> Signed-off-by: Jinyu Tang 
> ---


Honestly, the wakeup tracers are obsolete. I haven't used them in years. I
use synthetic events instead:

 # cd /sys/kernel/tracing
 # echo 'wakeup_lat pid_t pid; u64 delay;' > synthetic_events
 # echo 'hist:keys=pid:ts=common_timestamp.usecs' if pid==$thread_pid > 
events/sched/sched_waking/trigger
 # echo 
'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmax($delta).trace(wakeup_lat,next_pid,$delta)'
 > events/sched/sched_switch/trigger
 # echo 1 > events/synthetic/wakeup_lat/enable
 # cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 3/3   #P:8
#
#_-=> irqs-off/BH-disabled
#   / _=> need-resched
#  | / _---=> hardirq/softirq
#  || / _--=> preempt-depth
#  ||| / _-=> migrate-disable
#   / delay
#   TASK-PID CPU#  |  TIMESTAMP  FUNCTION
#  | | |   | | |
  -0   [000] d..4. 350799.423428: wakeup_lat: pid=59921 
delay=1281
  -0   [000] d..4. 350800.423441: wakeup_lat: pid=59921 
delay=1317
  -0   [000] d..4. 350801.423445: wakeup_lat: pid=59921 
delay=1331

I could also make it record stack traces, disable tracing, and all sorts of
other nifty things.
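
For example (sketch only; check Documentation/trace/histogram.rst for the
exact action syntax on your kernel):

 # snapshot the ring buffer whenever a new max latency is recorded
 # echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmax($delta).snapshot()' > events/sched/sched_switch/trigger

 # or stop tracing altogether the moment the synthetic event fires
 # echo 'traceoff' > events/synthetic/wakeup_lat/trigger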

-- Steve



Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-09 Thread Jan Engelhardt


On Monday 2023-10-09 17:14, Masahiro Yamada wrote:
>
>Let me add more context to my question.
>
>I am interested in the timing when
>'pkg-config --print-variables kmod | grep module_directory'
>is executed.
>
>1.  Build a SRPM on machine A
>2.  Copy the SRPM from machine A to machine B
>3.  Run rpmbuild on machine B to build the SRPM into a RPM
>4.  Copy the RPM from machine B to machine C
>5.  Install the RPM to machine C

In over 20 years of Linux distros existing, the one thing that
everyone has learned is that installing foreign RPM packages (or any
other format) is probably not going to work for one reason or
another. Different package names in Require: lines (just think of the
switch from modutils to kmod), different ABIs..

The overwhelming amount of package production that is going on these
days targets distro(A) == distro(B) == distro(C).


Yeah, the kernel package is kinda special because the files in it
are freestanding executables, but still..



[PATCH v1] Ftrace: make sched_wakeup can focus on the target process

2023-10-09 Thread Jinyu Tang
When we want to know what happened in the kernel when our app
has more latency than we expect, the larger latency of
our app may still be lower than that of other processes in the system.
We feel sad after waiting a long time but only getting other
processes' sched_wakeup traces.

This patch lets us trace only the target process's sched-wakeup
time; other processes' sched-wakeup events will be dropped and won't
change tracing_max_latency.

The patch is tested by the following commands:

$ mount -t tracefs none /sys/kernel/tracing
$ echo wakeup_rt > /sys/kernel/tracing/current_tracer
# some other stress-ng options are also tested 
$ stress-ng --cpu 4 &
$ cyclictest --mlockall --smp --priority=99 &
$ cyclictest_pid=$!
# child thread of cyclictest main process
$ thread_pid=$((cyclictest_pid + 1))

$ echo ${thread_pid} > /sys/kernel/tracing/set_wakeup_pid

$ echo 1 > /sys/kernel/tracing/tracing_on
$ echo 0 > /sys/kernel/tracing/tracing_max_latency
$ wait ${cyclictest_pid}
$ echo 0 > /sys/kernel/tracing/tracing_on
$ cat /sys/kernel/tracing/trace

The maximum latency and backtrace recorded in the trace file will only be
generated by the target process.
Then we can eliminate interference from other programs, making it easier 
to identify the cause of latency.

Tested-by: Jiexun Wang 
Signed-off-by: Jinyu Tang 
---
 kernel/trace/trace.h  |   3 +
 kernel/trace/trace_sched_wakeup.c | 179 ++
 2 files changed, 182 insertions(+)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 77debe53f07c..c6f212e8bfd2 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -403,6 +403,9 @@ struct trace_array {
 #endif
/* function tracing enabled */
int function_enabled;
+#endif
+#ifdef CONFIG_SCHED_TRACER
+   struct trace_pid_list   __rcu *wakeup_pids;
 #endif
int no_filter_buffering_ref;
struct list_headhist_vars;
diff --git a/kernel/trace/trace_sched_wakeup.c 
b/kernel/trace/trace_sched_wakeup.c
index 0469a04a355f..b6cb9391e120 100644
--- a/kernel/trace/trace_sched_wakeup.c
+++ b/kernel/trace/trace_sched_wakeup.c
@@ -10,6 +10,9 @@
  *  Copyright (C) 2004-2006 Ingo Molnar
  *  Copyright (C) 2004 Nadia Yvette Chambers
  */
+#include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -17,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "trace.h"
 
 static struct trace_array  *wakeup_trace;
@@ -361,6 +365,169 @@ static bool report_latency(struct trace_array *tr, u64 
delta)
return true;
 }
 
+DEFINE_MUTEX(sched_wakeup_mutex);
+static void *
+p_next(struct seq_file *m, void *v, loff_t *pos)
+{
+   struct trace_array *tr = m->private;
+   struct trace_pid_list *pid_list = 
rcu_dereference_sched(tr->wakeup_pids);
+
+   return trace_pid_next(pid_list, v, pos);
+}
+
+static void *p_start(struct seq_file *m, loff_t *pos)
+   __acquires(RCU)
+{
+   struct trace_pid_list *pid_list;
+   struct trace_array *tr = m->private;
+
+   /*
+* Grab the mutex, to keep calls to p_next() having the same
+* tr->wakeup_pids as p_start() has.
+* If we just passed the tr->wakeup_pids around, then RCU would
+* have been enough, but doing that makes things more complex.
+*/
+   mutex_lock(&sched_wakeup_mutex);
+   rcu_read_lock_sched();
+
+   pid_list = rcu_dereference_sched(tr->wakeup_pids);
+
+   if (!pid_list)
+   return NULL;
+
+   return trace_pid_start(pid_list, pos);
+}
+
+static void p_stop(struct seq_file *m, void *p)
+   __releases(RCU)
+{
+   rcu_read_unlock_sched();
+   mutex_unlock(&sched_wakeup_mutex);
+}
+
+static const struct seq_operations show_set_pid_seq_ops = {
+   .start = p_start,
+   .next = p_next,
+   .show = trace_pid_show,
+   .stop = p_stop,
+};
+
+static int
+ftrace_wakeup_open(struct inode *inode, struct file *file,
+ const struct seq_operations *seq_ops)
+{
+   struct seq_file *m;
+   int ret;
+
+   ret = seq_open(file, seq_ops);
+   if (ret < 0)
+   return ret;
+   m = file->private_data;
+   /* copy tr over to seq ops */
+   m->private = inode->i_private;
+
+   return ret;
+}
+
+static void __clear_wakeup_pids(struct trace_array *tr)
+{
+   struct trace_pid_list *pid_list;
+
+   pid_list = rcu_dereference_protected(tr->wakeup_pids,
+		lockdep_is_held(&sched_wakeup_mutex));
+   if (!pid_list)
+   return;
+
+   rcu_assign_pointer(tr->wakeup_pids, NULL);
+
+   synchronize_rcu();
+   trace_pid_list_free(pid_list);
+}
+
+static void clear_wakeup_pids(struct trace_array *tr)
+{
+   mutex_lock(&sched_wakeup_mutex);
+   __clear_wakeup_pids(tr);
+   mutex_unlock(&sched_wakeup_mutex);
+
+}
+
+static int
+ftrace_set_wakeup_pid_open(struct inode *inode, struct file *file)
+{
+   const struct seq_operations *seq_ops = &show_set_pid_seq_ops;
+   struct 

Re: [PATCH 04/10] staging: ks7010: remove unused ioctl handler

2023-10-09 Thread Greg Kroah-Hartman
On Mon, Oct 09, 2023 at 04:19:02PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> The ndo_do_ioctl function has no actual callers, and doesn't do much here,
> so just remove it entirely as preparation for removing the callback pointer
> from net_device_ops.
> 
> Signed-off-by: Arnd Bergmann 


Reviewed-by: Greg Kroah-Hartman 


Re: [PATCH 05/10] staging: rtl8192: remove unused legacy ioctl handlers

2023-10-09 Thread Greg Kroah-Hartman
On Mon, Oct 09, 2023 at 04:19:03PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> The .ndo_do_ioctl functions are never called, and can just be removed,
> especially since this is a staging driver.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/staging/rtl8192u/ieee80211/dot11d.c   |  41 --
>  drivers/staging/rtl8192u/ieee80211/dot11d.h   |   2 -
>  .../staging/rtl8192u/ieee80211/ieee80211.h|  12 -
>  .../rtl8192u/ieee80211/ieee80211_softmac.c| 563 --
>  drivers/staging/rtl8192u/r8192U.h |   2 -
>  drivers/staging/rtl8192u/r8192U_core.c| 109 
>  6 files changed, 729 deletions(-)

Reviewed-by: Greg Kroah-Hartman 


Re: [PATCH 07/10] staging: rtl8723bs: remove dead code

2023-10-09 Thread Greg Kroah-Hartman
On Mon, Oct 09, 2023 at 04:19:05PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> The .ndo_do_ioctl functions are never called, so the three implementations here
> are useless and only work as a way to identify the device in the notifiers,
> which can really be removed as well.
> 
> Looking through the exported functions, I found a bunch more that have
> no callers, so just drop all of those.
> 
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Greg Kroah-Hartman 


Re: [PATCH 06/10] staging: rtl8712: remove unused legacy ioctl handlers

2023-10-09 Thread Greg Kroah-Hartman
On Mon, Oct 09, 2023 at 04:19:04PM +0200, Arnd Bergmann wrote:
> From: Arnd Bergmann 
> 
> The .ndo_do_ioctl functions are never called, and can just be removed,
> especially since this is a staging driver.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/staging/rtl8712/os_intfs.c|   1 -
>  drivers/staging/rtl8712/osdep_intf.h  |   2 -
>  drivers/staging/rtl8712/rtl871x_ioctl_linux.c | 124 --
>  3 files changed, 127 deletions(-)

Reviewed-by: Greg Kroah-Hartman 


Re: [PATCH 1/5] csky: remove unused cmd_vdso_install

2023-10-09 Thread Guo Ren
On Mon, Oct 9, 2023 at 8:42 PM Masahiro Yamada  wrote:
>
> You cannot run this code because arch/csky/Makefile does not define the
> vdso_install target.
>
> It appears that this code was blindly copied from another architecture.
Yes, I do that. Thx for pointing it out.

Acked-by: Guo Ren 

>
> Remove the dead code.
>
> Signed-off-by: Masahiro Yamada 
> ---
>
>  arch/csky/kernel/vdso/Makefile | 10 --
>  1 file changed, 10 deletions(-)
>
> diff --git a/arch/csky/kernel/vdso/Makefile b/arch/csky/kernel/vdso/Makefile
> index 299e4e41ebc5..ddf784a62c11 100644
> --- a/arch/csky/kernel/vdso/Makefile
> +++ b/arch/csky/kernel/vdso/Makefile
> @@ -58,13 +58,3 @@ quiet_cmd_vdsold = VDSOLD  $@
>  # that contains the same symbols at the same offsets.
>  quiet_cmd_so2s = SO2S$@
>cmd_so2s = $(NM) -D $< | $(srctree)/$(src)/so2s.sh > $@
> -
> -# install commands for the unstripped file
> -quiet_cmd_vdso_install = INSTALL $@
> -  cmd_vdso_install = cp $(obj)/$@.dbg $(MODLIB)/vdso/$@
> -
> -vdso.so: $(obj)/vdso.so.dbg
> -   @mkdir -p $(MODLIB)/vdso
> -   $(call cmd,vdso_install)
> -
> -vdso_install: vdso.so
> --
> 2.39.2
>


-- 
Best Regards
 Guo Ren


Re: [PATCH v5 1/2] mm/memory_hotplug: split memmap_on_memory requests across memblocks

2023-10-09 Thread David Hildenbrand

On 07.10.23 00:01, Verma, Vishal L wrote:

On Fri, 2023-10-06 at 14:52 +0200, David Hildenbrand wrote:

On 05.10.23 20:31, Vishal Verma wrote:



<..>

@@ -2167,47 +2221,28 @@ static int __ref try_remove_memory(u64 start, u64 size)
 if (rc)
 return rc;
   
+   mem_hotplug_begin();

+
 /*
-    * We only support removing memory added with MHP_MEMMAP_ON_MEMORY in
-    * the same granularity it was added - a single memory block.
+    * For memmap_on_memory, the altmaps could have been added on
+    * a per-memblock basis. Loop through the entire range if so,
+    * and remove each memblock and its altmap.
  */
 if (mhp_memmap_on_memory()) {
-   rc = walk_memory_blocks(start, size, &mem, test_has_altmap_cb);
-   if (rc) {
-   if (size != memory_block_size_bytes()) {
-   pr_warn("Refuse to remove %#llx - %#llx,"
-   "wrong granularity\n",
-   start, start + size);
-   return -EINVAL;
-   }
-   altmap = mem->altmap;
-   /*
-    * Mark altmap NULL so that we can add a debug
-    * check on memblock free.
-    */
-   mem->altmap = NULL;
-   }
+   unsigned long memblock_size = memory_block_size_bytes();
+   u64 cur_start;
+
+   for (cur_start = start; cur_start < start + size;
+    cur_start += memblock_size)
+   remove_memory_block_and_altmap(nid, cur_start,
+  memblock_size);
+   } else {
+   remove_memory_block_and_altmap(nid, start, size);


Better call remove_memory_block_devices() and arch_remove_memory(start,
size, altmap) here explicitly instead of using
remove_memory_block_and_altmap() that really can only handle a single
memory block with any inputs.


I'm not sure I follow. Even in the non memmap_on_memory case, we'd have
to walk_memory_blocks() to get to the memory_block->altmap, right?


See my other reply too; at least with mhp_memmap_on_memory()==false, we
don't have to worry about the altmap.




Or is there a more direct way? If we have to walk_memory_blocks, what's
the advantage of calling those directly instead of calling the helper
created above?


I think we have two cases to handle

1) All have an altmap. Remove them block-by-block. Probably we should 
call a function remove_memory_blocks(altmap=true) [or alternatively 
remove_memory_blocks_and_altmaps()] and just handle iterating internally.


2) All don't have an altmap. We can remove them in one go. Probably we 
should call that remove_memory_blocks(altmap=false) [or alternatively 
remove_memory_blocks_no_altmaps()].


I guess it's best to do a walk upfront to make sure either all have an 
altmap or none has one. Then we can branch off to the right function 
knowing whether we have to process altmaps or not.


The existing

if (mhp_memmap_on_memory()) {
...
}

Can be extended for that case.
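
Something along these lines (the upfront check helper is hypothetical, just
to sketch the shape):

	if (mhp_memmap_on_memory()) {
		/*
		 * Hypothetical: one walk over the range decides whether all
		 * blocks carry an altmap or none does (a mix would be a bug
		 * and could return -EINVAL).
		 */
		if (range_has_altmaps(start, size))
			remove_memory_blocks_and_altmaps(start, size);
		else
			remove_memory_blocks_no_altmaps(start, size);
	} else {
		/* no altmaps possible: remove the whole range in one go */
		remove_memory_block_devices(start, size);
		arch_remove_memory(start, size, NULL);
	}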

Please let me know if I failed to express what I mean, then I can 
briefly prototype it on top of your changes.


--
Cheers,

David / dhildenb




Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-09 Thread Masahiro Yamada
On Mon, Oct 9, 2023 at 11:07 PM Michal Suchánek  wrote:
>
> On Mon, Oct 09, 2023 at 09:34:10PM +0900, Masahiro Yamada wrote:
> > On Mon, Oct 9, 2023 at 5:52 PM Michal Suchánek  wrote:
> > >
> > > Hello,
> > >
> > > On Mon, Oct 09, 2023 at 05:31:02PM +0900, Masahiro Yamada wrote:
> > > > On Fri, Oct 6, 2023 at 12:49 AM Michal Suchanek  
> > > > wrote:
> > > > >
> > > > > The default MODLIB value is composed of two variables and the 
> > > > > hardcoded
> > > > > string '/lib/modules/'.
> > > > >
> > > > > MODLIB = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
> > > > >
> > > > > Defining this middle part as a variable was rejected on the basis that
> > > > > users can pass the whole MODLIB to make, such as
> > > >
> > > >
> > > > In other words, do you want to say
> > > >
> > > > "If defining this middle part as a variable had been accepted,
> > > > this patch would have been unneeded." ?
> > >
> > > If it were accepted I would not have to guess what the middle part is,
> > > and could use the variable that unambiguously defines it instead.
> >
> >
> > How?
> >
> > scripts/package/kernel.spec hardcodes 'lib/modules'
> > in a couple of places.
> >
> > I am asking how to derive the module path.
>
> Not sure what you are asking here. The path is hardcoded, everywhere.
>
> The current Makefile has
>
> MODLIB  = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
>
> and there is no reliable way to learn what the middle part was after the
> fact - $(INSTALL_MOD_PATH) can be non-empty.
>
> The rejected patch was changing this to a variable, and also default to
> adjusting the content to what kmod exports in pkgconfig after applying a
> proposed patch to make this hardcoded part configurable:
>
> export KERNEL_MODULE_DIRECTORY := $(shell pkg-config --print-variables kmod 
> 2>/dev/null | grep '^module_directory$$' >/dev/null && pkg-config 
> --variable=module_directory kmod || echo /lib/modules)
>
> MODLIB  = $(INSTALL_MOD_PATH)$(KERNEL_MODULE_DIRECTORY)/$(KERNELRELEASE)
>
> It would be completely posible to only define the middle part as a
> variable that could then be used in rpm-pkg:
>
> export KERNEL_MODULE_DIRECTORY := /lib/modules
>
> MODLIB  = $(INSTALL_MOD_PATH)$(KERNEL_MODULE_DIRECTORY)/$(KERNELRELEASE)
>
> Thanks
>
> Michal
>
>


Let me add more context to my question.


I am interested in the timing when
'pkg-config --print-variables kmod | grep module_directory'
is executed.



1.  Build a SRPM on machine A

2.  Copy the SRPM from machine A to machine B

3.  Run rpmbuild on machine B to build the SRPM into a RPM

4.  Copy the RPM from machine B to machine C

5.  Install the RPM to machine C


Of course, we are most interested in the module path
of machine C, but it is difficult/impossible to
guess it at the time of building.

We can assume machine B == machine C.

We are the second most interested in the module
path on machine B.

The module path of machine A is not important.


So, I am asking where you would inject
'pkg-config --print-variables kmod | grep module_directory'.


-- 
Best Regards
Masahiro Yamada



Re: [PATCH v5 1/2] mm/memory_hotplug: split memmap_on_memory requests across memblocks

2023-10-09 Thread David Hildenbrand

On 07.10.23 10:55, Huang, Ying wrote:

Vishal Verma  writes:


The MHP_MEMMAP_ON_MEMORY flag for hotplugged memory is restricted to
'memblock_size' chunks of memory being added. Adding a larger span of
memory precludes memmap_on_memory semantics.

For users of hotplug such as kmem, large amounts of memory might get
added from the CXL subsystem. In some cases, this amount may exceed the
available 'main memory' to store the memmap for the memory being added.
In this case, it is useful to have a way to place the memmap on the
memory being added, even if it means splitting the addition into
memblock-sized chunks.

Change add_memory_resource() to loop over memblock-sized chunks of
memory if caller requested memmap_on_memory, and if other conditions for
it are met. Teach try_remove_memory() to also expect that a memory
range being removed might have been split up into memblock sized chunks,
and to loop through those as needed.

Cc: Andrew Morton 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Cc: Oscar Salvador 
Cc: Dan Williams 
Cc: Dave Jiang 
Cc: Dave Hansen 
Cc: Huang Ying 
Suggested-by: David Hildenbrand 
Signed-off-by: Vishal Verma 
---
  mm/memory_hotplug.c | 162 
  1 file changed, 99 insertions(+), 63 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index f8d3e7427e32..77ec6f15f943 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1380,6 +1380,44 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
return arch_supports_memmap_on_memory(vmemmap_size);
  }
  
+static int add_memory_create_devices(int nid, struct memory_group *group,

+u64 start, u64 size, mhp_t mhp_flags)
+{
+   struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+   struct vmem_altmap mhp_altmap = {
+   .base_pfn =  PHYS_PFN(start),
+   .end_pfn  =  PHYS_PFN(start + size - 1),
+   };
+   int ret;
+
+   if ((mhp_flags & MHP_MEMMAP_ON_MEMORY)) {
+   mhp_altmap.free = memory_block_memmap_on_memory_pages();
+   params.altmap = kmalloc(sizeof(struct vmem_altmap), GFP_KERNEL);
+   if (!params.altmap)
+   return -ENOMEM;
+
+   memcpy(params.altmap, &mhp_altmap, sizeof(mhp_altmap));
+   }
+
+   /* call arch's memory hotadd */
+   ret = arch_add_memory(nid, start, size, &params);
+   if (ret < 0)
+   goto error;
+
+   /* create memory block devices after memory was added */
+   ret = create_memory_block_devices(start, size, params.altmap, group);
+   if (ret)
+   goto err_bdev;
+
+   return 0;
+
+err_bdev:
+   arch_remove_memory(start, size, NULL);
+error:
+   kfree(params.altmap);
+   return ret;
+}
+
  /*
   * NOTE: The caller must call lock_device_hotplug() to serialize hotplug
   * and online/offline operations (triggered e.g. by sysfs).
@@ -1388,14 +1426,10 @@ static bool mhp_supports_memmap_on_memory(unsigned long 
size)
   */
  int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)
  {
-   struct mhp_params params = { .pgprot = pgprot_mhp(PAGE_KERNEL) };
+   unsigned long memblock_size = memory_block_size_bytes();
enum memblock_flags memblock_flags = MEMBLOCK_NONE;
-   struct vmem_altmap mhp_altmap = {
-   .base_pfn =  PHYS_PFN(res->start),
-   .end_pfn  =  PHYS_PFN(res->end),
-   };
struct memory_group *group = NULL;
-   u64 start, size;
+   u64 start, size, cur_start;
bool new_node = false;
int ret;
  
@@ -1436,28 +1470,21 @@ int __ref add_memory_resource(int nid, struct resource *res, mhp_t mhp_flags)

/*
 * Self hosted memmap array
 */
-   if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
-   if (mhp_supports_memmap_on_memory(size)) {
-   mhp_altmap.free = memory_block_memmap_on_memory_pages();
-   params.altmap = kmalloc(sizeof(struct vmem_altmap), 
GFP_KERNEL);
-   if (!params.altmap)
+   if ((mhp_flags & MHP_MEMMAP_ON_MEMORY) &&
+   mhp_supports_memmap_on_memory(memblock_size)) {
+   for (cur_start = start; cur_start < start + size;
+cur_start += memblock_size) {
+   ret = add_memory_create_devices(nid, group, cur_start,
+   memblock_size,
+   mhp_flags);
+   if (ret)
goto error;
-
-   memcpy(params.altmap, &mhp_altmap, sizeof(mhp_altmap));
}
-   /* fallback to not using altmap  */
-   }
-
-   /* call arch's memory hotadd */
-   ret = arch_add_memory(nid, start, size, );
-   if (ret < 0)
-   goto error_free;
-
-   /* create memory block 

Re: [PATCH v1] samples: kprobes: Fixes a typo

2023-10-09 Thread Google
On Mon, 9 Oct 2023 09:51:03 -0400
Steven Rostedt  wrote:

> On Sat, 7 Oct 2023 21:09:00 +0530
> Atul Kumar Pant  wrote:
> 
> > On Sat, Sep 23, 2023 at 11:00:46PM +0530, Atul Kumar Pant wrote:
> > > On Thu, Aug 17, 2023 at 10:38:19PM +0530, Atul Kumar Pant wrote:  
> > > > Fixes typo in a function name.
> > > > 
> > > > Signed-off-by: Atul Kumar Pant 
> > > > ---
> > > >  samples/kprobes/kretprobe_example.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/samples/kprobes/kretprobe_example.c 
> > > > b/samples/kprobes/kretprobe_example.c
> > > > index cbf16542d84e..ed79fd3d48fb 100644
> > > > --- a/samples/kprobes/kretprobe_example.c
> > > > +++ b/samples/kprobes/kretprobe_example.c
> > > > @@ -35,7 +35,7 @@ struct my_data {
> > > > ktime_t entry_stamp;
> > > >  };
> > > >  
> > > > -/* Here we use the entry_hanlder to timestamp function entry */
> > > > +/* Here we use the entry_handler to timestamp function entry */
> > > >  static int entry_handler(struct kretprobe_instance *ri, struct pt_regs 
> > > > *regs)
> > > >  {
> > > > struct my_data *data;
> > > > -- 
> > > > 2.25.1
> > > >   
> > > 
> > >   Hi all, can someone provide comments on this change.  
> > 
> > Hi all, can someone please review this change. It has 
> > been not
> > reviewed for quite some time.
> 
> That's because trivial typos in comments are considered very low priority,
> and such fixes are usually only applied (if they are ever applied) if the maintainer has
> extra time, which may not be for a while.

Anyway, let me pick this. I found this in my inbox now. :)

Thank you,

> 
> -- Steve


-- 
Masami Hiramatsu (Google) 



Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Steven Rostedt
On Mon, 9 Oct 2023 18:58:27 +0800
Yajun Deng  wrote:

> > C compiler decides to inline or not, depending on various factors.
> >
> > The most efficient (and small) code is generated by this_cpu_inc()
> > version, allowing the compiler to inline it.
> >
> > If you copy/paste this_cpu_inc()  twenty times, then the compiler
> > would  not inline the function anymore.  

Yes, if you want something to be visible by ftrace, it must not be inlined
(as inlined functions are not function calls by definition). And as Eric
stated, the compiler is perfectly allowed to inline something if it
believes it will be more efficient. i.e. There may be code around the function
call that could be more efficient if it wasn't changed to pass parameters. If you
want to make sure a function stays out of line, you must explicitly tell
the compiler you want the function not to ever be inlined (hence the
"noinline" attribute).

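As a generic illustration (not the actual net/core code):

	/* without the attribute the compiler is free to fold this into its
	 * caller, leaving no call site for ftrace/kprobes to attach to */
	static noinline void bump_stat(unsigned long *counter)
	{
		(*counter)++;
	}

The "noinline" attribute keeps the symbol as a real, out-of-line call.
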
> 
> 
> Got it. Thank you.

Great.

-- Steve



[PATCH 10/10] net: remove ndo_do_ioctl handler

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

All of the references to the callback pointer are gone, so remove the
pointer itself before we grow new references to it.

Signed-off-by: Arnd Bergmann 
---
 Documentation/networking/netdevices.rst | 8 
 include/linux/netdevice.h   | 7 ---
 2 files changed, 15 deletions(-)

diff --git a/Documentation/networking/netdevices.rst 
b/Documentation/networking/netdevices.rst
index 9e4cccb90b870..6f9b71c5d37b8 100644
--- a/Documentation/networking/netdevices.rst
+++ b/Documentation/networking/netdevices.rst
@@ -218,14 +218,6 @@ ndo_stop:
Context: process
Note: netif_running() is guaranteed false
 
-ndo_do_ioctl:
-   Synchronization: rtnl_lock() semaphore.
-   Context: process
-
-This is only called by network subsystems internally,
-not by user space calling ioctl as it was in before
-linux-5.14.
-
 ndo_siocbond:
 Synchronization: rtnl_lock() semaphore.
 Context: process
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e070a4540fbaf..8d1cc8f195cb6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1121,11 +1121,6 @@ struct netdev_net_notifier {
  * int (*ndo_validate_addr)(struct net_device *dev);
  * Test if Media Access Control address is valid for the device.
  *
- * int (*ndo_do_ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd);
- * Old-style ioctl entry point. This is used internally by the
- * appletalk and ieee802154 subsystems but is no longer called by
- * the device ioctl handler.
- *
  * int (*ndo_siocbond)(struct net_device *dev, struct ifreq *ifr, int cmd);
  * Used by the bonding driver for its device specific ioctls:
  * SIOCBONDENSLAVE, SIOCBONDRELEASE, SIOCBONDSETHWADDR, 
SIOCBONDCHANGEACTIVE,
@@ -1429,8 +1424,6 @@ struct net_device_ops {
int (*ndo_set_mac_address)(struct net_device *dev,
   void *addr);
int (*ndo_validate_addr)(struct net_device *dev);
-   int (*ndo_do_ioctl)(struct net_device *dev,
-   struct ifreq *ifr, int cmd);
int (*ndo_eth_ioctl)(struct net_device *dev,
 struct ifreq *ifr, int cmd);
int (*ndo_siocbond)(struct net_device *dev,
-- 
2.39.2



[PATCH 08/10] wireless: atmel: remove unused ioctl function

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

This function has no callers, and for the past 20 years, the request_firmware
interface has been in place instead of the custom firmware loader.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/wireless/atmel/atmel.c | 72 --
 1 file changed, 72 deletions(-)

diff --git a/drivers/net/wireless/atmel/atmel.c 
b/drivers/net/wireless/atmel/atmel.c
index 7c2d1c588156d..461dce21de2b0 100644
--- a/drivers/net/wireless/atmel/atmel.c
+++ b/drivers/net/wireless/atmel/atmel.c
@@ -571,7 +571,6 @@ static const struct {
  { REG_DOMAIN_ISRAEL, 3, 9, "Israel"} };
 
 static void build_wpa_mib(struct atmel_private *priv);
-static int atmel_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
 static void atmel_copy_to_card(struct net_device *dev, u16 dest,
   const unsigned char *src, u16 len);
 static void atmel_copy_to_host(struct net_device *dev, unsigned char *dest,
@@ -1487,7 +1486,6 @@ static const struct net_device_ops atmel_netdev_ops = {
.ndo_stop   = atmel_close,
.ndo_set_mac_address= atmel_set_mac_address,
.ndo_start_xmit = start_tx,
-   .ndo_do_ioctl   = atmel_ioctl,
.ndo_validate_addr  = eth_validate_addr,
 };
 
@@ -2616,76 +2614,6 @@ static const struct iw_handler_def atmel_handler_def = {
.get_wireless_stats = atmel_get_wireless_stats
 };
 
-static int atmel_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
-{
-   int i, rc = 0;
-   struct atmel_private *priv = netdev_priv(dev);
-   struct atmel_priv_ioctl com;
-   struct iwreq *wrq = (struct iwreq *) rq;
-   unsigned char *new_firmware;
-   char domain[REGDOMAINSZ + 1];
-
-   switch (cmd) {
-   case ATMELIDIFC:
-   wrq->u.param.value = ATMELMAGIC;
-   break;
-
-   case ATMELFWL:
-   if (copy_from_user(, rq->ifr_data, sizeof(com))) {
-   rc = -EFAULT;
-   break;
-   }
-
-   if (!capable(CAP_NET_ADMIN)) {
-   rc = -EPERM;
-   break;
-   }
-
-   new_firmware = memdup_user(com.data, com.len);
-   if (IS_ERR(new_firmware)) {
-   rc = PTR_ERR(new_firmware);
-   break;
-   }
-
-   kfree(priv->firmware);
-
-   priv->firmware = new_firmware;
-   priv->firmware_length = com.len;
-   strncpy(priv->firmware_id, com.id, 31);
-   priv->firmware_id[31] = '\0';
-   break;
-
-   case ATMELRD:
-   if (copy_from_user(domain, rq->ifr_data, REGDOMAINSZ)) {
-   rc = -EFAULT;
-   break;
-   }
-
-   if (!capable(CAP_NET_ADMIN)) {
-   rc = -EPERM;
-   break;
-   }
-
-   domain[REGDOMAINSZ] = 0;
-   rc = -EINVAL;
-   for (i = 0; i < ARRAY_SIZE(channel_table); i++) {
-   if (!strcasecmp(channel_table[i].name, domain)) {
-   priv->config_reg_domain = 
channel_table[i].reg_domain;
-   rc = 0;
-   }
-   }
-
-   if (rc == 0 &&  priv->station_state != STATION_STATE_DOWN)
-   rc = atmel_open(dev);
-   break;
-
-   default:
-   rc = -EOPNOTSUPP;
-   }
-
-   return rc;
-}
-
 struct auth_body {
__le16 alg;
__le16 trans_seq;
-- 
2.39.2



[PATCH 09/10] wireless: hostap: remove unused ioctl function

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

The ioctl handler has no actual callers in the kernel and is useless.
All the functionality should be reachable through the regular interfaces.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/wireless/intersil/hostap/hostap.h |   1 -
 .../wireless/intersil/hostap/hostap_ioctl.c   | 228 --
 .../wireless/intersil/hostap/hostap_main.c|   3 -
 3 files changed, 232 deletions(-)

diff --git a/drivers/net/wireless/intersil/hostap/hostap.h 
b/drivers/net/wireless/intersil/hostap/hostap.h
index c17ab6dbbb538..552ae33d78751 100644
--- a/drivers/net/wireless/intersil/hostap/hostap.h
+++ b/drivers/net/wireless/intersil/hostap/hostap.h
@@ -92,7 +92,6 @@ void hostap_info_process(local_info_t *local, struct sk_buff 
*skb);
 extern const struct iw_handler_def hostap_iw_handler_def;
 extern const struct ethtool_ops prism2_ethtool_ops;
 
-int hostap_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd);
 int hostap_siocdevprivate(struct net_device *dev, struct ifreq *ifr,
  void __user *data, int cmd);
 
diff --git a/drivers/net/wireless/intersil/hostap/hostap_ioctl.c 
b/drivers/net/wireless/intersil/hostap/hostap_ioctl.c
index b4adfc190ae87..26162f92e3c3d 100644
--- a/drivers/net/wireless/intersil/hostap/hostap_ioctl.c
+++ b/drivers/net/wireless/intersil/hostap/hostap_ioctl.c
@@ -2316,21 +2316,6 @@ static const struct iw_priv_args prism2_priv[] = {
 };
 
 
-static int prism2_ioctl_priv_inquire(struct net_device *dev, int *i)
-{
-   struct hostap_interface *iface;
-   local_info_t *local;
-
-   iface = netdev_priv(dev);
-   local = iface->local;
-
-   if (local->func->cmd(dev, HFA384X_CMDCODE_INQUIRE, *i, NULL, NULL))
-   return -EOPNOTSUPP;
-
-   return 0;
-}
-
-
 static int prism2_ioctl_priv_prism2_param(struct net_device *dev,
  struct iw_request_info *info,
  union iwreq_data *uwrq, char *extra)
@@ -2910,146 +2895,6 @@ static int prism2_ioctl_priv_writemif(struct net_device 
*dev,
 }
 
 
-static int prism2_ioctl_priv_monitor(struct net_device *dev, int *i)
-{
-   struct hostap_interface *iface;
-   local_info_t *local;
-   int ret = 0;
-   union iwreq_data wrqu;
-
-   iface = netdev_priv(dev);
-   local = iface->local;
-
-   printk(KERN_DEBUG "%s: process %d (%s) used deprecated iwpriv monitor "
-  "- update software to use iwconfig mode monitor\n",
-  dev->name, task_pid_nr(current), current->comm);
-
-   /* Backward compatibility code - this can be removed at some point */
-
-   if (*i == 0) {
-   /* Disable monitor mode - old mode was not saved, so go to
-* Master mode */
-   wrqu.mode = IW_MODE_MASTER;
-   ret = prism2_ioctl_siwmode(dev, NULL, , NULL);
-   } else if (*i == 1) {
-   /* netlink socket mode is not supported anymore since it did
-* not separate different devices from each other and was not
-* best method for delivering large amount of packets to
-* user space */
-   ret = -EOPNOTSUPP;
-   } else if (*i == 2 || *i == 3) {
-   switch (*i) {
-   case 2:
-   local->monitor_type = PRISM2_MONITOR_80211;
-   break;
-   case 3:
-   local->monitor_type = PRISM2_MONITOR_PRISM;
-   break;
-   }
-   wrqu.mode = IW_MODE_MONITOR;
-   ret = prism2_ioctl_siwmode(dev, NULL, , NULL);
-   hostap_monitor_mode_enable(local);
-   } else
-   ret = -EINVAL;
-
-   return ret;
-}
-
-
-static int prism2_ioctl_priv_reset(struct net_device *dev, int *i)
-{
-   struct hostap_interface *iface;
-   local_info_t *local;
-
-   iface = netdev_priv(dev);
-   local = iface->local;
-
-   printk(KERN_DEBUG "%s: manual reset request(%d)\n", dev->name, *i);
-   switch (*i) {
-   case 0:
-   /* Disable and enable card */
-   local->func->hw_shutdown(dev, 1);
-   local->func->hw_config(dev, 0);
-   break;
-
-   case 1:
-   /* COR sreset */
-   local->func->hw_reset(dev);
-   break;
-
-   case 2:
-   /* Disable and enable port 0 */
-   local->func->reset_port(dev);
-   break;
-
-   case 3:
-   prism2_sta_deauth(local, WLAN_REASON_DEAUTH_LEAVING);
-   if (local->func->cmd(dev, HFA384X_CMDCODE_DISABLE, 0, NULL,
-NULL))
-   return -EINVAL;
-   break;
-
-   case 4:
-   if (local->func->cmd(dev, HFA384X_CMDCODE_ENABLE, 0, NULL,
-NULL))
-   return -EINVAL;
-   break;
-

[PATCH 07/10] staging: rtl8723bs: remove dead code

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

The .ndo_do_ioctl functions are never called, so the three implementations here
are useless and only work as a way to identify the device in the notifiers,
which can really be removed as well.

Looking through the exported functions, I found a bunch more that have
no callers, so just drop all of those.

Signed-off-by: Arnd Bergmann 
---
 drivers/staging/rtl8723bs/Makefile|1 -
 .../staging/rtl8723bs/include/osdep_intf.h|   32 -
 drivers/staging/rtl8723bs/include/rtw_io.h|1 -
 .../staging/rtl8723bs/os_dep/ioctl_linux.c| 1300 -
 drivers/staging/rtl8723bs/os_dep/os_intfs.c   |   29 -
 drivers/staging/rtl8723bs/os_dep/sdio_intf.c  |   23 +-
 6 files changed, 1 insertion(+), 1385 deletions(-)
 delete mode 100644 drivers/staging/rtl8723bs/os_dep/ioctl_linux.c

diff --git a/drivers/staging/rtl8723bs/Makefile 
b/drivers/staging/rtl8723bs/Makefile
index 590bde02058c7..0f3f6dea4955e 100644
--- a/drivers/staging/rtl8723bs/Makefile
+++ b/drivers/staging/rtl8723bs/Makefile
@@ -50,7 +50,6 @@ r8723bs-y = \
hal/HalHWImg8723B_RF.o \
hal/HalPhyRf_8723B.o \
os_dep/ioctl_cfg80211.o \
-   os_dep/ioctl_linux.o \
os_dep/mlme_linux.o \
os_dep/osdep_service.o \
os_dep/os_intfs.o \
diff --git a/drivers/staging/rtl8723bs/include/osdep_intf.h 
b/drivers/staging/rtl8723bs/include/osdep_intf.h
index 111e0179712ac..83a25598e9627 100644
--- a/drivers/staging/rtl8723bs/include/osdep_intf.h
+++ b/drivers/staging/rtl8723bs/include/osdep_intf.h
@@ -8,33 +8,6 @@
 #ifndef __OSDEP_INTF_H_
 #define __OSDEP_INTF_H_
 
-
-struct intf_priv {
-
-   u8 *intf_dev;
-   u32 max_iosz;   /* USB2.0: 128, USB1.1: 64, SDIO:64 */
-   u32 max_xmitsz; /* USB2.0: unlimited, SDIO:512 */
-   u32 max_recvsz; /* USB2.0: unlimited, SDIO:512 */
-
-   volatile u8 *io_rwmem;
-   volatile u8 *allocated_io_rwmem;
-   u32 io_wsz; /* unit: 4bytes */
-   u32 io_rsz;/* unit: 4bytes */
-   u8 intf_status;
-
-   void (*_bus_io)(u8 *priv);
-
-/*
-Under Sync. IRP (SDIO/USB)
-A protection mechanism is necessary for the io_rwmem(read/write protocol)
-
-Under Async. IRP (SDIO/USB)
-The protection mechanism is through the pending queue.
-*/
-
-   struct mutex ioctl_mutex;
-};
-
 struct dvobj_priv *devobj_init(void);
 void devobj_deinit(struct dvobj_priv *pdvobj);
 
@@ -47,17 +20,12 @@ u32 rtw_start_drv_threads(struct adapter *padapter);
 void rtw_stop_drv_threads(struct adapter *padapter);
 void rtw_cancel_all_timer(struct adapter *padapter);
 
-int rtw_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
-
 int rtw_init_netdev_name(struct net_device *pnetdev, const char *ifname);
 struct net_device *rtw_init_netdev(struct adapter *padapter);
 void rtw_unregister_netdevs(struct dvobj_priv *dvobj);
 
 u16 rtw_recv_select_queue(struct sk_buff *skb);
 
-int rtw_ndev_notifier_register(void);
-void rtw_ndev_notifier_unregister(void);
-
 void rtw_ips_dev_unload(struct adapter *padapter);
 
 int rtw_ips_pwr_up(struct adapter *padapter);
diff --git a/drivers/staging/rtl8723bs/include/rtw_io.h 
b/drivers/staging/rtl8723bs/include/rtw_io.h
index e98083a07a660..f92093e73fe67 100644
--- a/drivers/staging/rtl8723bs/include/rtw_io.h
+++ b/drivers/staging/rtl8723bs/include/rtw_io.h
@@ -72,7 +72,6 @@
 
 #define _INTF_ASYNC_   BIT(0)  /* support async io */
 
-struct intf_priv;
 struct intf_hdl;
 struct io_queue;
 
diff --git a/drivers/staging/rtl8723bs/os_dep/ioctl_linux.c 
b/drivers/staging/rtl8723bs/os_dep/ioctl_linux.c
deleted file mode 100644
index c81b30f1f1b05..0
--- a/drivers/staging/rtl8723bs/os_dep/ioctl_linux.c
+++ /dev/null
@@ -1,1300 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/**
- *
- * Copyright(c) 2007 - 2012 Realtek Corporation. All rights reserved.
- *
- 
**/
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-#define RTL_IOCTL_WPA_SUPPLICANT   (SIOCIWFIRSTPRIV + 30)
-
-static int wpa_set_auth_algs(struct net_device *dev, u32 value)
-{
-   struct adapter *padapter = rtw_netdev_priv(dev);
-   int ret = 0;
-
-   if ((value & IW_AUTH_ALG_SHARED_KEY) && (value & 
IW_AUTH_ALG_OPEN_SYSTEM)) {
-   padapter->securitypriv.ndisencryptstatus = 
Ndis802_11Encryption1Enabled;
-   padapter->securitypriv.ndisauthtype = 
Ndis802_11AuthModeAutoSwitch;
-   padapter->securitypriv.dot11AuthAlgrthm = dot11AuthAlgrthm_Auto;
-   } else if (value & IW_AUTH_ALG_SHARED_KEY)  {
-   padapter->securitypriv.ndisencryptstatus = 
Ndis802_11Encryption1Enabled;
-
-   padapter->securitypriv.ndisauthtype = Ndis802_11AuthModeShared;
-   padapter->securitypriv.dot11AuthAlgrthm = 
dot11AuthAlgrthm_Shared;
-   

[PATCH 06/10] staging: rtl8712: remove unused legacy ioctl handlers

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

The .ndo_do_ioctl functions are never called, and can just be removed,
especially since this is a staging driver.

Signed-off-by: Arnd Bergmann 
---
 drivers/staging/rtl8712/os_intfs.c|   1 -
 drivers/staging/rtl8712/osdep_intf.h  |   2 -
 drivers/staging/rtl8712/rtl871x_ioctl_linux.c | 124 --
 3 files changed, 127 deletions(-)

diff --git a/drivers/staging/rtl8712/os_intfs.c 
b/drivers/staging/rtl8712/os_intfs.c
index b18e6d9c832b8..121edffbd2507 100644
--- a/drivers/staging/rtl8712/os_intfs.c
+++ b/drivers/staging/rtl8712/os_intfs.c
@@ -191,7 +191,6 @@ static const struct net_device_ops rtl8712_netdev_ops = {
.ndo_start_xmit = r8712_xmit_entry,
.ndo_set_mac_address = r871x_net_set_mac_address,
.ndo_get_stats = r871x_net_get_stats,
-   .ndo_do_ioctl = r871x_ioctl,
 };
 
 struct net_device *r8712_init_netdev(void)
diff --git a/drivers/staging/rtl8712/osdep_intf.h 
b/drivers/staging/rtl8712/osdep_intf.h
index 9e75116c987ec..ce823030bfec2 100644
--- a/drivers/staging/rtl8712/osdep_intf.h
+++ b/drivers/staging/rtl8712/osdep_intf.h
@@ -27,6 +27,4 @@ struct intf_priv {
struct completion io_retevt_comp;
 };
 
-int r871x_ioctl(struct net_device *dev, struct ifreq *rq, int cmd);
-
 #endif /*_OSDEP_INTF_H_*/
diff --git a/drivers/staging/rtl8712/rtl871x_ioctl_linux.c 
b/drivers/staging/rtl8712/rtl871x_ioctl_linux.c
index 36f6904d25abc..a4a34c9f00b84 100644
--- a/drivers/staging/rtl8712/rtl871x_ioctl_linux.c
+++ b/drivers/staging/rtl8712/rtl871x_ioctl_linux.c
@@ -36,8 +36,6 @@
 #include 
 #include 
 
-#define RTL_IOCTL_WPA_SUPPLICANT   (SIOCIWFIRSTPRIV + 0x1E)
-
 #define SCAN_ITEM_SIZE 768
 #define MAX_CUSTOM_LEN 64
 #define RATE_COUNT 4
@@ -2066,128 +2064,6 @@ static int r871x_wps_start(struct net_device *dev,
return 0;
 }
 
-static int wpa_set_param(struct net_device *dev, u8 name, u32 value)
-{
-   struct _adapter *padapter = netdev_priv(dev);
-
-   switch (name) {
-   case IEEE_PARAM_WPA_ENABLED:
-   padapter->securitypriv.AuthAlgrthm = 2; /* 802.1x */
-   switch ((value) & 0xff) {
-   case 1: /* WPA */
-   padapter->securitypriv.ndisauthtype =
-   Ndis802_11AuthModeWPAPSK; /* WPA_PSK */
-   padapter->securitypriv.ndisencryptstatus =
-   Ndis802_11Encryption2Enabled;
-   break;
-   case 2: /* WPA2 */
-   padapter->securitypriv.ndisauthtype =
-   Ndis802_11AuthModeWPA2PSK; /* WPA2_PSK */
-   padapter->securitypriv.ndisencryptstatus =
-   Ndis802_11Encryption3Enabled;
-   break;
-   }
-   break;
-   case IEEE_PARAM_TKIP_COUNTERMEASURES:
-   break;
-   case IEEE_PARAM_DROP_UNENCRYPTED:
-   /* HACK:
-*
-* wpa_supplicant calls set_wpa_enabled when the driver
-* is loaded and unloaded, regardless of if WPA is being
-* used.  No other calls are made which can be used to
-* determine if encryption will be used or not prior to
-* association being expected.  If encryption is not being
-* used, drop_unencrypted is set to false, else true -- we
-* can use this to determine if the CAP_PRIVACY_ON bit should
-* be set.
-*/
-   break;
-   case IEEE_PARAM_PRIVACY_INVOKED:
-   break;
-   case IEEE_PARAM_AUTH_ALGS:
-   return wpa_set_auth_algs(dev, value);
-   case IEEE_PARAM_IEEE_802_1X:
-   break;
-   case IEEE_PARAM_WPAX_SELECT:
-   /* added for WPA2 mixed mode */
-   break;
-   default:
-   return -EOPNOTSUPP;
-   }
-   return 0;
-}
-
-static int wpa_mlme(struct net_device *dev, u32 command, u32 reason)
-{
-   struct _adapter *padapter = netdev_priv(dev);
-
-   switch (command) {
-   case IEEE_MLME_STA_DEAUTH:
-   if (!r8712_set_802_11_disassociate(padapter))
-   return -1;
-   break;
-   case IEEE_MLME_STA_DISASSOC:
-   if (!r8712_set_802_11_disassociate(padapter))
-   return -1;
-   break;
-   default:
-   return -EOPNOTSUPP;
-   }
-   return 0;
-}
-
-static int wpa_supplicant_ioctl(struct net_device *dev, struct iw_point *p)
-{
-   struct ieee_param *param;
-   int ret = 0;
-   struct _adapter *padapter = netdev_priv(dev);
-
-   if (p->length < sizeof(struct ieee_param) || !p->pointer)
-   return -EINVAL;
-   param = memdup_user(p->pointer, p->length);
-   if (IS_ERR(param))
-   return PTR_ERR(param);
-   switch (param->cmd) {
-   

[PATCH 05/10] staging: rtl8192: remove unused legacy ioctl handlers

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

The .ndo_do_ioctl functions are never called, and can just be removed,
especially since this is a staging driver.

Signed-off-by: Arnd Bergmann 
---
 drivers/staging/rtl8192u/ieee80211/dot11d.c   |  41 --
 drivers/staging/rtl8192u/ieee80211/dot11d.h   |   2 -
 .../staging/rtl8192u/ieee80211/ieee80211.h|  12 -
 .../rtl8192u/ieee80211/ieee80211_softmac.c| 563 --
 drivers/staging/rtl8192u/r8192U.h |   2 -
 drivers/staging/rtl8192u/r8192U_core.c| 109 
 6 files changed, 729 deletions(-)

diff --git a/drivers/staging/rtl8192u/ieee80211/dot11d.c 
b/drivers/staging/rtl8192u/ieee80211/dot11d.c
index ddaf66fa0f936..8a72c1e9eb1e1 100644
--- a/drivers/staging/rtl8192u/ieee80211/dot11d.c
+++ b/drivers/staging/rtl8192u/ieee80211/dot11d.c
@@ -97,22 +97,6 @@ void dot11d_update_country_ie(struct ieee80211_device *dev, 
u8 *pTaddr,
 }
 EXPORT_SYMBOL(dot11d_update_country_ie);
 
-u8 dot11d_get_max_tx_pwr_in_dbm(struct ieee80211_device *dev, u8 Channel)
-{
-   struct rt_dot11d_info *dot11d_info = GET_DOT11D_INFO(dev);
-   u8 MaxTxPwrInDbm = 255;
-
-   if (Channel > MAX_CHANNEL_NUMBER) {
-   netdev_err(dev->dev, "%s: Invalid Channel\n", __func__);
-   return MaxTxPwrInDbm;
-   }
-   if (dot11d_info->channel_map[Channel])
-   MaxTxPwrInDbm = dot11d_info->max_tx_pwr_dbm_list[Channel];
-
-   return MaxTxPwrInDbm;
-}
-EXPORT_SYMBOL(dot11d_get_max_tx_pwr_in_dbm);
-
 void dot11d_scan_complete(struct ieee80211_device *dev)
 {
struct rt_dot11d_info *dot11d_info = GET_DOT11D_INFO(dev);
@@ -147,28 +131,3 @@ int is_legal_channel(struct ieee80211_device *dev, u8 
channel)
return 0;
 }
 EXPORT_SYMBOL(is_legal_channel);
-
-int to_legal_channel(struct ieee80211_device *dev, u8 channel)
-{
-   struct rt_dot11d_info *dot11d_info = GET_DOT11D_INFO(dev);
-   u8 default_chn = 0;
-   u32 i = 0;
-
-   for (i = 1; i <= MAX_CHANNEL_NUMBER; i++) {
-   if (dot11d_info->channel_map[i] > 0) {
-   default_chn = i;
-   break;
-   }
-   }
-
-   if (channel > MAX_CHANNEL_NUMBER) {
-   netdev_err(dev->dev, "%s: Invalid Channel\n", __func__);
-   return default_chn;
-   }
-
-   if (dot11d_info->channel_map[channel] > 0)
-   return channel;
-
-   return default_chn;
-}
-EXPORT_SYMBOL(to_legal_channel);
diff --git a/drivers/staging/rtl8192u/ieee80211/dot11d.h 
b/drivers/staging/rtl8192u/ieee80211/dot11d.h
index 8b485fa180898..fd774265211a5 100644
--- a/drivers/staging/rtl8192u/ieee80211/dot11d.h
+++ b/drivers/staging/rtl8192u/ieee80211/dot11d.h
@@ -49,9 +49,7 @@ void dot11d_update_country_ie(struct ieee80211_device *dev,
  u8 *addr,
  u16 coutry_ie_len,
  u8 *coutry_ie);
-u8 dot11d_get_max_tx_pwr_in_dbm(struct ieee80211_device *dev, u8 channel);
 void dot11d_scan_complete(struct ieee80211_device *dev);
 int is_legal_channel(struct ieee80211_device *dev, u8 channel);
-int to_legal_channel(struct ieee80211_device *dev, u8 channel);
 
 #endif /* #ifndef __INC_DOT11D_H */
diff --git a/drivers/staging/rtl8192u/ieee80211/ieee80211.h 
b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
index 694d1b18f81c7..fc4201757c408 100644
--- a/drivers/staging/rtl8192u/ieee80211/ieee80211.h
+++ b/drivers/staging/rtl8192u/ieee80211/ieee80211.h
@@ -223,11 +223,7 @@ struct cb_desc {
 #define MAX_IE_LEN  0xff
 
 // added for kernel conflict
-#define ieee80211_wake_queue   ieee80211_wake_queue_rsl
-#define ieee80211_stop_queue   ieee80211_stop_queue_rsl
 #define notify_wx_assoc_event  notify_wx_assoc_event_rsl
-#define SendDisassociation SendDisassociation_rsl
-
 
 struct ieee_param {
u32 cmd;
@@ -2152,7 +2148,6 @@ int ieee80211_wx_set_gen_ie(struct ieee80211_device 
*ieee, u8 *ie, size_t len);
 
 /* ieee80211_softmac.c */
 short ieee80211_is_54g(const struct ieee80211_network *net);
-short ieee80211_is_shortslot(const struct ieee80211_network *net);
 int ieee80211_rx_frame_softmac(struct ieee80211_device *ieee,
   struct sk_buff *skb,
   struct ieee80211_rx_stats *rx_stats,
@@ -2160,7 +2155,6 @@ int ieee80211_rx_frame_softmac(struct ieee80211_device 
*ieee,
 void ieee80211_softmac_new_net(struct ieee80211_device *ieee,
   struct ieee80211_network *net);
 
-void SendDisassociation(struct ieee80211_device *ieee, u8 *asSta, u8 asRsn);
 void ieee80211_softmac_xmit(struct ieee80211_txb *txb,
struct ieee80211_device *ieee);
 
@@ -2182,13 +2176,7 @@ void ieee80211_stop_protocol(struct ieee80211_device 
*ieee);
 void ieee80211_softmac_start_protocol(struct ieee80211_device *ieee);
 void ieee80211_softmac_stop_protocol(struct ieee80211_device *ieee);
 void 

[PATCH 04/10] staging: ks7010: remove unused ioctl handler

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

The ndo_do_ioctl function has no actual callers, and doesn't do much here,
so just remove it entirely as preparation for removing the callback pointer
from net_device_ops.

Signed-off-by: Arnd Bergmann 
---
 drivers/staging/ks7010/ks_wlan_net.c | 21 -
 1 file changed, 21 deletions(-)

diff --git a/drivers/staging/ks7010/ks_wlan_net.c 
b/drivers/staging/ks7010/ks_wlan_net.c
index 0fb97a79ad0b3..ab7463bb25169 100644
--- a/drivers/staging/ks7010/ks_wlan_net.c
+++ b/drivers/staging/ks7010/ks_wlan_net.c
@@ -51,8 +51,6 @@ static int ks_wlan_close(struct net_device *dev);
 static void ks_wlan_set_rx_mode(struct net_device *dev);
 static struct net_device_stats *ks_wlan_get_stats(struct net_device *dev);
 static int ks_wlan_set_mac_address(struct net_device *dev, void *addr);
-static int ks_wlan_netdev_ioctl(struct net_device *dev, struct ifreq *rq,
-   int cmd);
 
 static atomic_t update_phyinfo;
 static struct timer_list update_phyinfo_timer;
@@ -2458,24 +2456,6 @@ static const struct iw_handler_def ks_wlan_handler_def = 
{
.get_wireless_stats = ks_get_wireless_stats,
 };
 
-static int ks_wlan_netdev_ioctl(struct net_device *dev, struct ifreq *rq,
-   int cmd)
-{
-   int ret;
-   struct iwreq *wrq = (struct iwreq *)rq;
-
-   switch (cmd) {
-   case SIOCIWFIRSTPRIV + 20:  /* KS_WLAN_SET_STOP_REQ */
-   ret = ks_wlan_set_stop_request(dev, NULL, &wrq->u, NULL);
-   break;
-   // All other calls are currently unsupported
-   default:
-   ret = -EOPNOTSUPP;
-   }
-
-   return ret;
-}
-
 static
 struct net_device_stats *ks_wlan_get_stats(struct net_device *dev)
 {
@@ -2608,7 +2588,6 @@ static const struct net_device_ops ks_wlan_netdev_ops = {
.ndo_start_xmit = ks_wlan_start_xmit,
.ndo_open = ks_wlan_open,
.ndo_stop = ks_wlan_close,
-   .ndo_do_ioctl = ks_wlan_netdev_ioctl,
.ndo_set_mac_address = ks_wlan_set_mac_address,
.ndo_get_stats = ks_wlan_get_stats,
.ndo_tx_timeout = ks_wlan_tx_timeout,
-- 
2.39.2



Re: [PATCH v9 1/5] lib: objpool added: ring-array based lockless MPMC

2023-10-09 Thread Google
Hi,

On Mon, 9 Oct 2023 02:40:53 +0800
wuqiang  wrote:

> On 2023/9/23 17:48, Masami Hiramatsu (Google) wrote:
> > Hi Wuqiang,
> > 
> > Sorry for replying later.
> > 
> > On Tue,  5 Sep 2023 09:52:51 +0800
> > "wuqiang.matt"  wrote:
> > 
> >> The object pool is a scalable implementaion of high performance queue
> >> for object allocation and reclamation, such as kretprobe instances.
> >>
> >> With leveraging percpu ring-array to mitigate the hot spot of memory
> >> contention, it could deliver near-linear scalability for high parallel
> >> scenarios. The objpool is best suited for following cases:
> >> 1) Memory allocation or reclamation are prohibited or too expensive
> >> 2) Consumers are of different priorities, such as irqs and threads
> >>
> >> Limitations:
> >> 1) Maximum objects (capacity) is determined during pool initializing
> >> and can't be modified (extended) after objpool creation
> > 
> > So the pool size is fixed in initialization.
> 
> Right. The array size will be up-rounded to the exponent of 2, but the
> actual number of objects (to be allocated) are the exact value specified
> by user.

Yeah, this makes it easy to use the seq-number as an index.
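
(Illustrative sketch only, not the patch code: with the capacity rounded up
to a power of two, the free-running head/tail sequence numbers can be mapped
onto ring slots with a plain mask. The names below are made up for the
example.)

struct ring {
	unsigned int head;	/* sequence number of the next pop */
	unsigned int tail;	/* sequence number of the next push */
	unsigned int mask;	/* capacity - 1, capacity is a power of two */
	void *entries[];	/* 'capacity' pointers follow */
};

static inline void *ring_entry(struct ring *r, unsigned int seq)
{
	/* seq is free-running and may wrap; the mask keeps the index in range */
	return r->entries[seq & r->mask];
}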

> 
> > 
> >> 2) The memory of objects won't be freed until objpool is finalized
> >> 3) Object allocation (pop) may fail after trying all cpu slots
> > 
> > This means that object allocation will fail if the all pools are empty,
> > right?
> 
> Yes, pop() will return NULL for this case. pop() does the checking for
> only 1 round of all cpu nodes.
> 
> The objpool might not be empty since new object could be inserted back
> in the meantime by other nodes, which is natural for lockless queues.

OK.

> 
> > 
> >>
> >> Signed-off-by: wuqiang.matt 
> >> ---
> >>   include/linux/objpool.h | 174 +
> >>   lib/Makefile|   2 +-
> >>   lib/objpool.c   | 338 
> >>   3 files changed, 513 insertions(+), 1 deletion(-)
> >>   create mode 100644 include/linux/objpool.h
> >>   create mode 100644 lib/objpool.c
> >>
> >> diff --git a/include/linux/objpool.h b/include/linux/objpool.h
> >> new file mode 100644
> >> index ..33c832216b98
> >> --- /dev/null
> >> +++ b/include/linux/objpool.h
> >> @@ -0,0 +1,174 @@
> >> +/* SPDX-License-Identifier: GPL-2.0 */
> >> +
> >> +#ifndef _LINUX_OBJPOOL_H
> >> +#define _LINUX_OBJPOOL_H
> >> +
> >> +#include 
> >> +#include 
> >> +
> >> +/*
> >> + * objpool: ring-array based lockless MPMC queue
> >> + *
> >> + * Copyright: wuqiang.m...@bytedance.com
> >> + *
> >> + * The object pool is a scalable implementaion of high performance queue
> >> + * for objects allocation and reclamation, such as kretprobe instances.
> >> + *
> >> + * With leveraging per-cpu ring-array to mitigate the hot spots of memory
> >> + * contention, it could deliver near-linear scalability for high parallel
> >> + * scenarios. The ring-array is compactly managed in a single cache-line
> >> + * to benefit from warmed L1 cache for most cases (<= 4 objects per-core).
> >> + * The body of pre-allocated objects is stored in continuous cache-lines
> >> + * just after the ring-array.
> > 
> > I consider the size of entries may be big if we have larger number of
> > CPU cores, like 72-cores. And if user specifies (2^n) + 1 entries.
> > In this case, each CPU has (2^n - 1)/72 objects, but has 2^(n + 1)
> > entries in ring buffer. So it should be noted.
> 
> Yes for the array size since it's up-rounded to the exponent of 2, but the
> actual number of pre-allocated objects will stay the same as user specified.
> 
> >> + *
> >> + * The object pool is interrupt safe. Both allocation and reclamation
> >> + * (object pop and push operations) can be preemptible or interruptable.
> > 
> > You've added raw_spinlock_disable/enable(), so it is not preemptible
> > or interruptible anymore. (Anyway, caller doesn't take care of that)
> 
> Sure, this decription is imporper and unnecessary. Will be removed.
> 
> >> + *
> >> + * It's best suited for following cases:
> >> + * 1) Memory allocation or reclamation are prohibited or too expensive
> >> + * 2) Consumers are of different priorities, such as irqs and threads
> >> + *
> >> + * Limitations:
> >> + * 1) Maximum objects (capacity) is determined during pool initializing
> >> + * 2) The memory of objects won't be freed until the pool is finalized
> >> + * 3) Object allocation (pop) may fail after trying all cpu slots
> >> + */
> >> +
> >> +/**
> >> + * struct objpool_slot - percpu ring array of objpool
> >> + * @head: head of the local ring array (to retrieve at)
> >> + * @tail: tail of the local ring array (to append at)
> >> + * @bits: log2 of capacity (for bitwise operations)
> >> + * @mask: capacity - 1
> > 
> > These description does not give idea what those roles are.
> 
> I'll refine the description. objpool_slot is totally internal to objpool.
> 
> > 
> >> + *
> >> + * Represents a cpu-local array-based ring buffer, its size is 

[PATCH 03/10] ethernet: sp7021: fix ioctl callback pointer

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

The old .ndo_do_ioctl() callback is never called any more, instead the
driver should set .ndo_eth_ioctl() for the phy operations.

Fixes: fd3040b9394c5 ("net: ethernet: Add driver for Sunplus SP7021")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/sunplus/spl2sw_driver.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/sunplus/spl2sw_driver.c 
b/drivers/net/ethernet/sunplus/spl2sw_driver.c
index 391a1bc7f4463..bb4514f4e8157 100644
--- a/drivers/net/ethernet/sunplus/spl2sw_driver.c
+++ b/drivers/net/ethernet/sunplus/spl2sw_driver.c
@@ -199,7 +199,7 @@ static const struct net_device_ops netdev_ops = {
.ndo_start_xmit = spl2sw_ethernet_start_xmit,
.ndo_set_rx_mode = spl2sw_ethernet_set_rx_mode,
.ndo_set_mac_address = spl2sw_ethernet_set_mac_address,
-   .ndo_do_ioctl = phy_do_ioctl,
+   .ndo_eth_ioctl = phy_do_ioctl,
.ndo_tx_timeout = spl2sw_ethernet_tx_timeout,
 };
 
-- 
2.39.2



[PATCH 02/10] ieee802154: avoid deprecated .ndo_do_ioctl callback

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

The ieee802154 socket implementation is the last remaining caller of the
netdevice ioctl callback. In order to completely remove this, add a custom
pointer to the existing wpan_dev specific operations structure. Since that
structure is currently only used to wrap the 'create' header operation,
adjust the naming slightly to make this more generic.

It would be a good idea to adjust the calling conventions and split the
get/set operations into separate functions, but that can be a follow-up
cleanup. For the moment, I kept the actual changes to a minimum to
avoid regressions.

Signed-off-by: Arnd Bergmann 
---
 include/net/cfg802154.h | 9 +
 net/ieee802154/socket.c | 5 +++--
 net/mac802154/iface.c   | 8 
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/include/net/cfg802154.h b/include/net/cfg802154.h
index f79ce133e51a7..e604df98e2ee9 100644
--- a/include/net/cfg802154.h
+++ b/include/net/cfg802154.h
@@ -433,15 +433,16 @@ struct ieee802154_llsec_device_key {
u32 frame_counter;
 };
 
-struct wpan_dev_header_ops {
+struct wpan_dev_ops {
/* TODO create callback currently assumes ieee802154_mac_cb inside
 * skb->cb. This should be changed to give these information as
 * parameter.
 */
-   int (*create)(struct sk_buff *skb, struct net_device *dev,
+   int (*header_create)(struct sk_buff *skb, struct net_device *dev,
  const struct ieee802154_addr *daddr,
  const struct ieee802154_addr *saddr,
  unsigned int len);
+   int (*ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd);
 };
 
 struct wpan_dev {
@@ -452,7 +453,7 @@ struct wpan_dev {
struct list_head list;
struct net_device *netdev;
 
-   const struct wpan_dev_header_ops *header_ops;
+   const struct wpan_dev_ops *ops;
 
/* lowpan interface, set when the wpan_dev belongs to one lowpan_dev */
struct net_device *lowpan_dev;
@@ -491,7 +492,7 @@ wpan_dev_hard_header(struct sk_buff *skb, struct net_device 
*dev,
 {
struct wpan_dev *wpan_dev = dev->ieee802154_ptr;
 
-   return wpan_dev->header_ops->create(skb, dev, daddr, saddr, len);
+   return wpan_dev->ops->header_create(skb, dev, daddr, saddr, len);
 }
 #endif
 
diff --git a/net/ieee802154/socket.c b/net/ieee802154/socket.c
index 00302e8b9615b..27e58237091ca 100644
--- a/net/ieee802154/socket.c
+++ b/net/ieee802154/socket.c
@@ -139,8 +139,9 @@ static int ieee802154_dev_ioctl(struct sock *sk, struct 
ifreq __user *arg,
if (!dev)
return -ENODEV;
 
-   if (dev->type == ARPHRD_IEEE802154 && dev->netdev_ops->ndo_do_ioctl)
-   ret = dev->netdev_ops->ndo_do_ioctl(dev, &ifr, cmd);
+   if (dev->type == ARPHRD_IEEE802154 && dev->ieee802154_ptr &&
+   dev->ieee802154_ptr->ops)
+   ret = dev->ieee802154_ptr->ops->ioctl(dev, &ifr, cmd);
 
if (!ret && put_user_ifreq(&ifr, arg))
ret = -EFAULT;
diff --git a/net/mac802154/iface.c b/net/mac802154/iface.c
index c0e2da5072bea..4937f8c2fb4cc 100644
--- a/net/mac802154/iface.c
+++ b/net/mac802154/iface.c
@@ -406,8 +406,9 @@ static int ieee802154_header_create(struct sk_buff *skb,
return hlen;
 }
 
-static const struct wpan_dev_header_ops ieee802154_header_ops = {
-   .create = ieee802154_header_create,
+static const struct wpan_dev_ops ieee802154_ops = {
+   .header_create  = ieee802154_header_create,
+   .ioctl  = mac802154_wpan_ioctl,
 };
 
 /* This header create functionality assumes a 8 byte array for
@@ -495,7 +496,6 @@ static const struct net_device_ops mac802154_wpan_ops = {
.ndo_open   = mac802154_wpan_open,
.ndo_stop   = mac802154_slave_close,
.ndo_start_xmit = ieee802154_subif_start_xmit,
-   .ndo_do_ioctl   = mac802154_wpan_ioctl,
.ndo_set_mac_address= mac802154_wpan_mac_addr,
 };
 
@@ -581,7 +581,7 @@ ieee802154_setup_sdata(struct ieee802154_sub_if_data *sdata,
sdata->dev->netdev_ops = &mac802154_wpan_ops;
sdata->dev->ml_priv = &mac802154_mlme_wpan;
sdata->iface_default_filtering = 
IEEE802154_FILTERING_4_FRAME_FIELDS;
-   wpan_dev->header_ops = &ieee802154_header_ops;
+   wpan_dev->ops = &ieee802154_ops;
 
mutex_init(&sdata->sec_mtx);
 
-- 
2.39.2



[PATCH 01/10] appletalk: remove localtalk and ppp support

2023-10-09 Thread Arnd Bergmann
From: Arnd Bergmann 

The last localtalk driver is gone now, and ppp support was never fully
merged, so clean up the appletalk code by removing the obvious dead
code paths.

Notably, this removes one of the two callers of the old .ndo_do_ioctl()
callback that was abused for getting device addresses and is now
only used in the ieee802154 subsystem, which still uses the same trick.

The include/uapi/linux/if_ltalk.h header might still be required
for building userspace programs, but I made sure that debian code
search and the netatalk upstream have no references to it, so it
should be fine to remove.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/tun.c |   3 -
 include/linux/atalk.h |   1 -
 include/linux/if_ltalk.h  |   8 ---
 include/uapi/linux/if_ltalk.h |  10 
 net/appletalk/Makefile|   2 +-
 net/appletalk/aarp.c  | 108 +++---
 net/appletalk/ddp.c   |  97 +-
 net/appletalk/dev.c   |  46 ---
 8 files changed, 11 insertions(+), 264 deletions(-)
 delete mode 100644 include/linux/if_ltalk.h
 delete mode 100644 include/uapi/linux/if_ltalk.h
 delete mode 100644 net/appletalk/dev.c

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 89ab9efe522c3..e11476296e253 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -70,7 +70,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -3059,8 +3058,6 @@ static unsigned char tun_get_addr_len(unsigned short type)
return ROSE_ADDR_LEN;
case ARPHRD_NETROM:
return AX25_ADDR_LEN;
-   case ARPHRD_LOCALTLK:
-   return LTALK_ALEN;
default:
return 0;
}
diff --git a/include/linux/atalk.h b/include/linux/atalk.h
index a55bfc6567d01..2896f2ac9568e 100644
--- a/include/linux/atalk.h
+++ b/include/linux/atalk.h
@@ -121,7 +121,6 @@ static inline struct atalk_iface *atalk_find_dev(struct 
net_device *dev)
 #endif
 
 extern struct atalk_addr *atalk_find_dev_addr(struct net_device *dev);
-extern struct net_device *atrtr_get_dev(struct atalk_addr *sa);
 extern int  aarp_send_ddp(struct net_device *dev,
   struct sk_buff *skb,
   struct atalk_addr *sa, void *hwaddr);
diff --git a/include/linux/if_ltalk.h b/include/linux/if_ltalk.h
deleted file mode 100644
index 4cc1c0b778700..0
--- a/include/linux/if_ltalk.h
+++ /dev/null
@@ -1,8 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef __LINUX_LTALK_H
-#define __LINUX_LTALK_H
-
-#include 
-
-extern struct net_device *alloc_ltalkdev(int sizeof_priv);
-#endif
diff --git a/include/uapi/linux/if_ltalk.h b/include/uapi/linux/if_ltalk.h
deleted file mode 100644
index fa61e776f598d..0
--- a/include/uapi/linux/if_ltalk.h
+++ /dev/null
@@ -1,10 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _UAPI__LINUX_LTALK_H
-#define _UAPI__LINUX_LTALK_H
-
-#define LTALK_HLEN 1
-#define LTALK_MTU  600
-#define LTALK_ALEN 1
-
-
-#endif /* _UAPI__LINUX_LTALK_H */
diff --git a/net/appletalk/Makefile b/net/appletalk/Makefile
index 33164d972d379..152312a151800 100644
--- a/net/appletalk/Makefile
+++ b/net/appletalk/Makefile
@@ -5,6 +5,6 @@
 
 obj-$(CONFIG_ATALK) += appletalk.o
 
-appletalk-y:= aarp.o ddp.o dev.o
+appletalk-y:= aarp.o ddp.o
 appletalk-$(CONFIG_PROC_FS)+= atalk_proc.o
 appletalk-$(CONFIG_SYSCTL) += sysctl_net_atalk.o
diff --git a/net/appletalk/aarp.c b/net/appletalk/aarp.c
index 9fa0b246902be..dfcd9f46cb3a6 100644
--- a/net/appletalk/aarp.c
+++ b/net/appletalk/aarp.c
@@ -432,49 +432,18 @@ static struct atalk_addr *__aarp_proxy_find(struct 
net_device *dev,
return a ? sa : NULL;
 }
 
-/*
- * Probe a Phase 1 device or a device that requires its Net:Node to
- * be set via an ioctl.
- */
-static void aarp_send_probe_phase1(struct atalk_iface *iface)
-{
-   struct ifreq atreq;
-   struct sockaddr_at *sa = (struct sockaddr_at *)&atreq.ifr_addr;
-   const struct net_device_ops *ops = iface->dev->netdev_ops;
-
-   sa->sat_addr.s_node = iface->address.s_node;
-   sa->sat_addr.s_net = ntohs(iface->address.s_net);
-
-   /* We pass the Net:Node to the drivers/cards by a Device ioctl. */
-   if (!(ops->ndo_do_ioctl(iface->dev, &atreq, SIOCSIFADDR))) {
-   ops->ndo_do_ioctl(iface->dev, &atreq, SIOCGIFADDR);
-   if (iface->address.s_net != htons(sa->sat_addr.s_net) ||
-   iface->address.s_node != sa->sat_addr.s_node)
-   iface->status |= ATIF_PROBE_FAIL;
-
-   iface->address.s_net  = htons(sa->sat_addr.s_net);
-   iface->address.s_node = sa->sat_addr.s_node;
-   }
-}
-
-
 void aarp_probe_network(struct atalk_iface *atif)
 {
-   if (atif->dev->type == ARPHRD_LOCALTLK ||
-   

Re: [PATCH] net: appletalk: remove cops support

2023-10-09 Thread Arnd Bergmann
On Wed, Oct 4, 2023, at 20:52, Jakub Kicinski wrote:
> On Wed, 27 Sep 2023 11:00:30 +0200 Greg Kroah-Hartman wrote:
>> The COPS Appletalk support is very old, never said to actually work
>> properly, and the firmware code for the devices are under a very suspect
>> license.  Remove it all to clear up the license issue, if it is still
>> needed and actually used by anyone, we can add it back later once the
>> license is cleared up.
>
> Nice, Doug and Arnd also mentioned this in the past so let me add
> them to the CC as I apply this...

Yes, definitely, thanks Greg for getting this done. I think every
time this came up we concluded that it can be removed, we just never
finished the job.

Acked-by: Arnd Bergmann 
Link: 
https://lore.kernel.org/netdev/e490dd0c-a65d-4acf-89c6-c06cb48ec...@app.fastmail.com/
Link: 
https://lore.kernel.org/netdev/9cac4fbd-9557-b0b8-54fa-93f0290a6...@schmorgal.com/

Semi-related:

Since this removes one of the two callers of the .ndo_do_ioctl()
callback, I've had a new look at that bit as well and ended up
with a refresh of the missing bits of [1], which I'll submit next.

 Arnd

[1] https://lore.kernel.org/lkml/20201106221743.3271965-1-a...@kernel.org/


Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-09 Thread Michal Suchánek
On Mon, Oct 09, 2023 at 09:34:10PM +0900, Masahiro Yamada wrote:
> On Mon, Oct 9, 2023 at 5:52 PM Michal Suchánek  wrote:
> >
> > Hello,
> >
> > On Mon, Oct 09, 2023 at 05:31:02PM +0900, Masahiro Yamada wrote:
> > > On Fri, Oct 6, 2023 at 12:49 AM Michal Suchanek  wrote:
> > > >
> > > > The default MODLIB value is composed of two variables and the hardcoded
> > > > string '/lib/modules/'.
> > > >
> > > > MODLIB = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
> > > >
> > > > Defining this middle part as a variable was rejected on the basis that
> > > > users can pass the whole MODLIB to make, such as
> > >
> > >
> > > In other words, do you want to say
> > >
> > > "If defining this middle part as a variable had been accepted,
> > > this patch would have been unneeded." ?
> >
> > If it were accepted I would not have to guess what the middle part is,
> > and could use the variable that unambiguosly defines it instead.
> 
> 
> How?
> 
> scripts/package/kernel.spec hardcodes 'lib/modules'
> in a couple of places.
> 
> I am asking how to derive the module path.

Not sure what you are asking here. The path is hardcoded, everywhere.

The current Makefile has

MODLIB  = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)

and there is no reliable way to learn what the middle part was after the
fact - $(INSTALL_MOD_PATH) can be non-empty.

The rejected patch was changing this to a variable, and also defaulting to
adjusting the content to what kmod exports in pkgconfig after applying a
proposed patch to make this hardcoded part configurable:

export KERNEL_MODULE_DIRECTORY := $(shell pkg-config --print-variables kmod 
2>/dev/null | grep '^module_directory$$' >/dev/null && pkg-config 
--variable=module_directory kmod || echo /lib/modules)

MODLIB  = $(INSTALL_MOD_PATH)$(KERNEL_MODULE_DIRECTORY)/$(KERNELRELEASE)

It would be completely possible to only define the middle part as a
variable that could then be used in rpm-pkg:

export KERNEL_MODULE_DIRECTORY := /lib/modules

MODLIB  = $(INSTALL_MOD_PATH)$(KERNEL_MODULE_DIRECTORY)/$(KERNELRELEASE)

Thanks

Michal





Re: [PATCH v9 1/5] lib: objpool added: ring-array based lockless MPMC

2023-10-09 Thread Google
On Mon, 9 Oct 2023 17:23:34 +0800
wuqiang  wrote:

> Hello Masami,
> 
> Just got time for the new patch and got that ages[] was removed. ages[] is
> introduced the way like 2-phase commit to keep consistency and must be kept.
> 
> Thinking of the following 2 cases that two cpu nodes are operating the same
> objpool_slot simultaneously:
> 
> Case 1:
> 
>    NODE 1:                                  NODE 2:
>    push to an empty slot                    pop will get wrong value
> 
>    try_cmpxchg_acquire(&slot->tail, &tail, next)
>                                              try_cmpxchg_release(&slot->head, &head, head + 1)
>                                              return slot->entries[head & slot->mask]
>    WRITE_ONCE(slot->entries[tail & slot->mask], obj)

Oh, good catch! Hmm, indeed. I considered using another 'commit_tail' but
it may not work (there is a small window to leak the object in the nested case).
Thanks for reviewing it!


> 
> 
> Case 2:
> 
>    NODE 1:                                  NODE 2
>    push to slot w/ 1 obj                    pop will get wrong value
> 
>                                              try_cmpxchg_release(&slot->head, &head, head + 1)
>    try_cmpxchg_acquire(&slot->tail, &tail, next)
>    WRITE_ONCE(slot->entries[tail & slot->mask], obj)
>                                              return slot->entries[head & slot->mask]
> 
> 
> Regards,
> wuqiang
> 
> On 2023/9/25 17:42, Masami Hiramatsu (Google) wrote:
> > Hi Wuqiang,
> > 
> > On Tue,  5 Sep 2023 09:52:51 +0800
> > "wuqiang.matt"  wrote:
> > 
> >> The object pool is a scalable implementaion of high performance queue
> >> for object allocation and reclamation, such as kretprobe instances.
> >>
> >> With leveraging percpu ring-array to mitigate the hot spot of memory
> >> contention, it could deliver near-linear scalability for high parallel
> >> scenarios. The objpool is best suited for following cases:
> >> 1) Memory allocation or reclamation are prohibited or too expensive
> >> 2) Consumers are of different priorities, such as irqs and threads
> >>
> >> Limitations:
> >> 1) Maximum objects (capacity) is determined during pool initializing
> >> and can't be modified (extended) after objpool creation
> >> 2) The memory of objects won't be freed until objpool is finalized
> >> 3) Object allocation (pop) may fail after trying all cpu slots
> > 
> > I made a simplifying patch on this by (mainly) removing ages array.
> > I also rename local variable to use more readable names, like slot,
> > pool, and obj.
> > 
> > Here the results which I run the test_objpool.ko.
> > 
> > Original:
> > [   50.500235] Summary of testcases:
> > [   50.503139] duration: 1027135us  hits:   30628293miss:   
> >0sync: percpu objpool
> > [   50.510416] duration: 1047977us  hits:   30453023miss:   
> >0sync: percpu objpool from vmalloc
> > [   50.517421] duration: 1047098us  hits:   31145034miss:   
> >0sync & hrtimer: percpu objpool
> > [   50.524671] duration: 1053233us  hits:   30919640miss:   
> >0sync & hrtimer: percpu objpool from vmalloc
> > [   50.531382] duration: 1055822us  hits:3407221miss:   
> >   830219sync overrun: percpu objpool
> > [   50.538135] duration: 1055998us  hits:3404624miss:   
> >   854160sync overrun: percpu objpool from vmalloc
> > [   50.546686] duration: 1046104us  hits:   19464798miss:   
> >0async: percpu objpool
> > [   50.552989] duration: 1033054us  hits:   18957836miss:   
> >0async: percpu objpool from vmalloc
> > [   50.560289] duration: 1041778us  hits:   33806868miss:   
> >0async & hrtimer: percpu objpool
> > [   50.567425] duration: 1048901us  hits:   34211860miss:   
> >0async & hrtimer: percpu objpool from vmalloc
> > 
> > Simplified:
> > [   48.393236] Summary of testcases:
> > [   48.395384] duration: 1013002us  hits:   29661448miss:   
> >0sync: percpu objpool
> > [   48.400351] duration: 1057231us  hits:   31035578miss:   
> >0sync: percpu objpool from vmalloc
> > [   48.405660] duration: 1043196us  hits:   30546652miss:   
> >0sync & hrtimer: percpu objpool
> > [   48.411216] duration: 1047128us  hits:   30601306miss:   
> >0sync & hrtimer: percpu objpool from vmalloc
> > [   48.417231] duration: 1051695us  hits:3468287miss:   
> >   892881sync overrun: percpu objpool
> > [   48.422405] duration: 1054003us  hits:3549491miss:   
> >   898141sync overrun: percpu objpool from vmalloc
> > [   48.428425] duration: 1052946us  hits:   19005228miss:   
> >0async: percpu objpool
> > [   48.433597] duration: 1051757us  hits:   19670866miss:   
> >0async: percpu objpool from vmalloc
> > [   48.439280] duration: 1042951us  

Re: [PATCH v1] samples: kprobes: Fixes a typo

2023-10-09 Thread Steven Rostedt
On Sat, 7 Oct 2023 21:09:00 +0530
Atul Kumar Pant  wrote:

> On Sat, Sep 23, 2023 at 11:00:46PM +0530, Atul Kumar Pant wrote:
> > On Thu, Aug 17, 2023 at 10:38:19PM +0530, Atul Kumar Pant wrote:  
> > > Fixes typo in a function name.
> > > 
> > > Signed-off-by: Atul Kumar Pant 
> > > ---
> > >  samples/kprobes/kretprobe_example.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/samples/kprobes/kretprobe_example.c 
> > > b/samples/kprobes/kretprobe_example.c
> > > index cbf16542d84e..ed79fd3d48fb 100644
> > > --- a/samples/kprobes/kretprobe_example.c
> > > +++ b/samples/kprobes/kretprobe_example.c
> > > @@ -35,7 +35,7 @@ struct my_data {
> > >   ktime_t entry_stamp;
> > >  };
> > >  
> > > -/* Here we use the entry_hanlder to timestamp function entry */
> > > +/* Here we use the entry_handler to timestamp function entry */
> > >  static int entry_handler(struct kretprobe_instance *ri, struct pt_regs 
> > > *regs)
> > >  {
> > >   struct my_data *data;
> > > -- 
> > > 2.25.1
> > >   
> > 
> > Hi all, can someone provide comments on this change.  
> 
>   Hi all, can someone please review this change. It has 
> been not
>   reviewed for quite some time.

That's because trivial typo fixes in comments are considered very low
priority, and are usually only applied (if they are ever applied) when the
maintainer has extra time, which may not be for a while.

-- Steve



Re: [PATCH v4] eventfs: Remove eventfs_file and just use eventfs_inode

2023-10-09 Thread Steven Rostedt
On Sat, 7 Oct 2023 19:44:45 +0900
Masami Hiramatsu (Google)  wrote:

> [...]
> > @@ -118,6 +72,7 @@ static struct dentry *create_file(const char *name, 
> > umode_t mode,
> > if (WARN_ON_ONCE(!S_ISREG(mode)))
> > return NULL;
> >  
> > +   WARN_ON_ONCE(!parent);  
> 
> Nit: This looks a wrong case and should not create a file under root 
> directory.
>  So it should return NULL. (but it shouldn't happen.)

Yes it should never happen (hence the warn on), but if it does happen, it
shouldn't cause any real harm, so I decided not to return early.

> 
> > dentry = eventfs_start_creating(name, parent);
> >  
> > if (IS_ERR(dentry))  
> 
>


> > +static struct dentry *
> > +create_file_dentry(struct eventfs_inode *ei, struct dentry **e_dentry,
> > +  struct dentry *parent, const char *name, umode_t mode, void 
> > *data,
> > +  const struct file_operations *fops, bool lookup)
> > +{
> > +   struct dentry *dentry;
> > +   bool invalidate = false;
> > +
> > +   mutex_lock(&eventfs_mutex);
> > +   /* If the e_dentry already has a dentry, use it */
> > +   if (*e_dentry) {
> > +   /* lookup does not need to up the ref count */
> > +   if (!lookup)
> > +   dget(*e_dentry);
> > +   mutex_unlock(&eventfs_mutex);
> > +   return *e_dentry;
> > +   }
> > +   mutex_unlock(&eventfs_mutex);
> > +
> > +   /* The lookup already has the parent->d_inode locked */
> > +   if (!lookup)
> > +   inode_lock(parent->d_inode);
> > +
> > +   dentry = create_file(name, mode, parent, data, fops);
> > +
> > +   if (!lookup)
> > +   inode_unlock(parent->d_inode);
> > +
> > +   mutex_lock(&eventfs_mutex);
> > +
> > +   if (IS_ERR_OR_NULL(dentry)) {
> > +   /*
> > +* When the mutex was released, something else could have
> > +* created the dentry for this e_dentry. In which case
> > +* use that one.
> > +*
> > +* Note, with the mutex held, the e_dentry cannot have content
> > +* and the ei->is_freed be true at the same time.
> > +*/
> > +   WARN_ON_ONCE(ei->is_freed);
> > +   dentry = *e_dentry;
> > +   /* The lookup does not need to up the dentry refcount */
> > +   if (dentry && !lookup)
> > +   dget(dentry);
> > +   mutex_unlock(&eventfs_mutex);
> > +   return dentry;
> > +   }
> > +
> > +   if (!*e_dentry && !ei->is_freed) {
> > +   *e_dentry = dentry;
> > +   dentry->d_fsdata = ei;  
> 
> Nit: If we use LSB for existing flag, this should be checked. e.g. 
> WARN_ON_ONCE(ei & 1).

But ei->is_freed is set before we set that LSB. Why should we check it here
if we already checked ei->is_freed?

> 
> 
> > +   } else {
> > +   /*
> > +* Should never happen unless we get here due to being freed.
> > +* Otherwise it means two dentries exist with the same name.
> > +*/
> > +   WARN_ON_ONCE(!ei->is_freed);
> > +   invalidate = true;
> > +   }
> > +   mutex_unlock(&eventfs_mutex);
> > +
> > +   if (invalidate)
> > +   d_invalidate(dentry);
> > +
> > +   if (lookup || invalidate)
> > +   dput(dentry);
> > +
> > +   return invalidate ? NULL : dentry;
> > +}
> > +
> >  /**
> >   * eventfs_post_create_dir - post create dir routine
> > - * @ef: eventfs_file of recently created dir
> > + * @ei: eventfs_inode of recently created dir
> >   *
> >   * Map the meta-data of files within an eventfs dir to their parent dentry
> >   */
> > -static void eventfs_post_create_dir(struct eventfs_file *ef)
> > +static void eventfs_post_create_dir(struct eventfs_inode *ei)
> >  {
> > -   struct eventfs_file *ef_child;
> > +   struct eventfs_inode *ei_child;
> > struct tracefs_inode *ti;
> >  
> > /* srcu lock already held */
> > /* fill parent-child relation */
> > -   list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> > +   list_for_each_entry_srcu(ei_child, &ei->children, list,
> >  srcu_read_lock_held(&eventfs_srcu)) {
> > -   ef_child->d_parent = ef->dentry;
> > +   ei_child->d_parent = ei->dentry;
> > }
> >  
> > -   ti = get_tracefs(ef->dentry->d_inode);
> > -   ti->private = ef->ei;
> > +   ti = get_tracefs(ei->dentry->d_inode);
> > +   ti->private = ei;
> >  }
> >  
> >  /**
> > - * create_dentry - helper function to create dentry
> > - * @ef: eventfs_file of file or directory to create
> > - * @parent: parent dentry
> > - * @lookup: true if called from lookup routine
> > + * create_dir_dentry - Create a directory dentry for the eventfs_inode
> > + * @ei: The eventfs_inode to create the directory for
> > + * @parent: The dentry of the parent of this directory
> > + * @lookup: True if this is called by the lookup code
> >   *
> > - * Used to create a dentry for file/dir, executes post dentry creation 
> > routine
> > + * This creates and attaches a directory dentry to the eventfs_inode @ei.
> >   

Re: [PATCH v5] scripts/link-vmlinux.sh: Add alias to duplicate symbols for kallsyms

2023-10-09 Thread Alessandro Carminati
Hello Kris,

Thank you for your contribution and for sharing your thoughts with me.

Allow me to begin this conversation by explaining what came to mind when
I decided to propose a patch that creates aliases.

The objective was to address a specific problem I was facing while
minimizing any potential impact on other aspects.
My initial consideration was the existence of numerous tools, both in the
kernel and in userspace, that rely on the current kallsyms implementation.
Both Nick and I shared the concern that making changes to elements upon
which these tools depend on could have significant consequences.

To the best of my knowledge, Nick's strategy has been to duplicate kallsyms
with something new - a new, improved kallsyms file.

However, even if Nick's patch were to be accepted, it wouldn't fully meet
my personal requirements.
This is because my goal was to utilize kprobe on a symbol that shares its
name with others. Nick's work wouldn't allow me to do this, and that's why
I proposed an alternative.

As a result, my strategy was more modest and focused solely on creating
aliases for duplicate symbols.
By adding these aliases, existing tools would remain unaffected, and the
current system state and ecosystem would be preserved.
For instance, mechanisms like live patching could continue to use the
symbol hit count.

On the flip side, introducing these new symbols would enable tracers to
directly employ the new names without any modifications, and humans could
easily identify the symbol they are dealing with just by examining the
name.
These are the fundamental principles behind my patch - introducing aliases.
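
As a purely hypothetical illustration of that last point (the alias spelling
below is invented for this example and is not the naming scheme produced by
the actual patch), a tracer could then pin a probe to one specific instance
without any ambiguity:

#include <linux/module.h>
#include <linux/kprobes.h>

static int alias_pre_handler(struct kprobe *p, struct pt_regs *regs)
{
	pr_info("hit %s\n", p->symbol_name);
	return 0;
}

static struct kprobe kp = {
	/* a plain "name_show" would match several symbols; the (made-up)
	 * alias selects a single instance */
	.symbol_name = "name_show__hwmon_c",
	.pre_handler = alias_pre_handler,
};

static int __init alias_probe_init(void)
{
	return register_kprobe(&kp);
}

static void __exit alias_probe_exit(void)
{
	unregister_kprobe(&kp);
}

module_init(alias_probe_init);
module_exit(alias_probe_exit);
MODULE_LICENSE("GPL");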

On Thu, Oct 5, 2023 at 6:25 PM Kris Van Hees
 wrote:
>
> On Wed, Sep 27, 2023 at 05:35:16PM +, Alessandro Carminati (Red Hat) 
> wrote:
> > It is not uncommon for drivers or modules related to similar peripherals
> > to have symbols with the exact same name.
> > While this is not a problem for the kernel's binary itself, it becomes an
> > issue when attempting to trace or probe specific functions using
> > infrastructure like ftrace or kprobe.
> >
> > The tracing subsystem relies on the `nm -n vmlinux` output, which provides
> > symbol information from the kernel's ELF binary. However, when multiple
> > symbols share the same name, the standard nm output does not differentiate
> > between them. This can lead to confusion and difficulty when trying to
> > probe the intended symbol.
> >
> >  ~ # cat /proc/kallsyms | grep " name_show"
> >  8c4f76d0 t name_show
> >  8c9cccb0 t name_show
> >  8cb0ac20 t name_show
> >  8cc728c0 t name_show
> >  8ce0efd0 t name_show
> >  8ce126c0 t name_show
> >  8ce1dd20 t name_show
> >  8ce24e70 t name_show
> >  8d1104c0 t name_show
> >  8d1fe480 t name_show
>
> One problem that remains as far as I can see is that this approach does not
> take into consideration that there can be duplicate symbols in the core
> kernel, but also between core kernel and loadable modules, and even between
> loadable modules.  So, I think more is needed to also ensure that this
> approach of adding alias symbols is also done for loadable modules.
>
> Earlier work that cover all symbols (core kernel and loadable modules) was
> posted quite a while ago by Nick Alcock:
>
> https://lore.kernel.org/all/20221205163157.269335-1-nick.alc...@oracle.com/
>
> It takes a different approach and adds in other info that is very useful for
> tracing, but unfortunately it has been dormant for a long time now.
>
> While module symbols are handled quite differently (for kallsyms) from the
> core kernel symbols, I think that a similar approach tied in with modpost
> ought to be quite possible.  It will add to the size of modules because the
> data needs to be stored in the .ko but that is unavoidable.  But not doing it
> unfortunately would mean that the duplicate symbol issue remains unresolved
> in the presence of loadable modules.
>

In the current implementation, my work is capable of "addressing" the issue
only for duplicate symbols within the core kernel image.
This is a fact.
Currently, modules are not within the scope of my work.

Originally, my intention was to deal with duplicates in modules as the
subsequent objective.

I agree with you that addressing duplicate symbols in modules is something
that needs to be resolved, but from my perspective, it appeared to be of
lesser importance.
This is because, when trace tools are used manually, there is already an
indication of the modules where the name originates.

In this context, I would welcome the opportunity to explore additional use
cases.
By doing so, I can better align any future proposals with these real-world
scenarios and requirements I'm not considering at the moment.

In my initial intention, I had hoped to extend the use of aliases to modules
as well.
However, I encountered some challenges that need to be addressed first.
If I were to 

Re: [PATCH 2/5] UML: remove unused cmd_vdso_install

2023-10-09 Thread Richard Weinberger
- Ursprüngliche Mail -
> Von: "masahiroy" 
> An: "linux-kbuild" 
> CC: "linux-kernel" , "linux-arm-kernel" 
> ,
> linux-c...@vger.kernel.org, "linux-parisc" , 
> linux-ri...@lists.infradead.org,
> linux-s...@vger.kernel.org, "linux-um" , 
> "loongarch" ,
> "sparclinux" , "x86" , 
> "masahiroy" , "anton ivanov"
> , "bp" , "dave hansen" 
> , "hpa"
> , "mingo" , "Johannes Berg" 
> , "richard" ,
> "tglx" 
> Gesendet: Montag, 9. Oktober 2023 14:42:07
> Betreff: [PATCH 2/5] UML: remove unused cmd_vdso_install

> You cannot run this code because arch/um/Makefile does not define the
> vdso_install target.
> 
> It appears that this code was blindly copied from another architecture.
> 
> Remove the dead code.
> 
> Signed-off-by: Masahiro Yamada 

Acked-by: Richard Weinberger 

Thanks,
//richard


Re: [PATCH v2 3/6] ACPI: AC: Replace acpi_driver with platform_driver

2023-10-09 Thread Wilczynski, Michal



On 10/9/2023 2:27 PM, Rafael J. Wysocki wrote:
> On Mon, Oct 9, 2023 at 10:40 AM Wilczynski, Michal
>  wrote:
>>
>> Hi !
>>
>> Thanks a lot for a review, to both of you ! :-)
>>
>> On 10/7/2023 12:43 PM, Rafael J. Wysocki wrote:
>>> On Sat, Oct 7, 2023 at 12:41 PM Rafael J. Wysocki  wrote:
 On Sat, Oct 7, 2023 at 9:56 AM Andy Shevchenko
  wrote:
> On Fri, Oct 06, 2023 at 09:47:57PM +0200, Rafael J. Wysocki wrote:
>> On Fri, Oct 6, 2023 at 8:33 PM Michal Wilczynski
>>  wrote:
> ...
>
>>>  struct acpi_ac {
>>> struct power_supply *charger;
>>> struct power_supply_desc charger_desc;
>>> -   struct acpi_device *device;
>>> +   struct device *dev;
>> I'm not convinced about this change.
>>
>> If I'm not mistaken, you only use the dev pointer above to get the
>> ACPI_COMPANION() of it, but the latter is already found in _probe(),
>> so it can be stored in struct acpi_ac for later use and then the dev
>> pointer in there will not be necessary any more.
>>
>> That will save you a bunch of ACPI_HANDLE() evaluations and there's
>> nothing wrong with using ac->device->handle.  The patch will then
>> become almost trivial AFAICS and if you really need to get from ac to
>> the underlying platform device, a pointer to it can be added to struct
>> acpi_ac without removing the ACPI device pointer from it.
>> Yeah we could add platform device without removing acpi device, and
>> yes that would introduce data duplication, like Andy noticed.
> But he hasn't answered my question regarding what data duplication he
> meant, exactly.
>
> So what data duplication do you mean?

So what I meant was that many drivers would find it useful to have
a struct device embedded in their 'private structure', and in that case
there would be a duplication of platform_device and acpi_device as
both pointers would be needed. Mind that if you only have a struct device
it's easy to get the acpi_device, but it's not the case the other way around.

So for this driver it's just a matter of sticking to convention; for the others
it would be duplication.

In my version of this patch we managed to avoid this duplication, thanks
to the contextual argument introduced before, but look at this patch:
https://github.com/mwilczy/linux-pm/commit/cc8ef52707341e67a12067d6ead991d56ea017ca

The author of this patch had to introduce platform_device and acpi_device to
the struct ac, so there was some duplication. That is the case for many drivers
to come, so I decided it's better to change this and have a pattern of keeping
only 'struct device'.
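
For illustration only (this is not code from the patch under discussion, and
the probe function name is made up), the asymmetry looks roughly like this:
going from a struct device to its ACPI companion is a single macro, while
going back from the acpi_device to the physical device is not as direct, so
a driver that keeps only the struct device loses nothing:

#include <linux/acpi.h>
#include <linux/platform_device.h>

static int example_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	struct acpi_device *adev = ACPI_COMPANION(dev);	/* dev -> acpi_device */
	acpi_handle handle = ACPI_HANDLE(dev);		/* dev -> ACPI handle */

	if (!adev || !handle)
		return -ENODEV;

	/* ... the rest of the driver only needs to carry 'dev' around ... */
	return 0;
}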

>
>> And yes, maybe for this particular driver there is little impact (as struct 
>> device is not
>> really used), but in my opinion we should adhere to some common coding
>> pattern among all acpi drivers in order for the code to be easier to maintain
>> and improve readability, also for any newcomer.
> Well, maybe.
>
> But then please minimize the number of code lines changed in this
> particular patch and please avoid changes that are not directly
> related to the purpose of the patch.

Sure, like I've stated before, I felt this is part of this patch; we only
narrowly escaped the duplication by introducing the contextual argument
before ;-)

>
> The idea behind is to eliminate data duplication.
 What data duplication exactly do you mean?

 struct acpi_device *device is replaced with struct device *dev which
 is the same size.  The latter is then used to obtain a struct
 acpi_device pointer.  Why is it better to do this than to store the
 struct acpi_device itself?
>>> This should be "store the struct acpi_device pointer itself", sorry.
>> Hi,
>> So let me explain the reasoning here:
>>
>> 1) I've had a pleasure to work with different drivers in ACPI directory. In 
>> my
>> opinion the best ones I've seen, were build around the concept of struct
>> device (e.g NFIT). It's a struct that's well understood in the kernel, 
>> and
>> it's easier to understand without having to read any ACPI specific code.
>> If I see something like ACPI_HANDLE(dev), I think 'ah-ha -  having a 
>> struct
>> device I can easily get struct acpi_device - they're connected'. And I 
>> think
>> using a standardized macro, instead of manually dereferencing pointers is
>> also much easier for ACPI newbs reading the code, as it hides a bit 
>> complexity
>> of getting acpi device from struct device. And to be honest I don't 
>> think there would
>> be any measurable performance change, as those code paths are far from
>> being considered 'hot paths'. So if we can get the code easier to 
>> understand
>> from a newcomer perspective, why not do it.
> I have a differing opinion on a couple of points above, and let's make
> one change at a time.

OK

>
>> 2) I think it would be good to stick to the convention, and introduce some 
>> coding
>>  pattern, for now 

[PATCH 5/5] kbuild: unify no-compiler-targets and no-sync-config-targets

2023-10-09 Thread Masahiro Yamada
Now that vdso_install does not depend on any in-tree build artifact,
it no longer invokes a compiler, making no-compiler-targets the same
as no-sync-config-targets.

Signed-off-by: Masahiro Yamada 
---

 Makefile | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/Makefile b/Makefile
index 2170d56630e8..982b1ad33287 100644
--- a/Makefile
+++ b/Makefile
@@ -277,10 +277,6 @@ no-dot-config-targets := $(clean-targets) \
 $(version_h) headers headers_% archheaders archscripts 
\
 %asm-generic kernelversion %src-pkg dt_binding_check \
 outputmakefile rustavailable rustfmt rustfmtcheck
-# Installation targets should not require compiler. Unfortunately, vdso_install
-# is an exception where build artifacts may be updated. This must be fixed.
-no-compiler-targets := $(no-dot-config-targets) install dtbs_install \
-   headers_install modules_install modules_sign 
kernelrelease image_name
 no-sync-config-targets := $(no-dot-config-targets) %install modules_sign 
kernelrelease \
  image_name
 single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o %.rsi %.s %.symtypes 
%/
@@ -288,7 +284,6 @@ single-targets := %.a %.i %.ko %.lds %.ll %.lst %.mod %.o 
%.rsi %.s %.symtypes %
 config-build   :=
 mixed-build:=
 need-config:= 1
-need-compiler  := 1
 may-sync-config:= 1
 single-build   :=
 
@@ -298,12 +293,6 @@ ifneq ($(filter $(no-dot-config-targets), 
$(MAKECMDGOALS)),)
endif
 endif
 
-ifneq ($(filter $(no-compiler-targets), $(MAKECMDGOALS)),)
-   ifeq ($(filter-out $(no-compiler-targets), $(MAKECMDGOALS)),)
-   need-compiler :=
-   endif
-endif
-
 ifneq ($(filter $(no-sync-config-targets), $(MAKECMDGOALS)),)
ifeq ($(filter-out $(no-sync-config-targets), $(MAKECMDGOALS)),)
may-sync-config :=
@@ -675,7 +664,7 @@ endif
 
 # Include this also for config targets because some architectures need
 # cc-cross-prefix to determine CROSS_COMPILE.
-ifdef need-compiler
+ifdef may-sync-config
 include $(srctree)/scripts/Makefile.compiler
 endif
 
-- 
2.39.2



[PATCH 4/5] kbuild: unify vdso_install rules

2023-10-09 Thread Masahiro Yamada
Currently, there is no standard implementation for vdso_install,
leading to various issues:

 1. Code duplication

Many architectures duplicate similar code just for copying files
to the install destination.

Some architectures (arm, sparc, x86) create build-id symlinks,
introducing more code duplication.

 2. Accidental updates of in-tree build artifacts

The vdso_install rule depends on the vdso files to install.
It may update in-tree build artifacts. This can be problematic,
as explained in commit 19514fc665ff ("arm, kbuild: make
"make install" not depend on vmlinux").

 3. Broken code in some architectures

Makefile code is often copied from one architecture to another
without proper adaptation or testing.

The previous commits removed broken code from csky, UML, and parisc.

Another issue is that 'make vdso_install' for ARCH=s390 installs
vdso64, but not vdso32.

To address these problems, this commit introduces the generic vdso_install.

Architectures that support vdso_install need to define vdso-install-y
in arch/*/Makefile.

vdso-install-y lists the files to install. For example, arch/x86/Makefile
looks like this:

  vdso-install-$(CONFIG_X86_64)   += arch/x86/entry/vdso/vdso64.so.dbg
  vdso-install-$(CONFIG_X86_X32_ABI)  += arch/x86/entry/vdso/vdsox32.so.dbg
  vdso-install-$(CONFIG_X86_32)   += arch/x86/entry/vdso/vdso32.so.dbg
  vdso-install-$(CONFIG_IA32_EMULATION)   += arch/x86/entry/vdso/vdso32.so.dbg

These files will be installed to $(MODLIB)/vdso/ with the .dbg suffix,
if present, stripped away.

vdso-install-y can optionally take a second field after the colon
separator. This is needed because some architectures install vdso
files under a different base name.

The following is a snippet from arch/arm64/Makefile.

  vdso-install-$(CONFIG_COMPAT_VDSO)  += 
arch/arm64/kernel/vdso32/vdso.so.dbg:vdso32.so

This will rename vdso.so.dbg to vdso32.so during installation. If such
architectures change their implementation so that the file names match,
this workaround will go away.

Signed-off-by: Masahiro Yamada 
---

 Makefile   |  9 ++
 arch/arm/Makefile  |  7 +---
 arch/arm/vdso/Makefile | 25 --
 arch/arm64/Makefile|  9 ++
 arch/arm64/kernel/vdso/Makefile| 10 --
 arch/arm64/kernel/vdso32/Makefile  | 10 --
 arch/loongarch/Makefile|  4 +--
 arch/loongarch/vdso/Makefile   | 10 --
 arch/riscv/Makefile|  9 ++
 arch/riscv/kernel/compat_vdso/Makefile | 10 --
 arch/riscv/kernel/vdso/Makefile| 10 --
 arch/s390/Makefile |  6 ++--
 arch/s390/kernel/vdso32/Makefile   | 10 --
 arch/s390/kernel/vdso64/Makefile   | 10 --
 arch/sparc/Makefile|  5 ++-
 arch/sparc/vdso/Makefile   | 27 
 arch/x86/Makefile  |  7 ++--
 arch/x86/entry/vdso/Makefile   | 27 
 scripts/Makefile.vdsoinst  | 45 ++
 19 files changed, 71 insertions(+), 179 deletions(-)
 create mode 100644 scripts/Makefile.vdsoinst

diff --git a/Makefile b/Makefile
index 373649c7374e..2170d56630e8 100644
--- a/Makefile
+++ b/Makefile
@@ -1317,6 +1317,14 @@ scripts_unifdef: scripts_basic
 quiet_cmd_install = INSTALL $(INSTALL_PATH)
   cmd_install = unset sub_make_done; $(srctree)/scripts/install.sh
 
+# ---
+# vDSO install
+
+PHONY += vdso_install
+vdso_install: export INSTALL_FILES = $(vdso-install-y)
+vdso_install:
+   $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.vdsoinst
+
 # ---
 # Tools
 
@@ -1560,6 +1568,7 @@ help:
@echo  '* vmlinux - Build the bare kernel'
@echo  '* modules - Build all modules'
@echo  '  modules_install - Install all modules to INSTALL_MOD_PATH 
(default: /)'
+   @echo  '  vdso_install- Install unstripped vdso to INSTALL_MOD_PATH 
(default: /)'
@echo  '  dir/- Build all files in dir and below'
@echo  '  dir/file.[ois]  - Build specified target only'
@echo  '  dir/file.ll - Build the LLVM assembly file'
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 547e5856eaa0..5ba42f69f8ce 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -304,11 +304,7 @@ $(INSTALL_TARGETS): KBUILD_IMAGE = $(boot)/$(patsubst 
%install,%Image,$@)
 $(INSTALL_TARGETS):
$(call cmd,install)
 
-PHONY += vdso_install
-vdso_install:
-ifeq ($(CONFIG_VDSO),y)
-   $(Q)$(MAKE) $(build)=arch/arm/vdso $@
-endif
+vdso-install-$(CONFIG_VDSO) += arch/arm/vdso/vdso.so.dbg
 
 # My testing targets (bypasses dependencies)
 bp:;   $(Q)$(MAKE) $(build)=$(boot) $(boot)/bootpImage
@@ -331,7 +327,6 @@ define 

[PATCH 3/5] parisc: remove broken vdso_install

2023-10-09 Thread Masahiro Yamada
'make ARCH=parisc vdso_install' has never worked. It attempts to
descend into arch/parisc/kernel/vdso/, which does not exist.

The command just fails:

  scripts/Makefile.build:41: arch/parisc/kernel/vdso/Makefile: No such file or 
directory

The second line is also meaningless because parisc does not define
CONFIG_COMPAT_VDSO.

It appears that this code was copied from another architecture without
proper adaptation.

Remove the broken code.

Signed-off-by: Masahiro Yamada 
---

 arch/parisc/Makefile | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/arch/parisc/Makefile b/arch/parisc/Makefile
index 968ebe17494c..4222fa73c34a 100644
--- a/arch/parisc/Makefile
+++ b/arch/parisc/Makefile
@@ -177,13 +177,6 @@ vdso_prepare: prepare0
$(Q)$(MAKE) $(build)=arch/parisc/kernel/vdso32 
include/generated/vdso32-offsets.h
 endif
 
-PHONY += vdso_install
-
-vdso_install:
-   $(Q)$(MAKE) $(build)=arch/parisc/kernel/vdso $@
-   $(if $(CONFIG_COMPAT_VDSO), \
-   $(Q)$(MAKE) $(build)=arch/parisc/kernel/vdso32 $@)
-
 install: KBUILD_IMAGE := vmlinux
 zinstall: KBUILD_IMAGE := vmlinuz
 install zinstall:
-- 
2.39.2



[PATCH 2/5] UML: remove unused cmd_vdso_install

2023-10-09 Thread Masahiro Yamada
You cannot run this code because arch/um/Makefile does not define the
vdso_install target.

It appears that this code was blindly copied from another architecture.

Remove the dead code.

Signed-off-by: Masahiro Yamada 
---

 arch/x86/um/vdso/Makefile | 12 
 1 file changed, 12 deletions(-)

diff --git a/arch/x86/um/vdso/Makefile b/arch/x86/um/vdso/Makefile
index 6825e146a62f..b86d634730b2 100644
--- a/arch/x86/um/vdso/Makefile
+++ b/arch/x86/um/vdso/Makefile
@@ -67,15 +67,3 @@ quiet_cmd_vdso = VDSO$@
 
 VDSO_LDFLAGS = -fPIC -shared -Wl,--hash-style=sysv -z noexecstack
 GCOV_PROFILE := n
-
-#
-# Install the unstripped copy of vdso*.so listed in $(vdso-install-y).
-#
-quiet_cmd_vdso_install = INSTALL $@
-  cmd_vdso_install = cp $(obj)/$@.dbg $(MODLIB)/vdso/$@
-$(vdso-install-y): %.so: $(obj)/%.so.dbg FORCE
-   @mkdir -p $(MODLIB)/vdso
-   $(call cmd,vdso_install)
-
-PHONY += vdso_install $(vdso-install-y)
-vdso_install: $(vdso-install-y)
-- 
2.39.2



[PATCH 1/5] csky: remove unused cmd_vdso_install

2023-10-09 Thread Masahiro Yamada
You cannot run this code because arch/csky/Makefile does not define the
vdso_install target.

It appears that this code was blindly copied from another architecture.

Remove the dead code.

Signed-off-by: Masahiro Yamada 
---

 arch/csky/kernel/vdso/Makefile | 10 --
 1 file changed, 10 deletions(-)

diff --git a/arch/csky/kernel/vdso/Makefile b/arch/csky/kernel/vdso/Makefile
index 299e4e41ebc5..ddf784a62c11 100644
--- a/arch/csky/kernel/vdso/Makefile
+++ b/arch/csky/kernel/vdso/Makefile
@@ -58,13 +58,3 @@ quiet_cmd_vdsold = VDSOLD  $@
 # that contains the same symbols at the same offsets.
 quiet_cmd_so2s = SO2S$@
   cmd_so2s = $(NM) -D $< | $(srctree)/$(src)/so2s.sh > $@
-
-# install commands for the unstripped file
-quiet_cmd_vdso_install = INSTALL $@
-  cmd_vdso_install = cp $(obj)/$@.dbg $(MODLIB)/vdso/$@
-
-vdso.so: $(obj)/vdso.so.dbg
-   @mkdir -p $(MODLIB)/vdso
-   $(call cmd,vdso_install)
-
-vdso_install: vdso.so
-- 
2.39.2



Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-09 Thread Masahiro Yamada
On Mon, Oct 9, 2023 at 5:52 PM Michal Suchánek  wrote:
>
> Hello,
>
> On Mon, Oct 09, 2023 at 05:31:02PM +0900, Masahiro Yamada wrote:
> > On Fri, Oct 6, 2023 at 12:49 AM Michal Suchanek  wrote:
> > >
> > > The default MODLIB value is composed of two variables and the hardcoded
> > > string '/lib/modules/'.
> > >
> > > MODLIB = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
> > >
> > > Defining this middle part as a variable was rejected on the basis that
> > > users can pass the whole MODLIB to make, such as
> >
> >
> > In other words, do you want to say
> >
> > "If defining this middle part as a variable had been accepted,
> > this patch would have been unneeded." ?
>
> If it were accepted I would not have to guess what the middle part is,
> and could use the variable that unambiguously defines it instead.


How?

scripts/package/kernel.spec hardcodes 'lib/modules'
in a couple of places.

I am asking how to derive the module path.




-- 
Best Regards
Masahiro Yamada



Re: [PATCH v2 3/6] ACPI: AC: Replace acpi_driver with platform_driver

2023-10-09 Thread Rafael J. Wysocki
On Mon, Oct 9, 2023 at 10:40 AM Wilczynski, Michal
 wrote:
>
>
> Hi !
>
> Thanks a lot for a review, to both of you ! :-)
>
> On 10/7/2023 12:43 PM, Rafael J. Wysocki wrote:
> > On Sat, Oct 7, 2023 at 12:41 PM Rafael J. Wysocki  wrote:
> >> On Sat, Oct 7, 2023 at 9:56 AM Andy Shevchenko
> >>  wrote:
> >>> On Fri, Oct 06, 2023 at 09:47:57PM +0200, Rafael J. Wysocki wrote:
>  On Fri, Oct 6, 2023 at 8:33 PM Michal Wilczynski
>   wrote:
> >>> ...
> >>>
> >  struct acpi_ac {
> > struct power_supply *charger;
> > struct power_supply_desc charger_desc;
> > -   struct acpi_device *device;
> > +   struct device *dev;
>  I'm not convinced about this change.
> 
>  If I'm not mistaken, you only use the dev pointer above to get the
>  ACPI_COMPANION() of it, but the latter is already found in _probe(),
>  so it can be stored in struct acpi_ac for later use and then the dev
>  pointer in there will not be necessary any more.
> 
>  That will save you a bunch of ACPI_HANDLE() evaluations and there's
>  nothing wrong with using ac->device->handle.  The patch will then
>  become almost trivial AFAICS and if you really need to get from ac to
>  the underlying platform device, a pointer to it can be added to struct
>  acpi_ac without removing the ACPI device pointer from it.
>
> Yeah we could add platform device without removing acpi device, and
> yes that would introduce data duplication, like Andy noticed.

But he hasn't answered my question regarding what data duplication he
meant, exactly.

So what data duplication do you mean?

> And yes, maybe for this particular driver there is little impact (as struct 
> device is not
> really used), but in my opinion we should adhere to some common coding
> pattern among all acpi drivers in order for the code to be easier to maintain
> and improve readability, also for any newcomer.

Well, maybe.

But then please minimize the number of code lines changed in this
particular patch and please avoid changes that are not directly
related to the purpose of the patch.

> >>> The idea behind is to eliminate data duplication.
> >> What data duplication exactly do you mean?
> >>
> >> struct acpi_device *device is replaced with struct device *dev which
> >> is the same size.  The latter is then used to obtain a struct
> >> acpi_device pointer.  Why is it better to do this than to store the
> >> struct acpi_device itself?
> > This should be "store the struct acpi_device pointer itself", sorry.
>
> Hi,
> So let me explain the reasoning here:
>
> 1) I've had the pleasure of working with different drivers in the ACPI directory.
> In my opinion the best ones I've seen were built around the concept of struct
> device (e.g. NFIT). It's a struct that's well understood in the kernel, and
> it's easier to understand without having to read any ACPI specific code.
> If I see something like ACPI_HANDLE(dev), I think 'ah-ha - having a struct
> device I can easily get a struct acpi_device - they're connected'. And I think
> using a standardized macro, instead of manually dereferencing pointers, is
> also much easier for ACPI newbs reading the code, as it hides a bit of the
> complexity of getting the acpi_device from the struct device. And to be honest
> I don't think there would be any measurable performance change, as those code
> paths are far from being considered 'hot paths'. So if we can make the code
> easier to understand from a newcomer's perspective, why not do it.

I have a differing opinion on a couple of points above, and let's make
one change at a time.

>
> 2) I think it would be good to stick to the convention and introduce some coding
> pattern; for now some drivers store the struct device pointer and others store
> the acpi_device pointer. As I'm doing this change, the acpi_device pointer
> becomes less relevant, since we're using the platform device. So to reflect that
> loss of relevance we can choose to adhere to a pattern where we use it less and
> less, and the winning approach would be to use 'struct device' by default
> everywhere we can, so maybe eventually we would be able to lose acpi_device
> altogether at some point, as most of its usage is to retrieve the ACPI handle
> and execute some AML method. So in my understanding acpi_device is already
> obsolete at this point; if we can manage to use it less and less, and eventually
> wipe it out, then why not ;-)

No, ACPI device is not obsolete, it will still be needed for various
things, like power management and hotplug.

Also, I'd rather store a struct acpi_device pointer than an acpi_handle,
because the former is much better in terms of compile-time type correctness
checks and similar.
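
To illustrate (a sketch only, not from the patch; acpi_handle is the opaque
handle typedef from the ACPICA headers):

	/* sketch: the field names here are made up for the example */
	struct ac_data_example {
		struct acpi_device *adev;	/* wrong pointer type is caught at build time */
		acpi_handle handle;		/* opaque void * typedef: converts silently   */
	};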

I can send my version of the $subject patch just fine if you strongly
disagree with me.



Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Yajun Deng



On 2023/10/9 18:16, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 11:43 AM Yajun Deng  wrote:


On 2023/10/9 17:30, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng  wrote:

On 2023/10/9 16:20, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng  wrote:

On 2023/10/9 15:53, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng  wrote:


'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
the trace work well.

They all have 'pop' instructions in them. This may be the key to making
the trace work well.

Hi all,

I need your help on percpu and ftrace.


I do not think you made sure netdev_core_stats_inc() was never inlined.

Adding more code in it is simply changing how the compiler decides to
inline or not.

Yes, you are right. It needs to add the 'noinline' prefix. The
disassembly code will have 'pop'

instruction.


The function was fine, you do not need anything like push or pop.

The only needed stuff was the call __fentry__.

The fact that the function was inlined for some invocations was the
issue, because the trace point
is only planted in the out of line function.

But somehow the following code isn't inline? They didn't need to add the
'noinline' prefix.

+   field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+   WRITE_ONCE(*field, READ_ONCE(*field) + 1);

Or
+   (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;


I think you are very confused.

You only want to trace netdev_core_stats_inc() entry point, not
arbitrary pieces of it.


Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace

+   field = (__force unsigned long
__percpu *)((__force void *)p + offset);
+   this_cpu_inc(*field);

with

+   field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+   WRITE_ONCE(*field, READ_ONCE(*field) + 1);

Or
+   (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;

The netdev_core_stats_inc() entry point will work fine even if it doesn't
have 'noinline' prefix.

I don't know why this code needs to add 'noinline' prefix.
+   field = (__force unsigned long __percpu *)((__force void *)p + 
offset);
+   this_cpu_inc(*field);


C compiler decides to inline or not, depending on various factors.

The most efficient (and small) code is generated by this_cpu_inc()
version, allowing the compiler to inline it.

If you copy/paste this_cpu_inc()  twenty times, then the compiler
would  not inline the function anymore.



Got it. Thank you.




Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Eric Dumazet
On Mon, Oct 9, 2023 at 11:43 AM Yajun Deng  wrote:
>
>
> On 2023/10/9 17:30, Eric Dumazet wrote:
> > On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng  wrote:
> >>
> >> On 2023/10/9 16:20, Eric Dumazet wrote:
> >>> On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng  wrote:
>  On 2023/10/9 15:53, Eric Dumazet wrote:
> > On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng  wrote:
> >
> >> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
> >> the trace work well.
> >>
> >> They all have 'pop' instructions in them. This may be the key to making
> >> the trace work well.
> >>
> >> Hi all,
> >>
> >> I need your help on percpu and ftrace.
> >>
> > I do not think you made sure netdev_core_stats_inc() was never inlined.
> >
> > Adding more code in it is simply changing how the compiler decides to
> > inline or not.
>  Yes, you are right. It needs to add the 'noinline' prefix. The
>  disassembly code will have 'pop'
> 
>  instruction.
> 
> >>> The function was fine, you do not need anything like push or pop.
> >>>
> >>> The only needed stuff was the call __fentry__.
> >>>
> >>> The fact that the function was inlined for some invocations was the
> >>> issue, because the trace point
> >>> is only planted in the out of line function.
> >>
> >> But somehow the following code isn't inline? They didn't need to add the
> >> 'noinline' prefix.
> >>
> >> +   field = (unsigned long *)((void *)this_cpu_ptr(p) + 
> >> offset);
> >> +   WRITE_ONCE(*field, READ_ONCE(*field) + 1);
> >>
> >> Or
> >> +   (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
> >>
> > I think you are very confused.
> >
> > You only want to trace netdev_core_stats_inc() entry point, not
> > arbitrary pieces of it.
>
>
> Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace
>
> +   field = (__force unsigned long
> __percpu *)((__force void *)p + offset);
> +   this_cpu_inc(*field);
>
> with
>
> +   field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> +   WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>
> Or
> +   (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
>
> The netdev_core_stats_inc() entry point will work fine even if it doesn't
> have 'noinline' prefix.
>
> I don't know why this code needs to add 'noinline' prefix.
> +   field = (__force unsigned long __percpu *)((__force void *)p 
> + offset);
> +   this_cpu_inc(*field);
>

C compiler decides to inline or not, depending on various factors.

The most efficient (and small) code is generated by this_cpu_inc()
version, allowing the compiler to inline it.

If you copy/paste this_cpu_inc()  twenty times, then the compiler
would  not inline the function anymore.
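
For reference, a minimal sketch (not the actual patch; the signature and the
per-cpu dev->core_stats field are assumptions taken from this thread) of
keeping the helper out of line so its __fentry__ call stays visible to ftrace:

	/* sketch only: force a single out-of-line copy that ftrace can hook */
	noinline void netdev_core_stats_inc(struct net_device *dev, u32 offset)
	{
		struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
		unsigned long __percpu *field;

		if (!p)
			return;	/* the allocation path is omitted in this sketch */

		field = (__force unsigned long __percpu *)((__force void *)p + offset);
		this_cpu_inc(*field);
	}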



Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Yajun Deng



On 2023/10/9 17:30, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng  wrote:


On 2023/10/9 16:20, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng  wrote:

On 2023/10/9 15:53, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng  wrote:


'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
the trace work well.

They all have 'pop' instructions in them. This may be the key to making
the trace work well.

Hi all,

I need your help on percpu and ftrace.


I do not think you made sure netdev_core_stats_inc() was never inlined.

Adding more code in it is simply changing how the compiler decides to
inline or not.

Yes, you are right. It needs to add the 'noinline' prefix. The
disassembly code will have 'pop'

instruction.


The function was fine, you do not need anything like push or pop.

The only needed stuff was the call __fentry__.

The fact that the function was inlined for some invocations was the
issue, because the trace point
is only planted in the out of line function.


But somehow the following code isn't inline? They didn't need to add the
'noinline' prefix.

+   field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+   WRITE_ONCE(*field, READ_ONCE(*field) + 1);

Or
+   (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;


I think you are very confused.

You only want to trace netdev_core_stats_inc() entry point, not
arbitrary pieces of it.



Yes, I will trace netdev_core_stats_inc() entry point. I mean to replace

+                                   field = (__force unsigned long 
__percpu *)((__force void *)p + offset);

+                                   this_cpu_inc(*field);

with

+   field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+   WRITE_ONCE(*field, READ_ONCE(*field) + 1);

Or
+   (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;

The netdev_core_stats_inc() entry point will work fine even if it doesn't
have 'noinline' prefix.

I don't know why this code needs to add 'noinline' prefix.
+           field = (__force unsigned long __percpu *)((__force void *)p + 
offset);
+           this_cpu_inc(*field);




Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Eric Dumazet
On Mon, Oct 9, 2023 at 10:36 AM Yajun Deng  wrote:
>
>
> On 2023/10/9 16:20, Eric Dumazet wrote:
> > On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng  wrote:
> >>
> >> On 2023/10/9 15:53, Eric Dumazet wrote:
> >>> On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng  wrote:
> >>>
>  'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
>  the trace work well.
> 
>  They all have 'pop' instructions in them. This may be the key to making
>  the trace work well.
> 
>  Hi all,
> 
>  I need your help on percpu and ftrace.
> 
> >>> I do not think you made sure netdev_core_stats_inc() was never inlined.
> >>>
> >>> Adding more code in it is simply changing how the compiler decides to
> >>> inline or not.
> >>
> >> Yes, you are right. It needs to add the 'noinline' prefix. The
> >> disassembly code will have 'pop'
> >>
> >> instruction.
> >>
> > The function was fine, you do not need anything like push or pop.
> >
> > The only needed stuff was the call __fentry__.
> >
> > The fact that the function was inlined for some invocations was the
> > issue, because the trace point
> > is only planted in the out of line function.
>
>
> But somehow the following code isn't inline? They didn't need to add the
> 'noinline' prefix.
>
> +   field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
> +   WRITE_ONCE(*field, READ_ONCE(*field) + 1);
>
> Or
> +   (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
>

I think you are very confused.

You only want to trace netdev_core_stats_inc() entry point, not
arbitrary pieces of it.



Re: [PATCH v9 1/5] lib: objpool added: ring-array based lockless MPMC

2023-10-09 Thread wuqiang

Hello Masami,

I just got time to look at the new patch and noticed that ages[] was removed.
ages[] was introduced as a two-phase-commit-like mechanism to keep consistency
and must be kept.

Consider the following two cases, where two CPU nodes operate on the same
objpool_slot simultaneously:

Case 1:

  NODE 1:                              NODE 2:
  push to an empty slot                pop will get wrong value

  try_cmpxchg_acquire(&slot->tail, &tail, next)
                                       try_cmpxchg_release(&slot->head, &head, head + 1)
                                       return slot->entries[head & slot->mask]
  WRITE_ONCE(slot->entries[tail & slot->mask], obj)


Case 2:

  NODE 1:                              NODE 2:
  push to a slot with 1 object         pop will get wrong value

                                       try_cmpxchg_release(&slot->head, &head, head + 1)
  try_cmpxchg_acquire(&slot->tail, &tail, next)
  WRITE_ONCE(slot->entries[tail & slot->mask], obj)
                                       return slot->entries[head & slot->mask]
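
For illustration only (a simplified, non-lockless sketch, not the objpool
code): with a per-entry age the producer publishes in two phases, so a
concurrent pop can tell whether the entry at head has already been committed
before consuming it:

	struct demo_slot {
		unsigned int	mask;		/* capacity - 1, capacity is a power of 2      */
		unsigned int	head, tail;
		unsigned int	*ages;		/* init: ages[i] = i - capacity (nothing looks */
						/* committed); holds the commit stamp later    */
		void		**entries;
	};

	static void demo_push(struct demo_slot *s, void *obj)
	{
		unsigned int tail = s->tail++;			/* reserve a position */

		WRITE_ONCE(s->entries[tail & s->mask], obj);	/* phase 1: payload  */
		smp_store_release(&s->ages[tail & s->mask], tail); /* phase 2: commit */
	}

	static void *demo_pop(struct demo_slot *s)
	{
		unsigned int head = s->head;

		if (smp_load_acquire(&s->ages[head & s->mask]) != head)
			return NULL;		/* reserved but not committed yet */
		s->head = head + 1;
		return READ_ONCE(s->entries[head & s->mask]);
	}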


Regards,
wuqiang

On 2023/9/25 17:42, Masami Hiramatsu (Google) wrote:

Hi Wuqiang,

On Tue,  5 Sep 2023 09:52:51 +0800
"wuqiang.matt"  wrote:


The object pool is a scalable implementation of a high performance queue
for object allocation and reclamation, such as kretprobe instances.

By leveraging a percpu ring-array to mitigate the hot spot of memory
contention, it can deliver near-linear scalability for highly parallel
scenarios. The objpool is best suited for the following cases:
1) Memory allocation or reclamation are prohibited or too expensive
2) Consumers are of different priorities, such as irqs and threads

Limitations:
1) Maximum objects (capacity) is determined during pool initializing
and can't be modified (extended) after objpool creation
2) The memory of objects won't be freed until objpool is finalized
3) Object allocation (pop) may fail after trying all cpu slots


I made a simplifying patch on this by (mainly) removing the ages array.
I also renamed local variables to use more readable names, like slot,
pool, and obj.

Here are the results from running test_objpool.ko.

Original:
[   50.500235] Summary of testcases:
[   50.503139] duration: 1027135us  hits: 30628293  miss: 0       sync: percpu objpool
[   50.510416] duration: 1047977us  hits: 30453023  miss: 0       sync: percpu objpool from vmalloc
[   50.517421] duration: 1047098us  hits: 31145034  miss: 0       sync & hrtimer: percpu objpool
[   50.524671] duration: 1053233us  hits: 30919640  miss: 0       sync & hrtimer: percpu objpool from vmalloc
[   50.531382] duration: 1055822us  hits: 3407221   miss: 830219  sync overrun: percpu objpool
[   50.538135] duration: 1055998us  hits: 3404624   miss: 854160  sync overrun: percpu objpool from vmalloc
[   50.546686] duration: 1046104us  hits: 19464798  miss: 0       async: percpu objpool
[   50.552989] duration: 1033054us  hits: 18957836  miss: 0       async: percpu objpool from vmalloc
[   50.560289] duration: 1041778us  hits: 33806868  miss: 0       async & hrtimer: percpu objpool
[   50.567425] duration: 1048901us  hits: 34211860  miss: 0       async & hrtimer: percpu objpool from vmalloc

Simplified:
[   48.393236] Summary of testcases:
[   48.395384] duration: 1013002us  hits: 29661448  miss: 0       sync: percpu objpool
[   48.400351] duration: 1057231us  hits: 31035578  miss: 0       sync: percpu objpool from vmalloc
[   48.405660] duration: 1043196us  hits: 30546652  miss: 0       sync & hrtimer: percpu objpool
[   48.411216] duration: 1047128us  hits: 30601306  miss: 0       sync & hrtimer: percpu objpool from vmalloc
[   48.417231] duration: 1051695us  hits: 3468287   miss: 892881  sync overrun: percpu objpool
[   48.422405] duration: 1054003us  hits: 3549491   miss: 898141  sync overrun: percpu objpool from vmalloc
[   48.428425] duration: 1052946us  hits: 19005228  miss: 0       async: percpu objpool
[   48.433597] duration: 1051757us  hits: 19670866  miss: 0       async: percpu objpool from vmalloc
[   48.439280] duration: 1042951us  hits: 37135332  miss: 0       async & hrtimer: percpu objpool
[   48.445085] duration: 1029803us  hits: 37093248  miss: 0       async & hrtimer: percpu objpool from vmalloc

Can you test it too?

Thanks,

 From f1f442ff653e329839e5452b8b88463a80a12ff3 Mon Sep 17 00:00:00 2001
From: "Masami Hiramatsu (Google)" 
Date: Mon, 25 Sep 2023 16:07:12 +0900
Subject: [PATCH] objpool: Simplify objpool by removing ages array

Simplify the objpool code by removing the ages array from the per-cpu slot.
It chooses a large enough capacity (nr_objects + 1 rounded up to a power of 2)
for the entries array, so the tail never 

Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-09 Thread Michal Suchánek
Hello,

On Mon, Oct 09, 2023 at 05:31:02PM +0900, Masahiro Yamada wrote:
> On Fri, Oct 6, 2023 at 12:49 AM Michal Suchanek  wrote:
> >
> > The default MODLIB value is composed of two variables and the hardcoded
> > string '/lib/modules/'.
> >
> > MODLIB = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
> >
> > Defining this middle part as a variable was rejected on the basis that
> > users can pass the whole MODLIB to make, such as
> 
> 
> In other words, do you want to say
> 
> "If defining this middle part as a variable had been accepted,
> this patch would have been unneeded." ?

If it were accepted I would not have to guess what the middle part is,
and could use the variable that unambiguously defines it instead.

Thanks

Michal

> 
> 
> If your original patch were accepted, how would this patch look like?
> 
> kernel.spec needs to know the module directory somehow.
> 
> 
> Would you add the following in scripts/package/mkspec ?
> 
> %define MODLIB $(pkg-config --print-variables kmod 2>/dev/null | grep
> '^module_directory$' >/dev/null && pkg-config
> --variable=module_directory kmod || echo /lib/modules)
> 
> 
> 
> 
> 
> 
> 
> 
> >
> > make 'MODLIB=$(INSTALL_MOD_PATH)/usr/lib/modules/$(KERNELRELEASE)'
> >
> > However, this middle part of MODLIB is independently hardcoded by
> > rpm-pkg, and when the user alters MODLIB this is not reflected when
> > building the package.
> >
> > Given that $(INSTALL_MOD_PATH) is overridden during the rpm package build
> > it is likely going to be empty. Then MODLIB can be passed to the rpm
> > package, and used in place of the whole
> > /usr/lib/modules/$(KERNELRELEASE) part.
> >
> > Signed-off-by: Michal Suchanek 
> > ---
> >  scripts/package/kernel.spec | 8 
> >  scripts/package/mkspec  | 1 +
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/scripts/package/kernel.spec b/scripts/package/kernel.spec
> > index 3eee0143e0c5..15f49c5077db 100644
> > --- a/scripts/package/kernel.spec
> > +++ b/scripts/package/kernel.spec
> > @@ -67,7 +67,7 @@ cp $(%{make} %{makeflags} -s image_name) 
> > %{buildroot}/boot/vmlinuz-%{KERNELRELEA
> >  %{make} %{makeflags} INSTALL_HDR_PATH=%{buildroot}/usr headers_install
> >  cp System.map %{buildroot}/boot/System.map-%{KERNELRELEASE}
> >  cp .config %{buildroot}/boot/config-%{KERNELRELEASE}
> > -ln -fns /usr/src/kernels/%{KERNELRELEASE} 
> > %{buildroot}/lib/modules/%{KERNELRELEASE}/build
> > +ln -fns /usr/src/kernels/%{KERNELRELEASE} %{buildroot}%{MODLIB}/build
> >  %if %{with_devel}
> >  %{make} %{makeflags} run-command 
> > KBUILD_RUN_COMMAND='${srctree}/scripts/package/install-extmod-build 
> > %{buildroot}/usr/src/kernels/%{KERNELRELEASE}'
> >  %endif
> > @@ -98,8 +98,8 @@ fi
> >
> >  %files
> >  %defattr (-, root, root)
> > -/lib/modules/%{KERNELRELEASE}
> > -%exclude /lib/modules/%{KERNELRELEASE}/build
> > +%{MODLIB}
> > +%exclude %{MODLIB}/build
> >  /boot/*
> >
> >  %files headers
> > @@ -110,5 +110,5 @@ fi
> >  %files devel
> >  %defattr (-, root, root)
> >  /usr/src/kernels/%{KERNELRELEASE}
> > -/lib/modules/%{KERNELRELEASE}/build
> > +%{MODLIB}/build
> >  %endif
> > diff --git a/scripts/package/mkspec b/scripts/package/mkspec
> > index d41608efb747..d41b2e5304ac 100755
> > --- a/scripts/package/mkspec
> > +++ b/scripts/package/mkspec
> > @@ -18,6 +18,7 @@ fi
> >  cat<<EOF
> >  %define ARCH ${ARCH}
> >  %define KERNELRELEASE ${KERNELRELEASE}
> > +%define MODLIB ${MODLIB}
> >  %define pkg_release $("${srctree}/init/build-version")
> >  EOF
> >
> > --
> > 2.42.0
> >
> 
> 
> --
> Best Regards
> Masahiro Yamada



Re: [PATCH 3/7] dt-bindings: arm: qcom: Add Xiaomi Redmi Note 9S

2023-10-09 Thread Krzysztof Kozlowski
On 07/10/2023 15:58, David Wronek wrote:
> Document the Xiaomi Redmi Note 9S (curtana) smartphone, which is based
> on the Qualcomm SM7125 SoC.

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof



Re: [PATCH 2/7] dt-bindings: phy: Add QMP UFS PHY compatible for SC7180

2023-10-09 Thread Krzysztof Kozlowski
On 07/10/2023 15:58, David Wronek wrote:
> Document the QMP UFS PHY compatible for SC7180
> 
> Signed-off-by: David Wronek 
> ---
>  .../devicetree/bindings/phy/qcom,sc8280xp-qmp-ufs-phy.yaml   | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git 
> a/Documentation/devicetree/bindings/phy/qcom,sc8280xp-qmp-ufs-phy.yaml 
> b/Documentation/devicetree/bindings/phy/qcom,sc8280xp-qmp-ufs-phy.yaml
> index f3a3296c811c..f2eee8b5326f 100644
> --- a/Documentation/devicetree/bindings/phy/qcom,sc8280xp-qmp-ufs-phy.yaml
> +++ b/Documentation/devicetree/bindings/phy/qcom,sc8280xp-qmp-ufs-phy.yaml
> @@ -19,6 +19,7 @@ properties:
>- qcom,msm8996-qmp-ufs-phy
>- qcom,msm8998-qmp-ufs-phy
>- qcom,sa8775p-qmp-ufs-phy
> +  - qcom,sc7180-qmp-ufs-phy

You also need to update the if: for clocks.

Best regards,
Krzysztof



Re: [PATCH 1/7] dt-bindings: ufs: qcom: Add SC7180 compatible string

2023-10-09 Thread Krzysztof Kozlowski
On 07/10/2023 15:58, David Wronek wrote:
> Document the compatible for the UFS found on SC7180.
> 
> Signed-off-by: David Wronek 
> ---
>  Documentation/devicetree/bindings/ufs/qcom,ufs.yaml | 2 ++
>  1 file changed, 2 insertions(+)

Reviewed-by: Krzysztof Kozlowski 

Best regards,
Krzysztof



Re: [PATCH v2 3/6] ACPI: AC: Replace acpi_driver with platform_driver

2023-10-09 Thread Wilczynski, Michal


Hi !

Thanks a lot for a review, to both of you ! :-)

On 10/7/2023 12:43 PM, Rafael J. Wysocki wrote:
> On Sat, Oct 7, 2023 at 12:41 PM Rafael J. Wysocki  wrote:
>> On Sat, Oct 7, 2023 at 9:56 AM Andy Shevchenko
>>  wrote:
>>> On Fri, Oct 06, 2023 at 09:47:57PM +0200, Rafael J. Wysocki wrote:
 On Fri, Oct 6, 2023 at 8:33 PM Michal Wilczynski
  wrote:
>>> ...
>>>
>  struct acpi_ac {
> struct power_supply *charger;
> struct power_supply_desc charger_desc;
> -   struct acpi_device *device;
> +   struct device *dev;
 I'm not convinced about this change.

 If I'm not mistaken, you only use the dev pointer above to get the
 ACPI_COMPANION() of it, but the latter is already found in _probe(),
 so it can be stored in struct acpi_ac for later use and then the dev
 pointer in there will not be necessary any more.

 That will save you a bunch of ACPI_HANDLE() evaluations and there's
 nothing wrong with using ac->device->handle.  The patch will then
 become almost trivial AFAICS and if you really need to get from ac to
 the underlying platform device, a pointer to it can be added to struct
 acpi_ac without removing the ACPI device pointer from it.

Yeah, we could add a platform device pointer without removing the acpi_device
pointer, and yes, that would introduce data duplication, like Andy noticed. And
yes, maybe for this particular driver there is little impact (as struct device is
not really used), but in my opinion we should adhere to some common coding
pattern among all ACPI drivers so that the code is easier to maintain and more
readable, also for any newcomer.

>>> The idea behind is to eliminate data duplication.
>> What data duplication exactly do you mean?
>>
>> struct acpi_device *device is replaced with struct device *dev which
>> is the same size.  The latter is then used to obtain a struct
>> acpi_device pointer.  Why is it better to do this than to store the
>> struct acpi_device itself?
> This should be "store the struct acpi_device pointer itself", sorry.

Hi,
So let me explain the reasoning here:

1) I've had the pleasure of working with different drivers in the ACPI directory.
    In my opinion the best ones I've seen were built around the concept of struct
    device (e.g. NFIT). It's a struct that's well understood in the kernel, and
    it's easier to understand without having to read any ACPI specific code.
    If I see something like ACPI_HANDLE(dev), I think 'ah-ha - having a struct
    device I can easily get a struct acpi_device - they're connected'. And I think
    using a standardized macro, instead of manually dereferencing pointers, is
    also much easier for ACPI newbs reading the code, as it hides a bit of the
    complexity of getting the acpi_device from the struct device. And to be honest
    I don't think there would be any measurable performance change, as those code
    paths are far from being considered 'hot paths'. So if we can make the code
    easier to understand from a newcomer's perspective, why not do it (see the
    small sketch after point 2).
   
   
2) I think it would be good to stick to the convention and introduce some coding
    pattern; for now some drivers store the struct device pointer and others store
    the acpi_device pointer. As I'm doing this change, the acpi_device pointer
    becomes less relevant, since we're using the platform device. So to reflect
    that loss of relevance we can choose to adhere to a pattern where we use it
    less and less, and the winning approach would be to use 'struct device' by
    default everywhere we can, so maybe eventually we would be able to lose
    acpi_device altogether at some point, as most of its usage is to retrieve the
    ACPI handle and execute some AML method. So in my understanding acpi_device
    is already obsolete at this point; if we can manage to use it less and less,
    and eventually wipe it out, then why not ;-)
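
To illustrate point 1 above, a minimal sketch (hypothetical names, not from
this patch) of looking the ACPI companion up once in probe via the standard
macros and caching it for later use:

	struct ac_sketch {
		struct acpi_device *adev;
	};

	static int ac_sketch_probe(struct platform_device *pdev)
	{
		struct device *dev = &pdev->dev;
		/* companion set up by the ACPI core; ACPI_HANDLE(dev) would
		 * give the raw handle for AML calls (== adev->handle) */
		struct acpi_device *adev = ACPI_COMPANION(dev);
		struct ac_sketch *ac;

		if (!adev)
			return -ENODEV;

		ac = devm_kzalloc(dev, sizeof(*ac), GFP_KERNEL);
		if (!ac)
			return -ENOMEM;

		ac->adev = adev;	/* cache it instead of re-deriving it later */
		platform_set_drvdata(pdev, ac);
		return 0;
	}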
   

Thanks again !

Michał





Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Yajun Deng



On 2023/10/9 16:20, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng  wrote:


On 2023/10/9 15:53, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng  wrote:


'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
the trace work well.

They all have 'pop' instructions in them. This may be the key to making
the trace work well.

Hi all,

I need your help on percpu and ftrace.


I do not think you made sure netdev_core_stats_inc() was never inlined.

Adding more code in it is simply changing how the compiler decides to
inline or not.


Yes, you are right. It needs to add the 'noinline' prefix. The
disassembly code will have 'pop'

instruction.


The function was fine, you do not need anything like push or pop.

The only needed stuff was the call __fentry__.

The fact that the function was inlined for some invocations was the
issue, because the trace point
is only planted in the out of line function.



But somehow the following code isn't inlined? It didn't need the
'noinline' prefix.


+   field = (unsigned long *)((void *)this_cpu_ptr(p) + offset);
+   WRITE_ONCE(*field, READ_ONCE(*field) + 1);

Or
+   (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;




Re: [PATCH rebased] kbuild: rpm-pkg: Fix build with non-default MODLIB

2023-10-09 Thread Masahiro Yamada
On Fri, Oct 6, 2023 at 12:49 AM Michal Suchanek  wrote:
>
> The default MODLIB value is composed of two variables and the hardcoded
> string '/lib/modules/'.
>
> MODLIB = $(INSTALL_MOD_PATH)/lib/modules/$(KERNELRELEASE)
>
> Defining this middle part as a variable was rejected on the basis that
> users can pass the whole MODLIB to make, such as


In other words, do you want to say

"If defining this middle part as a variable had been accepted,
this patch would have been unneeded." ?


I do not think so.


If your original patch were accepted, how would this patch look like?

kernel.spec needs to know the module directory somehow.


Would you add the following in scripts/package/mkspec ?

%define MODLIB $(pkg-config --print-variables kmod 2>/dev/null | grep
'^module_directory$' >/dev/null && pkg-config
--variable=module_directory kmod || echo /lib/modules)








>
> make 'MODLIB=$(INSTALL_MOD_PATH)/usr/lib/modules/$(KERNELRELEASE)'
>
> However, this middle part of MODLIB is independently hardcoded by
> rpm-pkg, and when the user alters MODLIB this is not reflected when
> building the package.
>
> Given that $(INSTALL_MOD_PATH) is overridden during the rpm package build
> it is likely going to be empty. Then MODLIB can be passed to the rpm
> package, and used in place of the whole
> /usr/lib/modules/$(KERNELRELEASE) part.
>
> Signed-off-by: Michal Suchanek 
> ---
>  scripts/package/kernel.spec | 8 
>  scripts/package/mkspec  | 1 +
>  2 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/scripts/package/kernel.spec b/scripts/package/kernel.spec
> index 3eee0143e0c5..15f49c5077db 100644
> --- a/scripts/package/kernel.spec
> +++ b/scripts/package/kernel.spec
> @@ -67,7 +67,7 @@ cp $(%{make} %{makeflags} -s image_name) 
> %{buildroot}/boot/vmlinuz-%{KERNELRELEA
>  %{make} %{makeflags} INSTALL_HDR_PATH=%{buildroot}/usr headers_install
>  cp System.map %{buildroot}/boot/System.map-%{KERNELRELEASE}
>  cp .config %{buildroot}/boot/config-%{KERNELRELEASE}
> -ln -fns /usr/src/kernels/%{KERNELRELEASE} 
> %{buildroot}/lib/modules/%{KERNELRELEASE}/build
> +ln -fns /usr/src/kernels/%{KERNELRELEASE} %{buildroot}%{MODLIB}/build
>  %if %{with_devel}
>  %{make} %{makeflags} run-command 
> KBUILD_RUN_COMMAND='${srctree}/scripts/package/install-extmod-build 
> %{buildroot}/usr/src/kernels/%{KERNELRELEASE}'
>  %endif
> @@ -98,8 +98,8 @@ fi
>
>  %files
>  %defattr (-, root, root)
> -/lib/modules/%{KERNELRELEASE}
> -%exclude /lib/modules/%{KERNELRELEASE}/build
> +%{MODLIB}
> +%exclude %{MODLIB}/build
>  /boot/*
>
>  %files headers
> @@ -110,5 +110,5 @@ fi
>  %files devel
>  %defattr (-, root, root)
>  /usr/src/kernels/%{KERNELRELEASE}
> -/lib/modules/%{KERNELRELEASE}/build
> +%{MODLIB}/build
>  %endif
> diff --git a/scripts/package/mkspec b/scripts/package/mkspec
> index d41608efb747..d41b2e5304ac 100755
> --- a/scripts/package/mkspec
> +++ b/scripts/package/mkspec
> @@ -18,6 +18,7 @@ fi
>  cat<<EOF
>  %define ARCH ${ARCH}
>  %define KERNELRELEASE ${KERNELRELEASE}
> +%define MODLIB ${MODLIB}
>  %define pkg_release $("${srctree}/init/build-version")
>  EOF
>
> --
> 2.42.0
>


--
Best Regards
Masahiro Yamada



Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Eric Dumazet
On Mon, Oct 9, 2023 at 10:14 AM Yajun Deng  wrote:
>
>
> On 2023/10/9 15:53, Eric Dumazet wrote:
> > On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng  wrote:
> >
> >> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
> >> the trace work well.
> >>
> >> They all have 'pop' instructions in them. This may be the key to making
> >> the trace work well.
> >>
> >> Hi all,
> >>
> >> I need your help on percpu and ftrace.
> >>
> > I do not think you made sure netdev_core_stats_inc() was never inlined.
> >
> > Adding more code in it is simply changing how the compiler decides to
> > inline or not.
>
>
> Yes, you are right. It needs to add the 'noinline' prefix. The
> disassembly code will have 'pop'
>
> instruction.
>

The function was fine, you do not need anything like push or pop.

The only needed stuff was the call __fentry__.

The fact that the function was inlined for some invocations was the
issue, because the trace point
is only planted in the out of line function.



Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Yajun Deng



On 2023/10/9 15:53, Eric Dumazet wrote:

On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng  wrote:


'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
the trace work well.

They all have 'pop' instructions in them. This may be the key to making
the trace work well.

Hi all,

I need your help on percpu and ftrace.


I do not think you made sure netdev_core_stats_inc() was never inlined.

Adding more code in it is simply changing how the compiler decides to
inline or not.



Yes, you are right. It needs the 'noinline' prefix. The disassembled
code will then have a 'pop' instruction.

Thanks.




Re: [PATCH net-next v7] net/core: Introduce netdev_core_stats_inc()

2023-10-09 Thread Eric Dumazet
On Mon, Oct 9, 2023 at 5:07 AM Yajun Deng  wrote:

> 'this_cpu_read + this_cpu_write' and 'pr_info + this_cpu_inc' will make
> the trace work well.
>
> They all have 'pop' instructions in them. This may be the key to making
> the trace work well.
>
> Hi all,
>
> I need your help on percpu and ftrace.
>

I do not think you made sure netdev_core_stats_inc() was never inlined.

Adding more code in it is simply changing how the compiler decides to
inline or not.