Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-11 Thread Ryota Ozaki
On Fri, Dec 8, 2017 at 6:51 PM, Kengo NAKAHARA  wrote:
> Hi,
>
> On 2017/12/07 14:21, Taylor R Campbell wrote:
>> I dropped this thread on the floor a while ago and I forget what the
>> status was.  I've had a patch sitting in my tree for a while which I
>> brushed off to put the list update logic in separate functions, as I
>> think chuq requested a while ago, but still keep it all isolated to
>> subr_psref.c and avoid defining any new macros.
>>
>> If you've measured that SLIST works better -- which would make sense
>> because the typical bracketed psref_acquire/release nesting makes a
>> nice LIFO stack of the things so that SLIST_REMOVE should usually be
>> done in the first iteration -- then I'm happy to replace it by SLIST.
>> We should just make sure to fix this bug before netbsd-8 goes out!
>>
>> Thoughts?
>
> I measure IPv4 forwarding performance your patch(PSREF_LIST) version
> and SLIST version. At first, the result is quite affected by
> optimization option "-falign-functions"... Based on this, it seems
> there is almost no difference between PSREF_LIST and SLIST.
>
> However, it seems your patch has large diff... From the point of code
> stability, smaller diff SLIST version would be better for netbsd-8 branch
> to fix the bug. Because your patch causes some new ATF failures such as
> ldp_regen and route_change_ifp (reported by ozaki-r@n.o). We can probably
> fix them at once but guaranteeing its stability would take more time.

Hmm, the same failures also happen with the SLIST version. IIRC there
was no regression with it two months ago and so I didn't test it enough
days ago. Anyway I guess the failures are not because of your patch.
I'm sorry for that.

I have no idea why the failures happen. The patches may dig out a
latent bug in -current.

  ozaki-r


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-11 Thread Taylor R Campbell
> Date: Mon, 11 Dec 2017 11:55:27 +0100
> From: Edgar Fuß 
> 
> >  struct psref {
> > -   LIST_ENTRY(psref)   psref_entry;
> > +   SLIST_ENTRY(psref)  psref_entry;
> > +   /* To keep ABI with LIST_ENTRY(psref) version. */
> > +   void*psref_unused0;
> Isn't it somewhat fishy to manally pad this, knowing how big a [S]LIST_ENTRY 
> is? Wouldn't it be cleaner to use a union?

Also fine by me, but a little more work to implement.  Feel free to
add some ctasserts to raise your confidence in this.  I don't expect
LIST_ENTRY or SLIST_ENTRY to change, but I don't mind assertions to
add additional confidence.


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-11 Thread Edgar Fuß
>  struct psref {
> - LIST_ENTRY(psref)   psref_entry;
> + SLIST_ENTRY(psref)  psref_entry;
> + /* To keep ABI with LIST_ENTRY(psref) version. */
> + void*psref_unused0;
Isn't it somewhat fishy to manally pad this, knowing how big a [S]LIST_ENTRY 
is? Wouldn't it be cleaner to use a union?


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-10 Thread Kengo NAKAHARA
Hi,

On 2017/12/11 11:09, Taylor R Campbell wrote:
>> Date: Mon, 11 Dec 2017 10:53:11 +0900
>> From: Kengo NAKAHARA 
>>
>> Thank you for your reviewing. I see. Here is the updated patch.
>> [...]
>> If there is no problem, can I commit and pullup -8 the patch?
> 
> LGTM, please commit!

Thank you for your quickly reviewing. I commit it as subr_psref.c:r1.8
and psref.h:r1.3. I will send pullup request -8 branch.


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
IoT Platform Development Department,
Network Division,
Technology Unit

Kengo NAKAHARA 


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-10 Thread Taylor R Campbell
> Date: Mon, 11 Dec 2017 10:53:11 +0900
> From: Kengo NAKAHARA 
> 
> Thank you for your reviewing. I see. Here is the updated patch.
> [...]
> If there is no problem, can I commit and pullup -8 the patch?

LGTM, please commit!


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-10 Thread Kengo NAKAHARA
Hi,

On 2017/12/09 0:18, Taylor R Campbell wrote:
>> Date: Fri, 8 Dec 2017 18:51:28 +0900
>> From: Kengo NAKAHARA 
>>
>> However, it seems your patch has large diff... From the point of code
>> stability, smaller diff SLIST version would be better for netbsd-8 branch
>> to fix the bug. Because your patch causes some new ATF failures such as
>> ldp_regen and route_change_ifp (reported by ozaki-r@n.o). We can probably
>> fix them at once but guaranteeing its stability would take more time.
> 
> Not surprising I made a bug somewhere in there!  The SLIST version is
> fine by me.  However, there's one small nit:
> 
>> diff --git a/sys/sys/psref.h b/sys/sys/psref.h
>> index 88db6dbb603..9096a3798d6 100644
>> --- a/sys/sys/psref.h
>> +++ b/sys/sys/psref.h
>> @@ -69,7 +69,7 @@ struct psref_target {
>>   *  written only on the local CPU.
>>   */
>>  struct psref {
>> -LIST_ENTRY(psref)   psref_entry;
>> +SLIST_ENTRY(psref)  psref_entry;
>>  const struct psref_target   *psref_target;
> 
> In order to avoid changing the kernel ABI in netbsd-8, please add a
> void *psref_unused0 after psref_entry.  That way we don't have to
> think about the consequences of an ABI change in the branch, and we
> still have the opportunity to switch back to a doubly-linked list if
> we want.

Thank you for your reviewing. I see. Here is the updated patch.

diff --git a/sys/kern/subr_psref.c b/sys/kern/subr_psref.c
index c3f76ab0e74..9eac19def3f 100644
--- a/sys/kern/subr_psref.c
+++ b/sys/kern/subr_psref.c
@@ -78,7 +78,7 @@ __KERNEL_RCSID(0, "$NetBSD: subr_psref.c,v 1.7 2017/06/01 
02:45:13 chs Exp $");
 #include 
 #include 
 
-LIST_HEAD(psref_head, psref);
+SLIST_HEAD(psref_head, psref);
 
 static bool_psref_held(const struct psref_target *, struct psref_class *,
bool);
@@ -135,7 +135,7 @@ psref_cpu_drained_p(void *p, void *cookie, struct cpu_info 
*ci __unused)
const struct psref_cpu *pcpu = p;
bool *retp = cookie;
 
-   if (!LIST_EMPTY(>pcpu_head))
+   if (!SLIST_EMPTY(>pcpu_head))
*retp = false;
 }
 
@@ -194,7 +194,7 @@ psref_check_duplication(struct psref_cpu *pcpu, struct 
psref *psref,
bool found = false;
struct psref *_psref;
 
-   LIST_FOREACH(_psref, >pcpu_head, psref_entry) {
+   SLIST_FOREACH(_psref, >pcpu_head, psref_entry) {
if (_psref == psref &&
_psref->psref_target == target) {
found = true;
@@ -250,7 +250,7 @@ psref_acquire(struct psref *psref, const struct 
psref_target *target,
 #endif
 
/* Record our reference.  */
-   LIST_INSERT_HEAD(>pcpu_head, psref, psref_entry);
+   SLIST_INSERT_HEAD(>pcpu_head, psref, psref_entry);
psref->psref_target = target;
psref->psref_lwp = curlwp;
psref->psref_cpu = curcpu();
@@ -273,6 +273,7 @@ void
 psref_release(struct psref *psref, const struct psref_target *target,
 struct psref_class *class)
 {
+   struct psref_cpu *pcpu;
int s;
 
KASSERTMSG((kpreempt_disabled() || cpu_softintr_p() ||
@@ -302,7 +303,9 @@ psref_release(struct psref *psref, const struct 
psref_target *target,
 * (as does blocking interrupts).
 */
s = splraiseipl(class->prc_iplcookie);
-   LIST_REMOVE(psref, psref_entry);
+   pcpu = percpu_getref(class->prc_percpu);
+   SLIST_REMOVE(>pcpu_head, psref, psref, psref_entry);
+   percpu_putref(class->prc_percpu);
splx(s);
 
/* If someone is waiting for users to drain, notify 'em.  */
@@ -353,7 +356,7 @@ psref_copy(struct psref *pto, const struct psref *pfrom,
pcpu = percpu_getref(class->prc_percpu);
 
/* Record the new reference.  */
-   LIST_INSERT_HEAD(>pcpu_head, pto, psref_entry);
+   SLIST_INSERT_HEAD(>pcpu_head, pto, psref_entry);
pto->psref_target = pfrom->psref_target;
pto->psref_lwp = curlwp;
pto->psref_cpu = curcpu();
@@ -474,7 +477,7 @@ _psref_held(const struct psref_target *target, struct 
psref_class *class,
pcpu = percpu_getref(class->prc_percpu);
 
/* Search through all the references on this CPU.  */
-   LIST_FOREACH(psref, >pcpu_head, psref_entry) {
+   SLIST_FOREACH(psref, >pcpu_head, psref_entry) {
/* Sanity-check the reference's CPU.  */
KASSERTMSG((psref->psref_cpu == curcpu()),
"passive reference transferred from CPU %u to CPU %u",
diff --git a/sys/sys/psref.h b/sys/sys/psref.h
index 88db6dbb603..d1ab606e117 100644
--- a/sys/sys/psref.h
+++ b/sys/sys/psref.h
@@ -69,7 +69,9 @@ struct psref_target {
  * written only on the local CPU.
  */
 struct psref {
-   LIST_ENTRY(psref)   psref_entry;
+   SLIST_ENTRY(psref)  psref_entry;
+   /* To keep ABI with LIST_ENTRY(psref) version. */
+   void*psref_unused0;

Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-08 Thread Taylor R Campbell
> Date: Fri, 8 Dec 2017 18:51:28 +0900
> From: Kengo NAKAHARA 
> 
> However, it seems your patch has large diff... From the point of code
> stability, smaller diff SLIST version would be better for netbsd-8 branch
> to fix the bug. Because your patch causes some new ATF failures such as
> ldp_regen and route_change_ifp (reported by ozaki-r@n.o). We can probably
> fix them at once but guaranteeing its stability would take more time.

Not surprising I made a bug somewhere in there!  The SLIST version is
fine by me.  However, there's one small nit:

> diff --git a/sys/sys/psref.h b/sys/sys/psref.h
> index 88db6dbb603..9096a3798d6 100644
> --- a/sys/sys/psref.h
> +++ b/sys/sys/psref.h
> @@ -69,7 +69,7 @@ struct psref_target {
>   *   written only on the local CPU.
>   */
>  struct psref {
> - LIST_ENTRY(psref)   psref_entry;
> + SLIST_ENTRY(psref)  psref_entry;
>   const struct psref_target   *psref_target;

In order to avoid changing the kernel ABI in netbsd-8, please add a
void *psref_unused0 after psref_entry.  That way we don't have to
think about the consequences of an ABI change in the branch, and we
still have the opportunity to switch back to a doubly-linked list if
we want.


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-08 Thread Kengo NAKAHARA
Hi,

On 2017/12/07 14:21, Taylor R Campbell wrote:
> I dropped this thread on the floor a while ago and I forget what the
> status was.  I've had a patch sitting in my tree for a while which I
> brushed off to put the list update logic in separate functions, as I
> think chuq requested a while ago, but still keep it all isolated to
> subr_psref.c and avoid defining any new macros.
> 
> If you've measured that SLIST works better -- which would make sense
> because the typical bracketed psref_acquire/release nesting makes a
> nice LIFO stack of the things so that SLIST_REMOVE should usually be
> done in the first iteration -- then I'm happy to replace it by SLIST.
> We should just make sure to fix this bug before netbsd-8 goes out!
> 
> Thoughts?

I measure IPv4 forwarding performance your patch(PSREF_LIST) version
and SLIST version. At first, the result is quite affected by
optimization option "-falign-functions"... Based on this, it seems
there is almost no difference between PSREF_LIST and SLIST.

However, it seems your patch has large diff... From the point of code
stability, smaller diff SLIST version would be better for netbsd-8 branch
to fix the bug. Because your patch causes some new ATF failures such as
ldp_regen and route_change_ifp (reported by ozaki-r@n.o). We can probably
fix them at once but guaranteeing its stability would take more time.

The SLIST version patch is following.

diff --git a/sys/kern/subr_psref.c b/sys/kern/subr_psref.c
index c3f76ab0e74..9eac19def3f 100644
--- a/sys/kern/subr_psref.c
+++ b/sys/kern/subr_psref.c
@@ -78,7 +78,7 @@ __KERNEL_RCSID(0, "$NetBSD: subr_psref.c,v 1.7 2017/06/01 
02:45:13 chs Exp $");
 #include 
 #include 
 
-LIST_HEAD(psref_head, psref);
+SLIST_HEAD(psref_head, psref);
 
 static bool_psref_held(const struct psref_target *, struct psref_class *,
bool);
@@ -135,7 +135,7 @@ psref_cpu_drained_p(void *p, void *cookie, struct cpu_info 
*ci __unused)
const struct psref_cpu *pcpu = p;
bool *retp = cookie;
 
-   if (!LIST_EMPTY(>pcpu_head))
+   if (!SLIST_EMPTY(>pcpu_head))
*retp = false;
 }
 
@@ -194,7 +194,7 @@ psref_check_duplication(struct psref_cpu *pcpu, struct 
psref *psref,
bool found = false;
struct psref *_psref;
 
-   LIST_FOREACH(_psref, >pcpu_head, psref_entry) {
+   SLIST_FOREACH(_psref, >pcpu_head, psref_entry) {
if (_psref == psref &&
_psref->psref_target == target) {
found = true;
@@ -250,7 +250,7 @@ psref_acquire(struct psref *psref, const struct 
psref_target *target,
 #endif
 
/* Record our reference.  */
-   LIST_INSERT_HEAD(>pcpu_head, psref, psref_entry);
+   SLIST_INSERT_HEAD(>pcpu_head, psref, psref_entry);
psref->psref_target = target;
psref->psref_lwp = curlwp;
psref->psref_cpu = curcpu();
@@ -273,6 +273,7 @@ void
 psref_release(struct psref *psref, const struct psref_target *target,
 struct psref_class *class)
 {
+   struct psref_cpu *pcpu;
int s;
 
KASSERTMSG((kpreempt_disabled() || cpu_softintr_p() ||
@@ -302,7 +303,9 @@ psref_release(struct psref *psref, const struct 
psref_target *target,
 * (as does blocking interrupts).
 */
s = splraiseipl(class->prc_iplcookie);
-   LIST_REMOVE(psref, psref_entry);
+   pcpu = percpu_getref(class->prc_percpu);
+   SLIST_REMOVE(>pcpu_head, psref, psref, psref_entry);
+   percpu_putref(class->prc_percpu);
splx(s);
 
/* If someone is waiting for users to drain, notify 'em.  */
@@ -353,7 +356,7 @@ psref_copy(struct psref *pto, const struct psref *pfrom,
pcpu = percpu_getref(class->prc_percpu);
 
/* Record the new reference.  */
-   LIST_INSERT_HEAD(>pcpu_head, pto, psref_entry);
+   SLIST_INSERT_HEAD(>pcpu_head, pto, psref_entry);
pto->psref_target = pfrom->psref_target;
pto->psref_lwp = curlwp;
pto->psref_cpu = curcpu();
@@ -474,7 +477,7 @@ _psref_held(const struct psref_target *target, struct 
psref_class *class,
pcpu = percpu_getref(class->prc_percpu);
 
/* Search through all the references on this CPU.  */
-   LIST_FOREACH(psref, >pcpu_head, psref_entry) {
+   SLIST_FOREACH(psref, >pcpu_head, psref_entry) {
/* Sanity-check the reference's CPU.  */
KASSERTMSG((psref->psref_cpu == curcpu()),
"passive reference transferred from CPU %u to CPU %u",
diff --git a/sys/sys/psref.h b/sys/sys/psref.h
index 88db6dbb603..9096a3798d6 100644
--- a/sys/sys/psref.h
+++ b/sys/sys/psref.h
@@ -69,7 +69,7 @@ struct psref_target {
  * written only on the local CPU.
  */
 struct psref {
-   LIST_ENTRY(psref)   psref_entry;
+   SLIST_ENTRY(psref)  psref_entry;
const struct psref_target   *psref_target;
struct lwp  *psref_lwp;

Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-06 Thread Taylor R Campbell
> Date: Thu, 7 Dec 2017 05:23:59 +
> From: Taylor R Campbell 
> 
> > Date: Thu, 7 Dec 2017 05:21:58 +
> > From: Taylor R Campbell 
> > 
> > I dropped this thread on the floor a while ago and I forget what the
> > status was.  I've had a patch sitting in my tree for a while which I
> > brushed off to put the list update logic in separate functions, as I
> > think chuq requested a while ago, but still keep it all isolated to
> > subr_psref.c and avoid defining any new macros.
> 
> ...One of these days, I will teach the computer to remind me when I
> didn't attach the patch I mentioned.

Also I should maybe finish reviwwing the patch before attaching it.
(Caveat: compile-tested only.)
Index: sys/kern/subr_psref.c
===
RCS file: /cvsroot/src/sys/kern/subr_psref.c,v
retrieving revision 1.7
diff -p -u -r1.7 subr_psref.c
--- sys/kern/subr_psref.c   1 Jun 2017 02:45:13 -   1.7
+++ sys/kern/subr_psref.c   7 Dec 2017 05:29:12 -
@@ -78,12 +78,133 @@ __KERNEL_RCSID(0, "$NetBSD: subr_psref.c
 #include 
 #include 
 
-LIST_HEAD(psref_head, psref);
-
 static bool_psref_held(const struct psref_target *, struct psref_class *,
bool);
 
 /*
+ * struct psref_head
+ *
+ * Head of a list of psrefs.
+ *
+ * We use a custom list data structure here in which the first
+ * element of the list has a null prevp, instead of a prevp
+ * pointing at the list head.  We must do this because the list
+ * head can migrate in memory due to percpu_alloc.
+ *
+ * Avoiding the back-pointer into the head further requires
+ * passing the head to psref_list_remove.  We can't use _any_ of
+ * the existing LIST_* because we can't distinguish which prevp
+ * points at the head with them.  So we can't even partly use the
+ * LIST_* macros with a kludge.
+ *
+ * This structure is of limited utility, so it is restricted to
+ * this file for now.
+ */
+struct psref_head {
+   struct psref*prh_first;
+};
+
+/*
+ * PSREF_LIST_FOREACH(psref, head)
+ *
+ * Loop header for iterating over the list of psrefs starting at
+ * head.
+ */
+#definePSREF_LIST_FOREACH(PSREF, HEAD) 
  \
+   for ((PSREF) = (HEAD)->prh_first; \
+(PSREF) != NULL; \
+(PSREF) = (PSREF)->psref_next)
+
+/*
+ * psref_list_empty(head)
+ *
+ * True iff the list of psrefs starting at head is empty.
+ */
+static inline bool
+psref_list_empty(const struct psref_head *head)
+{
+
+   return head->prh_first == NULL;
+}
+
+/*
+ * psref_list_insert_head(head, psref)
+ *
+ * Insert psref at the front of the list starting at head.
+ */
+static inline void
+psref_list_insert_head(struct psref_head *head, struct psref *psref)
+{
+
+   KASSERTMSG((head->prh_first == NULL ||
+   head->prh_first->psref_prevp == NULL),
+   "psref_head %p [cpu%d] psref %p prevp is %p, not null",
+   head, cpu_index(curcpu()), head->prh_first,
+   head->prh_first->psref_prevp);
+
+   /*
+* Set the next element of our new element to whatever the
+* first element is, if there is one, and if there is one, set
+* its previous pointer to point into our new element.
+*/
+   if ((psref->psref_next = head->prh_first) != NULL)
+   psref->psref_next->psref_prevp = >psref_next;
+
+   /* Mark the new element as the first in the list.  */
+   psref->psref_prevp = NULL;
+
+   /* Publish.  */
+   head->prh_first = psref;
+}
+
+/*
+ * psref_list_remove(head, psref)
+ *
+ * Remove psref from the list it is on, which must start at head.
+ */
+static inline void
+psref_list_remove(struct psref_head *head, struct psref *psref)
+{
+
+   /*
+* Either the previous pointer, if there is one, or the list
+* head, if there isn't because we're at the head of the list,
+* had better point at this psref.  If this is not at the head
+* of the list, the head of the list had better *not* point at
+* this psref.
+*/
+   KASSERTMSG(psref->psref_prevp == NULL || *psref->psref_prevp == psref,
+   "psref %p prevp %p points at %p",
+   psref, psref->psref_prevp, *psref->psref_prevp);
+   KASSERTMSG((psref->psref_prevp == NULL) == (head->prh_first == psref),
+   "psref %p prevp %p psref_head %p [cpu%d] first is %p",
+   psref, psref->psref_prevp, head, cpu_index(curcpu()),
+   head->prh_first);
+
+   /* If there is a next element, set its previous pointer to ours.  */
+   if (psref->psref_next != NULL)
+   psref->psref_next->psref_prevp = psref->psref_prevp;
+
+   /*
+* Replace the pointer to this one by a pointer 

Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-06 Thread Taylor R Campbell
> Date: Thu, 7 Dec 2017 05:21:58 +
> From: Taylor R Campbell 
> 
> I dropped this thread on the floor a while ago and I forget what the
> status was.  I've had a patch sitting in my tree for a while which I
> brushed off to put the list update logic in separate functions, as I
> think chuq requested a while ago, but still keep it all isolated to
> subr_psref.c and avoid defining any new macros.

...One of these days, I will teach the computer to remind me when I
didn't attach the patch I mentioned.
Index: sys/kern/subr_psref.c
===
RCS file: /cvsroot/src/sys/kern/subr_psref.c,v
retrieving revision 1.7
diff -p -u -r1.7 subr_psref.c
--- sys/kern/subr_psref.c   1 Jun 2017 02:45:13 -   1.7
+++ sys/kern/subr_psref.c   7 Dec 2017 05:12:44 -
@@ -78,12 +78,129 @@ __KERNEL_RCSID(0, "$NetBSD: subr_psref.c
 #include 
 #include 
 
-LIST_HEAD(psref_head, psref);
-
 static bool_psref_held(const struct psref_target *, struct psref_class *,
bool);
 
 /*
+ * struct psref_head
+ *
+ * Head of a list of psrefs.
+ *
+ * We use a custom list data structure here in which the first
+ * element of the list has a null prevp, instead of a prevp
+ * pointing at the list head.  We must do this because the list
+ * head can migrate in memory due to percpu_alloc.
+ *
+ * Avoiding the back-pointer into the head further requires
+ * passing the head to psref_list_remove.  We can't use _any_ of
+ * the existing LIST_* because we can't distinguish which prevp
+ * points at the head with them.  So we can't even partly use the
+ * LIST_* macros with a kludge.
+ *
+ * This structure is of limited utility, so it is restricted to
+ * this file for now.
+ */
+struct psref_head {
+   struct psref*prh_first;
+};
+
+/*
+ * PSREF_LIST_FOREACH(psref, head)
+ *
+ * Loop header for iterating over the list of psrefs starting at
+ * head.
+ */
+#definePSREF_LIST_FOREACH(PSREF, HEAD) 
  \
+   for ((PSREF) = (HEAD)->prh_first; \
+(PSREF) != NULL; \
+(PSREF) = (PSREF)->psref_next)
+
+/*
+ * psref_list_empty(head)
+ *
+ * True iff the list of psrefs starting at head is empty.
+ */
+static inline bool
+psref_list_empty(const struct psref_head *head)
+{
+
+   return head->prh_first == NULL;
+}
+
+/*
+ * psref_list_insert_head(head, psref)
+ *
+ * Insert psref at the front of the list starting at head.
+ */
+static inline void
+psref_list_insert_head(struct psref_head *head, struct psref *psref)
+{
+
+   KASSERTMSG((head->prh_first == NULL ||
+   head->prh_first->psref_prevp == NULL),
+   "psref_head %p [cpu%d] psref %p prevp is %p, not null",
+   head, cpu_index(curcpu()), head->prh_first,
+   head->prh_first->psref_prevp);
+
+   /*
+* Set the next element of our new element to whatever the
+* first element is, if there is one, and if there is one, set
+* its previous pointer to point at our new element.
+*/
+   if ((psref->psref_next = head->prh_first) != NULL)
+   psref->psref_next->psref_prevp = >psref_next;
+
+   /* Mark the new element as the first in the list.  */
+   psref->psref_prevp = NULL;
+
+   /* Publish.  */
+   head->prh_first = psref;
+}
+
+/*
+ * psref_list_remove(head, psref)
+ *
+ * Remove psref from the list it is on, which must start at head.
+ */
+static inline void
+psref_list_remove(struct psref_head *head, struct psref *psref)
+{
+
+   /*
+* Either the previous pointer, if there is one, or the list
+* head, if there isn't because we're at the head of the list,
+* had better point at this psref.  If this is not at the head
+* of the list, the head of the list had better *not* point at
+* this psref.
+*/
+   KASSERTMSG(psref->psref_prevp == NULL || *psref->psref_prevp == psref,
+   "psref %p prevp %p points at %p",
+   psref, psref->psref_prevp, *psref->psref_prevp);
+   KASSERTMSG((psref->psref_prevp == NULL) == (head->prh_first == psref),
+   "psref %p prevp %p psref_head %p [cpu%d] first is %p",
+   psref, psref->psref_prevp, head, cpu_index(curcpu()),
+   head->prh_first);
+
+   /* If there is a next element, set its previous pointer to ours.  */
+   if (psref->psref_next != NULL)
+   psref->psref_next->psref_prevp = psref->psref_prevp;
+
+   /*
+* If it's at the head of the list, we must modify the head
+* explicitly.  We can't have psref->psref_prevp be a
+* back-pointer into the head because percpu_alloc may move the
+* per-CPU objects around in memory, which would invalidate
+* that 

Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-12-06 Thread Taylor R Campbell
I dropped this thread on the floor a while ago and I forget what the
status was.  I've had a patch sitting in my tree for a while which I
brushed off to put the list update logic in separate functions, as I
think chuq requested a while ago, but still keep it all isolated to
subr_psref.c and avoid defining any new macros.

If you've measured that SLIST works better -- which would make sense
because the typical bracketed psref_acquire/release nesting makes a
nice LIFO stack of the things so that SLIST_REMOVE should usually be
done in the first iteration -- then I'm happy to replace it by SLIST.
We should just make sure to fix this bug before netbsd-8 goes out!

Thoughts?


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-06 Thread Ryota Ozaki
On Fri, Oct 6, 2017 at 4:24 PM, Ryota Ozaki  wrote:
> On Fri, Oct 6, 2017 at 1:14 PM, Ryota Ozaki  wrote:
>> On Fri, Oct 6, 2017 at 11:58 AM, Taylor R Campbell
>>  wrote:
 Date: Fri, 6 Oct 2017 11:26:40 +0900
 From: Ryota Ozaki 

 On Mon, Oct 2, 2017 at 11:13 PM, Taylor R Campbell
  wrote:
 > Quick summary of the problem:
 >
 > Possible solutions.  I'm leaning toward (6), to open-code the linked
 > list operations for this special purpose, with compile-tested patch
 > attached.  This changes the text of psref.h, but shouldn't change the
 > ABI.  Comments?

 How about using SLIST instead of open-coding? The instructions of them
 are very similar, but the SLIST version is slightly simpler.
>>>
>>> I avoided that because n psref_release operations takes expected and
>>> worst-case O(n^2) time and there's no constant bound on the latency of
>>> a single psref_release operation.  But maybe n is always small enough
>>> that it doesn't matter -- perhaps enough that the concrete cost of
>>> maintaining a doubly-linked list is higher.
>>
>> I also suppose that a target being released is at the head of the list
>> because targets of psref are typically manipulated in last-in-first-out
>> manner. But yes psref_release of SLIST version can be O(n) in worst
>> cases theoretically.
>>
>> Well, when I think this kind of problems I tend to think of our usages,
>> i.e., network processing as a router. In such usages, psref is used in
>> softint and the last-in-first-out manner would keep in many cases. But
>> in other usages such as uses in threads the assumption is perhaps not
>> really.
>>
>>>
>>> (My desire to avoid thinking about bounds on n is also what motivated
>>> me to use a linked list instead of an array in the first place.)
>>>
>>> Note that your patch changes the ABI of struct psref!
>>
>> Let's bump the kernel version!
>>
>>>
>>> I wonder whether the open-coded version would do better if it
>>> unconditionally loaded the percpu:
>>>
>>> pcpu = percpu_getref(class->prc_percpu);
>>> KASSERTMSG(psref->psref_prevp == NULL || *psref->psref_prevp == 
>>> psref,
>>> "psref %p prevp %p points at %p",
>>> psref, psref->psref_prevp, *psref->psref_prevp);
>>> KASSERTMSG(psref->psref_prevp != NULL || pcpu->pcpu_first == psref,
>>> "psref %p marked as first but psref_cpu %p on %d first is %p",
>>> psref, pcpu, cpu_index(curcpu()), pcpu->pcpu_first);
>>> *(psref->psref_prevp ? psref->psref_prevp : >pcpu_first) =
>>> psref->psref_next;
>>> percpu_putref(class->prc_percpu);
>>>
>>> With DIAGNOSTIC disabled, I get a conditional move instruction instead
>>> of branches this way:
>>>
>>>  4f9:   e8 00 00 00 00  callq  4fe 
>>> 4fa: R_X86_64_PC32 
>>> percpu_getref+0xfffc
>>>  4fe:   48 8b 53 08 mov0x8(%rbx),%rdx
>>>  502:   48 85 d2test   %rdx,%rdx
>>>  505:   48 0f 44 d0 cmove  %rax,%rdx
>>>  509:   48 8b 03mov(%rbx),%rax
>>>  50c:   48 89 02mov%rax,(%rdx)
>>>  50f:   49 8b 7c 24 20  mov0x20(%r12),%rdi
>>>  514:   e8 00 00 00 00  callq  519 
>>> 515: R_X86_64_PC32 
>>> percpu_putref+0xfffc
>>>
>>> Also, my original patch was missing a percpu_putref.  I hope you put
>>> it back before you ran your test!
>>
>> I'll test with the patch in the other mail later.
>
> The above results were actually measured by knakahara@ and
> this time I've measured by myself (the measurement hardware
> is the same and the kernel configs of his and mine should
> be almost the same though).

I notice a difference between his and mine: check-out date of
the source code. Mine is checked out today while his is probably
some days ago.

>
> ...so the results show a slightly different trend:
>   - original:  137.9 138.0 (144.7) Mbps
>   - open-code: 135.2 134.6 (138.5) Mbps
>   - SLIST: 140.7 141.4 (140.1) Mbps

...though still these results are puzzling.

  ozaki-r

>
> I think knakahara@ (or someone else) needs to re-evaluate :-/
>
>   ozaki-r


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-06 Thread Manuel Bouyer
On Fri, Oct 06, 2017 at 04:21:48AM +, Taylor R Campbell wrote:
> > Date: Fri, 6 Oct 2017 13:14:14 +0900
> > From: Ryota Ozaki 
> > 
> > On Fri, Oct 6, 2017 at 11:58 AM, Taylor R Campbell
> >  wrote:
> > > Note that your patch changes the ABI of struct psref!
> > 
> > Let's bump the kernel version!
> 
> If we want to pull the fix up to netbsd-8 (which we almost certainly
> do), then we should really avoid changing the ABI.  I forget what the
> rules are for pullups to pre-release branches, but changing the ABI is
> a big no-no post-release.

I think it's OK until the first release it tagged.
We already had a userland ABI change in netbsd-8

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-06 Thread Ryota Ozaki
On Fri, Oct 6, 2017 at 1:14 PM, Ryota Ozaki  wrote:
> On Fri, Oct 6, 2017 at 11:58 AM, Taylor R Campbell
>  wrote:
>>> Date: Fri, 6 Oct 2017 11:26:40 +0900
>>> From: Ryota Ozaki 
>>>
>>> On Mon, Oct 2, 2017 at 11:13 PM, Taylor R Campbell
>>>  wrote:
>>> > Quick summary of the problem:
>>> >
>>> > Possible solutions.  I'm leaning toward (6), to open-code the linked
>>> > list operations for this special purpose, with compile-tested patch
>>> > attached.  This changes the text of psref.h, but shouldn't change the
>>> > ABI.  Comments?
>>>
>>> How about using SLIST instead of open-coding? The instructions of them
>>> are very similar, but the SLIST version is slightly simpler.
>>
>> I avoided that because n psref_release operations takes expected and
>> worst-case O(n^2) time and there's no constant bound on the latency of
>> a single psref_release operation.  But maybe n is always small enough
>> that it doesn't matter -- perhaps enough that the concrete cost of
>> maintaining a doubly-linked list is higher.
>
> I also suppose that a target being released is at the head of the list
> because targets of psref are typically manipulated in last-in-first-out
> manner. But yes psref_release of SLIST version can be O(n) in worst
> cases theoretically.
>
> Well, when I think this kind of problems I tend to think of our usages,
> i.e., network processing as a router. In such usages, psref is used in
> softint and the last-in-first-out manner would keep in many cases. But
> in other usages such as uses in threads the assumption is perhaps not
> really.
>
>>
>> (My desire to avoid thinking about bounds on n is also what motivated
>> me to use a linked list instead of an array in the first place.)
>>
>> Note that your patch changes the ABI of struct psref!
>
> Let's bump the kernel version!
>
>>
>> I wonder whether the open-coded version would do better if it
>> unconditionally loaded the percpu:
>>
>> pcpu = percpu_getref(class->prc_percpu);
>> KASSERTMSG(psref->psref_prevp == NULL || *psref->psref_prevp == 
>> psref,
>> "psref %p prevp %p points at %p",
>> psref, psref->psref_prevp, *psref->psref_prevp);
>> KASSERTMSG(psref->psref_prevp != NULL || pcpu->pcpu_first == psref,
>> "psref %p marked as first but psref_cpu %p on %d first is %p",
>> psref, pcpu, cpu_index(curcpu()), pcpu->pcpu_first);
>> *(psref->psref_prevp ? psref->psref_prevp : >pcpu_first) =
>> psref->psref_next;
>> percpu_putref(class->prc_percpu);
>>
>> With DIAGNOSTIC disabled, I get a conditional move instruction instead
>> of branches this way:
>>
>>  4f9:   e8 00 00 00 00  callq  4fe 
>> 4fa: R_X86_64_PC32 
>> percpu_getref+0xfffc
>>  4fe:   48 8b 53 08 mov0x8(%rbx),%rdx
>>  502:   48 85 d2test   %rdx,%rdx
>>  505:   48 0f 44 d0 cmove  %rax,%rdx
>>  509:   48 8b 03mov(%rbx),%rax
>>  50c:   48 89 02mov%rax,(%rdx)
>>  50f:   49 8b 7c 24 20  mov0x20(%r12),%rdi
>>  514:   e8 00 00 00 00  callq  519 
>> 515: R_X86_64_PC32 
>> percpu_putref+0xfffc
>>
>> Also, my original patch was missing a percpu_putref.  I hope you put
>> it back before you ran your test!
>
> I'll test with the patch in the other mail later.

The above results were actually measured by knakahara@ and
this time I've measured by myself (the measurement hardware
is the same and the kernel configs of his and mine should
be almost the same though).

...so the results show a slightly different trend:
  - original:  137.9 138.0 (144.7) Mbps
  - open-code: 135.2 134.6 (138.5) Mbps
  - SLIST: 140.7 141.4 (140.1) Mbps

I think knakahara@ (or someone else) needs to re-evaluate :-/

  ozaki-r


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-05 Thread Ryota Ozaki
On Fri, Oct 6, 2017 at 1:21 PM, Taylor R Campbell
 wrote:
>> Date: Fri, 6 Oct 2017 13:14:14 +0900
>> From: Ryota Ozaki 
>>
>> On Fri, Oct 6, 2017 at 11:58 AM, Taylor R Campbell
>>  wrote:
>> > Note that your patch changes the ABI of struct psref!
>>
>> Let's bump the kernel version!
>
> If we want to pull the fix up to netbsd-8 (which we almost certainly
> do), then we should really avoid changing the ABI.  I forget what the
> rules are for pullups to pre-release branches, but changing the ABI is
> a big no-no post-release.

Hmm, I thought we could change ABI until the release (or tagging an RC).
I mean my past pullups to netbsd-8 sometimes included ABI changes :-/

  ozaki-r


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-05 Thread Taylor R Campbell
> Date: Fri, 6 Oct 2017 13:14:14 +0900
> From: Ryota Ozaki 
> 
> On Fri, Oct 6, 2017 at 11:58 AM, Taylor R Campbell
>  wrote:
> > Note that your patch changes the ABI of struct psref!
> 
> Let's bump the kernel version!

If we want to pull the fix up to netbsd-8 (which we almost certainly
do), then we should really avoid changing the ABI.  I forget what the
rules are for pullups to pre-release branches, but changing the ABI is
a big no-no post-release.


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-05 Thread Ryota Ozaki
On Fri, Oct 6, 2017 at 11:58 AM, Taylor R Campbell
 wrote:
>> Date: Fri, 6 Oct 2017 11:26:40 +0900
>> From: Ryota Ozaki 
>>
>> On Mon, Oct 2, 2017 at 11:13 PM, Taylor R Campbell
>>  wrote:
>> > Quick summary of the problem:
>> >
>> > Possible solutions.  I'm leaning toward (6), to open-code the linked
>> > list operations for this special purpose, with compile-tested patch
>> > attached.  This changes the text of psref.h, but shouldn't change the
>> > ABI.  Comments?
>>
>> How about using SLIST instead of open-coding? The instructions of them
>> are very similar, but the SLIST version is slightly simpler.
>
> I avoided that because n psref_release operations takes expected and
> worst-case O(n^2) time and there's no constant bound on the latency of
> a single psref_release operation.  But maybe n is always small enough
> that it doesn't matter -- perhaps enough that the concrete cost of
> maintaining a doubly-linked list is higher.

I also suppose that a target being released is at the head of the list
because targets of psref are typically manipulated in last-in-first-out
manner. But yes psref_release of SLIST version can be O(n) in worst
cases theoretically.

Well, when I think this kind of problems I tend to think of our usages,
i.e., network processing as a router. In such usages, psref is used in
softint and the last-in-first-out manner would keep in many cases. But
in other usages such as uses in threads the assumption is perhaps not
really.

>
> (My desire to avoid thinking about bounds on n is also what motivated
> me to use a linked list instead of an array in the first place.)
>
> Note that your patch changes the ABI of struct psref!

Let's bump the kernel version!

>
> I wonder whether the open-coded version would do better if it
> unconditionally loaded the percpu:
>
> pcpu = percpu_getref(class->prc_percpu);
> KASSERTMSG(psref->psref_prevp == NULL || *psref->psref_prevp == psref,
> "psref %p prevp %p points at %p",
> psref, psref->psref_prevp, *psref->psref_prevp);
> KASSERTMSG(psref->psref_prevp != NULL || pcpu->pcpu_first == psref,
> "psref %p marked as first but psref_cpu %p on %d first is %p",
> psref, pcpu, cpu_index(curcpu()), pcpu->pcpu_first);
> *(psref->psref_prevp ? psref->psref_prevp : >pcpu_first) =
> psref->psref_next;
> percpu_putref(class->prc_percpu);
>
> With DIAGNOSTIC disabled, I get a conditional move instruction instead
> of branches this way:
>
>  4f9:   e8 00 00 00 00  callq  4fe 
> 4fa: R_X86_64_PC32 
> percpu_getref+0xfffc
>  4fe:   48 8b 53 08 mov0x8(%rbx),%rdx
>  502:   48 85 d2test   %rdx,%rdx
>  505:   48 0f 44 d0 cmove  %rax,%rdx
>  509:   48 8b 03mov(%rbx),%rax
>  50c:   48 89 02mov%rax,(%rdx)
>  50f:   49 8b 7c 24 20  mov0x20(%r12),%rdi
>  514:   e8 00 00 00 00  callq  519 
> 515: R_X86_64_PC32 
> percpu_putref+0xfffc
>
> Also, my original patch was missing a percpu_putref.  I hope you put
> it back before you ran your test!

I'll test with the patch in the other mail later.

Thanks,
  ozaki-r


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-05 Thread Taylor R Campbell
> Date: Fri, 6 Oct 2017 02:58:20 +
> From: Taylor R Campbell 
> 
> I wonder whether the open-coded version would do better if it
> unconditionally loaded the percpu:

For concreteness, in case you'd like to test this, I've attached an
updated patch that yields the conditional move sequence I quoted.
Index: sys/sys/psref.h
===
RCS file: /cvsroot/src/sys/sys/psref.h,v
retrieving revision 1.2
diff -p -u -r1.2 psref.h
--- sys/sys/psref.h 16 Dec 2016 20:12:11 -  1.2
+++ sys/sys/psref.h 6 Oct 2017 03:08:58 -
@@ -69,7 +69,8 @@ struct psref_target {
  * written only on the local CPU.
  */
 struct psref {
-   LIST_ENTRY(psref)   psref_entry;
+   struct psref*psref_next;
+   struct psref**psref_prevp;
const struct psref_target   *psref_target;
struct lwp  *psref_lwp;
struct cpu_info *psref_cpu;
Index: sys/kern/subr_psref.c
===
RCS file: /cvsroot/src/sys/kern/subr_psref.c,v
retrieving revision 1.7
diff -p -u -r1.7 subr_psref.c
--- sys/kern/subr_psref.c   1 Jun 2017 02:45:13 -   1.7
+++ sys/kern/subr_psref.c   6 Oct 2017 03:08:58 -
@@ -78,8 +78,6 @@ __KERNEL_RCSID(0, "$NetBSD: subr_psref.c
 #include 
 #include 
 
-LIST_HEAD(psref_head, psref);
-
 static bool_psref_held(const struct psref_target *, struct psref_class *,
bool);
 
@@ -103,7 +101,7 @@ struct psref_class {
  * Not exposed by the API.
  */
 struct psref_cpu {
-   struct psref_head   pcpu_head;
+   struct psref*pcpu_first;
 };
 
 /*
@@ -135,7 +133,7 @@ psref_cpu_drained_p(void *p, void *cooki
const struct psref_cpu *pcpu = p;
bool *retp = cookie;
 
-   if (!LIST_EMPTY(>pcpu_head))
+   if (pcpu->pcpu_first != NULL)
*retp = false;
 }
 
@@ -194,7 +192,9 @@ psref_check_duplication(struct psref_cpu
bool found = false;
struct psref *_psref;
 
-   LIST_FOREACH(_psref, >pcpu_head, psref_entry) {
+   for (_psref = pcpu->pcpu_first;
+_psref != NULL;
+_psref = _psref->psref_next) {
if (_psref == psref &&
_psref->psref_target == target) {
found = true;
@@ -250,7 +250,11 @@ psref_acquire(struct psref *psref, const
 #endif
 
/* Record our reference.  */
-   LIST_INSERT_HEAD(>pcpu_head, psref, psref_entry);
+   if ((psref->psref_next = pcpu->pcpu_first) != NULL)
+   psref->psref_next->psref_prevp = >psref_next;
+   psref->psref_prevp = NULL;  /* head of list */
+   pcpu->pcpu_first = psref;
+
psref->psref_target = target;
psref->psref_lwp = curlwp;
psref->psref_cpu = curcpu();
@@ -273,6 +277,7 @@ void
 psref_release(struct psref *psref, const struct psref_target *target,
 struct psref_class *class)
 {
+   struct psref_cpu *pcpu;
int s;
 
KASSERTMSG((kpreempt_disabled() || cpu_softintr_p() ||
@@ -297,12 +302,31 @@ psref_release(struct psref *psref, const
 
/*
 * Block interrupts and remove the psref from the current CPU's
-* list.  No need to percpu_getref or get the head of the list,
-* and the caller guarantees that we are bound to a CPU anyway
-* (as does blocking interrupts).
+* list.
 */
s = splraiseipl(class->prc_iplcookie);
-   LIST_REMOVE(psref, psref_entry);
+   if (psref->psref_next != NULL)
+   psref->psref_next->psref_prevp = psref->psref_prevp;
+
+   /*
+* If it's at the head of the list, we must modify the head
+* explicitly.  We can't use a back-pointer into the head
+* because percpu_alloc may move the per-CPU objects around in
+* memory.
+*/
+   pcpu = percpu_getref(class->prc_percpu);
+   KASSERTMSG(psref->psref_prevp == NULL || *psref->psref_prevp == psref,
+   "psref %p prevp %p points at %p",
+   psref, psref->psref_prevp, *psref->psref_prevp);
+   KASSERTMSG(psref->psref_prevp != NULL || pcpu->pcpu_first == psref,
+   "psref %p marked as first but psref_cpu %p on %d first is %p",
+   psref, pcpu, cpu_index(curcpu()), pcpu->pcpu_first);
+   *(psref->psref_prevp ? psref->psref_prevp : >pcpu_first) =
+   psref->psref_next;
+   percpu_putref(class->prc_percpu);
+
+   psref->psref_prevp = (void *)1; /* poison */
+   psref->psref_next = (void *)2;  /* poison */
splx(s);
 
/* If someone is waiting for users to drain, notify 'em.  */
@@ -353,7 +377,11 @@ psref_copy(struct psref *pto, const stru
pcpu = percpu_getref(class->prc_percpu);
 
/* Record the new reference.  */
-   LIST_INSERT_HEAD(>pcpu_head, pto, 

Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-05 Thread Taylor R Campbell
> Date: Fri, 6 Oct 2017 11:26:40 +0900
> From: Ryota Ozaki 
> 
> On Mon, Oct 2, 2017 at 11:13 PM, Taylor R Campbell
>  wrote:
> > Quick summary of the problem:
> >
> > Possible solutions.  I'm leaning toward (6), to open-code the linked
> > list operations for this special purpose, with compile-tested patch
> > attached.  This changes the text of psref.h, but shouldn't change the
> > ABI.  Comments?
> 
> How about using SLIST instead of open-coding? The instructions of them
> are very similar, but the SLIST version is slightly simpler.

I avoided that because n psref_release operations takes expected and
worst-case O(n^2) time and there's no constant bound on the latency of
a single psref_release operation.  But maybe n is always small enough
that it doesn't matter -- perhaps enough that the concrete cost of
maintaining a doubly-linked list is higher.

(My desire to avoid thinking about bounds on n is also what motivated
me to use a linked list instead of an array in the first place.)

Note that your patch changes the ABI of struct psref!

I wonder whether the open-coded version would do better if it
unconditionally loaded the percpu:

pcpu = percpu_getref(class->prc_percpu);
KASSERTMSG(psref->psref_prevp == NULL || *psref->psref_prevp == psref,
"psref %p prevp %p points at %p",
psref, psref->psref_prevp, *psref->psref_prevp);
KASSERTMSG(psref->psref_prevp != NULL || pcpu->pcpu_first == psref,
"psref %p marked as first but psref_cpu %p on %d first is %p",
psref, pcpu, cpu_index(curcpu()), pcpu->pcpu_first);
*(psref->psref_prevp ? psref->psref_prevp : >pcpu_first) =
psref->psref_next;
percpu_putref(class->prc_percpu);

With DIAGNOSTIC disabled, I get a conditional move instruction instead
of branches this way:

 4f9:   e8 00 00 00 00  callq  4fe 
4fa: R_X86_64_PC32 percpu_getref+0xfffc
 4fe:   48 8b 53 08 mov0x8(%rbx),%rdx
 502:   48 85 d2test   %rdx,%rdx
 505:   48 0f 44 d0 cmove  %rax,%rdx
 509:   48 8b 03mov(%rbx),%rax
 50c:   48 89 02mov%rax,(%rdx)
 50f:   49 8b 7c 24 20  mov0x20(%r12),%rdi
 514:   e8 00 00 00 00  callq  519 
515: R_X86_64_PC32 percpu_putref+0xfffc

Also, my original patch was missing a percpu_putref.  I hope you put
it back before you ran your test!


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-05 Thread Ryota Ozaki
On Mon, Oct 2, 2017 at 11:13 PM, Taylor R Campbell
 wrote:
> Quick summary of the problem:
>
>percpu_alloc(9) may move existing per-CPU objects to different
>locations in memory.  This means you can't reliably store pointers
>to the per-CPU objects anywhere.
>
>psref(9) currently uses queue.h LIST, with a per-CPU list head --
>but LIST stores back-pointers into the list head, which breaks when
>percpu_alloc(9) moves the list head.
>
> Possible solutions.  I'm leaning toward (6), to open-code the linked
> list operations for this special purpose, with compile-tested patch
> attached.  This changes the text of psref.h, but shouldn't change the
> ABI.  Comments?

How about using SLIST instead of open-coding? The instructions of them
are very similar, but the SLIST version is slightly simpler.

This is a patch: http://www.netbsd.org/~ozaki-r/psref-SLIST.diff

One worry of the implementation is that percpu_{getref,putref} need to
be always called in psref_release while the open-coded one doesn't need
it. However the result of performance evaluations shows that the overhead
is not so big and moreover SLIST version overtakes open-coded one slightly.

These are the results of IP forwarding performance evaluations
with 64 byte frames:
  - original:  144.7 Mbps
  - open-code: 138.5 Mbps
  - SLIST: 140.1 Mbps

  ozaki-r

>
>
> 1. Make psref(9) store a pointer to a separately kmem_alloc-allocated
>list head.  (This is what Nakahara-san considered but rejected.)
>
> struct psref_cpu {
> -   struct psref_head   pcpu_head;
> +   struct psref_head   *pcpu_head;
> };
>
>   Problem: We need to initialize these before psref_acquire/release.
>We could initialize it in psref_class_create, but for various
>applications of psref, we may not know how many CPUs there are at
>that time.
>
>To fix this properly we need to teach percpu(9) how to initialize
>and finalize per-CPU objects as CPUs come online, which as a bonus
>would make it work with dynamic CPU attachment and detachment.  And
>that's a lot of work.
>
> 2. Make percpu(9) always store pointers into separately kmem_alloc-
>allocated memory, like userland TLS and pthread keys do.
>
>   Problem: It's a bit of work to basically rewrite subr_percpu.c, and
>may have performance implications.  I started last night, spent an
>hour rewriting it, and didn't finish before bedtime came.
>
> 3. Invent a new variant of LIST called PERCPU_LIST that doesn't
>require back-pointers into the head.  (This is what Nakahara-san
>suggested before I vanished into a EuroBSDcon vortex.)
>
>   Problem: This is almost the same as LIST -- supports essentially all
>the same operations with the same performance characteristics --
>and has very limited utility.  So unless we find other applications
>that need exactly this too, I'm reluctant to expose it as a
>general-purpose kernel API right now.
>
>It can't be a drop-in replacement for LIST, because we have
>LIST_REMOVE(obj, field) but we need MHLIST_REMOVE(head, obj, field)
>in order to allow the head to move.
>
>And we can't just add LIST_REMOVE_MAYBEHEAD(head, obj, field) or
>something to LIST, because there's no way to distinguish with
>standard LIST whether an object is at the head of the list or not,
>without knowing where the head *was* in memory when the object (or
>one of its predecessors) was inserted.
>
> 4. Teach the percpu(9) API to move objects with an operation other
>than memcpy.  This would look like
>
> percpu_t *percpu_alloc(size_t);
> +   percpu_t *percpu_alloc_movable(size_t,
> +   void *(*)(void *, const void *, size_t));
>
>with percpu_alloc(sz) := percpu_alloc_movable(sz, ).  Then
>for psref(9), we could use LIST_MOVE to migrate the old list head
>to the new list head and fix the one back-pointer that would have
>been invalidated.
>
>   Problem: This is again an operation of limited utility, since most
>per-CPU objects (like counters) don't need special operations to
>move in memory.  And it would compete with extending the API for
>dynamic CPU initialization/finalization operations, so I'm not sure
>I want to preemptively step on the toes of that API design space at
>the moment.
>
> 5. Violate the abstraction of LIST in psref(9), as I proposed as a
>stop-gap measure in
>.
>
>   Problem: It's gross, makes maintenance of the LIST implementation
>more difficult, makes understanding the psref(9) code more
>difficult, breaks QUEUEDEBUG, 
>
> 6. Just open-code it in subr_psref.c.  If we ever find more
>applications of this idiom, we can always upgrade it without
>trouble to an explicit PERCPU_LIST abstraction.
>
>   Problem: We don't get to share any common LIST code.  This 

Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-10-02 Thread Taylor R Campbell
Quick summary of the problem:

   percpu_alloc(9) may move existing per-CPU objects to different
   locations in memory.  This means you can't reliably store pointers
   to the per-CPU objects anywhere.

   psref(9) currently uses queue.h LIST, with a per-CPU list head --
   but LIST stores back-pointers into the list head, which breaks when
   percpu_alloc(9) moves the list head.

Possible solutions.  I'm leaning toward (6), to open-code the linked
list operations for this special purpose, with compile-tested patch
attached.  This changes the text of psref.h, but shouldn't change the
ABI.  Comments?


1. Make psref(9) store a pointer to a separately kmem_alloc-allocated
   list head.  (This is what Nakahara-san considered but rejected.)

struct psref_cpu {
-   struct psref_head   pcpu_head;
+   struct psref_head   *pcpu_head;
};

  Problem: We need to initialize these before psref_acquire/release.
   We could initialize it in psref_class_create, but for various
   applications of psref, we may not know how many CPUs there are at
   that time.

   To fix this properly we need to teach percpu(9) how to initialize
   and finalize per-CPU objects as CPUs come online, which as a bonus
   would make it work with dynamic CPU attachment and detachment.  And
   that's a lot of work.

2. Make percpu(9) always store pointers into separately kmem_alloc-
   allocated memory, like userland TLS and pthread keys do.

  Problem: It's a bit of work to basically rewrite subr_percpu.c, and
   may have performance implications.  I started last night, spent an
   hour rewriting it, and didn't finish before bedtime came.

3. Invent a new variant of LIST called PERCPU_LIST that doesn't
   require back-pointers into the head.  (This is what Nakahara-san
   suggested before I vanished into a EuroBSDcon vortex.)

  Problem: This is almost the same as LIST -- supports essentially all
   the same operations with the same performance characteristics --
   and has very limited utility.  So unless we find other applications
   that need exactly this too, I'm reluctant to expose it as a
   general-purpose kernel API right now.

   It can't be a drop-in replacement for LIST, because we have
   LIST_REMOVE(obj, field) but we need MHLIST_REMOVE(head, obj, field)
   in order to allow the head to move.

   And we can't just add LIST_REMOVE_MAYBEHEAD(head, obj, field) or
   something to LIST, because there's no way to distinguish with
   standard LIST whether an object is at the head of the list or not,
   without knowing where the head *was* in memory when the object (or
   one of its predecessors) was inserted.

4. Teach the percpu(9) API to move objects with an operation other
   than memcpy.  This would look like

percpu_t *percpu_alloc(size_t);
+   percpu_t *percpu_alloc_movable(size_t,
+   void *(*)(void *, const void *, size_t));

   with percpu_alloc(sz) := percpu_alloc_movable(sz, ).  Then
   for psref(9), we could use LIST_MOVE to migrate the old list head
   to the new list head and fix the one back-pointer that would have
   been invalidated.

  Problem: This is again an operation of limited utility, since most
   per-CPU objects (like counters) don't need special operations to
   move in memory.  And it would compete with extending the API for
   dynamic CPU initialization/finalization operations, so I'm not sure
   I want to preemptively step on the toes of that API design space at
   the moment.

5. Violate the abstraction of LIST in psref(9), as I proposed as a
   stop-gap measure in
   .

  Problem: It's gross, makes maintenance of the LIST implementation
   more difficult, makes understanding the psref(9) code more
   difficult, breaks QUEUEDEBUG, 

6. Just open-code it in subr_psref.c.  If we ever find more
   applications of this idiom, we can always upgrade it without
   trouble to an explicit PERCPU_LIST abstraction.

  Problem: We don't get to share any common LIST code.  This seems
   like a small price for a single-use gizmo like this.
Index: sys/sys/psref.h
===
RCS file: /cvsroot/src/sys/sys/psref.h,v
retrieving revision 1.2
diff -p -u -r1.2 psref.h
--- sys/sys/psref.h 16 Dec 2016 20:12:11 -  1.2
+++ sys/sys/psref.h 2 Oct 2017 13:47:30 -
@@ -69,7 +69,8 @@ struct psref_target {
  * written only on the local CPU.
  */
 struct psref {
-   LIST_ENTRY(psref)   psref_entry;
+   struct psref*psref_next;
+   struct psref**psref_prevp;
const struct psref_target   *psref_target;
struct lwp  *psref_lwp;
struct cpu_info *psref_cpu;
Index: sys/kern/subr_psref.c
===
RCS file: /cvsroot/src/sys/kern/subr_psref.c,v
retrieving 

Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-09-19 Thread Ryota Ozaki
On Tue, Sep 19, 2017 at 1:38 PM, Kengo NAKAHARA  wrote:
> Hi,
>
> To fix PR kern/52515(*), in particular psref(9), I implement PERCPU_LIST which
> is dedicated for percpu_alloc'ed percpu struct. Here is the patch which 
> consists
> of PERCULI_LIST implementation and applying to subr_psref.c.
> http://www.NetBSD.org/~knakahara/percpu-list/percpu-list.patch
>
> (*) http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=52515
>
> The details are following.
>
> As written in PR/52515, the root of the problem is back-pointer to
> percpu_alloc'ed struct. At first, I tried generic solution to let
> percpu_alloc'ed struct use a pointer instead of itself. Here is a part of the
> PoC patch.
> == bad patch ==
> @@ -103,9 +103,18 @@ struct psref_class {
>   * Not exposed by the API.
>   */
>  struct psref_cpu {
> -   struct psref_head   pcpu_head;
> +   struct psref_head   *pcpu_head;
>  };
>
> +static void
> +psref_class_pcpu_init_p(void *p, void *cookie, struct cpu_info *ci)
> +{
> +   struct psref_cpu *pcpu = p;
> +
> +   pcpu->pcpu_head = kmem_alloc(sizeof(*(pcpu->pcpu_head)), KM_SLEEP);
> +   LIST_INIT(pcpu->pcpu_head);
> +}
> +
>  /*
>   * psref_class_create(name, ipl)
>   *
> @@ -121,6 +130,7 @@ psref_class_create(const char *name, int ipl)
>
> class = kmem_alloc(sizeof(*class), KM_SLEEP);
> class->prc_percpu = percpu_alloc(sizeof(struct psref_cpu));
> +   percpu_foreach(class->prc_percpu, _class_pcpu_init_p, NULL);
> mutex_init(>prc_lock, MUTEX_DEFAULT, ipl);
> cv_init(>prc_cv, name);
> class->prc_iplcookie = makeiplcookie(ipl);
> == bad patch ==
>
> Howerver, it causes initialization dependency problems to apply this
> modification to psref(9). That result from that percpu_foreach() must be 
> called
> after ncpu is fixed, that is,
> (A) psref_class_create() for "ifnet_psref_class" must be called after ncpu
> is fixed
> - as psref_class_create() use percpu_foreach() in this 
> modification
> (B) "ifnet_psref_class" must be initialized before any (real and pseudo)
> network interfaces are attached
>
> To satisfy both (A) and (B), it is required the MI hook called immediately
> after ncpu is fixed, yes, that hook is not implemented yet. Furthermore,
> the modification is too large even if that hook is already implemented.
> So, I gave up to use this generic solution.
>
> After that, I decide to use list specific solution to use PERCPU_LIST
> implemented for percpu_alloc'ed struct. The key design is an omission of the
> back-pointer to list head in the first list element, that is, when it wants
> to access list head, it always call percpu_getref() and then use the returned
> pointer.
> # There are some PR kern/52515 problems which cannot be fix by percpu list,
> # such as ipforward_rt_percpu. They can be fixed by using a pointer instead
> # of itself because they don't have complicated initialization dependency.

For rtcaches, I recommend that we stop manipulating rtcaches by global lists
(dom->dom_rtcache) for cache invalidation and instead use a global generation
counter for cache invalidation (inspired by IPsec SP cache). By doing so we
can remove LIST_ENTRY from struct route and avoid the issue.

This is a patch:
  http://www.netbsd.org/~ozaki-r/rtcache-gen.diff

Thanks,
  ozaki-r


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-09-19 Thread Kengo NAKAHARA
Hi,

Thank you for your quick response.

On 2017/09/19 14:36, Taylor R Campbell wrote:
>> Date: Tue, 19 Sep 2017 13:38:15 +0900
>> From: Kengo NAKAHARA 
>>
>> To fix PR kern/52515(*), in particular psref(9), I implement
>> PERCPU_LIST which is dedicated for percpu_alloc'ed percpu struct.
>> Here is the patch which consists of PERCULI_LIST implementation and
>> applying to subr_psref.c.
>> http://www.NetBSD.org/~knakahara/percpu-list/percpu-list.patch
>>
>> (*) http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=52515
> 
> Derp.  That's an embarrassing bug!  Especially since I knew that about
> percpu(9) and didn't connect the dots...
> 
> First, what you propose is a reasonable strategy for working around
> the problem, but I'm reluctant to commit to a new style of list
> abstraction for just this purpose.  Can you confine it to subr_psref.c
> for now, and retain only the operations that are absolutely necessary?
> E.g., no need for insert_before/after -- just insert_head, remove,
> foreach, empty.

Hmm, you are right. I thought percpu list can be use by other new
components in the future, however only psref(9) must have the
initialization dependency problem. Therefore, other components should
use pointer instead of itself in percpu(9) data like percpu_urandom_cprng
in sys/dev/rndpseudo.c.

> I've attached a small patch that might serve as a quicker stop-gap
> (untested).  It's gross, but it doesn't change the ABI, add any header
> files, 

I tested your patch(with tiny typo fix s/perpcu_getref/percpu_getref/),
it works fine in my workload, thanks!

> Second, we should maybe have a more robust way to handle per-CPU data
> that require nontrivial initialization or finalization so that it's
> not necessary to do this in the first place.  Maybe something like
> 
> struct percpu *percpu_create(size_t nbytes,
> void (*init)(void *, struct cpu_info *),
> void (*fini)(void *, struct cpu_info *));
> 
> with the callbacks invoked by percpu_cpu_enlarge.  Needs details
> filled out like can it fail to initialize or finalize and thereby
> block CPU online/offline?

I agree. I think it must be hard work...


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
IoT Platform Development Department,
Network Division,
Technology Unit

Kengo NAKAHARA 


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-09-19 Thread Taylor R Campbell
> Date: Tue, 19 Sep 2017 05:44:27 +
> From: Taylor R Campbell 
> 
> > Date: Tue, 19 Sep 2017 05:36:40 +
> > From: Taylor R Campbell 
> > 
> > I've attached a small patch that might serve as a quicker stop-gap
> > (untested).  It's gross, but it doesn't change the ABI, add any header
> > files, 
> 
> I made sure to forget to attach it so I could keep my klutz
> credentials, but here's the standard followup with the actual
> attachment.

To increase my chances of passing the Klutz^2 certification test, I
mistyped the MIME type of the attachment as `application/pdf'.  Here
it is as plain text.  (Unless I'm going for a summa klutz laude
degree.)
diff --git a/sys/kern/subr_psref.c b/sys/kern/subr_psref.c
index c3f76ab0e743..bd393def5a4c 100644
--- a/sys/kern/subr_psref.c
+++ b/sys/kern/subr_psref.c
@@ -251,6 +251,7 @@ psref_acquire(struct psref *psref, const struct 
psref_target *target,
 
/* Record our reference.  */
LIST_INSERT_HEAD(>pcpu_head, psref, psref_entry);
+   psref->psref_entry.le_prev = NULL; /* XXX see psref_release */
psref->psref_target = target;
psref->psref_lwp = curlwp;
psref->psref_cpu = curcpu();
@@ -297,12 +298,33 @@ psref_release(struct psref *psref, const struct 
psref_target *target,
 
/*
 * Block interrupts and remove the psref from the current CPU's
-* list.  No need to percpu_getref or get the head of the list,
-* and the caller guarantees that we are bound to a CPU anyway
-* (as does blocking interrupts).
+* list.  No need to percpu_getref or get the head of the list
+* unless we turn out to be removing the head of the list, and
+* the caller guarantees that we are bound to a CPU anyway (as
+* does blocking interrupts).
 */
s = splraiseipl(class->prc_iplcookie);
-   LIST_REMOVE(psref, psref_entry);
+   /* XXX begin abstraction violation */
+   /*
+* XXX We cannot reliably use >pcpu_head.lh_first,
+* because percpu(9) may move it around when allocating other
+* per-CPU storage.  But LIST_REMOVE relies on *le_prev
+* working.  So work around it by getting the prev-pointer from
+* the percpu if we're at the head of the list.
+*/
+   if (psref->psref_entry.le_prev != NULL) {
+   LIST_REMOVE(psref, psref_entry);
+   } else {
+   struct psref_cpu *pcpu;
+
+   if (LIST_NEXT(psref, psref_entry) != NULL)
+   LIST_NEXT(psref, psref_entry)->psref_entry.le_prev =
+   psref->psref_entry.le_prev;
+   pcpu = perpcu_getref(class->prc_percpu);
+   pcpu->pcpu_head.lh_first = LIST_NEXT(psref, psref_entry);
+   percpu_putref(class->prc_percpu);
+   }
+   /* XXX end abstraction violation */
splx(s);
 
/* If someone is waiting for users to drain, notify 'em.  */
@@ -354,6 +376,7 @@ psref_copy(struct psref *pto, const struct psref *pfrom,
 
/* Record the new reference.  */
LIST_INSERT_HEAD(>pcpu_head, pto, psref_entry);
+   pto->psref_entry.le_prev = NULL; /* XXX see psref_release */
pto->psref_target = pfrom->psref_target;
pto->psref_lwp = curlwp;
pto->psref_cpu = curcpu();


Re: RFC: PERCPU_LIST to fix PR kern/52515

2017-09-18 Thread Taylor R Campbell
> Date: Tue, 19 Sep 2017 05:36:40 +
> From: Taylor R Campbell 
> 
> I've attached a small patch that might serve as a quicker stop-gap
> (untested).  It's gross, but it doesn't change the ABI, add any header
> files, 

I made sure to forget to attach it so I could keep my klutz
credentials, but here's the standard followup with the actual
attachment.


psref.patch
Description: Adobe PDF document


RFC: PERCPU_LIST to fix PR kern/52515

2017-09-18 Thread Kengo NAKAHARA
Hi,

To fix PR kern/52515(*), in particular psref(9), I implement PERCPU_LIST which
is dedicated for percpu_alloc'ed percpu struct. Here is the patch which consists
of PERCULI_LIST implementation and applying to subr_psref.c.
http://www.NetBSD.org/~knakahara/percpu-list/percpu-list.patch

(*) http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=52515

The details are following.

As written in PR/52515, the root of the problem is back-pointer to
percpu_alloc'ed struct. At first, I tried generic solution to let
percpu_alloc'ed struct use a pointer instead of itself. Here is a part of the
PoC patch.
== bad patch ==
@@ -103,9 +103,18 @@ struct psref_class {
  * Not exposed by the API.
  */
 struct psref_cpu {
-   struct psref_head   pcpu_head;
+   struct psref_head   *pcpu_head;
 };
 
+static void
+psref_class_pcpu_init_p(void *p, void *cookie, struct cpu_info *ci)
+{
+   struct psref_cpu *pcpu = p;
+
+   pcpu->pcpu_head = kmem_alloc(sizeof(*(pcpu->pcpu_head)), KM_SLEEP);
+   LIST_INIT(pcpu->pcpu_head);
+}
+
 /*
  * psref_class_create(name, ipl)
  *
@@ -121,6 +130,7 @@ psref_class_create(const char *name, int ipl)
 
class = kmem_alloc(sizeof(*class), KM_SLEEP);
class->prc_percpu = percpu_alloc(sizeof(struct psref_cpu));
+   percpu_foreach(class->prc_percpu, _class_pcpu_init_p, NULL);
mutex_init(>prc_lock, MUTEX_DEFAULT, ipl);
cv_init(>prc_cv, name);
class->prc_iplcookie = makeiplcookie(ipl);
== bad patch ==

Howerver, it causes initialization dependency problems to apply this
modification to psref(9). That result from that percpu_foreach() must be called
after ncpu is fixed, that is,
(A) psref_class_create() for "ifnet_psref_class" must be called after ncpu
is fixed
- as psref_class_create() use percpu_foreach() in this modification
(B) "ifnet_psref_class" must be initialized before any (real and pseudo)
network interfaces are attached

To satisfy both (A) and (B), it is required the MI hook called immediately
after ncpu is fixed, yes, that hook is not implemented yet. Furthermore,
the modification is too large even if that hook is already implemented.
So, I gave up to use this generic solution.

After that, I decide to use list specific solution to use PERCPU_LIST
implemented for percpu_alloc'ed struct. The key design is an omission of the
back-pointer to list head in the first list element, that is, when it wants
to access list head, it always call percpu_getref() and then use the returned
pointer.
# There are some PR kern/52515 problems which cannot be fix by percpu list,
# such as ipforward_rt_percpu. They can be fixed by using a pointer instead
# of itself because they don't have complicated initialization dependency.


Could you comment the above patch or idea?


Thanks,

-- 
//
Internet Initiative Japan Inc.

Device Engineering Section,
IoT Platform Development Department,
Network Division,
Technology Unit

Kengo NAKAHARA