Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-06 Thread Kurt Lidl

On 2/5/17 1:59 PM, Jason Harmening wrote:

Actually attaching the patch this time ( gmail client)

On Sun, Feb 5, 2017 at 10:58 AM, Jason Harmening
> wrote:

Hmm, it's a good idea to consider the possibility of a barrier
issue.  It wouldn't be the first time we've had such a problem on a
weakly-ordered architecture. That said, I don't see a problem in
this case.  smp_rendezvous_cpus() takes a spinlock and then issues
atomic_store_rel_int()  to ensure the rendezvous params are visible
to other cpus.  The latter corresponds to lwsync on powerpc, which
AFAIK should be sufficient to ensure visibility of prior stores.

For now I'm going with the simpler explanation that I made a bad
assumption  in the powerpc get_pcpu() and there is some context in
which the read of sprg0 doesn't return a consistent pointer value.
Unfortunately I don't see where that might be right now.

On the mips side, Kurt/Alexander can you test the attached patch?
It contains a simple fix to ensure get_pcpu() returns the consistent
per-cpu pointer.


I applied this patch on top of r313347 (which I had verified that a
kernel built from that revisions to boot from successfully).
The kernel from r313347+(this patch) least gets to multi-user on my ERL.

So, that's a big improvement.

I'll start a native buildworld/buildkernel on the ERL, and that ought
to give it a reasonable workout.

-Kurt



On Sat, Feb 4, 2017 at 1:34 PM, Svatopluk Kraus > wrote:

Probably not related. But when I took short look to the patch to see
what could go wrong, I walked into the following comment in
_rm_wlock(): "Assumes rm->rm_writecpus update is visible on
other CPUs
before rm_cleanIPI is called." There is no explicit barrier to
ensure
it. However, there might be some barriers inside of
smp_rendezvous_cpus(). I have no idea what could happened if this
assumption is not met. Note that rm_cleanIPI() is affected by the
patch.



On Sat, Feb 4, 2017 at 9:39 PM, Jason Harmening
>
wrote:
> Can you post an example of such panic?  Only 2 MI pieces were
changed,
> netisr and rmlock.  I haven't seen problems on my own
amd64/i386/arm testing
> of this, so a backtrace might help to narrow down the cause.
>
> On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler
>
> wrote:
>>
>> On 04.02.17 20:54, Jason Harmening wrote:
>>>
>>> I suspect this broke rmlocks for mips because the rmlock
implementation
>>> takes the address of the per-CPU pc_rm_queue when building
tracker
>>> lists.  That address may be later accessed from another CPU
and will
>>> then translate to the wrong physical region if the address
was taken
>>> relative to the globally-constant pcpup VA used on mips.
>>>
>>> Regardless, for mips get_pcpup() should be implemented as
>>> pcpu_find(curcpu) since returning an address that may mean
something
>>> different depending on the CPU seems like a big POLA
violation if
>>> nothing else.
>>>
>>> I'm more concerned about the report of powerpc breakage.
For powerpc we
>>> simply take each pcpu pointer from the pc_allcpu list (which
is the same
>>> value stored in the cpuid_to_pcpu array) and pass it through
the ap_pcpu
>>> global to each AP's startup code, which then stores it in
sprg0.  It
>>> should be globally unique and won't have the
variable-translation issues
>>> seen on mips.   Andreas, are you certain this change was
responsible the
>>> breakage you saw, and was it the same sort of hang observed
on mips?
>>
>>
>> I'm really sure. 313036 booted fine, allowed me to execute heavy
>> compilation jobs, np. 313037 on the other side gave me
various patterns of
>> panics. During startup, but I also succeeded to get into
multiuser and then
>> the panic happend during port building.
>>
>> I have no deeper inside where pcpu data is used. Justin
mentioned netisr?
>>
>> Andreas
>>
>





___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-05 Thread Andreas Tobler

On 05.02.17 19:59, Jason Harmening wrote:

Actually attaching the patch this time ( gmail client)

On Sun, Feb 5, 2017 at 10:58 AM, Jason Harmening
> wrote:

Hmm, it's a good idea to consider the possibility of a barrier
issue.  It wouldn't be the first time we've had such a problem on a
weakly-ordered architecture. That said, I don't see a problem in
this case.  smp_rendezvous_cpus() takes a spinlock and then issues
atomic_store_rel_int()  to ensure the rendezvous params are visible
to other cpus.  The latter corresponds to lwsync on powerpc, which
AFAIK should be sufficient to ensure visibility of prior stores.

For now I'm going with the simpler explanation that I made a bad
assumption  in the powerpc get_pcpu() and there is some context in
which the read of sprg0 doesn't return a consistent pointer value.
Unfortunately I don't see where that might be right now.

On the mips side, Kurt/Alexander can you test the attached patch?
It contains a simple fix to ensure get_pcpu() returns the consistent
per-cpu pointer.



Here the panic I received with the latest patch you sent. World & kernel 
are on 313286 + patch.


https://people.freebsd.org/~andreast/pcpu/

It is the same panic, pic 2 is with a try to get a core 

The load issue was a gmake -j8 bootstrap of todays gcc trunk sources. 
The machine has 2 physical CPU's, two threads pre cpu :)


I revert now and see if the situation becomes stable again or if there 
is something else.


Andreas

___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-05 Thread Jason Harmening
Actually attaching the patch this time ( gmail client)

On Sun, Feb 5, 2017 at 10:58 AM, Jason Harmening 
wrote:

> Hmm, it's a good idea to consider the possibility of a barrier issue.  It
> wouldn't be the first time we've had such a problem on a weakly-ordered
> architecture. That said, I don't see a problem in this case.
>  smp_rendezvous_cpus() takes a spinlock and then issues
> atomic_store_rel_int()  to ensure the rendezvous params are visible to
> other cpus.  The latter corresponds to lwsync on powerpc, which AFAIK
> should be sufficient to ensure visibility of prior stores.
>
> For now I'm going with the simpler explanation that I made a bad
> assumption  in the powerpc get_pcpu() and there is some context in which
> the read of sprg0 doesn't return a consistent pointer value.  Unfortunately
> I don't see where that might be right now.
>
> On the mips side, Kurt/Alexander can you test the attached patch?  It
> contains a simple fix to ensure get_pcpu() returns the consistent per-cpu
> pointer.
>
> On Sat, Feb 4, 2017 at 1:34 PM, Svatopluk Kraus  wrote:
>
>> Probably not related. But when I took short look to the patch to see
>> what could go wrong, I walked into the following comment in
>> _rm_wlock(): "Assumes rm->rm_writecpus update is visible on other CPUs
>> before rm_cleanIPI is called." There is no explicit barrier to ensure
>> it. However, there might be some barriers inside of
>> smp_rendezvous_cpus(). I have no idea what could happened if this
>> assumption is not met. Note that rm_cleanIPI() is affected by the
>> patch.
>>
>>
>>
>> On Sat, Feb 4, 2017 at 9:39 PM, Jason Harmening
>>  wrote:
>> > Can you post an example of such panic?  Only 2 MI pieces were changed,
>> > netisr and rmlock.  I haven't seen problems on my own amd64/i386/arm
>> testing
>> > of this, so a backtrace might help to narrow down the cause.
>> >
>> > On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler 
>> > wrote:
>> >>
>> >> On 04.02.17 20:54, Jason Harmening wrote:
>> >>>
>> >>> I suspect this broke rmlocks for mips because the rmlock
>> implementation
>> >>> takes the address of the per-CPU pc_rm_queue when building tracker
>> >>> lists.  That address may be later accessed from another CPU and will
>> >>> then translate to the wrong physical region if the address was taken
>> >>> relative to the globally-constant pcpup VA used on mips.
>> >>>
>> >>> Regardless, for mips get_pcpup() should be implemented as
>> >>> pcpu_find(curcpu) since returning an address that may mean something
>> >>> different depending on the CPU seems like a big POLA violation if
>> >>> nothing else.
>> >>>
>> >>> I'm more concerned about the report of powerpc breakage.  For powerpc
>> we
>> >>> simply take each pcpu pointer from the pc_allcpu list (which is the
>> same
>> >>> value stored in the cpuid_to_pcpu array) and pass it through the
>> ap_pcpu
>> >>> global to each AP's startup code, which then stores it in sprg0.  It
>> >>> should be globally unique and won't have the variable-translation
>> issues
>> >>> seen on mips.   Andreas, are you certain this change was responsible
>> the
>> >>> breakage you saw, and was it the same sort of hang observed on mips?
>> >>
>> >>
>> >> I'm really sure. 313036 booted fine, allowed me to execute heavy
>> >> compilation jobs, np. 313037 on the other side gave me various
>> patterns of
>> >> panics. During startup, but I also succeeded to get into multiuser and
>> then
>> >> the panic happend during port building.
>> >>
>> >> I have no deeper inside where pcpu data is used. Justin mentioned
>> netisr?
>> >>
>> >> Andreas
>> >>
>> >
>>
>
>
Index: sys/amd64/include/pcpu.h
===
--- sys/amd64/include/pcpu.h(revision 313193)
+++ sys/amd64/include/pcpu.h(working copy)
@@ -78,6 +78,7 @@
 
 extern struct pcpu *pcpup;
 
+#defineget_pcpu()  (pcpup)
 #definePCPU_GET(member)(pcpup->pc_ ## member)
 #definePCPU_ADD(member, val)   (pcpup->pc_ ## member += (val))
 #definePCPU_INC(member)PCPU_ADD(member, 1)
@@ -203,6 +204,15 @@
}   \
 }
 
+#defineget_pcpu() __extension__ ({ 
\
+   struct pcpu *__pc;  \
+   \
+   __asm __volatile("movq %%gs:%1,%0"  \
+   : "=r" (__pc)   \
+   : "m" (*(struct pcpu *)(__pcpu_offset(pc_prvspace;  \
+   __pc;   \
+})
+
 #definePCPU_GET(member)__PCPU_GET(pc_ ## member)
 #definePCPU_ADD(member, val)   __PCPU_ADD(pc_ ## member, val)
 #define

Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-05 Thread Jason Harmening
Hmm, it's a good idea to consider the possibility of a barrier issue.  It
wouldn't be the first time we've had such a problem on a weakly-ordered
architecture. That said, I don't see a problem in this case.
 smp_rendezvous_cpus() takes a spinlock and then issues
atomic_store_rel_int()  to ensure the rendezvous params are visible to
other cpus.  The latter corresponds to lwsync on powerpc, which AFAIK
should be sufficient to ensure visibility of prior stores.

For now I'm going with the simpler explanation that I made a bad assumption
 in the powerpc get_pcpu() and there is some context in which the read of
sprg0 doesn't return a consistent pointer value.  Unfortunately I don't see
where that might be right now.

On the mips side, Kurt/Alexander can you test the attached patch?  It
contains a simple fix to ensure get_pcpu() returns the consistent per-cpu
pointer.

On Sat, Feb 4, 2017 at 1:34 PM, Svatopluk Kraus  wrote:

> Probably not related. But when I took short look to the patch to see
> what could go wrong, I walked into the following comment in
> _rm_wlock(): "Assumes rm->rm_writecpus update is visible on other CPUs
> before rm_cleanIPI is called." There is no explicit barrier to ensure
> it. However, there might be some barriers inside of
> smp_rendezvous_cpus(). I have no idea what could happened if this
> assumption is not met. Note that rm_cleanIPI() is affected by the
> patch.
>
>
>
> On Sat, Feb 4, 2017 at 9:39 PM, Jason Harmening
>  wrote:
> > Can you post an example of such panic?  Only 2 MI pieces were changed,
> > netisr and rmlock.  I haven't seen problems on my own amd64/i386/arm
> testing
> > of this, so a backtrace might help to narrow down the cause.
> >
> > On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler 
> > wrote:
> >>
> >> On 04.02.17 20:54, Jason Harmening wrote:
> >>>
> >>> I suspect this broke rmlocks for mips because the rmlock implementation
> >>> takes the address of the per-CPU pc_rm_queue when building tracker
> >>> lists.  That address may be later accessed from another CPU and will
> >>> then translate to the wrong physical region if the address was taken
> >>> relative to the globally-constant pcpup VA used on mips.
> >>>
> >>> Regardless, for mips get_pcpup() should be implemented as
> >>> pcpu_find(curcpu) since returning an address that may mean something
> >>> different depending on the CPU seems like a big POLA violation if
> >>> nothing else.
> >>>
> >>> I'm more concerned about the report of powerpc breakage.  For powerpc
> we
> >>> simply take each pcpu pointer from the pc_allcpu list (which is the
> same
> >>> value stored in the cpuid_to_pcpu array) and pass it through the
> ap_pcpu
> >>> global to each AP's startup code, which then stores it in sprg0.  It
> >>> should be globally unique and won't have the variable-translation
> issues
> >>> seen on mips.   Andreas, are you certain this change was responsible
> the
> >>> breakage you saw, and was it the same sort of hang observed on mips?
> >>
> >>
> >> I'm really sure. 313036 booted fine, allowed me to execute heavy
> >> compilation jobs, np. 313037 on the other side gave me various patterns
> of
> >> panics. During startup, but I also succeeded to get into multiuser and
> then
> >> the panic happend during port building.
> >>
> >> I have no deeper inside where pcpu data is used. Justin mentioned
> netisr?
> >>
> >> Andreas
> >>
> >
>
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-04 Thread Svatopluk Kraus
Probably not related. But when I took short look to the patch to see
what could go wrong, I walked into the following comment in
_rm_wlock(): "Assumes rm->rm_writecpus update is visible on other CPUs
before rm_cleanIPI is called." There is no explicit barrier to ensure
it. However, there might be some barriers inside of
smp_rendezvous_cpus(). I have no idea what could happened if this
assumption is not met. Note that rm_cleanIPI() is affected by the
patch.



On Sat, Feb 4, 2017 at 9:39 PM, Jason Harmening
 wrote:
> Can you post an example of such panic?  Only 2 MI pieces were changed,
> netisr and rmlock.  I haven't seen problems on my own amd64/i386/arm testing
> of this, so a backtrace might help to narrow down the cause.
>
> On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler 
> wrote:
>>
>> On 04.02.17 20:54, Jason Harmening wrote:
>>>
>>> I suspect this broke rmlocks for mips because the rmlock implementation
>>> takes the address of the per-CPU pc_rm_queue when building tracker
>>> lists.  That address may be later accessed from another CPU and will
>>> then translate to the wrong physical region if the address was taken
>>> relative to the globally-constant pcpup VA used on mips.
>>>
>>> Regardless, for mips get_pcpup() should be implemented as
>>> pcpu_find(curcpu) since returning an address that may mean something
>>> different depending on the CPU seems like a big POLA violation if
>>> nothing else.
>>>
>>> I'm more concerned about the report of powerpc breakage.  For powerpc we
>>> simply take each pcpu pointer from the pc_allcpu list (which is the same
>>> value stored in the cpuid_to_pcpu array) and pass it through the ap_pcpu
>>> global to each AP's startup code, which then stores it in sprg0.  It
>>> should be globally unique and won't have the variable-translation issues
>>> seen on mips.   Andreas, are you certain this change was responsible the
>>> breakage you saw, and was it the same sort of hang observed on mips?
>>
>>
>> I'm really sure. 313036 booted fine, allowed me to execute heavy
>> compilation jobs, np. 313037 on the other side gave me various patterns of
>> panics. During startup, but I also succeeded to get into multiuser and then
>> the panic happend during port building.
>>
>> I have no deeper inside where pcpu data is used. Justin mentioned netisr?
>>
>> Andreas
>>
>
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-04 Thread Jason Harmening
Can you post an example of such panic?  Only 2 MI pieces were changed,
netisr and rmlock.  I haven't seen problems on my own amd64/i386/arm
testing of this, so a backtrace might help to narrow down the cause.

On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler 
wrote:

> On 04.02.17 20:54, Jason Harmening wrote:
>
>> I suspect this broke rmlocks for mips because the rmlock implementation
>> takes the address of the per-CPU pc_rm_queue when building tracker
>> lists.  That address may be later accessed from another CPU and will
>> then translate to the wrong physical region if the address was taken
>> relative to the globally-constant pcpup VA used on mips.
>>
>> Regardless, for mips get_pcpup() should be implemented as
>> pcpu_find(curcpu) since returning an address that may mean something
>> different depending on the CPU seems like a big POLA violation if
>> nothing else.
>>
>> I'm more concerned about the report of powerpc breakage.  For powerpc we
>> simply take each pcpu pointer from the pc_allcpu list (which is the same
>> value stored in the cpuid_to_pcpu array) and pass it through the ap_pcpu
>> global to each AP's startup code, which then stores it in sprg0.  It
>> should be globally unique and won't have the variable-translation issues
>> seen on mips.   Andreas, are you certain this change was responsible the
>> breakage you saw, and was it the same sort of hang observed on mips?
>>
>
> I'm really sure. 313036 booted fine, allowed me to execute heavy
> compilation jobs, np. 313037 on the other side gave me various patterns of
> panics. During startup, but I also succeeded to get into multiuser and then
> the panic happend during port building.
>
> I have no deeper inside where pcpu data is used. Justin mentioned netisr?
>
> Andreas
>
>
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-04 Thread Andreas Tobler

On 04.02.17 20:54, Jason Harmening wrote:

I suspect this broke rmlocks for mips because the rmlock implementation
takes the address of the per-CPU pc_rm_queue when building tracker
lists.  That address may be later accessed from another CPU and will
then translate to the wrong physical region if the address was taken
relative to the globally-constant pcpup VA used on mips.

Regardless, for mips get_pcpup() should be implemented as
pcpu_find(curcpu) since returning an address that may mean something
different depending on the CPU seems like a big POLA violation if
nothing else.

I'm more concerned about the report of powerpc breakage.  For powerpc we
simply take each pcpu pointer from the pc_allcpu list (which is the same
value stored in the cpuid_to_pcpu array) and pass it through the ap_pcpu
global to each AP's startup code, which then stores it in sprg0.  It
should be globally unique and won't have the variable-translation issues
seen on mips.   Andreas, are you certain this change was responsible the
breakage you saw, and was it the same sort of hang observed on mips?


I'm really sure. 313036 booted fine, allowed me to execute heavy 
compilation jobs, np. 313037 on the other side gave me various patterns 
of panics. During startup, but I also succeeded to get into multiuser 
and then the panic happend during port building.


I have no deeper inside where pcpu data is used. Justin mentioned netisr?

Andreas

___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-04 Thread Alexander Kabaev
On Fri, 3 Feb 2017 21:29:33 -0800
Jason Harmening  wrote:

> Hi,
> 
> I'm a bit confused as to how this change breaks MIPS.  The new
> function, get_pcpu() is intended to be used only to access the
> per-cpu data pointer locally.  It returns pcpup, which is the per-cpu
> pointer wired into the local TLB to translate to the local CPU's
> physical data region, correct?
> 
> This is the same value used by the per-CPU accessors such as PCPU_ADD
> and PCPU_GET.  The MI portions of this change only use get_pcpu() to
> access the local CPU's data, e.g. under a critical section in the
> rmlock.  It is not intended to be used for iterating all CPUs.
> 
> If I've missed something and MIPS is truly broken by this, then I'll
> gladly revert, but (maybe because it's late) I'm not seeing where
> this goes wrong on MIPS.
> 
> Thanks,
> Jason

The hang is actually due to _rmlock_assert spinning in
rm_trackers_present with interrupts disabled, due to this constructs:

for (queue = pc->pc_rm_queue.rmq_next; queue !=
>pc_rm_queue; queue = queue->rmq_next) {

You changed the code to pass get_pcpu() results instead of direct
pointer to the corresponding pcpu storage and that is NOT the pointer
that was being used in pcpu_init. On architectures that rig TLB to make
local PCPU available at constant address from each CPU, these two are
generally not same, so while empty pc_rm_queue is a cyclic list
consisting of just one element pc_rm_queue, the loop terminating
condition will never be met, as >pc_rm_qeue value will be very
different from the one that pc->pc_rm_queue.rmq_next was set at
initialisation time.

This affects MIPS in SMP configuration only, UP configs do not bother
with TLB hackery.

-- 
Alexander Kabaev


pgpz8CADzOmvi.pgp
Description: Цифровая подпись OpenPGP


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-04 Thread Jason Harmening
I suspect this broke rmlocks for mips because the rmlock implementation
takes the address of the per-CPU pc_rm_queue when building tracker lists.
That address may be later accessed from another CPU and will then translate
to the wrong physical region if the address was taken relative to the
globally-constant pcpup VA used on mips.

Regardless, for mips get_pcpup() should be implemented as pcpu_find(curcpu)
since returning an address that may mean something different depending on
the CPU seems like a big POLA violation if nothing else.

I'm more concerned about the report of powerpc breakage.  For powerpc we
simply take each pcpu pointer from the pc_allcpu list (which is the same
value stored in the cpuid_to_pcpu array) and pass it through the ap_pcpu
global to each AP's startup code, which then stores it in sprg0.  It should
be globally unique and won't have the variable-translation issues seen on
mips.   Andreas, are you certain this change was responsible the breakage
you saw, and was it the same sort of hang observed on mips?

On Sat, Feb 4, 2017 at 1:25 AM, Andreas Tobler  wrote:

> On 04.02.17 07:27, Jason Harmening wrote:
>
>> It's hard to argue with that:)  I've backed it out until we can figure
>> out what's going on.
>> Sorry for the breakage.
>>
>
>
> For the record, powerpc64 was also affected.
>
> Andreas
>
>>
>> On Fri, Feb 3, 2017 at 9:54 PM, Kurt Lidl > > wrote:
>>
>> Having just spent a couple of hours bisecting what broke the kernel on
>> my mips64 machine, I can definitively state it was this commit.
>>
>> With this commit in place, the kernel hangs early in the
>> autoconfiguration:
>>
>> gcc version 4.2.1 20070831 patched [FreeBSD]
>> Preloaded elf kernel "kernel" at 0x80aa96a0.
>> real memory  = 523239424 (510976K bytes)
>> Physical memory chunk(s):
>> 0x00bf3000 - 0x080d5fff, 122564608 bytes (29923 pages)
>> 0x08101000 - 0x0ff00fff, 132120576 bytes (32256 pages)
>> 0x41000 - 0x41f196fff, 253325312 bytes (61847 pages)
>> avail memory = 504360960 (480MB)
>> Create COP2 context zone
>> AP #1 started!
>> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>>  hangs here 
>>
>> -Kurt
>>
>> On 2/4/17 12:29 AM, Jason Harmening wrote:
>>
>> Hi,
>>
>> I'm a bit confused as to how this change breaks MIPS.  The new
>> function,
>> get_pcpu() is intended to be used only to access the per-cpu data
>> pointer locally.  It returns pcpup, which is the per-cpu pointer
>> wired
>> into the local TLB to translate to the local CPU's physical data
>> region,
>> correct?
>>
>> This is the same value used by the per-CPU accessors such as
>> PCPU_ADD
>> and PCPU_GET.  The MI portions of this change only use get_pcpu()
>> to
>> access  the local CPU's data, e.g. under a critical section in the
>> rmlock.  It is not intended to be used for iterating all CPUs.
>>
>> If I've missed something and MIPS is truly broken by this, then
>> I'll
>> gladly revert, but (maybe because it's late) I'm not seeing
>> where this
>> goes wrong on MIPS.
>>
>> Thanks,
>> Jason
>>
>> On Fri, Feb 3, 2017 at 8:12 PM, Alexander Kabaev
>> 
>> >> wrote:
>>
>> On Wed, 1 Feb 2017 03:32:49 + (UTC)
>> "Jason A. Harmening"  wrote:
>>
>> > Author: jah
>> > Date: Wed Feb  1 03:32:49 2017
>> > New Revision: 313037
>> > URL: https://svnweb.freebsd.org/changeset/base/313037
>> 
>> > >
>> >
>> > Log:
>> >   Implement get_pcpu() for the remaining architectures and
>> use it to
>> >   replace pcpu_find(curcpu) in MI code.
>> >
>> > Modified:
>> >   head/sys/amd64/include/pcpu.h
>> >   head/sys/kern/kern_rmlock.c
>> >   head/sys/mips/include/pcpu.h
>> >   head/sys/net/netisr.c
>> >   head/sys/powerpc/include/cpufunc.h
>> >   head/sys/powerpc/include/pcpu.h
>> >   head/sys/sparc64/include/pcpu.h
>> >
>>
>> Hi,
>>
>> this change was not reviewed nor testing was thought for all
>> architectures it touches. The change happens to break MIPS
>> quite
>> thoroughly, since MIPS is using different pointers when
>> accessing PCPU
>> area locally and when doing iterations using cpu_to_cpuid
>> array. I
>> therefore 

Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-04 Thread Andreas Tobler

On 04.02.17 07:27, Jason Harmening wrote:

It's hard to argue with that:)  I've backed it out until we can figure
out what's going on.
Sorry for the breakage.



For the record, powerpc64 was also affected.

Andreas


On Fri, Feb 3, 2017 at 9:54 PM, Kurt Lidl > wrote:

Having just spent a couple of hours bisecting what broke the kernel on
my mips64 machine, I can definitively state it was this commit.

With this commit in place, the kernel hangs early in the
autoconfiguration:

gcc version 4.2.1 20070831 patched [FreeBSD]
Preloaded elf kernel "kernel" at 0x80aa96a0.
real memory  = 523239424 (510976K bytes)
Physical memory chunk(s):
0x00bf3000 - 0x080d5fff, 122564608 bytes (29923 pages)
0x08101000 - 0x0ff00fff, 132120576 bytes (32256 pages)
0x41000 - 0x41f196fff, 253325312 bytes (61847 pages)
avail memory = 504360960 (480MB)
Create COP2 context zone
AP #1 started!
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 hangs here 

-Kurt

On 2/4/17 12:29 AM, Jason Harmening wrote:

Hi,

I'm a bit confused as to how this change breaks MIPS.  The new
function,
get_pcpu() is intended to be used only to access the per-cpu data
pointer locally.  It returns pcpup, which is the per-cpu pointer
wired
into the local TLB to translate to the local CPU's physical data
region,
correct?

This is the same value used by the per-CPU accessors such as
PCPU_ADD
and PCPU_GET.  The MI portions of this change only use get_pcpu() to
access  the local CPU's data, e.g. under a critical section in the
rmlock.  It is not intended to be used for iterating all CPUs.

If I've missed something and MIPS is truly broken by this, then I'll
gladly revert, but (maybe because it's late) I'm not seeing
where this
goes wrong on MIPS.

Thanks,
Jason

On Fri, Feb 3, 2017 at 8:12 PM, Alexander Kabaev

>> wrote:

On Wed, 1 Feb 2017 03:32:49 + (UTC)
"Jason A. Harmening"  wrote:

> Author: jah
> Date: Wed Feb  1 03:32:49 2017
> New Revision: 313037
> URL: https://svnweb.freebsd.org/changeset/base/313037

>
>
> Log:
>   Implement get_pcpu() for the remaining architectures and
use it to
>   replace pcpu_find(curcpu) in MI code.
>
> Modified:
>   head/sys/amd64/include/pcpu.h
>   head/sys/kern/kern_rmlock.c
>   head/sys/mips/include/pcpu.h
>   head/sys/net/netisr.c
>   head/sys/powerpc/include/cpufunc.h
>   head/sys/powerpc/include/pcpu.h
>   head/sys/sparc64/include/pcpu.h
>

Hi,

this change was not reviewed nor testing was thought for all
architectures it touches. The change happens to break MIPS quite
thoroughly, since MIPS is using different pointers when
accessing PCPU
area locally and when doing iterations using cpu_to_cpuid
array. I
therefore officially am requesting this change to be
reverted until
reasonable solution is found to unbreak architectures that
use wired
TLBs to access local per-CPU data.

--
Alexander Kabaev






___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-03 Thread Jason Harmening
It's hard to argue with that:)  I've backed it out until we can figure out
what's going on.
Sorry for the breakage.

On Fri, Feb 3, 2017 at 9:54 PM, Kurt Lidl  wrote:

> Having just spent a couple of hours bisecting what broke the kernel on
> my mips64 machine, I can definitively state it was this commit.
>
> With this commit in place, the kernel hangs early in the
> autoconfiguration:
>
> gcc version 4.2.1 20070831 patched [FreeBSD]
> Preloaded elf kernel "kernel" at 0x80aa96a0.
> real memory  = 523239424 (510976K bytes)
> Physical memory chunk(s):
> 0x00bf3000 - 0x080d5fff, 122564608 bytes (29923 pages)
> 0x08101000 - 0x0ff00fff, 132120576 bytes (32256 pages)
> 0x41000 - 0x41f196fff, 253325312 bytes (61847 pages)
> avail memory = 504360960 (480MB)
> Create COP2 context zone
> AP #1 started!
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
>  hangs here 
>
> -Kurt
>
> On 2/4/17 12:29 AM, Jason Harmening wrote:
>
>> Hi,
>>
>> I'm a bit confused as to how this change breaks MIPS.  The new function,
>> get_pcpu() is intended to be used only to access the per-cpu data
>> pointer locally.  It returns pcpup, which is the per-cpu pointer wired
>> into the local TLB to translate to the local CPU's physical data region,
>> correct?
>>
>> This is the same value used by the per-CPU accessors such as PCPU_ADD
>> and PCPU_GET.  The MI portions of this change only use get_pcpu() to
>> access  the local CPU's data, e.g. under a critical section in the
>> rmlock.  It is not intended to be used for iterating all CPUs.
>>
>> If I've missed something and MIPS is truly broken by this, then I'll
>> gladly revert, but (maybe because it's late) I'm not seeing where this
>> goes wrong on MIPS.
>>
>> Thanks,
>> Jason
>>
>> On Fri, Feb 3, 2017 at 8:12 PM, Alexander Kabaev > > wrote:
>>
>> On Wed, 1 Feb 2017 03:32:49 + (UTC)
>> "Jason A. Harmening"  wrote:
>>
>> > Author: jah
>> > Date: Wed Feb  1 03:32:49 2017
>> > New Revision: 313037
>> > URL: https://svnweb.freebsd.org/changeset/base/313037
>> 
>> >
>> > Log:
>> >   Implement get_pcpu() for the remaining architectures and use it to
>> >   replace pcpu_find(curcpu) in MI code.
>> >
>> > Modified:
>> >   head/sys/amd64/include/pcpu.h
>> >   head/sys/kern/kern_rmlock.c
>> >   head/sys/mips/include/pcpu.h
>> >   head/sys/net/netisr.c
>> >   head/sys/powerpc/include/cpufunc.h
>> >   head/sys/powerpc/include/pcpu.h
>> >   head/sys/sparc64/include/pcpu.h
>> >
>>
>> Hi,
>>
>> this change was not reviewed nor testing was thought for all
>> architectures it touches. The change happens to break MIPS quite
>> thoroughly, since MIPS is using different pointers when accessing PCPU
>> area locally and when doing iterations using cpu_to_cpuid array. I
>> therefore officially am requesting this change to be reverted until
>> reasonable solution is found to unbreak architectures that use wired
>> TLBs to access local per-CPU data.
>>
>> --
>> Alexander Kabaev
>>
>>
>>
>
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-03 Thread Kurt Lidl

Having just spent a couple of hours bisecting what broke the kernel on
my mips64 machine, I can definitively state it was this commit.

With this commit in place, the kernel hangs early in the
autoconfiguration:

gcc version 4.2.1 20070831 patched [FreeBSD]
Preloaded elf kernel "kernel" at 0x80aa96a0.
real memory  = 523239424 (510976K bytes)
Physical memory chunk(s):
0x00bf3000 - 0x080d5fff, 122564608 bytes (29923 pages)
0x08101000 - 0x0ff00fff, 132120576 bytes (32256 pages)
0x41000 - 0x41f196fff, 253325312 bytes (61847 pages)
avail memory = 504360960 (480MB)
Create COP2 context zone
AP #1 started!
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 hangs here 

-Kurt

On 2/4/17 12:29 AM, Jason Harmening wrote:

Hi,

I'm a bit confused as to how this change breaks MIPS.  The new function,
get_pcpu() is intended to be used only to access the per-cpu data
pointer locally.  It returns pcpup, which is the per-cpu pointer wired
into the local TLB to translate to the local CPU's physical data region,
correct?

This is the same value used by the per-CPU accessors such as PCPU_ADD
and PCPU_GET.  The MI portions of this change only use get_pcpu() to
access  the local CPU's data, e.g. under a critical section in the
rmlock.  It is not intended to be used for iterating all CPUs.

If I've missed something and MIPS is truly broken by this, then I'll
gladly revert, but (maybe because it's late) I'm not seeing where this
goes wrong on MIPS.

Thanks,
Jason

On Fri, Feb 3, 2017 at 8:12 PM, Alexander Kabaev > wrote:

On Wed, 1 Feb 2017 03:32:49 + (UTC)
"Jason A. Harmening"  wrote:

> Author: jah
> Date: Wed Feb  1 03:32:49 2017
> New Revision: 313037
> URL: https://svnweb.freebsd.org/changeset/base/313037

>
> Log:
>   Implement get_pcpu() for the remaining architectures and use it to
>   replace pcpu_find(curcpu) in MI code.
>
> Modified:
>   head/sys/amd64/include/pcpu.h
>   head/sys/kern/kern_rmlock.c
>   head/sys/mips/include/pcpu.h
>   head/sys/net/netisr.c
>   head/sys/powerpc/include/cpufunc.h
>   head/sys/powerpc/include/pcpu.h
>   head/sys/sparc64/include/pcpu.h
>

Hi,

this change was not reviewed nor testing was thought for all
architectures it touches. The change happens to break MIPS quite
thoroughly, since MIPS is using different pointers when accessing PCPU
area locally and when doing iterations using cpu_to_cpuid array. I
therefore officially am requesting this change to be reverted until
reasonable solution is found to unbreak architectures that use wired
TLBs to access local per-CPU data.

--
Alexander Kabaev




___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-03 Thread Jason Harmening
Hi,

I'm a bit confused as to how this change breaks MIPS.  The new function,
get_pcpu() is intended to be used only to access the per-cpu data pointer
locally.  It returns pcpup, which is the per-cpu pointer wired into the
local TLB to translate to the local CPU's physical data region, correct?

This is the same value used by the per-CPU accessors such as PCPU_ADD and
PCPU_GET.  The MI portions of this change only use get_pcpu() to access
 the local CPU's data, e.g. under a critical section in the rmlock.  It is
not intended to be used for iterating all CPUs.

If I've missed something and MIPS is truly broken by this, then I'll gladly
revert, but (maybe because it's late) I'm not seeing where this goes wrong
on MIPS.

Thanks,
Jason

On Fri, Feb 3, 2017 at 8:12 PM, Alexander Kabaev  wrote:

> On Wed, 1 Feb 2017 03:32:49 + (UTC)
> "Jason A. Harmening"  wrote:
>
> > Author: jah
> > Date: Wed Feb  1 03:32:49 2017
> > New Revision: 313037
> > URL: https://svnweb.freebsd.org/changeset/base/313037
> >
> > Log:
> >   Implement get_pcpu() for the remaining architectures and use it to
> >   replace pcpu_find(curcpu) in MI code.
> >
> > Modified:
> >   head/sys/amd64/include/pcpu.h
> >   head/sys/kern/kern_rmlock.c
> >   head/sys/mips/include/pcpu.h
> >   head/sys/net/netisr.c
> >   head/sys/powerpc/include/cpufunc.h
> >   head/sys/powerpc/include/pcpu.h
> >   head/sys/sparc64/include/pcpu.h
> >
>
> Hi,
>
> this change was not reviewed nor testing was thought for all
> architectures it touches. The change happens to break MIPS quite
> thoroughly, since MIPS is using different pointers when accessing PCPU
> area locally and when doing iterations using cpu_to_cpuid array. I
> therefore officially am requesting this change to be reverted until
> reasonable solution is found to unbreak architectures that use wired
> TLBs to access local per-CPU data.
>
> --
> Alexander Kabaev
>
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"


Re: svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-02-03 Thread Alexander Kabaev
On Wed, 1 Feb 2017 03:32:49 + (UTC)
"Jason A. Harmening"  wrote:

> Author: jah
> Date: Wed Feb  1 03:32:49 2017
> New Revision: 313037
> URL: https://svnweb.freebsd.org/changeset/base/313037
> 
> Log:
>   Implement get_pcpu() for the remaining architectures and use it to
>   replace pcpu_find(curcpu) in MI code.
> 
> Modified:
>   head/sys/amd64/include/pcpu.h
>   head/sys/kern/kern_rmlock.c
>   head/sys/mips/include/pcpu.h
>   head/sys/net/netisr.c
>   head/sys/powerpc/include/cpufunc.h
>   head/sys/powerpc/include/pcpu.h
>   head/sys/sparc64/include/pcpu.h
> 

Hi,

this change was not reviewed nor testing was thought for all
architectures it touches. The change happens to break MIPS quite
thoroughly, since MIPS is using different pointers when accessing PCPU
area locally and when doing iterations using cpu_to_cpuid array. I
therefore officially am requesting this change to be reverted until
reasonable solution is found to unbreak architectures that use wired
TLBs to access local per-CPU data.

-- 
Alexander Kabaev


pgpfUhLU9V2ah.pgp
Description: Цифровая подпись OpenPGP


svn commit: r313037 - in head/sys: amd64/include kern mips/include net powerpc/include sparc64/include

2017-01-31 Thread Jason A. Harmening
Author: jah
Date: Wed Feb  1 03:32:49 2017
New Revision: 313037
URL: https://svnweb.freebsd.org/changeset/base/313037

Log:
  Implement get_pcpu() for the remaining architectures and use it to
  replace pcpu_find(curcpu) in MI code.

Modified:
  head/sys/amd64/include/pcpu.h
  head/sys/kern/kern_rmlock.c
  head/sys/mips/include/pcpu.h
  head/sys/net/netisr.c
  head/sys/powerpc/include/cpufunc.h
  head/sys/powerpc/include/pcpu.h
  head/sys/sparc64/include/pcpu.h

Modified: head/sys/amd64/include/pcpu.h
==
--- head/sys/amd64/include/pcpu.h   Wed Feb  1 03:29:13 2017
(r313036)
+++ head/sys/amd64/include/pcpu.h   Wed Feb  1 03:32:49 2017
(r313037)
@@ -78,6 +78,7 @@
 
 extern struct pcpu *pcpup;
 
+#defineget_pcpu()  (pcpup)
 #definePCPU_GET(member)(pcpup->pc_ ## member)
 #definePCPU_ADD(member, val)   (pcpup->pc_ ## member += (val))
 #definePCPU_INC(member)PCPU_ADD(member, 1)
@@ -203,6 +204,15 @@ extern struct pcpu *pcpup;
}   \
 }
 
+#defineget_pcpu() __extension__ ({ 
\
+   struct pcpu *__pc;  \
+   \
+   __asm __volatile("movq %%gs:%1,%0"  \
+   : "=r" (__pc)   \
+   : "m" (*(struct pcpu *)(__pcpu_offset(pc_prvspace;  \
+   __pc;   \
+})
+
 #definePCPU_GET(member)__PCPU_GET(pc_ ## member)
 #definePCPU_ADD(member, val)   __PCPU_ADD(pc_ ## member, val)
 #definePCPU_INC(member)__PCPU_INC(pc_ ## member)

Modified: head/sys/kern/kern_rmlock.c
==
--- head/sys/kern/kern_rmlock.c Wed Feb  1 03:29:13 2017(r313036)
+++ head/sys/kern/kern_rmlock.c Wed Feb  1 03:32:49 2017(r313037)
@@ -156,7 +156,7 @@ unlock_rm(struct lock_object *lock)
 */
critical_enter();
td = curthread;
-   pc = pcpu_find(curcpu);
+   pc = get_pcpu();
for (queue = pc->pc_rm_queue.rmq_next;
queue != >pc_rm_queue; queue = queue->rmq_next) {
tracker = (struct rm_priotracker *)queue;
@@ -258,7 +258,7 @@ rm_cleanIPI(void *arg)
struct rmlock *rm = arg;
struct rm_priotracker *tracker;
struct rm_queue *queue;
-   pc = pcpu_find(curcpu);
+   pc = get_pcpu();
 
for (queue = pc->pc_rm_queue.rmq_next; queue != >pc_rm_queue;
queue = queue->rmq_next) {
@@ -355,7 +355,7 @@ _rm_rlock_hard(struct rmlock *rm, struct
struct pcpu *pc;
 
critical_enter();
-   pc = pcpu_find(curcpu);
+   pc = get_pcpu();
 
/* Check if we just need to do a proper critical_exit. */
if (!CPU_ISSET(pc->pc_cpuid, >rm_writecpus)) {
@@ -416,7 +416,7 @@ _rm_rlock_hard(struct rmlock *rm, struct
}
 
critical_enter();
-   pc = pcpu_find(curcpu);
+   pc = get_pcpu();
CPU_CLR(pc->pc_cpuid, >rm_writecpus);
rm_tracker_add(pc, tracker);
sched_pin();
@@ -641,7 +641,7 @@ _rm_rlock_debug(struct rmlock *rm, struc
 #ifdef INVARIANTS
if (!(rm->lock_object.lo_flags & LO_RECURSABLE) && !trylock) {
critical_enter();
-   KASSERT(rm_trackers_present(pcpu_find(curcpu), rm,
+   KASSERT(rm_trackers_present(get_pcpu(), rm,
curthread) == 0,
("rm_rlock: recursed on non-recursive rmlock %s @ %s:%d\n",
rm->lock_object.lo_name, file, line));
@@ -771,7 +771,7 @@ _rm_assert(const struct rmlock *rm, int 
}
 
critical_enter();
-   count = rm_trackers_present(pcpu_find(curcpu), rm, curthread);
+   count = rm_trackers_present(get_pcpu(), rm, curthread);
critical_exit();
 
if (count == 0)
@@ -797,7 +797,7 @@ _rm_assert(const struct rmlock *rm, int 
rm->lock_object.lo_name, file, line);
 
critical_enter();
-   count = rm_trackers_present(pcpu_find(curcpu), rm, curthread);
+   count = rm_trackers_present(get_pcpu(), rm, curthread);
critical_exit();
 
if (count != 0)

Modified: head/sys/mips/include/pcpu.h
==
--- head/sys/mips/include/pcpu.hWed Feb  1 03:29:13 2017
(r313036)
+++ head/sys/mips/include/pcpu.hWed Feb  1 03:32:49 2017
(r313037)
@@ -65,6 +65,7 @@ extern char pcpu_space[MAXCPU][PAGE_SIZE