Re: [OE-core] [PATCH 0/7] kernel-yocto: consolidated pull request

2017-09-13 Thread Bruce Ashfield

On 09/05/2017 10:59 AM, Richard Purdie wrote:


I'm still trying to get a solid reproducer for this, but I'm now
going down the route of isolating different parts of the system.

I was looking at:

https://autobuilder.yocto.io/builders/nightly-ppc-lsb/builds/475/steps/Running%20Sanity%20Tests/logs/stdio

And I thought that this was related to the switch of the cdrom to
be virtio backed, but looking at the command line:

tmp/work/x86_64-linux/qemu-helper-native/1.0-r1/recipe-sysroot-native/usr/bin//qemu-system-ppc 
-device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:02 -netdev 
tap,id=net0,ifname=tap0,script=no,downscript=no -drive 

Re: [OE-core] [PATCH 0/7] kernel-yocto: consolidated pull request

2017-09-10 Thread Bruce Ashfield

On 2017-09-05 10:59 AM, Richard Purdie wrote:


After letting my qemuppc run with a hard CPU loop for five days, I did
finally manage to get an RCU stall.

I still don't have a root cause, but I can confirm that I saw this
with my 4.12 kernel as well.

Bruce






--
___
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core


Re: [OE-core] [PATCH 0/7] kernel-yocto: consolidated pull request

2017-09-05 Thread Bruce Ashfield

On 09/05/2017 10:59 AM, Richard Purdie wrote:


I'd expect after the stall that it would come back. But it
is good news that it isn't over NFS, since that would make things
harder to reproduce.

There's some sort of CPU-intensive task -> virtio interaction that is
not allowing ksoftirqd to run within limits.

We could back off the warning and increase the limit, but that
can cause more serious problems down the road.

Bruce
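[Backing off the warning as discussed above would amount to raising the RCU stall timeout. A sketch of the relevant knobs; the values are illustrative, not a recommendation:]

```shell
# Raise the RCU CPU stall warning timeout via a kernel boot parameter
# (append to the kernel command line):
#   rcupdate.rcu_cpu_stall_timeout=60
#
# The same knob can be changed at runtime on a booted target:
echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

# Suppressing the warning entirely hides the symptom, not the cause:
#   rcupdate.rcu_cpu_stall_suppress=1
```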








Re: [OE-core] [PATCH 0/7] kernel-yocto: consolidated pull request

2017-09-05 Thread Richard Purdie
On Tue, 2017-09-05 at 10:24 -0400, Bruce Ashfield wrote:
>  From the log, it looks like this is running over NFS and pinning the
> CPU and the qemu ethernet isn't handling it gracefully.

Looking at the logs I've seen I don't think this is over NFS, it should
be over virtio:

"Kernel command line: root=/dev/vda"

> But exactly what it is, I can't say from that trace. I'll try and do
> a cpu-pinned test on qemuppc (over NFS) and see if I can trigger the
> same trace.

I'm also not sure what this might be. I did a bit more staring at the
log and I think the system did come back:

NOTE: core-image-lsb-1.0-r0 do_testimage:   test_dnf_install_from_disk (dnf.DnfRepoTest)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... OK (249.929s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_dnf_install_from_http (dnf.DnfRepoTest)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... OK (212.547s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_dnf_reinstall (dnf.DnfRepoTest)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... FAIL (1501.682s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_dnf_repoinfo (dnf.DnfRepoTest)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... FAIL (15.952s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_syslog_running (oe_syslog.SyslogTest)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... FAIL (3.039s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_syslog_logger (oe_syslog.SyslogTestConfig)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... SKIP (0.001s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_syslog_restart (oe_syslog.SyslogTestConfig)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... SKIP (0.001s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_syslog_startup_config (oe_syslog.SyslogTestConfig)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... SKIP (0.001s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_pam (pam.PamBasicTest)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... FAIL (3.003s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_parselogs (parselogs.ParseLogsTest)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... OK (39.675s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_rpm_help (rpm.RpmBasicTest)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... OK (2.590s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_rpm_query (rpm.RpmBasicTest)
NOTE: core-image-lsb-1.0-r0 do_testimage:  ... OK (2.295s)
NOTE: core-image-lsb-1.0-r0 do_testimage:   test_rpm_instal

So for a while there the system "locked up":

AssertionError: 255 != 0 : dnf --repofrompath=oe-testimage-repo-noarch,http://192.168.7.1:38838/noarch --repofrompath=oe-testimage-repo-qemuppc,http://192.168.7.1:38838/qemuppc --repofrompath=oe-testimage-repo-ppc7400,http://192.168.7.1:38838/ppc7400 --nogpgcheck reinstall -y run-postinsts-dev

Process killed - no output for 1500 seconds. Total running time: 1501 seconds.

AssertionError: 255 != 0 : dnf --repofrompath=oe-testimage-repo-noarch,http://192.168.7.1:38838/noarch --repofrompath=oe-testimage-repo-qemuppc,http://192.168.7.1:38838/qemuppc --repofrompath=oe-testimage-repo-ppc7400,http://192.168.7.1:38838/ppc7400 --nogpgcheck repoinfo
ssh: connect to host 192.168.7.2 port 22: No route to host

self.assertEqual(status, 1, msg = msg)
AssertionError: 255 != 1 : login command does not work as expected. Status and output:255 and ssh: connect to host 192.168.7.2 port 22: No route to host

then the system seems to have come back. All very odd...

Cheers,

Richard



Re: [OE-core] [PATCH 0/7] kernel-yocto: consolidated pull request

2017-09-05 Thread Bruce Ashfield

On 09/05/2017 10:13 AM, Richard Purdie wrote:


Very interesting.

I'm (un)fortunately familiar with RCU issues, and obviously, this is
only happening under load. There's clearly a driver issue as it
interacts with whatever is running in userspace.

From the log, it looks like this is running over NFS and pinning the
CPU and the qemu ethernet isn't handling it gracefully.

But exactly what it is, I can't say from that trace. I'll try and do
a cpu-pinned test on qemuppc (over NFS) and see if I can trigger the
same trace.

Bruce
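[A cpu-pinned test of the kind described here can be sketched as follows; the image and kernel paths are illustrative, and taskset plus a yes-loop stand in for whatever load generator is actually used:]

```shell
# Pin the emulator to a single host core so guest work and qemu's I/O
# threads compete for CPU time (the qemu invocation is a simplified
# stand-in for what runqemu constructs):
taskset -c 0 qemu-system-ppc -M mac99 -m 256 \
    -kernel vmlinux -append "root=/dev/vda rw console=ttyS0" \
    -drive file=core-image-lsb-qemuppc.ext4,if=virtio \
    -nographic &

# Inside the guest, run a hard CPU loop and watch the console for
# RCU stall backtraces:
yes > /dev/null &
dmesg -w | grep -i rcu
```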




Re: [OE-core] [PATCH 0/7] kernel-yocto: consolidated pull request

2017-09-05 Thread Richard Purdie
Hi Bruce,

We had a locked up qemuppc lsb image and I was able to find backtraces
from the serial console log (/home/pokybuild/yocto-autobuilder/yocto-
worker/nightly-ppc-lsb/build/build/tmp/work/qemuppc-poky-linux/core-
image-lsb/1.0-r0/target_logs/dmesg_output.log in case anyone ever needs
to find that). The log is below, this one is for the 4.9 kernel.

Failure as seen on the AB:
https://autobuilder.yoctoproject.org/main/builders/nightly-ppc-lsb/buil
ds/1189/steps/Running%20Sanity%20Tests/logs/stdio

Not sure what it means, perhaps you can make more sense of it? :)

Cheers,

Richard



[0.00] Total memory = 256MB; using 512kB for hash table (at cff8)
[0.00] Linux version 4.9.46-yocto-standard (oe-user@oe-host) (gcc version 7.2.0 (GCC) ) #1 PREEMPT Tue Sep 5 00:20:12 GMT 2017
[0.00] Found UniNorth memory controller & host bridge @ 0xf800 revision: 0x3124be0
[0.00] Mapped at 0xfdfc
[0.00] Found a Keylargo mac-io controller, rev: 0, mapped at 0xfdf4
[0.00] Processor NAP mode on idle enabled.
[0.00] PowerMac motherboard: PowerMac G4 AGP Graphics
[0.00] Using PowerMac machine description
[0.00] bootconsole [udbg0] enabled
[0.00] -
[0.00] Hash_size = 0x8
[0.00] phys_mem_size = 0x1000
[0.00] dcache_bsize  = 0x20
[0.00] icache_bsize  = 0x20
[0.00] cpu_features  = 0x0020047a
[0.00]   possible= 0x05a6fd7f
[0.00]   always  = 0x
[0.00] cpu_user_features = 0x9c01 0x
[0.00] mmu_features  = 0x0001
[0.00] Hash  = 0xcff8
[0.00] Hash_mask = 0x1fff
[0.00] -
[0.00] Found UniNorth PCI host bridge at 0xf200. Firmware bus number: 0->0
[0.00] PCI host bridge /pci@f200 (primary) ranges:
[0.00]   IO 0xf200..0xf27f -> 0x
[0.00]  MEM 0x8000..0x8fff -> 0x8000
[0.00] Top of RAM: 0x1000, Total RAM: 0x1000
[0.00] Memory hole size: 0MB
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x-0x0fff]
[0.00]   Normal   empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x-0x0fff]
[0.00] Initmem setup node 0 [mem 0x-0x0fff]
[0.00] On node 0 totalpages: 65536
[0.00] free_area_init_node: node 0, pgdat c0acf43c, node_mem_map cfd3
[0.00]   DMA zone: 576 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 65536 pages, LIFO batch:15
[0.00] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[0.00] pcpu-alloc: [0] 0 
[0.00] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 64960
[0.00] Kernel command line: root=/dev/vda rw highres=off  mem=256M ip=192.168.7.2::192.168.7.1:255.255.255.0 console=tty console=ttyS0 console=tty1 console=ttyS0,115200n8 printk.time=1 qemurunner_pid=17112
[0.00] PID hash table entries: 1024 (order: 0, 4096 bytes)
[0.00] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[0.00] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[0.00] Sorting __ex_table...
[0.00] Memory: 247220K/262144K available (8164K kernel code, 548K rwdata, 1972K rodata, 440K init, 636K bss, 14924K reserved, 0K cma-reserved)
[0.00] Kernel virtual memory layout:
[0.00]   * 0xfffdf000..0xf000  : fixmap
[0.00]   * 0xfd73b000..0xfe00  : early ioremap
[0.00]   * 0xd100..0xfd73b000  : vmalloc & ioremap
[0.00] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[0.00] Preemptible hierarchical RCU implementation.
[0.00]  Build-time adjustment of leaf fanout to 32.
[0.00] NR_IRQS:512 nr_irqs:512 16
[0.00] mpic: Resetting
[0.00] mpic: Setting up MPIC " MPIC 1   " version 1.3 at 8004, max 1 CPUs
[0.00] mpic: ISU size: 48, shift: 6, mask: 3f
[0.00] mpic: Initializing for 48 sources
[0.00] time_init: decrementer frequency = 100.00 MHz
[0.00] time_init: processor frequency   = 266.00 MHz
[0.000867] clocksource: timebase: mask: 0x max_cycles: 0x171024e7e0, max_idle_ns: 440795205315 ns
[0.001204] clocksource: timebase mult[a00] shift[24] registered
[0.001562] clockevent: decrementer mult[199a] shift[32] cpu[0]
[0.014375] Console: colour dummy device 80x25
[0.030186] console [tty0] enabled
[0.060271] console [ttyS0] enabled
[0.061213] bootconsole [udbg0] 

Re: [OE-core] [PATCH 0/7] kernel-yocto: consolidated pull request

2017-09-01 Thread Bruce Ashfield

On 09/01/2017 12:08 PM, Richard Purdie wrote:



The only thing crappier than IDE support is the PPC platforms that
qemu supports for system emulation.

I'll continue to try and track down the issues.

Bruce








Re: [OE-core] [PATCH 0/7] kernel-yocto: consolidated pull request

2017-09-01 Thread Richard Purdie
On Thu, 2017-08-31 at 15:30 -0400, Bruce Ashfield wrote:
> Now that 4.12 + headers are in the tree, this is the next (and
> possibly final)
> round of -stable updates for the active kernels.
> 
> I've built and booted them on all four architectures. But as usual,
> this is a lot of combinations to confirm.
> 
> The 4.12 update doesn't specifically address the kernel traces that
> RP has been seeing, but it also doesn't regress things here. I'd be
> interested to know if they help/hinder in the search to find the root
> cause.
> 
> I'll of course continue to track it down myself.
> 
> I also have some functionality fixes as well as configuration updates
> as part of this update.

I ran these on the autobuilder. We did see:

https://autobuilder.yocto.io/builders/nightly-ppc/builds/465

which is the latest 4.12 kernel and the latest qemu hanging with no
boot output (but the pid did appear).

So we're still seeing something odd with ppc occasionally. Why? Don't
know :(

Cheers,

Richard