[Bug 1640547] Re: stress-ng based disk tests failing

2017-02-28 Thread Mike Rushton
** Tags removed: blocks-hwcert-server ** Changed in: plainbox-provider-checkbox (Ubuntu) Status: Triaged => Fix Released ** Changed in: plainbox-provider-checkbox (Ubuntu Xenial) Status: Triaged => Fix Released -- You received this bug notification because you are a member of

[Bug 1640547] Re: stress-ng based disk tests failing

2017-02-28 Thread Mike Rushton
Looks like this one is finally resolved. These are the results from my recent testing:

Result,Timestamp,Test Name,Duration,kernel,Package version,CPU's,Memory,Package,Note
PASS,2017-02-21-19-39-20,disk_stress_ng,01:48:24,4.4.0-63-generic-84-Ubuntu,0.07.16-1ppa1,128,31G,Stock stress-ng, stock

[Bug 1640547] Re: stress-ng based disk tests failing

2017-02-14 Thread Colin Ian King
Hi Jeff, yep, this issue was spotted on another system in the last week. It's due to a fallocate() option not being supported by some kernels on some file systems. To work around this, I added two layers of emulation in an abstraction layer to fallocate: commit

[Bug 1640547] Re: stress-ng based disk tests failing

2017-02-14 Thread Jeff Lane
@Colin, one final question... I ran an updated stress-ng (0.07.16-1ppa1) on a Xeon Phi system with a TON of cores that was able to reproduce the bugs we found on POWER. Note that 0.07.16-1ppa1 is the build we made by copying stress-ng 0.07.16 to the cert PPA and building it for Xenial, since that version

[Bug 1640547] Re: stress-ng based disk tests failing

2017-02-02 Thread Colin Ian King
@Mark, can you open a new bug report? This is a different bug. Also, can you give a summary of what the failure is? I'm not sure from the data you provided which failure you are reporting.

[Bug 1640547] Re: stress-ng based disk tests failing

2017-02-01 Thread Mark W Wenning
** Attachment added: "submission_2017-02-01T15.41.41.527613.tar.xz" https://bugs.launchpad.net/ubuntu/+source/plainbox-provider-checkbox/+bug/1640547/+attachment/4811956/+files/submission_2017-02-01T15.41.41.527613.tar.xz

[Bug 1640547] Re: stress-ng based disk tests failing

2017-02-01 Thread Mark W Wenning
Got some different errors this time; attaching the test output.

ubuntu@ubuntu-DSS1500:~/good-hawk$ cat stress-ng.txt
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)

[Bug 1640547] Re: stress-ng based disk tests failing

2017-01-31 Thread Colin Ian King
** Changed in: stress-ng Status: In Progress => Fix Committed ** Changed in: stress-ng Status: Fix Committed => Fix Released

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-31 Thread Colin Ian King
Oops, I meant "Any progress..."

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-31 Thread Colin Ian King
And progress on this for checkbox?

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-07 Thread Colin Ian King
I've made some modifications to the script (see attached). The changes include:
1. Kill with ALRM first, then kill with KILL if this does not work after a small grace period. Also report on unkillable stressors.
2. Bump up the async I/O threshold for machines with lots of CPUs.
3. Force hdd to do sync
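Item 1 of the list above (SIGALRM first, SIGKILL after a grace period, then report anything unkillable) could be sketched in shell as below. This is a minimal, hypothetical helper: `stop_with_grace` is not a name from the attached script. It assumes it is handed one stress-ng PID at a time and that polling once per second is precise enough.

```shell
#!/bin/bash
# Hypothetical helper, not the code in the attached disk_stress_ng script.
# Usage: stop_with_grace PID GRACE_SECONDS
# Returns 0 if the process exited after SIGALRM, 2 if SIGKILL was needed,
# and 1 if it was still alive even after SIGKILL (an "unkillable" stressor).
stop_with_grace() {
    local pid="$1" grace="$2" i=0
    # SIGALRM asks stress-ng stressors to clean up and exit.
    kill -ALRM "$pid" 2>/dev/null || return 0  # already gone (or not signalable)
    while [ "$i" -lt "$grace" ]; do
        sleep 1
        kill -0 "$pid" 2>/dev/null || return 0  # exited within the grace period
        i=$((i + 1))
    done
    echo "PID $pid still running ${grace}s after SIGALRM; sending SIGKILL" >&2
    kill -KILL "$pid" 2>/dev/null
    sleep 1
    if kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid survived SIGKILL (unkillable stressor?)" >&2
        return 1
    fi
    return 2
}
```

Suppose stress-ng was started with -k, so every process is named stress-ng. Then `for pid in $(pidof stress-ng); do stop_with_grace "$pid" 10; done` would stop each process in turn. The distinct return codes let the caller log which stressors ignored SIGALRM.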

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-07 Thread Colin Ian King
Attached: updated script ** Attachment added: "updated disk stress bash script" https://bugs.launchpad.net/ubuntu/+source/plainbox-provider-checkbox/+bug/1640547/+attachment/4788530/+files/disk_stress_ng

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-07 Thread Colin Ian King
OK, I understand what your requirements are now. With that in mind, I guess the best thing to do is:
1. Run stress-ng with the -k flag. This keeps all the process names as "stress-ng" rather than "stress-ng-${stressor-name}", so we can nuke them using killall -9 stress-ng later on.
2.

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-07 Thread Rod Smith
Thanks for the work on this so far, Colin. One point:

> I'm still not happy about the /usr/lib/plainbox-provider-checkbox/bin/disk_stress_ng using timeout with a -9 (SIGKILL) to terminate stress-ng stressors. Stress-ng stressors can be *cleanly* terminated with a SIGALRM signal; this triggers

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-06 Thread Colin Ian King
I'm still not happy about the /usr/lib/plainbox-provider-checkbox/bin/disk_stress_ng using timeout with a -9 (SIGKILL) to terminate stress-ng stressors. Stress-ng stressors can be *cleanly* terminated with a SIGALRM signal; this triggers all the processes to terminate once they have freed

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-06 Thread Colin Ian King
After a couple of runs, the readahead stressor had multiple processes stuck on system call #6, close(), and they required several kill -9 signals to kill them. This is unexpected behaviour.

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-06 Thread Colin Ian King
Looks like the fstat stressor sticks on the open of /dev/urandom, even with O_NONBLOCK, when running as root. For now, I'm going to skip that in the stressor when running with an euid of zero.

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-06 Thread Colin Ian King
I've fixed up the sync_file_range() call for this architecture.

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-06 Thread Colin Ian King
When heavily loaded, I'm seeing a bunch of these USB-related kworker thread delay warnings:

[ 5282.423179] INFO: task kworker/136:1:1165 blocked for more than 120 seconds.
[ 5282.423457] Not tainted 4.4.0-47-generic #68-Ubuntu
[ 5282.423498] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
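When the thread asks for dmesg output, it can help to pull out just the tasks named in warnings like the ones above. The helper below is a hypothetical sketch (`hung_tasks` is not an existing tool). It assumes the kernel log lines follow the "INFO: task NAME blocked for more than N seconds." format shown above.

```shell
#!/bin/bash
# Hypothetical helper: read kernel log text on stdin (for example `dmesg`
# output) and print the task named in each hung-task warning, one per line.
hung_tasks() {
    # Keep only "INFO: task X blocked for more than N seconds." lines and
    # replace each with the captured task name X.
    sed -n 's/.*INFO: task \([^ ]*\) blocked for more than [0-9]* seconds\..*/\1/p'
}
```

For example, `dmesg | hung_tasks | sort | uniq -c` would show which kworker threads trip the warning most often during a run.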

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-06 Thread Colin Ian King
OK, so checkbox is also nuking the files under the --fstat stressor and causing it to get stuck in a loop. I've pushed fixes to make it more robust in checking for "files that have disappeared" while doing fstats on them. But this is a minor issue, as checkbox kills stress-ng anyhow, so this is

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-06 Thread Colin Ian King
sync_file_range() is not implemented on this architecture. It uses a different system call, sync_file_range2(), which I need to get around to plugging into this stressor instead. So ignore that failure for now; I'll fix it up before the next release of stress-ng.

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-06 Thread Mike Rushton
@mark, when/if it eventually gets fixed, it might be the same issue. We won't know until we get it resolved. @colin, confirmed: stress-ng 0.07.08-1 on a different POWER8 server still has the issue.

Re: [Bug 1640547] Re: stress-ng based disk tests failing

2016-12-06 Thread Jeff Lane
On Mon, Dec 5, 2016 at 11:09 PM, Mark W Wenning wrote:
> Mike, will this fix the stress-related errors I'm seeing in Xeon Phi on
> Dell C6320p?
> https://certification.canonical.com/hardware/201611-25216/

It's possible? But no idea until you try it. We don't have

Re: [Bug 1640547] Re: stress-ng based disk tests failing

2016-12-05 Thread Mark W Wenning
Mike, will this fix the stress-related errors I'm seeing in Xeon Phi on Dell C6320p?
https://certification.canonical.com/hardware/201611-25216/

Thanks,
Mark Wenning
Technical Partner Manager, Cloud Alliances
Canonical, Ltd
mark.wenn...@canonical.com
- "We will encourage you to develop the

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-05 Thread Mike Rushton
stress-ng was at 0.05.23-1ubuntu2 for the above tests. I have now upgraded stress-ng from the package in Xenial to the one from Zesty:

$ sudo apt-cache policy stress-ng
stress-ng:
  Installed: 0.07.08-1
  Candidate: 0.07.08-1
  Version table:
 *** 0.07.08-1 500
        500
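Several failures in this thread came from an older stress-ng than expected (0.05.23 did not know --sync-file). A script could therefore check the installed version before running. The sketch below is a minimal example with hypothetical helper names, assuming `apt-cache policy` output like the one above. The 0.07.04 threshold is only illustrative, taken from the release the fix landed in. GNU `sort -V` is used as an approximation of Debian version ordering; `dpkg --compare-versions A ge B` applies the exact ordering on Ubuntu.

```shell
#!/bin/bash
# Hypothetical helper: read `apt-cache policy stress-ng` output on stdin
# and print the value of its "Installed:" field.
installed_version() {
    awk '$1 == "Installed:" { print $2; exit }'
}

# Hypothetical helper: succeed if version $1 >= version $2.
# Approximates Debian ordering with GNU `sort -V`: the lower version sorts
# first, so $1 >= $2 exactly when $2 comes out on top.
version_at_least() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage might look like `v=$(apt-cache policy stress-ng | installed_version); version_at_least "$v" 0.07.04 || echo "stress-ng $v is older than 0.07.04" >&2`.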

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-02 Thread Colin Ian King
Which version of stress-ng are you using? With the latest V0.07.08 I get:

./stress-ng --sync-file 1
stress-ng: info: [25692] defaulting to a 86400 second run per stressor
stress-ng: info: [25692] dispatching hogs: 1 sync-file
stress-ng: info: [25692] cache allocate: default cache size: 3072K

[Bug 1640547] Re: stress-ng based disk tests failing

2016-12-02 Thread Mike Rushton
Attached is another run of the test with test_dir="/tmp/disk_stress_ng_$(uuidgen)". The test was hung with the following on the screen:

Running stress-ng sync-file stressor for 240 seconds
stress-ng: unrecognized option '--sync-file'
Try 'stress-ng --help' for more information.
return_code is

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-18 Thread Colin Ian King
It may be worthwhile running each test with a random tmp directory, e.g. TMPDIR=/tmp/disk_stress_ng_$(uuidgen) or TMPDIR=/tmp/disk_stress_ng_$(cat /proc/sys/kernel/random/uuid). That way we don't trash the temp dir data of previous test instances.
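A minimal sketch of that suggestion follows, assuming bash. `make_test_dir` is a hypothetical helper that prefers `uuidgen` and falls back to `/proc/sys/kernel/random/uuid` when util-linux's `uuidgen` is not installed.

```shell
#!/bin/bash
# Hypothetical helper: create a unique per-run directory under /tmp and
# print its path, so each test instance gets its own scratch space.
make_test_dir() {
    local id dir
    if command -v uuidgen >/dev/null 2>&1; then
        id=$(uuidgen)
    else
        # Fallback: the kernel returns a fresh random UUID on every read.
        id=$(cat /proc/sys/kernel/random/uuid)
    fi
    dir="/tmp/disk_stress_ng_${id}"
    mkdir -p "$dir" && echo "$dir"
}
```

The path would then be passed to stress-ng via --temp-path, e.g. `test_dir=$(make_test_dir)`. The script would run `rm -rf "$test_dir"` only after stress-ng has exited, so it never removes files out from under a stressor that is still running.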

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Colin Ian King
It could be that. I've also added some forceful yield points to the fstat stressor on a SIGALRM to try even harder to abort this stressor. For the aiol test, it seems I was using the system call interface rather than the aiol library wrapper. I've fixed this up. I will upload a fixed version ASAP

Re: [Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Jeff Lane
I wonder if it's left over and the script isn't cleaning up /tmp/disk_stress_ng before that process has a chance to cleanly die.

On Thu, Nov 17, 2016 at 4:18 PM, Mike Rushton wrote:
> It seems to be stuck on:
> root 10664 1 0 18:28 pts/3 00:00:00

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Mike Rushton
It seems to be stuck on:

root 10664 1 0 18:28 pts/3 00:00:00 stress-ng --aggressive --verify --timeout 240 --temp-path /tmp/disk_stress_ng --fstat 0

Though /tmp/disk_stress_ng doesn't exist

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Mike Rushton
** Attachment added: "screenlog.1" https://bugs.launchpad.net/ubuntu/+source/plainbox-provider-checkbox/+bug/1640547/+attachment/4778821/+files/screenlog.1

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Mike Rushton
** Attachment added: "ps.log" https://bugs.launchpad.net/ubuntu/+source/plainbox-provider-checkbox/+bug/1640547/+attachment/4778822/+files/ps.log

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Mike Rushton
I had to run the test again. It's been running for about an hour now. I will post the logs and full screen output when it is finished.

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Colin Ian King
Any chance of getting some of that ps / dmesg output?

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Mike Rushton
** Changed in: stress-ng Status: Fix Committed => In Progress

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Colin Ian King
It would be useful to find out what is running, so output from ps -ax would help to see what's hanging.
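A process in uninterruptible sleep (STAT beginning with D) typically does not act on signals, even SIGKILL, until the kernel operation it is blocked in returns. That would be consistent with stressors needing repeated kill -9. The sketch below filters such stress-ng processes out of `ps` output. `stuck_stressors` is a hypothetical helper, and the column layout it expects (`ps -eo pid,stat,wchan:32,args`) is an assumption chosen so the kernel wait channel is shown too.

```shell
#!/bin/bash
# Hypothetical helper: read `ps -eo pid,stat,wchan:32,args` output on stdin
# and print PID, wait channel and command name of every stress-ng process
# in uninterruptible sleep (STAT starting with D). The header line is skipped.
stuck_stressors() {
    awk 'NR > 1 && $2 ~ /^D/ && $4 ~ /stress-ng/ { print $1, $3, $4 }'
}
```

Usage: `ps -eo pid,stat,wchan:32,args | stuck_stressors`. The pattern matches both plain "stress-ng" names (with -k) and per-stressor names such as "stress-ng-fstat".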

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Colin Ian King
And output from dmesg too, please.

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-17 Thread Mike Rushton
Still getting the same issues with stress-ng 0.07.04:

ubuntu@fesenkov:~$ apt-cache policy stress-ng
stress-ng:
  Installed: 0.07.04-1
  Candidate: 0.07.04-1
  Version table:
 *** 0.07.04-1 500
        500 http://ports.ubuntu.com/ubuntu-ports zesty/universe ppc64el Packages
        100

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-15 Thread Colin Ian King
FYI, the fix landed in stress-ng 0.07.04 over the weekend.

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-15 Thread Alberto Salvia Novella
** No longer affects: linux (Ubuntu Xenial) ** Changed in: plainbox-provider-checkbox (Ubuntu Xenial) Status: New => Triaged ** Changed in: plainbox-provider-checkbox (Ubuntu) Status: Confirmed => Triaged ** Changed in: plainbox-provider-checkbox (Ubuntu) Importance: Undecided

[Bug 1640547] Re: stress-ng based disk tests failing

2016-11-14 Thread Jeff Lane
** Summary changed: - stress-ng tests failing + stress-ng based disk tests failing