On Wed, Dec 05, 2018 at 10:34:02AM -0500, Josef Bacik wrote:
> v1->v2:
> - dropped my python library, TIL about jq.
> - fixed the spelling mistakes in the test.
>
> -- Original message --
>
> This patchset is to add a test to verify io.latency is working properly, and
> to add all the supporting code to run that test.
>
> First is the cgroup2 infrastructure which is fairly straightforward. Just
> verifies we have cgroup2, and gives us the helpers to check and make sure we
> have the right controllers in place for the test.
>
> The second patch brings over some python scripts I put in xfstests for parsing
> the fio json output. I looked at the existing fio performance stuff in
> blktests, but we only capture bw stuff, which is wonky with this particular
> test because once the fast group is finished the slow group is allowed to go
> as fast as it wants. So I needed this to pull out actual jobtime spent. This
> will give us flexibility to pull out other fio performance data in the future.
>
> The final patch is the test itself. It simply runs a job by itself to get a
> baseline view of the disk performance. Then it creates 2 cgroups, one fast
> and one slow, and runs the same job simultaneously in both groups. The result
> should be that the fast group takes just slightly longer time than the
> baseline (I use a 15% threshold to be safe), and that the slow one takes
> considerably longer. Thanks,
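
The pass criterion for the fast group described above can be sketched
roughly as follows. This is a hypothetical illustration, not the code
from the patch: the helper name `within_threshold`, the variable names,
and the runtimes are all made up, and it assumes runtimes are integer
milliseconds.

```shell
#!/bin/bash
# Hypothetical helper (not from the patch): succeed if the protected
# group's runtime ($2) is no more than $3 percent above the baseline
# runtime ($1). Integer arithmetic only.
within_threshold() {
	local baseline=$1 protected=$2 pct=$3
	local limit=$(( baseline + baseline * pct / 100 ))
	(( protected <= limit ))
}

# Made-up runtimes in milliseconds, for illustration only.
baseline_ms=10000
fast_ms=11000

if within_threshold "$baseline_ms" "$fast_ms" 15; then
	echo "protected workload ok"
else
	echo "Too much of a performance drop for the protected workload"
fi
```

A runtime that exceeds the limit takes the else branch, which is the
message reported for the NVMe run below.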

I cleaned up a ton of shellcheck warnings (from `make check`) and pushed
to https://github.com/osandov/blktests/tree/josef. I tested with QEMU
on Jens' for-next branch. With an emulated NVMe device, it failed with
"Too much of a performance drop for the protected workload". On
virtio-blk, I hit this:

[ 1843.056452] INFO: task fio:20750 blocked for more than 120 seconds.
[ 1843.057495] Not tainted 4.20.0-rc5-00251-g90efb26fa9a4 #19
[ 1843.058487] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1843.059769] fio D0 20750 20747 0x0080
[ 1843.060688] Call Trace:
[ 1843.061123] ? __schedule+0x286/0x870
[ 1843.061735] ? blkcg_iolatency_done_bio+0x680/0x680
[ 1843.062574] ? blkcg_iolatency_cleanup+0x60/0x60
[ 1843.063347] schedule+0x32/0x80
[ 1843.063874] io_schedule+0x12/0x40
[ 1843.064449] rq_qos_wait+0x9a/0x120
[ 1843.065007] ? karma_partition+0x210/0x210
[ 1843.065661] ? blkcg_iolatency_done_bio+0x680/0x680
[ 1843.066435] blkcg_iolatency_throttle+0x185/0x360
[ 1843.067196] __rq_qos_throttle+0x23/0x30
[ 1843.067958] blk_mq_make_request+0x101/0x5c0
[ 1843.068637] generic_make_request+0x1b3/0x3c0
[ 1843.069329] submit_bio+0x45/0x140
[ 1843.069876] blkdev_direct_IO+0x3db/0x440
[ 1843.070527] ? aio_complete+0x2f0/0x2f0
[ 1843.071146] generic_file_direct_write+0x96/0x160
[ 1843.071880] __generic_file_write_iter+0xb3/0x1c0
[ 1843.072599] ? blk_mq_dispatch_rq_list+0x3aa/0x550
[ 1843.073340] blkdev_write_iter+0xa0/0x120
[ 1843.073960] ? __fget+0x6e/0xa0
[ 1843.074452] aio_write+0x11f/0x1d0
[ 1843.074979] ? __blk_mq_run_hw_queue+0x6f/0xe0
[ 1843.075658] ? __check_object_size+0xa0/0x189
[ 1843.076345] ? preempt_count_add+0x5a/0xb0
[ 1843.077086] ? aio_read_events+0x259/0x380
[ 1843.077819] ? kmem_cache_alloc+0x16e/0x1c0
[ 1843.078427] io_submit_one+0x4a8/0x790
[ 1843.078975] ? read_events+0x76/0x150
[ 1843.079510] __se_sys_io_submit+0x98/0x1a0
[ 1843.080116] ? syscall_trace_enter+0x1d3/0x2d0
[ 1843.080785] do_syscall_64+0x55/0x160
[ 1843.081404] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1843.082210] RIP: 0033:0x7f6e571fc4ed
[ 1843.082763] Code: Bad RIP value.
[ 1843.083268] RSP: 002b:7ffc212b76f8 EFLAGS: 0246 ORIG_RAX: 00d1
[ 1843.084445] RAX: ffda RBX: 7f6e4c876870 RCX: 7f6e571fc4ed
[ 1843.085545] RDX: 557c4bc11208 RSI: 0001 RDI: 7f6e4c85e000
[ 1843.086251] RBP: 7f6e4c85e000 R08: 557c4bc2b130 R09: 02f8
[ 1843.087308] R10: 557c4bbf4470 R11: 0246 R12: 0001
[ 1843.088310] R13: R14: 557c4bc11208 R15: 7f6e2b17f070