Re: [Qemu-devel] [PATCH v4] iotests: Tweak 030 in order to trigger a race condition with parallel jobs

2018-03-09 Thread Alberto Garcia
On Fri 09 Mar 2018 01:54:31 PM CET, Max Reitz wrote:
> On 2018-03-08 08:44, Alberto Garcia wrote:
>> On Wed 07 Mar 2018 06:54:51 PM CET, Max Reitz wrote:
 v4: Mention that commit 1a63a907507fbbcfaee3f622907ec24 also
 contributes to solve the original bug (both commits need to
 reverted in order to reproduce this bug reliably).

 Rewrite the loop that writes data into the images to make it more
 readable.
>>>
>>> Thanks!  Applied to my block tree:
>>>
>>> https://github.com/XanClic/qemu/commits/block
>>>
>>> (Still took me a couple of attempts to get it to fail both commits
>>> reverted, though...)
>> 
>> Odd, I can reproduce it 100% of the cases. Were you maybe running the
>> tests on tmpfs ?
>
> ...yes? O:-)

Nothing wrong, I do it often, but I used an actual hard drive to
reproduce this bug.

Berto



Re: [Qemu-devel] [PATCH v4] iotests: Tweak 030 in order to trigger a race condition with parallel jobs

2018-03-09 Thread Max Reitz
On 2018-03-08 08:44, Alberto Garcia wrote:
> On Wed 07 Mar 2018 06:54:51 PM CET, Max Reitz wrote:
>>> v4: Mention that commit 1a63a907507fbbcfaee3f622907ec24 also
>>> contributes to solve the original bug (both commits need to
>>> reverted in order to reproduce this bug reliably).
>>>
>>> Rewrite the loop that writes data into the images to make it more
>>> readable.
>>
>> Thanks!  Applied to my block tree:
>>
>> https://github.com/XanClic/qemu/commits/block
>>
>> (Still took me a couple of attempts to get it to fail both commits
>> reverted, though...)
> 
> Odd, I can reproduce it 100% of the cases. Were you maybe running the
> tests on tmpfs ?

...yes? O:-)

Max

> Anyway, thanks!




signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v4] iotests: Tweak 030 in order to trigger a race condition with parallel jobs

2018-03-07 Thread Alberto Garcia
On Wed 07 Mar 2018 06:54:51 PM CET, Max Reitz wrote:
>> v4: Mention that commit 1a63a907507fbbcfaee3f622907ec24 also
>> contributes to solve the original bug (both commits need to
>> reverted in order to reproduce this bug reliably).
>> 
>> Rewrite the loop that writes data into the images to make it more
>> readable.
>
> Thanks!  Applied to my block tree:
>
> https://github.com/XanClic/qemu/commits/block
>
> (Still took me a couple of attempts to get it to fail both commits
> reverted, though...)

Odd, I can reproduce it 100% of the cases. Were you maybe running the
tests on tmpfs ?

Anyway, thanks!

Berto



Re: [Qemu-devel] [PATCH v4] iotests: Tweak 030 in order to trigger a race condition with parallel jobs

2018-03-07 Thread Max Reitz
On 2018-03-06 14:01, Alberto Garcia wrote:
> This patch tweaks TestParallelOps in iotest 030 so it allocates data
> in smaller regions (256KB/512KB instead of 512KB/1MB) and the
> block-stream job in test_stream_commit() only needs to copy data that
> is at the very end of the image.
> 
> This way when the block-stream job is awakened it will finish right
> away without any chance of being stopped by block_job_sleep_ns(). This
> triggers the bug that was fixed by 3d5d319e1221082974711af1d09d82f and
> 1a63a907507fbbcfaee3f622907ec24 and is therefore a more useful test
> case for parallel block jobs.
> 
> After this patch the aforementiond bug can also be reproduced with the
> test_stream_parallel() test case.
> 
> Since with this change the stream job in test_stream_commit() finishes
> early, this patch introduces a similar test case where both jobs are
> slowed down so they can actually run in parallel.
> 
> Signed-off-by: Alberto Garcia 
> Cc: John Snow 
> ---
> 
> v4: Mention that commit 1a63a907507fbbcfaee3f622907ec24 also
> contributes to solve the original bug (both commits need to
> reverted in order to reproduce this bug reliably).
> 
> Rewrite the loop that writes data into the images to make it more
> readable.

Thanks!  Applied to my block tree:

https://github.com/XanClic/qemu/commits/block

(Still took me a couple of attempts to get it to fail both commits
reverted, though...)

Max



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH v4] iotests: Tweak 030 in order to trigger a race condition with parallel jobs

2018-03-06 Thread Alberto Garcia
This patch tweaks TestParallelOps in iotest 030 so it allocates data
in smaller regions (256KB/512KB instead of 512KB/1MB) and the
block-stream job in test_stream_commit() only needs to copy data that
is at the very end of the image.

This way when the block-stream job is awakened it will finish right
away without any chance of being stopped by block_job_sleep_ns(). This
triggers the bug that was fixed by 3d5d319e1221082974711af1d09d82f and
1a63a907507fbbcfaee3f622907ec24 and is therefore a more useful test
case for parallel block jobs.

After this patch the aforementiond bug can also be reproduced with the
test_stream_parallel() test case.

Since with this change the stream job in test_stream_commit() finishes
early, this patch introduces a similar test case where both jobs are
slowed down so they can actually run in parallel.

Signed-off-by: Alberto Garcia 
Cc: John Snow 
---

v4: Mention that commit 1a63a907507fbbcfaee3f622907ec24 also
contributes to solve the original bug (both commits need to
reverted in order to reproduce this bug reliably).

Rewrite the loop that writes data into the images to make it more
readable.

---
 tests/qemu-iotests/030 | 52 ++
 tests/qemu-iotests/030.out |  4 ++--
 2 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 457984b8e9..b5f88959aa 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -156,7 +156,7 @@ class TestSingleDrive(iotests.QMPTestCase):
 class TestParallelOps(iotests.QMPTestCase):
 num_ops = 4 # Number of parallel block-stream operations
 num_imgs = num_ops * 2 + 1
-image_len = num_ops * 1024 * 1024
+image_len = num_ops * 512 * 1024
 imgs = []
 
 def setUp(self):
@@ -176,14 +176,14 @@ class TestParallelOps(iotests.QMPTestCase):
  '-o', 'backing_file=%s' % self.imgs[i-1], self.imgs[i])
 
 # Put data into the images we are copying data from
-for i in range(self.num_imgs / 2):
-img_index = i * 2 + 1
-# Alternate between 512k and 1M.
+odd_img_indexes = [x for x in reversed(range(self.num_imgs)) if x % 2 
== 1]
+for i in range(len(odd_img_indexes)):
+# Alternate between 256KB and 512KB.
 # This way jobs will not finish in the same order they were created
-num_kb = 512 + 512 * (i % 2)
+num_kb = 256 + 256 * (i % 2)
 qemu_io('-f', iotests.imgfmt,
-'-c', 'write -P %d %d %d' % (i, i*1024*1024, num_kb * 
1024),
-self.imgs[img_index])
+'-c', 'write -P 0xFF %dk %dk' % (i * 512, num_kb),
+self.imgs[odd_img_indexes[i]])
 
 # Attach the drive to the VM
 self.vm = iotests.VM()
@@ -318,12 +318,14 @@ class TestParallelOps(iotests.QMPTestCase):
 self.wait_until_completed(drive='commit-drive0')
 
 # Test a block-stream and a block-commit job in parallel
-def test_stream_commit(self):
+# Here the stream job is supposed to finish quickly in order to reproduce
+# the scenario that triggers the bug fixed in 3d5d319e1221 and 1a63a907507
+def test_stream_commit_1(self):
 self.assertLessEqual(8, self.num_imgs)
 self.assert_no_active_block_jobs()
 
 # Stream from node0 into node2
-result = self.vm.qmp('block-stream', device='node2', job_id='node2')
+result = self.vm.qmp('block-stream', device='node2', 
base_node='node0', job_id='node2')
 self.assert_qmp(result, 'return', {})
 
 # Commit from the active layer into node3
@@ -348,6 +350,38 @@ class TestParallelOps(iotests.QMPTestCase):
 
 self.assert_no_active_block_jobs()
 
+# This is similar to test_stream_commit_1 but both jobs are slowed
+# down so they can run in parallel for a little while.
+def test_stream_commit_2(self):
+self.assertLessEqual(8, self.num_imgs)
+self.assert_no_active_block_jobs()
+
+# Stream from node0 into node4
+result = self.vm.qmp('block-stream', device='node4', 
base_node='node0', job_id='node4', speed=1024*1024)
+self.assert_qmp(result, 'return', {})
+
+# Commit from the active layer into node5
+result = self.vm.qmp('block-commit', device='drive0', 
base=self.imgs[5], speed=1024*1024)
+self.assert_qmp(result, 'return', {})
+
+# Wait for all jobs to be finished.
+pending_jobs = ['node4', 'drive0']
+while len(pending_jobs) > 0:
+for event in self.vm.get_qmp_events(wait=True):
+if event['event'] == 'BLOCK_JOB_COMPLETED':
+node_name = self.dictpath(event, 'data/device')
+self.assertTrue(node_name in pending_jobs)
+self.assert_qmp_absent(event, 'data/error')
+pending_jobs.remove(node_name)
+if event['event'] == '