date:20190919

Re: [Qemu-devel] [PATCH v12 1/2] block/backup: fix max_transfer handling for copy_range

2019-09-19 Thread John Snow




On 9/19/19 2:50 AM, Vladimir Sementsov-Ogievskiy wrote:
> 18.09.2019 22:57, John Snow wrote:
>>
>>
>> On 9/17/19 12:07 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> Of course, QEMU_ALIGN_UP is a typo, it should be QEMU_ALIGN_DOWN, as we
>>> are trying to find aligned size which satisfy both source and target.
>>> Also, don't ignore too small max_transfer. In this case seems safer to
>>> disable copy_range.
>>>
>>> Fixes: 9ded4a0114968e
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   block/backup.c | 12 
>>>   1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 763f0d7ff6..d8fdbfadfe 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -741,12 +741,16 @@ BlockJob *backup_job_create(const char *job_id, 
>>> BlockDriverState *bs,
>>>   job->cluster_size = cluster_size;
>>>   job->copy_bitmap = copy_bitmap;
>>>   copy_bitmap = NULL;
>>> -    job->use_copy_range = !compress; /* compression isn't supported for it 
>>> */
>>>   job->copy_range_size = 
>>> MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
>>>   
>>> blk_get_max_transfer(job->target));
>>> -    job->copy_range_size = MAX(job->cluster_size,
>>> -   QEMU_ALIGN_UP(job->copy_range_size,
>>> - job->cluster_size));
>>> +    job->copy_range_size = QEMU_ALIGN_DOWN(job->copy_range_size,
>>> +   job->cluster_size);
>>> +    /*
>>> + * Compression is not supported for copy_range. Also, we don't want to
>>> + * handle small max_transfer for copy_range (which currently don't
>>> + * handle max_transfer at all).
>>> + */
>>> +    job->use_copy_range = !compress && job->copy_range_size > 0;
>>>   /* Required permissions are already taken with target's blk_new() */
>>>   block_job_add_bdrv(>common, "target", target, 0, BLK_PERM_ALL,
>>>
>>
>> I'm clear on the alignment fix, I'm not clear on the comment about 
>> max_transfer and how it relates to copy_range_size being non-zero.
>>
>> "small max transfer" -- what happens when it's zero? we're apparently OK 
>> with a single cluster, but when it's zero, what happens?
> 
> if it zero it means that source or target requires max_transfer less than 
> cluster_size. It seems not valid to call copy_range in this case.
> Still it's OK to use normal read/write, as they handle max_transfer 
> internally in a loop (copy_range doesn't do it).
> 

oh, I'm ... sorry, I just didn't quite understand the comment.

You're just making sure copy_range after all of our checks is non-zero,
plain and simple. If max_transfer was *smaller than a job cluster*, we
might end up with a copy_range size that's zero, which is obviously...
not useful.

So, I might phrase "Also, we don't want to..." as:

"copy_range does not respect max_transfer, so we factor that in here. If
it's smaller than the job->cluster_size, we are unable to use copy_range."

Just a suggestion, though, so:

Reviewed-by: John Snow 


(SHOULD copy_range respect max_transfer? I guess it would be quite
different -- it would only count things it had to fall back and actually
*transfer*, right? I suppose that because it can have that fallback we
need to accommodate it here in backup.c, hence this workaround clamp.)

Re: [Qemu-devel] [PATCH v12 2/2] block/backup: fix backup_cow_with_offload for last cluster

2019-09-19 Thread John Snow




On 9/19/19 3:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> 18.09.2019 23:14, John Snow wrote:
>>
>>
>> On 9/17/19 12:07 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> We shouldn't try to copy bytes beyond EOF. Fix it.
>>>
>>> Fixes: 9ded4a0114968e
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> Reviewed-by: Max Reitz 
>>> ---
>>>   block/backup.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index d8fdbfadfe..89f7f89200 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -161,7 +161,7 @@ static int coroutine_fn 
>>> backup_cow_with_offload(BackupBlockJob *job,
>>>   assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
>>>   assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>>> -    nbytes = MIN(job->copy_range_size, end - start);
>>> +    nbytes = MIN(job->copy_range_size, MIN(end, job->len) - start);
>>
>> I'm a little confused. I think the patch as written is correct, but I don't 
>> know what problem it solves.
> 
> last cluster may exceed EOF. And backup_do_cow (who calls  
> backup_cow_with_offload) rounds all to clusters.
> It's not bad, but we need to crop nbytes before calling actual io functions. 
> backup_cow_with_bounce_buffer does the same thing.
> 
>>
>> If we're going to allow the caller to pass in an end that's beyond EOF, does 
>> that mean we are *requiring* the caller to pass in a value that's aligned?
> 
> Actually yes, as we are resetting dirty bitmap.
> 
>>
>> We should probably assert what kind of a value we're accepted here, right? 
>> We do for start, but should we for 'end' as well?
> 
> Yes assertion may be added.
> 
>>
>> Then ...
>>
>>>   nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
>>
>> Don't we just round this right back up immediately anyway? Does that mean we 
>> have callers that are passing in an 'end' that's more than 1 job-cluster 
>> beyond EOF? That seems like something that should be fixed in the caller, 
>> surely?
> 
> nr_clusters are used to set/reset dirty bitmap. It's OK. Still, for last 
> cluster we can drop it and use nbytes directly. No there should not be such 
> callers.
> nbytes is used to call blk_co_copy_range, and must be cropped to not exceed 
> EOF.
> 

Ah, right, right ... I *was* confused. We don't use nr_clusters for the
IO itself, just the bitmap. So we effectively re-calculate aligned and
unaligned values for use in different places.

> Also, backup_cow_with_bounce_buffer behave in similar way: it crops nbytes.
> 
> Of course, there is a place for good refactoring, but I think not in this 
> patch, it's a small bug fix.
> 
>>
>>>   bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
>>>   job->cluster_size * nr_clusters);
>>>
>>
> 
> 

We should make the interface here a little more clear I think, but what
you wrote is correct.

Reviewed-by: John Snow

Re: [RFC 4/4] ahci media error reporting

2019-09-19 Thread Tony Asleson

On 9/19/19 3:43 PM, John Snow wrote:
> 
> 
> On 9/19/19 3:48 PM, Tony Asleson wrote:
>> Initial attempt at returning a media error for ahci.  This is certainly
>> wrong and needs serious improvement.
>>
> 
> Hi; I have the unfortunate distinction of being the AHCI maintainer.
> Please CC me on future revisions and discussion if you are interacting
> with my problem child.

Will do and thank you for taking a look at this!

> Also remember to CC qemu-block.
> 
>> Signed-off-by: Tony Asleson 
>> ---
>>  hw/ide/ahci.c | 27 +++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
>> index d45393c019..f487764106 100644
>> --- a/hw/ide/ahci.c
>> +++ b/hw/ide/ahci.c
>> @@ -36,6 +36,7 @@
>>  #include "hw/ide/internal.h"
>>  #include "hw/ide/pci.h"
>>  #include "ahci_internal.h"
>> +#include "block/error_inject.h"
>>  
>>  #include "trace.h"
>>  
>> @@ -999,6 +1000,22 @@ static void ncq_err(NCQTransferState *ncq_tfs)
>>  ncq_tfs->used = 0;
>>  }
>>  
>> +/*
>> + * Figure out correct way to report media error, this is at best a guess
>> + * and based on the output of linux kernel, not even remotely close.
>> + */
> 
> Depends on what kind of error, exactly, you're trying to report, and at
> what level. (AHCI, NCQ, SATA, ATA?)

I was trying to return a media error, like a 3/1101 for SCSI device.  I
think that is at the ATA level?


> Keep in mind that you've inserted an error path for SATA drives using
> NCQ, but seemingly haven't touched SATA drives not using NCQ, or ATAPI
> devices using either PATA/SATA, or ATA drives on PATA.

Correct, but for trying out a simple read on a SATA drive for Linux I
end up here first :-)  Well, until the kernel retries a number of times
and finally gives up and returns an error while also disabling NCQ for
the device.


>> +static void ncq_media_err(NCQTransferState *ncq_tfs, uint64_t err_sector)
>> +{
>> +IDEState *ide_state = _tfs->drive->port.ifs[0];
>> +
>> +ide_state->error = ECC_ERR;
>> +ide_state->status = READY_STAT | ERR_STAT;
>> +ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
>> +ncq_tfs->lba = err_sector;
>> +qemu_sglist_destroy(_tfs->sglist);
>> +ncq_tfs->used = 0;
>> +}
>> +
> 
> If you are definitely very sure you only want an ide_state error
> difference, you could just as well call ncq_err and then patch
> ide_state->error.
> 
> (I am not sure that's what you want, but... maybe it is?)

As I mentioned above, return an unrecoverable media error.

SCSI sense data can report the first sector in error and I thought I
could do the same for SATA too?

> I'd have to check -- because I can't say the AHCI emulator was designed
> so much as congealed -- but you might need calls to ncq_finish.
> 
> usually, ncq_cb handles the return from any NCQ command and will call
> ncq_err and ncq_finish as appropriate to tidy up the command.
> 
> It might be a mistake that execute_ncq_command issues ncq_err in the
> `default` arm of the switch statement without a call to finish.
> 
> If we do call ncq_finish from this context I'm not sure if we want
> block_acct_done here unconditionally. We may not have started a block
> accounting operation if we never started a backend operation. Everything
> else looks about right to me.
> 
> 
>>  static void ncq_finish(NCQTransferState *ncq_tfs)
>>  {
>>  /* If we didn't error out, set our finished bit. Errored commands
>> @@ -1065,6 +1082,8 @@ static void execute_ncq_command(NCQTransferState 
>> *ncq_tfs)
>>  {
>>  AHCIDevice *ad = ncq_tfs->drive;
>>  IDEState *ide_state = >port.ifs[0];
>> +uint64_t error_sector = 0;
>> +char device_id[32];
>>  int port = ad->port_no;
>>  
>>  g_assert(is_ncq(ncq_tfs->cmd));
>> @@ -1072,6 +1091,14 @@ static void execute_ncq_command(NCQTransferState 
>> *ncq_tfs)
>>  
>>  switch (ncq_tfs->cmd) {
>>  case READ_FPDMA_QUEUED:
>> +sprintf(device_id, "%lu", ide_state->wwn);
> 
> This seems suspicious for your design in general, but I'd like to not
> run sprintf to a buffer in the hotpath for NCQ.

I totally agree.

I started out using integers in the call for error_in_read, as that is
what SCSI uses internally for wwn.  Then I did NVMe and it's using a
string that doesn't apparently need to be an integer for the wwn? so I
changed it to being a string to accommodate.

I also find it interesting that when a SATA device wwid is dumped out
within the guest it doesn't appear to have any correlation with the wwn
that was passed in on the command line, eg.

-device ide-drive,drive=satadisk,bus=ahci.0,wwn=8675309

$cat /sys/block/sda/device/wwid
t10.ATA QEMU HARDDISK   QM5


This does correlate for SCSI

-device scsi-hd,drive=hd,wwn=12345678

$ cat /sys/block/sdc/device/wwid
naa.00bc614e


as 0xbc614e = 12345678


For NVMe, the wwn is in the wwid, but it's not immediately obvious.

Being able to correlate between the command line and what you find in

Re: [RFC 4/4] ahci media error reporting

2019-09-19 Thread John Snow

On 9/19/19 3:48 PM, Tony Asleson wrote:
> Initial attempt at returning a media error for ahci.  This is certainly
> wrong and needs serious improvement.
> 

Hi; I have the unfortunate distinction of being the AHCI maintainer.
Please CC me on future revisions and discussion if you are interacting
with my problem child.

Also remember to CC qemu-block.

> Signed-off-by: Tony Asleson 
> ---
>  hw/ide/ahci.c | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index d45393c019..f487764106 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -36,6 +36,7 @@
>  #include "hw/ide/internal.h"
>  #include "hw/ide/pci.h"
>  #include "ahci_internal.h"
> +#include "block/error_inject.h"
>  
>  #include "trace.h"
>  
> @@ -999,6 +1000,22 @@ static void ncq_err(NCQTransferState *ncq_tfs)
>  ncq_tfs->used = 0;
>  }
>  
> +/*
> + * Figure out correct way to report media error, this is at best a guess
> + * and based on the output of linux kernel, not even remotely close.
> + */

Depends on what kind of error, exactly, you're trying to report, and at
what level. (AHCI, NCQ, SATA, ATA?)

Keep in mind that you've inserted an error path for SATA drives using
NCQ, but seemingly haven't touched SATA drives not using NCQ, or ATAPI
devices using either PATA/SATA, or ATA drives on PATA.

> +static void ncq_media_err(NCQTransferState *ncq_tfs, uint64_t err_sector)
> +{
> +IDEState *ide_state = _tfs->drive->port.ifs[0];
> +
> +ide_state->error = ECC_ERR;
> +ide_state->status = READY_STAT | ERR_STAT;
> +ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
> +ncq_tfs->lba = err_sector;
> +qemu_sglist_destroy(_tfs->sglist);
> +ncq_tfs->used = 0;
> +}
> +

If you are definitely very sure you only want an ide_state error
difference, you could just as well call ncq_err and then patch
ide_state->error.

(I am not sure that's what you want, but... maybe it is?)

I'd have to check -- because I can't say the AHCI emulator was designed
so much as congealed -- but you might need calls to ncq_finish.

usually, ncq_cb handles the return from any NCQ command and will call
ncq_err and ncq_finish as appropriate to tidy up the command.

It might be a mistake that execute_ncq_command issues ncq_err in the
`default` arm of the switch statement without a call to finish.

If we do call ncq_finish from this context I'm not sure if we want
block_acct_done here unconditionally. We may not have started a block
accounting operation if we never started a backend operation. Everything
else looks about right to me.

>  static void ncq_finish(NCQTransferState *ncq_tfs)
>  {
>  /* If we didn't error out, set our finished bit. Errored commands
> @@ -1065,6 +1082,8 @@ static void execute_ncq_command(NCQTransferState 
> *ncq_tfs)
>  {
>  AHCIDevice *ad = ncq_tfs->drive;
>  IDEState *ide_state = >port.ifs[0];
> +uint64_t error_sector = 0;
> +char device_id[32];
>  int port = ad->port_no;
>  
>  g_assert(is_ncq(ncq_tfs->cmd));
> @@ -1072,6 +1091,14 @@ static void execute_ncq_command(NCQTransferState 
> *ncq_tfs)
>  
>  switch (ncq_tfs->cmd) {
>  case READ_FPDMA_QUEUED:
> +sprintf(device_id, "%lu", ide_state->wwn);

This seems suspicious for your design in general, but I'd like to not
run sprintf to a buffer in the hotpath for NCQ.

If you need this, I'd do it when wwn is set and just grab it from the
state structure.

> +
> +if (error_in_read(device_id, ncq_tfs->lba,
> +ncq_tfs->sector_count, _sector)) {
> +ncq_media_err(ncq_tfs, error_sector);
> +return;
> +}
> +

One of the downsides to trying to trigger read error injections
per-device instead of per-node is that now you have to goof around with
device-specific code everywhere.

I suppose from your cover letter you *WANT* device-specific error
exercising, which would necessitate a different design from blkdebug,
but it means you have to add support for the framework per-device and it
might be very tricky to get right.

>  trace_execute_ncq_command_read(ad->hba, port, ncq_tfs->tag,
> ncq_tfs->sector_count, ncq_tfs->lba);
>  dma_acct_start(ide_state->blk, _tfs->acct,
>

Re: [PATCH v2 2/2] iotests: Remove Python 2 compatibility code

2019-09-19 Thread John Snow




On 9/19/19 12:29 PM, Kevin Wolf wrote:
> Some scripts check the Python version number and have two code paths to
> accomodate both Python 2 and 3. Remove the code specific to Python 2 and
> assert the minimum version of 3.6 instead (check skips Python tests in
> this case, so the assertion would only ever trigger if a Python script
> is executed manually).
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/044   |  3 ---
>  tests/qemu-iotests/163   |  3 ---
>  tests/qemu-iotests/iotests.py| 13 +++--
>  tests/qemu-iotests/nbd-fault-injector.py |  7 +++
>  4 files changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/tests/qemu-iotests/044 b/tests/qemu-iotests/044
> index 05ea1f49c5..8b2afa2a11 100755
> --- a/tests/qemu-iotests/044
> +++ b/tests/qemu-iotests/044
> @@ -28,9 +28,6 @@ import struct
>  import subprocess
>  import sys
>  
> -if sys.version_info.major == 2:
> -range = xrange
> -
>  test_img = os.path.join(iotests.test_dir, 'test.img')
>  
>  class TestRefcountTableGrowth(iotests.QMPTestCase):
> diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
> index 081ccc8ac1..d94728e080 100755
> --- a/tests/qemu-iotests/163
> +++ b/tests/qemu-iotests/163
> @@ -21,9 +21,6 @@
>  import os, random, iotests, struct, qcow2, sys
>  from iotests import qemu_img, qemu_io, image_size
>  
> -if sys.version_info.major == 2:
> -range = xrange
> -
>  test_img = os.path.join(iotests.test_dir, 'test.img')
>  check_img = os.path.join(iotests.test_dir, 'check.img')
>  
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index b26271187c..9fb5181c3d 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -35,6 +35,7 @@ from collections import OrderedDict
>  sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 
> 'python'))
>  from qemu import qtest
>  
> +assert sys.version_info >= (3,6)
>  
>  # This will not work if arguments contain spaces but is necessary if we
>  # want to support the override options that ./check supports.
> @@ -250,10 +251,7 @@ def image_size(img):
>  return json.loads(r)['virtual-size']
>  
>  def is_str(val):
> -if sys.version_info.major >= 3:
> -return isinstance(val, str)
> -else:
> -return isinstance(val, str) or isinstance(val, unicode)
> +return isinstance(val, str)
>  
>  test_dir_re = re.compile(r"%s" % test_dir)
>  def filter_test_dir(msg):
> @@ -935,12 +933,7 @@ def execute_test(test_function=None,
>  else:
>  # We need to filter out the time taken from the output so that
>  # qemu-iotest can reliably diff the results against master output.
> -if sys.version_info.major >= 3:
> -output = io.StringIO()
> -else:
> -# io.StringIO is for unicode strings, which is not what
> -# 2.x's test runner emits.
> -output = io.BytesIO()
> +output = io.StringIO()
>  
>  logging.basicConfig(level=(logging.DEBUG if debug else logging.WARN))
>  
> diff --git a/tests/qemu-iotests/nbd-fault-injector.py 
> b/tests/qemu-iotests/nbd-fault-injector.py
> index 6b2d659dee..43f095ceef 100755
> --- a/tests/qemu-iotests/nbd-fault-injector.py
> +++ b/tests/qemu-iotests/nbd-fault-injector.py
> @@ -48,10 +48,9 @@ import sys
>  import socket
>  import struct
>  import collections
> -if sys.version_info.major >= 3:
> -import configparser
> -else:
> -import ConfigParser as configparser
> +import configparser
> +
> +assert sys.version_info >= (3,6)
>  
>  FAKE_DISK_SIZE = 8 * 1024 * 1024 * 1024 # 8 GB
>  
> 

Reviewed-by: John Snow

Re: [Qemu-block] [Qemu-devel] [PATCH] block/backup: install notifier during creation

2019-09-19 Thread John Snow




On 9/19/19 3:11 AM, Vladimir Sementsov-Ogievskiy wrote:
> 18.09.2019 23:31, John Snow wrote:
>>
>>
>> On 9/10/19 9:23 AM, John Snow wrote:
>>>
>>>
>>> On 9/10/19 4:19 AM, Stefan Hajnoczi wrote:
 On Wed, Aug 21, 2019 at 04:01:52PM -0400, John Snow wrote:
>
>
> On 8/21/19 10:41 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 09.08.2019 23:13, John Snow wrote:
>>> Backup jobs may yield prior to installing their handler, because of the
>>> job_co_entry shim which guarantees that a job won't begin work until
>>> we are ready to start an entire transaction.
>>>
>>> Unfortunately, this makes proving correctness about transactional
>>> points-in-time for backup hard to reason about. Make it explicitly clear
>>> by moving the handler registration to creation time, and changing the
>>> write notifier to a no-op until the job is started.
>>>
>>> Reported-by: Vladimir Sementsov-Ogievskiy 
>>> Signed-off-by: John Snow 
>>> ---
>>>block/backup.c | 32 +++-
>>>include/qemu/job.h |  5 +
>>>job.c  |  2 +-
>>>3 files changed, 29 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/block/backup.c b/block/backup.c
>>> index 07d751aea4..4df5b95415 100644
>>> --- a/block/backup.c
>>> +++ b/block/backup.c
>>> @@ -344,6 +344,13 @@ static int coroutine_fn backup_before_write_notify(
>>>assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
>>>assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
>>>
>>> +/* The handler is installed at creation time; the actual 
>>> point-in-time
>>> + * starts at job_start(). Transactions guarantee those two points 
>>> are
>>> + * the same point in time. */
>>> +if (!job_started(>common.job)) {
>>> +return 0;
>>> +}
>>
>> Hmm, sorry if it is a stupid question, I'm not good in multiprocessing 
>> and in
>> Qemu iothreads..
>>
>> job_started just reads job->co. If bs runs in iothread, and therefore 
>> write-notifier
>> is in iothread, when job_start is called from main thread.. Is it 
>> guaranteed that
>> write-notifier will see job->co variable change early enough to not miss 
>> guest write?
>> Should not job->co be volatile for example or something like this?
>>
>> If not think about this patch looks good for me.
>>
>
> You know, it's a really good question.
> So good, in fact, that I have no idea.
>
> ¯\_(ツ)_/¯
>
> I'm fairly certain that IO will not come in until the .clean phase of a
> qmp_transaction, because bdrv_drained_begin(bs) is called during
> .prepare, and we activate the handler (by starting the job) in .commit.
> We do not end the drained section until .clean.
>
> I'm not fully clear on what threading guarantees we have otherwise,
> though; is it possible that "Thread A" would somehow lift the bdrv_drain
> on an IO thread ("Thread B") and, after that, "Thread B" would somehow
> still be able to see an outdated version of job->co that was set by
> "Thread A"?
>
> I doubt it; but I can't prove it.

 In the qmp_backup() case (not qmp_transaction()) there is:

void qmp_drive_backup(DriveBackup *arg, Error **errp)
{

BlockJob *job;
job = do_drive_backup(arg, NULL, errp);
if (job) {
job_start(>job);
}
}

 job_start() is called without any thread synchronization, which is
 usually fine because the coroutine doesn't run until job_start() calls
 aio_co_enter().

 Now that the before write notifier has been installed early, there is
 indeed a race between job_start() and the write notifier accessing
 job->co from an IOThread.

 The write before notifier might see job->co != NULL before job_start()
 has finished.  This could lead to issues if job_*() APIs are invoked by
 the write notifier and access an in-between job state.

>>>
>>> I see. I think in this case, as long as it sees != NULL, that the
>>> notifier is actually safe to run. I agree that this might be confusing
>>> to verify and could bite us in the future. The worry we had, too, is
>>> more the opposite: will it see NULL for too long? We want to make sure
>>> that it is registering as true *before the first yield*.
>>>
 A safer approach is to set a BackupBlockJob variable at the beginning of
 backup_run() and check it from the before write notifier.

>>>
>>> That's too late, for reasons below.
>>>
 That said, I don't understand the benefit of this patch and IMO it makes
 the code harder to understand because now we need to think about the
 created but not started state too.

 Stefan

>>>
>>> It's always possible I've hyped myself up into believing

bugfix ping Re: [PATCH v4 00/10] qcow2-bitmaps: rewrite reopening logic

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

ping
05.09.2019 12:54, Vladimir Sementsov-Ogievskiy wrote:
> ping
> 
> 07.08.2019 17:12, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> Bitmaps reopening is buggy, reopening-rw just not working at all and
>> reopening-ro may lead to producing broken incremental
>> backup if we do temporary snapshot in a meantime.
>>
>> v4: Drop complicated solution around reopening logic [Kevin], fix
>>  the existing bug in a simplest way
>>
>> Structure:
>>
>> 02: fix reopen to RW
>> 03: test reopen to RW
>>
>> 07: fix reopen to RO
>> 08: test reopen to RO
>>
>> Others are less significant improvements and refactoring
>>
>> Changelog:
>>
>> 01-03: new patches, to fix reopening bitmaps to RW and personal test for
>>     this bug
>> 08: merged test from v3, it covers both bugs (reopen to RW and reopen to RO)
>> 10: instead of moving bitmap-reopening to prepare(as in 09 in v3) we now 
>> keep it
>>  in commit, but in right place
>> others: unchanged
>>
>> v3:
>> 02: John's events_wait already merged in, so my 02 from v2 is not needed.
>>  Instead, add two simple logging wrappers here
>> 03: rebase on 02 - use new wrappers, move to 260
>> 05: add John's r-b
>> 06: improve function docs [John], add John's r-b
>>
>> v2:
>> 01: new
>> 02-03: test: splat into two patches, some wording
>>     improvements and event_wait improved
>> 04: add John's r-b
>> 05: new
>> 06-09: fixes: changed, splat, use patch 01
>>
>> Vladimir Sementsov-Ogievskiy (10):
>>    block: switch reopen queue from QSIMPLEQ to QTAILQ
>>    block: reverse order for reopen commits
>>    iotests: add test-case to 165 to test reopening qcow2 bitmaps to RW
>>    iotests.py: add event_wait_log and events_wait_log helpers
>>    block/qcow2-bitmap: get rid of bdrv_has_changed_persistent_bitmaps
>>    block/qcow2-bitmap: drop qcow2_reopen_bitmaps_rw_hint()
>>    block/qcow2-bitmap: do not remove bitmaps on reopen-ro
>>    iotests: add test 260 to check bitmap life after snapshot + commit
>>    block/qcow2-bitmap: fix and improve qcow2_reopen_bitmaps_rw
>>    qcow2-bitmap: move bitmap reopen-rw code to qcow2_reopen_commit
>>
>>   block/qcow2.h |   5 +-
>>   include/block/block.h |   2 +-
>>   include/block/block_int.h |   6 --
>>   include/block/dirty-bitmap.h  |   1 -
>>   block.c   |  51 +---
>>   block/dirty-bitmap.c  |  12 ---
>>   block/qcow2-bitmap.c  | 143 --
>>   block/qcow2.c |  17 +++-
>>   tests/qemu-iotests/165    |  46 ++-
>>   tests/qemu-iotests/165.out    |   4 +-
>>   tests/qemu-iotests/260    |  87 +
>>   tests/qemu-iotests/260.out    |  52 +
>>   tests/qemu-iotests/group  |   1 +
>>   tests/qemu-iotests/iotests.py |  10 +++
>>   14 files changed, 318 insertions(+), 119 deletions(-)
>>   create mode 100755 tests/qemu-iotests/260
>>   create mode 100644 tests/qemu-iotests/260.out
>>
> 
> 


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH 4/4] iotests: Add test for failing mirror complete

2019-09-19 Thread Max Reitz

On 19.09.19 19:02, John Snow wrote:
> 
> 
> On 9/19/19 12:58 PM, Max Reitz wrote:
>> On 18.09.19 20:46, John Snow wrote:
>>>
>>>
>>> On 9/12/19 9:56 AM, Max Reitz wrote:
 Signed-off-by: Max Reitz 
 ---
   tests/qemu-iotests/041 | 44 ++
   tests/qemu-iotests/041.out |  4 ++--
   2 files changed, 46 insertions(+), 2 deletions(-)

 diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
 index 8568426311..84bc6d6581 100755
 --- a/tests/qemu-iotests/041
 +++ b/tests/qemu-iotests/041
 @@ -1121,6 +1121,50 @@ class TestOrphanedSource(iotests.QMPTestCase):
    target='dest-ro')
   self.assert_qmp(result, 'error/class', 'GenericError')
   +    def test_failing_permission_in_complete(self):
 +    self.assert_no_active_block_jobs()
 +
 +    # Unshare consistent-read on the target
 +    # (The mirror job does not care)
 +    result = self.vm.qmp('blockdev-add',
 + driver='blkdebug',
 + node_name='dest-perm',
 + image='dest',
 + unshare_child_perms=['consistent-read'])
 +    self.assert_qmp(result, 'return', {})
 +
 +    result = self.vm.qmp('blockdev-mirror', job_id='job',
 device='src',
 + sync='full', target='dest',
 + filter_node_name='mirror-filter')
 +    self.assert_qmp(result, 'return', {})
 +
 +    # Require consistent-read on the source
 +    # (We can only add this node once the job has started, or it
 +    # will complain that it does not want to run on non-root nodes)
 +    result = self.vm.qmp('blockdev-add',
 + driver='blkdebug',
 + node_name='src-perm',
 + image='src',
 + take_child_perms=['consistent-read'])
 +    self.assert_qmp(result, 'return', {})
 +
 +    # While completing, mirror will attempt to replace src by
 +    # dest, which must fail because src-perm requires
 +    # consistent-read but dest-perm does not share it; thus
 +    # aborting the job when it is supposed to complete
 +    self.complete_and_wait('job',
 +   completion_error='Operation not
 permitted')
 +
 +    # Assert that all of our nodes are still there (except for the
 +    # mirror filter, which should be gone despite the failure)
 +    nodes = self.vm.qmp('query-named-block-nodes')['return']
 +    nodes = list(map(lambda image: image['node-name'], nodes))
>>>
>>> Sadly, the list comprehension is a good suggestion. Squash it in if
>>> you'd like.
>>
>> You know I don’t, but I’ll do it anyway.
>>
> 
> Don't you dare make me feel bad by re-spinning, though.

I have to respin anyway. ;-)

 +
 +    for expect in ['src', 'src-perm', 'dest', 'dest-perm']:
 +    self.assertTrue(expect in nodes, '%s disappeared' % expect)
>>>
>>> I could be a real weenie and say "why not use a tuple here?"
>>
>> OK.
>>
>>> but, I'll only pretend I didn't say that instead of couching it in a
>>> self-deprecating wrapper to the same end effect.
>>>
>>> (I'm a weenie.)
>>>
 +    self.assertFalse('mirror-filter' in nodes,
 + 'Mirror filter node did not disappear')
 +
   if __name__ == '__main__':
   iotests.main(supported_fmts=['qcow2', 'qed'],
    supported_protocols=['file'])
 diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
 index 2c448b4239..f496be9197 100644
 --- a/tests/qemu-iotests/041.out
 +++ b/tests/qemu-iotests/041.out
 @@ -1,5 +1,5 @@
 -..

 +...

   --
 -Ran 90 tests
 +Ran 91 tests
     OK

>>>
>>> Or don't do anything, because I'm just being obnoxious, and I owe you
>>> one for giving you bad advice last round.
>>
>> Come on, I have better (more selfish) reasons.
>>
>> For the list comprehension: I want to introduce as many instances of
>> map/filter as I can so using those functions becomes normal.
>>
> 
> They have their uses! But also they're usually just simply longer and
> aren't worth using where list comprehensions do.

The point is that they’re special language constructs whereas map/filter
are available in basically any decent language.

>> For the tuple: This is a test, nobody cares whether it uses 60 bytes
>> more memory or is 10 µs slower or lets

Re: [PATCH v2 0/2] iotests: Require Python 3.6 or later

2019-09-19 Thread Philippe Mathieu-Daudé

On 9/19/19 6:29 PM, Kevin Wolf wrote:
> v2:
> 
> - Provide the right exit code from Python instead of having a
>   potentially confusing negation in the shell script
> 
> - Raised the minimal version to 3.6. If we're going to use a different
>   version than QEMU as a whole anyway, we can use a version that suits
>   us best. 3.5 would only be for Debian Stretch, and we don't really
>   care that much about running Python test cases on it.
> 
> - Added a patch to remove Python 2 compatibility code
> 
> Kevin Wolf (2):
>   iotests: Require Python 3.6 or later
>   iotests: Remove Python 2 compatibility code
> 
>  tests/qemu-iotests/044   |  3 ---
>  tests/qemu-iotests/163   |  3 ---
>  tests/qemu-iotests/check | 13 -
>  tests/qemu-iotests/iotests.py| 13 +++--
>  tests/qemu-iotests/nbd-fault-injector.py |  7 +++
>  5 files changed, 18 insertions(+), 21 deletions(-)

Reviewed-by: Philippe Mathieu-Daudé

Re: [Qemu-devel] [PATCH 4/4] iotests: Add test for failing mirror complete

2019-09-19 Thread John Snow




On 9/19/19 12:58 PM, Max Reitz wrote:
> On 18.09.19 20:46, John Snow wrote:
>>
>>
>> On 9/12/19 9:56 AM, Max Reitz wrote:
>>> Signed-off-by: Max Reitz 
>>> ---
>>>   tests/qemu-iotests/041 | 44 ++
>>>   tests/qemu-iotests/041.out |  4 ++--
>>>   2 files changed, 46 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
>>> index 8568426311..84bc6d6581 100755
>>> --- a/tests/qemu-iotests/041
>>> +++ b/tests/qemu-iotests/041
>>> @@ -1121,6 +1121,50 @@ class TestOrphanedSource(iotests.QMPTestCase):
>>>    target='dest-ro')
>>>   self.assert_qmp(result, 'error/class', 'GenericError')
>>>   +    def test_failing_permission_in_complete(self):
>>> +    self.assert_no_active_block_jobs()
>>> +
>>> +    # Unshare consistent-read on the target
>>> +    # (The mirror job does not care)
>>> +    result = self.vm.qmp('blockdev-add',
>>> + driver='blkdebug',
>>> + node_name='dest-perm',
>>> + image='dest',
>>> + unshare_child_perms=['consistent-read'])
>>> +    self.assert_qmp(result, 'return', {})
>>> +
>>> +    result = self.vm.qmp('blockdev-mirror', job_id='job',
>>> device='src',
>>> + sync='full', target='dest',
>>> + filter_node_name='mirror-filter')
>>> +    self.assert_qmp(result, 'return', {})
>>> +
>>> +    # Require consistent-read on the source
>>> +    # (We can only add this node once the job has started, or it
>>> +    # will complain that it does not want to run on non-root nodes)
>>> +    result = self.vm.qmp('blockdev-add',
>>> + driver='blkdebug',
>>> + node_name='src-perm',
>>> + image='src',
>>> + take_child_perms=['consistent-read'])
>>> +    self.assert_qmp(result, 'return', {})
>>> +
>>> +    # While completing, mirror will attempt to replace src by
>>> +    # dest, which must fail because src-perm requires
>>> +    # consistent-read but dest-perm does not share it; thus
>>> +    # aborting the job when it is supposed to complete
>>> +    self.complete_and_wait('job',
>>> +   completion_error='Operation not
>>> permitted')
>>> +
>>> +    # Assert that all of our nodes are still there (except for the
>>> +    # mirror filter, which should be gone despite the failure)
>>> +    nodes = self.vm.qmp('query-named-block-nodes')['return']
>>> +    nodes = list(map(lambda image: image['node-name'], nodes))
>>
>> Sadly, the list comprehension is a good suggestion. Squash it in if
>> you'd like.
> 
> You know I don’t, but I’ll do it anyway.
> 

Don't you dare make me feel bad by re-spinning, though.

>>> +
>>> +    for expect in ['src', 'src-perm', 'dest', 'dest-perm']:
>>> +    self.assertTrue(expect in nodes, '%s disappeared' % expect)
>>
>> I could be a real weenie and say "why not use a tuple here?"
> 
> OK.
> 
>> but, I'll only pretend I didn't say that instead of couching it in a
>> self-deprecating wrapper to the same end effect.
>>
>> (I'm a weenie.)
>>
>>> +    self.assertFalse('mirror-filter' in nodes,
>>> + 'Mirror filter node did not disappear')
>>> +
>>>   if __name__ == '__main__':
>>>   iotests.main(supported_fmts=['qcow2', 'qed'],
>>>    supported_protocols=['file'])
>>> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
>>> index 2c448b4239..f496be9197 100644
>>> --- a/tests/qemu-iotests/041.out
>>> +++ b/tests/qemu-iotests/041.out
>>> @@ -1,5 +1,5 @@
>>> -..
>>>
>>> +...
>>>
>>>   --
>>> -Ran 90 tests
>>> +Ran 91 tests
>>>     OK
>>>
>>
>> Or don't do anything, because I'm just being obnoxious, and I owe you
>> one for giving you bad advice last round.
> 
> Come on, I have better (more selfish) reasons.
> 
> For the list comprehension: I want to introduce as many instances of
> map/filter as I can so using those functions becomes normal.
> 

They have their uses! But also they're usually just simply longer and
aren't worth using where list comprehensions do.

> For the tuple: This is a test, nobody cares whether it uses 60 bytes
> more memory or is 10 µs slower or lets something be mutable when it is
> actually never changed.  As such, I like to use the most general
> solution which is simply a list.
> 

I did say I was being a weenie. You really can take the reviewed-by!

Re: [Qemu-devel] [PATCH 4/4] iotests: Add test for failing mirror complete

2019-09-19 Thread Max Reitz

On 18.09.19 20:46, John Snow wrote:
> 
> 
> On 9/12/19 9:56 AM, Max Reitz wrote:
>> Signed-off-by: Max Reitz 
>> ---
>>   tests/qemu-iotests/041 | 44 ++
>>   tests/qemu-iotests/041.out |  4 ++--
>>   2 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
>> index 8568426311..84bc6d6581 100755
>> --- a/tests/qemu-iotests/041
>> +++ b/tests/qemu-iotests/041
>> @@ -1121,6 +1121,50 @@ class TestOrphanedSource(iotests.QMPTestCase):
>>    target='dest-ro')
>>   self.assert_qmp(result, 'error/class', 'GenericError')
>>   +    def test_failing_permission_in_complete(self):
>> +    self.assert_no_active_block_jobs()
>> +
>> +    # Unshare consistent-read on the target
>> +    # (The mirror job does not care)
>> +    result = self.vm.qmp('blockdev-add',
>> + driver='blkdebug',
>> + node_name='dest-perm',
>> + image='dest',
>> + unshare_child_perms=['consistent-read'])
>> +    self.assert_qmp(result, 'return', {})
>> +
>> +    result = self.vm.qmp('blockdev-mirror', job_id='job',
>> device='src',
>> + sync='full', target='dest',
>> + filter_node_name='mirror-filter')
>> +    self.assert_qmp(result, 'return', {})
>> +
>> +    # Require consistent-read on the source
>> +    # (We can only add this node once the job has started, or it
>> +    # will complain that it does not want to run on non-root nodes)
>> +    result = self.vm.qmp('blockdev-add',
>> + driver='blkdebug',
>> + node_name='src-perm',
>> + image='src',
>> + take_child_perms=['consistent-read'])
>> +    self.assert_qmp(result, 'return', {})
>> +
>> +    # While completing, mirror will attempt to replace src by
>> +    # dest, which must fail because src-perm requires
>> +    # consistent-read but dest-perm does not share it; thus
>> +    # aborting the job when it is supposed to complete
>> +    self.complete_and_wait('job',
>> +   completion_error='Operation not
>> permitted')
>> +
>> +    # Assert that all of our nodes are still there (except for the
>> +    # mirror filter, which should be gone despite the failure)
>> +    nodes = self.vm.qmp('query-named-block-nodes')['return']
>> +    nodes = list(map(lambda image: image['node-name'], nodes))
> 
> Sadly, the list comprehension is a good suggestion. Squash it in if
> you'd like.

You know I don’t, but I’ll do it anyway.

>> +
>> +    for expect in ['src', 'src-perm', 'dest', 'dest-perm']:
>> +    self.assertTrue(expect in nodes, '%s disappeared' % expect)
> 
> I could be a real weenie and say "why not use a tuple here?"

OK.

> but, I'll only pretend I didn't say that instead of couching it in a
> self-deprecating wrapper to the same end effect.
> 
> (I'm a weenie.)
> 
>> +    self.assertFalse('mirror-filter' in nodes,
>> + 'Mirror filter node did not disappear')
>> +
>>   if __name__ == '__main__':
>>   iotests.main(supported_fmts=['qcow2', 'qed'],
>>    supported_protocols=['file'])
>> diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out
>> index 2c448b4239..f496be9197 100644
>> --- a/tests/qemu-iotests/041.out
>> +++ b/tests/qemu-iotests/041.out
>> @@ -1,5 +1,5 @@
>> -..
>>
>> +...
>>
>>   --
>> -Ran 90 tests
>> +Ran 91 tests
>>     OK
>>
> 
> Or don't do anything, because I'm just being obnoxious, and I owe you
> one for giving you bad advice last round.

Come on, I have better (more selfish) reasons.

For the list comprehension: I want to introduce as many instances of
map/filter as I can so using those functions becomes normal.

For the tuple: This is a test, nobody cares whether it uses 60 bytes
more memory or is 10 µs slower or lets something be mutable when it is
actually never changed.  As such, I like to use the most general
solution which is simply a list.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] iotests: Require Python 3.5 or later

2019-09-19 Thread John Snow

On 9/19/19 4:35 AM, Kevin Wolf wrote:
> Am 18.09.2019 um 20:49 hat John Snow geschrieben:
>>
>>
>> On 9/18/19 4:55 AM, Kevin Wolf wrote:
>>> Running iotests is not required to build QEMU, so we can have stricter
>>> version requirements for Python here and can make use of new features
>>> and drop compatibility code earlier.
>>>
>>> This makes qemu-iotests skip all Python tests if a Python version before
>>> 3.5 is used for the build.
>>>
>>> Suggested-by: Eduardo Habkost 
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>   tests/qemu-iotests/check | 14 +-
>>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
>>> index 875399d79f..a68f414d6c 100755
>>> --- a/tests/qemu-iotests/check
>>> +++ b/tests/qemu-iotests/check
>>> @@ -633,6 +633,13 @@ then
>>>   export SOCKET_SCM_HELPER="$build_iotests/socket_scm_helper"
>>>   fi
>>> +# Note that if the Python conditional here evaluates True we will exit
>>> +# with status 1 which is a shell 'false' value.
>>> +python_usable=false
>>> +if ! $PYTHON -c 'import sys; sys.exit(sys.version_info >= (3,5))'; then
>>> +python_usable=true
>>> +fi
>>> +
>>
>> Do we want this as a temporary fix only until we can stipulate the same
>> version in the configure file?
> 
> I thought that maybe we should leave the code around so that at some
> later point, we could upgrade it to 3.6 (or something else) before QEMU
> as a whole does so.
> 

Sorry for the double-post. Do we want the iotest dependencies to
formally diverge from QEMU as a whole? It's part of the default `make
check` target and it's nice that it only skips instead of failing
outright, but ... I would like to somehow centrally express our
dependencies instead of hard-coding them in various individual places.
Python can be just one example of this.

Ideally, in my software utopia; configure expresses these dependencies
in a central location where they are easy to verify and maintain.

(And easy for downstream maintainers to verify and adjust their package
dependencies as-needed, too.)

I don't think it's a good idea to make testing run far ahead of the
build requirements in case we see more re-use and interdependence of
Python code in the future. Or, in general, to allow situations where you
can build but not test, especially on older platforms where that testing
might be even more important.

I think we're being good about that so far, but fracturing the
requirements seems like a step we maybe ought not take if we don't have to.

But, well, maybe it is a good idea and I am wrong, but I still think it
gets confusing to have one-off dependencies expressed in various places.

> In fact... Could we already require 3.6 now instead of using 3.5, which
> I think we only chose because of Debian Stretch (oldstable)?
> 
> Kevin
>

Re: [Qemu-devel] [PATCH 4/4] iotests: Add test for failing mirror complete

2019-09-19 Thread Max Reitz

On 18.09.19 18:30, Vladimir Sementsov-Ogievskiy wrote:
> 12.09.2019 16:56, Max Reitz wrote:
>> Signed-off-by: Max Reitz 
>> ---
>>   tests/qemu-iotests/041 | 44 ++
>>   tests/qemu-iotests/041.out |  4 ++--
>>   2 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041
>> index 8568426311..84bc6d6581 100755
>> --- a/tests/qemu-iotests/041
>> +++ b/tests/qemu-iotests/041
>> @@ -1121,6 +1121,50 @@ class TestOrphanedSource(iotests.QMPTestCase):
>>target='dest-ro')
>>   self.assert_qmp(result, 'error/class', 'GenericError')
>>   
>> +def test_failing_permission_in_complete(self):
>> +self.assert_no_active_block_jobs()
>> +
>> +# Unshare consistent-read on the target
>> +# (The mirror job does not care)
>> +result = self.vm.qmp('blockdev-add',
>> + driver='blkdebug',
>> + node_name='dest-perm',
>> + image='dest',
>> + unshare_child_perms=['consistent-read'])
>> +self.assert_qmp(result, 'return', {})
>> +
>> +result = self.vm.qmp('blockdev-mirror', job_id='job', device='src',
>> + sync='full', target='dest',
>> + filter_node_name='mirror-filter')
>> +self.assert_qmp(result, 'return', {})
>> +
>> +# Require consistent-read on the source
>> +# (We can only add this node once the job has started, or it
>> +# will complain that it does not want to run on non-root nodes)
>> +result = self.vm.qmp('blockdev-add',
>> + driver='blkdebug',
>> + node_name='src-perm',
>> + image='src',
>> + take_child_perms=['consistent-read'])
>> +self.assert_qmp(result, 'return', {})
>> +
>> +# While completing, mirror will attempt to replace src by
>> +# dest, which must fail because src-perm requires
>> +# consistent-read but dest-perm does not share it; thus
>> +# aborting the job when it is supposed to complete
>> +self.complete_and_wait('job',
>> +   completion_error='Operation not permitted')
>> +
>> +# Assert that all of our nodes are still there (except for the
>> +# mirror filter, which should be gone despite the failure)
>> +nodes = self.vm.qmp('query-named-block-nodes')['return']
>> +nodes = list(map(lambda image: image['node-name'], nodes))
> 
> using list comprehension is a bit more pythonic:
> nodes = [node['node-name'] for node in nodes]

OK.  I can never remember, so I rarely bother using list/dict
comprehensions and instead use what I would do in any other language.
(Hence the “Sadly” from John.)

(And then wait for some kind reviewer to tell me how to do it right. :-))

Max



signature.asc
Description: OpenPGP digital signature

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Daniel P . Berrangé

On Thu, Sep 19, 2019 at 04:16:25PM +, Vladimir Sementsov-Ogievskiy wrote:
> 19.09.2019 18:50, Daniel P. Berrangé wrote:
> > On Thu, Sep 19, 2019 at 10:24:20AM -0500, Eric Blake wrote:
> >> On 9/19/19 9:49 AM, Daniel P. Berrangé wrote:
> >>
>  ALWAYS using MAKE_ERRP_SAFE() on entry to any function that has an Error
>  **errp parameter is dirt-simple to explain.  It has no performance
>  penalty if the user passed in a normal error or error_abort (the cost of
>  an 'if' hidden in the macro is probably negligible compared to
>  everything else we do), and has no semantic penalty if the user passed
>  in NULL or error_fatal (we now get the behavior we want with less
>  boilerplate).
> 
>  Having to think 'does this method require me to use MAKE_ERRP_SAFE, or
>  can I omit it?' does not provide the same simplicity.
> >>>
> >>> The flipside is that MAKE_ERRP_SAFE hides a bunch of logic, so you don't
> >>> really know what its doing without looking at it, and this is QEMU
> >>> custom concept so one more thing to learn for new contributors.
> >>>
> >>> While I think it is a nice trick, personally I think we would be better
> >>> off if we simply used a code pattern which does not require de-referencing
> >>> 'errp' at all, aside from exceptional cases. IOW, no added macro in 95%
> >>> of all our methods using Error **errp.
> >>
> >> If 100% of our callsites use the macro, then new contributors will
> >> quickly learn by observation alone that the macro usage must be
> >> important on any new function taking Error **errp, whether or not they
> >> actually read the macro to see what it does.  If only 5% of our
> >> callsites use the macro, it's harder to argue that a new user will pick
> >> up on the nuances by observation alone (presumably, our docs would also
> >> spell it out, but we know that not everyone reads those...).
> > 
> > To get some slightly less made-up stats, I did some searching:
> > 
> >- 4408  methods with an 'errp' parameter declared
> > 
> >   git grep 'Error \*\*errp'|  wc -l
> > 
> >- 696 methods with an 'Error *local' declared (what other names
> >  do we use for this?)
> > 
> >   git grep 'Error \*local' | wc -l
> > 
> >- 1243 methods with an 'errp' parameter which have void
> >  return value (fuzzy number - my matching is quite crude)
> > 
> >   git grep 'Error \*\*errp'|  grep -E '(^| )void' | wc -l
> > 
> >- 11 methods using error_append_hint with a local_err
> > 
> >   git grep append_hint | grep local | wc -l
> 
> why do you count only with local? Greg's series is here to bring local to all
> functions with append_hint:

I hadn't noticed the scope of Greg's series :-)

> 
> # git grep append_hint | wc -l
> 85



> Also, conversion to use macro everywhere may be done (seems so) by coccinelle 
> script.
> But conversion which you mean, only by hand I think. Converting 1243 methods 
> by hand
> is a huge task..

Yeah, it would be a non-negligible amount of work.

> I think there are three consistent ways:
> 
> 1. Use macro everywhere
> 2. Drop error_append_hint
> 3. Drop error_fatal

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH 0/4] mirror: Do not dereference invalid pointers

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 19:45, Max Reitz wrote:
> On 18.09.19 17:38, Vladimir Sementsov-Ogievskiy wrote:
>> 12.09.2019 16:56, Max Reitz wrote:
>>> Hi,
>>>
>>> The fix (patch 1) is pretty straightforward; patch 2 (which I need for
>>> the test) may not be.
>>>
>>> The biggest problem with patch 2 is that you can use it to uncover where
>>> our permission handling is broken.  For example, devising the test case
>>> (patch 4) was very difficult because I kept running into the
>>> _abort that mirror_exit_common() passes when dropping the
>>> mirror_top_bs.
>>>
>>> The problem is that mirror_top_bs does not take the same permissions
>>> that its parent takes.  Ergo using _abort when dropping it is
>>> wrong: The parent may require more permissions that mirror_top_bs did,
>>> and so dropping mirror_top_bs may fail.
>>>
>>> Now what’s really bad is that this cannot be fixed with our current
>>> permission system.  mirror_top_bs was introduced precisely so it does
>>> not take CONSISTENT_READ, but can still allow parents to take it (for
>>> active commits).  But what if there is actually something besides the
>>> mirror job that unshares CONSISTENT_READ?
>>>
>>>
>>> Imagine this:
>>>
>>> mirror target BB   mirror source BB
>>> | |
>>> v v
>>> mirror_top_bs -> top -> mid -> base
>>> ^
>>> |
>>>other_parent
>>>
>>> The source BB unshares CONSISTENT_READ on the base.  mirror_top_bs
>>> ensures that its parents can read from top even though top itself cannot
>>> allow CONSISTENT_READ to be taken.  So far so good.
>>>
>>> But what if other_parent also unshares CONSISTENT_READ?  Then,
>>> mirror_top_bs has no business allowing its parents to take it.
>>>
>>> No idea how to fix that.  (I suppose mirror_top_bs would need some way
>>> to verify that there is no other party that has unshared CONSISTENT_READ
>>> but its associated source BB.
>>
>> May be we need grouped permissions?
>>
>> Some way to define group of children, which may unshare read permission
>> for other children (out of the group), but still children in group may
>> have read permission?
> 
> Hm, is that different from my idea below where one of mirror_top's
> children unshares the read permission, and another is allowed to take it
> still?

I just tried to imagine something generic

> 
> (The problem is always that if some BDS has a parent that unshares this
> permission, this condition propagates upwards through its other parents,
> and we need to keep track of who unshared it in the first place.)
> 
>> But it don't work here as we are saying about children on different
>> nodes.. And propagated through backing chain permissions..
> 
> Yep.
> 
>>>   In the future, we want the source BB to
>>> go away and instead have the source be an immediate BdrvChild of
>>> mirror_top_bs.  Maybe we can then build something into the block layer
>>> so that a node can only restore CONSISTENT_READ when it was that node
>>> that broke it?)
>>>
>>>
>>> Anyway.  You can see something arising from this problem simply by
>>> unsharing CONSISTENT_READ on the target node.  (Just drop the src-perm
>>> node from the test I add in patch 4.)  Replacing the source with the
>>> target will then work fine (because mirror_top_bs doesn’t care about
>>> CONSISTENT_READ being removed), but then you cannot drop mirror_top_bs –
>>> because its parent does want CONSISTENT_READ.  Thus, the _abort
>>> aborts.
>>>
>>>
>>> While this is a more special case, I have no idea how to fix this one
>>> either.
>>>
>>>
>>> Soo...  This series just fixes one thing, and leaves another unfixed
>>> because I have no idea how to fix it.  Worse, it adds parameters to
>>> blkdebug to actually see the problem.  Do we want to let blkdebug be
>>> able to crash qemu (because of a bug in qemu)?
>>>
>>
>> blkdebug is for debugging and not used by end users like libvirt, yes?
> 
> Correct.
> 
> 
>>>
>>> Max Reitz (4):
>>> mirror: Do not dereference invalid pointers
>>> blkdebug: Allow taking/unsharing permissions
>>> iotests: Add @error to wait_until_completed
>>> iotests: Add test for failing mirror complete
>>>
>>>qapi/block-core.json  |  29 +-
>>>block/blkdebug.c  | 106 +-
>>>block/mirror.c|  13 +++--
>>>tests/qemu-iotests/041|  44 ++
>>>tests/qemu-iotests/041.out|   4 +-
>>>tests/qemu-iotests/iotests.py |  18 --
>>>6 files changed, 200 insertions(+), 14 deletions(-)
>>>
>>
>>
> 
> 


-- 
Best regards,
Vladimir

Re: [Qemu-devel] [PATCH 2/4] blkdebug: Allow taking/unsharing permissions

2019-09-19 Thread Max Reitz

On 18.09.19 18:01, Vladimir Sementsov-Ogievskiy wrote:
> 12.09.2019 16:56, Max Reitz wrote:
>> Sometimes it is useful to be able to add a node to the block graph that
>> takes or unshare a certain set of permissions for debugging purposes.
>> This patch adds this capability to blkdebug.
>>
>> (Note that you cannot make blkdebug release or share permissions that it
>> needs to take or cannot share, because this might result in assertion
>> failures in the block layer.  But if the blkdebug node has no parents,
>> it will not take any permissions and share everything by default, so you
>> can then freely choose what permissions to take and share.)
>>
>> Signed-off-by: Max Reitz 
>> ---
>>   qapi/block-core.json |  29 +++-
>>   block/blkdebug.c | 106 ++-
>>   2 files changed, 133 insertions(+), 2 deletions(-)
>>
>> diff --git a/qapi/block-core.json b/qapi/block-core.json
>> index e6edd641f1..336043e02c 100644
>> --- a/qapi/block-core.json
>> +++ b/qapi/block-core.json
>> @@ -3347,6 +3347,21 @@
>>   '*state': 'int',
>>   'new_state': 'int' } }
>>   
>> +##
>> +# @BlockdevPermission:
>> +#
>> +# Permissions that an edge in the block graph can take or share.
>> +#
>> +# Since: 4.2
>> +##
>> +{ 'enum': 'BlockdevPermission',
>> +  'data': [
>> +  'consistent-read',
>> +  'write',
>> +  'write-unchanged',
>> +  'resize',
>> +  'graph-mod' ] }
> 
> :)
> 
> BlockPermission is already here since 4.0 and has exactly same content. (And 
> better documented)

Oops.  Very good, then, thanks. :-)

[...]

>> @@ -344,6 +349,84 @@ static void blkdebug_parse_filename(const char 
>> *filename, QDict *options,
>>   qdict_put_str(options, "x-image", filename);
>>   }
>>   
>> +static int blkdebug_parse_perm_list(uint64_t *dest, QDict *options,
>> +const char *prefix, Error **errp)
>> +{
>> +int ret = 0;
>> +QDict *subqdict = NULL;
>> +QObject *crumpled_subqdict = NULL;
>> +QList *perm_list;
>> +const QListEntry *perm;
>> +
>> +qdict_extract_subqdict(options, , prefix);
>> +if (!qdict_size(subqdict)) {
>> +goto out;
>> +}
>> +
>> +crumpled_subqdict = qdict_crumple(subqdict, errp);
>> +if (!crumpled_subqdict) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +perm_list = qobject_to(QList, crumpled_subqdict);
>> +if (!perm_list) {
>> +/* Omit the trailing . from the prefix */
>> +error_setg(errp, "%.*s expects a list",
>> +   (int)(strlen(prefix) - 1), prefix);
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +for (perm = qlist_first(perm_list); perm; perm = qlist_next(perm)) {
>> +const char *perm_name;
>> +BlockdevPermission perm_bit;
>> +
>> +perm_name = qstring_get_try_str(qobject_to(QString, perm->value));
>> +if (!perm_name) {
>> +/* Omit the trailing . from the prefix */
>> +error_setg(errp, "%.*s expects a list of enum strings",
>> +   (int)(strlen(prefix) - 1), prefix);
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +perm_bit = qapi_enum_parse(_lookup, perm_name,
>> +   BLOCKDEV_PERMISSION__MAX, errp);
>> +if (perm_bit == BLOCKDEV_PERMISSION__MAX) {
>> +ret = -EINVAL;
>> +goto out;
>> +}
>> +
>> +*dest |= UINT64_C(1) << perm_bit;
>> +}
>> +
>> +out:
>> +qobject_unref(subqdict);
>> +qobject_unref(crumpled_subqdict);
>> +return ret;
>> +}
>> +
>> +static int blkdebug_parse_perms(BDRVBlkdebugState *s, QDict *options,
>> +Error **errp)
>> +{
>> +int ret;
>> +
>> +ret = blkdebug_parse_perm_list(>take_child_perms, options,
>> +   "take-child-perms.", errp);
>> +if (ret < 0) {
>> +return ret;
>> +}
>> +
>> +ret = blkdebug_parse_perm_list(>unshare_child_perms, options,
>> +   "unshare-child-perms.", errp);
>> +if (ret < 0) {
>> +return ret;
>> +}
>> +
>> +return 0;
>> +}
> 
> 
> It's a pity that being described in json, these new parameters still not 
> parsed automatically..

That would require changing the whole bdrv_open() infrastructure to use
QAPI types.

>> +
>>   static QemuOptsList runtime_opts = {
> 
> and that we have to keep these runtime_opts everywhere, which duplicates json 
> definitions..

Well, it duplicates some json definitions.  I don’t add the new
parameters added in this patch to the list, because there is no point in
supporting them outside of blockdev-add.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 0/4] mirror: Do not dereference invalid pointers

2019-09-19 Thread Max Reitz

On 18.09.19 17:38, Vladimir Sementsov-Ogievskiy wrote:
> 12.09.2019 16:56, Max Reitz wrote:
>> Hi,
>>
>> The fix (patch 1) is pretty straightforward; patch 2 (which I need for
>> the test) may not be.
>>
>> The biggest problem with patch 2 is that you can use it to uncover where
>> our permission handling is broken.  For example, devising the test case
>> (patch 4) was very difficult because I kept running into the
>> _abort that mirror_exit_common() passes when dropping the
>> mirror_top_bs.
>>
>> The problem is that mirror_top_bs does not take the same permissions
>> that its parent takes.  Ergo using _abort when dropping it is
>> wrong: The parent may require more permissions that mirror_top_bs did,
>> and so dropping mirror_top_bs may fail.
>>
>> Now what’s really bad is that this cannot be fixed with our current
>> permission system.  mirror_top_bs was introduced precisely so it does
>> not take CONSISTENT_READ, but can still allow parents to take it (for
>> active commits).  But what if there is actually something besides the
>> mirror job that unshares CONSISTENT_READ?
>>
>>
>> Imagine this:
>>
>>mirror target BB   mirror source BB
>>| |
>>v v
>> mirror_top_bs -> top -> mid -> base
>>^
>>|
>>   other_parent
>>
>> The source BB unshares CONSISTENT_READ on the base.  mirror_top_bs
>> ensures that its parents can read from top even though top itself cannot
>> allow CONSISTENT_READ to be taken.  So far so good.
>>
>> But what if other_parent also unshares CONSISTENT_READ?  Then,
>> mirror_top_bs has no business allowing its parents to take it.
>>
>> No idea how to fix that.  (I suppose mirror_top_bs would need some way
>> to verify that there is no other party that has unshared CONSISTENT_READ
>> but its associated source BB.
> 
> May be we need grouped permissions?
> 
> Some way to define group of children, which may unshare read permission
> for other children (out of the group), but still children in group may
> have read permission?

Hm, is that different from my idea below where one of mirror_top's
children unshares the read permission, and another is allowed to take it
still?

(The problem is always that if some BDS has a parent that unshares this
permission, this condition propagates upwards through its other parents,
and we need to keep track of who unshared it in the first place.)

> But it don't work here as we are saying about children on different
> nodes.. And propagated through backing chain permissions..

Yep.

>>  In the future, we want the source BB to
>> go away and instead have the source be an immediate BdrvChild of
>> mirror_top_bs.  Maybe we can then build something into the block layer
>> so that a node can only restore CONSISTENT_READ when it was that node
>> that broke it?)
>>
>>
>> Anyway.  You can see something arising from this problem simply by
>> unsharing CONSISTENT_READ on the target node.  (Just drop the src-perm
>> node from the test I add in patch 4.)  Replacing the source with the
>> target will then work fine (because mirror_top_bs doesn’t care about
>> CONSISTENT_READ being removed), but then you cannot drop mirror_top_bs –
>> because its parent does want CONSISTENT_READ.  Thus, the _abort
>> aborts.
>>
>>
>> While this is a more special case, I have no idea how to fix this one
>> either.
>>
>>
>> Soo...  This series just fixes one thing, and leaves another unfixed
>> because I have no idea how to fix it.  Worse, it adds parameters to
>> blkdebug to actually see the problem.  Do we want to let blkdebug be
>> able to crash qemu (because of a bug in qemu)?
>>
> 
> blkdebug is for debugging and not used by end users like libvirt, yes?

Correct.

Max

>>
>> Max Reitz (4):
>>mirror: Do not dereference invalid pointers
>>blkdebug: Allow taking/unsharing permissions
>>iotests: Add @error to wait_until_completed
>>iotests: Add test for failing mirror complete
>>
>>   qapi/block-core.json  |  29 +-
>>   block/blkdebug.c  | 106 +-
>>   block/mirror.c|  13 +++--
>>   tests/qemu-iotests/041|  44 ++
>>   tests/qemu-iotests/041.out|   4 +-
>>   tests/qemu-iotests/iotests.py |  18 --
>>   6 files changed, 200 insertions(+), 14 deletions(-)
>>
> 
> 




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH] iotests: Require Python 3.5 or later

2019-09-19 Thread John Snow




On 9/19/19 4:35 AM, Kevin Wolf wrote:
> Am 18.09.2019 um 20:49 hat John Snow geschrieben:
>>
>>
>> On 9/18/19 4:55 AM, Kevin Wolf wrote:
>>> Running iotests is not required to build QEMU, so we can have stricter
>>> version requirements for Python here and can make use of new features
>>> and drop compatibility code earlier.
>>>
>>> This makes qemu-iotests skip all Python tests if a Python version before
>>> 3.5 is used for the build.
>>>
>>> Suggested-by: Eduardo Habkost 
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>   tests/qemu-iotests/check | 14 +-
>>>   1 file changed, 13 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
>>> index 875399d79f..a68f414d6c 100755
>>> --- a/tests/qemu-iotests/check
>>> +++ b/tests/qemu-iotests/check
>>> @@ -633,6 +633,13 @@ then
>>>   export SOCKET_SCM_HELPER="$build_iotests/socket_scm_helper"
>>>   fi
>>> +# Note that if the Python conditional here evaluates True we will exit
>>> +# with status 1 which is a shell 'false' value.
>>> +python_usable=false
>>> +if ! $PYTHON -c 'import sys; sys.exit(sys.version_info >= (3,5))'; then
>>> +python_usable=true
>>> +fi
>>> +
>>
>> Do we want this as a temporary fix only until we can stipulate the same
>> version in the configure file?
> 
> I thought that maybe we should leave the code around so that at some
> later point, we could upgrade it to 3.6 (or something else) before QEMU
> as a whole does so.
> 
> In fact... Could we already require 3.6 now instead of using 3.5, which
> I think we only chose because of Debian Stretch (oldstable)?
> 
> Kevin
> 

I thought 3.5 was chosen because it's the EPEL version...?

No, I guess it's actually 3.6. Well! You won't catch me complaining
about that.

--js

Re: [PATCH v2 2/2] iotests: Remove Python 2 compatibility code

2019-09-19 Thread Thomas Huth

On 19/09/2019 18.29, Kevin Wolf wrote:
> Some scripts check the Python version number and have two code paths to
> accomodate both Python 2 and 3. Remove the code specific to Python 2 and
> assert the minimum version of 3.6 instead (check skips Python tests in
> this case, so the assertion would only ever trigger if a Python script
> is executed manually).
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/044   |  3 ---
>  tests/qemu-iotests/163   |  3 ---
>  tests/qemu-iotests/iotests.py| 13 +++--
>  tests/qemu-iotests/nbd-fault-injector.py |  7 +++
>  4 files changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/tests/qemu-iotests/044 b/tests/qemu-iotests/044
> index 05ea1f49c5..8b2afa2a11 100755
> --- a/tests/qemu-iotests/044
> +++ b/tests/qemu-iotests/044
> @@ -28,9 +28,6 @@ import struct
>  import subprocess
>  import sys
>  
> -if sys.version_info.major == 2:
> -range = xrange
> -
>  test_img = os.path.join(iotests.test_dir, 'test.img')
>  
>  class TestRefcountTableGrowth(iotests.QMPTestCase):
> diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
> index 081ccc8ac1..d94728e080 100755
> --- a/tests/qemu-iotests/163
> +++ b/tests/qemu-iotests/163
> @@ -21,9 +21,6 @@
>  import os, random, iotests, struct, qcow2, sys
>  from iotests import qemu_img, qemu_io, image_size
>  
> -if sys.version_info.major == 2:
> -range = xrange
> -
>  test_img = os.path.join(iotests.test_dir, 'test.img')
>  check_img = os.path.join(iotests.test_dir, 'check.img')
>  
> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
> index b26271187c..9fb5181c3d 100644
> --- a/tests/qemu-iotests/iotests.py
> +++ b/tests/qemu-iotests/iotests.py
> @@ -35,6 +35,7 @@ from collections import OrderedDict
>  sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 
> 'python'))
>  from qemu import qtest
>  
> +assert sys.version_info >= (3,6)
>  
>  # This will not work if arguments contain spaces but is necessary if we
>  # want to support the override options that ./check supports.
> @@ -250,10 +251,7 @@ def image_size(img):
>  return json.loads(r)['virtual-size']
>  
>  def is_str(val):
> -if sys.version_info.major >= 3:
> -return isinstance(val, str)
> -else:
> -return isinstance(val, str) or isinstance(val, unicode)
> +return isinstance(val, str)
>  
>  test_dir_re = re.compile(r"%s" % test_dir)
>  def filter_test_dir(msg):
> @@ -935,12 +933,7 @@ def execute_test(test_function=None,
>  else:
>  # We need to filter out the time taken from the output so that
>  # qemu-iotest can reliably diff the results against master output.
> -if sys.version_info.major >= 3:
> -output = io.StringIO()
> -else:
> -# io.StringIO is for unicode strings, which is not what
> -# 2.x's test runner emits.
> -output = io.BytesIO()
> +output = io.StringIO()
>  
>  logging.basicConfig(level=(logging.DEBUG if debug else logging.WARN))
>  
> diff --git a/tests/qemu-iotests/nbd-fault-injector.py 
> b/tests/qemu-iotests/nbd-fault-injector.py
> index 6b2d659dee..43f095ceef 100755
> --- a/tests/qemu-iotests/nbd-fault-injector.py
> +++ b/tests/qemu-iotests/nbd-fault-injector.py
> @@ -48,10 +48,9 @@ import sys
>  import socket
>  import struct
>  import collections
> -if sys.version_info.major >= 3:
> -import configparser
> -else:
> -import ConfigParser as configparser
> +import configparser
> +
> +assert sys.version_info >= (3,6)
>  
>  FAKE_DISK_SIZE = 8 * 1024 * 1024 * 1024 # 8 GB
>  

Reviewed-by: Thomas Huth

Re: [PATCH v2 1/2] iotests: Require Python 3.6 or later

2019-09-19 Thread Thomas Huth

On 19/09/2019 18.29, Kevin Wolf wrote:
> Running iotests is not required to build QEMU, so we can have stricter
> version requirements for Python here and can make use of new features
> and drop compatibility code earlier.
> 
> This makes qemu-iotests skip all Python tests if a Python version before
> 3.6 is used for the build.
> 
> Suggested-by: Eduardo Habkost 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/check | 13 -
>  1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> index 875399d79f..588c453a94 100755
> --- a/tests/qemu-iotests/check
> +++ b/tests/qemu-iotests/check
> @@ -633,6 +633,12 @@ then
>  export SOCKET_SCM_HELPER="$build_iotests/socket_scm_helper"
>  fi
>  
> +python_usable=false
> +if $PYTHON -c 'import sys; sys.exit(0 if sys.version_info >= (3,6) else 1)'
> +then
> +python_usable=true
> +fi
> +
>  default_machine=$($QEMU_PROG -machine help | sed -n '/(default)/ s/ .*//p')
>  default_alias_machine=$($QEMU_PROG -machine help | \
> sed -n "/(alias of $default_machine)/ { s/ .*//p; q; }")
> @@ -809,7 +815,12 @@ do
>  start=$(_wallclock)
>  
>  if [ "$(head -n 1 "$source_iotests/$seq")" == "#!/usr/bin/env 
> python" ]; then
> -run_command="$PYTHON $seq"
> +if $python_usable; then
> +run_command="$PYTHON $seq"
> +else
> +run_command="false"
> +echo "Unsupported Python version" > $seq.notrun
> +fi
>  else
>  run_command="./$seq"
>  fi
> 

Reviewed-by: Thomas Huth

Re: [PATCH v2 2/2] iotests: Remove Python 2 compatibility code

2019-09-19 Thread Eduardo Habkost

On Thu, Sep 19, 2019 at 06:29:05PM +0200, Kevin Wolf wrote:
> Some scripts check the Python version number and have two code paths to
> accomodate both Python 2 and 3. Remove the code specific to Python 2 and
> assert the minimum version of 3.6 instead (check skips Python tests in
> this case, so the assertion would only ever trigger if a Python script
> is executed manually).
> 
> Signed-off-by: Kevin Wolf 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

Re: [PATCH v2 1/2] iotests: Require Python 3.6 or later

2019-09-19 Thread Eduardo Habkost

On Thu, Sep 19, 2019 at 06:29:04PM +0200, Kevin Wolf wrote:
> Running iotests is not required to build QEMU, so we can have stricter
> version requirements for Python here and can make use of new features
> and drop compatibility code earlier.
> 
> This makes qemu-iotests skip all Python tests if a Python version before
> 3.6 is used for the build.
> 
> Suggested-by: Eduardo Habkost 
> Signed-off-by: Kevin Wolf 

Reviewed-by: Eduardo Habkost 

-- 
Eduardo

[PATCH v2 2/2] iotests: Remove Python 2 compatibility code

2019-09-19 Thread Kevin Wolf

Some scripts check the Python version number and have two code paths to
accomodate both Python 2 and 3. Remove the code specific to Python 2 and
assert the minimum version of 3.6 instead (check skips Python tests in
this case, so the assertion would only ever trigger if a Python script
is executed manually).

Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/044   |  3 ---
 tests/qemu-iotests/163   |  3 ---
 tests/qemu-iotests/iotests.py| 13 +++--
 tests/qemu-iotests/nbd-fault-injector.py |  7 +++
 4 files changed, 6 insertions(+), 20 deletions(-)

diff --git a/tests/qemu-iotests/044 b/tests/qemu-iotests/044
index 05ea1f49c5..8b2afa2a11 100755
--- a/tests/qemu-iotests/044
+++ b/tests/qemu-iotests/044
@@ -28,9 +28,6 @@ import struct
 import subprocess
 import sys
 
-if sys.version_info.major == 2:
-range = xrange
-
 test_img = os.path.join(iotests.test_dir, 'test.img')
 
 class TestRefcountTableGrowth(iotests.QMPTestCase):
diff --git a/tests/qemu-iotests/163 b/tests/qemu-iotests/163
index 081ccc8ac1..d94728e080 100755
--- a/tests/qemu-iotests/163
+++ b/tests/qemu-iotests/163
@@ -21,9 +21,6 @@
 import os, random, iotests, struct, qcow2, sys
 from iotests import qemu_img, qemu_io, image_size
 
-if sys.version_info.major == 2:
-range = xrange
-
 test_img = os.path.join(iotests.test_dir, 'test.img')
 check_img = os.path.join(iotests.test_dir, 'check.img')
 
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index b26271187c..9fb5181c3d 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -35,6 +35,7 @@ from collections import OrderedDict
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu import qtest
 
+assert sys.version_info >= (3,6)
 
 # This will not work if arguments contain spaces but is necessary if we
 # want to support the override options that ./check supports.
@@ -250,10 +251,7 @@ def image_size(img):
 return json.loads(r)['virtual-size']
 
 def is_str(val):
-if sys.version_info.major >= 3:
-return isinstance(val, str)
-else:
-return isinstance(val, str) or isinstance(val, unicode)
+return isinstance(val, str)
 
 test_dir_re = re.compile(r"%s" % test_dir)
 def filter_test_dir(msg):
@@ -935,12 +933,7 @@ def execute_test(test_function=None,
 else:
 # We need to filter out the time taken from the output so that
 # qemu-iotest can reliably diff the results against master output.
-if sys.version_info.major >= 3:
-output = io.StringIO()
-else:
-# io.StringIO is for unicode strings, which is not what
-# 2.x's test runner emits.
-output = io.BytesIO()
+output = io.StringIO()
 
 logging.basicConfig(level=(logging.DEBUG if debug else logging.WARN))
 
diff --git a/tests/qemu-iotests/nbd-fault-injector.py 
b/tests/qemu-iotests/nbd-fault-injector.py
index 6b2d659dee..43f095ceef 100755
--- a/tests/qemu-iotests/nbd-fault-injector.py
+++ b/tests/qemu-iotests/nbd-fault-injector.py
@@ -48,10 +48,9 @@ import sys
 import socket
 import struct
 import collections
-if sys.version_info.major >= 3:
-import configparser
-else:
-import ConfigParser as configparser
+import configparser
+
+assert sys.version_info >= (3,6)
 
 FAKE_DISK_SIZE = 8 * 1024 * 1024 * 1024 # 8 GB
 
-- 
2.20.1

[PATCH v2 1/2] iotests: Require Python 3.6 or later

2019-09-19 Thread Kevin Wolf

Running iotests is not required to build QEMU, so we can have stricter
version requirements for Python here and can make use of new features
and drop compatibility code earlier.

This makes qemu-iotests skip all Python tests if a Python version before
3.6 is used for the build.

Suggested-by: Eduardo Habkost 
Signed-off-by: Kevin Wolf 
---
 tests/qemu-iotests/check | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 875399d79f..588c453a94 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -633,6 +633,12 @@ then
 export SOCKET_SCM_HELPER="$build_iotests/socket_scm_helper"
 fi
 
+python_usable=false
+if $PYTHON -c 'import sys; sys.exit(0 if sys.version_info >= (3,6) else 1)'
+then
+python_usable=true
+fi
+
 default_machine=$($QEMU_PROG -machine help | sed -n '/(default)/ s/ .*//p')
 default_alias_machine=$($QEMU_PROG -machine help | \
sed -n "/(alias of $default_machine)/ { s/ .*//p; q; }")
@@ -809,7 +815,12 @@ do
 start=$(_wallclock)
 
 if [ "$(head -n 1 "$source_iotests/$seq")" == "#!/usr/bin/env python" 
]; then
-run_command="$PYTHON $seq"
+if $python_usable; then
+run_command="$PYTHON $seq"
+else
+run_command="false"
+echo "Unsupported Python version" > $seq.notrun
+fi
 else
 run_command="./$seq"
 fi
-- 
2.20.1

[PATCH v2 0/2] iotests: Require Python 3.6 or later

2019-09-19 Thread Kevin Wolf

v2:

- Provide the right exit code from Python instead of having a
  potentially confusing negation in the shell script

- Raised the minimal version to 3.6. If we're going to use a different
  version than QEMU as a whole anyway, we can use a version that suits
  us best. 3.5 would only be for Debian Stretch, and we don't really
  care that much about running Python test cases on it.

- Added a patch to remove Python 2 compatibility code

Kevin Wolf (2):
  iotests: Require Python 3.6 or later
  iotests: Remove Python 2 compatibility code

 tests/qemu-iotests/044   |  3 ---
 tests/qemu-iotests/163   |  3 ---
 tests/qemu-iotests/check | 13 -
 tests/qemu-iotests/iotests.py| 13 +++--
 tests/qemu-iotests/nbd-fault-injector.py |  7 +++
 5 files changed, 18 insertions(+), 21 deletions(-)

-- 
2.20.1

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 18:50, Daniel P. Berrangé wrote:
> On Thu, Sep 19, 2019 at 10:24:20AM -0500, Eric Blake wrote:
>> On 9/19/19 9:49 AM, Daniel P. Berrangé wrote:
>>
 ALWAYS using MAKE_ERRP_SAFE() on entry to any function that has an Error
 **errp parameter is dirt-simple to explain.  It has no performance
 penalty if the user passed in a normal error or error_abort (the cost of
 an 'if' hidden in the macro is probably negligible compared to
 everything else we do), and has no semantic penalty if the user passed
 in NULL or error_fatal (we now get the behavior we want with less
 boilerplate).

 Having to think 'does this method require me to use MAKE_ERRP_SAFE, or
 can I omit it?' does not provide the same simplicity.
>>>
>>> The flipside is that MAKE_ERRP_SAFE hides a bunch of logic, so you don't
>>> really know what its doing without looking at it, and this is QEMU
>>> custom concept so one more thing to learn for new contributors.
>>>
>>> While I think it is a nice trick, personally I think we would be better
>>> off if we simply used a code pattern which does not require de-referencing
>>> 'errp' at all, aside from exceptional cases. IOW, no added macro in 95%
>>> of all our methods using Error **errp.
>>
>> If 100% of our callsites use the macro, then new contributors will
>> quickly learn by observation alone that the macro usage must be
>> important on any new function taking Error **errp, whether or not they
>> actually read the macro to see what it does.  If only 5% of our
>> callsites use the macro, it's harder to argue that a new user will pick
>> up on the nuances by observation alone (presumably, our docs would also
>> spell it out, but we know that not everyone reads those...).
> 
> To get some slightly less made-up stats, I did some searching:
> 
>- 4408  methods with an 'errp' parameter declared
> 
>   git grep 'Error \*\*errp'|  wc -l
> 
>- 696 methods with an 'Error *local' declared (what other names
>  do we use for this?)
> 
>   git grep 'Error \*local' | wc -l
> 
>- 1243 methods with an 'errp' parameter which have void
>  return value (fuzzy number - my matching is quite crude)
> 
>   git grep 'Error \*\*errp'|  grep -E '(^| )void' | wc -l
> 
>- 11 methods using error_append_hint with a local_err
> 
>   git grep append_hint | grep local | wc -l

why do you count only with local? Greg's series is here to bring local to all
functions with append_hint:

# git grep append_hint | wc -l
85

(still, some functions may append_hint several times, so the number should be 
less)
and file list is big:

# git grep -l append_hint
block.c
block/backup.c
block/dirty-bitmap.c
block/file-posix.c
block/gluster.c
block/qcow.c
block/qcow2.c
block/vhdx-log.c
block/vpc.c
chardev/spice.c
hw/9pfs/9p-local.c
hw/9pfs/9p-proxy.c
hw/arm/msf2-soc.c
hw/arm/virt.c
hw/audio/intel-hda.c
hw/intc/arm_gicv3_kvm.c
hw/misc/msf2-sysreg.c
hw/pci-bridge/pci_bridge_dev.c
hw/pci-bridge/pcie_root_port.c
hw/ppc/mac_newworld.c
hw/ppc/spapr.c
hw/ppc/spapr_irq.c
hw/ppc/spapr_pci.c
hw/rdma/vmw/pvrdma_main.c
hw/s390x/s390-ccw.c
hw/scsi/megasas.c
hw/scsi/mptsas.c
hw/scsi/scsi-disk.c
hw/scsi/scsi-generic.c
hw/usb/ccid-card-emulated.c
hw/usb/hcd-xhci.c
hw/vfio/common.c
hw/vfio/pci-quirks.c
hw/vfio/pci.c
hw/virtio/virtio-pci.c
hw/xen/xen_pt.c
hw/xen/xen_pt_config_init.c
include/qapi/error.h
migration/migration.c
nbd/client.c
nbd/server.c
qdev-monitor.c
target/arm/cpu64.c
target/ppc/kvm.c
util/error.c
util/qemu-option.c
util/qemu-sockets.c


Also, conversion to use macro everywhere may be done (seems so) by coccinelle 
script.
But conversion which you mean, only by hand I think. Converting 1243 methods by 
hand
is a huge task..


I think there are three consistent ways:

1. Use macro everywhere
2. Drop error_append_hint
3. Drop error_fatal


> 
> 
> This suggests to me, that if we used the 'return 0 / return -1' convention
> to indicate failure for the methods which currently return void, we could
> potentially only have 11 cases where a local error object is genuinely
> needed.
> 
> If my numbers are at all accurate, I still believe we're better off
> changing the return type and eliminating essentially all use of local
> error variables, as void funcs/local error objects appear to be the
> minority coding pattern.
> 
> 
>>> There are lots of neat things we could do with auto-cleanup functions we
>>> I think we need to be wary of hiding too much cleverness behind macros
>>> when doing so overall.
>>
>> The benefit of getting rid of the 'Error *local_err = NULL; ...
>> error_propagate()' boilerplate is worth the cleverness, in my opinion,
>> but especially if also accompanied by CI coverage that we abide by our
>> new rules.
> 
> At least we're both wanting to eliminate the local error + propagation.
> The difference is whether we're genuinely eliminating it, or just hiding
> it from view via auto-cleanup & macro magic
> 
> Regards,
> Daniel
> 


-- 
Best regards,

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Daniel P . Berrangé

On Thu, Sep 19, 2019 at 10:24:20AM -0500, Eric Blake wrote:
> On 9/19/19 9:49 AM, Daniel P. Berrangé wrote:
> 
> >> ALWAYS using MAKE_ERRP_SAFE() on entry to any function that has an Error
> >> **errp parameter is dirt-simple to explain.  It has no performance
> >> penalty if the user passed in a normal error or error_abort (the cost of
> >> an 'if' hidden in the macro is probably negligible compared to
> >> everything else we do), and has no semantic penalty if the user passed
> >> in NULL or error_fatal (we now get the behavior we want with less
> >> boilerplate).
> >>
> >> Having to think 'does this method require me to use MAKE_ERRP_SAFE, or
> >> can I omit it?' does not provide the same simplicity.
> > 
> > The flipside is that MAKE_ERRP_SAFE hides a bunch of logic, so you don't
> > really know what its doing without looking at it, and this is QEMU
> > custom concept so one more thing to learn for new contributors.
> > 
> > While I think it is a nice trick, personally I think we would be better
> > off if we simply used a code pattern which does not require de-referencing
> > 'errp' at all, aside from exceptional cases. IOW, no added macro in 95%
> > of all our methods using Error **errp.
> 
> If 100% of our callsites use the macro, then new contributors will
> quickly learn by observation alone that the macro usage must be
> important on any new function taking Error **errp, whether or not they
> actually read the macro to see what it does.  If only 5% of our
> callsites use the macro, it's harder to argue that a new user will pick
> up on the nuances by observation alone (presumably, our docs would also
> spell it out, but we know that not everyone reads those...).

To get some slightly less made-up stats, I did some searching:

  - 4408  methods with an 'errp' parameter declared

 git grep 'Error \*\*errp'|  wc -l

  - 696 methods with an 'Error *local' declared (what other names
do we use for this?)

 git grep 'Error \*local' | wc -l

  - 1243 methods with an 'errp' parameter which have void
return value (fuzzy number - my matching is quite crude)

 git grep 'Error \*\*errp'|  grep -E '(^| )void' | wc -l

  - 11 methods using error_append_hint with a local_err

 git grep append_hint | grep local | wc -l


This suggests to me, that if we used the 'return 0 / return -1' convention
to indicate failure for the methods which currently return void, we could
potentially only have 11 cases where a local error object is genuinely
needed. 

If my numbers are at all accurate, I still believe we're better off
changing the return type and eliminating essentially all use of local
error variables, as void funcs/local error objects appear to be the
minority coding pattern.


> > There are lots of neat things we could do with auto-cleanup functions we
> > I think we need to be wary of hiding too much cleverness behind macros
> > when doing so overall.
> 
> The benefit of getting rid of the 'Error *local_err = NULL; ...
> error_propagate()' boilerplate is worth the cleverness, in my opinion,
> but especially if also accompanied by CI coverage that we abide by our
> new rules.

At least we're both wanting to eliminate the local error + propagation.
The difference is whether we're genuinely eliminating it, or just hiding
it from view via auto-cleanup & macro magic

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 18:24, Eric Blake wrote:
> On 9/19/19 9:49 AM, Daniel P. Berrangé wrote:
> 
>>> ALWAYS using MAKE_ERRP_SAFE() on entry to any function that has an Error
>>> **errp parameter is dirt-simple to explain.  It has no performance
>>> penalty if the user passed in a normal error or error_abort (the cost of
>>> an 'if' hidden in the macro is probably negligible compared to
>>> everything else we do), and has no semantic penalty if the user passed
>>> in NULL or error_fatal (we now get the behavior we want with less
>>> boilerplate).
>>>
>>> Having to think 'does this method require me to use MAKE_ERRP_SAFE, or
>>> can I omit it?' does not provide the same simplicity.
>>
>> The flipside is that MAKE_ERRP_SAFE hides a bunch of logic, so you don't
>> really know what its doing without looking at it, and this is QEMU
>> custom concept so one more thing to learn for new contributors.
>>
>> While I think it is a nice trick, personally I think we would be better
>> off if we simply used a code pattern which does not require de-referencing
>> 'errp' at all, aside from exceptional cases. IOW, no added macro in 95%
>> of all our methods using Error **errp.
> 
> If 100% of our callsites use the macro, then new contributors will
> quickly learn by observation alone that the macro usage must be
> important on any new function taking Error **errp, whether or not they
> actually read the macro to see what it does.  If only 5% of our
> callsites use the macro, it's harder to argue that a new user will pick
> up on the nuances by observation alone (presumably, our docs would also
> spell it out, but we know that not everyone reads those...).

When I was a new contributor, it was not simple to understand, when and why we
need to use local_err, keeping in mind that (this is not only reason, but also
a consequence) we have places where local_err is used for no reason.

> 
> However, if we can automate syntax checks to reach a near-100% accuracy,
> we don't HAVE to worry about whether a new programmer picks up on the
> nuances by observation, because they will instead pick up the nuances by
> CI rejection messages.  This is true for _either_ style:
> 
> 100% use of the macro: CI message would be "you added a method with a
> parameter 'Error **errp' but forgot to use MAKE_ERRP_SAFE"
> 
> use of the macro only where necessary (namely, functions that contain
> '*errp' and/or 'error_append_hint'): CI message would either be "your
> function body requires MAKE_ERRP_SAFE but you forgot it" or "your
> function body does not require MAKE_ERRP_SAFE but you forgot to remove
> it".  And this would work even for experienced committers editing
> existing functions (such as ongoing work to convert away from 'void
> child(errp); if (*errp)' and towards 'if (int child(errp) < 0)').
> 
> Writing the CI engine for the first case is easy, writing it for the
> second is a bit harder, but still seems tractable (since, for any
> function with an 'Error **errp' parameter, it should be easy to scan the
> function body for instances of '*errp' or 'error_append_hint', as well
> as to scan whether MAKE_ERRP_SAFE was present or absent accordingly).
> 
>> There are lots of neat things we could do with auto-cleanup functions we
>> I think we need to be wary of hiding too much cleverness behind macros
>> when doing so overall.
> 
> The benefit of getting rid of the 'Error *local_err = NULL; ...
> error_propagate()' boilerplate is worth the cleverness, in my opinion,
> but especially if also accompanied by CI coverage that we abide by our
> new rules.
> 
> I'd really like to hear Markus' opinion on the matter, as Error maintainer.
> 


-- 
Best regards,
Vladimir

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Eric Blake

On 9/19/19 9:49 AM, Daniel P. Berrangé wrote:

>> ALWAYS using MAKE_ERRP_SAFE() on entry to any function that has an Error
>> **errp parameter is dirt-simple to explain.  It has no performance
>> penalty if the user passed in a normal error or error_abort (the cost of
>> an 'if' hidden in the macro is probably negligible compared to
>> everything else we do), and has no semantic penalty if the user passed
>> in NULL or error_fatal (we now get the behavior we want with less
>> boilerplate).
>>
>> Having to think 'does this method require me to use MAKE_ERRP_SAFE, or
>> can I omit it?' does not provide the same simplicity.
> 
> The flipside is that MAKE_ERRP_SAFE hides a bunch of logic, so you don't
> really know what its doing without looking at it, and this is QEMU
> custom concept so one more thing to learn for new contributors.
> 
> While I think it is a nice trick, personally I think we would be better
> off if we simply used a code pattern which does not require de-referencing
> 'errp' at all, aside from exceptional cases. IOW, no added macro in 95%
> of all our methods using Error **errp.

If 100% of our callsites use the macro, then new contributors will
quickly learn by observation alone that the macro usage must be
important on any new function taking Error **errp, whether or not they
actually read the macro to see what it does.  If only 5% of our
callsites use the macro, it's harder to argue that a new user will pick
up on the nuances by observation alone (presumably, our docs would also
spell it out, but we know that not everyone reads those...).

However, if we can automate syntax checks to reach a near-100% accuracy,
we don't HAVE to worry about whether a new programmer picks up on the
nuances by observation, because they will instead pick up the nuances by
CI rejection messages.  This is true for _either_ style:

100% use of the macro: CI message would be "you added a method with a
parameter 'Error **errp' but forgot to use MAKE_ERRP_SAFE"

use of the macro only where necessary (namely, functions that contain
'*errp' and/or 'error_append_hint'): CI message would either be "your
function body requires MAKE_ERRP_SAFE but you forgot it" or "your
function body does not require MAKE_ERRP_SAFE but you forgot to remove
it".  And this would work even for experienced committers editing
existing functions (such as ongoing work to convert away from 'void
child(errp); if (*errp)' and towards 'if (int child(errp) < 0)').

Writing the CI engine for the first case is easy, writing it for the
second is a bit harder, but still seems tractable (since, for any
function with an 'Error **errp' parameter, it should be easy to scan the
function body for instances of '*errp' or 'error_append_hint', as well
as to scan whether MAKE_ERRP_SAFE was present or absent accordingly).

> There are lots of neat things we could do with auto-cleanup functions we
> I think we need to be wary of hiding too much cleverness behind macros
> when doing so overall.

The benefit of getting rid of the 'Error *local_err = NULL; ...
error_propagate()' boilerplate is worth the cleverness, in my opinion,
but especially if also accompanied by CI coverage that we abide by our
new rules.

I'd really like to hear Markus' opinion on the matter, as Error maintainer.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 17:44, Eric Blake wrote:
> On 9/19/19 9:30 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
>>>
>>> To the same topic, of minimization: should we always call MAKE_ERRP_SAFE at 
>>> function top, or only
>>> in block, where it is needed (assume, we dereference it only inside some 
>>> "if" or "while"?
>>> Kevin, is something bad in propagation, when it not related to error_abort?
>>>
>>>
>>
>> Or, even, we may use MAKE_ERRP_SAFE on every function, which have Error 
>> **errp argument, even if we neither
>> dereference it nor append hints? Is it what you propose by "SINGLE 
>> paradigm"? It's of course simpler to script,
>> to check in checkpatch and to maintain.
> 
> Yes. The simpler we make the rules, and the less boilerplate it entails,
> the more likely that we can:
> 
> a) avoid exceptions and corner cases that cost review time
> b) automate the conversion into complying with the rule
> c) codify those rules in syntax check to ensure they are followed
> post-conversion
> 
> ALWAYS using MAKE_ERRP_SAFE() on entry to any function that has an Error
> **errp parameter is dirt-simple to explain.  It has no performance
> penalty if the user passed in a normal error or error_abort (the cost of
> an 'if' hidden in the macro is probably negligible compared to
> everything else we do), and has no semantic penalty if the user passed
> in NULL or error_fatal (we now get the behavior we want with less
> boilerplate).
> 
> Having to think 'does this method require me to use MAKE_ERRP_SAFE, or
> can I omit it?' does not provide the same simplicity.
> 

Interesting: it's another story, but with this macro used in each errp-function 
we can collect
the whole call-stack of the error int Error object, and report it.
It may be not good for end-user, but very useful for testing.

Or is it bad idea? Anyway, I often have the problem: I have some error 
reported, and need to
understand where was it from.. git grep helps, but backtrace would be a lot 
better.

-- 
Best regards,
Vladimir

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 17:12, Vladimir Sementsov-Ogievskiy wrote:
> 19.09.2019 16:40, Eric Blake wrote:
>> On 9/19/19 4:17 AM, Kevin Wolf wrote:
>>> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
 On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> + */
> +#define MAKE_ERRP_SAFE(errp) \
> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
> +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) 
> { \
> +    (errp) = &__auto_errp_prop.local_err; \
> +}

 Not written to take a trailing semicolon in the caller.

 You could even set __auto_errp_prop unconditionally rather than trying
 to reuse incoming errp (the difference being that error_propagate() gets
 called more frequently).
>>>
>>> I think this difference is actually a problem.
>>>
>>> When debugging things, I hate error_propagate(). It means that the Error
>>> (specifically its fields src/func/line) points to the outermost
>>> error_propagate() rather than the place where the error really happened.
>>> It also makes error_abort completely useless because at the point where
>>> the process gets aborted, the interesting information is already lost.
>>
>> Okay, based on that, I see the following desirable semantics:
>>
>> Caller: one of 4 calling paradigms:
>>
>> pass errp=NULL (we don't care about failures)
>> pass errp=_abort (we want to abort() as soon as possible as close
>> to the real problem as possible)
>> pass errp=_fatal (we want to exit(), but only after collecting as
>> much information as possible)
>> pass errp = anything else (we are collecting an error for other reasons,
>> we may report it or let the caller decide or ...)
>>
>> Callee: we want a SINGLE paradigm:
>>
>> func (Error **errp)
>> {
>>  MAKE_ERRP_SAFE();
>>
>>  now we can pass errp to any child function, test '*errp', or do
>> anything else, and we DON'T have to call error_propagate.
>>
>> I think that means we need:
>>
>> #define MAKE_ERRP_SAFE() \
>>    g_auto(...) __auto_errp = { .errp = errp }; \
>>    do { \
>>  if (!errp || errp == _fatal) { errp = &__auto_errp.local; } \
>>    } while (0)
>>
>> So back to the caller semantics:
>>
>> if the caller passed NULL, we've redirected errp locally so that we can
>> use *errp at will; the auto-cleanup will free our local error.
>>
>> if the caller passed _abort, we keep errp unchanged.  *errp tests
>> will never trigger, because we'll have already aborted in the child on
>> the original errp, giving developers the best stack trace.
>>
>> if the caller passed _fatal, we redirect errp.  auto-cleanup will
>> then error_propagate that back to the caller, producing as much nice
>> information as possible.
>>
>> if the caller passed anything else, we keep errp unchanged, so no extra
>> error_propagate in the mix.
>>
>>>
>>> So I'd really like to restrict the use of error_propagate() to places
>>> where it's absolutely necessary. Unless, of course, you can fix these
>>> practical problems that error_propagate() causes for debugging.
>>>
>>> In fact, in the context of Greg's series, I think we really only need to
>>> support hints for error_fatal, which are cases that users are supposed
>>> to see. We should exclude error_abort in MAKE_ERRP_SAFE() because these
>>> are things that are never supposed to happen. A good stack trace is more
>>> important there than adding a hint to the message.
>>
>> We also want to handle the caller passing NULL, so that we no longer
>> have to introduce 'Error *local_error = NULL' everywhere.
>>
> 
> With my plan of two different macro, I at least messed the case when we need
> both dereferencing and hints, which means third macro, or one macro with 
> parameters,
> saying what to wrap.
> 
> And my aim was to follow the idea of "do propagation only if it really 
> necessary in this case".
> 
> But may be you are right, and we shouldn't care so much.
> 
> 1. What is bad, if we wrap NULL, when we only want to use hints?
> Seems nothing. Some extra actions on error path, but who cares about it?
> 
> 2. What is bad, if we wrap error_fatal, when we only want to dereference, and 
> don't use hints?
> Seems nothing again, on error path we will return from higher level, and a 
> bit of extra work, but nothing worse..
> 
> So I tend to agree. But honestly, I didn't understand first part of Kevin's 
> paragraph against propagation,
> so, may be he have more reasons to minimize number of cases when we propagate.
> 
> To the same topic, of minimization: should we always call MAKE_ERRP_SAFE at 
> function top, or only
> in block, where it is needed (assume, we dereference it only inside some "if" 
> or "while"?
> Kevin, is something bad in propagation, when it not related to error_abort?
> 
> 

Or, even, we may use MAKE_ERRP_SAFE on every function, which have Error **errp 
argument, even if we neither
dereference it nor append hints? Is it what you propose by "SINGLE paradigm"? 
It's of course simpler to script,
to

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Kevin Wolf

Am 19.09.2019 um 16:13 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 19.09.2019 16:40, Eric Blake wrote:
> > On 9/19/19 4:17 AM, Kevin Wolf wrote:
> >> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
> >>> On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
>  + */
>  +#define MAKE_ERRP_SAFE(errp) \
>  +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
>  +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) 
>  { \
>  +(errp) = &__auto_errp_prop.local_err; \
>  +}
> >>>
> >>> Not written to take a trailing semicolon in the caller.
> >>>
> >>> You could even set __auto_errp_prop unconditionally rather than trying
> >>> to reuse incoming errp (the difference being that error_propagate() gets
> >>> called more frequently).
> >>
> >> I think this difference is actually a problem.
> >>
> >> When debugging things, I hate error_propagate(). It means that the Error
> >> (specifically its fields src/func/line) points to the outermost
> >> error_propagate() rather than the place where the error really happened.
> >> It also makes error_abort completely useless because at the point where
> >> the process gets aborted, the interesting information is already lost.
> > 
> > Okay, based on that, I see the following desirable semantics:
> > 
> > Caller: one of 4 calling paradigms:
> > 
> > pass errp=NULL (we don't care about failures)
> > pass errp=_abort (we want to abort() as soon as possible as close
> > to the real problem as possible)
> > pass errp=_fatal (we want to exit(), but only after collecting as
> > much information as possible)
> > pass errp = anything else (we are collecting an error for other reasons,
> > we may report it or let the caller decide or ...)
> > 
> > Callee: we want a SINGLE paradigm:
> > 
> > func (Error **errp)
> > {
> >  MAKE_ERRP_SAFE();
> > 
> >  now we can pass errp to any child function, test '*errp', or do
> > anything else, and we DON'T have to call error_propagate.
> > 
> > I think that means we need:
> > 
> > #define MAKE_ERRP_SAFE() \
> >g_auto(...) __auto_errp = { .errp = errp }; \
> >do { \
> >  if (!errp || errp == _fatal) { errp = &__auto_errp.local; } \
> >} while (0)
> > 
> > So back to the caller semantics:
> > 
> > if the caller passed NULL, we've redirected errp locally so that we can
> > use *errp at will; the auto-cleanup will free our local error.
> > 
> > if the caller passed _abort, we keep errp unchanged.  *errp tests
> > will never trigger, because we'll have already aborted in the child on
> > the original errp, giving developers the best stack trace.
> > 
> > if the caller passed _fatal, we redirect errp.  auto-cleanup will
> > then error_propagate that back to the caller, producing as much nice
> > information as possible.
> > 
> > if the caller passed anything else, we keep errp unchanged, so no extra
> > error_propagate in the mix.
> > 
> >>
> >> So I'd really like to restrict the use of error_propagate() to places
> >> where it's absolutely necessary. Unless, of course, you can fix these
> >> practical problems that error_propagate() causes for debugging.
> >>
> >> In fact, in the context of Greg's series, I think we really only need to
> >> support hints for error_fatal, which are cases that users are supposed
> >> to see. We should exclude error_abort in MAKE_ERRP_SAFE() because these
> >> are things that are never supposed to happen. A good stack trace is more
> >> important there than adding a hint to the message.
> > 
> > We also want to handle the caller passing NULL, so that we no longer
> > have to introduce 'Error *local_error = NULL' everywhere.
> > 
> 
> With my plan of two different macro, I at least messed the case when we need
> both dereferencing and hints, which means third macro, or one macro with 
> parameters,
> saying what to wrap.
> 
> And my aim was to follow the idea of "do propagation only if it really 
> necessary in this case".
> 
> But may be you are right, and we shouldn't care so much.
> 
> 1. What is bad, if we wrap NULL, when we only want to use hints?
> Seems nothing. Some extra actions on error path, but who cares about it?
> 
> 2. What is bad, if we wrap error_fatal, when we only want to dereference, and 
> don't use hints?
> Seems nothing again, on error path we will return from higher level, and a 
> bit of extra work, but nothing worse..
> 
> So I tend to agree. But honestly, I didn't understand first part of Kevin's 
> paragraph against propagation,
> so, may be he have more reasons to minimize number of cases when we propagate.

I think my concerns were really only about error_abort and "normal"
non-NULL errp losing some information about the origin of the error. And
from this thread, it seems that I misremebered and the normal one is
actually supposed to just work.

In any case, wrapping NULL and error_fatal should be fine, so I agree
that a single macro should do.

> To the same topic, of minimization: should we

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Daniel P . Berrangé

On Thu, Sep 19, 2019 at 09:44:14AM -0500, Eric Blake wrote:
> On 9/19/19 9:30 AM, Vladimir Sementsov-Ogievskiy wrote:
> 
> >>
> >> To the same topic, of minimization: should we always call MAKE_ERRP_SAFE 
> >> at function top, or only
> >> in block, where it is needed (assume, we dereference it only inside some 
> >> "if" or "while"?
> >> Kevin, is something bad in propagation, when it not related to error_abort?
> >>
> >>
> > 
> > Or, even, we may use MAKE_ERRP_SAFE on every function, which have Error 
> > **errp argument, even if we neither
> > dereference it nor append hints? Is it what you propose by "SINGLE 
> > paradigm"? It's of course simpler to script,
> > to check in checkpatch and to maintain.
> 
> Yes. The simpler we make the rules, and the less boilerplate it entails,
> the more likely that we can:
> 
> a) avoid exceptions and corner cases that cost review time
> b) automate the conversion into complying with the rule
> c) codify those rules in syntax check to ensure they are followed
> post-conversion
> 
> ALWAYS using MAKE_ERRP_SAFE() on entry to any function that has an Error
> **errp parameter is dirt-simple to explain.  It has no performance
> penalty if the user passed in a normal error or error_abort (the cost of
> an 'if' hidden in the macro is probably negligible compared to
> everything else we do), and has no semantic penalty if the user passed
> in NULL or error_fatal (we now get the behavior we want with less
> boilerplate).
> 
> Having to think 'does this method require me to use MAKE_ERRP_SAFE, or
> can I omit it?' does not provide the same simplicity.

The flipside is that MAKE_ERRP_SAFE hides a bunch of logic, so you don't
really know what its doing without looking at it, and this is QEMU
custom concept so one more thing to learn for new contributors.

While I think it is a nice trick, personally I think we would be better
off if we simply used a code pattern which does not require de-referencing
'errp' at all, aside from exceptional cases. IOW, no added macro in 95%
of all our methods using Error **errp.

There are lots of neat things we could do with auto-cleanup functions we
I think we need to be wary of hiding too much cleverness behind macros
when doing so overall.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [RFC] error: auto propagated local_err

2019-09-19 Thread Eric Blake

On 9/19/19 9:30 AM, Vladimir Sementsov-Ogievskiy wrote:

>>
>> To the same topic, of minimization: should we always call MAKE_ERRP_SAFE at 
>> function top, or only
>> in block, where it is needed (assume, we dereference it only inside some 
>> "if" or "while"?
>> Kevin, is something bad in propagation, when it not related to error_abort?
>>
>>
> 
> Or, even, we may use MAKE_ERRP_SAFE on every function, which have Error 
> **errp argument, even if we neither
> dereference it nor append hints? Is it what you propose by "SINGLE paradigm"? 
> It's of course simpler to script,
> to check in checkpatch and to maintain.

Yes. The simpler we make the rules, and the less boilerplate it entails,
the more likely that we can:

a) avoid exceptions and corner cases that cost review time
b) automate the conversion into complying with the rule
c) codify those rules in syntax check to ensure they are followed
post-conversion

ALWAYS using MAKE_ERRP_SAFE() on entry to any function that has an Error
**errp parameter is dirt-simple to explain.  It has no performance
penalty if the user passed in a normal error or error_abort (the cost of
an 'if' hidden in the macro is probably negligible compared to
everything else we do), and has no semantic penalty if the user passed
in NULL or error_fatal (we now get the behavior we want with less
boilerplate).

Having to think 'does this method require me to use MAKE_ERRP_SAFE, or
can I omit it?' does not provide the same simplicity.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Eric Blake

On 9/19/19 9:13 AM, Vladimir Sementsov-Ogievskiy wrote:

> 
> With my plan of two different macro, I at least messed the case when we need
> both dereferencing and hints, which means third macro, or one macro with 
> parameters,
> saying what to wrap.
> 
> And my aim was to follow the idea of "do propagation only if it really 
> necessary in this case".
> 
> But may be you are right, and we shouldn't care so much.
> 
> 1. What is bad, if we wrap NULL, when we only want to use hints?
> Seems nothing. Some extra actions on error path, but who cares about it?
> 
> 2. What is bad, if we wrap error_fatal, when we only want to dereference, and 
> don't use hints?
> Seems nothing again, on error path we will return from higher level, and a 
> bit of extra work, but nothing worse..
> 
> So I tend to agree. But honestly, I didn't understand first part of Kevin's 
> paragraph against propagation,
> so, may be he have more reasons to minimize number of cases when we propagate.

Here's the problem.

error_propagate(_abort, ...)

produces an abort() with a stack trace tied to your CURRENT location,
not the location at which the error was first detected.  We DO copy the
function name and line number from the original detection, but we have
moved on since that point in time, so the backtrace does not match the
reported location in the error struct.  So for error_abort, you want to
abort as soon as possible, without any intermediate error_propagate - we
use this to flag programmer errors, and want to make life easy for the
programmer.

But for error_propagate(_fatal, ...), we WANT to delay things to
as late as possible, in order to gather up as many additional
append_hints as possible, before finally telling the user their problem.
 In that case, we don't care about the stack trace or even the function
and line number of the failure, but about giving the user as much as we
can to help them fix their command-line problem.

> 
> To the same topic, of minimization: should we always call MAKE_ERRP_SAFE at 
> function top, or only
> in block, where it is needed (assume, we dereference it only inside some "if" 
> or "while"?
> Kevin, is something bad in propagation, when it not related to error_abort?

error_propagate() serves two purposes:

It is a crutch to make it easier to paper over passing NULL as errp (in
that case, we substitute a non-NULL pointer, then we can use 'if
(local_error)', and error_propagate() then frees local_error).  Hiding
this crutch behind a macro is desirable (less boilerplate).

It helps transfer from one error to another.  In this form, we are fine
using it for error_fatal (because we WANT the transfer to happen as late
as possible, because once we transfer the program exits so we can't add
any more hints), but not for error_abort (because we don't want to lose
context by transferring), and wasteful at other times (write directly to
the original error, instead of having to transfer).  So again, hiding it
in a macro means we no longer have to call it directly, but only in the
places where it helps.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 16:40, Eric Blake wrote:
> On 9/19/19 4:17 AM, Kevin Wolf wrote:
>> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
>>> On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
 + */
 +#define MAKE_ERRP_SAFE(errp) \
 +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
 +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) { 
 \
 +(errp) = &__auto_errp_prop.local_err; \
 +}
>>>
>>> Not written to take a trailing semicolon in the caller.
>>>
>>> You could even set __auto_errp_prop unconditionally rather than trying
>>> to reuse incoming errp (the difference being that error_propagate() gets
>>> called more frequently).
>>
>> I think this difference is actually a problem.
>>
>> When debugging things, I hate error_propagate(). It means that the Error
>> (specifically its fields src/func/line) points to the outermost
>> error_propagate() rather than the place where the error really happened.
>> It also makes error_abort completely useless because at the point where
>> the process gets aborted, the interesting information is already lost.
> 
> Okay, based on that, I see the following desirable semantics:
> 
> Caller: one of 4 calling paradigms:
> 
> pass errp=NULL (we don't care about failures)
> pass errp=_abort (we want to abort() as soon as possible as close
> to the real problem as possible)
> pass errp=_fatal (we want to exit(), but only after collecting as
> much information as possible)
> pass errp = anything else (we are collecting an error for other reasons,
> we may report it or let the caller decide or ...)
> 
> Callee: we want a SINGLE paradigm:
> 
> func (Error **errp)
> {
>  MAKE_ERRP_SAFE();
> 
>  now we can pass errp to any child function, test '*errp', or do
> anything else, and we DON'T have to call error_propagate.
> 
> I think that means we need:
> 
> #define MAKE_ERRP_SAFE() \
>g_auto(...) __auto_errp = { .errp = errp }; \
>do { \
>  if (!errp || errp == _fatal) { errp = &__auto_errp.local; } \
>} while (0)
> 
> So back to the caller semantics:
> 
> if the caller passed NULL, we've redirected errp locally so that we can
> use *errp at will; the auto-cleanup will free our local error.
> 
> if the caller passed _abort, we keep errp unchanged.  *errp tests
> will never trigger, because we'll have already aborted in the child on
> the original errp, giving developers the best stack trace.
> 
> if the caller passed _fatal, we redirect errp.  auto-cleanup will
> then error_propagate that back to the caller, producing as much nice
> information as possible.
> 
> if the caller passed anything else, we keep errp unchanged, so no extra
> error_propagate in the mix.
> 
>>
>> So I'd really like to restrict the use of error_propagate() to places
>> where it's absolutely necessary. Unless, of course, you can fix these
>> practical problems that error_propagate() causes for debugging.
>>
>> In fact, in the context of Greg's series, I think we really only need to
>> support hints for error_fatal, which are cases that users are supposed
>> to see. We should exclude error_abort in MAKE_ERRP_SAFE() because these
>> are things that are never supposed to happen. A good stack trace is more
>> important there than adding a hint to the message.
> 
> We also want to handle the caller passing NULL, so that we no longer
> have to introduce 'Error *local_error = NULL' everywhere.
> 

With my plan of two different macro, I at least messed the case when we need
both dereferencing and hints, which means third macro, or one macro with 
parameters,
saying what to wrap.

And my aim was to follow the idea of "do propagation only if it really 
necessary in this case".

But may be you are right, and we shouldn't care so much.

1. What is bad, if we wrap NULL, when we only want to use hints?
Seems nothing. Some extra actions on error path, but who cares about it?

2. What is bad, if we wrap error_fatal, when we only want to dereference, and 
don't use hints?
Seems nothing again, on error path we will return from higher level, and a bit 
of extra work, but nothing worse..

So I tend to agree. But honestly, I didn't understand first part of Kevin's 
paragraph against propagation,
so, may be he have more reasons to minimize number of cases when we propagate.

To the same topic, of minimization: should we always call MAKE_ERRP_SAFE at 
function top, or only
in block, where it is needed (assume, we dereference it only inside some "if" 
or "while"?
Kevin, is something bad in propagation, when it not related to error_abort?


-- 
Best regards,
Vladimir

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Eric Blake

On 9/19/19 8:03 AM, Kevin Wolf wrote:

>>
>> Interesting, that to handle error_append_hint problem, we don't need to
>> create local_err in case of errp==NULL either..
>>
>> So, possibly, we need the following steps:
>>
>> 1. implement MAKE_ERRP_SAFE_FOR_HINT (which only leave "*(errp) == 
>> error_fatal" in the if condition
>> 2. rebase Greg's series on it, to fix hints for fatal errors
>> 3. implement MAKE_ERRP_SAFE_FOR_DEREFERENCE (which only leave "(errp) == 
>> NULL" in the if condition
>> 4. convert all (almost all) local_err usage to use 
>> MAKE_ERRP_SAFE_FOR_DEREFERENCE, which will
>> fix problem with error_abort (and also drop a lot of calls of 
>> error_propagate)

Why do we need two macros?  A single MAKE_ERRP_SAFE that covers both
NULL and _fatal at the same time is sufficient.  You can then
always use append_hint and/or *errp as you wish, and the
error_propagate() call is only used in two situations: caller passed
NULL (so we free the error as ignored), caller passed _fatal (so
the hint got appended before we propagate it to flag a more useful message).

>> 5. merely convert "void func(.., errp)" to "int func(.., errp)" and drop 
>> MAKE_ERRP_SAFE_FOR_DEREFERENCE()
>> magic, together with dereferencing.

That's orthogonal, but may also be useful cleanups.  I don't think we
need to drop the macro, though.

> 
> Long macro names, but as the parameter will always only be "errp", it
> fits easily on a line, so this is fine.
> 
> I think I like this plan.
> 
> Kevin
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Eric Blake

On 9/19/19 4:17 AM, Kevin Wolf wrote:
> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
>> On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> + */
>>> +#define MAKE_ERRP_SAFE(errp) \
>>> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
>>> +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) { \
>>> +(errp) = &__auto_errp_prop.local_err; \
>>> +}
>>
>> Not written to take a trailing semicolon in the caller.
>>
>> You could even set __auto_errp_prop unconditionally rather than trying
>> to reuse incoming errp (the difference being that error_propagate() gets
>> called more frequently).
> 
> I think this difference is actually a problem.
> 
> When debugging things, I hate error_propagate(). It means that the Error
> (specifically its fields src/func/line) points to the outermost
> error_propagate() rather than the place where the error really happened.
> It also makes error_abort completely useless because at the point where
> the process gets aborted, the interesting information is already lost.

Okay, based on that, I see the following desirable semantics:

Caller: one of 4 calling paradigms:

pass errp=NULL (we don't care about failures)
pass errp=_abort (we want to abort() as soon as possible as close
to the real problem as possible)
pass errp=_fatal (we want to exit(), but only after collecting as
much information as possible)
pass errp = anything else (we are collecting an error for other reasons,
we may report it or let the caller decide or ...)

Callee: we want a SINGLE paradigm:

func (Error **errp)
{
MAKE_ERRP_SAFE();

now we can pass errp to any child function, test '*errp', or do
anything else, and we DON'T have to call error_propagate.

I think that means we need:

#define MAKE_ERRP_SAFE() \
  g_auto(...) __auto_errp = { .errp = errp }; \
  do { \
if (!errp || errp == _fatal) { errp = &__auto_errp.local; } \
  } while (0)

So back to the caller semantics:

if the caller passed NULL, we've redirected errp locally so that we can
use *errp at will; the auto-cleanup will free our local error.

if the caller passed _abort, we keep errp unchanged.  *errp tests
will never trigger, because we'll have already aborted in the child on
the original errp, giving developers the best stack trace.

if the caller passed _fatal, we redirect errp.  auto-cleanup will
then error_propagate that back to the caller, producing as much nice
information as possible.

if the caller passed anything else, we keep errp unchanged, so no extra
error_propagate in the mix.

> 
> So I'd really like to restrict the use of error_propagate() to places
> where it's absolutely necessary. Unless, of course, you can fix these
> practical problems that error_propagate() causes for debugging.
> 
> In fact, in the context of Greg's series, I think we really only need to
> support hints for error_fatal, which are cases that users are supposed
> to see. We should exclude error_abort in MAKE_ERRP_SAFE() because these
> are things that are never supposed to happen. A good stack trace is more
> important there than adding a hint to the message.

We also want to handle the caller passing NULL, so that we no longer
have to introduce 'Error *local_error = NULL' everywhere.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Eric Blake

On 9/19/19 1:47 AM, Vladimir Sementsov-Ogievskiy wrote:

>> "The evaluations of the initialization list expressions are
>> indeterminately sequenced with respect to one another and thus the order
>> in which any side effects occur is unspecified."
>>
>> which does not bode well for the assignment to __auto_errp_prop.errp.
>> All changes to errp would have to be within the same initializer.  Maybe:
>>
>> #define MAKE_ERRP_SAFE() \
>>g_auto(ErrorPropagator) __auto_errp_prop = { \
>>  .local_err = (__auto_errp_prop.err = errp, \
>>  (errp = &__auto_errp_prop.local_err), NULL) }
> 
> Is it guaranteed that .errp will not be initialized to NULL after evaluating 
> of
> .local_err initializer?

Probably not.

> 
>>
>> but by the time you get that complicated, just using a statement is
>> easier to read.

Either two declarations (the second being an unused dummy variable
declared solely for its initializer's side-effects) or a declaration and
a statement are the only sane ways I can see to provide guaranteed
ordering.  It's hidden behind a macro, so I don't care which.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 16:03, Kevin Wolf wrote:
> Am 19.09.2019 um 14:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 19.09.2019 12:17, Kevin Wolf wrote:
>>> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
 On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> + */
> +#define MAKE_ERRP_SAFE(errp) \
> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
> +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) 
> { \
> +(errp) = &__auto_errp_prop.local_err; \
> +}

 Not written to take a trailing semicolon in the caller.

 You could even set __auto_errp_prop unconditionally rather than trying
 to reuse incoming errp (the difference being that error_propagate() gets
 called more frequently).
>>>
>>> I think this difference is actually a problem.
>>>
>>> When debugging things, I hate error_propagate(). It means that the Error
>>> (specifically its fields src/func/line) points to the outermost
>>> error_propagate() rather than the place where the error really happened.
>>> It also makes error_abort completely useless because at the point where
>>> the process gets aborted, the interesting information is already lost.
>>>
>>> So I'd really like to restrict the use of error_propagate() to places
>>> where it's absolutely necessary. Unless, of course, you can fix these
>>> practical problems that error_propagate() causes for debugging.
>>>
>>> In fact, in the context of Greg's series, I think we really only need to
>>> support hints for error_fatal, which are cases that users are supposed
>>> to see. We should exclude error_abort in MAKE_ERRP_SAFE() because these
>>> are things that are never supposed to happen. A good stack trace is more
>>> important there than adding a hint to the message.
>>>
>>
>> Interesting, that to handle error_append_hint problem, we don't need to
>> create local_err in case of errp==NULL either..
>>
>> So, possibly, we need the following steps:
>>
>> 1. implement MAKE_ERRP_SAFE_FOR_HINT (which only leave "*(errp) == 
>> error_fatal" in the if condition
>> 2. rebase Greg's series on it, to fix hints for fatal errors
>> 3. implement MAKE_ERRP_SAFE_FOR_DEREFERENCE (which only leave "(errp) == 
>> NULL" in the if condition
>> 4. convert all (almost all) local_err usage to use 
>> MAKE_ERRP_SAFE_FOR_DEREFERENCE, which will
>>  fix problem with error_abort (and also drop a lot of calls of 
>> error_propagate)
>> 5. merely convert "void func(.., errp)" to "int func(.., errp)" and drop 
>> MAKE_ERRP_SAFE_FOR_DEREFERENCE()
>>  magic, together with dereferencing.
> 
> Long macro names, but as the parameter will always only be "errp", it
> fits easily on a line, so this is fine.

Yes, I wanted to stress their meaning in plan..

Other variants, I can imagine:

MAKE_ERRP_SAFE_FOR_DEREFERENCE
WRAP_ERRP_FOR_DEREFERENCE
WRAP_NULL_ERRP

MAKE_ERRP_SAFE_FOR_HINT
WRAP_ERRP_FOR_HINT
WRAP_FATAL_ERRP


> 
> I think I like this plan.
> 
> Kevin
> 


-- 
Best regards,
Vladimir

Re: [Qemu-block] [PULL 0/3] Tracing patches

2019-09-19 Thread Peter Maydell

On Wed, 18 Sep 2019 at 14:21, Stefan Hajnoczi  wrote:
>
> The following changes since commit f8c3db33a5e863291182f8862ddf81618a7c6194:
>
>   target/sparc: Switch to do_transaction_failed() hook (2019-09-17 12:01:00 
> +0100)
>
> are available in the Git repository at:
>
>   https://github.com/stefanha/qemu.git tags/tracing-pull-request
>
> for you to fetch changes up to 9f7ad79c16ede0da01902b18fb32929cfbd20f87:
>
>   trace: Forbid event format ending with newline character (2019-09-18 
> 10:20:15 +0100)
>
> 
> Pull request
>
> 
>
> Alexey Kardashevskiy (1):
>   loader: Trace loaded images
>
> Philippe Mathieu-Daudé (2):
>   trace: Remove trailing newline in events
>   trace: Forbid event format ending with newline character


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Kevin Wolf

Am 19.09.2019 um 14:00 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 19.09.2019 12:17, Kevin Wolf wrote:
> > Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
> >> On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> >>> + */
> >>> +#define MAKE_ERRP_SAFE(errp) \
> >>> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
> >>> +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) 
> >>> { \
> >>> +(errp) = &__auto_errp_prop.local_err; \
> >>> +}
> >>
> >> Not written to take a trailing semicolon in the caller.
> >>
> >> You could even set __auto_errp_prop unconditionally rather than trying
> >> to reuse incoming errp (the difference being that error_propagate() gets
> >> called more frequently).
> > 
> > I think this difference is actually a problem.
> > 
> > When debugging things, I hate error_propagate(). It means that the Error
> > (specifically its fields src/func/line) points to the outermost
> > error_propagate() rather than the place where the error really happened.
> > It also makes error_abort completely useless because at the point where
> > the process gets aborted, the interesting information is already lost.
> > 
> > So I'd really like to restrict the use of error_propagate() to places
> > where it's absolutely necessary. Unless, of course, you can fix these
> > practical problems that error_propagate() causes for debugging.
> > 
> > In fact, in the context of Greg's series, I think we really only need to
> > support hints for error_fatal, which are cases that users are supposed
> > to see. We should exclude error_abort in MAKE_ERRP_SAFE() because these
> > are things that are never supposed to happen. A good stack trace is more
> > important there than adding a hint to the message.
> > 
> 
> Interesting, that to handle error_append_hint problem, we don't need to
> create local_err in case of errp==NULL either..
> 
> So, possibly, we need the following steps:
> 
> 1. implement MAKE_ERRP_SAFE_FOR_HINT (which only leave "*(errp) == 
> error_fatal" in the if condition
> 2. rebase Greg's series on it, to fix hints for fatal errors
> 3. implement MAKE_ERRP_SAFE_FOR_DEREFERENCE (which only leave "(errp) == 
> NULL" in the if condition
> 4. convert all (almost all) local_err usage to use 
> MAKE_ERRP_SAFE_FOR_DEREFERENCE, which will
> fix problem with error_abort (and also drop a lot of calls of 
> error_propagate)
> 5. merely convert "void func(.., errp)" to "int func(.., errp)" and drop 
> MAKE_ERRP_SAFE_FOR_DEREFERENCE()
> magic, together with dereferencing.

Long macro names, but as the parameter will always only be "errp", it
fits easily on a line, so this is fine.

I think I like this plan.

Kevin

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Max Reitz

On 19.09.19 12:03, Vladimir Sementsov-Ogievskiy wrote:
> 19.09.2019 12:33, Max Reitz wrote:
>> On 19.09.19 11:14, Vladimir Sementsov-Ogievskiy wrote:
>>> 19.09.2019 11:59, Max Reitz wrote:
 On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> Here is a proposal (three of them, actually) of auto propagation for
> local_err, to not call error_propagate on every exit point, when we
> deal with local_err.
>
> It also may help make Greg's series[1] about error_append_hint smaller.
>
> See definitions and examples below.
>
> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
> it, the idea will touch same code (and may be more).
>
> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>include/qapi/error.h | 102 +++
>block.c  |  63 --
>block/backup.c   |   8 +++-
>block/gluster.c  |   7 +++
>4 files changed, 144 insertions(+), 36 deletions(-)

 If the combination of “if (local_err) { error_propagate(...); ... }” is
 what’s cumbersome, can’t this be done simpler by adding an
 error_propagate() variant with a return value?

 i.e.

 bool has_error_then_propagate(Error **errp, Error *err)
 {
   if (!err) {
   return false;
   }
   error_propagate(errp, err);
   return true;
 }

 And then turn all instances of

 if (local_err) {
   error_propagate(errp, local_err);
   ...
 }

 into

 if (has_error_then_propagate(errp, local_err)) {
   ...
 }

 ?

 Max

>>>
>>> No, originally cumbersome is introducing local_err in a lot of new places by
>>> Greg's series. MAKE_ERRP_SAFE macro makes it as simple as one macro call
>>> instead (in each function where we need local_err).
>>
>> Does it need more than one local_err per function?
>>
>> Because if it didn’t, I’d find one “Error *local_err;” simpler than one
>> macro incantation.
>>
>> (It has the same LoC, and it makes code readers ask the same question:
>> “Why do we need it?”  Which has the same answer for both; but one is
>> immediately readable code, whereas the other is a macro.)
> 
> Not the same, you didn't count error_propagate

I did, because it would part of the if () statement in my proposal.

> And your example don't match Greg's case, there no "if (local_err)" in it,

Ah, right, I see.  OK then.

> just "error_append_hint(errp)", which don't work for error_fatal and 
> error_abort
> (Yes, I agree with Kevin, that it should work only for error_fatal)

True.

[...]

>> Now Kevin has given an actual advantage, which is that local_err
>> complicates debugging.  I’ve never had that problem myself, but that
>> would indeed be an advantage that may warrant some magic.

Although after some more consideration I realized this probably cannot
be achieved with this series, actually.

Max

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 12:17, Kevin Wolf wrote:
> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
>> On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> + */
>>> +#define MAKE_ERRP_SAFE(errp) \
>>> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
>>> +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) { \
>>> +(errp) = &__auto_errp_prop.local_err; \
>>> +}
>>
>> Not written to take a trailing semicolon in the caller.
>>
>> You could even set __auto_errp_prop unconditionally rather than trying
>> to reuse incoming errp (the difference being that error_propagate() gets
>> called more frequently).
> 
> I think this difference is actually a problem.
> 
> When debugging things, I hate error_propagate(). It means that the Error
> (specifically its fields src/func/line) points to the outermost
> error_propagate() rather than the place where the error really happened.
> It also makes error_abort completely useless because at the point where
> the process gets aborted, the interesting information is already lost.
> 
> So I'd really like to restrict the use of error_propagate() to places
> where it's absolutely necessary. Unless, of course, you can fix these
> practical problems that error_propagate() causes for debugging.
> 
> In fact, in the context of Greg's series, I think we really only need to
> support hints for error_fatal, which are cases that users are supposed
> to see. We should exclude error_abort in MAKE_ERRP_SAFE() because these
> are things that are never supposed to happen. A good stack trace is more
> important there than adding a hint to the message.
> 

Interesting, that to handle error_append_hint problem, we don't need to
create local_err in case of errp==NULL either..

So, possibly, we need the following steps:

1. implement MAKE_ERRP_SAFE_FOR_HINT (which only leave "*(errp) == error_fatal" 
in the if condition
2. rebase Greg's series on it, to fix hints for fatal errors
3. implement MAKE_ERRP_SAFE_FOR_DEREFERENCE (which only leave "(errp) == NULL" 
in the if condition
4. convert all (almost all) local_err usage to use 
MAKE_ERRP_SAFE_FOR_DEREFERENCE, which will
fix problem with error_abort (and also drop a lot of calls of 
error_propagate)
5. merely convert "void func(.., errp)" to "int func(.., errp)" and drop 
MAKE_ERRP_SAFE_FOR_DEREFERENCE()
magic, together with dereferencing.

-- 
Best regards,
Vladimir

Re: [Qemu-block] [Qemu-devel] [RFC] error: auto propagated local_err

2019-09-19 Thread Daniel P . Berrangé

On Thu, Sep 19, 2019 at 10:21:44AM +, Vladimir Sementsov-Ogievskiy wrote:
> 19.09.2019 13:09, Daniel P. Berrangé wrote:
> > On Thu, Sep 19, 2019 at 11:17:20AM +0200, Kevin Wolf wrote:
> >> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
> >>> On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
>  + */
>  +#define MAKE_ERRP_SAFE(errp) \
>  +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
>  +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) 
>  { \
>  +(errp) = &__auto_errp_prop.local_err; \
>  +}
> >>>
> >>> Not written to take a trailing semicolon in the caller.
> >>>
> >>> You could even set __auto_errp_prop unconditionally rather than trying
> >>> to reuse incoming errp (the difference being that error_propagate() gets
> >>> called more frequently).
> >>
> >> I think this difference is actually a problem.
> >>
> >> When debugging things, I hate error_propagate(). It means that the Error
> >> (specifically its fields src/func/line) points to the outermost
> >> error_propagate() rather than the place where the error really happened.
> >> It also makes error_abort completely useless because at the point where
> >> the process gets aborted, the interesting information is already lost.
> > 
> > Yeah, that is very frustrating. For personal development you can eventually
> > figure it out, but with customer support requests, all you get is the
> > stack trace and you've no core file to investigate, so you're stuck.
> > IOW, as a general principle, we should strive to minimize the usage
> > of error propagate.
> > 
> > The key reason why we typically need the propagate calls, is because
> > we allow the passed in Error **errp parameter to be NULL, while at the
> > same time we want to inspect it to see if its contents are non-NULL
> > to detect failure. I venture to suggest that this is flawed API
> > design on our part. We should not need to inspect the error object
> > to see if a method call fails - the return value can be used for
> > this purpose.
> > 
> > IOW, instead of this pattern with typically 'void' methods
> > 
> >   extern void somemethod(Error **errp);
> > 
> >   void thismethod(Error **errp) {
> >  Error *local_err = NULL;
> > ...
> >  somemethod(_err);
> >  if (local_err) {
> > error_propagate(errp, local_err);
> > return;
> > }
> >  ...
> >   }
> > 
> > We should be preferring
> > 
> >   extern int somemethod(Error **errp);
> > 
> >   int thismethod(Error **errp) {
> > ...
> >  if (somemethod(errp) < 0) {
> > return -1;
> > }
> >  ...
> >   }
> > 
> > ie only using the Error object to bubble up the *details* of
> > the error, not as the mechanism for finding if an error occurred.
> > 
> > There is of course a downside with this approach, in that a
> > method might set 'errp' but return 0, or not set 'errp' but
> > return -1. I think this is outweighed by the simpler code
> > pattern overall though.
> > 
> > 
> 
> Agree. But seems that such change may be only done by hand.. Hmm, on the 
> other hand,
> why not. May be I'll try do it for some parts of block layer.
> 
> Still, error_append_hint needs local_err in case of error_fatal.

Yep, fortunately that usage is comparatively rare vs use of error_propagate
in general.

To be clear I don't have any objections to your overall idea of using
attribute cleanup to simplify error propagation. As you say, it can
still be useful for the error_append_hint, even if we use the code
pattern I suggest more widely to eliminate propagation where possible.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-block] [PATCH v3] virtio-blk: schedule virtio_notify_config to run on main context

2019-09-19 Thread Kevin Wolf

Am 16.09.2019 um 13:24 hat Sergio Lopez geschrieben:
> virtio_notify_config() needs to acquire the global mutex, which isn't
> allowed from an iothread, and may lead to a deadlock like this:
> 
>  - main thead
>   * Has acquired: qemu_global_mutex.
>   * Is trying the acquire: iothread AioContext lock via
> AIO_WAIT_WHILE (after aio_poll).
> 
>  - iothread
>   * Has acquired: AioContext lock.
>   * Is trying to acquire: qemu_global_mutex (via
> virtio_notify_config->prepare_mmio_access).
> 
> If virtio_blk_resize() is called from an iothread, schedule
> virtio_notify_config() to be run in the main context BH.
> 
> Signed-off-by: Sergio Lopez 
> ---
> Changelog
> 
> v3:
>  - Unconditionally schedule the work to be done in the main context BH
>(thanks John Snow and Kevin Wolf).
> 
> v2:
>  - Use aio_bh_schedule_oneshot instead of scheduling a coroutine
>(thanks Kevin Wolf).
>  - Switch from RFC to v2 patch.
> ---
>  hw/block/virtio-blk.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 18851601cb..0163285f6f 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -16,6 +16,7 @@
>  #include "qemu/iov.h"
>  #include "qemu/module.h"
>  #include "qemu/error-report.h"
> +#include "qemu/main-loop.h"
>  #include "trace.h"
>  #include "hw/block/block.h"
>  #include "hw/qdev-properties.h"
> @@ -1086,11 +1087,25 @@ static int virtio_blk_load_device(VirtIODevice *vdev, 
> QEMUFile *f,
>  return 0;
>  }
>  
> +static void virtio_resize_cb(void *opaque)
> +{
> +VirtIODevice *vdev = opaque;
> +
> +assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> +virtio_notify_config(vdev);
> +}
> +
>  static void virtio_blk_resize(void *opaque)
>  {
>  VirtIODevice *vdev = VIRTIO_DEVICE(opaque);
>  
> -virtio_notify_config(vdev);
> +/*
> + * virtio_notify_config() needs to acquire the global mutex,
> + * so it can't be called from an iothread. Instead, schedule
> + * it to be run in the main context BH.
> + */
> +aio_bh_schedule_oneshot(qemu_get_aio_context(),
> +virtio_resize_cb, vdev);

This fits on a single line. With this fixed:

Reviewed-by: Kevin Wolf

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 13:09, Daniel P. Berrangé wrote:
> On Thu, Sep 19, 2019 at 11:17:20AM +0200, Kevin Wolf wrote:
>> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
>>> On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
 + */
 +#define MAKE_ERRP_SAFE(errp) \
 +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
 +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) { 
 \
 +(errp) = &__auto_errp_prop.local_err; \
 +}
>>>
>>> Not written to take a trailing semicolon in the caller.
>>>
>>> You could even set __auto_errp_prop unconditionally rather than trying
>>> to reuse incoming errp (the difference being that error_propagate() gets
>>> called more frequently).
>>
>> I think this difference is actually a problem.
>>
>> When debugging things, I hate error_propagate(). It means that the Error
>> (specifically its fields src/func/line) points to the outermost
>> error_propagate() rather than the place where the error really happened.
>> It also makes error_abort completely useless because at the point where
>> the process gets aborted, the interesting information is already lost.
> 
> Yeah, that is very frustrating. For personal development you can eventually
> figure it out, but with customer support requests, all you get is the
> stack trace and you've no core file to investigate, so you're stuck.
> IOW, as a general principle, we should strive to minimize the usage
> of error propagate.
> 
> The key reason why we typically need the propagate calls, is because
> we allow the passed in Error **errp parameter to be NULL, while at the
> same time we want to inspect it to see if its contents are non-NULL
> to detect failure. I venture to suggest that this is flawed API
> design on our part. We should not need to inspect the error object
> to see if a method call fails - the return value can be used for
> this purpose.
> 
> IOW, instead of this pattern with typically 'void' methods
> 
>   extern void somemethod(Error **errp);
> 
>   void thismethod(Error **errp) {
>  Error *local_err = NULL;
>   ...
>  somemethod(_err);
>  if (local_err) {
>   error_propagate(errp, local_err);
>   return;
>   }
>  ...
>   }
> 
> We should be preferring
> 
>   extern int somemethod(Error **errp);
> 
>   int thismethod(Error **errp) {
>   ...
>  if (somemethod(errp) < 0) {
>   return -1;
>   }
>  ...
>   }
> 
> ie only using the Error object to bubble up the *details* of
> the error, not as the mechanism for finding if an error occurred.
> 
> There is of course a downside with this approach, in that a
> method might set 'errp' but return 0, or not set 'errp' but
> return -1. I think this is outweighed by the simpler code
> pattern overall though.
> 
> 

Agree. But seems that such change may be only done by hand.. Hmm, on the other 
hand,
why not. May be I'll try do it for some parts of block layer.

Still, error_append_hint needs local_err in case of error_fatal.


-- 
Best regards,
Vladimir

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Daniel P . Berrangé

On Thu, Sep 19, 2019 at 11:17:20AM +0200, Kevin Wolf wrote:
> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
> > On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> > > + */
> > > +#define MAKE_ERRP_SAFE(errp) \
> > > +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
> > > +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) 
> > > { \
> > > +(errp) = &__auto_errp_prop.local_err; \
> > > +}
> > 
> > Not written to take a trailing semicolon in the caller.
> > 
> > You could even set __auto_errp_prop unconditionally rather than trying
> > to reuse incoming errp (the difference being that error_propagate() gets
> > called more frequently).
> 
> I think this difference is actually a problem.
> 
> When debugging things, I hate error_propagate(). It means that the Error
> (specifically its fields src/func/line) points to the outermost
> error_propagate() rather than the place where the error really happened.
> It also makes error_abort completely useless because at the point where
> the process gets aborted, the interesting information is already lost.

Yeah, that is very frustrating. For personal development you can eventually
figure it out, but with customer support requests, all you get is the
stack trace and you've no core file to investigate, so you're stuck.
IOW, as a general principle, we should strive to minimize the usage
of error propagate.

The key reason why we typically need the propagate calls, is because
we allow the passed in Error **errp parameter to be NULL, while at the
same time we want to inspect it to see if its contents are non-NULL
to detect failure. I venture to suggest that this is flawed API
design on our part. We should not need to inspect the error object
to see if a method call fails - the return value can be used for
this purpose.

IOW, instead of this pattern with typically 'void' methods

 extern void somemethod(Error **errp);

 void thismethod(Error **errp) {
Error *local_err = NULL;
...
somemethod(_err);
if (local_err) {
error_propagate(errp, local_err);
return;
}
...
 }

We should be preferring

 extern int somemethod(Error **errp);

 int thismethod(Error **errp) {
...
if (somemethod(errp) < 0) {
return -1;
}
...
 }

ie only using the Error object to bubble up the *details* of
the error, not as the mechanism for finding if an error occurred.

There is of course a downside with this approach, in that a
method might set 'errp' but return 0, or not set 'errp' but
return -1. I think this is outweighed by the simpler code
pattern overall though.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Greg Kurz

On Thu, 19 Sep 2019 09:28:11 +
Vladimir Sementsov-Ogievskiy  wrote:

> 19.09.2019 11:59, Greg Kurz wrote:
> > On Wed, 18 Sep 2019 16:02:44 +0300
> > Vladimir Sementsov-Ogievskiy  wrote:
> > 
> >> Hi all!
> >>
> >> Here is a proposal (three of them, actually) of auto propagation for
> >> local_err, to not call error_propagate on every exit point, when we
> >> deal with local_err.
> >>
> >> It also may help make Greg's series[1] about error_append_hint smaller.
> >>
> > 
> > This will depend on whether we reach a consensus soon enough (soft
> > freeze for 4.2 is 2019-10-29). Otherwise, I think my series should
> > be merged as is, and auto-propagation to be delayed to 4.3.
> > 
> >> See definitions and examples below.
> >>
> >> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
> >> it, the idea will touch same code (and may be more).
> >>
> > 
> > When we have a good auto-propagation mechanism available, I guess
> > this can be beneficial for most of the code, not only the places
> > where we add hints (or prepend something, see below).
> > 
> >> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> >> ---
> >>   include/qapi/error.h | 102 +++
> >>   block.c  |  63 --
> >>   block/backup.c   |   8 +++-
> >>   block/gluster.c  |   7 +++
> >>   4 files changed, 144 insertions(+), 36 deletions(-)
> >>
> >> diff --git a/include/qapi/error.h b/include/qapi/error.h
> >> index 3f95141a01..083e061014 100644
> >> --- a/include/qapi/error.h
> >> +++ b/include/qapi/error.h
> >> @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
> >>   ErrorClass err_class, const char *fmt, ...)
> >>   GCC_FMT_ATTR(6, 7);
> >>   
> >> +typedef struct ErrorPropagator {
> >> +Error **errp;
> >> +Error *local_err;
> >> +} ErrorPropagator;
> >> +
> >> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
> >> +{
> >> +if (prop->local_err) {
> >> +error_propagate(prop->errp, prop->local_err);
> > 
> > We also have error_propagate_prepend(), which is basically a variant of
> > error_prepend()+error_propagate() that can cope with _fatal. This
> > was introduced by Markus in commit 4b5766488fd3, for similar reasons that
> > motivated my series. It has ~30 users in the tree.
> > 
> > There's no such thing a generic cleanup function with a printf-like
> > interface, so all of these should be converted back to error_prepend()
> > if we go for auto-propagation.
> > 
> > Aside from propagation, one can sometime choose to call things like
> > error_free() or error_free_or_abort()... Auto-propagation should
> > detect that and not call error_propagate() in this case.
> 
> Hmm, for example, qmp_eject, which error_free or error_propagate..
> We can leave such cases as is, not many of them. Or introduce
> safe_errp_free(Error **errp), which will zero pointer after freeing.
> 

Maybe even turning error_free() to take an Error ** ? It looks
safe to zero out a dangling pointer. Of course the API change
would need to be propagated to all error_* functions that
explicitly call error_free().

> > 
> >> +}
> >> +}
> >> +
> >> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, 
> >> error_propagator_cleanup);
> >> +
> >> +/*
> >> + * ErrorPropagationPair
> >> + *
> >> + * [Error *local_err, Error **errp]
> >> + *
> >> + * First element is local_err, second is original errp, which is 
> >> propagation
> >> + * target. Yes, errp has a bit another type, so it should be converted.
> >> + *
> >> + * ErrorPropagationPair may be used as errp, which points to local_err,
> >> + * as it's type is compatible.
> >> + */
> >> +typedef Error *ErrorPropagationPair[2];
> >> +
> >> +static inline void error_propagation_pair_cleanup(ErrorPropagationPair 
> >> *arr)
> >> +{
> >> +Error *local_err = (*arr)[0];
> >> +Error **errp = (Error **)(*arr)[1];
> >> +
> >> +if (local_err) {
> >> +error_propagate(errp, local_err);
> >> +}
> >> +}
> >> +
> >> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationPair,
> >> + error_propagation_pair_cleanup);
> >> +
> >> +/*
> >> + * DEF_AUTO_ERRP
> >> + *
> >> + * Define auto_errp variable, which may be used instead of errp, and
> >> + * *auto_errp may be safely checked to be zero or not, and may be safely
> >> + * used for error_append_hint(). auto_errp is automatically propagated
> >> + * to errp at function exit.
> >> + */
> >> +#define DEF_AUTO_ERRP(auto_errp, errp) \
> >> +g_auto(ErrorPropagationPair) (auto_errp) = {NULL, (Error *)(errp)}
> >> +
> >> +
> >> +/*
> >> + * Another variant:
> >> + *   Pros:
> >> + * - normal structure instead of cheating with array
> >> + * - we can directly use errp, if it's not NULL and don't point to
> >> + *   error_abort or error_fatal
> >> + *   Cons:
> >> + * - we need to

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 12:33, Max Reitz wrote:
> On 19.09.19 11:14, Vladimir Sementsov-Ogievskiy wrote:
>> 19.09.2019 11:59, Max Reitz wrote:
>>> On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
 Hi all!

 Here is a proposal (three of them, actually) of auto propagation for
 local_err, to not call error_propagate on every exit point, when we
 deal with local_err.

 It also may help make Greg's series[1] about error_append_hint smaller.

 See definitions and examples below.

 I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
 it, the idea will touch same code (and may be more).

 [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html

 Signed-off-by: Vladimir Sementsov-Ogievskiy 
 ---
include/qapi/error.h | 102 +++
block.c  |  63 --
block/backup.c   |   8 +++-
block/gluster.c  |   7 +++
4 files changed, 144 insertions(+), 36 deletions(-)
>>>
>>> If the combination of “if (local_err) { error_propagate(...); ... }” is
>>> what’s cumbersome, can’t this be done simpler by adding an
>>> error_propagate() variant with a return value?
>>>
>>> i.e.
>>>
>>> bool has_error_then_propagate(Error **errp, Error *err)
>>> {
>>>   if (!err) {
>>>   return false;
>>>   }
>>>   error_propagate(errp, err);
>>>   return true;
>>> }
>>>
>>> And then turn all instances of
>>>
>>> if (local_err) {
>>>   error_propagate(errp, local_err);
>>>   ...
>>> }
>>>
>>> into
>>>
>>> if (has_error_then_propagate(errp, local_err)) {
>>>   ...
>>> }
>>>
>>> ?
>>>
>>> Max
>>>
>>
>> No, originally cumbersome is introducing local_err in a lot of new places by
>> Greg's series. MAKE_ERRP_SAFE macro makes it as simple as one macro call
>> instead (in each function where we need local_err).
> 
> Does it need more than one local_err per function?
> 
> Because if it didn’t, I’d find one “Error *local_err;” simpler than one
> macro incantation.
> 
> (It has the same LoC, and it makes code readers ask the same question:
> “Why do we need it?”  Which has the same answer for both; but one is
> immediately readable code, whereas the other is a macro.)

Not the same, you didn't count error_propagate

And your example don't match Greg's case, there no "if (local_err)" in it,
just "error_append_hint(errp)", which don't work for error_fatal and error_abort
(Yes, I agree with Kevin, that it should work only for error_fatal)

> 
>> Also, auto-propagation seems correct thing to do, which fits good into
>> g_auto concept, so even without any macro it just allows to drop several 
>> error_propagate
>> calls (or may be several goto statements to do one error_propagate call) into
>> one definitions. It's the same story like with g_autofree vs g_free.
> 
> I don’t see the advantage here.  You have to do the if () statement
> anyway, so it isn’t like you’re saving any LoC.  In addition, I
> personally don’t find code hidden through __attribute__((cleanup))
> easier to understand than explicitly written code.
> 
> It isn’t like I don’t like __attribute__((cleanup)).  But it does count
> under “magic” in my book, and thus I’d avoid it if it doesn’t bring
> actual advantages.  (It does bring actual advantages for things like
> auto-freeing memory, auto-closing FDs, or zeroing secret buffers before
> freeing them.  In all those cases, you save LoC, and you prevent
> accidental leakage.  I don’t quite see such advantages here, but I may
> well be mistaken.)
> 
> 
> Now Kevin has given an actual advantage, which is that local_err
> complicates debugging.  I’ve never had that problem myself, but that
> would indeed be an advantage that may warrant some magic.
> 
> Max
> 


-- 
Best regards,
Vladimir

Re: [Qemu-block] [Qemu-trivial] [PATCH] cutils: Move size_to_str() from "qemu-common.h" to "qemu/cutils.h"

2019-09-19 Thread Laurent Vivier

Le 03/09/2019 à 14:05, Philippe Mathieu-Daudé a écrit :
> "qemu/cutils.h" contains various qemu_strtosz_*() functions
> useful to convert strings to size. It seems natural to have
> the opposite usage (from size to string) there too.
> 
> The function definition is already in util/cutils.c.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> There are only 5 users, is it worthwhile renaming it qemu_sztostrt()?
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  block/qapi.c | 2 +-
>  include/qemu-common.h| 1 -
>  include/qemu/cutils.h| 2 ++
>  qapi/string-output-visitor.c | 2 +-
>  4 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qapi.c b/block/qapi.c
> index 15f1030264..7ee2ee065d 100644
> --- a/block/qapi.c
> +++ b/block/qapi.c
> @@ -23,7 +23,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> -#include "qemu-common.h"
> +#include "qemu/cutils.h"
>  #include "block/qapi.h"
>  #include "block/block_int.h"
>  #include "block/throttle-groups.h"
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 0235cd3b91..8d84db90b0 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -123,7 +123,6 @@ void qemu_hexdump(const char *buf, FILE *fp, const char 
> *prefix, size_t size);
>  int parse_debug_env(const char *name, int max, int initial);
>  
>  const char *qemu_ether_ntoa(const MACAddr *mac);
> -char *size_to_str(uint64_t val);
>  void page_size_init(void);
>  
>  /* returns non-zero if dump is in progress, otherwise zero is
> diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
> index 12301340a4..b54c847e0f 100644
> --- a/include/qemu/cutils.h
> +++ b/include/qemu/cutils.h
> @@ -155,6 +155,8 @@ int qemu_strtosz(const char *nptr, const char **end, 
> uint64_t *result);
>  int qemu_strtosz_MiB(const char *nptr, const char **end, uint64_t *result);
>  int qemu_strtosz_metric(const char *nptr, const char **end, uint64_t 
> *result);
>  
> +char *size_to_str(uint64_t val);
> +
>  /* used to print char* safely */
>  #define STR_OR_NULL(str) ((str) ? (str) : "null")
>  
> diff --git a/qapi/string-output-visitor.c b/qapi/string-output-visitor.c
> index 7ab64468d9..0d93605d77 100644
> --- a/qapi/string-output-visitor.c
> +++ b/qapi/string-output-visitor.c
> @@ -11,7 +11,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> -#include "qemu-common.h"
> +#include "qemu/cutils.h"
>  #include "qapi/string-output-visitor.h"
>  #include "qapi/visitor-impl.h"
>  #include "qemu/host-utils.h"
> 

Applied to my trivial-patches branch.

Thanks,
Laurent

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 12:17, Kevin Wolf wrote:
> Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
>> On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> + */
>>> +#define MAKE_ERRP_SAFE(errp) \
>>> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
>>> +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) { \
>>> +(errp) = &__auto_errp_prop.local_err; \
>>> +}
>>
>> Not written to take a trailing semicolon in the caller.
>>
>> You could even set __auto_errp_prop unconditionally rather than trying
>> to reuse incoming errp (the difference being that error_propagate() gets
>> called more frequently).
> 
> I think this difference is actually a problem.
> 
> When debugging things, I hate error_propagate(). It means that the Error
> (specifically its fields src/func/line) points to the outermost
> error_propagate() rather than the place where the error really happened.

Hmm, never tested it, but looking at the code I can't understand how can that
be. src/func/line are set in error_setg.. and in error_propagate() we just
set the errp of the caller, src/func/line unchanged.

Still, I see that these useful fields are printed only for error_abort, for
which we usually have coredump, which provides a lot more information.

> It also makes error_abort completely useless because at the point where
> the process gets aborted, the interesting information is already lost.

Aha, understand this point, error_abort just don't work as desired, if
we swap it by local_err. And we can fix it by using macro: never create
local_err for error_abort, let it abort exactly on error_setg.

> 
> So I'd really like to restrict the use of error_propagate() to places
> where it's absolutely necessary. Unless, of course, you can fix these
> practical problems that error_propagate() causes for debugging.
> 
> In fact, in the context of Greg's series, I think we really only need to
> support hints for error_fatal, which are cases that users are supposed
> to see. We should exclude error_abort in MAKE_ERRP_SAFE() because these
> are things that are never supposed to happen. A good stack trace is more
> important there than adding a hint to the message.
> 

Agreed


-- 
Best regards,
Vladimir

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Max Reitz

On 19.09.19 11:14, Vladimir Sementsov-Ogievskiy wrote:
> 19.09.2019 11:59, Max Reitz wrote:
>> On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> Here is a proposal (three of them, actually) of auto propagation for
>>> local_err, to not call error_propagate on every exit point, when we
>>> deal with local_err.
>>>
>>> It also may help make Greg's series[1] about error_append_hint smaller.
>>>
>>> See definitions and examples below.
>>>
>>> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
>>> it, the idea will touch same code (and may be more).
>>>
>>> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   include/qapi/error.h | 102 +++
>>>   block.c  |  63 --
>>>   block/backup.c   |   8 +++-
>>>   block/gluster.c  |   7 +++
>>>   4 files changed, 144 insertions(+), 36 deletions(-)
>>
>> If the combination of “if (local_err) { error_propagate(...); ... }” is
>> what’s cumbersome, can’t this be done simpler by adding an
>> error_propagate() variant with a return value?
>>
>> i.e.
>>
>> bool has_error_then_propagate(Error **errp, Error *err)
>> {
>>  if (!err) {
>>  return false;
>>  }
>>  error_propagate(errp, err);
>>  return true;
>> }
>>
>> And then turn all instances of
>>
>> if (local_err) {
>>  error_propagate(errp, local_err);
>>  ...
>> }
>>
>> into
>>
>> if (has_error_then_propagate(errp, local_err)) {
>>  ...
>> }
>>
>> ?
>>
>> Max
>>
> 
> No, originally cumbersome is introducing local_err in a lot of new places by
> Greg's series. MAKE_ERRP_SAFE macro makes it as simple as one macro call
> instead (in each function where we need local_err).

Does it need more than one local_err per function?

Because if it didn’t, I’d find one “Error *local_err;” simpler than one
macro incantation.

(It has the same LoC, and it makes code readers ask the same question:
“Why do we need it?”  Which has the same answer for both; but one is
immediately readable code, whereas the other is a macro.)

> Also, auto-propagation seems correct thing to do, which fits good into
> g_auto concept, so even without any macro it just allows to drop several 
> error_propagate
> calls (or may be several goto statements to do one error_propagate call) into
> one definitions. It's the same story like with g_autofree vs g_free.

I don’t see the advantage here.  You have to do the if () statement
anyway, so it isn’t like you’re saving any LoC.  In addition, I
personally don’t find code hidden through __attribute__((cleanup))
easier to understand than explicitly written code.

It isn’t like I don’t like __attribute__((cleanup)).  But it does count
under “magic” in my book, and thus I’d avoid it if it doesn’t bring
actual advantages.  (It does bring actual advantages for things like
auto-freeing memory, auto-closing FDs, or zeroing secret buffers before
freeing them.  In all those cases, you save LoC, and you prevent
accidental leakage.  I don’t quite see such advantages here, but I may
well be mistaken.)

Now Kevin has given an actual advantage, which is that local_err
complicates debugging.  I’ve never had that problem myself, but that
would indeed be an advantage that may warrant some magic.

Max

Re: [Qemu-block] [PULL 2/3] trace: Remove trailing newline in events

2019-09-19 Thread Stefan Hajnoczi

On Wed, Sep 18, 2019 at 05:51:16PM +0200, Philippe Mathieu-Daudé wrote:
> On 9/18/19 3:21 PM, Stefan Hajnoczi wrote:
> > Reviewed-by: John Snow 
> > Reviewed-by: Kevin Wolf 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > Message-id: 20190916095121.29506-2-phi...@redhat.com
> > Message-Id: <20190916095121.29506-2-phi...@redhat.com>
> 
> Out of curiosity, how do you end up with 2 slightly different
> message-id? Using two different tools in series? (Thinking about
> unifying these tools format).

I'm not sure what happened here :P.

Stefan


signature.asc
Description: PGP signature

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 11:59, Greg Kurz wrote:
> On Wed, 18 Sep 2019 16:02:44 +0300
> Vladimir Sementsov-Ogievskiy  wrote:
> 
>> Hi all!
>>
>> Here is a proposal (three of them, actually) of auto propagation for
>> local_err, to not call error_propagate on every exit point, when we
>> deal with local_err.
>>
>> It also may help make Greg's series[1] about error_append_hint smaller.
>>
> 
> This will depend on whether we reach a consensus soon enough (soft
> freeze for 4.2 is 2019-10-29). Otherwise, I think my series should
> be merged as is, and auto-propagation to be delayed to 4.3.
> 
>> See definitions and examples below.
>>
>> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
>> it, the idea will touch same code (and may be more).
>>
> 
> When we have a good auto-propagation mechanism available, I guess
> this can be beneficial for most of the code, not only the places
> where we add hints (or prepend something, see below).
> 
>> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   include/qapi/error.h | 102 +++
>>   block.c  |  63 --
>>   block/backup.c   |   8 +++-
>>   block/gluster.c  |   7 +++
>>   4 files changed, 144 insertions(+), 36 deletions(-)
>>
>> diff --git a/include/qapi/error.h b/include/qapi/error.h
>> index 3f95141a01..083e061014 100644
>> --- a/include/qapi/error.h
>> +++ b/include/qapi/error.h
>> @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
>>   ErrorClass err_class, const char *fmt, ...)
>>   GCC_FMT_ATTR(6, 7);
>>   
>> +typedef struct ErrorPropagator {
>> +Error **errp;
>> +Error *local_err;
>> +} ErrorPropagator;
>> +
>> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
>> +{
>> +if (prop->local_err) {
>> +error_propagate(prop->errp, prop->local_err);
> 
> We also have error_propagate_prepend(), which is basically a variant of
> error_prepend()+error_propagate() that can cope with _fatal. This
> was introduced by Markus in commit 4b5766488fd3, for similar reasons that
> motivated my series. It has ~30 users in the tree.
> 
> There's no such thing a generic cleanup function with a printf-like
> interface, so all of these should be converted back to error_prepend()
> if we go for auto-propagation.
> 
> Aside from propagation, one can sometime choose to call things like
> error_free() or error_free_or_abort()... Auto-propagation should
> detect that and not call error_propagate() in this case.

Hmm, for example, qmp_eject, which error_free or error_propagate..
We can leave such cases as is, not many of them. Or introduce
safe_errp_free(Error **errp), which will zero pointer after freeing.

> 
>> +}
>> +}
>> +
>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, error_propagator_cleanup);
>> +
>> +/*
>> + * ErrorPropagationPair
>> + *
>> + * [Error *local_err, Error **errp]
>> + *
>> + * First element is local_err, second is original errp, which is propagation
>> + * target. Yes, errp has a bit another type, so it should be converted.
>> + *
>> + * ErrorPropagationPair may be used as errp, which points to local_err,
>> + * as it's type is compatible.
>> + */
>> +typedef Error *ErrorPropagationPair[2];
>> +
>> +static inline void error_propagation_pair_cleanup(ErrorPropagationPair *arr)
>> +{
>> +Error *local_err = (*arr)[0];
>> +Error **errp = (Error **)(*arr)[1];
>> +
>> +if (local_err) {
>> +error_propagate(errp, local_err);
>> +}
>> +}
>> +
>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationPair,
>> + error_propagation_pair_cleanup);
>> +
>> +/*
>> + * DEF_AUTO_ERRP
>> + *
>> + * Define auto_errp variable, which may be used instead of errp, and
>> + * *auto_errp may be safely checked to be zero or not, and may be safely
>> + * used for error_append_hint(). auto_errp is automatically propagated
>> + * to errp at function exit.
>> + */
>> +#define DEF_AUTO_ERRP(auto_errp, errp) \
>> +g_auto(ErrorPropagationPair) (auto_errp) = {NULL, (Error *)(errp)}
>> +
>> +
>> +/*
>> + * Another variant:
>> + *   Pros:
>> + * - normal structure instead of cheating with array
>> + * - we can directly use errp, if it's not NULL and don't point to
>> + *   error_abort or error_fatal
>> + *   Cons:
>> + * - we need to define two variables instead of one
>> + */
>> +typedef struct ErrorPropagationStruct {
>> +Error *local_err;
>> +Error **errp;
>> +} ErrorPropagationStruct;
>> +
>> +static inline void error_propagation_struct_cleanup(ErrorPropagationStruct 
>> *prop)
>> +{
>> +if (prop->local_err) {
>> +error_propagate(prop->errp, prop->local_err);
>> +}
>> +}
>> +
>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationStruct,
>> + error_propagation_struct_cleanup);
>> +
>> +#define DEF_AUTO_ERRP_V2(auto_errp,

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Kevin Wolf

Am 18.09.2019 um 19:10 hat Eric Blake geschrieben:
> On 9/18/19 8:02 AM, Vladimir Sementsov-Ogievskiy wrote:
> > + */
> > +#define MAKE_ERRP_SAFE(errp) \
> > +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
> > +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) { \
> > +(errp) = &__auto_errp_prop.local_err; \
> > +}
> 
> Not written to take a trailing semicolon in the caller.
> 
> You could even set __auto_errp_prop unconditionally rather than trying
> to reuse incoming errp (the difference being that error_propagate() gets
> called more frequently).

I think this difference is actually a problem.

When debugging things, I hate error_propagate(). It means that the Error
(specifically its fields src/func/line) points to the outermost
error_propagate() rather than the place where the error really happened.
It also makes error_abort completely useless because at the point where
the process gets aborted, the interesting information is already lost.

So I'd really like to restrict the use of error_propagate() to places
where it's absolutely necessary. Unless, of course, you can fix these
practical problems that error_propagate() causes for debugging.

In fact, in the context of Greg's series, I think we really only need to
support hints for error_fatal, which are cases that users are supposed
to see. We should exclude error_abort in MAKE_ERRP_SAFE() because these
are things that are never supposed to happen. A good stack trace is more
important there than adding a hint to the message.

Kevin

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 11:59, Max Reitz wrote:
> On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> Here is a proposal (three of them, actually) of auto propagation for
>> local_err, to not call error_propagate on every exit point, when we
>> deal with local_err.
>>
>> It also may help make Greg's series[1] about error_append_hint smaller.
>>
>> See definitions and examples below.
>>
>> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
>> it, the idea will touch same code (and may be more).
>>
>> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   include/qapi/error.h | 102 +++
>>   block.c  |  63 --
>>   block/backup.c   |   8 +++-
>>   block/gluster.c  |   7 +++
>>   4 files changed, 144 insertions(+), 36 deletions(-)
> 
> If the combination of “if (local_err) { error_propagate(...); ... }” is
> what’s cumbersome, can’t this be done simpler by adding an
> error_propagate() variant with a return value?
> 
> i.e.
> 
> bool has_error_then_propagate(Error **errp, Error *err)
> {
>  if (!err) {
>  return false;
>  }
>  error_propagate(errp, err);
>  return true;
> }
> 
> And then turn all instances of
> 
> if (local_err) {
>  error_propagate(errp, local_err);
>  ...
> }
> 
> into
> 
> if (has_error_then_propagate(errp, local_err)) {
>  ...
> }
> 
> ?
> 
> Max
> 

No, originally cumbersome is introducing local_err in a lot of new places by
Greg's series. MAKE_ERRP_SAFE macro makes it as simple as one macro call
instead (in each function where we need local_err).

Also, auto-propagation seems correct thing to do, which fits good into
g_auto concept, so even without any macro it just allows to drop several 
error_propagate
calls (or may be several goto statements to do one error_propagate call) into
one definitions. It's the same story like with g_autofree vs g_free.

-- 
Best regards,
Vladimir

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Max Reitz

On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is a proposal (three of them, actually) of auto propagation for
> local_err, to not call error_propagate on every exit point, when we
> deal with local_err.
> 
> It also may help make Greg's series[1] about error_append_hint smaller.
> 
> See definitions and examples below.
> 
> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
> it, the idea will touch same code (and may be more).
> 
> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/qapi/error.h | 102 +++
>  block.c  |  63 --
>  block/backup.c   |   8 +++-
>  block/gluster.c  |   7 +++
>  4 files changed, 144 insertions(+), 36 deletions(-)

If the combination of “if (local_err) { error_propagate(...); ... }” is
what’s cumbersome, can’t this be done simpler by adding an
error_propagate() variant with a return value?

i.e.

bool has_error_then_propagate(Error **errp, Error *err)
{
if (!err) {
return false;
}
error_propagate(errp, err);
return true;
}

And then turn all instances of

if (local_err) {
error_propagate(errp, local_err);
...
}

into

if (has_error_then_propagate(errp, local_err)) {
...
}

?

Max

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Greg Kurz

On Wed, 18 Sep 2019 16:02:44 +0300
Vladimir Sementsov-Ogievskiy  wrote:

> Hi all!
> 
> Here is a proposal (three of them, actually) of auto propagation for
> local_err, to not call error_propagate on every exit point, when we
> deal with local_err.
> 
> It also may help make Greg's series[1] about error_append_hint smaller.
> 

This will depend on whether we reach a consensus soon enough (soft
freeze for 4.2 is 2019-10-29). Otherwise, I think my series should
be merged as is, and auto-propagation to be delayed to 4.3.

> See definitions and examples below.
> 
> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
> it, the idea will touch same code (and may be more).
> 

When we have a good auto-propagation mechanism available, I guess
this can be beneficial for most of the code, not only the places
where we add hints (or prepend something, see below).

> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/qapi/error.h | 102 +++
>  block.c  |  63 --
>  block/backup.c   |   8 +++-
>  block/gluster.c  |   7 +++
>  4 files changed, 144 insertions(+), 36 deletions(-)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index 3f95141a01..083e061014 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
>  ErrorClass err_class, const char *fmt, ...)
>  GCC_FMT_ATTR(6, 7);
>  
> +typedef struct ErrorPropagator {
> +Error **errp;
> +Error *local_err;
> +} ErrorPropagator;
> +
> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
> +{
> +if (prop->local_err) {
> +error_propagate(prop->errp, prop->local_err);

We also have error_propagate_prepend(), which is basically a variant of
error_prepend()+error_propagate() that can cope with _fatal. This
was introduced by Markus in commit 4b5766488fd3, for similar reasons that
motivated my series. It has ~30 users in the tree.

There's no such thing a generic cleanup function with a printf-like
interface, so all of these should be converted back to error_prepend()
if we go for auto-propagation.

Aside from propagation, one can sometime choose to call things like
error_free() or error_free_or_abort()... Auto-propagation should
detect that and not call error_propagate() in this case.

> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, error_propagator_cleanup);
> +
> +/*
> + * ErrorPropagationPair
> + *
> + * [Error *local_err, Error **errp]
> + *
> + * First element is local_err, second is original errp, which is propagation
> + * target. Yes, errp has a bit another type, so it should be converted.
> + *
> + * ErrorPropagationPair may be used as errp, which points to local_err,
> + * as it's type is compatible.
> + */
> +typedef Error *ErrorPropagationPair[2];
> +
> +static inline void error_propagation_pair_cleanup(ErrorPropagationPair *arr)
> +{
> +Error *local_err = (*arr)[0];
> +Error **errp = (Error **)(*arr)[1];
> +
> +if (local_err) {
> +error_propagate(errp, local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationPair,
> + error_propagation_pair_cleanup);
> +
> +/*
> + * DEF_AUTO_ERRP
> + *
> + * Define auto_errp variable, which may be used instead of errp, and
> + * *auto_errp may be safely checked to be zero or not, and may be safely
> + * used for error_append_hint(). auto_errp is automatically propagated
> + * to errp at function exit.
> + */
> +#define DEF_AUTO_ERRP(auto_errp, errp) \
> +g_auto(ErrorPropagationPair) (auto_errp) = {NULL, (Error *)(errp)}
> +
> +
> +/*
> + * Another variant:
> + *   Pros:
> + * - normal structure instead of cheating with array
> + * - we can directly use errp, if it's not NULL and don't point to
> + *   error_abort or error_fatal
> + *   Cons:
> + * - we need to define two variables instead of one
> + */
> +typedef struct ErrorPropagationStruct {
> +Error *local_err;
> +Error **errp;
> +} ErrorPropagationStruct;
> +
> +static inline void error_propagation_struct_cleanup(ErrorPropagationStruct 
> *prop)
> +{
> +if (prop->local_err) {
> +error_propagate(prop->errp, prop->local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationStruct,
> + error_propagation_struct_cleanup);
> +
> +#define DEF_AUTO_ERRP_V2(auto_errp, errp) \
> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
> +Error **auto_errp = \
> +((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) 
> ? \
> +&__auto_errp_prop.local_err : \
> +(errp);
> +
> +/*
> + * Third variant:
> + *   Pros:
> + * - simpler movement for functions which don't have local_err yet
> + *   the

Re: [Qemu-block] [PATCH 10/17] s390x/css: Pass local error object pointer to error_append_hint()

2019-09-19 Thread Cornelia Huck

On Wed, 18 Sep 2019 12:46:42 -0500
Eric Blake  wrote:

> On 9/18/19 5:26 AM, Cornelia Huck wrote:
> > On Tue, 17 Sep 2019 18:36:20 +0200
> > Greg Kurz  wrote:
> >   
> >> On Tue, 17 Sep 2019 13:24:12 +0200
> >> Cornelia Huck  wrote:
> >>  
> >>> On Tue, 17 Sep 2019 12:21:34 +0200
> >>> Greg Kurz  wrote:
> >>> 
>  Ensure that hints are added even if errp is _fatal or _abort.
> 
>  Signed-off-by: Greg Kurz 
>  ---
>   hw/s390x/s390-ccw.c |6 --
>   1 file changed, 4 insertions(+), 2 deletions(-)
> >>>
> >>> Reviewed-by: Cornelia Huck 
> >>>
> >>> Can also take this via the s390 tree, let me know what would work best.   
> >>>  
> >>
> >> I guess it would be easier to merge if each individual patch goes
> >> through the corresponding sub-maintainer tree. But Eric mentioned
> >> in another mail that the whole whole series could maybe go through
> >> Markus' error tree... so, I don't know which is best. :)  
> > 
> > Ok, it's probably best to take this through the s390 tree, as I plan to
> > send a pull request tomorrow :)  
> 
> If we go with Vladimir's idea of auto-propagation, this change just gets
> rewritten again as part of our simplifications to drop all use of
> 'local_err' in favor of instead using the macro that makes errp always
> safe to use locally.
> 

Fair enough. That auto-propagation approach really looks nice, so I'll
go ahead and unqueue this patch again.

Re: [Qemu-block] [Qemu-devel] [PATCH] iotests: Require Python 3.5 or later

2019-09-19 Thread Kevin Wolf

Am 18.09.2019 um 20:49 hat John Snow geschrieben:
> 
> 
> On 9/18/19 4:55 AM, Kevin Wolf wrote:
> > Running iotests is not required to build QEMU, so we can have stricter
> > version requirements for Python here and can make use of new features
> > and drop compatibility code earlier.
> > 
> > This makes qemu-iotests skip all Python tests if a Python version before
> > 3.5 is used for the build.
> > 
> > Suggested-by: Eduardo Habkost 
> > Signed-off-by: Kevin Wolf 
> > ---
> >   tests/qemu-iotests/check | 14 +-
> >   1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> > index 875399d79f..a68f414d6c 100755
> > --- a/tests/qemu-iotests/check
> > +++ b/tests/qemu-iotests/check
> > @@ -633,6 +633,13 @@ then
> >   export SOCKET_SCM_HELPER="$build_iotests/socket_scm_helper"
> >   fi
> > +# Note that if the Python conditional here evaluates True we will exit
> > +# with status 1 which is a shell 'false' value.
> > +python_usable=false
> > +if ! $PYTHON -c 'import sys; sys.exit(sys.version_info >= (3,5))'; then
> > +python_usable=true
> > +fi
> > +
> 
> Do we want this as a temporary fix only until we can stipulate the same
> version in the configure file?

I thought that maybe we should leave the code around so that at some
later point, we could upgrade it to 3.6 (or something else) before QEMU
as a whole does so.

In fact... Could we already require 3.6 now instead of using 3.5, which
I think we only chose because of Debian Stretch (oldstable)?

Kevin

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread David Hildenbrand

On 19.09.19 10:20, Vladimir Sementsov-Ogievskiy wrote:
> 19.09.2019 10:53, David Hildenbrand wrote:
>> On 19.09.19 09:41, Vladimir Sementsov-Ogievskiy wrote:
>>> 19.09.2019 10:32, David Hildenbrand wrote:
 On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> Here is a proposal (three of them, actually) of auto propagation for
> local_err, to not call error_propagate on every exit point, when we
> deal with local_err.
>
> It also may help make Greg's series[1] about error_append_hint smaller.
>
> See definitions and examples below.
>
> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
> it, the idea will touch same code (and may be more).
>
> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>include/qapi/error.h | 102 +++
>block.c  |  63 --
>block/backup.c   |   8 +++-
>block/gluster.c  |   7 +++
>4 files changed, 144 insertions(+), 36 deletions(-)
>
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index 3f95141a01..083e061014 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
>ErrorClass err_class, const char *fmt, ...)
>GCC_FMT_ATTR(6, 7);
>
> +typedef struct ErrorPropagator {
> +Error **errp;
> +Error *local_err;
> +} ErrorPropagator;
> +
> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
> +{
> +if (prop->local_err) {
> +error_propagate(prop->errp, prop->local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, 
> error_propagator_cleanup);
> +
> +/*
> + * ErrorPropagationPair
> + *
> + * [Error *local_err, Error **errp]
> + *
> + * First element is local_err, second is original errp, which is 
> propagation
> + * target. Yes, errp has a bit another type, so it should be converted.
> + *
> + * ErrorPropagationPair may be used as errp, which points to local_err,
> + * as it's type is compatible.
> + */
> +typedef Error *ErrorPropagationPair[2];
> +
> +static inline void error_propagation_pair_cleanup(ErrorPropagationPair 
> *arr)
> +{
> +Error *local_err = (*arr)[0];
> +Error **errp = (Error **)(*arr)[1];
> +
> +if (local_err) {
> +error_propagate(errp, local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationPair,
> + error_propagation_pair_cleanup);
> +
> +/*
> + * DEF_AUTO_ERRP
> + *
> + * Define auto_errp variable, which may be used instead of errp, and
> + * *auto_errp may be safely checked to be zero or not, and may be safely
> + * used for error_append_hint(). auto_errp is automatically propagated
> + * to errp at function exit.
> + */
> +#define DEF_AUTO_ERRP(auto_errp, errp) \
> +g_auto(ErrorPropagationPair) (auto_errp) = {NULL, (Error *)(errp)}
> +
> +
> +/*
> + * Another variant:
> + *   Pros:
> + * - normal structure instead of cheating with array
> + * - we can directly use errp, if it's not NULL and don't point to
> + *   error_abort or error_fatal
> + *   Cons:
> + * - we need to define two variables instead of one
> + */
> +typedef struct ErrorPropagationStruct {
> +Error *local_err;
> +Error **errp;
> +} ErrorPropagationStruct;
> +
> +static inline void 
> error_propagation_struct_cleanup(ErrorPropagationStruct *prop)
> +{
> +if (prop->local_err) {
> +error_propagate(prop->errp, prop->local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationStruct,
> + error_propagation_struct_cleanup);
> +
> +#define DEF_AUTO_ERRP_V2(auto_errp, errp) \
> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = 
> (errp)}; \
> +Error **auto_errp = \
> +((errp) == NULL || *(errp) == error_abort || *(errp) == 
> error_fatal) ? \
> +&__auto_errp_prop.local_err : \
> +(errp);
> +
> +/*
> + * Third variant:
> + *   Pros:
> + * - simpler movement for functions which don't have local_err yet
> + *   the only thing to do is to call one macro at function start.
> + *   This extremely simplifies Greg's series
> + *   Cons:
> + * - looks like errp shadowing.. Still seems safe.
> + * - must be after all definitions of local variables and before any
> + *   code.
>

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 10:53, David Hildenbrand wrote:
> On 19.09.19 09:41, Vladimir Sementsov-Ogievskiy wrote:
>> 19.09.2019 10:32, David Hildenbrand wrote:
>>> On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
 Hi all!

 Here is a proposal (three of them, actually) of auto propagation for
 local_err, to not call error_propagate on every exit point, when we
 deal with local_err.

 It also may help make Greg's series[1] about error_append_hint smaller.

 See definitions and examples below.

 I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
 it, the idea will touch same code (and may be more).

 [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html

 Signed-off-by: Vladimir Sementsov-Ogievskiy 
 ---
include/qapi/error.h | 102 +++
block.c  |  63 --
block/backup.c   |   8 +++-
block/gluster.c  |   7 +++
4 files changed, 144 insertions(+), 36 deletions(-)

 diff --git a/include/qapi/error.h b/include/qapi/error.h
 index 3f95141a01..083e061014 100644
 --- a/include/qapi/error.h
 +++ b/include/qapi/error.h
 @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
ErrorClass err_class, const char *fmt, ...)
GCC_FMT_ATTR(6, 7);

 +typedef struct ErrorPropagator {
 +Error **errp;
 +Error *local_err;
 +} ErrorPropagator;
 +
 +static inline void error_propagator_cleanup(ErrorPropagator *prop)
 +{
 +if (prop->local_err) {
 +error_propagate(prop->errp, prop->local_err);
 +}
 +}
 +
 +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, 
 error_propagator_cleanup);
 +
 +/*
 + * ErrorPropagationPair
 + *
 + * [Error *local_err, Error **errp]
 + *
 + * First element is local_err, second is original errp, which is 
 propagation
 + * target. Yes, errp has a bit another type, so it should be converted.
 + *
 + * ErrorPropagationPair may be used as errp, which points to local_err,
 + * as it's type is compatible.
 + */
 +typedef Error *ErrorPropagationPair[2];
 +
 +static inline void error_propagation_pair_cleanup(ErrorPropagationPair 
 *arr)
 +{
 +Error *local_err = (*arr)[0];
 +Error **errp = (Error **)(*arr)[1];
 +
 +if (local_err) {
 +error_propagate(errp, local_err);
 +}
 +}
 +
 +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationPair,
 + error_propagation_pair_cleanup);
 +
 +/*
 + * DEF_AUTO_ERRP
 + *
 + * Define auto_errp variable, which may be used instead of errp, and
 + * *auto_errp may be safely checked to be zero or not, and may be safely
 + * used for error_append_hint(). auto_errp is automatically propagated
 + * to errp at function exit.
 + */
 +#define DEF_AUTO_ERRP(auto_errp, errp) \
 +g_auto(ErrorPropagationPair) (auto_errp) = {NULL, (Error *)(errp)}
 +
 +
 +/*
 + * Another variant:
 + *   Pros:
 + * - normal structure instead of cheating with array
 + * - we can directly use errp, if it's not NULL and don't point to
 + *   error_abort or error_fatal
 + *   Cons:
 + * - we need to define two variables instead of one
 + */
 +typedef struct ErrorPropagationStruct {
 +Error *local_err;
 +Error **errp;
 +} ErrorPropagationStruct;
 +
 +static inline void 
 error_propagation_struct_cleanup(ErrorPropagationStruct *prop)
 +{
 +if (prop->local_err) {
 +error_propagate(prop->errp, prop->local_err);
 +}
 +}
 +
 +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationStruct,
 + error_propagation_struct_cleanup);
 +
 +#define DEF_AUTO_ERRP_V2(auto_errp, errp) \
 +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; 
 \
 +Error **auto_errp = \
 +((errp) == NULL || *(errp) == error_abort || *(errp) == 
 error_fatal) ? \
 +&__auto_errp_prop.local_err : \
 +(errp);
 +
 +/*
 + * Third variant:
 + *   Pros:
 + * - simpler movement for functions which don't have local_err yet
 + *   the only thing to do is to call one macro at function start.
 + *   This extremely simplifies Greg's series
 + *   Cons:
 + * - looks like errp shadowing.. Still seems safe.
 + * - must be after all definitions of local variables and before any
 + *   code.
 + * - like v2, several statements in one open macro
 + */
 +#define MAKE_ERRP_SAFE(errp) \
 +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
 +if

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread David Hildenbrand

On 19.09.19 09:41, Vladimir Sementsov-Ogievskiy wrote:
> 19.09.2019 10:32, David Hildenbrand wrote:
>> On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> Here is a proposal (three of them, actually) of auto propagation for
>>> local_err, to not call error_propagate on every exit point, when we
>>> deal with local_err.
>>>
>>> It also may help make Greg's series[1] about error_append_hint smaller.
>>>
>>> See definitions and examples below.
>>>
>>> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
>>> it, the idea will touch same code (and may be more).
>>>
>>> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>>> ---
>>>   include/qapi/error.h | 102 +++
>>>   block.c  |  63 --
>>>   block/backup.c   |   8 +++-
>>>   block/gluster.c  |   7 +++
>>>   4 files changed, 144 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/include/qapi/error.h b/include/qapi/error.h
>>> index 3f95141a01..083e061014 100644
>>> --- a/include/qapi/error.h
>>> +++ b/include/qapi/error.h
>>> @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
>>>   ErrorClass err_class, const char *fmt, ...)
>>>   GCC_FMT_ATTR(6, 7);
>>>   
>>> +typedef struct ErrorPropagator {
>>> +Error **errp;
>>> +Error *local_err;
>>> +} ErrorPropagator;
>>> +
>>> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
>>> +{
>>> +if (prop->local_err) {
>>> +error_propagate(prop->errp, prop->local_err);
>>> +}
>>> +}
>>> +
>>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, 
>>> error_propagator_cleanup);
>>> +
>>> +/*
>>> + * ErrorPropagationPair
>>> + *
>>> + * [Error *local_err, Error **errp]
>>> + *
>>> + * First element is local_err, second is original errp, which is 
>>> propagation
>>> + * target. Yes, errp has a bit another type, so it should be converted.
>>> + *
>>> + * ErrorPropagationPair may be used as errp, which points to local_err,
>>> + * as it's type is compatible.
>>> + */
>>> +typedef Error *ErrorPropagationPair[2];
>>> +
>>> +static inline void error_propagation_pair_cleanup(ErrorPropagationPair 
>>> *arr)
>>> +{
>>> +Error *local_err = (*arr)[0];
>>> +Error **errp = (Error **)(*arr)[1];
>>> +
>>> +if (local_err) {
>>> +error_propagate(errp, local_err);
>>> +}
>>> +}
>>> +
>>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationPair,
>>> + error_propagation_pair_cleanup);
>>> +
>>> +/*
>>> + * DEF_AUTO_ERRP
>>> + *
>>> + * Define auto_errp variable, which may be used instead of errp, and
>>> + * *auto_errp may be safely checked to be zero or not, and may be safely
>>> + * used for error_append_hint(). auto_errp is automatically propagated
>>> + * to errp at function exit.
>>> + */
>>> +#define DEF_AUTO_ERRP(auto_errp, errp) \
>>> +g_auto(ErrorPropagationPair) (auto_errp) = {NULL, (Error *)(errp)}
>>> +
>>> +
>>> +/*
>>> + * Another variant:
>>> + *   Pros:
>>> + * - normal structure instead of cheating with array
>>> + * - we can directly use errp, if it's not NULL and don't point to
>>> + *   error_abort or error_fatal
>>> + *   Cons:
>>> + * - we need to define two variables instead of one
>>> + */
>>> +typedef struct ErrorPropagationStruct {
>>> +Error *local_err;
>>> +Error **errp;
>>> +} ErrorPropagationStruct;
>>> +
>>> +static inline void error_propagation_struct_cleanup(ErrorPropagationStruct 
>>> *prop)
>>> +{
>>> +if (prop->local_err) {
>>> +error_propagate(prop->errp, prop->local_err);
>>> +}
>>> +}
>>> +
>>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationStruct,
>>> + error_propagation_struct_cleanup);
>>> +
>>> +#define DEF_AUTO_ERRP_V2(auto_errp, errp) \
>>> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
>>> +Error **auto_errp = \
>>> +((errp) == NULL || *(errp) == error_abort || *(errp) == 
>>> error_fatal) ? \
>>> +&__auto_errp_prop.local_err : \
>>> +(errp);
>>> +
>>> +/*
>>> + * Third variant:
>>> + *   Pros:
>>> + * - simpler movement for functions which don't have local_err yet
>>> + *   the only thing to do is to call one macro at function start.
>>> + *   This extremely simplifies Greg's series
>>> + *   Cons:
>>> + * - looks like errp shadowing.. Still seems safe.
>>> + * - must be after all definitions of local variables and before any
>>> + *   code.
>>> + * - like v2, several statements in one open macro
>>> + */
>>> +#define MAKE_ERRP_SAFE(errp) \
>>> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
>>> +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) { \
>>> +(errp) = &__auto_errp_prop.local_err; \
>>> +}
>>
>>
>> Using that idea, what about something like this:
>>
>>

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

19.09.2019 10:32, David Hildenbrand wrote:
> On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> Here is a proposal (three of them, actually) of auto propagation for
>> local_err, to not call error_propagate on every exit point, when we
>> deal with local_err.
>>
>> It also may help make Greg's series[1] about error_append_hint smaller.
>>
>> See definitions and examples below.
>>
>> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
>> it, the idea will touch same code (and may be more).
>>
>> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   include/qapi/error.h | 102 +++
>>   block.c  |  63 --
>>   block/backup.c   |   8 +++-
>>   block/gluster.c  |   7 +++
>>   4 files changed, 144 insertions(+), 36 deletions(-)
>>
>> diff --git a/include/qapi/error.h b/include/qapi/error.h
>> index 3f95141a01..083e061014 100644
>> --- a/include/qapi/error.h
>> +++ b/include/qapi/error.h
>> @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
>>   ErrorClass err_class, const char *fmt, ...)
>>   GCC_FMT_ATTR(6, 7);
>>   
>> +typedef struct ErrorPropagator {
>> +Error **errp;
>> +Error *local_err;
>> +} ErrorPropagator;
>> +
>> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
>> +{
>> +if (prop->local_err) {
>> +error_propagate(prop->errp, prop->local_err);
>> +}
>> +}
>> +
>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, error_propagator_cleanup);
>> +
>> +/*
>> + * ErrorPropagationPair
>> + *
>> + * [Error *local_err, Error **errp]
>> + *
>> + * First element is local_err, second is original errp, which is propagation
>> + * target. Yes, errp has a bit another type, so it should be converted.
>> + *
>> + * ErrorPropagationPair may be used as errp, which points to local_err,
>> + * as it's type is compatible.
>> + */
>> +typedef Error *ErrorPropagationPair[2];
>> +
>> +static inline void error_propagation_pair_cleanup(ErrorPropagationPair *arr)
>> +{
>> +Error *local_err = (*arr)[0];
>> +Error **errp = (Error **)(*arr)[1];
>> +
>> +if (local_err) {
>> +error_propagate(errp, local_err);
>> +}
>> +}
>> +
>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationPair,
>> + error_propagation_pair_cleanup);
>> +
>> +/*
>> + * DEF_AUTO_ERRP
>> + *
>> + * Define auto_errp variable, which may be used instead of errp, and
>> + * *auto_errp may be safely checked to be zero or not, and may be safely
>> + * used for error_append_hint(). auto_errp is automatically propagated
>> + * to errp at function exit.
>> + */
>> +#define DEF_AUTO_ERRP(auto_errp, errp) \
>> +g_auto(ErrorPropagationPair) (auto_errp) = {NULL, (Error *)(errp)}
>> +
>> +
>> +/*
>> + * Another variant:
>> + *   Pros:
>> + * - normal structure instead of cheating with array
>> + * - we can directly use errp, if it's not NULL and don't point to
>> + *   error_abort or error_fatal
>> + *   Cons:
>> + * - we need to define two variables instead of one
>> + */
>> +typedef struct ErrorPropagationStruct {
>> +Error *local_err;
>> +Error **errp;
>> +} ErrorPropagationStruct;
>> +
>> +static inline void error_propagation_struct_cleanup(ErrorPropagationStruct 
>> *prop)
>> +{
>> +if (prop->local_err) {
>> +error_propagate(prop->errp, prop->local_err);
>> +}
>> +}
>> +
>> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationStruct,
>> + error_propagation_struct_cleanup);
>> +
>> +#define DEF_AUTO_ERRP_V2(auto_errp, errp) \
>> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
>> +Error **auto_errp = \
>> +((errp) == NULL || *(errp) == error_abort || *(errp) == 
>> error_fatal) ? \
>> +&__auto_errp_prop.local_err : \
>> +(errp);
>> +
>> +/*
>> + * Third variant:
>> + *   Pros:
>> + * - simpler movement for functions which don't have local_err yet
>> + *   the only thing to do is to call one macro at function start.
>> + *   This extremely simplifies Greg's series
>> + *   Cons:
>> + * - looks like errp shadowing.. Still seems safe.
>> + * - must be after all definitions of local variables and before any
>> + *   code.
>> + * - like v2, several statements in one open macro
>> + */
>> +#define MAKE_ERRP_SAFE(errp) \
>> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
>> +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) { \
>> +(errp) = &__auto_errp_prop.local_err; \
>> +}
> 
> 
> Using that idea, what about something like this:
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 8bfb6684cb..043ad69f8b 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -58,22 +58,42 @@ S390CPU

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread David Hildenbrand

On 18.09.19 15:02, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> Here is a proposal (three of them, actually) of auto propagation for
> local_err, to not call error_propagate on every exit point, when we
> deal with local_err.
> 
> It also may help make Greg's series[1] about error_append_hint smaller.
> 
> See definitions and examples below.
> 
> I'm cc-ing to this RFC everyone from series[1] CC list, as if we like
> it, the idea will touch same code (and may be more).
> 
> [1]: https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg03449.html
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  include/qapi/error.h | 102 +++
>  block.c  |  63 --
>  block/backup.c   |   8 +++-
>  block/gluster.c  |   7 +++
>  4 files changed, 144 insertions(+), 36 deletions(-)
> 
> diff --git a/include/qapi/error.h b/include/qapi/error.h
> index 3f95141a01..083e061014 100644
> --- a/include/qapi/error.h
> +++ b/include/qapi/error.h
> @@ -322,6 +322,108 @@ void error_set_internal(Error **errp,
>  ErrorClass err_class, const char *fmt, ...)
>  GCC_FMT_ATTR(6, 7);
>  
> +typedef struct ErrorPropagator {
> +Error **errp;
> +Error *local_err;
> +} ErrorPropagator;
> +
> +static inline void error_propagator_cleanup(ErrorPropagator *prop)
> +{
> +if (prop->local_err) {
> +error_propagate(prop->errp, prop->local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, error_propagator_cleanup);
> +
> +/*
> + * ErrorPropagationPair
> + *
> + * [Error *local_err, Error **errp]
> + *
> + * First element is local_err, second is original errp, which is propagation
> + * target. Yes, errp has a bit another type, so it should be converted.
> + *
> + * ErrorPropagationPair may be used as errp, which points to local_err,
> + * as it's type is compatible.
> + */
> +typedef Error *ErrorPropagationPair[2];
> +
> +static inline void error_propagation_pair_cleanup(ErrorPropagationPair *arr)
> +{
> +Error *local_err = (*arr)[0];
> +Error **errp = (Error **)(*arr)[1];
> +
> +if (local_err) {
> +error_propagate(errp, local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationPair,
> + error_propagation_pair_cleanup);
> +
> +/*
> + * DEF_AUTO_ERRP
> + *
> + * Define auto_errp variable, which may be used instead of errp, and
> + * *auto_errp may be safely checked to be zero or not, and may be safely
> + * used for error_append_hint(). auto_errp is automatically propagated
> + * to errp at function exit.
> + */
> +#define DEF_AUTO_ERRP(auto_errp, errp) \
> +g_auto(ErrorPropagationPair) (auto_errp) = {NULL, (Error *)(errp)}
> +
> +
> +/*
> + * Another variant:
> + *   Pros:
> + * - normal structure instead of cheating with array
> + * - we can directly use errp, if it's not NULL and don't point to
> + *   error_abort or error_fatal
> + *   Cons:
> + * - we need to define two variables instead of one
> + */
> +typedef struct ErrorPropagationStruct {
> +Error *local_err;
> +Error **errp;
> +} ErrorPropagationStruct;
> +
> +static inline void error_propagation_struct_cleanup(ErrorPropagationStruct 
> *prop)
> +{
> +if (prop->local_err) {
> +error_propagate(prop->errp, prop->local_err);
> +}
> +}
> +
> +G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagationStruct,
> + error_propagation_struct_cleanup);
> +
> +#define DEF_AUTO_ERRP_V2(auto_errp, errp) \
> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
> +Error **auto_errp = \
> +((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) 
> ? \
> +&__auto_errp_prop.local_err : \
> +(errp);
> +
> +/*
> + * Third variant:
> + *   Pros:
> + * - simpler movement for functions which don't have local_err yet
> + *   the only thing to do is to call one macro at function start.
> + *   This extremely simplifies Greg's series
> + *   Cons:
> + * - looks like errp shadowing.. Still seems safe.
> + * - must be after all definitions of local variables and before any
> + *   code.
> + * - like v2, several statements in one open macro
> + */
> +#define MAKE_ERRP_SAFE(errp) \
> +g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = (errp)}; \
> +if ((errp) == NULL || *(errp) == error_abort || *(errp) == error_fatal) { \
> +(errp) = &__auto_errp_prop.local_err; \
> +}


Using that idea, what about something like this:

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index 8bfb6684cb..043ad69f8b 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -58,22 +58,42 @@ S390CPU *s390_cpu_addr2state(uint16_t cpu_addr)
 return S390_CPU(ms->possible_cpus->cpus[cpu_addr].cpu);
 }
 
+typedef struct ErrorPropagator {
+Error **errp;
+Error *local_err;
+}

Re: [Qemu-block] [Qemu-devel] [PATCH] block/backup: install notifier during creation

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

18.09.2019 23:31, John Snow wrote:
> 
> 
> On 9/10/19 9:23 AM, John Snow wrote:
>>
>>
>> On 9/10/19 4:19 AM, Stefan Hajnoczi wrote:
>>> On Wed, Aug 21, 2019 at 04:01:52PM -0400, John Snow wrote:


 On 8/21/19 10:41 AM, Vladimir Sementsov-Ogievskiy wrote:
> 09.08.2019 23:13, John Snow wrote:
>> Backup jobs may yield prior to installing their handler, because of the
>> job_co_entry shim which guarantees that a job won't begin work until
>> we are ready to start an entire transaction.
>>
>> Unfortunately, this makes proving correctness about transactional
>> points-in-time for backup hard to reason about. Make it explicitly clear
>> by moving the handler registration to creation time, and changing the
>> write notifier to a no-op until the job is started.
>>
>> Reported-by: Vladimir Sementsov-Ogievskiy 
>> Signed-off-by: John Snow 
>> ---
>>block/backup.c | 32 +++-
>>include/qemu/job.h |  5 +
>>job.c  |  2 +-
>>3 files changed, 29 insertions(+), 10 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 07d751aea4..4df5b95415 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -344,6 +344,13 @@ static int coroutine_fn backup_before_write_notify(
>>assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
>>assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
>>
>> +/* The handler is installed at creation time; the actual 
>> point-in-time
>> + * starts at job_start(). Transactions guarantee those two points 
>> are
>> + * the same point in time. */
>> +if (!job_started(>common.job)) {
>> +return 0;
>> +}
>
> Hmm, sorry if it is a stupid question, I'm not good in multiprocessing 
> and in
> Qemu iothreads..
>
> job_started just reads job->co. If bs runs in iothread, and therefore 
> write-notifier
> is in iothread, when job_start is called from main thread.. Is it 
> guaranteed that
> write-notifier will see job->co variable change early enough to not miss 
> guest write?
> Should not job->co be volatile for example or something like this?
>
> If not think about this patch looks good for me.
>

 You know, it's a really good question.
 So good, in fact, that I have no idea.

 ¯\_(ツ)_/¯

 I'm fairly certain that IO will not come in until the .clean phase of a
 qmp_transaction, because bdrv_drained_begin(bs) is called during
 .prepare, and we activate the handler (by starting the job) in .commit.
 We do not end the drained section until .clean.

 I'm not fully clear on what threading guarantees we have otherwise,
 though; is it possible that "Thread A" would somehow lift the bdrv_drain
 on an IO thread ("Thread B") and, after that, "Thread B" would somehow
 still be able to see an outdated version of job->co that was set by
 "Thread A"?

 I doubt it; but I can't prove it.
>>>
>>> In the qmp_backup() case (not qmp_transaction()) there is:
>>>
>>>void qmp_drive_backup(DriveBackup *arg, Error **errp)
>>>{
>>>
>>>BlockJob *job;
>>>job = do_drive_backup(arg, NULL, errp);
>>>if (job) {
>>>job_start(>job);
>>>}
>>>}
>>>
>>> job_start() is called without any thread synchronization, which is
>>> usually fine because the coroutine doesn't run until job_start() calls
>>> aio_co_enter().
>>>
>>> Now that the before write notifier has been installed early, there is
>>> indeed a race between job_start() and the write notifier accessing
>>> job->co from an IOThread.
>>>
>>> The write before notifier might see job->co != NULL before job_start()
>>> has finished.  This could lead to issues if job_*() APIs are invoked by
>>> the write notifier and access an in-between job state.
>>>
>>
>> I see. I think in this case, as long as it sees != NULL, that the
>> notifier is actually safe to run. I agree that this might be confusing
>> to verify and could bite us in the future. The worry we had, too, is
>> more the opposite: will it see NULL for too long? We want to make sure
>> that it is registering as true *before the first yield*.
>>
>>> A safer approach is to set a BackupBlockJob variable at the beginning of
>>> backup_run() and check it from the before write notifier.
>>>
>>
>> That's too late, for reasons below.
>>
>>> That said, I don't understand the benefit of this patch and IMO it makes
>>> the code harder to understand because now we need to think about the
>>> created but not started state too.
>>>
>>> Stefan
>>>
>>
>> It's always possible I've hyped myself up into believing there's a
>> problem where there isn't one, but the fear is this:
>>
>> The point in time from a QMP transaction covers the job creation and the
>> job start, but when we start the job it

Re: [Qemu-block] [Qemu-devel] [PATCH v12 2/2] block/backup: fix backup_cow_with_offload for last cluster

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

18.09.2019 23:14, John Snow wrote:
> 
> 
> On 9/17/19 12:07 PM, Vladimir Sementsov-Ogievskiy wrote:
>> We shouldn't try to copy bytes beyond EOF. Fix it.
>>
>> Fixes: 9ded4a0114968e
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> Reviewed-by: Max Reitz 
>> ---
>>   block/backup.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index d8fdbfadfe..89f7f89200 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -161,7 +161,7 @@ static int coroutine_fn 
>> backup_cow_with_offload(BackupBlockJob *job,
>>   assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
>>   assert(QEMU_IS_ALIGNED(start, job->cluster_size));
>> -    nbytes = MIN(job->copy_range_size, end - start);
>> +    nbytes = MIN(job->copy_range_size, MIN(end, job->len) - start);
> 
> I'm a little confused. I think the patch as written is correct, but I don't 
> know what problem it solves.

last cluster may exceed EOF. And backup_do_cow (who calls  
backup_cow_with_offload) rounds all to clusters.
It's not bad, but we need to crop nbytes before calling actual io functions. 
backup_cow_with_bounce_buffer does the same thing.

> 
> If we're going to allow the caller to pass in an end that's beyond EOF, does 
> that mean we are *requiring* the caller to pass in a value that's aligned?

Actually yes, as we are resetting dirty bitmap.

> 
> We should probably assert what kind of a value we're accepted here, right? We 
> do for start, but should we for 'end' as well?

Yes assertion may be added.

> 
> Then ...
> 
>>   nr_clusters = DIV_ROUND_UP(nbytes, job->cluster_size);
> 
> Don't we just round this right back up immediately anyway? Does that mean we 
> have callers that are passing in an 'end' that's more than 1 job-cluster 
> beyond EOF? That seems like something that should be fixed in the caller, 
> surely?

nr_clusters are used to set/reset dirty bitmap. It's OK. Still, for last 
cluster we can drop it and use nbytes directly. No there should not be such 
callers.
nbytes is used to call blk_co_copy_range, and must be cropped to not exceed EOF.

Also, backup_cow_with_bounce_buffer behave in similar way: it crops nbytes.

Of course, there is a place for good refactoring, but I think not in this 
patch, it's a small bug fix.

> 
>>   bdrv_reset_dirty_bitmap(job->copy_bitmap, start,
>>   job->cluster_size * nr_clusters);
>>
> 


-- 
Best regards,
Vladimir

Re: [Qemu-block] [Qemu-devel] [PATCH v12 1/2] block/backup: fix max_transfer handling for copy_range

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

18.09.2019 22:57, John Snow wrote:
> 
> 
> On 9/17/19 12:07 PM, Vladimir Sementsov-Ogievskiy wrote:
>> Of course, QEMU_ALIGN_UP is a typo, it should be QEMU_ALIGN_DOWN, as we
>> are trying to find aligned size which satisfy both source and target.
>> Also, don't ignore too small max_transfer. In this case seems safer to
>> disable copy_range.
>>
>> Fixes: 9ded4a0114968e
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   block/backup.c | 12 
>>   1 file changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/backup.c b/block/backup.c
>> index 763f0d7ff6..d8fdbfadfe 100644
>> --- a/block/backup.c
>> +++ b/block/backup.c
>> @@ -741,12 +741,16 @@ BlockJob *backup_job_create(const char *job_id, 
>> BlockDriverState *bs,
>>   job->cluster_size = cluster_size;
>>   job->copy_bitmap = copy_bitmap;
>>   copy_bitmap = NULL;
>> -    job->use_copy_range = !compress; /* compression isn't supported for it 
>> */
>>   job->copy_range_size = 
>> MIN_NON_ZERO(blk_get_max_transfer(job->common.blk),
>>   blk_get_max_transfer(job->target));
>> -    job->copy_range_size = MAX(job->cluster_size,
>> -   QEMU_ALIGN_UP(job->copy_range_size,
>> - job->cluster_size));
>> +    job->copy_range_size = QEMU_ALIGN_DOWN(job->copy_range_size,
>> +   job->cluster_size);
>> +    /*
>> + * Compression is not supported for copy_range. Also, we don't want to
>> + * handle small max_transfer for copy_range (which currently don't
>> + * handle max_transfer at all).
>> + */
>> +    job->use_copy_range = !compress && job->copy_range_size > 0;
>>   /* Required permissions are already taken with target's blk_new() */
>>   block_job_add_bdrv(>common, "target", target, 0, BLK_PERM_ALL,
>>
> 
> I'm clear on the alignment fix, I'm not clear on the comment about 
> max_transfer and how it relates to copy_range_size being non-zero.
> 
> "small max transfer" -- what happens when it's zero? we're apparently OK with 
> a single cluster, but when it's zero, what happens?

if it zero it means that source or target requires max_transfer less than 
cluster_size. It seems not valid to call copy_range in this case.
Still it's OK to use normal read/write, as they handle max_transfer internally 
in a loop (copy_range doesn't do it).

-- 
Best regards,
Vladimir

Re: [Qemu-block] [RFC] error: auto propagated local_err

2019-09-19 Thread Vladimir Sementsov-Ogievskiy

18.09.2019 21:32, Eric Blake wrote:
> On 9/18/19 1:05 PM, Eric Blake wrote:
> 
 #define MAKE_ERRP_SAFE() \
 g_auto(ErrorPropagationStruct) (__auto_errp_prop) = {.errp = errp}; \
 errp = &__auto_errp_prop.local_err

> 
> I tried to see if this could be done with just a single declaration
> line, as in:
> 
> typedef struct ErrorPropagator {
>  Error **errp;
>  Error *local_err;
> } ErrorPropagator;
> 
> #define MAKE_ERRP_SAFE() \
>g_auto(ErrorPropagator) __auto_errp_prop = { \
>  errp, ((errp = &__auto_errp_prop.local_err), NULL) }
> 
> But sadly, C17 paragraph 6.7.9P23 states:
> 
> "The evaluations of the initialization list expressions are
> indeterminately sequenced with respect to one another and thus the order
> in which any side effects occur is unspecified."
> 
> which does not bode well for the assignment to __auto_errp_prop.errp.
> All changes to errp would have to be within the same initializer.  Maybe:
> 
> #define MAKE_ERRP_SAFE() \
>g_auto(ErrorPropagator) __auto_errp_prop = { \
>  .local_err = (__auto_errp_prop.err = errp, \
>  (errp = &__auto_errp_prop.local_err), NULL) }

Is it guaranteed that .errp will not be initialized to NULL after evaluating of
.local_err initializer?

> 
> but by the time you get that complicated, just using a statement is
> easier to read.
> 


-- 
Best regards,
Vladimir

70 matches

Mail list logo