On 3/25/22 12:05, Thomas Lamprecht wrote:
(re-sending, seems you forgot to hit Reply-All and only sent it to me directly).

On 25.03.22 10:41, Daniel Tschlatscher wrote:
On 3/25/22 09:29, Thomas Lamprecht wrote:
Thanks for the patch, some comments inline.
On 24.03.22 16:44, Daniel Tschlatscher wrote:
please format the commit message according to our submission rules:
https://pve.proxmox.com/wiki/Developer_Documentation#Commits_and_Commit_Messages

an output example (maybe even a before/after excerpt) would be nice to have here.
Just leave out the ":" after "fix", right? Or is there something else I should 
change?
I actually meant the per-line character length, quoting:
Make sure the line length of the commit's message is not longer than 70 
characters.
Note, HTTPS links are an exception and should not be split.
but yeah, now that you say it, it really should be `fix #3865: ...`


   +typedef struct VmaRestoreState {
   +    BlockBackend *target;
   +    bool write_zeroes;
@@ -649,13 +653,31 @@ index 0000000000..2b1d1cdab3
   +
   +        if (verbose) {
   +            time_t duration = time(NULL) - vmar->start_time;
The resulting accuracy of tracking time in seconds might not be that good, did you
give that any thought or do any comparison?
I "built" upon what was already here. This is how the elapsed seconds were measured
in the original code, and it works well enough in this case. There are APIs for tracking
milliseconds, but I don't think that much fine-grained control over the elapsed time
has any real benefit here.
sorry, but what? Say we have a byte delta of 100 MiB since the last report, and nothing
guarantees that we're currently very close to a second boundary, so let's see
how the error of second resolution compares to millisecond resolution.

Using second granularity, anything between 1s <= t < 2s gets you 100 MiB/s.
On the other hand:

                     "actual"     "1s cut off"   "error"
100 MiB / 1.000s = 100.0 MiB/s   100.0 MiB/s      0.0
100 MiB / 1.050s =  95.2 MiB/s   100.0 MiB/s      4.8
100 MiB / 1.123s =  89.0 MiB/s   100.0 MiB/s     11.0
100 MiB / 1.130s =  88.5 MiB/s   100.0 MiB/s     11.5
100 MiB / 1.500s =  66.7 MiB/s   100.0 MiB/s     33.3
100 MiB / 1.750s =  57.1 MiB/s   100.0 MiB/s     42.9
100 MiB / 1.999s =  50.0 MiB/s   100.0 MiB/s     50.0

So even just 7 ms (1.123s vs 1.130s) makes a 0.5 MiB/s difference, which is IMO
already relevant.

Wouldn't dare to call that "working well enough", or are we talking about
different things?
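
For illustration only, a minimal sketch of how the delta could be measured with
sub-second granularity (this is not the actual vma.c code; g_get_monotonic_time()
comes from GLib, which QEMU already links against, and all names here are made up):

    #include <glib.h>     /* g_get_monotonic_time(): monotonic clock in microseconds */
    #include <stdio.h>
    #include <stdint.h>

    static gint64 last_update_us;    /* timestamp of the previous report */

    static void report_throughput(uint64_t bytes_since_last)
    {
        gint64 now_us = g_get_monotonic_time();
        gint64 delta_ms = (now_us - last_update_us) / 1000;

        double mibps = 0.0;
        if (delta_ms > 0) {
            /* bytes / ms * 1000 => bytes/s, then scale to MiB/s */
            mibps = (double)bytes_since_last * 1000.0 / delta_ms
                    / (1024.0 * 1024.0);
        }
        fprintf(stderr, "progress: %.1f MiB/s\n", mibps);
        last_update_us = now_us;
    }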

++
++                if (delta != 0)
++                    bps /= delta;
currently, if delta is zero you'd print a plain size unit as the throughput unit?

This is due to the underlying code. The function wrapping this part of the code is
called multiple times every second (something like ~20 times/sec) and, in the
original code, a print statement would only occur after the percentage of bytes
transferred had increased by at least 1%.
The time elapsed between two increases in percent could be less than a second. In
this case the output simply prints how many bytes were transferred since this
second began. This means the throughput value would, at the time of the print,
always be lower than the real-world B/s, and only the last print in that second
would represent the actual B/s.
This just doesn't make sense to me, the if clause enforces that it's called at
most every 2s (or in the 100% edge case) already, and using wrong units is using
wrong units, you cannot really argue basic math/physics away.

The 100% edge case is the problem for the most part.

It comes down to the inaccuracy of keeping time in seconds rather than something more granular. While it is true that in 0 time no actual data could be transferred, 0 in this case means anywhere from 0 to 0.999 seconds. Delta could be zero while the transferred bytes, for example, amount to 20 MB.

Just printing 0 would be confusing because *some* data was in fact transferred, which is reflected in the bps variable. It could also give the user the false impression that their disk is slow or something similar if they see a lot of "0 B/s" entries (which was the actual reasoning here). With the current implementation, if delta is 0, bps = 20 MB and in reality 0.33 seconds have elapsed, the program prints "20 MB/s". In actuality, the expected rate of transfer would be "60 MB/s" though.

My thinking in not using milliseconds here was: even though I could calculate the per-second transfer rate at the time of the print, it is still an approximation. Imagine the following scenario: in the first 0.33 secs, 20 MB are transferred (=> a print goes out with 60 MB/s).
In the next 0.33 secs only 10 MB are transferred.
In the last 0.33 secs only 0.5 MB are transferred.
In this case "60 MB/s" was printed, but only 30.5 MB have actually been transferred.

The current implementation might severely understate the actual throughput, but it will never show a B/s value that is higher than the actual throughput per second. The unaccounted-for bytes would then be reflected in the following output. The millisecond approach would have both the same and the opposite problem, e.g. if delta is only a few milliseconds and the throughput in that time is a lot higher than average, then the printed B/s value could be highly exaggerated.

I left the current implementation as it was because, in practice, at the end of the backup restore there are usually a lot of sparse bytes (I hope I am using this term correctly). This means that the restore would go from 56% to 100% in less than a second. Simply writing 0 here would probably confuse the user more than anything, because the value suddenly jumped by e.g. 6 GiB with 0 B/s.

Still, I do concede that using milliseconds just makes more sense, because it is the more "proper" way to implement something like this, and especially because of the unit; in the current implementation it would be something like B/last-output. Also, it seems to me that the throughput plummeting or skyrocketing within less than a second is rather unlikely, so the millisecond method would statistically just be more correct.
I will change it in v3.

Nonetheless, I later adapted this code to behave like the backup create 
function does, which updates every 3 seconds, eliminating this problem 
entirely. This is therefore dead code anyway.

later where?

An actual improvement would be to track the duration in milliseconds or less and
still catch the edge case where delta is zero, and just set the throughput to zero
then (as no data can transfer in zero time).
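
Something along those lines, as a rough sketch (variable names are illustrative, not from the patch):

    /* delta_ms: time since the last report, in milliseconds */
    uint64_t bps = 0;
    if (delta_ms > 0) {
        bps = delta_bytes * 1000 / delta_ms;   /* bytes per second */
    }
    /* delta_ms == 0: report 0 B/s instead of a raw byte count */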

Later in the sense of time. I rewrote the if statement above to only execute if delta is greater than 2 seconds (before, it updated with every percent increase). "Eliminating this problem entirely" is, I retract, not true though, because I forgot about the check whether progress is at 100%, in which case delta could still be 0.

it would IMO be worth it to factor the above lines out into a static local helper, to
avoid crowding this function too much, e.g. (types may be adapted if reasonable):

print_restore_progress(uint64_t total_byte, uint64_t restored_byte, uint64_t duration_ms);

I would handle the last_X statics in there and also re-calculate the percentage as a
float in there, so that we can print a similar amount of digits after the decimal
place like we do in backup.
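
A rough sketch of what such a helper might look like (purely illustrative; the statics,
the output format and the ms-based duration are assumptions, not code from the patch):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    static void print_restore_progress(uint64_t total_byte, uint64_t restored_byte,
                                       uint64_t duration_ms)
    {
        static uint64_t last_restored_byte;
        static uint64_t last_duration_ms;

        uint64_t delta_ms = duration_ms - last_duration_ms;
        uint64_t delta_byte = restored_byte - last_restored_byte;

        uint64_t bps = 0;
        if (delta_ms > 0) {
            bps = delta_byte * 1000 / delta_ms;   /* bytes per second */
        }

        /* percentage as float, so digits after the decimal place can be printed */
        float percent = total_byte
            ? (float)restored_byte * 100 / total_byte : 100.0f;

        fprintf(stderr, "progress %.1f%% (%" PRIu64 " of %" PRIu64 " bytes, %" PRIu64 " B/s)\n",
                percent, restored_byte, total_byte, bps);

        last_restored_byte = restored_byte;
        last_duration_ms = duration_ms;
    }
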
A function sounds good, though I am not quite sure what you mean by "print a similar
amount of digits after the decimal place like we do in backup".
Hmm, I thought the percentage wasn't an integer there, so I meant something like 5.1%,
but apparently it is a plain integer (that may just have been in an experiment I did
when I rewrote that whole thing a while ago), so for now you can ignore that.

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
