Re: [PATCH] expand the units that pg_size_pretty supports on output
On Thu, 8 Jul 2021 at 05:44, David Christensen wrote: > Enclosed is the patch to change the return type to numeric, as well as one > for expanding units to > add PB and EB. I ended up not changing the return type of pg_size_bytes(). > I figured that PB and EB are probably good enough additions at this point, so > we can debate whether > to add the others. Per Tom's concern both with changing the return type of pg_size_bytes() and his and my concern about going too far adding more units, I've adjusted your patch to only add petabytes and pushed it. The maximum range of BIGINT is only 8 exabytes, so the BIGINT version would never show in exabytes anyway. It would still be measuring in petabytes at the 64-bit range limit. After a bit of searching, I found reports that the estimated entire stored digital data on Earth as of 2020 to be 59 zettabytes, or about 0.06 yottabytes. I feel like we've gone far enough by adding petabytes today. Maybe that's worth revisiting in a couple of storage generations. After we're done there, we can start working on the LSN wraparound code. David
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Thu, 8 Jul 2021 at 14:38, David Rowley wrote: > > I gave it a bit of exercise by running pgbench and calling this procedure: > > It ran 8526956 times, so with the loop that's 8.5 billion random > numbers. No variations between the two functions. I got the same > after removing the 0 - to test positive numbers. Wow, that's a lot of testing! I just tried a few hand-picked edge cases. > If you like, I can push this in my morning, or if you'd rather do it > yourself, please go ahead. No, I didn't get as much time as I thought I would today, so please go ahead. Regards, Dean
Re: [PATCH] expand the units that pg_size_pretty supports on output
tOn Thu, 8 Jul 2021 at 20:23, Dean Rasheed wrote: > > > On Thu, 8 Jul 2021 at 13:31, David Rowley wrote: > > Here's a patch which I believe makes pg_size_pretty() and > > pg_size_pretty_numeric() match in regards to negative values. > > LGTM, except I think it's worth also making the numeric code not refer > to bit shifting either. > > > Maybe this plus your regression test would be ok to back-patch? > > +1 > > Here's an update with matching updates to the numeric code, plus the > regression tests. Looks good. I gave it a bit of exercise by running pgbench and calling this procedure: CREATE OR REPLACE PROCEDURE public.test_size_pretty2() LANGUAGE plpgsql AS $procedure$ declare b bigint; begin FOR i IN 1..1000 LOOP b := 0 - (random() * 9223372036854775807)::bigint; if pg_size_pretty(b) <> pg_size_pretty(b::numeric) then raise notice '%. % != %', b, pg_size_pretty(b), pg_size_pretty(b::numeric); end if; END LOOP; END; $procedure$ It ran 8526956 times, so with the loop that's 8.5 billion random numbers. No variations between the two functions. I got the same after removing the 0 - to test positive numbers. If you like, I can push this in my morning, or if you'd rather do it yourself, please go ahead. David
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Thu, 8 Jul 2021 at 07:31, Tom Lane wrote: > > David Christensen writes: > > Enclosed is the patch to change the return type to numeric, as well as one > > for expanding units to > > add PB and EB. > > Can we really get away with changing the return type? That would > by no stretch of the imagination be free; one could expect breakage > of a few user views, for example. That's a good point. We should probably leave it alone then. I had had it in mind that it might be ok since we did this for extract() in 14. At least we have date_part() as a backup there. I'm fine to leave the return value of pg_size_bytes as-is. > Independently of that, I'm pretty much -1 on going further than PB. > Even if the additional abbreviations you mention are actually recognized > standards, I think not that many people are familiar with them, and such > input is way more likely to be a typo than intended data. I'm fine with that too. In [1] I mentioned my concerns with adding all the defined units up to Yottabyte. David reduced that down to just exabytes, but I think if we're keeping pg_size_bytes returning bigint then drawing the line at PB seems ok to me. Anything more than pg_size_bytes('8 EB') would overflow. David [1] https://www.postgresql.org/message-id/caaphdvp9ym+rsqngosrpjh+j6tj1tfbhft+jolff_rbzq1e...@mail.gmail.com
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Thu, 8 Jul 2021 at 05:30, David Rowley wrote: > > On Thu, 8 Jul 2021 at 13:31, David Rowley wrote: > > It feels like if we're going to fix this negative rounding thing then > > we should maybe do it and backpatch a fix then rebase this work on top > > of that. Yes, that was my thinking too. > Here's a patch which I believe makes pg_size_pretty() and > pg_size_pretty_numeric() match in regards to negative values. LGTM, except I think it's worth also making the numeric code not refer to bit shifting either. > Maybe this plus your regression test would be ok to back-patch? +1 Here's an update with matching updates to the numeric code, plus the regression tests. Regards, Dean diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c new file mode 100644 index 9c39e7d..d03e103 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -31,8 +31,8 @@ #include "utils/relmapper.h" #include "utils/syscache.h" -/* Divide by two and round towards positive infinity. */ -#define half_rounded(x) (((x) + ((x) < 0 ? 0 : 1)) / 2) +/* Divide by two and round away from zero */ +#define half_rounded(x) (((x) + ((x) < 0 ? -1 : 1)) / 2) /* Return physical size of directory contents, or 0 if dir doesn't exist */ static int64 @@ -542,25 +542,29 @@ pg_size_pretty(PG_FUNCTION_ARGS) snprintf(buf, sizeof(buf), INT64_FORMAT " bytes", size); else { - size >>= 9;/* keep one extra bit for rounding */ + /* + * We use divide instead of bit shifting so that behavior matches for + * both positive and negative size values. + */ + size /= (1 << 9);/* keep one extra bit for rounding */ if (Abs(size) < limit2) snprintf(buf, sizeof(buf), INT64_FORMAT " kB", half_rounded(size)); else { - size >>= 10; + size /= (1 << 10); if (Abs(size) < limit2) snprintf(buf, sizeof(buf), INT64_FORMAT " MB", half_rounded(size)); else { -size >>= 10; +size /= (1 << 10); if (Abs(size) < limit2) snprintf(buf, sizeof(buf), INT64_FORMAT " GB", half_rounded(size)); else { - size >>= 10; + size /= (1 << 10); snprintf(buf, sizeof(buf), INT64_FORMAT " TB", half_rounded(size)); } @@ -621,13 +625,13 @@ numeric_half_rounded(Numeric n) } static Numeric -numeric_shift_right(Numeric n, unsigned count) +numeric_truncated_divide(Numeric n, int64 divisor) { Datum d = NumericGetDatum(n); Datum divisor_numeric; Datum result; - divisor_numeric = NumericGetDatum(int64_to_numeric(((int64) 1) << count)); + divisor_numeric = NumericGetDatum(int64_to_numeric(divisor)); result = DirectFunctionCall2(numeric_div_trunc, d, divisor_numeric); return DatumGetNumeric(result); } @@ -650,8 +654,8 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS) else { /* keep one extra bit for rounding */ - /* size >>= 9 */ - size = numeric_shift_right(size, 9); + /* size /= (1 << 9) */ + size = numeric_truncated_divide(size, 1 << 9); if (numeric_is_less(numeric_absolute(size), limit2)) { @@ -660,8 +664,9 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS) } else { - /* size >>= 10 */ - size = numeric_shift_right(size, 10); + /* size /= (1 << 10) */ + size = numeric_truncated_divide(size, 1 << 10); + if (numeric_is_less(numeric_absolute(size), limit2)) { size = numeric_half_rounded(size); @@ -669,8 +674,8 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS) } else { -/* size >>= 10 */ -size = numeric_shift_right(size, 10); +/* size /= (1 << 10) */ +size = numeric_truncated_divide(size, 1 << 10); if (numeric_is_less(numeric_absolute(size), limit2)) { @@ -679,8 +684,8 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS) } else { - /* size >>= 10 */ - size = numeric_shift_right(size, 10); + /* size /= (1 << 10) */ + size = numeric_truncated_divide(size, 1 << 10); size = numeric_half_rounded(size); result = psprintf("%s TB", numeric_to_cstring(size)); } diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out new file mode 100644 index e901a2c..29804ae --- a/src/test/regress/expected/dbsize.out +++ b/src/test/regress/expected/dbsize.out @@ -35,6 +35,48 @@ SELECT size, pg_size_pretty(size), pg_si 1000.5 | 909 TB | -909 TB (12 rows) +-- test where units change up +SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM +(VALUES (10239::bigint), (10240::bigint), +(10485247::bigint), (10485248::bigint), +(10736893951::bigint), (10736893952::bigint), +(10994579406847::bigint), (10994579406848::bigint), +(11258449312612351::bigint), (11258449312612352::bigint)) x(size); + size| pg_size_pretty | pg_size_pretty +---++ + 10239 | 10239 bytes| -10239 bytes + 10240 | 10 kB | -10 kB +
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Thu, 8 Jul 2021 at 13:31, David Rowley wrote: > It feels like if we're going to fix this negative rounding thing then > we should maybe do it and backpatch a fix then rebase this work on top > of that. Here's a patch which I believe makes pg_size_pretty() and pg_size_pretty_numeric() match in regards to negative values. Maybe this plus your regression test would be ok to back-patch? David adjust_pg_size_pretty_rounding_for_negative_values.patch Description: Binary data
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Thu, 8 Jul 2021 at 01:32, Dean Rasheed wrote: > Hmm, this looked easy, but... > > It occurred to me that there ought to be regression tests for the edge > cases where it steps from one unit to the next. So, in the style of > the existing regression tests, I tried the following: > > SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM > (VALUES (10239::bigint), (10240::bigint), > (10485247::bigint), (10485248::bigint), > (10736893951::bigint), (10736893952::bigint), > (10994579406847::bigint), (10994579406848::bigint), > (11258449312612351::bigint), (11258449312612352::bigint)) x(size); > 11258449312612352 | 10240 TB | -10239 TB Hmm, yeah, I noticed that pg_size_pretty(bigint) vs pg_size_pretty(numeric) didn't always match when I was testing this patch, but I was more focused on having my results matching the unpatched version than I was with making sure bigint and numeric matched. I imagine this must date back to 8a1fab36ab. Do you feel like it's something this patch should fix? I was mostly hoping to keep this patch about rewriting the code to both make it easier to add new units and also to make it easier to keep all 3 functions in sync. It feels like if we're going to fix this negative rounding thing then we should maybe do it and backpatch a fix then rebase this work on top of that. What are your thoughts? David
Re: [PATCH] expand the units that pg_size_pretty supports on output
Tom Lane writes: > David Christensen writes: >> Enclosed is the patch to change the return type to numeric, as well as one >> for expanding units to >> add PB and EB. > > Can we really get away with changing the return type? That would > by no stretch of the imagination be free; one could expect breakage > of a few user views, for example. Hmm, that's a good point, and we can't really make the return type polymorphic (being as there isn't a source type of the given return value). > Independently of that, I'm pretty much -1 on going further than PB. > Even if the additional abbreviations you mention are actually recognized > standards, I think not that many people are familiar with them, and such > input is way more likely to be a typo than intended data. If we do go ahead and restrict the expansion to just PB, the return value of pg_size_bytes() would still support up to 8192 PB before running into range limitations. I assume it's not worth creating a pg_size_bytes_numeric() with the full range of supported units, but that is presumably an option as well. Best, David
Re: [PATCH] expand the units that pg_size_pretty supports on output
David Christensen writes: > Enclosed is the patch to change the return type to numeric, as well as one > for expanding units to > add PB and EB. Can we really get away with changing the return type? That would by no stretch of the imagination be free; one could expect breakage of a few user views, for example. Independently of that, I'm pretty much -1 on going further than PB. Even if the additional abbreviations you mention are actually recognized standards, I think not that many people are familiar with them, and such input is way more likely to be a typo than intended data. regards, tom lane
Re: [PATCH] expand the units that pg_size_pretty supports on output
David Rowley writes: > On Wed, 7 Jul 2021 at 02:46, David Christensen > wrote: >> if we do decide to expand the units table there will be a >> few additional changes (most significantly, the return value of >> `pg_size_bytes()` will need to switch >> to `numeric`). > > I wonder if it's worth changing pg_size_bytes() to return NUMERIC > regardless of if we add any additional units or not. > > Would you like to create 2 patches, one to change the return type and > another to add the new units, both based on top of the v2 patch I sent > earlier? > > David Enclosed is the patch to change the return type to numeric, as well as one for expanding units to add PB and EB. If we decide to expand further, the current implementation will need to change, as ZB and YB have 70 and 80 bits needing to be shifted accordingly, so int64 isn't enough to hold it. (I fixed this particular issue in the original version of this patch, so there is at least a blueprint of how to fix.) I figured that PB and EB are probably good enough additions at this point, so we can debate whether to add the others. Best, David >From 57bd2eafff5404313426a10f63b0b7098314998a Mon Sep 17 00:00:00 2001 From: David Christensen Date: Wed, 7 Jul 2021 11:46:09 -0500 Subject: [PATCH] Make pg_size_bytes() return numeric instead of bigint --- src/backend/utils/adt/dbsize.c | 7 --- src/include/catalog/pg_proc.dat | 2 +- src/test/regress/expected/dbsize.out | 17 + src/test/regress/sql/dbsize.sql | 6 -- 4 files changed, 22 insertions(+), 10 deletions(-) diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index 4b7331d85c..a5a3ebe34e 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -813,10 +813,11 @@ pg_size_bytes(PG_FUNCTION_ARGS) } } - result = DatumGetInt64(DirectFunctionCall1(numeric_int8, - NumericGetDatum(num))); + /* now finally truncate, since this is always in integer-like units */ + num = DatumGetNumeric(DirectFunctionCall1(numeric_ceil, + NumericGetDatum(num))); - PG_RETURN_INT64(result); + PG_RETURN_NUMERIC(num); } /* diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index fde251fa4f..73326e3618 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -7187,7 +7187,7 @@ prosrc => 'pg_size_pretty_numeric' }, { oid => '3334', descr => 'convert a size in human-readable format with size units into bytes', - proname => 'pg_size_bytes', prorettype => 'int8', proargtypes => 'text', + proname => 'pg_size_bytes', prorettype => 'numeric', proargtypes => 'text', prosrc => 'pg_size_bytes' }, { oid => '2997', descr => 'disk space usage for the specified table, including TOAST, free space and visibility map', diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out index e901a2c92a..2950e017d0 100644 --- a/src/test/regress/expected/dbsize.out +++ b/src/test/regress/expected/dbsize.out @@ -101,6 +101,19 @@ SELECT size, pg_size_bytes(size) FROM -.0 gb | 0 (8 rows) +-- valid inputs outside bigint range (previous errors) +SELECT pg_size_bytes('9223372036854775807.9'); +pg_size_bytes +- + 9223372036854775808 +(1 row) + +SELECT pg_size_bytes('1e100'); + pg_size_bytes +--- + 1 +(1 row) + -- invalid inputs SELECT pg_size_bytes('1 AB'); ERROR: invalid size: "1 AB" @@ -114,10 +127,6 @@ SELECT pg_size_bytes('1 AB A'); ERROR: invalid size: "1 AB A" DETAIL: Invalid size unit: "AB A". HINT: Valid units are "bytes", "kB", "MB", "GB", and "TB". -SELECT pg_size_bytes('9223372036854775807.9'); -ERROR: bigint out of range -SELECT pg_size_bytes('1e100'); -ERROR: bigint out of range SELECT pg_size_bytes('1e100'); ERROR: value overflows numeric format SELECT pg_size_bytes('1 byte'); -- the singular "byte" is not supported diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql index d10a4d7f68..e88184e9c9 100644 --- a/src/test/regress/sql/dbsize.sql +++ b/src/test/regress/sql/dbsize.sql @@ -30,12 +30,14 @@ SELECT size, pg_size_bytes(size) FROM (VALUES ('-1.'), ('-1.kb'), ('-1. kb'), ('-0. gb'), ('-.1'), ('-.1kb'), ('-.1 kb'), ('-.0 gb')) x(size); +-- valid inputs outside bigint range (previous errors) +SELECT pg_size_bytes('9223372036854775807.9'); +SELECT pg_size_bytes('1e100'); + -- invalid inputs SELECT pg_size_bytes('1 AB'); SELECT pg_size_bytes('1 AB A'); SELECT pg_size_bytes('1 AB A'); -SELECT pg_size_bytes('9223372036854775807.9'); -SELECT pg_size_bytes('1e100'); SELECT
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 03:47, David Rowley wrote: > > Updated patch attached. > Hmm, this looked easy, but... It occurred to me that there ought to be regression tests for the edge cases where it steps from one unit to the next. So, in the style of the existing regression tests, I tried the following: SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM (VALUES (10239::bigint), (10240::bigint), (10485247::bigint), (10485248::bigint), (10736893951::bigint), (10736893952::bigint), (10994579406847::bigint), (10994579406848::bigint), (11258449312612351::bigint), (11258449312612352::bigint)) x(size); size| pg_size_pretty | pg_size_pretty ---++ 10239 | 10239 bytes| -10239 bytes 10240 | 10 kB | -10 kB 10485247 | 10239 kB | -10 MB 10485248 | 10 MB | -10 MB 10736893951 | 10239 MB | -10 GB 10736893952 | 10 GB | -10 GB 10994579406847 | 10239 GB | -10 TB 10994579406848 | 10 TB | -10 TB 11258449312612351 | 10239 TB | -10239 TB 11258449312612352 | 10240 TB | -10239 TB (10 rows) SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM (VALUES (10239::numeric), (10240::numeric), (10485247::numeric), (10485248::numeric), (10736893951::numeric), (10736893952::numeric), (10994579406847::numeric), (10994579406848::numeric), (11258449312612351::numeric), (11258449312612352::numeric)) x(size); size| pg_size_pretty | pg_size_pretty ---++ 10239 | 10239 bytes| -10239 bytes 10240 | 10 kB | -10 kB 10485247 | 10239 kB | -10239 kB 10485248 | 10 MB | -10 MB 10736893951 | 10239 MB | -10239 MB 10736893952 | 10 GB | -10 GB 10994579406847 | 10239 GB | -10239 GB 10994579406848 | 10 TB | -10 TB 11258449312612351 | 10239 TB | -10239 TB 11258449312612352 | 10240 TB | -10240 TB (10 rows) Under the assumption that what we're trying to achieve here is schoolbook rounding (ties away from zero), the numeric results are correct and the bigint results are wrong. The reason is that bit shifting isn't the same as division for negative numbers, since bit shifting rounds towards negative infinity whereas division rounds towards zero (truncates), which is what I think we really need. Regards, Dean
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 02:46, David Christensen wrote: > if we do decide to expand the units table there will be a > few additional changes (most significantly, the return value of > `pg_size_bytes()` will need to switch > to `numeric`). I wonder if it's worth changing pg_size_bytes() to return NUMERIC regardless of if we add any additional units or not. Would you like to create 2 patches, one to change the return type and another to add the new units, both based on top of the v2 patch I sent earlier? David
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 02:54, Tom Lane wrote: > > Minor nit: use "const char *text" in the struct declaration, so > that all of the static data can be placed in fixed storage. Thanks for pointing that out. > David Rowley writes: > > (I'm not sure why pgindent removed the space between != and NULL, but > > it did, so I left it.) > > It did that because "text" is a typedef name, so it's a bit confused > about whether the statement is really a declaration. Personally I'd > have used "name" or something like that for that field, anyway. I should have thought of that. Thanks for highlighting it. I've renamed the field. Updated patch attached. David v2-0001-Use-lookup-table-for-units-in-pg_size_pretty-and-.patch Description: Binary data
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 7 Jul 2021 at 00:51, Dean Rasheed wrote: > 10. half_round(21) == 11 :: 20 >> 1 = 10 > > The correct result should be 10 (it would be very odd to claim that > 10241 bytes should be displayed as 11kb), but the half-rounding keeps > rounding up at each stage. > > That's a general property of rounding -- you need to be very careful > when rounding more than once, since otherwise errors will propagate. > C.f. 4083f445c0, which removed a double-round in numeric sqrt(). Thanks. I've adjusted the patch to re-add the round bool flag and get rid of the rightShift field. I'm now calculating how many bits to shift right by based on the difference between the unitbits of the current and next unit then taking 1 bit less if the next unit does half rounding and the current one does not, or adding an extra bit on in the opposite case. I'll post another patch shortly. David
Re: [PATCH] expand the units that pg_size_pretty supports on output
David Rowley writes: > Does anyone want to have a look over this? If not, I plan to push it > in the next day or so. Minor nit: use "const char *text" in the struct declaration, so that all of the static data can be placed in fixed storage. > (I'm not sure why pgindent removed the space between != and NULL, but > it did, so I left it.) It did that because "text" is a typedef name, so it's a bit confused about whether the statement is really a declaration. Personally I'd have used "name" or something like that for that field, anyway. regards, tom lane
Re: [PATCH] expand the units that pg_size_pretty supports on output
David Rowley writes: > On Mon, 5 Jul 2021 at 20:00, David Rowley wrote: >> I don't really like the fact that I had to add the doHalfRound field >> to get the same rounding behaviour as the original functions. I'm >> wondering if it would just be too clever just to track how many bits >> we've shifted right by in pg_size_pretty* and compare that to the >> value of multishift for the current unit and do appropriate rounding >> to get us to the value of multishift. In theory, we could just keep >> calling the half_rounded macro until we make it to the multishift >> value. My current thoughts are that it's unlikely that anyone would >> twiddle with the size_pretty_units array in such a way for the extra >> code to be worth it. Maybe someone else feels differently. > > I made another pass over this and ended up removing the doHalfRound > field in favour of just doing rounding based on the previous > bitshifts. > > I did a few other tidy ups and I think it's a useful patch as it > reduces the amount of code a bit and makes it dead simple to add new > units in the future. Most importantly it'll help keep pg_size_pretty, > pg_size_pretty_numeric and pg_size_bytes all in sync in regards to > what units they support. > > Does anyone want to have a look over this? If not, I plan to push it > in the next day or so. > > (I'm not sure why pgindent removed the space between != and NULL, but > it did, so I left it.) > > David I like the approach you took here; much cleaner to have one table for all of the individual codepaths. Testing worked as expected; if we do decide to expand the units table there will be a few additional changes (most significantly, the return value of `pg_size_bytes()` will need to switch to `numeric`). Thanks, David
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 6 Jul 2021 at 13:15, David Rowley wrote: > > Can you give an example where calling half_rounded too many times will > give the wrong value? Keeping in mind we call half_rounded the number > of times that the passed in value would need to be left-shifted by to > get the equivalent truncated value. > ./half_rounded 10241 10 1. half_round(10241) == 5121 :: 10241 >> 1 = 5120 2. half_round(5121) == 2561 :: 5120 >> 1 = 2560 3. half_round(2561) == 1281 :: 2560 >> 1 = 1280 4. half_round(1281) == 641 :: 1280 >> 1 = 640 5. half_round(641) == 321 :: 640 >> 1 = 320 6. half_round(321) == 161 :: 320 >> 1 = 160 7. half_round(161) == 81 :: 160 >> 1 = 80 8. half_round(81) == 41 :: 80 >> 1 = 40 9. half_round(41) == 21 :: 40 >> 1 = 20 10. half_round(21) == 11 :: 20 >> 1 = 10 The correct result should be 10 (it would be very odd to claim that 10241 bytes should be displayed as 11kb), but the half-rounding keeps rounding up at each stage. That's a general property of rounding -- you need to be very careful when rounding more than once, since otherwise errors will propagate. C.f. 4083f445c0, which removed a double-round in numeric sqrt(). To be clear, I'm not saying that the current code half-rounds more than once, just that it reads as if it does. Regards, Dean
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 6 Jul 2021 at 23:39, Dean Rasheed wrote: > When I first read this: > > +/* half-round until we get down to unitBits */ > +while (rightshifts++ < unit->unitBits) > +size = half_rounded(size); > > it looked to me like it would be invoking half_rounded() multiple > times, which raised alarm bells because that would risk rounding the > wrong way. Eventually I realised that by the time it reaches that, > rightshifts will always equal unit->unitBits or unit->unitBits - 1, so > it'll never do more than one half-round, which is important. It's true that based on how the units table is set up now, it'll only ever do it once for all but the first loop. I wrote the attached .c file just to try to see if it ever goes wrong and I didn't manage to find any inputs where it did. I always seem to get the half rounded value either the same as the shifted value or 1 higher towards positive infinity $ ./half_rounded -102 3 1. half_round(-102) == -51 :: -102 >> 1 = -51 2. half_round(-51) == -25 :: -51 >> 1 = -26 3. half_round(-25) == -12 :: -26 >> 1 = -13 $ ./half_rounded 6432 3 1. half_round(6432) == 3216 :: 6432 >> 1 = 3216 2. half_round(3216) == 1608 :: 3216 >> 1 = 1608 3. half_round(1608) == 804 :: 1608 >> 1 = 804 Can you give an example where calling half_rounded too many times will give the wrong value? Keeping in mind we call half_rounded the number of times that the passed in value would need to be left-shifted by to get the equivalent truncated value. David #define half_rounded(x) (((x) + ((x) < 0 ? 0 : 1)) / 2) #include #include int main(int argc, char **argv) { int num, times; int i; int r, shift; if (argc < 3) { printf("Usage: %s \n"); return -1; } num = atoi(argv[1]); times = atoi(argv[2]); r = num; shift = num; for (i = 0; i < times; i++) { int rr = half_rounded(r); printf("%d. half_round(%d) == %d :: %d >> 1 = %d\n", i+1, r, rr, shift, shift >> 1); r = rr; shift >>= 1; } return 0; }
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 6 Jul 2021 at 10:20, David Rowley wrote: > > I made another pass over this and ended up removing the doHalfRound > field in favour of just doing rounding based on the previous > bitshifts. > When I first read this: +/* half-round until we get down to unitBits */ +while (rightshifts++ < unit->unitBits) +size = half_rounded(size); it looked to me like it would be invoking half_rounded() multiple times, which raised alarm bells because that would risk rounding the wrong way. Eventually I realised that by the time it reaches that, rightshifts will always equal unit->unitBits or unit->unitBits - 1, so it'll never do more than one half-round, which is important. So perhaps using doHalfRound would be clearer, but it could just be a local variable tracking whether or not it's the first time through the loop. Regards, Dean
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Mon, 5 Jul 2021 at 20:00, David Rowley wrote: > I don't really like the fact that I had to add the doHalfRound field > to get the same rounding behaviour as the original functions. I'm > wondering if it would just be too clever just to track how many bits > we've shifted right by in pg_size_pretty* and compare that to the > value of multishift for the current unit and do appropriate rounding > to get us to the value of multishift. In theory, we could just keep > calling the half_rounded macro until we make it to the multishift > value. My current thoughts are that it's unlikely that anyone would > twiddle with the size_pretty_units array in such a way for the extra > code to be worth it. Maybe someone else feels differently. I made another pass over this and ended up removing the doHalfRound field in favour of just doing rounding based on the previous bitshifts. I did a few other tidy ups and I think it's a useful patch as it reduces the amount of code a bit and makes it dead simple to add new units in the future. Most importantly it'll help keep pg_size_pretty, pg_size_pretty_numeric and pg_size_bytes all in sync in regards to what units they support. Does anyone want to have a look over this? If not, I plan to push it in the next day or so. (I'm not sure why pgindent removed the space between != and NULL, but it did, so I left it.) David v1-0001-Use-lookup-table-for-units-in-pg_size_pretty-and-.patch Description: Binary data
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 30 Jun 2021 at 05:11, David Christensen wrote: > 1) A basic refactor of the existing code to easily handle expanding the > units we use into a table-based format. This also includes changing the > return value of `pg_size_bytes()` from an int64 into a numeric, and > minor test adjustments to reflect this. This is not quite what I had imagined when you said about adding a table to make it easier to add new units in the future. I expected a single table that handles all units, not just the ones above kB and not one for each function. There are actually two pg_size_pretty functions, one for BIGINT and one for NUMERIC. I see you only changed the NUMERIC version. I'd expect them both to have the same treatment and use the same table so there's consistency between the two functions. The attached is more like what I had in mind. There's a very small net reduction in lines of code with this and it also helps keep pg_size_pretty() and pg_size_pretty_numeric() in sync. I don't really like the fact that I had to add the doHalfRound field to get the same rounding behaviour as the original functions. I'm wondering if it would just be too clever just to track how many bits we've shifted right by in pg_size_pretty* and compare that to the value of multishift for the current unit and do appropriate rounding to get us to the value of multishift. In theory, we could just keep calling the half_rounded macro until we make it to the multishift value. My current thoughts are that it's unlikely that anyone would twiddle with the size_pretty_units array in such a way for the extra code to be worth it. Maybe someone else feels differently. David tidy_up_pg_size_pretty_functions.patch Description: Binary data
Re: [PATCH] expand the units that pg_size_pretty supports on output
shinya11.k...@nttdata.com writes: >>I had not really looked at the patch, but if there's a cleanup portion to the >>same >>patch as you're adding the YB too, then maybe it's worth separating those out >>into another patch so that the two can be considered independently. > > I agree with this opinion. It seems to me that we should think about units > and refactoring separately. > Sorry for the confusion. > > Best regards, > Shinya Kato Hi folks, Had some time to rework this patch from the two that had previously been here into two separate parts: 1) A basic refactor of the existing code to easily handle expanding the units we use into a table-based format. This also includes changing the return value of `pg_size_bytes()` from an int64 into a numeric, and minor test adjustments to reflect this. 2) Expanding the units that both pg_size_bytes() and pg_size_pretty() recognize up through Yottabytes. This includes documentation and test updates to reflect the changes made here. How many additional units we add here is up for discussion (inevitably), but my opinion remains that there is no harm in supporting all units available. Best, David >From ac30b06e3ddcb57eebb380560c2f4a47430dfd74 Mon Sep 17 00:00:00 2001 From: David Christensen Date: Tue, 29 Jun 2021 10:20:05 -0500 Subject: [PATCH 1/2] Refactor pg_size_pretty and pg_size_bytes to allow for supported unit expansion --- src/backend/utils/adt/dbsize.c | 90 src/include/catalog/pg_proc.dat | 2 +- src/test/regress/expected/dbsize.out | 4 -- src/test/regress/sql/dbsize.sql | 2 - 4 files changed, 53 insertions(+), 45 deletions(-) diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index 9c39e7d3b3..df08845932 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -638,7 +638,7 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS) Numeric size = PG_GETARG_NUMERIC(0); Numeric limit, limit2; - char *result; + char *result = NULL; limit = int64_to_numeric(10 * 1024); limit2 = int64_to_numeric(10 * 1024 * 2 - 1); @@ -660,31 +660,32 @@ pg_size_pretty_numeric(PG_FUNCTION_ARGS) } else { - /* size >>= 10 */ - size = numeric_shift_right(size, 10); - if (numeric_is_less(numeric_absolute(size), limit2)) - { -size = numeric_half_rounded(size); -result = psprintf("%s MB", numeric_to_cstring(size)); - } - else - { + int idx, max_iter = 2; /* highest index of table_below */ + char *output_formats[] = { +"%s MB", +"%s GB", +"%s TB" + }; + + for (idx = 0; idx < max_iter; idx++) { /* size >>= 10 */ size = numeric_shift_right(size, 10); - if (numeric_is_less(numeric_absolute(size), limit2)) { size = numeric_half_rounded(size); - result = psprintf("%s GB", numeric_to_cstring(size)); -} -else -{ - /* size >>= 10 */ - size = numeric_shift_right(size, 10); - size = numeric_half_rounded(size); - result = psprintf("%s TB", numeric_to_cstring(size)); + result = psprintf(output_formats[idx], numeric_to_cstring(size)); + break; } } + + if (!result) { +/* this uses the last format in the table above for anything else */ + +/* size >>= 10 */ +size = numeric_shift_right(size, 10); +size = numeric_half_rounded(size); +result = psprintf(output_formats[max_iter], numeric_to_cstring(size)); + } } } @@ -703,7 +704,6 @@ pg_size_bytes(PG_FUNCTION_ARGS) *endptr; char saved_char; Numeric num; - int64 result; bool have_digits = false; str = text_to_cstring(arg); @@ -786,7 +786,16 @@ pg_size_bytes(PG_FUNCTION_ARGS) /* Handle possible unit */ if (*strptr != '\0') { - int64 multiplier = 0; + int64 multiplier = 1; + int i; + int unit_count = 5; /* sizeof units table */ + char *units[] = { + "bytes", + "kb", + "mb", + "gb", + "tb", + }; /* Trim any trailing whitespace */ endptr = str + VARSIZE_ANY_EXHDR(arg) - 1; @@ -797,21 +806,20 @@ pg_size_bytes(PG_FUNCTION_ARGS) endptr++; *endptr = '\0'; - /* Parse the unit case-insensitively */ - if (pg_strcasecmp(strptr, "bytes") == 0) - multiplier = (int64) 1; - else if (pg_strcasecmp(strptr, "kb") == 0) - multiplier = (int64) 1024; - else if (pg_strcasecmp(strptr, "mb") == 0) - multiplier = ((int64) 1024) * 1024; - - else if (pg_strcasecmp(strptr, "gb") == 0) - multiplier = ((int64) 1024) * 1024 * 1024; - - else if (pg_strcasecmp(strptr, "tb") == 0) - multiplier = ((int64) 1024) * 1024 * 1024 * 1024; + for (i = 0; i < unit_count; i++) { + printf("strptr: %s units: %s", strptr, units[i]); + if (pg_strcasecmp(strptr, units[i]) == 0) +break; + /* + * Note: int64 isn't large enough to store the full multiplier + * going past ~ 9EB, but since this is a fixed value, we can apply + * it twice, thus storing use 2 ** 5 = 32 here, but 2 ** 10 = 1024 + * on actual
Re: [PATCH] expand the units that pg_size_pretty supports on output
>> I had not really looked at the patch, but if there's a cleanup portion to >> the same >> patch as you're adding the YB too, then maybe it's worth separating those out >> into another patch so that the two can be considered independently. > > I agree with this opinion. It seems to me that we should think about units > and refactoring separately. > Sorry for the confusion. Sure thing, I think that makes sense. Refactor with existing units and debate the number of additions units to include. I do think Petabytes and Exabytes are at least within the realm of ones we should include; less tied to ZB and YB; just included for completeness.
RE: [PATCH] expand the units that pg_size_pretty supports on output
>I had not really looked at the patch, but if there's a cleanup portion to the >same >patch as you're adding the YB too, then maybe it's worth separating those out >into another patch so that the two can be considered independently. I agree with this opinion. It seems to me that we should think about units and refactoring separately. Sorry for the confusion. Best regards, Shinya Kato
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, 16 Jun 2021 at 02:58, David Christensen wrote: > That said, it seems like having the code structured in a way that we can > expand via adding an element to a table instead of the existing way it's > written with nested if blocks is still a useful refactor, whatever we decide > the cutoff units should be. I had not really looked at the patch, but if there's a cleanup portion to the same patch as you're adding the YB too, then maybe it's worth separating those out into another patch so that the two can be considered independently. David
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, Jun 15, 2021 at 8:26 AM David Rowley wrote: > On Tue, 15 Jun 2021 at 21:24, wrote: > > Hmmm, I didn't think YB was necessary, but what do others think? > > For me personally, without consulting Wikipedia, I know that Petabyte > comes after Terabyte and then I'm pretty sure it's Exabyte. After > that, I'd need to check. > > Assuming I'm not the only person who can't tell exactly how many bytes > are in a Yottabyte, would it actually be a readability improvement if > we started showing these units to people? > I hadn't really thought about that TBH; to me it seemed like an improvement, but I do see that others might not, and adding confusion is definitely not helpful. That said, it seems like having the code structured in a way that we can expand via adding an element to a table instead of the existing way it's written with nested if blocks is still a useful refactor, whatever we decide the cutoff units should be. > I'd say there might be some argument to implement as far as PB one > day, maybe not that far out into the future, especially if we got > something like built-in clustering. But I just don't think there's any > need to go all out and take it all the way to YB. There's an above > zero chance we'll break something of someones by doing this, so I > think any changes here should be driven off an actual requirement. > I got motivated to do this due to some (granted synthetic) work/workloads, where I was seeing 6+digit TB numbers and thought it was ugly. Looked at the code and thought the refactor was the way to go, and just stuck all of the known units in. > I really think this change is more likely to upset someone than please > someone. > I'd be interested to see reactions from people; to me, it seems a +1, but seems like -1, 0, +1 all valid opinions here; I'd expect more 0's and +1s, but I'm probably biased since I wrote this. :-)
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, Jun 15, 2021 at 8:31 AM Isaac Morland wrote: > On Tue, 15 Jun 2021 at 05:24, wrote: > >> >> I don't see the need to extend the unit to YB. >> >> What use case do you have in mind? >> > >> >Practical or no, I saw no reason not to support all defined units. I >> assume we’ll >> >get to a need sooner or later. :) >> >> Thank you for your reply! >> Hmmm, I didn't think YB was necessary, but what do others think? >> > > If I’m reading the code correctly, the difference between supporting YB > and not supporting it is whether there is a line for it in the list of > prefixes and their multiples. As such, I don’t see why we’re even > discussing whether or not to include all the standard prefixes. A YB is > still an absurd amount of storage, but that’s not the point; just put all > the standard prefixes and be done with it. If actual code changes were > required in the new code as they are in the old it might be worth > discussing. > Agreed, this is why I went this way. One and done. > One question: why is there no “k” in the list of prefixes? > kB has a special-case code block before you get to this point. I didn't look into the reasons, but assume there are some. > Also: why not have only the prefixes in the array, and use a single fixed > output format "%s %sB" all the time? > > It feels like it should be possible to calculate the appropriate idx to > use (while adjusting the number to print as is done now) and then just have > one psprintf call for all cases. > Sure, if that seems more readable/understandable. > A more significant question is YB vs. YiB. I know there is a long > tradition within computer-related fields of saying that k = 1024, M = > 1024^2, etc., but we’re not special enough to override the more general > principles of SI (Système International) which provide that k = 1000, M = > 1000^2 and so on universally and provide the alternate prefixes ki, Mi, > etc. which use 1024 as the multiple. > > So I would suggest either display 200 as 2MB or as 1.907MiB. > Heh, I was just expanding the existing logic; if others want to have this particular battle go ahead and I'll adjust the code/prefixes, but obviously the logic will need to change if we want to support true MB instead of MiB as MB. Also, this will presumably be a breaking change for anyone using the existing units MB == 1024 * 1024, as we've had for something like 20 years. Changing these units to the *iB will be trivial with this patch, but not looking forward to garnering the consensus to change this part. David
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 15 Jun 2021 at 05:24, wrote: > >> I don't see the need to extend the unit to YB. > >> What use case do you have in mind? > > > >Practical or no, I saw no reason not to support all defined units. I > assume we’ll > >get to a need sooner or later. :) > > Thank you for your reply! > Hmmm, I didn't think YB was necessary, but what do others think? > If I’m reading the code correctly, the difference between supporting YB and not supporting it is whether there is a line for it in the list of prefixes and their multiples. As such, I don’t see why we’re even discussing whether or not to include all the standard prefixes. A YB is still an absurd amount of storage, but that’s not the point; just put all the standard prefixes and be done with it. If actual code changes were required in the new code as they are in the old it might be worth discussing. One question: why is there no “k” in the list of prefixes? Also: why not have only the prefixes in the array, and use a single fixed output format "%s %sB" all the time? It feels like it should be possible to calculate the appropriate idx to use (while adjusting the number to print as is done now) and then just have one psprintf call for all cases. A more significant question is YB vs. YiB. I know there is a long tradition within computer-related fields of saying that k = 1024, M = 1024^2, etc., but we’re not special enough to override the more general principles of SI (Système International) which provide that k = 1000, M = 1000^2 and so on universally and provide the alternate prefixes ki, Mi, etc. which use 1024 as the multiple. So I would suggest either display 200 as 2MB or as 1.907MiB.
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Tue, 15 Jun 2021 at 21:24, wrote: > Hmmm, I didn't think YB was necessary, but what do others think? For me personally, without consulting Wikipedia, I know that Petabyte comes after Terabyte and then I'm pretty sure it's Exabyte. After that, I'd need to check. Assuming I'm not the only person who can't tell exactly how many bytes are in a Yottabyte, would it actually be a readability improvement if we started showing these units to people? I'd say there might be some argument to implement as far as PB one day, maybe not that far out into the future, especially if we got something like built-in clustering. But I just don't think there's any need to go all out and take it all the way to YB. There's an above zero chance we'll break something of someones by doing this, so I think any changes here should be driven off an actual requirement. I really think this change is more likely to upset someone than please someone. Just my thoughts. David
RE: [PATCH] expand the units that pg_size_pretty supports on output
>> I don't see the need to extend the unit to YB. >> What use case do you have in mind? > >Practical or no, I saw no reason not to support all defined units. I assume >we’ll >get to a need sooner or later. :) Thank you for your reply! Hmmm, I didn't think YB was necessary, but what do others think? Best regards, Shinya Kato
Re: [PATCH] expand the units that pg_size_pretty supports on output
> I don't see the need to extend the unit to YB. > What use case do you have in mind? Practical or no, I saw no reason not to support all defined units. I assume we’ll get to a need sooner or later. :) David
RE: [PATCH] expand the units that pg_size_pretty supports on output
>From: David Christensen >Sent: Friday, June 4, 2021 4:18 AM >To: PostgreSQL-development >Subject: Re: [PATCH] expand the units that pg_size_pretty supports on output > >New versions attached to address the initial CF feedback and rebase on HEAD as >of now. > >0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch > >- expands the units that pg_size_pretty() can handle up to YB. > >0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch > >- expands the units that pg_size_bytes() can handle, up to YB. > I don't see the need to extend the unit to YB. What use case do you have in mind? Regards, Shinya Kato
Re: [PATCH] expand the units that pg_size_pretty supports on output
New versions attached to address the initial CF feedback and rebase on HEAD as of now. 0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch - expands the units that pg_size_pretty() can handle up to YB. 0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch - expands the units that pg_size_bytes() can handle, up to YB. 0002-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch Description: Binary data 0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch Description: Binary data
Re: [PATCH] expand the units that pg_size_pretty supports on output
The following review has been posted through the commitfest application: make installcheck-world: not tested Implements feature: not tested Spec compliant: not tested Documentation:not tested Hi David, I was reviewing this patch and the compilation failed with following error on CentOS 7. dbsize.c: In function ‘pg_size_bytes’: dbsize.c:808:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] const int unit_count = 9; /* sizeof units table */ ^ dbsize.c:809:3: error: variable length array ‘units’ is used [-Werror=vla] const char *units[unit_count] = { ^ I believe "unit_count" ought to be a #define here. Regards, Asif The new status of this patch is: Waiting on Author
Re: [PATCH] expand the units that pg_size_pretty supports on output
A second patch to teach the same units to pg_size_bytes(). Best, David On Wed, Apr 14, 2021 at 11:13 AM David Christensen < david.christen...@crunchydata.com> wrote: > Hi folks, > > Enclosed is a patch that expands the unit output for > pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing > numeric output code to account for the larger number of units we're using > rather than just adding nesting levels. > > There are also a few other places that could benefit from expanded units, > but this is a standalone starting point. > > Best, > > David > 0001-Expand-the-supported-units-in-pg_size_bytes-to-cover.patch Description: Binary data
Re: [PATCH] expand the units that pg_size_pretty supports on output
On Wed, Apr 14, 2021 at 11:13:47AM -0500, David Christensen wrote: > Enclosed is a patch that expands the unit output for > pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing > numeric output code to account for the larger number of units we're using > rather than just adding nesting levels. > > There are also a few other places that could benefit from expanded units, > but this is a standalone starting point. Please don't forget to add this patch to the next commit fest of July if you want to get some reviews: https://commitfest.postgresql.org/33/ Note that the development of Postgres 14 is over, and that there was a feature freeze last week, but this can be considered for 15. -- Michael signature.asc Description: PGP signature
[PATCH] expand the units that pg_size_pretty supports on output
Hi folks, Enclosed is a patch that expands the unit output for pg_size_pretty(numeric) going up to Yottabytes; I reworked the existing numeric output code to account for the larger number of units we're using rather than just adding nesting levels. There are also a few other places that could benefit from expanded units, but this is a standalone starting point. Best, David 0001-Expand-the-units-that-pg_size_pretty-numeric-knows-a.patch Description: Binary data