Hi 2016-01-26 6:25 GMT+01:00 Vitaly Burovoy <vitaly.buro...@gmail.com>:
> Hello! > > > Regression tests are present (see p.II). > > pg_proc.h has changed, so the CATALOG_VERSION_NO must be also changed. > this is not a part of patch - only a commiter knows CATALOG_VERSION_NO > > Code comments, error message string, DESCR in pg_proc.h and doc > changes need proof-reading by a native English speaker, which I am > not. > > > === > The last patch fixes only part of my last notes, so I can only return > it to the next round with the same notes. > > The list of my notes has not been done (all points listed below was > mentioned in my last review[1]): > > 1. The documentation has duplicate parts of size units (in table and > in the text). I think that one of them should be deleted (from the > table). > fixed > > 2. Test has a note about sharing(using) size units via(from) > memory_unit_conversion_table and a requirement to update the > documentation when new units are supported, but the note will not be > seen because there is no tests for that fail for units ("PB", "EB", > "ZB") which will change in the future by the > "memory_unit_conversion_table". > fixed > 3. Code design is looked a little odd for me. > > 3a. There is an attempt to check a correctness of a numeric value > instead of using numeric_in which does the same thing. In the last > case the code looks more compact and nice (see the full explanation in > the previous review[1] and both PoC there). > > Possibly that way is chosen to emit its own error messages ("invalid > size:" instead of "invalid input syntax for type numeric:"), but it > still leads errors like 'invalid unit: "+ kB"' for input value "+912+ > kB" (see "dbsize.out"). > In tried to enhanced this point (I don't agree, so this is a bug) > > 3b. There is an extra copying from the buffer "str" to the "buffer" > (it can be avoided and shown in an PoC "single_buffer.c"). > > 3c. There is freeing buffers "str" and "buffer" that looks as a > counter-design since MemoryContext frees them at once with other > expired buffers (like "num" from numeric_in and numeric_mul; also > "mul_num" from int8_numeric) > Numeric operations are on critical path for OLAP applications - so the code is rewritten to minimize palloc/free operations. I don't expect usage pg_size_bytes in OLAP critical path. > > > === > While I was writing that text I found five additional notes. Both > parts (above and below) are important (see concusion). > > I. The new version of the patch (pg-size-bytes-10.patch) has the > simplest typo in an updated (from v9) row "inavalid size". > fixed > > II. There is no test for an empty input value, but it works as > expected (an error 'invalid size: ""'). > fixed > > III. There is no support of 'bytes' unit, it seems such behavior got > majority approval[2]. > No, because I have to use the supported units by configuration. The configuration supports only three chars long units. Support for "bytes" was removed, when I removed proprietary unit table. > > IV. Code design of the function "parse_memory_unit" is returning > "bool" value and set "multiplier" via pointer. But while Tom Lane was > reviewed my patch[3] in the similar case he changed code that now > returns (see NonFiniteTimestampTzPart) not a flag, but a value (where > zero still means an error). Be honest it looks better, but the current > code is written like nearby functions. > It is a commiter decision. This is a implementation detail, and I don't see any variant as significant better - using 0 as error flag isn't the best too. But this is really detail. > > V. The documentation lacks a note that the base of the "pg_size_bytes" > is 1024 whereas the base of the "pg_size_pretty" is 1000. > It isn't true, base for both are 1024. It was designed when special table was used for pg_size_bytes. But when we share same control table, then is wrong to use different table. The result can be optically different, but semantically same. postgres=# select pg_size_pretty(pg_size_bytes('1MB')); ┌────────────────┐ │ pg_size_pretty │ ╞════════════════╡ │ 1024 kB │ └────────────────┘ (1 row) Time: 0.787 ms postgres=# select pg_size_pretty(pg_size_bytes('1024MB')); ┌────────────────┐ │ pg_size_pretty │ ╞════════════════╡ │ 1024 MB │ └────────────────┘ (1 row) > > === > Concusion: > > I *can't* mark this patch as "Ready for Committer" with all previous > notes, and I ask experienced hackers for a help. > I'm not a strong Postgres hacker yet and sending to the next round > with the same notes seems a nitpicking for me. > > So questions to the hackers: > > a. Are the rest of the notes from the previous review[1] (and listed > in pp.1-3c) a nitpicking or should be done? > > b. Does the function in PoC "single_buffer.c" look more complex or easer? > > c. Is it worth for p.II to change error message like 'Empty input > value is not a valid size for pg_size_bytes: "%s"'? > It is simply to implement, but It is looking like overengineering > > d. Should the function "pg_size_bytes" handle "bytes" as a size unit > (see p.III)? > There was no disagreement, but if I saw that thread before I became a > reviewer and saw silent approval I would have argued. Moreover > supporting of negative sizes was explained[4] by a symmetry for the > "pg_size_pretty", which should include all size units produced by the > "pg_size_pretty". On the other hand those functions can't be symmetric > due to difference in bases (see p.V and [5]) > negative values is fully supported. support of "bytes" depends on support "bytes" unit by GUC. When "bytes" unit will be supported, then it can be used in pg_size_bytes immediately. updated patch attached Regards Pavel > > e. What way is preferred for p.IV? > > f. If p.V is actual, how to write it in the documentation (be honest > I've no idea)? Should a description of the "pg_size_pretty" be changed > too? > > > [1] > http://www.postgresql.org/message-id/CAKOSWNn=p8gx-p5y-b4t4dg-aahatup+crg41hq9beobzwx...@mail.gmail.com > [2] > http://www.postgresql.org/message-id/cacaco5qw7ffsffhkstjtycp4qf3oh9za14sc6z3dxx2yssj...@mail.gmail.com > [3] > http://git.postgresql.org/pg/commitdiff/647d87c56ab6da70adb753c08d7cdf7ee905ea8a > [4] > http://www.postgresql.org/message-id/ca+tgmozfomg4eyorzzgf7pzotg9pxpuhtqvxlfskim4izh8...@mail.gmail.com > [5] > http://www.postgresql.org/message-id/ca+tgmoygs_xixuqy+xyw9futtmrtbc_s0c1c0_wobwvbnf8...@mail.gmail.com >
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml new file mode 100644 index 9c143b2..c170b78 *** a/doc/src/sgml/func.sgml --- b/doc/src/sgml/func.sgml *************** postgres=# SELECT * FROM pg_xlogfile_nam *** 17724,17729 **** --- 17724,17732 ---- <primary>pg_relation_size</primary> </indexterm> <indexterm> + <primary>pg_size_bytes</primary> + </indexterm> + <indexterm> <primary>pg_size_pretty</primary> </indexterm> <indexterm> *************** postgres=# SELECT * FROM pg_xlogfile_nam *** 17795,17800 **** --- 17798,17812 ---- </row> <row> <entry> + <literal><function>pg_size_bytes(<type>text</type>)</function></literal> + </entry> + <entry><type>bigint</type></entry> + <entry> + Converts a size in human-readable format with size units into bytes. + </entry> + </row> + <row> + <entry> <literal><function>pg_size_pretty(<type>bigint</type>)</function></literal> </entry> <entry><type>text</type></entry> *************** postgres=# SELECT * FROM pg_xlogfile_nam *** 17930,17935 **** --- 17942,17955 ---- </para> <para> + <function>pg_size_bytes</> can be used to get a size in bytes as + bigint from a human-readable format for a comparison purposes (it is + the opposite to <function>pg_size_pretty</> function). The format is + numeric with an optional size unit: kB, MB, GB or TB. The unit string + is case insensitive. + </para> + + <para> The functions above that operate on tables or indexes accept a <type>regclass</> argument, which is simply the OID of the table or index in the <structname>pg_class</> system catalog. You do not have to look up diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c new file mode 100644 index 2084692..da8db86 *** a/src/backend/utils/adt/dbsize.c --- b/src/backend/utils/adt/dbsize.c *************** *** 25,30 **** --- 25,31 ---- #include "storage/fd.h" #include "utils/acl.h" #include "utils/builtins.h" + #include "utils/guc.h" #include "utils/numeric.h" #include "utils/rel.h" #include "utils/relfilenodemap.h" *************** pg_size_pretty_numeric(PG_FUNCTION_ARGS) *** 700,705 **** --- 701,855 ---- } /* + * Convert human readable size to long int. + * + * Due to support for decimal values and case insensitivity of units + * a function parse_int cannot be used. + */ + Datum + pg_size_bytes(PG_FUNCTION_ARGS) + { + text *arg = PG_GETARG_TEXT_PP(0); + char *str, + *strptr; + char *buffer, + *bufptr; + Numeric num; + int64 result; + bool found_digit = false; + bool found_point = false; + + /* The content of str should be immutable, it is used in error messages. */ + str = text_to_cstring(arg); + strptr = str; + + /* prepare buffer for parts of parsed input string */ + buffer = (char *) palloc(strlen(str) + 1); + bufptr = buffer; + + /* Skip leading spaces */ + while (isspace((unsigned char) *strptr)) + strptr++; + + switch (*strptr) + { + case '+': + case '-': + *bufptr++ = *strptr++; + break; + } + + while(*strptr != '\0') + { + if (isdigit((unsigned char) *strptr)) + { + *bufptr++ = *strptr++; + found_digit = true; + } + else if (*strptr == '.') + { + if (found_point) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid size: \"%s\"", str))); + + *bufptr++ = *strptr++; + found_point = true; + } + else + /* any non digit char, the unit is starting */ + break; + } + + if (!found_digit) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid size: \"%s\"", str))); + + *bufptr = '\0'; + + /* don't allow empty string */ + if (*buffer == '\0') + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid size: \"%s\"", str))); + + num = DatumGetNumeric(DirectFunctionCall3(numeric_in, + CStringGetDatum(buffer), 0, -1)); + + /* allow whitespace between number and unit */ + while (isspace((unsigned char) *strptr)) + strptr++; + + /* Handle possible unit */ + if (*strptr != '\0') + { + int multiplier; + Numeric mul_num; + const char *hintmsg; + const char *unitstr; + + bufptr = buffer; + + /* + * unitstr holds complete string related to unit part. Used + * as part of error message. The parse_memory_unit is called + * with this value, when we detected invalid format, and we + * would to emit error result to get hintstr. Otherwise + * trimmed value is used. + */ + unitstr = strptr; + + /* copy chars to buffer and stop on first space or end string */ + while (*strptr != '\0' && !isspace((unsigned char) *strptr)) + { + /* only alphabetic character are allowed */ + if (!isalpha((unsigned char) *strptr)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid size: \"%s\"", str))); + + *bufptr++ = *strptr++; + } + + *bufptr = '\0'; + + /* We allow trailing spaces. Skip all spaces. */ + while (isspace((unsigned char) *strptr)) + strptr++; + + /* Use buffer as unitstr, when we didn't find more words. */ + if (*strptr == '\0') + unitstr = buffer; + + /* Still, the unit can be invalid: too long, unknown */ + if (!parse_memory_unit(unitstr, &multiplier, &hintmsg)) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("invalid unit: \"%s\"", unitstr), + errhint("%s", _(hintmsg)))); + + /* + * Now, the multiplier is in KB units. It should be multiplied by 1024 + * before usage. + */ + mul_num = DatumGetNumeric(DirectFunctionCall1(int8_numeric, + Int64GetDatum(multiplier * 1024L))); + + num = DatumGetNumeric(DirectFunctionCall2(numeric_mul, + NumericGetDatum(mul_num), + NumericGetDatum(num))); + } + + result = DatumGetInt64(DirectFunctionCall1(numeric_int8, NumericGetDatum(num))); + + pfree(buffer); + pfree(str); + + PG_RETURN_INT64(result); + } + + /* * Get the filenode of a relation * * This is expected to be used in queries like diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c new file mode 100644 index 38ba82f..4020df7 *** a/src/backend/utils/misc/guc.c --- b/src/backend/utils/misc/guc.c *************** convert_from_base_unit(int64 base_value, *** 5238,5243 **** --- 5238,5272 ---- /* + * Parse value as some known memory unit to their size in bytes. + * Used in pg_size_bytes function. Against convert_to_base_unit, a string + * comparation is case insensitive. + */ + bool + parse_memory_unit(const char *unit, int *multiplier, + const char **hintmsg) + { + int i; + + for (i = 0; *memory_unit_conversion_table[i].unit; i++) + { + const unit_conversion *conv = &memory_unit_conversion_table[i]; + + if (conv->base_unit == GUC_UNIT_KB && + strcasecmp(unit, conv->unit) == 0) + { + *multiplier = conv->multiplier; + return true; + } + } + + *hintmsg = memory_units_hint; + + return false; + } + + + /* * Try to parse value as an integer. The accepted formats are the * usual decimal, octal, or hexadecimal formats, optionally followed by * a unit name if "flags" indicates a unit is allowed. diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h new file mode 100644 index 79e92ff..5921bac *** a/src/include/catalog/pg_proc.h --- b/src/include/catalog/pg_proc.h *************** DATA(insert OID = 2288 ( pg_size_pretty *** 3587,3592 **** --- 3587,3594 ---- DESCR("convert a long int to a human readable text using size units"); DATA(insert OID = 3166 ( pg_size_pretty PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 25 "1700" _null_ _null_ _null_ _null_ _null_ pg_size_pretty_numeric _null_ _null_ _null_ )); DESCR("convert a numeric to a human readable text using size units"); + DATA(insert OID = 3331 ( pg_size_bytes PGNSP PGUID 12 1 0 0 0 f f f f t f i s 1 0 20 "25" _null_ _null_ _null_ _null_ _null_ pg_size_bytes _null_ _null_ _null_ )); + DESCR("convert a size in human-readable format with size units into bytes"); DATA(insert OID = 2997 ( pg_table_size PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 20 "2205" _null_ _null_ _null_ _null_ _null_ pg_table_size _null_ _null_ _null_ )); DESCR("disk space usage for the specified table, including TOAST, free space and visibility map"); DATA(insert OID = 2998 ( pg_indexes_size PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 20 "2205" _null_ _null_ _null_ _null_ _null_ pg_indexes_size _null_ _null_ _null_ )); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h new file mode 100644 index c2e529f..bb21615 *** a/src/include/utils/builtins.h --- b/src/include/utils/builtins.h *************** extern Datum pg_relation_size(PG_FUNCTIO *** 470,475 **** --- 470,476 ---- extern Datum pg_total_relation_size(PG_FUNCTION_ARGS); extern Datum pg_size_pretty(PG_FUNCTION_ARGS); extern Datum pg_size_pretty_numeric(PG_FUNCTION_ARGS); + extern Datum pg_size_bytes(PG_FUNCTION_ARGS); extern Datum pg_table_size(PG_FUNCTION_ARGS); extern Datum pg_indexes_size(PG_FUNCTION_ARGS); extern Datum pg_relation_filenode(PG_FUNCTION_ARGS); diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h new file mode 100644 index e1de1a5..3bfe0f4 *** a/src/include/utils/guc.h --- b/src/include/utils/guc.h *************** extern int NewGUCNestLevel(void); *** 357,362 **** --- 357,364 ---- extern void AtEOXact_GUC(bool isCommit, int nestLevel); extern void BeginReportingGUCOptions(void); extern void ParseLongOption(const char *string, char **name, char **value); + extern bool parse_memory_unit(const char *unit, int *multiplier, + const char **hintmsg); extern bool parse_int(const char *value, int *result, int flags, const char **hintmsg); extern bool parse_real(const char *value, double *result); diff --git a/src/test/regress/expected/dbsize.out b/src/test/regress/expected/dbsize.out new file mode 100644 index aa513e7..e44a67c *** a/src/test/regress/expected/dbsize.out --- b/src/test/regress/expected/dbsize.out *************** *** 1,3 **** --- 1,6 ---- + -- These functions shares memory_unit_conversion_table. + -- Currently only max TB unit is supported. When you increase + -- supported unit, update related documentation too. SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM (VALUES (10::bigint), (1000::bigint), (1000000::bigint), (1000000000::bigint), (1000000000000::bigint), *************** SELECT size, pg_size_pretty(size), pg_si *** 35,37 **** --- 38,106 ---- 1000000000000000.5 | 909 TB | -909 TB (12 rows) + SELECT pg_size_bytes(size) FROM + (VALUES('1'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '), + ('1TB'), ('3000 TB')) x(size); + pg_size_bytes + ------------------ + 1 + 1024 + 1048576 + 1073741824 + 1610612736 + 1099511627776 + 3298534883328000 + (7 rows) + + -- case insensitive units are supported + SELECT pg_size_bytes(size) FROM + (VALUES('1'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '), + ('1tb')) x(size); + pg_size_bytes + --------------- + 1 + 1024 + 1048576 + 1073741824 + 1610612736 + 1099511627776 + (6 rows) + + -- negative numbers are supported + SELECT pg_size_bytes(size) FROM + (VALUES('-1'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '), + ('-1tb')) x(size); + pg_size_bytes + ---------------- + -1 + -1024 + -1048576 + -1073741824 + -1610612736 + -1099511627776 + (6 rows) + + --should fail + SELECT pg_size_bytes('1 AB'); + ERROR: invalid unit: "AB" + HINT: Valid units for this parameter are "kB", "MB", "GB", and "TB". + SELECT pg_size_bytes('1 AB A'); + ERROR: invalid unit: "AB A" + HINT: Valid units for this parameter are "kB", "MB", "GB", and "TB". + select pg_size_bytes('9223372036854775807.9'); + ERROR: bigint out of range + select pg_size_bytes('1024 bytes'); + ERROR: invalid unit: "bytes" + HINT: Valid units for this parameter are "kB", "MB", "GB", and "TB". + select pg_size_bytes(''); + ERROR: invalid size: "" + select pg_size_bytes('.+912'); + ERROR: invalid size: ".+912" + select pg_size_bytes('+912+ kB'); + ERROR: invalid size: "+912+ kB" + select pg_size_bytes('++123 kB'); + ERROR: invalid size: "++123 kB" + --only TB and less are supported + SELECT pg_size_bytes('1PB'); + ERROR: invalid unit: "PB" + HINT: Valid units for this parameter are "kB", "MB", "GB", and "TB". diff --git a/src/test/regress/sql/dbsize.sql b/src/test/regress/sql/dbsize.sql new file mode 100644 index c118090..5e47302 *** a/src/test/regress/sql/dbsize.sql --- b/src/test/regress/sql/dbsize.sql *************** *** 1,3 **** --- 1,7 ---- + -- These functions shares memory_unit_conversion_table. + -- Currently only max TB unit is supported. When you increase + -- supported unit, update related documentation too. + SELECT size, pg_size_pretty(size), pg_size_pretty(-1 * size) FROM (VALUES (10::bigint), (1000::bigint), (1000000::bigint), (1000000000::bigint), (1000000000000::bigint), *************** SELECT size, pg_size_pretty(size), pg_si *** 10,12 **** --- 14,44 ---- (10.5::numeric), (1000.5::numeric), (1000000.5::numeric), (1000000000.5::numeric), (1000000000000.5::numeric), (1000000000000000.5::numeric)) x(size); + + SELECT pg_size_bytes(size) FROM + (VALUES('1'), ('1kB'), ('1MB'), (' 1 GB'), ('1.5 GB '), + ('1TB'), ('3000 TB')) x(size); + + -- case insensitive units are supported + SELECT pg_size_bytes(size) FROM + (VALUES('1'), ('1kb'), ('1mb'), (' 1 Gb'), ('1.5 gB '), + ('1tb')) x(size); + + -- negative numbers are supported + SELECT pg_size_bytes(size) FROM + (VALUES('-1'), ('-1kb'), ('-1mb'), (' -1 Gb'), ('-1.5 gB '), + ('-1tb')) x(size); + + --should fail + SELECT pg_size_bytes('1 AB'); + SELECT pg_size_bytes('1 AB A'); + select pg_size_bytes('9223372036854775807.9'); + select pg_size_bytes('1024 bytes'); + select pg_size_bytes(''); + + select pg_size_bytes('.+912'); + select pg_size_bytes('+912+ kB'); + select pg_size_bytes('++123 kB'); + + --only TB and less are supported + SELECT pg_size_bytes('1PB');
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers