Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code
On Wed, Jan 11, 2017 at 4:50 PM, Michael Paquierwrote: > Looking at this patch, I am not sure that it is worth worrying about. > This is a receipt to make back-patching a bit more complicated, and it > makes the code more complicated to understand. So I would vote for > rejecting it and move on. I have done so for now to make the CF move, if somebody wants to complain feel free... -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code
On Fri, Dec 9, 2016 at 3:23 AM, Aleksander Alekseevwrote: >> You could just change it to >> if (str[strspn(str, " \t\n\r\f")] == '\0') >> to mitigate calling strlen. It's safe to do so because strspn will >> only return values from 0 to strlen(str). > >> [...] I have serious doubts that the "optimized" implementation >> you propose is actually faster than a naive one; it may be slower, and >> it's certainly longer and harder to understand/test. > > I would like to point out that I never said it's optimized. However I > like the code Geoff proposed. It definitely doesn't make anything worse. > I decided to keep pg_str_contansonly procedure (i.e. not to inline this > code) for now. Code with strspn looks OK in a simple example. However in > a concrete context it looks like a bad Perl code in ROT13 to me: Looking at this patch, I am not sure that it is worth worrying about. This is a receipt to make back-patching a bit more complicated, and it makes the code more complicated to understand. So I would vote for rejecting it and move on. By the way, as you are placing this routine in src/common/, you may want to consider updating the code in src/bin/ that use libpqcommon. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code
Tom, Geoff, Thanks for your feedback! Here is version 2 of this patch. > You could just change it to > if (str[strspn(str, " \t\n\r\f")] == '\0') > to mitigate calling strlen. It's safe to do so because strspn will > only return values from 0 to strlen(str). > [...] I have serious doubts that the "optimized" implementation > you propose is actually faster than a naive one; it may be slower, and > it's certainly longer and harder to understand/test. I would like to point out that I never said it's optimized. However I like the code Geoff proposed. It definitely doesn't make anything worse. I decided to keep pg_str_contansonly procedure (i.e. not to inline this code) for now. Code with strspn looks OK in a simple example. However in a concrete context it looks like a bad Perl code in ROT13 to me: ``` /* try to figure out what's exactly going on */ if(somelongname[strspn(somelongname /* yes, again */, "longlistofchars")] != '\0') ``` > Function name seems weirdly spelled. I named it the same way pg_str_endswith is named. However I'm open for better suggestions here. > Whether it's worth worrying about, I dunno. This is hardly the only > C idiom you need to be familiar with to read the PG code. Well, at least this v2 version of the patch removes second string scanning. And I still believe that this inlined strspn is not readable or obvious at all. -- Best regards, Aleksander Alekseev diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c index a8bb472..a1c5853 100644 --- a/src/backend/parser/parse_type.c +++ b/src/backend/parser/parse_type.c @@ -17,6 +17,7 @@ #include "access/htup_details.h" #include "catalog/namespace.h" #include "catalog/pg_type.h" +#include "common/string.h" #include "lib/stringinfo.h" #include "nodes/makefuncs.h" #include "parser/parser.h" @@ -696,7 +697,7 @@ typeStringToTypeName(const char *str) ErrorContextCallback ptserrcontext; /* make sure we give useful error for empty input */ - if (strspn(str, " \t\n\r\f") == strlen(str)) + if (pg_str_containsonly(str, " \t\n\r\f")) goto fail; initStringInfo(); diff --git a/src/backend/tsearch/ts_utils.c b/src/backend/tsearch/ts_utils.c index 9c3e15c..81d6132 100644 --- a/src/backend/tsearch/ts_utils.c +++ b/src/backend/tsearch/ts_utils.c @@ -17,6 +17,7 @@ #include #include "miscadmin.h" +#include "common/string.h" #include "tsearch/ts_locale.h" #include "tsearch/ts_utils.h" @@ -45,7 +46,7 @@ get_tsearch_config_filename(const char *basename, * and case-insensitive filesystems, and non-ASCII characters create other * interesting risks, so on the whole a tight policy seems best. */ - if (strspn(basename, "abcdefghijklmnopqrstuvwxyz0123456789_") != strlen(basename)) + if (!pg_str_containsonly(basename, "abcdefghijklmnopqrstuvwxyz0123456789_")) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid text search configuration file name \"%s\"", diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c index 394042c..7e41359 100644 --- a/src/backend/utils/adt/regproc.c +++ b/src/backend/utils/adt/regproc.c @@ -32,6 +32,7 @@ #include "catalog/pg_ts_config.h" #include "catalog/pg_ts_dict.h" #include "catalog/pg_type.h" +#include "common/string.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "parser/parse_type.h" @@ -75,7 +76,7 @@ regprocin(PG_FUNCTION_ARGS) /* Numeric OID? */ if (pro_name_or_oid[0] >= '0' && pro_name_or_oid[0] <= '9' && - strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid)) + pg_str_containsonly(pro_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(pro_name_or_oid))); @@ -286,7 +287,7 @@ regprocedurein(PG_FUNCTION_ARGS) /* Numeric OID? */ if (pro_name_or_oid[0] >= '0' && pro_name_or_oid[0] <= '9' && - strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid)) + pg_str_containsonly(pro_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(pro_name_or_oid))); @@ -535,7 +536,7 @@ regoperin(PG_FUNCTION_ARGS) /* Numeric OID? */ if (opr_name_or_oid[0] >= '0' && opr_name_or_oid[0] <= '9' && - strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid)) + pg_str_containsonly(opr_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(opr_name_or_oid))); @@ -750,7 +751,7 @@ regoperatorin(PG_FUNCTION_ARGS) /* Numeric OID? */ if (opr_name_or_oid[0] >= '0' && opr_name_or_oid[0] <= '9' && - strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid)) + pg_str_containsonly(opr_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(opr_name_or_oid))); @@ -995,7 +996,7 @@ regclassin(PG_FUNCTION_ARGS) /* Numeric OID? */ if (class_name_or_oid[0] >= '0' && class_name_or_oid[0] <= '9' && -
Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code
Aleksander Alekseevwrites: > How about rewriting such a code like this? > if (pg_str_containsonly(str, " \t\n\r\f")) Function name seems weirdly spelled. Also, I believe that in nearly all use-cases the number of data characters that would typically be examined is small, so I have serious doubts that the "optimized" implementation you propose is actually faster than a naive one; it may be slower, and it's certainly longer and harder to understand/test. Whether it's worth worrying about, I dunno. This is hardly the only C idiom you need to be familiar with to read the PG code. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code
On 8 December 2016 at 15:54, Aleksander Alekseevwrote: > Hi. > > I noticed that there is a lot of repeating code like this: > > ``` > if (strspn(str, " \t\n\r\f") == strlen(str)) > ``` > > I personally don't find it particularly readable, not mentioning that > traversing a string twice doesn't look as a good idea (you can check > using objdump that latest GCC 6.2 doesn't optimize this code). You could just change it to if (str[strspn(str, " \t\n\r\f")] == '\0') to mitigate calling strlen. It's safe to do so because strspn will only return values from 0 to strlen(str). Geoff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code
Hi. I noticed that there is a lot of repeating code like this: ``` if (strspn(str, " \t\n\r\f") == strlen(str)) ``` I personally don't find it particularly readable, not mentioning that traversing a string twice doesn't look as a good idea (you can check using objdump that latest GCC 6.2 doesn't optimize this code). How about rewriting such a code like this? ``` if (pg_str_containsonly(str, " \t\n\r\f")) ``` Corresponding patch is attached. I don't claim that my implementation of pg_str_containsonly procedure is faster that strspn + strlen, but at least such refactoring makes code a little more readable. -- Best regards, Aleksander Alekseev diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c index a8bb472..a1c5853 100644 --- a/src/backend/parser/parse_type.c +++ b/src/backend/parser/parse_type.c @@ -17,6 +17,7 @@ #include "access/htup_details.h" #include "catalog/namespace.h" #include "catalog/pg_type.h" +#include "common/string.h" #include "lib/stringinfo.h" #include "nodes/makefuncs.h" #include "parser/parser.h" @@ -696,7 +697,7 @@ typeStringToTypeName(const char *str) ErrorContextCallback ptserrcontext; /* make sure we give useful error for empty input */ - if (strspn(str, " \t\n\r\f") == strlen(str)) + if (pg_str_containsonly(str, " \t\n\r\f")) goto fail; initStringInfo(); diff --git a/src/backend/tsearch/ts_utils.c b/src/backend/tsearch/ts_utils.c index 9c3e15c..81d6132 100644 --- a/src/backend/tsearch/ts_utils.c +++ b/src/backend/tsearch/ts_utils.c @@ -17,6 +17,7 @@ #include #include "miscadmin.h" +#include "common/string.h" #include "tsearch/ts_locale.h" #include "tsearch/ts_utils.h" @@ -45,7 +46,7 @@ get_tsearch_config_filename(const char *basename, * and case-insensitive filesystems, and non-ASCII characters create other * interesting risks, so on the whole a tight policy seems best. */ - if (strspn(basename, "abcdefghijklmnopqrstuvwxyz0123456789_") != strlen(basename)) + if (!pg_str_containsonly(basename, "abcdefghijklmnopqrstuvwxyz0123456789_")) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("invalid text search configuration file name \"%s\"", diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c index 394042c..7e41359 100644 --- a/src/backend/utils/adt/regproc.c +++ b/src/backend/utils/adt/regproc.c @@ -32,6 +32,7 @@ #include "catalog/pg_ts_config.h" #include "catalog/pg_ts_dict.h" #include "catalog/pg_type.h" +#include "common/string.h" #include "lib/stringinfo.h" #include "miscadmin.h" #include "parser/parse_type.h" @@ -75,7 +76,7 @@ regprocin(PG_FUNCTION_ARGS) /* Numeric OID? */ if (pro_name_or_oid[0] >= '0' && pro_name_or_oid[0] <= '9' && - strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid)) + pg_str_containsonly(pro_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(pro_name_or_oid))); @@ -286,7 +287,7 @@ regprocedurein(PG_FUNCTION_ARGS) /* Numeric OID? */ if (pro_name_or_oid[0] >= '0' && pro_name_or_oid[0] <= '9' && - strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid)) + pg_str_containsonly(pro_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(pro_name_or_oid))); @@ -535,7 +536,7 @@ regoperin(PG_FUNCTION_ARGS) /* Numeric OID? */ if (opr_name_or_oid[0] >= '0' && opr_name_or_oid[0] <= '9' && - strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid)) + pg_str_containsonly(opr_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(opr_name_or_oid))); @@ -750,7 +751,7 @@ regoperatorin(PG_FUNCTION_ARGS) /* Numeric OID? */ if (opr_name_or_oid[0] >= '0' && opr_name_or_oid[0] <= '9' && - strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid)) + pg_str_containsonly(opr_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(opr_name_or_oid))); @@ -995,7 +996,7 @@ regclassin(PG_FUNCTION_ARGS) /* Numeric OID? */ if (class_name_or_oid[0] >= '0' && class_name_or_oid[0] <= '9' && - strspn(class_name_or_oid, "0123456789") == strlen(class_name_or_oid)) + pg_str_containsonly(class_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(class_name_or_oid))); @@ -1186,7 +1187,7 @@ regtypein(PG_FUNCTION_ARGS) /* Numeric OID? */ if (typ_name_or_oid[0] >= '0' && typ_name_or_oid[0] <= '9' && - strspn(typ_name_or_oid, "0123456789") == strlen(typ_name_or_oid)) + pg_str_containsonly(typ_name_or_oid, "0123456789")) { result = DatumGetObjectId(DirectFunctionCall1(oidin, CStringGetDatum(typ_name_or_oid))); @@ -1358,7 +1359,7 @@ regconfigin(PG_FUNCTION_ARGS) /* Numeric OID? */ if (cfg_name_or_oid[0] >= '0' && cfg_name_or_oid[0] <= '9' && -