Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2017-01-16 Thread Michael Paquier
On Wed, Jan 11, 2017 at 4:50 PM, Michael Paquier
 wrote:
> Looking at this patch, I am not sure that it is worth worrying about.
> This is a receipt to make back-patching a bit more complicated, and it
> makes the code more complicated to understand. So I would vote for
> rejecting it and move on.

I have done so for now to make the CF move, if somebody wants to
complain feel free...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2017-01-10 Thread Michael Paquier
On Fri, Dec 9, 2016 at 3:23 AM, Aleksander Alekseev
 wrote:
>> You could just change it to
>> if (str[strspn(str, " \t\n\r\f")] == '\0')
>> to mitigate calling strlen. It's safe to do so because strspn will
>> only return values from 0 to strlen(str).
>
>> [...] I have serious doubts that the "optimized" implementation
>> you propose is actually faster than a naive one; it may be slower, and
>> it's certainly longer and harder to understand/test.
>
> I would like to point out that I never said it's optimized. However I
> like the code Geoff proposed. It definitely doesn't make anything worse.
> I decided to keep pg_str_contansonly procedure (i.e. not to inline this
> code) for now. Code with strspn looks OK in a simple example. However in
> a concrete context it looks like a bad Perl code in ROT13 to me:

Looking at this patch, I am not sure that it is worth worrying about.
This is a receipt to make back-patching a bit more complicated, and it
makes the code more complicated to understand. So I would vote for
rejecting it and move on.

By the way, as you are placing this routine in src/common/, you may
want to consider updating the code in src/bin/ that use libpqcommon.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2016-12-08 Thread Aleksander Alekseev
Tom, Geoff,

Thanks for your feedback! Here is version 2 of this patch.

> You could just change it to
> if (str[strspn(str, " \t\n\r\f")] == '\0')
> to mitigate calling strlen. It's safe to do so because strspn will
> only return values from 0 to strlen(str).

> [...] I have serious doubts that the "optimized" implementation
> you propose is actually faster than a naive one; it may be slower, and
> it's certainly longer and harder to understand/test.

I would like to point out that I never said it's optimized. However I
like the code Geoff proposed. It definitely doesn't make anything worse.
I decided to keep pg_str_contansonly procedure (i.e. not to inline this
code) for now. Code with strspn looks OK in a simple example. However in
a concrete context it looks like a bad Perl code in ROT13 to me:

```
/* try to figure out what's exactly going on */
if(somelongname[strspn(somelongname /* yes, again */, "longlistofchars")] != 
'\0')
```
> Function name seems weirdly spelled.

I named it the same way pg_str_endswith is named. However I'm open for
better suggestions here.

> Whether it's worth worrying about, I dunno.  This is hardly the only
> C idiom you need to be familiar with to read the PG code.

Well, at least this v2 version of the patch removes second string
scanning. And I still believe that this inlined strspn is not readable
or obvious at all.

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index a8bb472..a1c5853 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -17,6 +17,7 @@
 #include "access/htup_details.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
+#include "common/string.h"
 #include "lib/stringinfo.h"
 #include "nodes/makefuncs.h"
 #include "parser/parser.h"
@@ -696,7 +697,7 @@ typeStringToTypeName(const char *str)
 	ErrorContextCallback ptserrcontext;
 
 	/* make sure we give useful error for empty input */
-	if (strspn(str, " \t\n\r\f") == strlen(str))
+	if (pg_str_containsonly(str, " \t\n\r\f"))
 		goto fail;
 
 	initStringInfo();
diff --git a/src/backend/tsearch/ts_utils.c b/src/backend/tsearch/ts_utils.c
index 9c3e15c..81d6132 100644
--- a/src/backend/tsearch/ts_utils.c
+++ b/src/backend/tsearch/ts_utils.c
@@ -17,6 +17,7 @@
 #include 
 
 #include "miscadmin.h"
+#include "common/string.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
 
@@ -45,7 +46,7 @@ get_tsearch_config_filename(const char *basename,
 	 * and case-insensitive filesystems, and non-ASCII characters create other
 	 * interesting risks, so on the whole a tight policy seems best.
 	 */
-	if (strspn(basename, "abcdefghijklmnopqrstuvwxyz0123456789_") != strlen(basename))
+	if (!pg_str_containsonly(basename, "abcdefghijklmnopqrstuvwxyz0123456789_"))
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid text search configuration file name \"%s\"",
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 394042c..7e41359 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -32,6 +32,7 @@
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_ts_dict.h"
 #include "catalog/pg_type.h"
+#include "common/string.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "parser/parse_type.h"
@@ -75,7 +76,7 @@ regprocin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (pro_name_or_oid[0] >= '0' &&
 		pro_name_or_oid[0] <= '9' &&
-		strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid))
+		pg_str_containsonly(pro_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(pro_name_or_oid)));
@@ -286,7 +287,7 @@ regprocedurein(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (pro_name_or_oid[0] >= '0' &&
 		pro_name_or_oid[0] <= '9' &&
-		strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid))
+		pg_str_containsonly(pro_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(pro_name_or_oid)));
@@ -535,7 +536,7 @@ regoperin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (opr_name_or_oid[0] >= '0' &&
 		opr_name_or_oid[0] <= '9' &&
-		strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid))
+		pg_str_containsonly(opr_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(opr_name_or_oid)));
@@ -750,7 +751,7 @@ regoperatorin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (opr_name_or_oid[0] >= '0' &&
 		opr_name_or_oid[0] <= '9' &&
-		strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid))
+		pg_str_containsonly(opr_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(opr_name_or_oid)));
@@ -995,7 +996,7 @@ regclassin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (class_name_or_oid[0] >= '0' &&
 		class_name_or_oid[0] <= '9' &&
-		

Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2016-12-08 Thread Tom Lane
Aleksander Alekseev  writes:
> How about rewriting such a code like this?
> if (pg_str_containsonly(str, " \t\n\r\f"))

Function name seems weirdly spelled.  Also, I believe that in nearly all
use-cases the number of data characters that would typically be examined
is small, so I have serious doubts that the "optimized" implementation
you propose is actually faster than a naive one; it may be slower, and
it's certainly longer and harder to understand/test.

Whether it's worth worrying about, I dunno.  This is hardly the only
C idiom you need to be familiar with to read the PG code.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2016-12-08 Thread Geoff Winkless
On 8 December 2016 at 15:54, Aleksander Alekseev
 wrote:
> Hi.
>
> I noticed that there is a lot of repeating code like this:
>
> ```
> if (strspn(str, " \t\n\r\f") == strlen(str))
> ```
>
> I personally don't find it particularly readable, not mentioning that
> traversing a string twice doesn't look as a good idea (you can check
> using objdump that latest GCC 6.2 doesn't optimize this code).

You could just change it to

if (str[strspn(str, " \t\n\r\f")] == '\0')

to mitigate calling strlen. It's safe to do so because strspn will
only return values from 0 to strlen(str).

Geoff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] Refactor "if(strspn(str, ...) == strlen(str)" code

2016-12-08 Thread Aleksander Alekseev
Hi.

I noticed that there is a lot of repeating code like this:

```
if (strspn(str, " \t\n\r\f") == strlen(str))
```

I personally don't find it particularly readable, not mentioning that
traversing a string twice doesn't look as a good idea (you can check
using objdump that latest GCC 6.2 doesn't optimize this code).

How about rewriting such a code like this?

```
if (pg_str_containsonly(str, " \t\n\r\f"))
```

Corresponding patch is attached. I don't claim that my implementation of
pg_str_containsonly procedure is faster that strspn + strlen, but at
least such refactoring makes code a little more readable.

-- 
Best regards,
Aleksander Alekseev
diff --git a/src/backend/parser/parse_type.c b/src/backend/parser/parse_type.c
index a8bb472..a1c5853 100644
--- a/src/backend/parser/parse_type.c
+++ b/src/backend/parser/parse_type.c
@@ -17,6 +17,7 @@
 #include "access/htup_details.h"
 #include "catalog/namespace.h"
 #include "catalog/pg_type.h"
+#include "common/string.h"
 #include "lib/stringinfo.h"
 #include "nodes/makefuncs.h"
 #include "parser/parser.h"
@@ -696,7 +697,7 @@ typeStringToTypeName(const char *str)
 	ErrorContextCallback ptserrcontext;
 
 	/* make sure we give useful error for empty input */
-	if (strspn(str, " \t\n\r\f") == strlen(str))
+	if (pg_str_containsonly(str, " \t\n\r\f"))
 		goto fail;
 
 	initStringInfo();
diff --git a/src/backend/tsearch/ts_utils.c b/src/backend/tsearch/ts_utils.c
index 9c3e15c..81d6132 100644
--- a/src/backend/tsearch/ts_utils.c
+++ b/src/backend/tsearch/ts_utils.c
@@ -17,6 +17,7 @@
 #include 
 
 #include "miscadmin.h"
+#include "common/string.h"
 #include "tsearch/ts_locale.h"
 #include "tsearch/ts_utils.h"
 
@@ -45,7 +46,7 @@ get_tsearch_config_filename(const char *basename,
 	 * and case-insensitive filesystems, and non-ASCII characters create other
 	 * interesting risks, so on the whole a tight policy seems best.
 	 */
-	if (strspn(basename, "abcdefghijklmnopqrstuvwxyz0123456789_") != strlen(basename))
+	if (!pg_str_containsonly(basename, "abcdefghijklmnopqrstuvwxyz0123456789_"))
 		ereport(ERROR,
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid text search configuration file name \"%s\"",
diff --git a/src/backend/utils/adt/regproc.c b/src/backend/utils/adt/regproc.c
index 394042c..7e41359 100644
--- a/src/backend/utils/adt/regproc.c
+++ b/src/backend/utils/adt/regproc.c
@@ -32,6 +32,7 @@
 #include "catalog/pg_ts_config.h"
 #include "catalog/pg_ts_dict.h"
 #include "catalog/pg_type.h"
+#include "common/string.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "parser/parse_type.h"
@@ -75,7 +76,7 @@ regprocin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (pro_name_or_oid[0] >= '0' &&
 		pro_name_or_oid[0] <= '9' &&
-		strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid))
+		pg_str_containsonly(pro_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(pro_name_or_oid)));
@@ -286,7 +287,7 @@ regprocedurein(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (pro_name_or_oid[0] >= '0' &&
 		pro_name_or_oid[0] <= '9' &&
-		strspn(pro_name_or_oid, "0123456789") == strlen(pro_name_or_oid))
+		pg_str_containsonly(pro_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(pro_name_or_oid)));
@@ -535,7 +536,7 @@ regoperin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (opr_name_or_oid[0] >= '0' &&
 		opr_name_or_oid[0] <= '9' &&
-		strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid))
+		pg_str_containsonly(opr_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(opr_name_or_oid)));
@@ -750,7 +751,7 @@ regoperatorin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (opr_name_or_oid[0] >= '0' &&
 		opr_name_or_oid[0] <= '9' &&
-		strspn(opr_name_or_oid, "0123456789") == strlen(opr_name_or_oid))
+		pg_str_containsonly(opr_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(opr_name_or_oid)));
@@ -995,7 +996,7 @@ regclassin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (class_name_or_oid[0] >= '0' &&
 		class_name_or_oid[0] <= '9' &&
-		strspn(class_name_or_oid, "0123456789") == strlen(class_name_or_oid))
+		pg_str_containsonly(class_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		CStringGetDatum(class_name_or_oid)));
@@ -1186,7 +1187,7 @@ regtypein(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (typ_name_or_oid[0] >= '0' &&
 		typ_name_or_oid[0] <= '9' &&
-		strspn(typ_name_or_oid, "0123456789") == strlen(typ_name_or_oid))
+		pg_str_containsonly(typ_name_or_oid, "0123456789"))
 	{
 		result = DatumGetObjectId(DirectFunctionCall1(oidin,
 		  CStringGetDatum(typ_name_or_oid)));
@@ -1358,7 +1359,7 @@ regconfigin(PG_FUNCTION_ARGS)
 	/* Numeric OID? */
 	if (cfg_name_or_oid[0] >= '0' &&
 		cfg_name_or_oid[0] <= '9' &&
-