Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)
Hi, On 2018-07-20 08:27:34 -0400, Robert Haas wrote: > On Thu, Jul 19, 2018 at 4:32 PM, Andres Freund wrote: > >> 1. Why the error message changes? If there's a good reason, it should > >> be done as a separate commit, or at least well-documented in the > >> commit message. > > > > Because there's a lot of "invalid input syntax for type %s: \"%s\"", > > error messages, and we shouldn't force translators to have separate > > version that inlines the first %s. But you're right, it'd be worthwhile > > to point that out in the commit message. > > It just seems weird that they're bundled together in one commit like this. I'll push it separately. > Nothing else from me... Thanks for looking! Greetings, Andres Freund
Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)
On Thu, Jul 19, 2018 at 4:32 PM, Andres Freund wrote: >> 1. Why the error message changes? If there's a good reason, it should >> be done as a separate commit, or at least well-documented in the >> commit message. > > Because there's a lot of "invalid input syntax for type %s: \"%s\"", > error messages, and we shouldn't force translators to have separate > version that inlines the first %s. But you're right, it'd be worthwhile > to point that out in the commit message. It just seems weird that they're bundled together in one commit like this. >> 2. Does the likely/unlikely stuff make a noticeable difference? > > Yes. It's also largely a copy from existing code (scanint8), so I don't > really want to differ here. OK. >> 3. If this is a drop-in replacement for pg_atoi, why not just recode >> pg_atoi this way -- or have it call this -- and leave the callers >> unchanged? > > Because pg_atoi supports a variable 'terminator'. OK. >> 4. Are we sure this is faster on all platforms, or could it work out >> the other way on, say, BSD? > > I'd be *VERY* surprised if any would be faster. It's not easy to write a > faster implmentation, than what I've proposed, and especially not so if > you use strtol() as the API (variable bases, a bit of locale support). OK. Nothing else from me... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)
Hi, On 2018-07-18 14:34:34 -0400, Robert Haas wrote: > On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund wrote: > > FWIW, here's a rebased version of this patch. Could probably be polished > > further. One might argue that we should do a bit more wide ranging > > changes, to convert scanint8 and pg_atoi to be also unified. But it > > might also just be worthwhile to apply without those, given the > > performance benefit. > > Wouldn't hurt to do that one too, but might be OK to just do this > much. Questions: > > 1. Why the error message changes? If there's a good reason, it should > be done as a separate commit, or at least well-documented in the > commit message. Because there's a lot of "invalid input syntax for type %s: \"%s\"", error messages, and we shouldn't force translators to have separate version that inlines the first %s. But you're right, it'd be worthwhile to point that out in the commit message. > 2. Does the likely/unlikely stuff make a noticeable difference? Yes. It's also largely a copy from existing code (scanint8), so I don't really want to differ here. > 3. If this is a drop-in replacement for pg_atoi, why not just recode > pg_atoi this way -- or have it call this -- and leave the callers > unchanged? Because pg_atoi supports a variable 'terminator'. Supporting that would create a bit slower code, without being particularly useful. I think there's only a single in-core caller left after the patch (int2vectorin). There's a fair argument that that should just be open-coded to handle the weird space parsing, but given there's probably external pg_atoi() callers, I'm not sure it's worth doing so? I don't think it's a good idea to continue to have pg_atoi as a wrapper - it takes a size argument, which makes efficient code hard. > 4. Are we sure this is faster on all platforms, or could it work out > the other way on, say, BSD? I'd be *VERY* surprised if any would be faster. It's not easy to write a faster implmentation, than what I've proposed, and especially not so if you use strtol() as the API (variable bases, a bit of locale support). Greetings, Andres Freund
Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)
On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund wrote: > FWIW, here's a rebased version of this patch. Could probably be polished > further. One might argue that we should do a bit more wide ranging > changes, to convert scanint8 and pg_atoi to be also unified. But it > might also just be worthwhile to apply without those, given the > performance benefit. Wouldn't hurt to do that one too, but might be OK to just do this much. Questions: 1. Why the error message changes? If there's a good reason, it should be done as a separate commit, or at least well-documented in the commit message. 2. Does the likely/unlikely stuff make a noticeable difference? 3. If this is a drop-in replacement for pg_atoi, why not just recode pg_atoi this way -- or have it call this -- and leave the callers unchanged? 4. Are we sure this is faster on all platforms, or could it work out the other way on, say, BSD? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)
Hi, On 2017-12-08 13:44:37 -0800, Andres Freund wrote: > On 2017-12-08 10:17:34 -0800, Andres Freund wrote: > > the strtoll is libc functionality triggered by pg_atoi(), something I've > > seen show up in numerous profiles. I think it's probably time to have > > our own optimized version of it rather than relying on libcs. > > Attached is a hand-rolled version. After quickly hacking up one from > scratch, I noticed we already kind of have one for int64 (scanint8), so > I changed the structure of this one to be relatively similar. > > It's currently using the overflow logic from [1], but that's not > fundamentally required, we could rely on fwrapv for this one too. > > This one improves performance of the submitted workload from 1223.950ms > to 1020.640ms (best of three). The profile's shape changes quite > noticeably: FWIW, here's a rebased version of this patch. Could probably be polished further. One might argue that we should do a bit more wide ranging changes, to convert scanint8 and pg_atoi to be also unified. But it might also just be worthwhile to apply without those, given the performance benefit. Anybody have an opinion on that? Greetings, Andres Freund >From a31bdd83fe02fc228263d099f5a4d2d4611970fc Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Fri, 8 Dec 2017 13:31:15 -0800 Subject: [PATCH v1] Hand code string to integer conversion for performance. Author: Reviewed-By: Discussion: https://postgr.es/m/20171208214437.qgn6zdltyq5hm...@alap3.anarazel.de Backpatch: --- contrib/dblink/expected/dblink.out| 2 +- .../postgres_fdw/expected/postgres_fdw.out| 8 +- contrib/spi/refint.c | 2 +- doc/src/sgml/sources.sgml | 2 +- src/backend/libpq/pqmq.c | 6 +- .../libpqwalreceiver/libpqwalreceiver.c | 4 +- src/backend/tsearch/wparser_def.c | 8 +- src/backend/utils/adt/arrayutils.c| 3 +- src/backend/utils/adt/int.c | 4 +- src/backend/utils/adt/int8.c | 4 +- src/backend/utils/adt/numutils.c | 175 ++ src/backend/utils/adt/varlena.c | 4 +- src/include/utils/builtins.h | 2 + .../expected/plpython_subtransaction.out | 4 +- src/pl/plpython/expected/plpython_types.out | 2 +- src/pl/tcl/expected/pltcl_subxact.out | 6 +- src/test/regress/expected/aggregates.out | 2 +- src/test/regress/expected/alter_table.out | 2 +- src/test/regress/expected/copy2.out | 2 +- src/test/regress/expected/int2.out| 14 +- src/test/regress/expected/int4.out| 14 +- src/test/regress/expected/int8.out| 10 +- src/test/regress/expected/plpgsql.out | 4 +- src/test/regress/expected/select_parallel.out | 2 +- src/test/regress/regress.c| 4 +- 25 files changed, 233 insertions(+), 57 deletions(-) diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out index dfd49b937e8..6ceabb453c0 100644 --- a/contrib/dblink/expected/dblink.out +++ b/contrib/dblink/expected/dblink.out @@ -1154,7 +1154,7 @@ FROM dblink_fetch('myconn','error_cursor', 1) AS t(i int); SELECT * FROM dblink_fetch('myconn','error_cursor', 1) AS t(i int); -ERROR: invalid input syntax for integer: "not an int" +ERROR: invalid input syntax for type integer: "not an int" -- Make sure that the local settings have retained their values in spite -- of shenanigans on the connection. SHOW datestyle; diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index cf4863c5aa2..c321a466114 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -4087,16 +4087,16 @@ DROP FUNCTION f_test(int); -- === ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int; SELECT * FROM ft1 WHERE c1 = 1; -- ERROR -ERROR: invalid input syntax for integer: "foo" +ERROR: invalid input syntax for type integer: "foo" CONTEXT: column "c8" of foreign table "ft1" SELECT ft1.c1, ft2.c2, ft1.c8 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR -ERROR: invalid input syntax for integer: "foo" +ERROR: invalid input syntax for type integer: "foo" CONTEXT: column "c8" of foreign table "ft1" SELECT ft1.c1, ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR -ERROR: invalid input syntax for integer: "foo" +ERROR: invalid input syntax for type integer: "foo" CONTEXT: whole-row reference to foreign table "ft1" SELECT sum(c2), array_agg(c8) FROM ft1 GROUP BY c8; -- ERROR -ERROR: invalid input syntax for integer: "foo" +ERROR: invalid input syntax for type integer: "foo" CONTEXT: processing expression at position 2 in select list ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE user_enum; -- =