Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

2018-07-20 Thread Andres Freund
Hi,

On 2018-07-20 08:27:34 -0400, Robert Haas wrote:
> On Thu, Jul 19, 2018 at 4:32 PM, Andres Freund  wrote:
> >> 1. Why the error message changes?  If there's a good reason, it should
> >> be done as a separate commit, or at least well-documented in the
> >> commit message.
> >
> > Because there's a lot of "invalid input syntax for type %s: \"%s\"",
> > error messages, and we shouldn't force translators to have separate
> > version that inlines the first %s.  But you're right, it'd be worthwhile
> > to point that out in the commit message.
> 
> It just seems weird that they're bundled together in one commit like this.

I'll push it separately.

> Nothing else from me...

Thanks for looking!

Greetings,

Andres Freund



Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

2018-07-20 Thread Robert Haas
On Thu, Jul 19, 2018 at 4:32 PM, Andres Freund  wrote:
>> 1. Why the error message changes?  If there's a good reason, it should
>> be done as a separate commit, or at least well-documented in the
>> commit message.
>
> Because there's a lot of "invalid input syntax for type %s: \"%s\"",
> error messages, and we shouldn't force translators to have separate
> version that inlines the first %s.  But you're right, it'd be worthwhile
> to point that out in the commit message.

It just seems weird that they're bundled together in one commit like this.

>> 2. Does the likely/unlikely stuff make a noticeable difference?
>
> Yes. It's also largely a copy from existing code (scanint8), so I don't
> really want to differ here.

OK.

>> 3. If this is a drop-in replacement for pg_atoi, why not just recode
>> pg_atoi this way -- or have it call this -- and leave the callers
>> unchanged?
>
> Because pg_atoi supports a variable 'terminator'.

OK.

>> 4. Are we sure this is faster on all platforms, or could it work out
>> the other way on, say, BSD?
>
> I'd be *VERY* surprised if any would be faster. It's not easy to write a
> faster implmentation, than what I've proposed, and especially not so if
> you use strtol() as the API (variable bases, a bit of locale support).

OK.

Nothing else from me...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

2018-07-19 Thread Andres Freund
Hi,

On 2018-07-18 14:34:34 -0400, Robert Haas wrote:
> On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund  wrote:
> > FWIW, here's a rebased version of this patch. Could probably be polished
> > further. One might argue that we should do a bit more wide ranging
> > changes, to convert scanint8 and pg_atoi to be also unified. But it
> > might also just be worthwhile to apply without those, given the
> > performance benefit.
> 
> Wouldn't hurt to do that one too, but might be OK to just do this
> much.  Questions:
> 
> 1. Why the error message changes?  If there's a good reason, it should
> be done as a separate commit, or at least well-documented in the
> commit message.

Because there's a lot of "invalid input syntax for type %s: \"%s\"",
error messages, and we shouldn't force translators to have separate
version that inlines the first %s.  But you're right, it'd be worthwhile
to point that out in the commit message.


> 2. Does the likely/unlikely stuff make a noticeable difference?

Yes. It's also largely a copy from existing code (scanint8), so I don't
really want to differ here.


> 3. If this is a drop-in replacement for pg_atoi, why not just recode
> pg_atoi this way -- or have it call this -- and leave the callers
> unchanged?

Because pg_atoi supports a variable 'terminator'. Supporting that would
create a bit slower code, without being particularly useful.  I think
there's only a single in-core caller left after the patch
(int2vectorin). There's a fair argument that that should just be
open-coded to handle the weird space parsing, but given there's probably
external pg_atoi() callers, I'm not sure it's worth doing so?

I don't think it's a good idea to continue to have pg_atoi as a wrapper
- it takes a size argument, which makes efficient code hard.


> 4. Are we sure this is faster on all platforms, or could it work out
> the other way on, say, BSD?

I'd be *VERY* surprised if any would be faster. It's not easy to write a
faster implmentation, than what I've proposed, and especially not so if
you use strtol() as the API (variable bases, a bit of locale support).

Greetings,

Andres Freund



Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

2018-07-18 Thread Robert Haas
On Sat, Jul 7, 2018 at 4:01 PM, Andres Freund  wrote:
> FWIW, here's a rebased version of this patch. Could probably be polished
> further. One might argue that we should do a bit more wide ranging
> changes, to convert scanint8 and pg_atoi to be also unified. But it
> might also just be worthwhile to apply without those, given the
> performance benefit.

Wouldn't hurt to do that one too, but might be OK to just do this
much.  Questions:

1. Why the error message changes?  If there's a good reason, it should
be done as a separate commit, or at least well-documented in the
commit message.

2. Does the likely/unlikely stuff make a noticeable difference?

3. If this is a drop-in replacement for pg_atoi, why not just recode
pg_atoi this way -- or have it call this -- and leave the callers
unchanged?

4. Are we sure this is faster on all platforms, or could it work out
the other way on, say, BSD?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Faster str to int conversion (was Table with large number of int columns, very slow COPY FROM)

2018-07-07 Thread Andres Freund
Hi,

On 2017-12-08 13:44:37 -0800, Andres Freund wrote:
> On 2017-12-08 10:17:34 -0800, Andres Freund wrote:
> > the strtoll is libc functionality triggered by pg_atoi(), something I've
> > seen show up in numerous profiles. I think it's probably time to have
> > our own optimized version of it rather than relying on libcs.
> 
> Attached is a hand-rolled version. After quickly hacking up one from
> scratch, I noticed we already kind of have one for int64 (scanint8), so
> I changed the structure of this one to be relatively similar.
> 
> It's currently using the overflow logic from [1], but that's not
> fundamentally required, we could rely on fwrapv for this one too.
> 
> This one improves performance of the submitted workload from 1223.950ms
> to 1020.640ms (best of three). The profile's shape changes quite
> noticeably:

FWIW, here's a rebased version of this patch. Could probably be polished
further. One might argue that we should do a bit more wide ranging
changes, to convert scanint8 and pg_atoi to be also unified. But it
might also just be worthwhile to apply without those, given the
performance benefit.

Anybody have an opinion on that?

Greetings,

Andres Freund
>From a31bdd83fe02fc228263d099f5a4d2d4611970fc Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Fri, 8 Dec 2017 13:31:15 -0800
Subject: [PATCH v1] Hand code string to integer conversion for performance.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/20171208214437.qgn6zdltyq5hm...@alap3.anarazel.de
Backpatch:
---
 contrib/dblink/expected/dblink.out|   2 +-
 .../postgres_fdw/expected/postgres_fdw.out|   8 +-
 contrib/spi/refint.c  |   2 +-
 doc/src/sgml/sources.sgml |   2 +-
 src/backend/libpq/pqmq.c  |   6 +-
 .../libpqwalreceiver/libpqwalreceiver.c   |   4 +-
 src/backend/tsearch/wparser_def.c |   8 +-
 src/backend/utils/adt/arrayutils.c|   3 +-
 src/backend/utils/adt/int.c   |   4 +-
 src/backend/utils/adt/int8.c  |   4 +-
 src/backend/utils/adt/numutils.c  | 175 ++
 src/backend/utils/adt/varlena.c   |   4 +-
 src/include/utils/builtins.h  |   2 +
 .../expected/plpython_subtransaction.out  |   4 +-
 src/pl/plpython/expected/plpython_types.out   |   2 +-
 src/pl/tcl/expected/pltcl_subxact.out |   6 +-
 src/test/regress/expected/aggregates.out  |   2 +-
 src/test/regress/expected/alter_table.out |   2 +-
 src/test/regress/expected/copy2.out   |   2 +-
 src/test/regress/expected/int2.out|  14 +-
 src/test/regress/expected/int4.out|  14 +-
 src/test/regress/expected/int8.out|  10 +-
 src/test/regress/expected/plpgsql.out |   4 +-
 src/test/regress/expected/select_parallel.out |   2 +-
 src/test/regress/regress.c|   4 +-
 25 files changed, 233 insertions(+), 57 deletions(-)

diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index dfd49b937e8..6ceabb453c0 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -1154,7 +1154,7 @@ FROM dblink_fetch('myconn','error_cursor', 1) AS t(i int);
 
 SELECT *
 FROM dblink_fetch('myconn','error_cursor', 1) AS t(i int);
-ERROR:  invalid input syntax for integer: "not an int"
+ERROR:  invalid input syntax for type integer: "not an int"
 -- Make sure that the local settings have retained their values in spite
 -- of shenanigans on the connection.
 SHOW datestyle;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index cf4863c5aa2..c321a466114 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -4087,16 +4087,16 @@ DROP FUNCTION f_test(int);
 -- ===
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE int;
 SELECT * FROM ft1 WHERE c1 = 1;  -- ERROR
-ERROR:  invalid input syntax for integer: "foo"
+ERROR:  invalid input syntax for type integer: "foo"
 CONTEXT:  column "c8" of foreign table "ft1"
 SELECT  ft1.c1,  ft2.c2, ft1.c8 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR
-ERROR:  invalid input syntax for integer: "foo"
+ERROR:  invalid input syntax for type integer: "foo"
 CONTEXT:  column "c8" of foreign table "ft1"
 SELECT  ft1.c1,  ft2.c2, ft1 FROM ft1, ft2 WHERE ft1.c1 = ft2.c1 AND ft1.c1 = 1; -- ERROR
-ERROR:  invalid input syntax for integer: "foo"
+ERROR:  invalid input syntax for type integer: "foo"
 CONTEXT:  whole-row reference to foreign table "ft1"
 SELECT sum(c2), array_agg(c8) FROM ft1 GROUP BY c8; -- ERROR
-ERROR:  invalid input syntax for integer: "foo"
+ERROR:  invalid input syntax for type integer: "foo"
 CONTEXT:  processing expression at position 2 in select list
 ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE user_enum;
 --