Greg Smith wrote:
On Thu, 6 Mar 2008, Heikki Linnakangas wrote:
At the most conservative end, we could fall back to the current
method on the first escape, quote or backslash character.
I would just count the number of escaped/quote characters on each
line, and then at the end of the line switch modes between the current
code and the new version based on what the previous line looked like.
That way the only additional overhead is a small bit only when escapes
show up often, plus a touch more just once per line. Barely noticeable
in the case where nothing is escaped, very small regression for
escape-heavy stuff but certainly better than the drop you reported in
the last rev of this patch.
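
A minimal sketch of the per-line switch described above, not code from the patch. The names count_special_chars, use_old_parser_next and PREV_LINE_ESCAPE_LIMIT are hypothetical, and the limit of 4 is an assumed placeholder. In a real parser the count would presumably fall out of the parsing loop itself rather than a separate pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed placeholder: specials on the previous line above which
 * the next line is handed to the current (old) parsing code. */
#define PREV_LINE_ESCAPE_LIMIT 4

/* Count quote, escape and backslash characters in one line. */
static size_t
count_special_chars(const char *line, size_t len, char quote, char escape)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (line[i] == quote || line[i] == escape || line[i] == '\\')
            n++;
    }
    return n;
}

/* Pick the mode for the next line from the count seen on the previous one. */
static bool
use_old_parser_next(size_t specials_on_prev_line)
{
    return specials_on_prev_line > PREV_LINE_ESCAPE_LIMIT;
}
```

The decision is made once per line, so the extra per-character cost is only the increment taken when a special character is actually seen.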
Rev two of that design would keep a weighted moving average of the
total number of escaped characters per line (say
wma=(7*wma+current)/8) and switch modes based on that instead of the
previous line. There's enough slack between the ranges where each
approach works better that it should be easy to pick a decent
transition point. Based on your data I would put the
transition at wma>4, which should keep the old code in play even if
only half the lines have the bad regression that shows up with >8
escapes per line.
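
The moving-average variant could be sketched as follows. The formula wma=(7*wma+current)/8 and the wma>4 threshold come from the text above; the EscapeAverage struct and the function names are hypothetical, and the use of double is an assumption of this sketch.

```c
#include <stdbool.h>

/* Threshold proposed in the message: use the old code when wma > 4. */
#define WMA_SWITCH_THRESHOLD 4.0

/* Hypothetical state carried across lines of one COPY. */
typedef struct EscapeAverage
{
    double wma;     /* weighted moving average of escapes per line */
} EscapeAverage;

/* Fold one line's escape count into the average: wma = (7*wma + current)/8. */
static void
escape_average_update(EscapeAverage *avg, unsigned escapes_on_line)
{
    avg->wma = (7.0 * avg->wma + (double) escapes_on_line) / 8.0;
}

/* True when the average says the current (old) code should handle the next line. */
static bool
escape_average_prefers_old_parser(const EscapeAverage *avg)
{
    return avg->wma > WMA_SWITCH_THRESHOLD;
}
```

Floating point is used here because the same update in plain integer arithmetic truncates on every step and can stall: with wma=1 and current=8, (7*1+8)/8 is still 1. A fixed-point integer scaled by 8 would be another way around that.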
I'd be inclined just to look at the first buffer of data we read in, and
make a one-off decision there, if we can get away with it. Then the cost
of testing is fixed rather than per line.
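
Roughly, the one-off check might look like the sketch below. The function name first_buffer_prefers_old_parser and its max_specials_per_line parameter are hypothetical, and counting raw newlines as line ends is a simplification that miscounts newlines embedded in quoted fields.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Scan the first input buffer once and decide which parser to use for
 * the whole load. Returns true when the average number of quote/escape
 * characters per line exceeds max_specials_per_line.
 */
static bool
first_buffer_prefers_old_parser(const char *buf, size_t len,
                                char quote, char escape,
                                double max_specials_per_line)
{
    size_t specials = 0;
    size_t lines = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (buf[i] == '\n')
            lines++;
        else if (buf[i] == quote || buf[i] == escape)
            specials++;
    }

    /* A buffer holding only a partial line still counts as one line. */
    if (lines == 0)
        lines = 1;

    return (double) specials / (double) lines > max_specials_per_line;
}
```

The cost is a single pass over one buffer, independent of how many lines follow, at the price of a wrong choice when the first buffer is not representative of the rest of the data.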
cheers
andrew
--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)