Re: [HACKERS] extending relations more efficiently

2012-05-02 Thread Jeroen Vermeulen

On 2012-05-01 22:06, Robert Haas wrote:


It might also be interesting to provide a mechanism to pre-extend a
relation to a certain number of blocks, though if we did that we'd
have to make sure that autovac got the memo not to truncate those
pages away again.


Good point.  And just to check before skipping over it, do we know that 
autovacuum not leaving enough slack space is not a significant cause of 
the bottlenecks in the first place?
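For illustration only: at the file level, pre-extending to a given number 
of blocks amounts to reserving the space up front rather than growing the 
file one block at a time.  A hypothetical sketch (not the backend's actual 
smgr path; posix_fallocate() is just one way an OS can reserve the space):

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192

/* Hypothetical helper: reserve nblocks * BLCKSZ bytes for a relation file. */
static int
preextend_relation_file(const char *path, unsigned int nblocks)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    /* posix_fallocate() returns 0 on success, an error number otherwise. */
    int rc = posix_fallocate(fd, 0, (off_t) nblocks * BLCKSZ);

    close(fd);
    return rc == 0 ? 0 : -1;
}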



Jeroen



Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-24 Thread Jeroen Vermeulen

On 2012-02-23 22:12, Noah Misch wrote:


That alone would not simplify the patch much.  INSERT/UPDATE/DELETE on the
foreign side would still need to take some kind of tuple lock on the primary
side to prevent primary-side DELETE.  You then still face the complicated case
of a tuple that's both locked and updated (non-key/immutable columns only).
Updates that change keys are relatively straightforward, following what we
already do today.  It's the non-key updates that complicate things.


Ah, so there's the technical hitch.  From previous discussion I was 
under the impression that:


1. Foreign-key updates only inherently conflict with _key_ updates on 
the foreign side, and that non-key updates on the foreign side were just 
caught in the locking cross-fire, so to speak.


And

2. The DELETE case was somehow trivially accounted for.  But, for 
instance, there does not seem to be a lighter lock type that DELETE 
conflicts with but UPDATE does not.  Bummer.




By then, though, that change would share little or no code with the current
patch.  It may have its own value, but it's not a means for carving a subset
from the current patch.


No, to be clear, it was never meant to be.  Only a possible way to give 
users a way out of foreign-key locks more quickly.  Not a way to get 
some of the branch out to users more quickly.


At any rate, that seems to be moot then.  And to be honest, mechanisms 
designed for more than one purpose rarely pan out.


Thanks for explaining!


Jeroen




Re: [HACKERS] foreign key locks, 2nd attempt

2012-02-23 Thread Jeroen Vermeulen

On 2012-02-23 10:18, Simon Riggs wrote:


However, review of such a large patch should not be simply pass or
fail. We should be looking back at the original problem and ask
ourselves whether some subset of the patch could solve a useful subset
of the problem. For me, that seems quite likely and this is very
definitely an important patch.

Even if we can't solve some part of the problem we can at least commit
some useful parts of infrastructure to allow later work to happen more
smoothly and quickly.

So please let's not focus on the 100%, let's focus on 80/20.


The suggested immutable-column constraint was meant as a potential 
80/20 workaround.  Definitely not a full solution, helpful to some, 
probably easier to do.  I don't know if an immutable key would actually 
be enough to elide foreign-key locks though.


Simon, I think you had a reason why it couldn't work, but I didn't quite 
get your meaning and didn't want to distract the discussion further at 
that stage.  You wrote that it "doesn't do what KEY LOCKS are designed to 
do"... any chance you might recall what the problem was?


I don't mean to be pushy about my pet idea, and heaven knows I don't 
have time to implement it, but it'd be good to know whether I should put 
the whole thought to rest.



Jeroen



Re: [HACKERS] VACUUM ANALYZE is faster than ANALYZE?

2012-02-22 Thread Jeroen Vermeulen

On 2012-02-22 16:29, Tom Lane wrote:

(Snip context)


VACUUM ANALYZE
consists of two separate passes, VACUUM and then ANALYZE, and the second
pass is going to be random I/O by your definition no matter what.


I don't suppose there's a case where the VACUUM (1) gets to delete lots 
and lots of rows that then don't need ANALYZE'ing, and (2) can do so 
without actually touching all those pages?



Jeroen



Re: [HACKERS] foreign key locks, 2nd attempt

2011-11-11 Thread Jeroen Vermeulen

On 2011-11-12 00:30, David Kerr wrote:


Is this being suggested in lieu of Alvaro's patch? because it seems to be adding
complexity to the system (multiple types of primary key definitions) instead of
just fixing an obvious problem (over-aggressive locking done on FK checks).


It wouldn't be a new type of primary key definition, just a new type of 
column constraint similar to not null.  Particularly useful with keys, 
but entirely orthogonal to them.


Parser and reserved words aside, it seems a relatively simple change. 
Of course that's not necessarily the same as small.




If it's going to be in addition to, then it sounds like it'd be really nice.


I wasn't thinking that far ahead, myself.  But if some existing lock 
type covers the situation well enough, then that could be a big argument 
for doing it in-lieu-of.


I haven't looked at lock types much so I could be wrong, but my 
impression is that there are dangerously many lock types already.  One 
would expect the risk of subtle locking bugs to grow as the square of 
the number of interacting lock types.



Jeroen



Re: [HACKERS] Disable OpenSSL compression

2011-11-08 Thread Jeroen Vermeulen

On 2011-11-08 22:59, Albe Laurenz wrote:


In addition to the oprofile data I collected three times:
- the duration as shown in the server log
- the duration as shown by \timing
- the duration of the psql command as measured by time


[...]


I think this makes a good case for disabling compression.


It's a few data points, but is it enough to make a good case?  As I 
understand it, compression can save time not only on transport but also 
on the amount of data that needs to go through encryption -- probably 
depending on choice of cipher, hardware support, machine word width, 
compilation details, etc.  Would it make sense to run a wider 
experiment, e.g. in the build farm?


Another reason why I believe compression is often used with encryption 
is to maximize information content per byte of data: harder to guess, 
harder to crack.  Would that matter?



Jeroen



Re: [HACKERS] Multiple queries in transit

2011-11-06 Thread Jeroen Vermeulen

On 2011-11-03 17:26, Marko Kreen wrote:

On Mon, Oct 31, 2011 at 7:09 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

Can't you do that today with a multi-command string submitted to
PQsendQuery, followed by multiple calls to PQgetResult?


It's more annoying to do error handling on that, plus it still keeps the
blocking behaviour, just with larger blocks.


You can combine multi-command query strings with nonblocking mode, 
without any change in libpq itself.


In fact that's exactly what the libpqxx pipeline class does.  So if 
you're working in C++, you already have this feature at your disposal.
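For illustration, a minimal libpq sketch of that approach (connection 
string and queries are made up; the nonblocking part -- 
PQconsumeInput()/PQisBusy() polling around PQgetResult() -- is left out 
for brevity):

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");     /* made-up conninfo */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* One round trip carrying two statements. */
    if (!PQsendQuery(conn, "SELECT 1; SELECT 2"))
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* One PGresult comes back per statement; NULL marks the end. */
    PGresult *res;
    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("got %d row(s)\n", PQntuples(res));
        else
            fprintf(stderr, "%s", PQresultErrorMessage(res));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}

The pipeline class layers a queue-style interface on roughly that pattern.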




Also I would ask for opposite feature: multiple rows in flight.
That means that when server is sending big resultset,
the app can process it row-by-row (or by 10 rows)
without stopping the stream and re-requesting.


Cursors.
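For example, a rough libpq sketch of the cursor approach (table name and 
fetch size are made up, error handling omitted):

#include <stdio.h>
#include <libpq-fe.h>

/* Stream a large result set in batches of 10 rows via a cursor. */
static void
stream_rows(PGconn *conn)
{
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn, "DECLARE c NO SCROLL CURSOR FOR SELECT * FROM big_table"));

    for (;;)
    {
        PGresult *res = PQexec(conn, "FETCH 10 FROM c");
        if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
        {
            PQclear(res);
            break;
        }
        for (int i = 0; i < PQntuples(res); i++)
            printf("%s\n", PQgetvalue(res, i, 0));
        PQclear(res);
    }

    PQclear(PQexec(conn, "CLOSE c"));
    PQclear(PQexec(conn, "COMMIT"));
}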


Jeroen



Re: [HACKERS] foreign key locks, 2nd attempt

2011-11-06 Thread Jeroen Vermeulen

On 2011-11-04 01:12, Alvaro Herrera wrote:


I would like some opinions on the ideas on this patch, and on the patch
itself.  If someone wants more discussion on implementation details of
each part of the patch, I'm happy to provide a textual description --
please just ask.


Jumping in a bit late here, but thanks for working on this: it looks 
like it could solve some annoying problems for us.


I do find myself idly wondering if those problems couldn't be made to go 
away more simply given some kind of “I will never ever update this key” 
constraint.  I'm having trouble picturing the possible lock interactions 
as it is.  :-)



Jeroen



Re: [HACKERS] IDLE in transaction introspection

2011-11-01 Thread Jeroen Vermeulen

On 2011-11-01 21:13, Andrew Dunstan wrote:


Rename it please. current_query will just be wrong. I'd be inclined
just to call it query or query_string and leave it to the docs to
define the exact semantics.


I think "query" for a query that isn't ongoing would be just as wrong. 
How about last_query?



Jeroen



Re: [HACKERS] Multiple queries in transit

2011-10-31 Thread Jeroen Vermeulen

On 2011-11-01 00:53, Merlin Moncure wrote:

On Mon, Oct 31, 2011 at 12:49 PM, Mark Hills <mark.hi...@framestore.com> wrote:



Furthermore, in most apps it'd be a serious PITA to keep track of which
reply is for which query, so I doubt that such a feature is of general
usefulness.


In our UI case, we already have a queue. Because libpq can't pipeline
multiple queries, we have to make our own queue of them anyway.


In libpqxx (the C++ API) you do get support for this kind of pipelining. 
Look for the pipeline class.  It uses the "concatenate queries, retrieve 
multiple results" trick.


The pipeline also serves as an easy-to-manage interface for asynchronous 
querying: fire off your query, go do other things while the server is 
working, then ask for the result (at which point you'll block if necessary).


Front page: http://pqxx.org/development/libpqxx/

Pipeline class: 
http://pqxx.org/devprojects/libpqxx/doc/stable/html/Reference/a00062.html


Jeroen



Re: [HACKERS] News on Clang

2011-06-24 Thread Jeroen Vermeulen

On 2011-06-25 00:02, Peter Geoghegan wrote:


At a large presentation that I and other PG community members were
present at during FOSDEM, Postgres was specifically cited as an
example of a medium-sized C program that had considerably improved
compile times on Clang. While I was obviously unable to reproduce the
very impressive compile-time numbers claimed (at -O0), I still think
that Clang has a lot of promise. Here are the slides from that
presentation:

http://www.scribd.com/doc/48921683/LLVM-Clang-Advancing-Compiler-Technology


I notice that the slide about compilation speed on postgres compares 
only front-end speeds between gcc and clang, not the speeds for 
optimization and code generation.  That may explain why the difference 
is more pronounced on the slide than it is for a real build.


By the way, I was amazed to see such a young compiler build libpqxx with 
no other problems than a few justified warnings or errors that gcc 
hadn't issued.  And that's C++, which is a lot harder than C!  The 
output was also far more helpful than gcc's.  IIRC I found clang just 
slightly faster than gcc on a full configure/build/test.



Jeroen



Re: [HACKERS] [BUG] Denormal float values break backup/restore

2011-06-20 Thread Jeroen Vermeulen

On 2011-06-20 19:22, Marti Raudsepp wrote:


AIUI that is defined to be a little vague, but includes denormalized numbers
that would undergo any rounding at all.  It says that on overflow the
conversion should return the appropriate HUGE_VAL variant, and set ERANGE.
  On underflow it returns a reasonably appropriate value (and either may or
must set ERANGE, which is the part that isn't clear to me).


Which standard is that? Does IEEE 754 itself define strtod() or is
there another relevant standard?


Urr.  No, this is C and/or Unix standards, not IEEE 754.

I did some more research into this.  The postgres docs do specify the 
range error, but seemingly based on a different interpretation of 
underflow than what I found in some of the instances of strtod() 
documentation:


Numbers too close to zero that are not representable as
distinct from zero will cause an underflow error.

This talks about denormals that get _all_ their significant digits 
rounded away, but some of the documents I saw specify an underflow for 
denormals that get _any_ of their significant digits rounded away (and 
thus have an abnormally high relative rounding error).


The latter would happen for any number that is small enough to be 
denormal, and is also not representable (note: that's not the same thing 
as not representable as distinct from zero!).  It's easy to get 
non-representable numbers when dumping binary floats in a decimal 
format.  For instance 0.1 is not representable, nor are 0.2, 0.01, and 
so on.  The inherent rounding of non-representable values produces 
weirdness like 0.1 + 0.2 - 0.3 != 0.
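A small C illustration of both effects -- whether ERANGE gets set for the 
denormal depends on the platform, which is rather the point:

#include <errno.h>
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    /* Non-representable decimals accumulate rounding error. */
    printf("0.1 + 0.2 - 0.3 = %g\n", 0.1 + 0.2 - 0.3);  /* not exactly 0 */

    /* A denormal: nonzero, but smaller than DBL_MIN. */
    const char *denormal = "1e-310";
    errno = 0;
    double d = strtod(denormal, NULL);
    printf("strtod(\"%s\") = %g, errno %s ERANGE (DBL_MIN = %g)\n",
           denormal, d, errno == ERANGE ? "==" : "!=", DBL_MIN);
    return 0;
}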


I made a quick round of the strtod() specifications I could find, and 
they seem to disagree wildly:


Source                          ERANGE when             Return what
--------------------------------------------------------------------
PostgreSQL docs                 All digits lost         zero
Linux programmer's manual       All digits lost         zero
My GNU/Linux strtod()           Any digits lost         rounded number
SunOS 5                         Any digits lost         rounded number
GNU documentation               All digits lost         zero
IEEE 1003.1 (Open Group 2004)   Any digits lost         denormal
JTC1/SC22/WG14 N794             Any digits lost         denormal
Sun Studio (C99)                Implementation-defined  ?
ISO/IEC 9899:TC2                Implementation-defined  denormal
C99 Draft N869 (1999)           Implementation-defined  denormal

We can't guarantee very much, then.  It looks like C99 disagrees with 
the postgres interpretation, but also leaves a lot up to the compiler.


I've got a few ideas for solving this, but none of them are very good:

(a) Ignore underflow errors.

This could hurt anyone who relies on knowing their floating-point 
implementation and the underflow error to keep their rounding errors in 
check.  It also leaves a kind of gap in the predictability of the 
database's floating-point behaviour.


Worst hit, or possibly the only real problem, would be algorithms that 
divide other numbers, small enough not to produce infinities, by rounded 
denormals.


(b) Dump REAL and DOUBLE PRECISION in hex.

With this change, the representation problem goes away and ERANGE would 
reliably mean "this was written in a precision that I can't reproduce." 
We could sensibly provide an option to ignore that error for 
cross-platform dump/restores.


This trick does raise a bunch of compatibility concerns: it's a new 
format of data to restore, it may not work on pre-C99 compilers, and so 
on.  Also, output for human consumption would have to differ from 
pg_dump output.


(c) Have pg_dump produce calculations, not literals, for denormals.

Did I mention how these were not great ideas?  If your database dump 
contains 1e-308, pg_dump could recognize that this value can be 
calculated in the database but possibly not entered directly, and dump 
e.g. 1e-307::float / 10 instead.


(d) Make pg_dump set some "ignore underflows" option.

This may make dumps unusable for older postgres versions.  Moreover, it 
doesn't help ORMs and applications that are currently unable to store 
the problem numbers.


(e) Do what the documentation promises.

Actually I have no idea how we could guarantee this.

(f) Ignore ERANGE unless strtod() returns ±0 or ±HUGE_VAL.

This is probably a reasonable stab at common sense.  It does have the 
nasty property that it doesn't give a full guarantee either way: 
restores could still break on pre-C99 systems that return 0 on 
underflow, but C99 doesn't guarantee a particularly accurate denormal. 
In practice though, implementations seem to do their best to give you 
the most appropriate rounded number.
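A minimal sketch of what (f) could look like as a standalone check 
(hypothetical helper, not the actual float.c code):

#include <errno.h>
#include <math.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical: trust strtod()'s result unless it reported ERANGE *and*
 * gave us nothing usable back (zero or infinity).
 */
static bool
parse_float8(const char *str, double *result)
{
    char *end;

    errno = 0;
    double val = strtod(str, &end);

    if (end == str)
        return false;               /* no number at all */

    if (errno == ERANGE &&
        (val == 0.0 || val == HUGE_VAL || val == -HUGE_VAL))
        return false;               /* genuine overflow/underflow */

    *result = val;                  /* accept rounded denormals */
    return true;
}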



Jeroen



Re: [HACKERS] [BUG] Denormal float values break backup/restore

2011-06-11 Thread Jeroen Vermeulen

On 2011-06-11 01:57, Tom Lane wrote:


(5) Lobby your libc providers to make strtod accept denormals without
throwing ERANGE.  There is no obvious reason why it shouldn't.  (On at
least one of my boxes, it does.)


The standard either explicitly allows or requires this behaviour 
(depending on which text I look at) for underflow.


AIUI that is defined to be a little vague, but includes denormalized 
numbers that would undergo any rounding at all.  It says that on 
overflow the conversion should return the appropriate HUGE_VAL variant, 
and set ERANGE.  On underflow it returns a reasonably appropriate value 
(and either may or must set ERANGE, which is the part that isn't clear 
to me).


ISTM the appropriate response to ERANGE combined with a small return 
value is to either ignore or report the rounding error, but accept the 
return value.  The current code in float.c doesn't check the return 
value at all and treats all ERANGE conditions as equal.



Jeroen



Re: [HACKERS] Hypothetical Indexes - PostgreSQL extension - PGCON 2010

2010-12-04 Thread Jeroen Vermeulen

On 2010-12-03 20:49, Ana Carolina Brito de Almeida wrote:


We add a simple case study (sourceforge page):
http://sourceforge.net/projects/hypotheticalind/files/TUTORIAL_8_4.pdf/download


Great, thanks!

I'll try to write a bit more about it later:

http://pqxx.org/development/libpqxx/wiki/HypotheticalIndexes


Jeroen



Re: [HACKERS] Hypothetical Indexes - PostgreSQL extension - PGCON 2010

2010-12-03 Thread Jeroen Vermeulen

On 2010-12-02 00:48, Ana Carolina Brito de Almeida wrote:


We would like to inform you all that our extension to PostgreSQL, that includes 
hypothetical indexes (and soon index self-tuning), is available through a 
sourceforge project.
This was suggested at PgCon 2010 and we hope some of you may find it useful, 
contribute and give us your feedback.


Looking at the sourceforge page, I'm left with one burning question: 
what are they for?


I can see what a hypothetical index is, but neither the project pages 
nor the README in the tarball say why I might want one.  I'd be quite 
interested to know that.



Jeroen



Re: [HACKERS] Hypothetical Indexes - PostgreSQL extension - PGCON 2010

2010-12-03 Thread Jeroen Vermeulen

On 2010-12-03 19:44, Sergio Lifschitz wrote:

Indeed, hypothetical indexes are good to check potentially good
configurations without harming the whole system with actual index
creation. Please observe that we've added an explain hypothetical
command, that will include plans considering hypothetical indexes! We'll
try to add a simple case study that would help those wondering about
this project.


That sounds very useful indeed!


Jeroen



Re: [HACKERS] Indent authentication overloading

2010-11-17 Thread Jeroen Vermeulen

On 2010-11-17 22:43, Magnus Hagander wrote:


at the advantage of not confusing new users. We could of course also
just drop ident-over-tcp completely, but there might be some poor guy
out there who actually *uses* it :-)


As far as I know, companies do use it in their internal networks where 
they do have a reasonable shot at full control over ident connections. 
I don't know how easy it would be for them to switch to other methods.



Jeroen



Re: [HACKERS] Indent authentication overloading

2010-11-17 Thread Jeroen Vermeulen

On 2010-11-18 00:14, Magnus Hagander wrote:


If it was a matter of changing it for those who use ident over tcp, I
really wouldn't hesitate - they're few :-) But the problem is that
it's the ident-over-tcp that's correctly named, not the other one...


True.

By the way ISTR we don't fall back to identd for TCP connections alone. 
 I saw it documented somewhere that we also talk to identd on operating 
systems that won't tell us who's on the other end of the socket.  Are 
those still out there?



Jeroen



Re: [HACKERS] Amazon now supporting GPU focused EC2 instances

2010-11-15 Thread Jeroen Vermeulen

On 2010-11-15 18:49, Greg Stark wrote:


I've seen papers on doing relational joins using GPUs and I'm sure
there are other algorithms we wonderful stuff we could do. But if it
comes at the cost of being able to handle arbitrary join clauses it'll
be a tough sacrifice to make.


Perhaps the coolest use of all is as an intermediate filtering stage for 
spatial joins, using collision detection.  Draw your data and your 
search region (slightly enlarged) as objects and ask the graphics card 
if the search region collides with anything.  Much like it might ask "is 
this player character bumping into any walls?"


IIRC it's been prototyped in Oracle with impressive results.  Using 
OpenGL, so quite portable between GPUs.



Jeroen



Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-26 Thread Jeroen Vermeulen

Mark Mielke wrote:

Re-planning a generic plan with another generic plan may generate zero 
benefit, with a measurable cost. More on this after...


Nobody's talking about doing that any more.  I proposed it initially 
because I didn't know about changes that made it unnecessary.



All the points about ms seem invalid to me. There are many reasons why ms 
could increase, and many of them have nothing to do with plan 
efficiency. Again, re-planning due to a high ms, or a high ratio of ms, 
does not indicate that re-planning will improve the success of the plan. 
The planning process does not measure ms or predict ms.


That's true, but missing some very basic points about the idea: one, if 
we can tell that a query is going to be expensive, then the cost of 
re-planning it is marginal.  Two, if we can tell that a query is going 
to be expensive, then we stand a lot to gain if re-planning turns out to 
be useful.  It follows that we can afford to re-plan on the off-chance, 
without anything more than a vague orders-of-magnitude idea of what 
"expensive" means.


What Tom said validates a big assumption I've been making: that we do in 
fact have a decent shot at telling in advance that a query is going to 
be expensive.  Which means we have a decent shot at stopping your 100ms 
query from taking seconds just because you prepared it and are missing 
out on that tiny partial index.  That would be worth the extra planning 
time at a 1% hit rate, and there's not much downside if we don't reach that.




My idea of an optimal system is as follows:

1) Prepare gathers and caches data about the tables involved in the 
query, including column statistics that are likely to be required during 
the planning process, but prepare does not run the planning process.


It sounds to me like you're in the process of inventing another planning 
process.  Developer time aside, how much CPU time can you afford to 
throw at this?


I don't see any reason to argue over what would be optimal when so much 
information is still missing.  It just makes the problem look harder 
than it is.  To me, our best shot at getting something useful is to stay 
simple and defensive.  After that, if there is still a need, we'll have 
code to help us gather more data and figure out how to make it better. 
Nothing wrong with the lowest-hanging fruit.



Jeroen



Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-25 Thread Jeroen Vermeulen

Robert Haas wrote:

On Wed, Feb 17, 2010 at 5:52 PM, Jeroen Vermeulen j...@xs4all.nl wrote:

I may have cut this out of my original email for brevity... my impression is
that the planner's estimate is likely to err on the side of scalability, not
best-case response time; and that this is more likely to happen than an
optimistic plan going bad at runtime.


Interestingly, most of the mistakes that I have seen are in the
opposite direction.


I guess there's not much we can do about those, except decide after 
running that it's worth optimizing for specific values.




Yeb points out a devil in the details though: the cost estimate is unitless.
 We'd have to have some orders-of-magnitude notion of how the estimates fit
into the picture of real performance.


I'm not sure to what extent you can assume that the cost is
proportional to the execution time.  I seem to remember someone
(Peter?) arguing that they're not related by any fixed ratio, partly
because things like page costs vs. cpu costs didn't match physical
reality, and that in fact some attempts to gather better empirically
better values for things like random_page_cost and seq_page_cost
actually ended up making the plans worse rather than better.  It would
be nice to see some research in this area...


Getting representative workloads and machine configurations may make 
that hard.  :/


But all we really want is a check for really obscene costs, as an extra 
stopgap so we don't have to wait for the thing to execute before we 
decide it's too costly.  Surely there must be some line we can draw.



Jeroen



Re: [HACKERS] libpq PGresult object and encoding

2010-02-21 Thread Jeroen Vermeulen

Jeff Davis wrote:
libpq has a PQclientEncoding() function that takes a connection object. 


However, the client encoding is, in some cases, a property of the result
object. For instance, if your client_encoding changes, but you keep the
result object around, you have no way to determine later what encoding
the results are in.

The result object already saves the client_encoding. Should an accessor
be provided?


That'd certainly save libpqxx the trouble of lugging a copy around.
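For reference, the copy-lugging boils down to something like this 
hypothetical client-side wrapper:

#include <libpq-fe.h>

/*
 * Hypothetical wrapper: remember which client_encoding was in effect when
 * the result was produced, since PGresult does not (yet) expose it.
 */
typedef struct
{
    PGresult *res;
    int       encoding;     /* PQclientEncoding() at execution time */
} ResultWithEncoding;

static ResultWithEncoding
exec_remember_encoding(PGconn *conn, const char *query)
{
    ResultWithEncoding r;

    r.encoding = PQclientEncoding(conn);   /* capture before it can change */
    r.res = PQexec(conn, query);
    return r;
}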


Jeroen




Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-21 Thread Jeroen Vermeulen

Greg Stark wrote:


So in principle I agree with this idea. I think a conservative value
for the constant would be more like 100x though. If I told you we had
an easy way to speed all your queries up by 10% by caching queries but
were just choosing not to then I think you would be unhappy. Whereas
if I told you we were spending 1% of the run-time planning queries I
think most people would not be concerned.


Makes sense.  The main thing is that there be an order-of-magnitude 
difference to hide the potential extra planning cost in.  If that 
includes a switched SSL connection, 10% of execution is probably 
reasonable because it's a much smaller portion of overall response 
time--but on a local connection it's a liability.




There's a second problem though. We don't actually know how long any
given query is going to take to plan or execute. We could just
remember how long it took to plan and execute last time or how long it
took to plan last time and the average execution time since we cached
that plan. Perhaps we should track the stddev of the execution plan,
or the max execution time of the plan? Ie there are still unanswered
questions about the precise heuristic to use but I bet we can come up
with something reasonable.


I may have cut this out of my original email for brevity... my 
impression is that the planner's estimate is likely to err on the side 
of scalability, not best-case response time; and that this is more 
likely to happen than an optimistic plan going bad at runtime.


If that is true, then the cost estimate is at least a useful predictor 
of statements that deserve re-planning.  If it's not true (or for cases 
where it's not true), actual execution time would be a useful back-up at 
the cost of an occasional slow execution.


Yeb points out a devil in the details though: the cost estimate is 
unitless.  We'd have to have some orders-of-magnitude notion of how the 
estimates fit into the picture of real performance.



Jeroen




Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-15 Thread Jeroen Vermeulen

Tom Lane wrote:


Well, no, consider the situation where planning takes 50 ms, the generic
plan costs 100ms to execute, but a parameter-specific plan would take 1ms
to execute.  Planning is very expensive compared to execution but it's
still a win to do it.


I think that's a fun and worthwhile problem.  But my limited personal 
aim right now is a stopgap for pathological cases.  I'd like to pick the 
low-hanging fruit; actually squeezing the fat out of prepared statements 
is a job I wouldn't get around to completing.  Sorry for mixing metaphors.


Here's what I like about the really slow plans. (Now why does that sound 
so strange?)  We don't know if re-planning will help, but we do know 
that (1) it won't hurt much relative to execution time, so we really 
don't _care_; and (2) there is lots of potential for improvement, so 
catching just one execution that can be much faster might pay for all 
the extra time spent re-planning.


Where do we draw the line between costly and pathological?  I still like 
Bart's idea of a fixed ratio to planning time, because it reflects a 
self-tuning sense of proportion.  Sure, planning time can vary a lot but 
we're talking about an order-of-magnitude difference, not an exact 19:21 
optimum.  We can be sloppy and still expect to win.


AFAIC a statement could go to re-planning mode if the shortest 
execution time for the generic plan takes at least 10x longer than the 
longest planning time.  That gives us a decent shot at finding 
statements where re-planning is a safe bet.  A parameter that we or the 
user would have to tweak would just be a fragile approximation of that.
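As a sketch of that heuristic (hypothetical names and bookkeeping; 
nothing like this exists in the plan cache today):

#include <stdbool.h>

/* Hypothetical per-statement timing bookkeeping. */
typedef struct
{
    double max_planning_ms;     /* slowest planning of this statement */
    double min_execution_ms;    /* fastest execution of the generic plan */
} PlanStats;

/* After each EXECUTE, fold in the new timings. */
static void
update_plan_stats(PlanStats *stats, double plan_ms, double exec_ms)
{
    if (plan_ms > stats->max_planning_ms)
        stats->max_planning_ms = plan_ms;
    if (stats->min_execution_ms == 0.0 || exec_ms < stats->min_execution_ms)
        stats->min_execution_ms = exec_ms;
}

/* Switch to per-EXECUTE re-planning once execution dwarfs planning. */
static bool
should_replan_each_execute(const PlanStats *stats)
{
    return stats->max_planning_ms > 0.0 &&
           stats->min_execution_ms >= 10.0 * stats->max_planning_ms;
}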




A possible scheme is to try it and keep track of whether we ever
actually do get a better plan.  If, after N attempts, none of the custom
plans were ever more than X% cheaper than the generic one, then give up
and stop attempting to produce custom plans.  Tuning the variables might
be challenging though.


A simple stopgap implementation may also be a useful experimentation 
platform for refinements.  It shouldn't be too complex to rip out when 
something better comes along.



Jeroen



[HACKERS] Avoiding bad prepared-statement plans.

2010-02-09 Thread Jeroen Vermeulen
I've been discussing this with Josh, Heikki, and Peter E. over the past 
few weeks.


As Peter observed years ago, prepared statements can perform badly 
because their plans are overly generic.  Also, statistics change and 
sometimes plans should change with them.  It would be nice if we could 
avoid users having to worry about these things.


I have some ideas that I'm willing to work on, if people agree that 
they're useful.  These are simple changes; the goal is to avoid 
pathological performance at minimal cost, not to make prepared 
statements faster than parameterized ones for everyone.  The ideas 
interact in various ways.



= Projected-cost threshold =

If a prepared statement takes parameters, and the generic plan has a 
high projected cost, re-plan each EXECUTE individually with all its 
parameter values bound.  It may or may not help, but unless the planner 
is vastly over-pessimistic, re-planning isn't going to dominate 
execution time for these cases anyway.


= Actual-cost threshold =

Also stop using the generic plan if the statement takes a long time to 
run in practice.  Statistics may have gone bad.  It could also be a 
one-off due to a load peak or something, but that's handled by:


= Plan refresh =

Periodically re-plan prepared statements on EXECUTE.  This is also a 
chance for queries that were being re-planned every time to go back to a 
generic plan.



Does that sound useful?  Maybe it's too much engineering for little gain 
compared to re-planning every EXECUTE.  OTOH it doesn't look that much 
harder than going that way either.  And maybe there's some extra 
planning effort that might be worthwhile for a reusable plan but not for 
an individual query.



Jeroen



Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-09 Thread Jeroen Vermeulen

Yeb Havinga wrote:

I've been discussing this with Josh, Heikki, and Peter E. over the 
past few weeks.

Is this searchable in the archives? I'm interested in ideas discussed.


No, sorry.  These were face-to-face discussions at linux.conf.au and FOSDEM.


If a prepared statement takes parameters, and the generic plan has a 
high projected cost, re-plan each EXECUTE individually with all its 
parameter values bound.  It may or may not help, but unless the 
planner is vastly over-pessimistic, re-planning isn't going to 
dominate execution time for these cases anyway.


This sounds like a really nice to have feature. Maybe it'd also be 
possible to skip replanning between executes if the current bound values 
are 'indexwise-equivalent' to the values used at previous planning, i.e. 
nothing in the statistics indicates that execution cost would be (much) 
different. Are there more ways to cut down on planning time? Obviously 
some plannedstatement/plannerinfo structures could be kept, but maybe 
it'd also be possible to plan only that part of the join tree where the 
params are used in a scan/join qual.


I think we should be careful not to over-think this.  Planning isn't 
*that* costly, so apply Amdahl's Law liberally.  I'm proposing some easy 
things we could do without adding much overhead or maintenance burden; 
I've been assuming that getting intimate with the planner would risk 
those advantages.



Jeroen



Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-09 Thread Jeroen Vermeulen

Richard Huxton wrote:


= Actual-cost threshold =

Also stop using the generic plan if the statement takes a long time to
run in practice.


Do you mean:
1. Rollback the current query and start again
2. Mark the plan as a bad one and plan again next execute

If you can figure out how to do #1 then you could probably do it for all 
queries, but I'm guessing it's far from simple to implement.


I'm talking about #2.  As a matter of fact #1 did come up in one of 
those discussions, but how do you know you're not killing the query 
juuust before it's done, and then maybe executing a different plan 
that's no better?




= Plan refresh =

Periodically re-plan prepared statements on EXECUTE. This is also a
chance for queries that were being re-planned every time to go back to a
generic plan.


Presumably some score based on update stats and vacuum activity etc.


I was thinking of something very simple: re-do whatever we'd do if the 
statement were only being prepared at that point.



The good side of all these ideas is good indeed. The bad side is plan 
instability. Someone somewhere will have a generic plan that turns out 
better than the specific plan (due to bad stats or config settings or 
just planner limitations). The question is (I guess): How many more 
winners will there be than losers?


That's a good and surprising point, and therefore I'd like to draw 
attention away to a different point.  :-)


Yes, there will be losers in the sense that people may have optimized 
their use of prepared statements to whatever the current planner does. 
Maybe somebody out there even deliberately uses them to trick the 
planner into a different plan.  But that is always going to happen; 
we're aiming for better plans, not for giving more detailed control over 
them.  If you really can't take a change, don't upgrade.


The competing point is: people out there may currently be forgoing 
prepared statements entirely because of erratic performance.  To those 
people, if we can help them, it's like having a new feature.



Jeroen



Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-09 Thread Jeroen Vermeulen

Andres Freund wrote:


= Actual-cost threshold =

Also stop using the generic plan if the statement takes a long time to
run in practice.  Statistics may have gone bad.  It could also be a
one-off due to a load peak or something, but that's handled by:


That is not that easy. It means that you have to use savepoints enclosing each 
and every execution of a prepared statement because the query could have 
side effects. Which wouldn't be terribly efficient...


This is not within an execution of the statement, but across executions. 
 So the next execution learns from the previous result.  So I'm not 
talking about aborting the ongoing execution.  Sorry for being unclear.



Jeroen



Re: [HACKERS] Fixes for compiler warnings

2009-01-20 Thread Jeroen Vermeulen

Peter Eisentraut wrote:


-Wformat-security warns about

printf(var);

but not about

printf(var, a);

I don't understand that; the crash or exploit potential is pretty much the 
same in both cases.


Not sure this is the reason, but in the first case any risk is trivially 
avoided by using puts() or printf("%s", var) instead.  So printf(var) is 
almost certainly not what you mean.
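Concretely (var stands for any string that might carry untrusted data):

#include <stdio.h>

void
log_message(const char *var)
{
    /* printf(var);         -- risky: var itself becomes the format string */
    printf("%s", var);      /* safe: the format string is a literal */
    /* puts(var) is the other safe spelling, though it appends a newline. */
}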


I think that's a reasonable warning to have enabled, whereas the other 
one is more of a "try it sometime, you might find something" kind of 
warning.



Jeroen



Re: [HACKERS] libpq WSACleanup is not needed

2009-01-20 Thread Jeroen Vermeulen

Merlin Moncure wrote:


I think init/uninit is the answer.  While writing libpqtypes, we noted
that libpq is just plain awkward in a few different ways and probably
deserves a rewrite at some point.  not today though


Would there be any serious harm in:

1. doing the WSAStartup() when the first connection is opened, but
2. cleaning up from either
2a. some kind of on-unload version of DllMain() if possible, or
2b. an explicit new cleanup function

?

(I know it says you can't set the thing up from DllMain, but does it say 
something similar about shutdown?)



Jeroen



Re: [HACKERS] binary representation of datatypes

2008-10-22 Thread Jeroen Vermeulen

Matthieu Imbert wrote:


scenario 1 - parse the textual representation of all results of requests to the 
database and convert textual timestamps to a binary
format that i choose among those ones (number of microseconds since 2000-01-01, 
or a structure similar to pg_tm (but with
microsecond precision), or a time-format similar to one defined in rfc1305, or 
something else)

or

scenario 2 - directly use pgsql binary timestamp format. I think the latter is 
far more efficient. I'm new to postgresql, but from
what i understand, here are the conversions involved in both scenarios (hoping 
that my ascii art won't be garbled by your mail
clients ;-) :


scenario 1:

  [timestamp storage in database backend]
    -> [pgsql internal-to-network conversion function]
    -> [timestamp storage in network]
    -> [pgsql network-to-textual conversion function]
    -> [timestamp as textual representation]
    -> [my timestamp conversion routines]
    -> [my timestamp format]


I think this scenario has two boxes too many.  Why would the backend 
convert to network representation before converting to text?



Jeroen

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Concurrent VACUUM and ANALYZE

2008-07-24 Thread Jeroen Vermeulen

Jonah H. Harris wrote:

On Mon, Jul 21, 2008 at 10:19 PM, Tom Lane [EMAIL PROTECTED] wrote:

Jonah H. Harris [EMAIL PROTECTED] writes:

The case I'm looking at is a large table which requires a lazy vacuum,
and a zero vacuum cost delay would cause too much I/O.  Yet, this
table has enough insert/delete activity during a vacuum, that it
requires a fairly frequent analysis to maintain proper plans.  I
patched as mentioned above and didn't run across any unexpected
issues; the only one expected was that mentioned by Alvaro.

I don't find this a compelling argument, at least not without proof that
the various vacuum-improvement projects already on the radar screen
(DSM-driven vacuum, etc) aren't going to fix your problem.


Is DSM going to be in 8.4?  The last I had heard, DSM+related
improvements weren't close to being guaranteed for this release.  If
it doesn't make it, waiting another year and a half for something
easily fixed would be fairly unacceptable.  Should I provide a patch
in the event that DSM doesn't make it?


For the immediate term, would it make sense for the ANALYZE to give up 
and simply return if a VACUUM was in progress?


At least that way a client that sees performance degrade quickly between 
vacuums can run the occasional preventative analyze without blocking 
completely on auto-vacuums.



Jeroen

