Re: [HACKERS] Multithread Query Planner

2012-01-27 Thread Pierre C



Not to mention palloc, another extremely fundamental and non-reentrant
subsystem.

Possibly we could work on making all that stuff re-entrant, but it would
be a huge amount of work for a distant and uncertain payoff.



Right.  I think it makes more sense to try to get parallelism working
first with the infrastructure we have.  Converting to use threading,
if we ever do it at all, should be something we view as a later
performance optimization.  But I suspect we won't want to do it
anyway; I think there will be easier ways to get where we want to be.


Multithreading became fashionable with the arrival of dual-core CPUs a few
years ago. However, multithreading as it is commonly used has a huge
problem: threads usually share all of their memory. This opens the door
to an endless supply of hard-to-find bugs and, more importantly, defeats
the purpose.


Re-entrant palloc() is nonsense. Suppose you make a reentrant
palloc() which scales OK at 2 threads thanks to a cleverly placed atomic
instruction. How is it going to scale on 64 cores? On HP's new 1000-core
ARM server with non-uniform memory access? It would probably perform very
badly... not to mention the horror of multithreaded, exception-safe
deallocation when one thread among many blows up on an error...


For the ultimate in parallelism, ask an FPGA guy. Is he using shared memory
to wire together his 12000 DSP blocks? Nope, he's using isolated
processes which share nothing and communicate through FIFOs and hardware
message passing. Like shell pipes, basically. Or Erlang.


Good parallelism = reduce shared state and communicate through  
data/message channels.


Shared-everything multithreading is going to be in a lot of trouble on
future many-core machines. Incidentally, Postgres, with its processes
sharing only what is needed, has a good head start...


With more and more cores coming, you guys are going to have to fight to
reduce the quantity of shared state between processes, not increase it by
switching to shared-memory threads!...


Say you want to parallelize sorting.
Sorting is a black-box with one input data pipe and one output data pipe.
Data pipes are good for parallelism, just like FIFOs. FPGA guys love black  
boxes with FIFOs between them.


Say you manage to send tuples through a FIFO like zeromq. Now you can even
run the sort on another machine and allow it to use all the RAM if you
like. Now split the black box into two black boxes (qsort and merge),
instantiate as many qsort boxes as necessary, and connect them together
with pipes. Run some boxes on some of this machine's cores, some other
boxes on another machine, etc. That would be very flexible (and scalable).
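
To make the "black boxes connected by FIFOs" idea concrete, here is a
minimal sketch in plain C, under loose assumptions: ordinary POSIX pipes
stand in for zeromq, plain ints stand in for tuples, and there is a single
worker. The point is only that the sort runs in an isolated process and
nothing is shared except the two FIFOs.

/* Sketch of a "sort black box" as an isolated process connected by FIFOs. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

static int cmp_int(const void *a, const void *b)
{
    int ia = *(const int *) a, ib = *(const int *) b;
    return (ia > ib) - (ia < ib);
}

int main(void)
{
    int to_worker[2], from_worker[2];
    int input[8] = {42, 7, 99, 1, 56, 23, 8, 64};

    if (pipe(to_worker) < 0 || pipe(from_worker) < 0)
        return 1;

    if (fork() == 0)
    {
        /* Worker: shares nothing; reads its input FIFO until EOF,
         * sorts, writes the result to its output FIFO, and exits. */
        int buf[1024], n = 0;

        close(to_worker[1]);
        close(from_worker[0]);
        while (read(to_worker[0], buf + n, sizeof(int)) == sizeof(int))
            n++;
        qsort(buf, n, sizeof(int), cmp_int);
        write(from_worker[1], buf, n * sizeof(int));
        close(from_worker[1]);
        _exit(0);
    }

    /* Parent: pushes data into the FIFO, closes it, then consumes the
     * sorted stream.  The worker could just as well sit on another
     * machine behind a socket. */
    close(to_worker[0]);
    close(from_worker[1]);
    write(to_worker[1], input, sizeof(input));
    close(to_worker[1]);

    int v;
    while (read(from_worker[0], &v, sizeof(v)) == sizeof(v))
        printf("%d\n", v);
    wait(NULL);
    return 0;
}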


Of course the black box has a small backdoor: some comparison functions
can access shared state, which is basically *the* issue (not reentrancy,
which you do not need).




Re: [HACKERS] Inlining comparators as a performance optimisation

2012-01-13 Thread Pierre C

On Wed, 21 Sep 2011 18:13:07 +0200, Tom Lane t...@sss.pgh.pa.us wrote:


Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

On 21.09.2011 18:46, Tom Lane wrote:

The idea that I was toying with was to allow the regular SQL-callable
comparison function to somehow return a function pointer to the
alternate comparison function,



You could have a new function with a pg_proc entry, that just returns a
function pointer to the qsort-callback.


Yeah, possibly.  That would be a much more invasive change, but cleaner
in some sense.  I'm not really prepared to do all the legwork involved
in that just to get to a performance-testable patch though.


A few years ago I looked for a way to speed up COPY operations, and it
turned out that COPY TO had a good optimization opportunity. At that time,
for each datum, COPY TO would:


- test for nullness
- call an outfunc through fmgr
- the outfunc palloc()s a bytea or text, fills it with data, and returns it
(sometimes it uses an extensible string buffer which may be repalloc()d
several times)
- COPY memcpy()s the returned data to a buffer and eventually flushes the
buffer to the client socket.


I introduced a special write buffer with an on-flush callback (i.e. a close
relative of the existing string buffer); in this case the callback flushed
to the client socket. I also added several outfuncs (one per type) which
took that buffer as an argument, besides the datum to output, and simply
wrote the datum into the buffer with the appropriate transformations (like
converting to bytea or text), flushing if needed.


Then COPY TO BINARY of a constant-size datum would turn into:
- one test for nullness
- one C function call
- one test to ensure appropriate space is available in the buffer (flush if
needed)
- one htonl() and a memcpy of constant size, which the compiler turns
into a couple of simple instructions
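
As an illustration only (this is not the original patch, and all names are
invented), the buffer-with-callback idea looks roughly like the sketch
below. Because copybuf_put_int32() is a static inline appending a
constant-size value, the compiler can collapse the space check, htonl()
and memcpy() into a handful of instructions.

/* Hypothetical sketch of a write buffer with an on-flush callback and an
 * inlined fixed-size append, in the spirit of the COPY TO BINARY path. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl() */

typedef struct CopyBuf
{
    char   *data;
    size_t  len;
    size_t  capacity;
    void  (*flush)(struct CopyBuf *buf);   /* e.g. send to client socket */
} CopyBuf;

/* Make sure at least 'needed' bytes fit; let the callback empty (or grow)
 * the buffer otherwise. */
static inline void
copybuf_reserve(CopyBuf *buf, size_t needed)
{
    if (buf->len + needed > buf->capacity)
        buf->flush(buf);
}

/* Per-type out function: the caller does the null test, then this is one
 * C call, one space check, one htonl() and a constant-size memcpy() that
 * the compiler turns into simple moves. */
static inline void
copybuf_put_int32(CopyBuf *buf, int32_t value)
{
    uint32_t net = htonl((uint32_t) value);

    copybuf_reserve(buf, sizeof(net));
    memcpy(buf->data + buf->len, &net, sizeof(net));
    buf->len += sizeof(net);
}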


I recall measuring speedups of 2x - 8x on COPY BINARY, less for text, but  
still large gains.


Although eliminating fmgr-call and palloc overhead was an important part
of it, another large part was getting rid of memcpy()s, which the compiler
turned into simple movs for known-size types, a transformation that can be
done only if the buffer write functions are inlined inside the outfuncs.
Compilers love constants...


Additionally, code size growth was minimal since I moved the old outfunc
code into the new outfuncs, and replaced the old fmgr-callable outfuncs
with "create buffer with on-full callback = extend_and_repalloc() -> pass
to new outfunc(buffer, datum) -> return buffer". Which is basically
equivalent to the previous palloc()-based code, maybe with a few extra
instructions.


When I submitted the patch for review, Tom rightly pointed out that my
way of obtaining the C function pointer sucked very badly (I don't
remember how I did it, only that it was butt-ugly), but the idea was to get
a quick measurement of what could be gained, and the result was positive.
Unfortunately I had no time available to finish it and make it into a real
patch; I'm sorry about that.


So why do I post in this sorting topic? It seems that, by bypassing fmgr
for functions which are small, simple, and called lots of times, there is
a large gain to be made, not only because of fmgr overhead but also because
of the opportunity for new compiler optimizations, palloc removal, etc.
However, in my experiment the arguments and return types of the new
functions were DIFFERENT from those of the old functions: the new ones do
the same thing, but in a different manner. One manner is suited to
SQL-callable functions (i.e. palloc and return a bytea) and the other to
writing large amounts of data (direct buffer writes). Since the two have
very different requirements, being fast at both is impossible for the same
function.


Anyway, all that rant boils down to:

Some functions could benefit from having two versions (while sharing almost
all of their code):

- a user-callable (fmgr) version (the current one)
- a C-callable version, usually with different parameters and return type

And it would be cool to have a way to grab a bare function pointer to the
second one.
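
In sketch form (hypothetical names, no real catalog support), the two
versions sharing almost all their code would look like a bare C-callable
function plus a thin fmgr wrapper around it; the missing piece is a
sanctioned way for callers to look up the bare pointer.

/* Hypothetical sketch: the same conversion exposed both as a bare
 * C-callable function (fast path, writes into a caller-supplied buffer)
 * and as the usual fmgr-callable version (pallocs and returns a cstring).
 * Names like fast_int4out are invented for illustration. */
#include "postgres.h"
#include "fmgr.h"

/* C-callable version: different parameters, no fmgr, no palloc. */
static int
fast_int4out(char *dst, size_t dstlen, int32 value)
{
    return snprintf(dst, dstlen, "%d", value);
}

/* SQL-callable (fmgr) version: keeps the existing contract, but shares
 * the conversion code with the fast path. */
PG_FUNCTION_INFO_V1(int4out_wrapped);

Datum
int4out_wrapped(PG_FUNCTION_ARGS)
{
    int32   value = PG_GETARG_INT32(0);
    char   *result = palloc(12);        /* enough for "-2147483648" + NUL */

    fast_int4out(result, 12, value);
    PG_RETURN_CSTRING(result);
}

/* The missing piece: a catalog-supported way to get at the bare pointer,
 * e.g. something like (invented API)
 *     fast_outfunc fn = lookup_c_version(output_function_oid);
 */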


Maybe an extra column in pg_proc would do (but then proargtypes and
friends would describe only the SQL-callable version)?

Or an extra table? pg_cproc?
Or an in-memory hash: hashtable[fmgr-callable function] = C version
- What happens if a C function has no SQL-callable equivalent?
Or (ugly) introduce an extra per-type function type_get_function_ptr(
function_kind ) which returns the requested function pointer


If one of those happens, I'll dust off my old copy-optimization patch ;)

Hmm... just my 2c

Regards
Pierre



Re: [HACKERS] pessimal trivial-update performance

2010-07-05 Thread Pierre C



The problem can generally be written as tuples seeing multiple
updates in the same transaction?

I think that every time PostgreSQL is used with an ORM, there is
a certain amount of multiple updates taking place. I have actually
been reworking the client side to get around multiple updates, since they
popped up in one of my profiling runs. Although the time I optimized
away ended up being both roundtrip time and update time, having
the database do half of it transparently might have been enough
to leave me with a bigger problem elsewhere..

To sum up. Yes I think indeed it is a real-world case.

Jesper


On the Python side, Elixir and SQLAlchemy have an excellent way of
handling this: when you start a transaction, all changes are
accumulated in a session object and only flushed to the database on
session commit (which is also generally the transaction commit). This has
multiple advantages; for instance it can batch multi-row statements,
updates are only done once, you save a lot of roundtrips, etc.
Of course, most of the time this is not compatible with database triggers,
so if there are triggers the ORM needs to be told about them.




Re: [HACKERS] streaming replication breaks horribly if master crashes

2010-06-16 Thread Pierre C



The real problem here is that we're sending records to the slave which
might cease to exist on the master if it unexpectedly reboots.  I
believe that what we need to do is make sure that the master only
sends WAL it has already fsync'd


How about this:

- pg records somewhere the xlog position of the last record synced to
disk. I don't remember the variable name; let's just say xlog_synced_recptr
- pg always writes the xlog first, i.e. before writing any page it checks
that the page's xlog recptr <= xlog_synced_recptr, and if that's not the
case it has to wait before it can write the page.


Now:

- master sends messages to the slave with the xlog_synced_recptr after each
fsync

- slave gets these messages and records the master_xlog_synced_recptr
- slave doesn't write any page to disk until BOTH the slave's local WAL
copy AND the master's WAL have reached the recptr of that page
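
A hedged sketch of that last rule (the variable names are invented and
XLogRecPtr is treated as a plain 64-bit value for simplicity): before
writing out a dirty page, the slave checks the page's LSN against both its
own flushed WAL position and the last position the master reported as
fsync'd.

/* Sketch of the proposed gating rule on the slave; all names here are
 * hypothetical and LSNs are treated as plain 64-bit integers. */
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t XLogRecPtr;

static XLogRecPtr local_wal_flushed_upto;    /* slave's own fsync'd WAL   */
static XLogRecPtr master_xlog_synced_recptr; /* last value sent by master */

/* A dirty page may be written out only when the WAL record that last
 * touched it is durable on BOTH the slave and the master. */
static bool
slave_may_write_page(XLogRecPtr page_lsn)
{
    return page_lsn <= local_wal_flushed_upto &&
           page_lsn <= master_xlog_synced_recptr;
}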


If the master crashes or the slave loses its connection, the slave's
in-memory pages could be in a state that is in the future compared to
the master's state when it comes back up.


Therefore, when a slave detects that the master has crashed, it could shoot
itself and recover from WAL, at which point the slave would no longer be
in the future relative to the master; rather, it would be in the past,
which is a lot less problematic...


Of course this wouldn't speed up the failover process!...


I think we should also change the slave to panic and shut down
immediately if its xlog position is ahead of the master.  That can
never be a watertight solution because you can always advance the xlog
position on the master and mask the problem.  But I think we should
do it anyway, so that we at least have a chance of noticing that we're
hosed.  I wish I could think of something a little more watertight...


If a slave is in the future relative to the master, then the only way to  
keep using this slave could be to make it the new master...





Re: [HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up

2010-06-08 Thread Pierre C


The Linux kernel also uses it when it's available, see e.g.
http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/arch/x86/crypto/crc32c-intel.c


If you guys are interested, I have a Core i7 here and could run a little
benchmark.




Re: [HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up

2010-06-07 Thread Pierre C

On Sunday 30 May 2010 18:29:31 Greg Stark wrote:

On Sun, May 30, 2010 at 4:54 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I read through that thread and couldn't find much discussion of
 alternative CRC implementations --- we spent all our time on arguing
 about whether we needed 64-bit CRC or not.


SSE4.2 has a hardware CRC32 instruction; this might be interesting to
use...
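
For what it's worth, here is a minimal sketch of using that instruction
through the gcc/clang intrinsics (compile with -msse4.2 on a 64-bit
build). Note that the instruction computes CRC-32C (the Castagnoli
polynomial), which is not the polynomial Postgres currently uses, so it
could not be dropped in without changing the WAL CRC definition.

/* Sketch: CRC-32C of a buffer using the SSE4.2 crc32 instruction.
 * The caller supplies the initial crc value (and any final inversion). */
#include <nmmintrin.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

uint32_t
crc32c_sse42(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    /* Eight bytes at a time, then the tail byte by byte. */
    while (len >= sizeof(uint64_t))
    {
        uint64_t chunk;

        memcpy(&chunk, p, sizeof(chunk));
        crc = (uint32_t) _mm_crc32_u64(crc, chunk);
        p += sizeof(chunk);
        len -= sizeof(chunk);
    }
    while (len-- > 0)
        crc = _mm_crc32_u8(crc, *p++);

    return crc;
}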




Re: [HACKERS] Parallel pg_dump for 9.1

2010-03-30 Thread Pierre C
On Tue, 30 Mar 2010 13:01:54 +0200, Peter Eisentraut pete...@gmx.net  
wrote:



On tis, 2010-03-30 at 08:39 +0200, Stefan Kaltenbrunner wrote:

on fast systems pg_dump is completely CPU bottlenecked


Might be useful to profile why that is.  I don't think pg_dump has
historically been developed with CPU efficiency in mind.


Already done that (I had posted some WIP patches to speed up COPY;
hopefully I'll have time to finish those one day ;)
Most of the time spent in the postmaster process during COPY TO is in
the datum -> string conversion functions.






Re: [HACKERS] SQL compatibility reminder: MySQL vs PostgreSQL

2010-03-08 Thread Pierre C



My opinion is that PostgreSQL should accept any MySQL syntax and return
warnings. I believe that we should accept even innodb syntax and turn it
immediately into PostgreSQL tables. This would allow people with no
interest in SQL to migrate from MySQL to PostgreSQL without any harm.


A solution would be a SQL proxy (a la pgpool) with query rewriting.


PHP developers don't have time to invest in learning deep SQL.


This is true, and it is a big problem IMHO. It results in lots of slow,  
broken, insecure database designs.


ALL the web apps that I've performed CPR resuscitation on follow the same
pattern:

- devs are database noobs
- generous use of MyISAM
- numerous queries, most of them unoptimized and/or useless
- use of Apache/mod_php instead of fastcgi
- sometimes, use of a huge slow bloated CMS/framework which issues even  
more unoptimized and/or useless SQL queries

- site gains popularity
- huge traffic takes an unprepared team by surprise (never heard of stuff  
like concurrency or scaling)

- site fails horribly

That said, I've got a 150,000+ member forum running on MySQL with sub-5 ms
page times on a low-end server; it works if you do it right.


Most open-source PHP application developers have to expend a lot of effort
working around MyISAM, which doesn't support foreign keys or constraints.
If those resources were directed to useful work instead of being wasted
like this, the result would be a lot better.
The irony is that even with all that effort, you can't make a web app work
without transactions; sooner or later your database integrity will fail.


My theory on this is simple:

- PHP is a very weak language, not suited to implementing really
useful frameworks (unlike Python / Ruby).
Example: find an ORM for PHP that is as good as SQLAlchemy. It does not
exist, because it is impossible to build.
- Really smart programmers dislike PHP because it is a pretty weak
language, so they all flee to Python, Ruby, etc.
All big PHP applications turn into a huge, convoluted contraption of
impossible-to-understand code, because of the language's weakness.
- Really smart DBAs dislike MySQL (unless they have a nicely paying job at
Facebook or Flickr)


So it is very difficult to find good PHP developers, especially ones with
database knowledge.



IMHO, PostgreSQL has to be more flexible (in
psychological terms) to understand MySQL user needs and answer them,
just to give them a choice to migrate to PostgreSQL.


Problem is, as you mentioned above, most PHP developers don't know what  
their needs are because they have little database expertise.


As for stuff MySQL does that I would like Postgres to implement, I'd focus
more on features than syntax:


- some form of index-only scans or equivalent (visibility map would  
probably suffice)
- some form of INSERT ON DUPLICATE KEY UPDATE or equivalent (merge...)  
where the DB, not me, takes care of concurrency
- some way to SELECT a,b,c,d GROUP BY a when it can be determined that  
it is equivalent to GROUP BY a,b,c,d, ie a is UNIQUE NOT NULL
- index skip scans (well, MySQL doesn't really do index skip scans, but  
since it can do index-only scans, it's an approximation)

- simpler syntax for DELETEs using JOINs

And while I'm at it, I'll add my pet feature:

An extremely fast form of temporary storage.

Table main is referenced by tables child1, child2, ... childN

- SELECT ... FROM main
  WHERE (very complex condition involving gist coordinates search etc)
  ORDER BY

Then I want the rows from child tables which reference those results.
If I add a lot of JOINs to my query, it's entirely possible that the (very
complex condition involving gist coordinates search etc) is mis-estimated.
This is generally not a problem since it usually uses bitmap index scans,
which can survive lots of abuse. However, it causes mis-planning of the
JOINs, which is a problem.


Besides, some of the child tables have few rows but lots of columns, so
they complicate the query and return the same data many times, which the
ORM doesn't care about, since it would rather instantiate one object per
referenced table row instead of one object per main table row.


I would like to do:

CREATE TEMP TABLE foo AS SELECT ... FROM main
  WHERE (very complex condition involving gist coordinates search etc);

ANALYZE foo;
SELECT * FROM foo ORDER BY ...
SELECT c.* FROM foo JOIN child1 ON (...)
SELECT c.* FROM foo JOIN child2 ON (...)

etc

This splits the query into much easier-to-manage fragments, and the
results are easier to use, too.

I can store in the application only one object per child table row.
But I can't do this because it causes an update of the system catalogs
(slow, I/O wait, and bloat).


Basically it would be nice to have something (temp table, cursor, CTE,  
tuplestore, whatever) that can hold a short-lived result set, can be used  
like a table, can have accurate statistics, and can be used in several  
queries, without disk writes.


Note this would completely solve the 

Re: [HACKERS] SQL compatibility reminder: MySQL vs PostgreSQL

2010-03-08 Thread Pierre C

As far as I can tell, we already do index skip scans:


This feature is great, but I was thinking about something else, like SELECT
DISTINCT, which currently does a seq scan even if the column is indexed.


Here is an example. In both cases it could use the index to skip all  
non-interesting rows, pulling only 69 rows from the heap instead of 120K.


EXPLAIN ANALYZE SELECT DISTINCT vente, type_id FROM annonces;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=15270.98..15271.82 rows=84 width=3) (actual time=113.277..113.288 rows=69 loops=1)
   ->  Seq Scan on annonces  (cost=0.00..14682.32 rows=117732 width=3) (actual time=0.005..76.069 rows=119655 loops=1)


EXPLAIN ANALYZE SELECT DISTINCT ON (vente, type_id) * FROM annonces;
                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..34926.90 rows=84 width=1076) (actual time=0.019..107.318 rows=69 loops=1)
   ->  Index Scan using annonces_type on annonces  (cost=0.00..34338.24 rows=117732 width=1076) (actual time=0.017..52.982 rows=119655 loops=1)




Re: [HACKERS] SQL compatibility reminder: MySQL vs PostgreSQL

2010-03-08 Thread Pierre C



Oh, this is what I believe MySQL calls loose index scans.  I'm


Exactly:
http://dev.mysql.com/doc/refman/5.0/en/loose-index-scan.html


actually looking into this as we speak,


Great! Will it support the famous top-n by category?


but there seems to be a
non-trivial amount of work to be done in order for this to work.


Regards,
Marko Tiikkaja





Re: [HACKERS] SQL compatibility reminder: MySQL vs PostgreSQL

2010-03-08 Thread Pierre C



So, if PHP devs don't have time to learn to do things right, then we
have to find time to learn to do things wrong? Seems like a nonsense
argument to me


The best reply I ever got from the phpBB guys, on I don't remember which
question, was:


"WE DO IT THIS WAY BECAUSE WE WANT TO SUPPORT MYSQL 3.x"

You can frame this and put it on your wall.



Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-18 Thread Pierre C

On Tue, 16 Feb 2010 15:22:00 +0100, Greg Stark gsst...@mit.edu wrote:


There's a second problem though. We don't actually know how long any
given query is going to take to plan or execute. We could just
remember how long it took to plan and execute last time or how long it
took to plan last time and the average execution time since we cached
that plan. Perhaps we should track the stddev of the execution plan,
or the max execution time of the plan? Ie there are still unanswered
questions about the precise heuristic to use but I bet we can come up
with something reasonable.


This could be an occasion to implement plan caching...

Web 2.0 and AJAX mean less need for heavy webpage reloads with (usually)
lots of queries, and more small, simple queries (like selects returning one
or a few rows) every time the user clicks on something.


See the benchmark here (PG 8.4.2, MySQL 5.1.37):
http://purity.bobfuck.net/posts/postgres/2010-02-Prep/

If prepared statements are used, MySQL is not faster for small, simple
selects...
However, when not using prepared statements, most of the postmaster CPU
time is spent parsing and planning.


The problem with prepared statements is that they're a chore to use in web
apps, especially in PHP, since after grabbing a connection from the pool
you don't know whether it already has prepared plans in it or not.


The postmaster could keep a hash of already-prepared plans, using the
$-parameterized query as the hash key, and when it receives a parse+bind
message, look it up in this cache and fetch the plan for the query,
avoiding planning entirely.
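
In sketch form (nothing here is an existing backend API; a tiny fixed-size
table with invented names stands in for whatever structure would really be
used), the lookup on parse+bind would be keyed by the raw $-parameterized
query text:

/* Toy sketch of a plan cache keyed by the $n-parameterized query text.
 * PreparedPlan and plan_query() are stand-ins for the real machinery. */
#include <stdlib.h>
#include <string.h>

typedef struct PreparedPlan PreparedPlan;      /* opaque here */
extern PreparedPlan *plan_query(const char *query_text);

#define PLAN_CACHE_SIZE 256

typedef struct PlanCacheEntry
{
    char         *query_text;   /* hash key: the $-parameterized SQL */
    PreparedPlan *plan;
} PlanCacheEntry;

static PlanCacheEntry plan_cache[PLAN_CACHE_SIZE];

static unsigned int
hash_string(const char *s)
{
    unsigned int h = 5381;

    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Look the query up; plan it and remember the result on a miss.  A real
 * implementation would also decide whether caching is worthwhile. */
static PreparedPlan *
get_cached_plan(const char *query_text)
{
    unsigned int    i = hash_string(query_text) % PLAN_CACHE_SIZE;
    PlanCacheEntry *e = &plan_cache[i];

    if (e->query_text == NULL || strcmp(e->query_text, query_text) != 0)
    {
        free(e->query_text);            /* evict whatever was there */
        e->query_text = strdup(query_text);
        e->plan = plan_query(query_text);
    }
    return e->plan;
}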


This could be done by the connection pooler too, but it doesn't have the
information to decide whether it's wise to cache a plan or not.


Of course, all the subtlety is in determining whether the plan is reusable
with other parameters...


- after planning and executing the query, only cache it if the planning
time is a significant part of the query time (as said previously)

- only simple queries should be automatically cached like this
- perhaps some measure of plan volatility? For the examples I give in
the link above, it's quite easy in at least 2 of the cases: searching
UNIQUE columns can't return more than one row, so volatility is zero; it
only depends on the table size.




Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-18 Thread Pierre C
On Thu, 18 Feb 2010 16:09:42 +0100, Dimitri Fontaine  
dfonta...@hi-media.com wrote:



Pierre C li...@peufeu.com writes:

Problem with prepared statements is they're a chore to use in web apps,
especially PHP, since after grabbing a connection from the pool, you  
don't

know if it has prepared plans in it or not.


Have you met preprepare yet?

  http://preprepare.projects.postgresql.org/README.html
  http://packages.debian.org/source/sid/preprepare

Regards,


Hey, this thing is nice.
How hard would it be to put a hook in PG so that, instead of raising an
error and cancelling the transaction when EXECUTEing a statement that is
not prepared, it would call a user function (of the user's choice) which
would, if possible, prepare said statement, or otherwise raise the error?
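
Something in the style of the existing planner/executor hooks would do.
A hedged sketch of what the hook point might look like follows; the hook
name, the surrounding function and every helper are invented, not existing
PostgreSQL code.

/* Invented sketch of an "EXECUTE of an unknown statement" hook, in the
 * style of existing backend hooks such as planner_hook.  Every function
 * other than the hook plumbing is a stand-in. */
#include <stdbool.h>
#include <stddef.h>

extern bool statement_is_prepared(const char *stmt_name);
extern void run_prepared_statement(const char *stmt_name);
extern void report_unknown_statement_error(const char *stmt_name);

typedef bool (*unknown_prepared_stmt_hook_type) (const char *stmt_name);

/* A loadable module (like preprepare) would set this from its _PG_init(). */
unknown_prepared_stmt_hook_type unknown_prepared_stmt_hook = NULL;

void
execute_named_statement(const char *stmt_name)
{
    if (!statement_is_prepared(stmt_name))
    {
        /* Give the hook a chance to prepare the statement on the fly;
         * if there is no hook, or it declines, raise the usual error. */
        if (unknown_prepared_stmt_hook == NULL ||
            !unknown_prepared_stmt_hook(stmt_name))
            report_unknown_statement_error(stmt_name);
    }
    run_prepared_statement(stmt_name);
}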




Re: [HACKERS] Avoiding bad prepared-statement plans.

2010-02-18 Thread Pierre C



What about catching the error in the application and INSERT'ing into the
current preprepare.relation table? The aim would be to do that in dev or
in pre-prod environments, then copy the table content in production.


Yep, but it's a bit awkward and time-consuming, and not quite suited to
ORM-generated requests, since you have to generate all the plan names when
the SQL query itself would be the most convenient unique identifier...


A cool hack would be something like this:

pg_execute( "SELECT ...", arguments... )

By inserting a hook which calls a user-specified function on a non-existent
plan instead of raising an error, this could work.
However, this wouldn't work as-is since the plan name must be <=
NAMEDATALEN, but you get the idea ;)

