Re: [HACKERS] Multithread Query Planner
> Not to mention palloc, another extremely fundamental and non-reentrant subsystem. Possibly we could work on making all that stuff re-entrant, but it would be a huge amount of work for a distant and uncertain payoff.

> Right. I think it makes more sense to try to get parallelism working first with the infrastructure we have. Converting to use threading, if we ever do it at all, should be something we view as a later performance optimization. But I suspect we won't want to do it anyway; I think there will be easier ways to get where we want to be.

Multithreading became fashionable with the arrival of dual-core CPUs a few years ago. However, multithreading as it is currently used has a huge problem: threads usually share all of their memory. This opens the door to an endless supply of hard-to-find bugs and, more importantly, defeats the purpose.

"Re-entrant palloc()" is nonsense. Suppose you make a reentrant palloc() which scales OK at 2 threads thanks to a cleverly placed atomic instruction. How is it going to scale on 64 cores? On HP's new 1000-core ARM server with non-uniform memory access? Probably it would suck very, very badly... not to mention the horror of multithreaded, exception-safe deallocation when one thread among many blows up on an error...

For the ultimate in parallelism, ask an FPGA guy. Is he using shared memory to wire together his 12000 DSP blocks? Nope, he's using isolated processes which share nothing and communicate through FIFOs and hardware message passing. Like shell pipes, basically. Or Erlang. Good parallelism = reduce shared state and communicate through data/message channels.

Shared-everything multithreading is going to be in a lot of trouble on future many-core machines. Incidentally, Postgres, with its processes sharing only what is needed, has a good head start...
With more and more cores coming, you guys are going to have to fight to reduce the quantity of state shared between processes, not increase it by switching to shared-memory threads!...

Say you want to parallelize sorting. Sorting is a black box with one input data pipe and one output data pipe. Data pipes are good for parallelism, just like FIFOs. FPGA guys love black boxes with FIFOs between them.

Say you manage to send tuples through a FIFO like zeromq. Now you can even run the sort on another machine and let it use all of that machine's RAM if you like. Now split the black box into two black boxes (qsort and merge), instantiate as many qsort boxes as necessary, and connect them together with pipes. Run some boxes on some of this machine's cores, other boxes on another machine, etc. That would be very flexible (and scalable).

Of course the black box has a small backdoor: some comparison functions can access shared state, which is basically *the* issue (not the reentrancy stuff, which you do not need).

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
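The qsort/merge black-box pipeline described above can be sketched with OS processes and FIFOs (a minimal illustration in Python, using multiprocessing queues in place of zeromq; all function names are hypothetical, and the real thing would stream tuples rather than whole chunks):

```python
import heapq
from multiprocessing import Process, Queue

def qsort_box(inbox: Queue, outbox: Queue) -> None:
    """Black box: read a chunk of tuples from a FIFO, sort it, emit it."""
    chunk = inbox.get()
    outbox.put(sorted(chunk))

def merge_box(inboxes, outbox: Queue) -> None:
    """Black box: merge the pre-sorted runs arriving on several FIFOs."""
    runs = [q.get() for q in inboxes]
    outbox.put(list(heapq.merge(*runs)))

def parallel_sort(data, workers: int = 4):
    chunks = [data[i::workers] for i in range(workers)]
    ins = [Queue() for _ in range(workers)]
    outs = [Queue() for _ in range(workers)]
    result = Queue()
    sorters = [Process(target=qsort_box, args=(i, o)) for i, o in zip(ins, outs)]
    merger = Process(target=merge_box, args=(outs, result))
    for p in sorters + [merger]:
        p.start()
    for q, chunk in zip(ins, chunks):
        q.put(chunk)            # feed each sort box through its input FIFO
    sorted_data = result.get()  # the only shared state is the pipes themselves
    for p in sorters + [merger]:
        p.join()
    return sorted_data

if __name__ == "__main__":
    print(parallel_sort([5, 3, 8, 1, 9, 2, 7, 4]))
```

Each box could just as well run on another machine; only the transport behind the queues would change, which is exactly the flexibility argued for above.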
Re: [HACKERS] Avoiding bad prepared-statement plans.
On Tue, 16 Feb 2010 15:22:00 +0100, Greg Stark wrote:

> There's a second problem though. We don't actually know how long any given query is going to take to plan or execute. We could just remember how long it took to plan and execute last time, or how long it took to plan last time and the average execution time since we cached that plan. Perhaps we should track the stddev of the execution time, or the max execution time of the plan? Ie there are still unanswered questions about the precise heuristic to use, but I bet we can come up with something reasonable.

This could be an occasion to implement plan caching...

Web 2.0 / AJAX means less need for heavy webpage reloads with (usually) lots of queries, and more small simple queries, like selects returning one or a few rows every time the user clicks on something. See the benchmark here (PG 8.4.2, MySQL 5.1.37): http://purity.bobfuck.net/posts/postgres/2010-02-Prep/

If prepared statements are used, MySQL is not faster for "small, simple selects"... However, when not using prepared statements, most of the postmaster CPU time is spent parsing & planning.

The problem with prepared statements is that they're a chore to use in web apps, especially PHP, since after grabbing a connection from the pool you don't know whether it has prepared plans in it or not. The postmaster could keep a hash of already-prepared plans, using the $-parameterized query text as the hash key; when it receives a parse+bind message, it would look up this cache and fetch the plans for the query, avoiding planning entirely. This could be done by the connection pooler too, but the pooler doesn't have the information to decide whether it's wise to cache a plan or not.

Of course all the subtlety is in determining whether the plan is reusable with other parameters:

- after planning and executing the query, only cache it if the plan time is a significant part of the query time (as said previously);
- only simple queries should be automatically cached like this;
- perhaps some measure of "plan volatility"?
For the examples I give in the link above, it's quite easy in at least 2 of the cases: searching UNIQUE columns can't return more than 1 row, so volatility is zero; it only depends on the table size.
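The caching heuristic suggested above (only cache a plan when planning dominates total query time) can be sketched as a toy model (Python; the class, threshold, and callbacks are all hypothetical illustrations, not PostgreSQL internals):

```python
import time

class PlanCache:
    """Toy model of a cache keyed by the $-parameterized query text.

    A plan is cached only when planning took a significant fraction of
    total time, which is the heuristic proposed in the thread."""

    def __init__(self, plan_fraction_threshold=0.5):
        self.threshold = plan_fraction_threshold
        self.plans = {}   # query text -> plan

    def execute(self, sql, params, plan_fn, exec_fn):
        plan = self.plans.get(sql)
        if plan is not None:
            return exec_fn(plan, params)   # planning skipped entirely
        t0 = time.perf_counter()
        plan = plan_fn(sql)
        plan_time = time.perf_counter() - t0
        t0 = time.perf_counter()
        result = exec_fn(plan, params)
        exec_time = time.perf_counter() - t0
        # Cache only if planning dominated the total cost.
        if plan_time >= self.threshold * (plan_time + exec_time):
            self.plans[sql] = plan
        return result
```

The "plan volatility" measure discussed above would slot in as an extra condition next to the time-fraction test.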
Re: [HACKERS] Avoiding bad prepared-statement plans.
On Thu, 18 Feb 2010 16:09:42 +0100, Dimitri Fontaine wrote:

> "Pierre C" writes: The problem with prepared statements is that they're a chore to use in web apps, especially PHP, since after grabbing a connection from the pool you don't know if it has prepared plans in it or not.

> Have you met preprepare yet?
> http://preprepare.projects.postgresql.org/README.html
> http://packages.debian.org/source/sid/preprepare

Hey, this thing is nice. How hard would it be to put a hook in pg so that, instead of raising an error and cancelling the txn when EXECUTEing a statement that is not prepared, it would call a user function (of the user's choice) which would, if possible, prepare said statement, or otherwise raise the error?
Re: [HACKERS] Avoiding bad prepared-statement plans.
> What about catching the error in the application and INSERTing into the current preprepare.relation table? The aim would be to do that in dev or pre-prod environments, then copy the table contents into production.

Yep, but it's a bit awkward and time-consuming, and not quite suited to ORM-generated requests, since you have to generate all the plan names, when the SQL query itself would be the most convenient "unique identifier"...

A cool hack would be something like this: pg_execute( "SELECT ...", arguments... ). By inserting a hook which calls a user-specified function on a non-existing plan instead of raising an error, this could work. However, it wouldn't work as-is, since the plan name must be <= NAMEDATALEN, but you get the idea ;)
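Client-side, the NAMEDATALEN problem mentioned above can be worked around today by deriving the plan name from a hash of the query text, so the SQL string effectively stays the unique identifier (a sketch in Python; the connection API shape, `prepared` set, and method names are hypothetical stand-ins for a real driver):

```python
import hashlib

NAMEDATALEN = 64  # PostgreSQL identifier length limit

def plan_name(sql: str) -> str:
    """Derive a stable statement name from the SQL text itself,
    short enough to always satisfy NAMEDATALEN."""
    return "p_" + hashlib.sha1(sql.encode()).hexdigest()[:16]

def pg_execute(conn, sql, params):
    """Prepare-on-demand: prepare the statement the first time this
    connection sees this query text, then always EXECUTE by name."""
    name = plan_name(sql)
    if name not in conn.prepared:       # hypothetical per-connection set
        conn.prepare(name, sql)         # hypothetical driver call
        conn.prepared.add(name)
    return conn.execute_prepared(name, params)
```

With a server-side hook as proposed above, even the client-side bookkeeping would disappear.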
Re: [HACKERS] SQL compatibility reminder: MySQL vs PostgreSQL
> My opinion is that PostgreSQL should accept any MySQL syntax and return warnings. I believe that we should accept even InnoDB syntax and turn it immediately into PostgreSQL tables. This would allow people with no interest in SQL to migrate from MySQL to PostgreSQL without any harm.

A solution would be an SQL proxy (a la pgpool) with query rewriting.

> PHP developers don't have time to invest in learning deep SQL.

This is true, and it is a big problem IMHO. It results in lots of slow, broken, insecure database designs. All the web apps I've done "CPR resuscitation" on follow the same pattern:

- devs are database noobs
- generous use of MyISAM
- numerous queries, most of them unoptimized and/or useless
- use of Apache/mod_php instead of FastCGI
- sometimes, use of a huge, slow, bloated CMS/"framework" which issues even more unoptimized and/or useless SQL queries
- site gains popularity
- huge traffic takes an unprepared team by surprise (they've never heard of things like concurrency or scaling)
- site fails horribly

That said, I've got a 150,000+ member forum running on MySQL with sub-5 ms page times on a low-end server; it works if you do it right.

Most open-source PHP app developers have to expend a lot of effort to work around MyISAM's lack of foreign keys and constraints. If those resources could be directed to useful work instead of being wasted like this, the result would be a lot better. The irony is that even with all that effort, you can't make a web app work without transactions; sooner or later your database integrity will fail.

My theory on this is simple:

- PHP is a very weak language, not suited to implementing really useful frameworks (unlike Python / Ruby). Example: find an ORM for PHP that is as good as sqlalchemy. It does not exist, because it is impossible to do.
-> Really smart programmers dislike PHP because it is a pretty weak language, so they all flee to Python, Ruby, etc. All big PHP applications turn into a huge "usine a gaz" (gas factory) of impossible-to-understand code, because of the language's weakness.

- Really smart DBAs dislike MySQL (unless they have a nice paying job at Facebook or Flickr).

So it is very difficult to find good PHP developers, and especially ones with database knowledge.

> IMHO, PostgreSQL has to be more flexible (in psychological terms) to understand MySQL user needs and answer them, just to give them a choice to migrate to PostgreSQL.

The problem is, as you mentioned above, most PHP developers don't know what their "needs" are, because they have little database expertise. As for things MySQL does that I would like Postgres to implement, I'd focus on features, not syntax:

- some form of index-only scans or equivalent (the visibility map would probably suffice)
- some form of INSERT ON DUPLICATE KEY UPDATE or equivalent (MERGE...) where the DB, not me, takes care of concurrency
- some way to "SELECT a,b,c,d GROUP BY a" when it can be determined that it is equivalent to "GROUP BY a,b,c,d", i.e. a is UNIQUE NOT NULL
- index skip scans (well, MySQL doesn't really do index skip scans, but since it can do index-only scans, it's an approximation)
- simpler syntax for DELETEs using JOINs

And while I'm at it, I'll add my pet feature: an extremely fast form of temporary storage.

Table main is referenced by tables child1, child2, ..., childN.

SELECT ... FROM main WHERE (very complex condition involving gist coordinate search etc.) ORDER BY ...

Then I want the rows from the child tables which reference those results. If I add a lot of JOINs to my query, it's entirely possible that the (very complex condition involving gist coordinate search etc.) is mis-estimated. This is generally not a problem in itself, since it usually uses bitmap index scans, which can survive lots of abuse. However, it causes mis-planning of the JOINs, which is a problem.
Besides, some of the child tables have few rows but lots of columns, so they complicate the query and return the same data many times over, which the ORM doesn't care about, since it would rather instantiate 1 object per referenced table row instead of 1 object per main table row. I would like to do:

CREATE TEMP TABLE foo AS SELECT ... FROM main WHERE (very complex condition involving gist coordinate search etc.);
ANALYZE foo;
SELECT * FROM foo ORDER BY ...;
SELECT c.* FROM foo JOIN child1 c ON (...);
SELECT c.* FROM foo JOIN child2 c ON (...);
etc.

This splits the query into much-easier-to-manage fragments, and the results are easier to use, too: I can store only 1 object per child table row in the application. But I can't do this, because it causes an update of the system catalogs (slow, iowait, and bloat).

Basically it would be nice to have "something" (temp table, cursor, CTE, tuplestore, whatever) that can hold a short-lived result set, can be used like a table, can have accurate statistics, and can be used in several queries, without disk writes. Note this would completely sol
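The materialize-once, reuse-many-times pattern above can be demonstrated end-to-end with any SQL engine; a minimal self-contained illustration using Python's sqlite3 (the schema and data are invented for the example, and the ANALYZE step is only meaningful in PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main   (id INTEGER PRIMARY KEY, x REAL);
    CREATE TABLE child1 (id INTEGER PRIMARY KEY, main_id INTEGER, payload TEXT);
    INSERT INTO main   VALUES (1, 0.5), (2, 3.0), (3, 0.1);
    INSERT INTO child1 VALUES (10, 1, 'a'), (11, 3, 'b'), (12, 2, 'c');
""")

# Materialize the expensive filter once...
conn.execute("CREATE TEMP TABLE foo AS SELECT * FROM main WHERE x < 1.0")
# (in PostgreSQL one would ANALYZE foo here to get accurate statistics)

# ...then reuse it in several simple, separately planned queries.
main_rows = conn.execute("SELECT id FROM foo ORDER BY x").fetchall()
child_rows = conn.execute(
    "SELECT c.payload FROM foo JOIN child1 c ON c.main_id = foo.id ORDER BY c.id"
).fetchall()
print(main_rows, child_rows)
```

The complaint in the message is precisely that in PostgreSQL the CREATE TEMP TABLE step touches the system catalogs, which is what makes this otherwise attractive pattern expensive.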
Re: [HACKERS] SQL compatibility reminder: MySQL vs PostgreSQL
> As far as I can tell, we already do index skip scans:

This feature is great, but I was thinking about something else, like SELECT DISTINCT, which currently does a seq scan even if the column is indexed. Here is an example. In both cases it could use the index to skip all the non-interesting rows, pulling only 69 rows from the heap instead of 120K.

EXPLAIN ANALYZE SELECT DISTINCT vente, type_id FROM annonces;
                                QUERY PLAN
---------------------------------------------------------------------------
 HashAggregate  (cost=15270.98..15271.82 rows=84 width=3) (actual time=113.277..113.288 rows=69 loops=1)
   ->  Seq Scan on annonces  (cost=0.00..14682.32 rows=117732 width=3) (actual time=0.005..76.069 rows=119655 loops=1)

EXPLAIN ANALYZE SELECT DISTINCT ON( vente, type_id ) * FROM annonces;
                                QUERY PLAN
---------------------------------------------------------------------------
 Unique  (cost=0.00..34926.90 rows=84 width=1076) (actual time=0.019..107.318 rows=69 loops=1)
   ->  Index Scan using annonces_type on annonces  (cost=0.00..34338.24 rows=117732 width=1076) (actual time=0.017..52.982 rows=119655 loops=1)
Re: [HACKERS] SQL compatibility reminder: MySQL vs PostgreSQL
> Oh, this is what I believe MySQL calls "loose index scans".

Exactly: http://dev.mysql.com/doc/refman/5.0/en/loose-index-scan.html

> I'm actually looking into this as we speak,

Great! Will it support the famous "top-N by category"?

> but there seems to be a non-trivial amount of work to be done in order for this to work.
>
> Regards, Marko Tiikkaja
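The loose index scan being discussed is easy to see in miniature: instead of reading every index entry, fetch one entry per distinct key and then jump past that key. A sketch in Python, with a sorted list standing in for the btree (illustrative only; a real implementation re-descends the tree rather than bisecting an array):

```python
from bisect import bisect_right

def loose_index_scan(index):
    """Return the distinct keys of a sorted sequence, visiting one entry
    per distinct key instead of every entry (the 'skip' in skip scan)."""
    distinct, pos = [], 0
    while pos < len(index):
        key = index[pos]
        distinct.append(key)
        # btree equivalent: re-descend to the first entry greater than key
        pos = bisect_right(index, key, pos)
    return distinct
```

For the DISTINCT example in the previous message, this is how 119655 heap fetches would collapse to 69.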
Re: [HACKERS] SQL compatibility reminder: MySQL vs PostgreSQL
> So, if PHP devs don't have time to learn to do things right, then we have to find time to learn to do things wrong? Seems like a nonsense argument to me.

The best reply I ever got from the phpBB guys, on I don't remember which question, was: "WE DO IT THIS WAY BECAUSE WE WANT TO SUPPORT MYSQL 3.x". You can frame this and put it on your wall.
Re: [HACKERS] Parallel pg_dump for 9.1
On Tue, 30 Mar 2010 13:01:54 +0200, Peter Eisentraut wrote:

> On tis, 2010-03-30 at 08:39 +0200, Stefan Kaltenbrunner wrote:
>> on fast systems pg_dump is completely CPU bottlenecked
> Might be useful to profile why that is. I don't think pg_dump has historically been developed with CPU efficiency in mind.

Already done that (I had posted some WIP patches to speed up COPY; hopefully I'll have time to finish those one day ;). Most of the time spent in the postmaster process during "COPY TO" is in the datum -> string conversion functions.
Re: [HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
On Sunday 30 May 2010 18:29:31 Greg Stark wrote:

> On Sun, May 30, 2010 at 4:54 AM, Tom Lane wrote:
>> I read through that thread and couldn't find much discussion of alternative CRC implementations --- we spent all our time on arguing about whether we needed 64-bit CRC or not.

SSE4.2 has a hardware CRC32 instruction; this might be interesting to use...
Re: [HACKERS] Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
> The linux kernel also uses it when it's available, see e.g. http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/arch/x86/crypto/crc32c-intel.c

If you guys are interested, I have a Core i7 here; I could run a little benchmark.
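One caveat worth noting when benchmarking: the SSE4.2 crc32 instruction computes CRC-32C (the Castagnoli polynomial, reflected form 0x82F63B78), which is a different polynomial from the classic CRC-32. A bit-at-a-time software version makes a handy correctness baseline for comparing against the hardware result (sketch in Python; slow by design):

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-32C (Castagnoli), the polynomial implemented by
    the SSE4.2 crc32 instruction. Useful as a correctness baseline."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # reflected Castagnoli polynomial 0x82F63B78
            crc = (crc >> 1) ^ (0x82F63B78 * (crc & 1))
    return crc ^ 0xFFFFFFFF

print(hex(crc32c(b"123456789")))  # standard CRC-32C check input
```

So a switch to the hardware instruction would also be an on-disk format change, not just a speedup.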
Re: [HACKERS] streaming replication breaks horribly if master crashes
> The real problem here is that we're sending records to the slave which might cease to exist on the master if it unexpectedly reboots. I believe that what we need to do is make sure that the master only sends WAL it has already fsync'd.

How about this:

- pg records somewhere the xlog position of the last record synced to disk. I don't remember the variable name, so let's just call it xlog_synced_recptr.
- pg always writes the xlog first, i.e. before writing any page it checks that the page's xlog recptr < xlog_synced_recptr, and if that's not the case it has to wait before it can write the page.

Now:

- the master sends messages to the slave with the xlog_synced_recptr after each fsync;
- the slave gets these messages and records the master_xlog_synced_recptr;
- the slave doesn't write any page to disk until BOTH the slave's local WAL copy AND the master's WAL have reached the recptr of this page.

If the master crashes or the slave loses the connection, the in-memory pages of the slave could be in a state that is "in the future" compared to the master's state when it comes back up. Therefore, when a slave detects that the master has crashed, it could shoot itself and recover from WAL, at which point the slave will no longer be "in the future" relative to the master; rather, it would be in the past, which is a lot less problematic... Of course this wouldn't speed up the failover process!...

> I think we should also change the slave to panic and shut down immediately if its xlog position is ahead of the master. That can never be a watertight solution because you can always advance the xlog position on the master and mask the problem. But I think we should do it anyway, so that we at least have a chance of noticing that we're hosed. I wish I could think of something a little more watertight...

If a slave is "in the future" relative to the master, then the only way to keep using this slave could be to make it the new master...
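The page-write rule proposed above (never write out a page unless both the local and the master's fsync'd WAL have passed that page's recptr) can be modelled in a few lines (a toy simulation in Python; the names follow the message's own hypothetical naming, and recptrs are simple integers here):

```python
class Slave:
    """Toy model of the proposal: a page may be written to disk only once
    BOTH the local WAL copy and the master's fsync'd WAL have reached
    that page's recptr, so neither side can ever 'lose' needed WAL."""

    def __init__(self):
        self.local_wal_recptr = 0           # WAL locally synced to disk
        self.master_xlog_synced_recptr = 0  # last fsync position reported by master
        self.dirty = {}                     # page -> recptr of last WAL touching it
        self.on_disk = set()

    def note_master_fsync(self, recptr):
        self.master_xlog_synced_recptr = recptr
        self.try_flush()

    def note_local_wal_sync(self, recptr):
        self.local_wal_recptr = recptr
        self.try_flush()

    def try_flush(self):
        horizon = min(self.local_wal_recptr, self.master_xlog_synced_recptr)
        for page, recptr in list(self.dirty.items()):
            if recptr <= horizon:   # safe on both sides of the replication link
                self.on_disk.add(page)
                del self.dirty[page]
```

The in-memory state can still run "ahead" of the master, but nothing ahead of the shared horizon ever reaches the slave's disk, which is the crux of the argument.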
Re: [HACKERS] pessimal trivial-update performance
> The problem can generally be written as "tuples seeing multiple updates in the same transaction"? I think that every time PostgreSQL is used with an ORM, there is a certain amount of multiple updates taking place. I have actually been reworking things client-side to get around multiple updates, since they popped up in one of my profiling runs. Although the time I optimized away ended up being both "roundtrip time" + "update time", having the database do half of it transparently might have been sufficient to leave me with a bigger problem elsewhere... To sum up: yes, I think it is indeed a real-world case. Jesper

On the Python side, Elixir and SQLAlchemy have an excellent way of handling this: basically, when you start a transaction, all changes are accumulated in a "session" object and only flushed to the database on session commit (which is also generally the transaction commit). This has multiple advantages; for instance it is able to issue multi-row statements, updates are only done once, you save a lot of roundtrips, etc. Of course this is mostly incompatible with database triggers, so if there are triggers the ORM needs to be told about them.
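The session behaviour described above, a unit of work that coalesces repeated updates to the same row into a single statement at commit, can be sketched like this (a toy model in Python; this is not the actual SQLAlchemy internals, and the SQL shape is illustrative):

```python
class Session:
    """Toy unit-of-work: UPDATEs are accumulated per (table, pk) and
    collapsed into one statement per row, emitted only at commit."""

    def __init__(self, execute_sql):
        self.execute_sql = execute_sql
        self.pending = {}   # (table, pk) -> merged column changes

    def update(self, table, pk, **changes):
        # Repeated updates to the same row just merge into one dict.
        self.pending.setdefault((table, pk), {}).update(changes)

    def commit(self):
        for (table, pk), changes in self.pending.items():
            cols = ", ".join(f"{c} = %s" for c in changes)
            self.execute_sql(
                f"UPDATE {table} SET {cols} WHERE id = %s",
                list(changes.values()) + [pk],
            )
        self.pending.clear()

statements = []
session = Session(lambda sql, args: statements.append((sql, args)))
session.update("users", 1, name="alice")
session.update("users", 1, email="alice@example.com")  # same row: merged
session.commit()
print(len(statements))  # a single UPDATE reached the database
```

This is exactly why the round trips and the duplicate update disappear, and also why triggers that fire per statement can be surprised by it.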
Re: [HACKERS] Inlining comparators as a performance optimisation
On Wed, 21 Sep 2011 18:13:07 +0200, Tom Lane wrote:

> Heikki Linnakangas writes:
>> On 21.09.2011 18:46, Tom Lane wrote:
>>> The idea that I was toying with was to allow the regular SQL-callable comparison function to somehow return a function pointer to the alternate comparison function,
>> You could have a new function with a pg_proc entry, that just returns a function pointer to the qsort-callback.
> Yeah, possibly. That would be a much more invasive change, but cleaner in some sense. I'm not really prepared to do all the legwork involved in that just to get to a performance-testable patch though.

A few years ago I had looked for a way to speed up COPY operations, and it turned out that COPY TO has a good optimization opportunity. At that time, for each datum, COPY TO would:

- test for nullness;
- call an outfunc through fmgr;
- the outfunc pallocs() a bytea or text, fills it with data, and returns it (sometimes it uses an extensible string buffer which may be repalloc()d several times);
- COPY memcpy()s the returned data to a buffer and eventually flushes the buffer to the client socket.

I introduced a special write buffer with an on-flush callback (i.e. a close relative of the existing string buffer; in this case the callback was "flush to client socket"), plus several outfuncs (one per type) which took that buffer as an argument, besides the datum to output, and simply put the datum inside the buffer, with appropriate transformations (like converting to bytea or text), flushing if needed. Then the COPY TO BINARY of a constant-size datum turns into:

- one test for nullness;
- one C function call;
- one test that appropriate space is available in the buffer (flush if needed);
- one htonl() and a memcpy of constant size, which the compiler turns into a couple of simple instructions.

I recall measuring speedups of 2x-8x on COPY BINARY, less for text, but still large gains.
Although eliminating the fmgr call and palloc overhead was an important part of it, another large part was getting rid of memcpy()s, which the compiler turned into simple movs for known-size types, a transformation that can be done only if the buffer write functions are inlined inside the outfuncs. Compilers love constants...

Additionally, code size growth was minimal, since I moved the old outfunc code into the new outfuncs, and replaced the old fmgr-callable outfuncs with "create a buffer whose on-full callback is extend_and_repalloc(), pass it to the new outfunc(buffer, datum), return the buffer". Which is basically equivalent to the previous palloc()-based code, maybe with a few extra instructions.

When I submitted the patch for review, Tom rightfully pointed out that my way of obtaining the C function pointer sucked very badly (I don't remember how I did it, only that it was butt-ugly), but the idea was to get a quick measurement of what could be gained, and the result was positive. Unfortunately I had no time available to finish it and turn it into a real patch; I'm sorry about that.

So why do I post in this sorting topic? It seems that, by bypassing fmgr for functions which are small, simple, and called lots of times, there is a large gain to be made, not only because of fmgr overhead but also because of the opportunity for new compiler optimizations, palloc removal, etc. However, in my experiment the argument and return types of the new functions were DIFFERENT from the old functions: the new ones do the same thing, but in a different manner. One manner was suited to SQL-callable functions (i.e. palloc and return a bytea) and the other to writing large amounts of data (direct buffer writes). Since the two have very different requirements, being fast at both is impossible for the same function.
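The write-buffer-with-callback idea running through this message is easy to model (a sketch in Python; the real code is C inside the backend, and every name here is hypothetical). The point is that the same outfunc code can either stream to a socket or accumulate into an allocated string, depending solely on which flush callback the caller installs:

```python
class FlushBuffer:
    """Write buffer with an on-flush callback: the outfunc appends fixed-size
    binary values, and full buffers are handed to the callback, which may
    write to a socket, extend a string, etc."""

    def __init__(self, size, on_flush):
        self.buf = bytearray()
        self.size = size
        self.on_flush = on_flush

    def ensure(self, n):
        # Flush when fewer than n bytes of room remain.
        if len(self.buf) + n > self.size:
            self.flush()

    def write_int32(self, value):
        self.ensure(4)
        self.buf += value.to_bytes(4, "big")   # htonl() equivalent

    def flush(self):
        if self.buf:
            self.on_flush(bytes(self.buf))
            self.buf.clear()

sent = []
out = FlushBuffer(8, sent.append)   # "socket" that records flushed chunks
for v in (1, 2, 3):
    out.write_int32(v)              # constant-size fast path, no allocation
out.flush()
```

Swapping `sent.append` for a repalloc-style grow-the-string callback is what lets the fmgr-callable wrapper share the same outfunc body, as described above.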
Anyway, all that rant boils down to this: some functions could benefit from having two versions (while sharing almost all of their code):

- a user-callable (fmgr) version (the current one);
- a C-callable version, usually with different parameters and return type.

And it would be cool to have a way to grab a bare function pointer to the second one.

- Maybe an extra column in pg_proc would do (but then proargtypes and friends would describe only the SQL-callable version)?
- Or an extra table, pg_cproc?
- Or an in-memory hash: hashtable[fmgr-callable function] => C version. But what happens if a C function has no SQL-callable equivalent?
- Or (ugly) introduce an extra per-type function type_get_function_ptr( function_kind ) which returns the requested function pointer.

If one of those happens, I'll dust off my old copy-optimization patch ;)

Hmm... just my 2c

Regards
Pierre