[PERFORM] Index creation time and distribution
Hi -performance, I experienced this morning a performance problem when we imported a dump in a 8.1 database. The table is 5 millions rows large and when the dump creates an index on a specific text column called clazz it takes 27 minutes while on the other columns, it only takes a couple of seconds: LOG: duration: 1636301.317 ms statement: CREATE INDEX index_journal_clazz ON journal USING btree (clazz); LOG: duration: 20613.009 ms statement: CREATE INDEX index_journal_date ON journal USING btree (date); LOG: duration: 10653.290 ms statement: CREATE INDEX index_journal_modifieur ON journal USING btree (modifieur); LOG: duration: 15031.579 ms statement: CREATE INDEX index_journal_objectid ON journal USING btree (objectid); The only weird thing about this column is that 4.7 millions of rows have the exact same value. A partial index excluding this value is really fast to create but, as the database is used via JDBC and prepared statements, this index is totally useless (the plan is created before the BIND so it can't use the partial index). FWIW we can't use ?protocolVersion=2 with this application so it's not an option. As part of the deployment process of this application, we often need to drop/create/restore the database and 25 minutes is really longer than we can afford. So my questions are: - is the index creation time so correlated with the distribution? I was quite surprised by this behaviour. The time is essentially CPU time. - if not, what can I check to diagnose this problem? - IIRC, 8.3 could allow me to use the partial index as the query should be planned after the BIND (plans are unnamed). Am I right? Thanks for any input. -- Guillaume -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index creation time and distribution
"Guillaume Smet" <[EMAIL PROTECTED]> writes: > I experienced this morning a performance problem when we imported a > dump in a 8.1 database. > The table is 5 millions rows large and when the dump creates an index > on a specific text column called clazz it takes 27 minutes while on > the other columns, it only takes a couple of seconds: > The only weird thing about this column is that 4.7 millions of rows > have the exact same value. Do you have maintenance_work_mem set large enough that the index creation sort is done in-memory? 8.1 depends on the platform's qsort and a lot of them are kinda pessimal for input like this. 8.2 (which uses our own qsort) seems to perform better in a quick test. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index creation time and distribution
On Thu, May 22, 2008 at 3:14 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > Do you have maintenance_work_mem set large enough that the index > creation sort is done in-memory? 8.1 depends on the platform's qsort > and a lot of them are kinda pessimal for input like this. FWIW, it's a 32 bits CentOS 4.6 box. maintenance_work_mem is set to 256 MB and the size of the index is 400 MB. Should I try to raise it up to 512 MB? The server only has 2GB of RAM so it seems a bit high. > 8.2 (which uses our own qsort) seems to perform better in a quick > test. Mmmmh OK. I was considering an upgrade to 8.3 in the next months anyway. Do we agree that in the case of unnamed prepared statement, 8.3 plans the query after the BIND? The partial index seems to be a better solution anyway, considering that it's 12 MB vs 400 MB. Thanks. -- Guillaume -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index creation time and distribution
On Thu, 22 May 2008, Tom Lane wrote: Do you have maintenance_work_mem set large enough that the index creation sort is done in-memory? 8.1 depends on the platform's qsort and a lot of them are kinda pessimal for input like this. Looking at the fact that other indexes on the same table are created quickly, it seems that the maintenance_work_mem isn't the issue - the sort algorithm is. Having lots of elements the same value is a worst-case-scenario for a naive quicksort. I am in the middle of testing sorting algorithms for a performance lecture I'm going to give, and one of the best algorithms I have seen yet is that used in Java's java.util.Arrays.sort(). I haven't been able to beat it with any other comparison sort yet (although I have beaten it with a bucket sort, but I wouldn't recommend such an algorithm for a database). From the JavaDoc: The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993). This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance. Matthew -- First law of computing: Anything can go wro sig: Segmentation fault. core dumped. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index creation time and distribution
On Thu, May 22, 2008 at 6:32 AM, Guillaume Smet <[EMAIL PROTECTED]> wrote: > Hi -performance, > > > LOG: duration: 1636301.317 ms statement: CREATE INDEX > index_journal_clazz ON journal USING btree (clazz); > LOG: duration: 20613.009 ms statement: CREATE INDEX > index_journal_date ON journal USING btree (date); Just curious, what happens if you create the date index first, then the clazz one? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index creation time and distribution
On Thu, May 22, 2008 at 6:50 PM, Scott Marlowe <[EMAIL PROTECTED]> wrote: > Just curious, what happens if you create the date index first, then > the clazz one? It's not due to any cache effect if it's your question. It's mostly CPU time and changing the order doesn't change the behaviour. I'll make some tests with 8.3 in a few weeks (I'll be out of town next week) to see if using PostgreSQL qsort reduces the problem. -- Guillaume -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index creation time and distribution
"Guillaume Smet" <[EMAIL PROTECTED]> writes: > On Thu, May 22, 2008 at 3:14 PM, Tom Lane <[EMAIL PROTECTED]> wrote: >> Do you have maintenance_work_mem set large enough that the index >> creation sort is done in-memory? 8.1 depends on the platform's qsort >> and a lot of them are kinda pessimal for input like this. > maintenance_work_mem is set to 256 MB and the size of the index is 400 MB. > Should I try to raise it up to 512 MB? The server only has 2GB of RAM > so it seems a bit high. Hmm, that's most likely not going to be enough to get it to do an in-memory sort ... try turning on trace_sort to see. But anyway, if you are in the on-disk sort regime, 8.3 is only going to be marginally faster for such a case --- it's going to have to write all the index entries out and read 'em back in anyway. >> 8.2 (which uses our own qsort) seems to perform better in a quick >> test. > Mmmmh OK. I was considering an upgrade to 8.3 in the next months anyway. > Do we agree that in the case of unnamed prepared statement, 8.3 plans > the query after the BIND? The partial index seems to be a better > solution anyway, considering that it's 12 MB vs 400 MB. Ermm .. this is in fact mostly broken in 8.3.0 and 8.3.1. If you don't want to wait for 8.3.2, you need this patch: http://archives.postgresql.org/pgsql-committers/2008-03/msg00566.php regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index creation time and distribution
On Thu, May 22, 2008 at 9:18 PM, Tom Lane <[EMAIL PROTECTED]> wrote: > Ermm .. this is in fact mostly broken in 8.3.0 and 8.3.1. If you don't > want to wait for 8.3.2, you need this patch: > http://archives.postgresql.org/pgsql-committers/2008-03/msg00566.php That's what I had in mind. We have to test a lot of things before even considering an upgrade so that's not really a problem for us to wait for 8.3.2. Thanks. -- Guillaume -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] "Big O" notation for postgres?
"Richard Huxton" <[EMAIL PROTECTED]> writes: > Jonah H. Harris wrote: >> On Wed, May 21, 2008 at 10:10 AM, H. Hall <[EMAIL PROTECTED]> wrote: >>> Does anyone know if there is a source that provides "Big O" notation for >>> postgres's aggregate functions and operations? For example is count(*) = >>> O(1) or O(n)? >> >> I don't know of any document containing the complexity of each >> aggregate, but it's sometimes left as a comment in the souce code. > > Recent max() and min() can be O(n) or O(1) depending on the where-clause and > presence of an index too, just to muddy the waters. Hm, true. But excluding those special cases all Postgres aggregate functions will be O(n) unless they're doing something very odd. None of the built-in functions (except min/max as noted) do anything odd like that. The reason way is because of the basic design of Postgres aggregate functions. They are fed every tuple one by one and have to keep their state in a single variable. Most of the aggregate functions like count(*) etc just keep a static non-growing state and the state transition function is a simple arithmetic operation which is O(1). So the resulting operation is O(n). Actually one exception would be something like CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}'); Since the state variable has to keep accumulating more and more stuff the array_append becomes more and more expensive (it has to generate a new array so it has to copy the existing stuff). So actually it woul dbe O(n^2). The only builtin aggregate which looks like it falls in this category would be xmlagg() -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] I/O on select count(*)
Hi Hannu, Interesting suggestion on the partial index! I'll find out if we can extract our code that did the work. It was simple but scattered in a few routines. In concept it worked like this: 1 - Ignore if hint bits are unset, use them if set. This affects heapam and vacuum I think. 2 - implement a cache for clog lookups based on the optimistic assumption that the data was inserted in bulk. Put the cache one call away from heapgetnext() I forget the details of (2). As I recall, if we fall off of the assumption, the penalty for long scans get large-ish (maybe 2X), but since when do people full table scan when they're updates/inserts are so scattered across TIDs? It's an obvious big win for DW work. We also have a GUC to turn it off if needed, in which case a vacuum will write the hint bits. - Luke - Original Message - From: Hannu Krosing <[EMAIL PROTECTED]> To: Luke Lonergan Cc: Pavan Deolasee <[EMAIL PROTECTED]>; Greg Smith <[EMAIL PROTECTED]>; Alvaro Herrera <[EMAIL PROTECTED]>; pgsql-performance@postgresql.org Sent: Thu May 22 12:10:02 2008 Subject: Re: [PERFORM] I/O on select count(*) On Thu, 2008-05-15 at 10:52 +0800, Luke Lonergan wrote: > BTW – we’ve removed HINT bit checking in Greenplum DB and improved the > visibility caching which was enough to provide performance at the same > level as with the HINT bit optimization, but avoids this whole “write > the data, write it to the log also, then write it again just for good > measure” behavior. > > For people doing data warehousing work like the poster, this Postgres > behavior is miserable. It should be fixed for 8.4 for sure > (volunteers?) I might try it. I think I have told you about my ideas ;) I plan to first do "cacheing" (for being able to doi index only scans among other things) and then if the cache works reliably, use the "cacheing" code as the main visibility / MVCC mechanism. Is Greenplums code available, or should I roll my own ? > BTW – for the poster’s benefit, you should implement partitioning by > date, then load each partition and VACUUM ANALYZE after each load. > You probably won’t need the date index anymore – so your load times > will vastly improve (no indexes), you’ll store less data (no indexes) > and you’ll be able to do simpler data management with the partitions. > > You may also want to partition AND index if you do a lot of short > range selective date predicates. Example would be: partition by day, > index on date field, queries selective on date ranges by hour will > then select out only the day needed, then index scan to get the > hourly values. If your queries allow it, you may try indexing on int2::extract('HOUR' from date) so the index may be smaller storing the date as type abstime is another way to reduce index size. > Typically time-oriented data is nearly time sorted anyway, so you’ll > also get the benefit of a clustered index. Hannu