[PERFORM] Index creation time and distribution

2008-05-22 Thread Guillaume Smet
Hi -performance,

I experienced this morning a performance problem when we imported a
dump in a 8.1 database.

The table is 5 millions rows large and when the dump creates an index
on a specific text column called clazz it takes 27 minutes while on
the other columns, it only takes a couple of seconds:
LOG:  duration: 1636301.317 ms  statement: CREATE INDEX
index_journal_clazz ON journal USING btree (clazz);
LOG:  duration: 20613.009 ms  statement: CREATE INDEX
index_journal_date ON journal USING btree (date);
LOG:  duration: 10653.290 ms  statement: CREATE INDEX
index_journal_modifieur ON journal USING btree (modifieur);
LOG:  duration: 15031.579 ms  statement: CREATE INDEX
index_journal_objectid ON journal USING btree (objectid);

The only weird thing about this column is that 4.7 millions of rows
have the exact same value. A partial index excluding this value is
really fast to create but, as the database is used via JDBC and
prepared statements, this index is totally useless (the plan is
created before the BIND so it can't use the partial index). FWIW we
can't use ?protocolVersion=2 with this application so it's not an
option.

As part of the deployment process of this application, we often need
to drop/create/restore the database and 25 minutes is really longer
than we can afford.

So my questions are:
- is the index creation time so correlated with the distribution? I
was quite surprised by this behaviour. The time is essentially CPU
time.
- if not, what can I check to diagnose this problem?
- IIRC, 8.3 could allow me to use the partial index as the query
should be planned after the BIND (plans are unnamed). Am I right?

Thanks for any input.

-- 
Guillaume

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Tom Lane
"Guillaume Smet" <[EMAIL PROTECTED]> writes:
> I experienced this morning a performance problem when we imported a
> dump in a 8.1 database.
> The table is 5 millions rows large and when the dump creates an index
> on a specific text column called clazz it takes 27 minutes while on
> the other columns, it only takes a couple of seconds:
> The only weird thing about this column is that 4.7 millions of rows
> have the exact same value.

Do you have maintenance_work_mem set large enough that the index
creation sort is done in-memory?  8.1 depends on the platform's qsort
and a lot of them are kinda pessimal for input like this.

8.2 (which uses our own qsort) seems to perform better in a quick
test.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Guillaume Smet
On Thu, May 22, 2008 at 3:14 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Do you have maintenance_work_mem set large enough that the index
> creation sort is done in-memory?  8.1 depends on the platform's qsort
> and a lot of them are kinda pessimal for input like this.

FWIW, it's a 32 bits CentOS 4.6 box.

maintenance_work_mem is set to 256 MB and the size of the index is 400 MB.

Should I try to raise it up to 512 MB? The server only has 2GB of RAM
so it seems a bit high.

> 8.2 (which uses our own qsort) seems to perform better in a quick
> test.

Mmmmh OK. I was considering an upgrade to 8.3 in the next months anyway.

Do we agree that in the case of unnamed prepared statement, 8.3 plans
the query after the BIND? The partial index seems to be a better
solution anyway, considering that it's 12 MB vs 400 MB.

Thanks.

-- 
Guillaume

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Matthew Wakeling

On Thu, 22 May 2008, Tom Lane wrote:

Do you have maintenance_work_mem set large enough that the index
creation sort is done in-memory?  8.1 depends on the platform's qsort
and a lot of them are kinda pessimal for input like this.


Looking at the fact that other indexes on the same table are created 
quickly, it seems that the maintenance_work_mem isn't the issue - the sort 
algorithm is.


Having lots of elements the same value is a worst-case-scenario for a 
naive quicksort. I am in the middle of testing sorting algorithms for a 
performance lecture I'm going to give, and one of the best algorithms I 
have seen yet is that used in Java's java.util.Arrays.sort(). I haven't 
been able to beat it with any other comparison sort yet (although I have 
beaten it with a bucket sort, but I wouldn't recommend such an algorithm 
for a database).



From the JavaDoc:


The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley 
and M. Douglas McIlroy's "Engineering a Sort Function", 
Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 
1993). This algorithm offers n*log(n) performance on many data sets that 
cause other quicksorts to degrade to quadratic performance.


Matthew

--
First law of computing:  Anything can go wro
sig: Segmentation fault.  core dumped.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Scott Marlowe
On Thu, May 22, 2008 at 6:32 AM, Guillaume Smet
<[EMAIL PROTECTED]> wrote:
> Hi -performance,
>
>
> LOG:  duration: 1636301.317 ms  statement: CREATE INDEX
> index_journal_clazz ON journal USING btree (clazz);
> LOG:  duration: 20613.009 ms  statement: CREATE INDEX
> index_journal_date ON journal USING btree (date);

Just curious, what happens if you create the date index first, then
the clazz one?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Guillaume Smet
On Thu, May 22, 2008 at 6:50 PM, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> Just curious, what happens if you create the date index first, then
> the clazz one?

It's not due to any cache effect if it's your question. It's mostly
CPU time and changing the order doesn't change the behaviour.

I'll make some tests with 8.3 in a few weeks (I'll be out of town next
week) to see if using PostgreSQL qsort reduces the problem.

-- 
Guillaume

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Tom Lane
"Guillaume Smet" <[EMAIL PROTECTED]> writes:
> On Thu, May 22, 2008 at 3:14 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
>> Do you have maintenance_work_mem set large enough that the index
>> creation sort is done in-memory?  8.1 depends on the platform's qsort
>> and a lot of them are kinda pessimal for input like this.

> maintenance_work_mem is set to 256 MB and the size of the index is 400 MB.

> Should I try to raise it up to 512 MB? The server only has 2GB of RAM
> so it seems a bit high.

Hmm, that's most likely not going to be enough to get it to do an
in-memory sort ... try turning on trace_sort to see.  But anyway,
if you are in the on-disk sort regime, 8.3 is only going to be
marginally faster for such a case --- it's going to have to write
all the index entries out and read 'em back in anyway.

>> 8.2 (which uses our own qsort) seems to perform better in a quick
>> test.

> Mmmmh OK. I was considering an upgrade to 8.3 in the next months anyway.

> Do we agree that in the case of unnamed prepared statement, 8.3 plans
> the query after the BIND? The partial index seems to be a better
> solution anyway, considering that it's 12 MB vs 400 MB.

Ermm .. this is in fact mostly broken in 8.3.0 and 8.3.1.  If you don't
want to wait for 8.3.2, you need this patch:
http://archives.postgresql.org/pgsql-committers/2008-03/msg00566.php

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Index creation time and distribution

2008-05-22 Thread Guillaume Smet
On Thu, May 22, 2008 at 9:18 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Ermm .. this is in fact mostly broken in 8.3.0 and 8.3.1.  If you don't
> want to wait for 8.3.2, you need this patch:
> http://archives.postgresql.org/pgsql-committers/2008-03/msg00566.php

That's what I had in mind. We have to test a lot of things before even
considering an upgrade so that's not really a problem for us to wait
for 8.3.2.

Thanks.

-- 
Guillaume

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] "Big O" notation for postgres?

2008-05-22 Thread Gregory Stark
"Richard Huxton" <[EMAIL PROTECTED]> writes:

> Jonah H. Harris wrote:
>> On Wed, May 21, 2008 at 10:10 AM, H. Hall <[EMAIL PROTECTED]> wrote:
>>> Does anyone know if there is a source that provides "Big O" notation for
>>> postgres's aggregate functions and operations?  For example is count(*) =
>>> O(1) or O(n)?
>>
>> I don't know of any document containing the complexity of each
>> aggregate, but it's sometimes left as a comment in the souce code.
>
> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
> presence of an index too, just to muddy the waters.

Hm, true. But excluding those special cases all Postgres aggregate functions
will be O(n) unless they're doing something very odd. None of the built-in
functions (except min/max as noted) do anything odd like that.

The reason way is because of the basic design of Postgres aggregate functions.
They are fed every tuple one by one and have to keep their state in a single
variable. Most of the aggregate functions like count(*) etc just keep a static
non-growing state and the state transition function is a simple arithmetic
operation which is O(1). So the resulting operation is O(n).

Actually one exception would be something like

CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, 
INITCOND='{}');

Since the state variable has to keep accumulating more and more stuff the
array_append becomes more and more expensive (it has to generate a new array
so it has to copy the existing stuff). So actually it woul dbe O(n^2).

The only builtin aggregate which looks like it falls in this category would be
xmlagg()

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] I/O on select count(*)

2008-05-22 Thread Luke Lonergan
Hi Hannu,

Interesting suggestion on the partial index!

I'll find out if we can extract our code that did the work.  It was simple but 
scattered in a few routines.

In concept it worked like this:

1 - Ignore if hint bits are unset, use them if set.  This affects heapam and 
vacuum I think.
2 - implement a cache for clog lookups based on the optimistic assumption that 
the data was inserted in bulk.  Put the cache one call away from heapgetnext()

I forget the details of (2).  As I recall, if we fall off of the assumption, 
the penalty for long scans get large-ish (maybe 2X), but since when do people 
full table scan when they're updates/inserts are so scattered across TIDs?  
It's an obvious big win for DW work.

We also have a GUC to turn it off if needed, in which case a vacuum will write 
the hint bits.

- Luke

- Original Message -
From: Hannu Krosing <[EMAIL PROTECTED]>
To: Luke Lonergan
Cc: Pavan Deolasee <[EMAIL PROTECTED]>; Greg Smith <[EMAIL PROTECTED]>; Alvaro 
Herrera <[EMAIL PROTECTED]>; pgsql-performance@postgresql.org 

Sent: Thu May 22 12:10:02 2008
Subject: Re: [PERFORM] I/O on select count(*)

On Thu, 2008-05-15 at 10:52 +0800, Luke Lonergan wrote:
> BTW – we’ve removed HINT bit checking in Greenplum DB and improved the
> visibility caching which was enough to provide performance at the same
> level as with the HINT bit optimization, but avoids this whole “write
> the data, write it to the log also, then write it again just for good
> measure” behavior.
> 
> For people doing data warehousing work like the poster, this Postgres
> behavior is miserable.  It should be fixed for 8.4 for sure
> (volunteers?)

I might try it. I think I have told you about my ideas ;)
I plan to first do "cacheing" (for being able to doi index only scans
among other things) and then if the cache works reliably, use the
"cacheing" code as the main visibility / MVCC mechanism.

Is Greenplums code available, or should I roll my own ?

> BTW – for the poster’s benefit, you should implement partitioning by
> date, then load each partition and VACUUM ANALYZE after each load.
>  You probably won’t need the date index anymore – so your load times
> will vastly improve (no indexes), you’ll store less data (no indexes)
> and you’ll be able to do simpler data management with the partitions.
> 
> You may also want to partition AND index if you do a lot of short
> range selective date predicates.  Example would be: partition by day,
> index on date field, queries selective on date ranges by hour will
> then select out only the day needed, then index scan to get the 
> hourly values.

If your queries allow it, you may try indexing on 
int2::extract('HOUR' from date)
so the index may be smaller

storing the date as type abstime is another way to reduce index size.

> Typically time-oriented data is nearly time sorted anyway, so you’ll
> also get the benefit of a clustered index.


Hannu