Re: [PERFORM] From Simple to Complex

2012-02-02 Thread Merlin Moncure
On Wed, Feb 1, 2012 at 12:48 PM, Alessandro Gagliardi
 wrote:
> LIMIT 65536; Total query runtime: 14846 ms.
> - http://explain.depesz.com/s/I3E
> LIMIT 69632: Total query runtime: 80141 ms.
> - http://explain.depesz.com/s/9hp
>
> So it looks like when the limit crosses a certain threshold (somewhere north
> of 2^16), Postgres decides to do a Seq Scan instead of an Index Scan.
> I've already lowered random_page_cost to 2. Maybe I should lower it to 1.5?
> Actually 60K should be plenty for my purposes anyway.


also, is effective_cache_size set to a reasonable value?

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] *really* bad insert performance on table with unique index

2012-02-02 Thread Jon Nelson
I created a table with two columns: an id SERIAL (primary key) and a
text (not null), and then added a unique index on the text field.
Then I ran the following query (with a huge work_mem - 20GB):

insert into tableA (text_field) select distinct other_text_field from
some_huge_set_of_tables

After 36 hours it had only written 3 GB (determined by looking at what
files it was writing to).
I started over with a TRUNCATE, and then removed the index and tried again.
This time it took 3807270.780 ms (a bit over an hour).
Total number of records: approx 227 million, comprising 16GB of storage.

Why the huge discrepancy?

-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] From Simple to Complex

2012-02-02 Thread Alessandro Gagliardi
On Thu, Feb 2, 2012 at 6:52 AM, Merlin Moncure  wrote:

> also, is effective_cache_size set to a reasonable value?
>
> Yeah, it's 153kB


Re: [PERFORM] Index with all necessary columns - Postgres vs MSSQL

2012-02-02 Thread Merlin Moncure
On Thu, Feb 2, 2012 at 10:41 AM, Gudmundur Johannesson
 wrote:
> Do you think I should try using the latest build of the source for 9.2 since
> index-only-scan is "ready" according to
> http://www.depesz.com/index.php/2011/10/08/waiting-for-9-2-index-only-scans/
> ?

hm, interesting.

You are simply welcome to try that and we would definitely like to see
your results.   I looked around and didn't see any binaries for the
development snapshots for windows to test.  That means you have to
compile postgres in order to test 9.2 at this point in time.  Testing
and feedback of index only scan feature would be very much
appreciated.

Generally speaking, postgresql source tree is very high quality --
stuff should mostly work.  The biggest annoyance is that you get lots
of catalog version bumps when pulling new versions of the sources
forcing a dump/reload.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] *really* bad insert performance on table with unique index

2012-02-02 Thread Jeff Janes
On Thu, Feb 2, 2012 at 9:28 AM, Jon Nelson  wrote:
> I created a table with two columns: an id SERIAL (primary key) and a
> text (not null), and then added a unique index on the text field.
> Then I ran the following query (with a huge work_mem - 20GB):
>
> insert into tableA (text_field) select distinct other_text_field from
> some_huge_set_of_tables

I bet the distinct is being implemented by a hashAggregate.  So then
you are inserting the records in a random order, causing the index to
have terrible locality of reference.

Try adding "order by other_text_field" to the select.  Or don't create
the index until afterwards

>
> After 36 hours it had only written 3 GB (determined by looking at what
> files it was writing to).
> I started over with a TRUNCATE, and then removed the index and tried again.
> This time it took 3807270.780 ms (a bit over an hour).
> Total number of records: approx 227 million, comprising 16GB of storage.
>
> Why the huge discrepancy?

Maintaining indices when rows are inserted in a random order generates
a huge amount of scattered I/O.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance