Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Greg Smith

Mladen Gogala wrote:
I agree, but I am afraid that after the demise of SGI, XFS isn't being 
developed.


It's back to being well maintained again; see 
http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html 
for some history here and why it's become relevant to RedHat in 
particular recently.


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, PostgreSQL 9.0 High PerformancePre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Greg Smith

Samuel Gendler wrote:
I spent some time going through the various tuning docs on the wiki 
whie bringing some new hardware up and I can't remember seeing any 
discussion of tweaking read-ahead at all in the normal 
performance-tuning references.  Do you have any documentation of the 
kinds of tweaking you have done and its effects on different types of 
workloads?


Much of my recent research has gone into the book you'll see plugged 
below rather than the wiki.  The basics of read-ahead tuning is that you 
can see it increase bonnie++ sequential read results when you increase 
it, to a point.  Get to that point and stop and you should be in good shape.
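
A minimal sketch of that loop on Linux, assuming a single block device behind
the database (device name, sizes, and paths here are only examples):

blockdev --getra /dev/sda             # current read-ahead, in 512-byte sectors
blockdev --setra 4096 /dev/sda        # try 4096 sectors = 2MB read-ahead
bonnie++ -d /pgdata -s 32768 -n 0     # re-check the sequential block read rate

Keep increasing the value and re-running the sequential read test until the
number stops improving, then settle on the smallest setting that reaches that
plateau.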


--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, PostgreSQL 9.0 High PerformancePre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Vitalii Tymchyshyn

On 11.10.10 20:46, Craig James wrote:


First of all, it's not true. There are plenty of applications that 
need an exact answer. Second, even if it is only 1%, that means it's 
1% of the queries, not 1% of people. Sooner or later a large fraction 
of developers will run into this. It's probably been the most-asked 
question I've seen on this forum in the four years I've been here. 
It's a real problem, and it needs a real solution.


I know it's a hard problem to solve, but can we stop hinting that 
those of us who have this problem are somehow being dense?



BTW: there is a lot of talk about MVCC, but is the following solution possible:
1) Create a page information map that, for each page in the table, tells you 
how many rows are within it and whether any write (successful or not) was done 
to the page. This could even be two maps, to make the second one really small 
(a bit per page) so that it can stay in memory most of the time.
2) When you need to do count(*) or an index check, first check whether there 
were any writes to the page. If there were none, you can use the count 
information from the page info/index data without going to the page itself.
3) Let vacuum clear the bit after freezing all the tuples in the page (am 
I using the terminology correctly?).


In this case all read-only (archive) data will have this bit cleared, and 
index/count(*) will be really fast.

Am I missing something?

Best regards, Vitalii Tymchyshyn.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Craig Ringer

On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote:


BTW: there is a lot of talk about MVCC, but is the following solution possible:
1) Create a page information map that, for each page in the table, tells you
how many rows are within it and whether any write (successful or not) was done
to the page. This could even be two maps, to make the second one really small
(a bit per page) so that it can stay in memory most of the time.
2) When you need to do count(*) or an index check, first check whether there
were any writes to the page. If there were none, you can use the count
information from the page info/index data without going to the page itself.
3) Let vacuum clear the bit after freezing all the tuples in the page (am
I using the terminology correctly?).


Part of this already exists. It's called the visibility map, and is 
present in 8.4 and above. It's not currently used for queries, but can 
potentially be used to aid some kinds of query.


http://www.postgresql.org/docs/8.4/static/storage-vm.html


In this case all read-only (archive) data will have this bit cleared and
index/count(*) will be really fast.


A count with any joins or filter criteria would still have to scan all 
pages with visible tuples in them. So the visibility map helps speed up 
scanning of bloated tables, but doesn't provide a magical fast count 
except in the utterly trivial "select count(*) from tablename;" case, 
and can probably only be used for accurate results when there are no 
read/write transactions currently open. Even if you kept a count of 
tuples in each page along with the mvcc transaction ID information 
required to determine for which transactions that count is valid, it'd 
only be useful if you didn't have to do any condition checks, and it'd 
be yet another thing to update with every insert/delete/update.


Perhaps for some users that'd be worth having, but it seems to me like 
it'd have pretty narrow utility. I'm not sure that's the answer.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread david

On Tue, 12 Oct 2010, Craig Ringer wrote:




BTW: there is a lot of talk about MVCC, but is the following solution possible:
1) Create a page information map that, for each page in the table, tells you
how many rows are within it and whether any write (successful or not) was done
to the page. This could even be two maps, to make the second one really small
(a bit per page) so that it can stay in memory most of the time.
2) When you need to do count(*) or an index check, first check whether there
were any writes to the page. If there were none, you can use the count
information from the page info/index data without going to the page itself.
3) Let vacuum clear the bit after freezing all the tuples in the page (am
I using the terminology correctly?).


Part of this already exists. It's called the visibility map, and is present 
in 8.4 and above. It's not currently used for queries, but can potentially be 
used to aid some kinds of query.


http://www.postgresql.org/docs/8.4/static/storage-vm.html


In this case all read-only (archive) data will have this bit cleared and
index/count(*) will be really fast.


A count with any joins or filter criteria would still have to scan all pages 
with visible tuples in them. So the visibility map helps speed up scanning of 
bloated tables, but doesn't provide a magical fast count except in the 
utterly trivial "select count(*) from tablename;" case, and can probably only 
be used for accurate results when there are no read/write transactions 
currently open. Even if you kept a count of tuples in each page along with 
the mvcc transaction ID information required to determine for which 
transactions that count is valid, it'd only be useful if you didn't have to 
do any condition checks, and it'd be yet another thing to update with every 
insert/delete/update.


Perhaps for some users that'd be worth having, but it seems to me like it'd 
have pretty narrow utility. I'm not sure that's the answer.


from a PR point of view, speeding up the trivial count(*) case could be 
worth it, just to avoid people complaining about it not being fast.


in the case where you are doing a count(*) query with a WHERE clause, and the 
WHERE is on an indexed column, could the search just look at the index plus 
the visibility map rather than doing a sequential scan through the 
table?


as for your worries about the accuracy of a visibility based count in the 
face of other transactions, wouldn't you run into the same issues if you 
are doing a sequential scan with the same transactions in process?


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Vitalii Tymchyshyn

On 12.10.10 11:14, Craig Ringer wrote:

On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote:


BTW: there is a lot of talk about MVCC, but is the following solution possible:
1) Create a page information map that, for each page in the table, tells you
how many rows are within it and whether any write (successful or not) was done
to the page. This could even be two maps, to make the second one really small
(a bit per page) so that it can stay in memory most of the time.
2) When you need to do count(*) or an index check, first check whether there
were any writes to the page. If there were none, you can use the count
information from the page info/index data without going to the page itself.
3) Let vacuum clear the bit after freezing all the tuples in the page (am
I using the terminology correctly?).


Part of this already exists. It's called the visibility map, and is 
present in 8.4 and above. It's not currently used for queries, but can 
potentially be used to aid some kinds of query.


http://www.postgresql.org/docs/8.4/static/storage-vm.html


In this case all read-only (archive) data will have this bit cleared and
index/count(*) will be really fast.


A count with any joins or filter criteria would still have to scan all 
pages with visible tuples in them. 
Only if one doesn't use partitioning. With proper partitioning, the filter can 
simply select partitions.


Also, filtering can be mapped onto an index lookup. And if one could combine an 
index with the visibility map, much like two indexes can be combined via bitmaps 
now, count could be really fast for everything but the non-frozen tuples.
So the visibility map helps speed up scanning of bloated tables, but 
doesn't provide a magical fast count except in the utterly trivial 
"select count(*) from tablename;" case, and can probably only be used 
for accurate results when there are no read/write transactions 
currently open. 
Why so? You simply have to recount the pages that are marked dirty in the 
usual way. But the count problem usually occurs when there is a lot of 
archive data (you need to count over 100K records) that is not modified.


Best regards, Vitalii Tymchyshyn

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Craig Ringer

On 10/12/2010 04:22 PM, da...@lang.hm wrote:


from a PR point of view, speeding up the trivial count(*) case could be
worth it, just to avoid people complaining about it not being fast.


At the cost of a fair bit more complexity, though, and slowing 
everything else down.


The proper solution here remains, IMO, support for visibility 
information in indexes, whether by storing it once in the index and once 
in the heap (ouch!), storing it out-of-line, or using a covering index 
where one or more columns are stored wholly in the index not in the 
table heap at all.


Here are a few of the many past discussions about this that have already 
covered some of the same ground:


http://stackoverflow.com/questions/839015/postgres-could-an-index-organized-tables-paved-way-for-faster-select-count-fr

http://osdir.com/ml/db.postgresql.performance/2003-10/msg00075.html
(and the rest of the thread)

A decent look with Google will find many, many more.


in the case where you are doing a count(*) query with a WHERE clause, and the
WHERE is on an indexed column, could the search just look at the index plus the
visibility map rather than doing a sequential scan through the
table?


Nope, because the visibility map, which is IIRC only one bit per page, 
doesn't record how many tuples there are on the page, or enough 
information about them to determine how many of them are visible to the 
current transaction*.



as for your worries about the accuracy of a visibility based count in
the face of other transactions, wouldn't you run into the same issues if
you are doing a sequential scan with the same transactions in process?


No. Every tuple in a table heap in postgresql has hidden fields, some of 
which are used to determine whether the current transaction* can see 
the tuple - it may have been inserted after this transaction started, or 
deleted before this transaction started, so it's not visible to this 
transaction but may still be to others.


http://www.postgresql.org/docs/current/static/ddl-system-columns.html

This information isn't available in the visibility map, or in indexes. 
That's why PostgreSQL has to hit the heap to find it.
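
For illustration, those fields can be inspected directly (the table name here is 
just an example):

-- xmin/xmax are the inserting/deleting transaction IDs stored in every heap
-- tuple; ctid is its physical (page, offset) location in the heap
SELECT ctid, xmin, xmax, * FROM some_table LIMIT 5;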


* "current transaction" should really be "current snapshot". The snapshot 
is taken at the start of the whole transaction for SERIALIZABLE 
isolation, and at the start of each statement for READ COMMITTED isolation.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Mladen Gogala

da...@lang.hm wrote:
from a PR point of view, speeding up the trivial count(*) case could be 
worth it, just to avoid people complaining about it not being fast.


  
Fixing PR stuff is not the approach that I would take. People are 
complaining about select count(*) because they're using it in all the 
wrong places.  My assessment that there is a problem with sequential 
scan was wrong. Now, let's again take Oracle as the measure.

Someone asked me about caching the data.  Here it is:

SQL> connect system/*
Connected.
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:12.68
SQL> connect adbase/*
Connected.
SQL> alter session set db_file_multiblock_read_Count=128;

Session altered.

Elapsed: 00:00:00.41
SQL> select count(*) from ni_occurrence;

  COUNT(*)
----------
 402062638

Elapsed: 00:02:37.77

SQL> select bytes/1048576 MB from user_segments
  2  where segment_name='NI_OCCURRENCE';

        MB
----------
     35329

Elapsed: 00:00:00.20
SQL>


So, the results weren't cached the first time around. The explanation is 
the fact that Oracle, as of version 10.2.0, reads the table into private 
process memory, not into the shared buffers. This table alone is 
35GB in size; Oracle took 2 minutes 47 seconds to read it using a 
full table scan. If I do the same thing with PostgreSQL and a comparable 
table, Postgres is, in fact, faster:


psql (9.0.1)
Type "help" for help.

news=> \timing
Timing is on.
news=> select count(*) from moreover_documents_y2010m09;
  count
----------
 17242655
(1 row)

Time: 113135.114 ms
news=> select pg_size_pretty(pg_table_size('moreover_documents_y2010m09'));
 pg_size_pretty
----------------
 27 GB
(1 row)

Time: 100.849 ms
news=>

The number of rows is significantly smaller, but the table contains a 
rather significant text field which consumes quite a bit of TOAST 
storage, and the sizes are comparable. Postgres read through 27GB in 113 
seconds, less than 2 minutes, and Oracle took 2 minutes 37 seconds to 
read through 35GB.  I stand corrected: there is nothing wrong with the 
speed of the Postgres sequential scan.



--
Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com 



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread Robert Haas
On Mon, Oct 11, 2010 at 11:11 PM,  gnuo...@rcn.com wrote:
 An approach that works can be found in DB2, and likely elsewhere.

 The key is that tablespaces/tables/indexes/buffers are all attached through 
 the bufferpool (the DB2 term).  A tablespace/bufferpool match is defined.  
 Then tables and indexes are assigned to the tablespace (and implicitly, the 
 bufferpool).  As a result, one can effectively pin data in memory.  This is 
 very useful, but not low hanging fruit to implement.

 The introduction of rudimentary tablespaces is a first step.  I assumed that 
 the point was to get to a DB2-like structure at some point.  Yes?

We already have tablespaces, and our data already is accessed through
the buffer pool.
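
For reference, a PostgreSQL tablespace only controls where relations live on
disk; it is not bound to its own buffer pool (the path below is just an
example):

CREATE TABLESPACE fastdisk LOCATION '/ssd/pg_tblspc';
CREATE TABLE hot_table (id int) TABLESPACE fastdisk;
-- both tables share the single global shared_buffers pool regardless of tablespace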

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Runtime dependency from size of a bytea field

2010-10-12 Thread Robert Haas
On Fri, Oct 8, 2010 at 12:53 AM, Sander, Ingo (NSN - DE/Munich)
ingo.san...@nsn.com wrote:
 The difference to my test is that we use the ODBC interface in our C program. 
 Could it be that the difference in the runtimes is caused by the ODBC?

I've heard tell that ODBC is substantially slower than a native libpq
connection, but I don't know that for a fact, not being an ODBC user.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Jon Nelson
On Tue, Oct 12, 2010 at 7:27 AM, Mladen Gogala
mladen.gog...@vmsinfo.com wrote:

 So, the results weren't cached the first time around. The explanation is the
 fact that Oracle, as of the version 10.2.0, reads the table in the private
 process memory, not in the shared buffers.  This table alone is  35GB in
 size,  Oracle took 2 minutes 47 seconds to read it using the full table
 scan. If I do the same thing with PostgreSQL and a comparable table,
 Postgres is, in fact, faster:


Well, I didn't quite mean that - having no familiarity with Oracle I
don't know what the "alter system" statement does, but I was talking
specifically about the Linux buffer and page cache. The easiest way to
drop the Linux caches in one fell swoop is:

echo 3 > /proc/sys/vm/drop_caches

Is there a command to tell postgresql to drop/clear/reset its buffer_cache?

Clearing/dropping both the system (Linux) and the DB caches is
important when doing benchmarks that involve I/O.



-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Greg Smith

Jon Nelson wrote:

Is there a command to tell postgresql to drop/clear/reset its buffer_cache?
  


No.  Usually the sequence used to remove all cached data from RAM before 
a benchmark is:


pg_ctl stop
sync
echo 3 > /proc/sys/vm/drop_caches
pg_ctl start

--
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, PostgreSQL 9.0 High PerformancePre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Luca Tettamanti
On Tue, Oct 12, 2010 at 3:07 PM, Jon Nelson jnelson+pg...@jamponi.net wrote:
 On Tue, Oct 12, 2010 at 7:27 AM, Mladen Gogala
 mladen.gog...@vmsinfo.com wrote:

 So, the results weren't cached the first time around. The explanation is the
 fact that Oracle, as of the version 10.2.0, reads the table in the private
 process memory, not in the shared buffers.  This table alone is  35GB in
 size,  Oracle took 2 minutes 47 seconds to read it using the full table
 scan. If I do the same thing with PostgreSQL and a comparable table,
 Postgres is, in fact, faster:

 Well, I didn't quite mean that - having no familiarity with Oracle I
 don't know what the alter system statement does, but I was talking
 specifically about the linux buffer and page cache. The easiest way to
 drop the linux caches in one fell swoop is:

 echo 3 > /proc/sys/vm/drop_caches

AFAIK this won't affect Oracle when using direct IO (which bypasses
the page cache).

Luca

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Jon Nelson
On Tue, Oct 12, 2010 at 8:18 AM, Greg Smith g...@2ndquadrant.com wrote:
 No.  Usually the sequence used to remove all cached data from RAM before a
 benchmark is:

All cached data (as cached in postgresql - *not* the Linux system
caches)..., right?


-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Tom Lane
Mladen Gogala mladen.gog...@vmsinfo.com writes:
 The number of rows is significantly smaller, but the table contains 
 rather significant text field which consumes quite a bit of TOAST 
 storage and the sizes are comparable. Postgres read through 27GB in 113 
 seconds, less than 2 minutes and oracle took 2 minutes 37 seconds to 
 read through 35GB.  I stand corrected: there is nothing wrong with the 
 speed of the Postgres sequential scan.

Um ... the whole point of TOAST is that the data isn't in-line.
So what Postgres was actually reading through was probably quite a
lot less than 27Gb.  It's probably hard to make a completely
apples-to-apples comparison because the two databases are so different,
but I don't think this one proves that PG is faster than Oracle.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Kevin Grittner
Neil Whelchel neil.whelc...@gmail.com wrote:
 
 What is the best method to make a page of results and a list of
 links to other pages of results?
 
For our most heavily used web app we decided to have the renderer
just read the list of cases and render the pages to disk, and then
present the first one.  We set a limit of 500 entries on the list;
if we get past 500 we put up a page telling them to refine their
search criteria.  That won't work for all circumstances, but it
works well for our web app.
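
One common way to implement that kind of cap, as a sketch (the table, column, 
and the 500 limit here are only illustrative, not our actual schema):

-- fetch limit+1 rows; if 501 come back, show the "refine your search"
-- page instead of rendering result pages
SELECT case_id, caption
  FROM cases
 WHERE last_name LIKE 'SMIT%'
 ORDER BY case_id
 LIMIT 501;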
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Mladen Gogala

Tom Lane wrote:

Mladen Gogala mladen.gog...@vmsinfo.com writes:
  
The number of rows is significantly smaller, but the table contains 
rather significant text field which consumes quite a bit of TOAST 
storage and the sizes are comparable. Postgres read through 27GB in 113 
seconds, less than 2 minutes and oracle took 2 minutes 37 seconds to 
read through 35GB.  I stand corrected: there is nothing wrong with the 
speed of the Postgres sequential scan.



Um ... the whole point of TOAST is that the data isn't in-line.
So what Postgres was actually reading through was probably quite a
lot less than 27Gb.  It's probably hard to make a completely
apples-to-apples comparison because the two databases are so different,
but I don't think this one proves that PG is faster than Oracle.

regards, tom lane
  


As is usually the case, you're right. I will try copying the table to 
Postgres over the weekend; my management would not look kindly upon my 
copying 35GB of production data during working hours for 
scientific reasons. I have the storage and I can test, and I will post the 
result. I developed quite an efficient Perl script which does the copy 
without an intervening CSV file, so the copy should not take more 
than 2 hours. I will be able to impose a shared lock on the table over 
the weekend, so that I don't blow away the UNDO segments.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread gnuoytr
The discussions I've seen indicated that, in use, tablespaces were at the 
database level, but, yes, the docs do say that a table can be assigned to a 
defined tablespace.  What I still can't find is syntax which establishes 
buffers/caches/whatever and assigns them to tablespaces.  Without that, I'm not 
sure what benefit there is to tablespaces, other than a sort of RAID-lite.

Robert


 Original message 
Date: Tue, 12 Oct 2010 08:34:23 -0400
From: pgsql-performance-ow...@postgresql.org (on behalf of Robert Haas 
robertmh...@gmail.com)
Subject: Re: [PERFORM] How does PG know if data is in memory?  
To: gnuo...@rcn.com
Cc: pgsql-performance@postgresql.org

On Mon, Oct 11, 2010 at 11:11 PM,  gnuo...@rcn.com wrote:
 An approach that works can be found in DB2, and likely elsewhere.

 The key is that tablespaces/tables/indexes/buffers are all attached through 
 the bufferpool (the DB2 term).  A tablespace/bufferpool match is defined.  
 Then tables and indexes are assigned to the tablespace (and implicitly, the 
 bufferpool).  As a result, one can effectively pin data in memory.  This is 
 very useful, but not low hanging fruit to implement.

 The introduction of rudimentary tablespaces is a first step.  I assumed that 
 the point was to get to a DB2-like structure at some point.  Yes?

We already have tablespaces, and our data already is accessed through
the buffer pool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Joe Uhl

The biggest single problem with select count(*) is that it is
seriously overused. People use that idiom to establish existence, which
usually leads to a performance disaster in the application using it,
unless the table has no more than a few hundred records. The SQL language, of
which PostgreSQL offers an excellent implementation, has offered a [NOT]
EXISTS clause since its inception in the Jurassic era. The problem is
with the sequential scan, not with counting. I'd even go as far as to
suggest that 99% of instances of the select count(*) idiom are probably
bad use of the SQL language.
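
For an existence check, the EXISTS idiom stops at the first matching row
instead of counting them all; a generic sketch (table and column names are
made up):

SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = 42) AS has_orders;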


I agree, I have seen many very bad examples of using count(*). I will go so
far as to question the use of count(*) in my examples here. Is there a better
way to come up with a page list than using count(*)? What is the best method
to make a page of results and a list of links to other pages of results? Am I
barking up the wrong tree here?
One way I have dealt with this on very large tables is to cache the 
count(*) at the application level (using memcached, terracotta, or 
something along those lines) and then increment that cache whenever you 
add a row to the relevant table.  On application restart that cache is 
re-initialized with a regular old count(*).  This approach works really 
well and all large systems in my experience need caching in front of the 
DB eventually.  If you have a simpler system with say a single 
application/web server you can simply store the value in a variable, the 
specifics would depend on the language and framework you are using.


Another, more all-DB approach is to create a statistics table into which 
you place aggregated statistics rows (num deleted, num inserted, totals, 
etc) at an appropriate time interval in your code.  So you have rows 
containing aggregated statistics information for the past and some tiny 
portion of the new data happening right now that hasn't yet been 
aggregated.  Queries then look like a summation of the aggregated values 
in the statistics table plus a count(*) over just the newest portion of 
the data table and are generally very fast.
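
A sketch of what such a query can look like (table and column names are 
invented for illustration):

-- total = rows already rolled up into the stats table
--       + rows that have arrived since the last aggregation run
SELECT (SELECT coalesce(sum(row_count), 0) FROM events_stats)
     + (SELECT count(*) FROM events
         WHERE created_at > (SELECT max(aggregated_through) FROM events_stats))
       AS total_rows;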


Overall I have found that once things get big the layers of your app 
stack start to blend together and have to be combined in clever ways to 
keep speed up.  Postgres is a beast but when you run into things it 
can't do well just find a way to cache it or make it work together with 
some other persistence tech to handle those cases.





Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread Kevin Grittner
gnuo...@rcn.com wrote:
 
 An approach that works can be found in DB2, and likely elsewhere.
 
 The key is that tablespaces/tables/indexes/buffers are all
 attached through the bufferpool (the DB2 term).  A tablespace/
 bufferpool match is defined.  Then tables and indexes are assigned
 to the tablespace (and implicitly, the bufferpool).  As a result,
 one can effectively pin data in memory. This is very useful, but
 not low hanging fruit to implement.
 
This sounds similar to Sybase named caches.  You can segment off
portions of the memory for specific caches, break that up into space
reserved for different I/O buffer sizes, and bind specific database
objects (tables and indexes) to specific caches.  On the few
occasions where someone had failed to configure the named caches
when setting up a machine, it was caught almost immediately after
deployment because of end-user complaints about poor performance. 
This was so critical to performance for us when we were using
Sybase, that one of my first reactions on finding it missing in
PostgreSQL was distress over the inability to tune as I had.
 
When I posted to the list about it, the response was that LRU
eviction was superior to any tuning any human would do.  I didn't
and don't believe that, but have found it's close enough in the
PostgreSQL environment to be *way* down my list of performance
issues.  In fact, when looking at the marginal benefits it would
generate in PostgreSQL when done right, versus the number of people
who would shoot themselves in the foot with it, even I have come
around to feeling it's probably not a good idea.
 
FWIW, the four main reasons for using it were:
 
(1) Heavily used data could be kept fully cached in RAM and not
driven out by transient activity.
 
(2) You could flag a cache used for (1) above as using relaxed LRU
accounting -- it saved a lot of time tracking repeated references,
leaving more CPU for other purposes.
 
(3) Each named cache had its own separate set of locks, reducing
contention.
 
(4) Large tables for which the heap was often scanned in its
entirety or for a range on the clustered index could be put in a
relatively small cache with large I/O buffers.  This avoided blowing
out the default cache space for situations which almost always
required disk I/O anyway.
 
None of that is anything for amateurs to play with.  You need to set
up caches like that based on evidence from monitoring and do careful
benchmarking of the results to actually achieve improvements over
LRU logic.
 
 The introduction of rudimentary tablespaces is a first step.  I
 assumed that the point was to get to a DB2-like structure at some
 point.  Yes?
 
As far as I can tell, there is nobody with that intent.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread gnuoytr
Couldn't have said it better myself; covered all the bases.  If PG wants to 
become an industrial strength database, worthy of replacing DB2/etc., then 
these are the sorts of knobs and switches it will need. 


-- None of that is anything for amateurs to play with.

Not to jam a stick in anybody's eye, but shouldn't database pros not be amateurs?  
Or are most PG-ers coders who don't really want to design and tune a database?

Robert

 Original message 
Date: Tue, 12 Oct 2010 09:35:56 -0500
From: pgsql-performance-ow...@postgresql.org (on behalf of Kevin Grittner 
kevin.gritt...@wicourts.gov)
Subject: Re: [PERFORM] How does PG know if data is in memory?  
To: pgsql-performance@postgresql.org,gnuo...@rcn.com

gnuo...@rcn.com wrote:
 
 An approach that works can be found in DB2, and likely elsewhere.
 
 The key is that tablespaces/tables/indexes/buffers are all
 attached through the bufferpool (the DB2 term).  A tablespace/
 bufferpool match is defined.  Then tables and indexes are assigned
 to the tablespace (and implicitly, the bufferpool).  As a result,
 one can effectively pin data in memory. This is very useful, but
 not low hanging fruit to implement.
 
This sounds similar to Sybase named caches.  You can segment off
portions of the memory for specific caches, break that up into space
reserved for different I/O buffer sizes, and bind specific database
objects (tables and indexes) to specific caches.  On the few
occasions where someone had failed to configure the named caches
when setting up a machine, it was caught almost immediately after
deployment because of end-user complaints about poor performance. 
This was so critical to performance for us when we were using
Sybase, that one of my first reactions on finding it missing in
PostgreSQL was distress over the inability to tune as I had.
 
When I posted to the list about it, the response was that LRU
eviction was superior to any tuning any human would do.  I didn't
and don't believe that, but have found it's close enough in the
PostgreSQL environment to be *way* down my list of performance
issues.  In fact, when looking at the marginal benefits it would
generate in PostgreSQL when done right, versus the number of people
who would shoot themselves in the foot with it, even I have come
around to feeling it's probably not a good idea.
 
FWIW, the four main reasons for using it were:
 
(1) Heavily used data could be kept fully cached in RAM and not
driven out by transient activity.
 
(2) You could flag a cache used for (1) above as using relaxed LRU
accounting -- it saved a lot of time tracking repeated references,
leaving more CPU for other purposes.
 
(3) Each named cache had its own separate set of locks, reducing
contention.
 
(4) Large tables for which the heap was often scanned in its
entirety or for a range on the clustered index could be put in a
relatively small cache with large I/O buffers.  This avoided blowing
out the default cache space for situations which almost always
required disk I/O anyway.
 
None of that is anything for amateurs to play with.  You need to set
up caches like that based on evidence from monitoring and do careful
benchmarking of the results to actually achieve improvements over
LRU logic.
 
 The introduction of rudimentary tablespaces is a first step.  I
 assumed that the point was to get to a DB2-like structure at some
 point.  Yes?
 
As far as I can tell, there is nobody with that intent.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread Kevin Grittner
gnuo...@rcn.com wrote:
 
 -- None of that is anything for amateurs to play with.
 
 Not jam a stick in anybody's eye, but shouldn't database pros not
 be amateurs?
 
While many PostgreSQL installations are managed by professional
DBAs, or programmers or consultants with a deep enough grasp of the
issues to tune a knob like that appropriately, PostgreSQL is also
used in environments without such staff.  In fact, there is pressure
to make PostgreSQL easier to configure for exactly that reason.  If
we add more knobs which are this hard to tune correctly, we would
risk inundation with complaints from people who tried to use it and
made things worse.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread bricklen
On Sat, Oct 9, 2010 at 4:26 PM, Neil Whelchel neil.whelc...@gmail.com wrote:
 Maybe an
 estimate(*) that works like count but gives an answer from the index without
 checking visibility? I am sure that this would be good enough to make a page
 list, it is really no big deal if it errors on the positive side, maybe the
 list of pages has an extra page off the end. I can live with that. What I
 can't live with is taking 13 seconds to get a page of results from 850,000
 rows in a table.
 -Neil-


FWIW, Michael Fuhr wrote a small function to parse the EXPLAIN plan a
few years ago and it works pretty well assuming your stats are up to
date.

http://markmail.org/message/gknqthlwry2eoqey
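
The idea is roughly the following (a sketch along the same lines, not
necessarily Michael's exact function):

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint AS $$
DECLARE
    plan_line record;
    estimate  bigint;
BEGIN
    -- run EXPLAIN (not the query itself) and pull the planner's row
    -- estimate out of the top line of the plan
    FOR plan_line IN EXECUTE 'EXPLAIN ' || query LOOP
        estimate := substring(plan_line."QUERY PLAN" FROM ' rows=([0-9]+)');
        EXIT WHEN estimate IS NOT NULL;
    END LOOP;
    RETURN estimate;
END;
$$ LANGUAGE plpgsql STRICT;

-- usage: SELECT count_estimate('SELECT * FROM foo WHERE bar_id = 42');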

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Kevin Grittner
Jon Nelson jnelson+pg...@jamponi.net wrote:
 Greg Smith g...@2ndquadrant.com wrote:
 
 Usually the sequence used to remove all cached data from RAM
 before a benchmark is:
 
 All cached data (as cached in postgresql - *not* the Linux system
 caches)..., right?
 
No.  The stop and start of PostgreSQL causes empty PostgreSQL
caches.  These lines, in between the stop and start, force the Linux
cache to be empty (on recent kernel versions):
 
sync
echo 3 > /proc/sys/vm/drop_caches
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Dan Harris

 On 10/11/10 8:02 PM, Scott Carey wrote:

would give you a 1MB read-ahead.  Also, consider XFS and its built-in 
defragmentation.  I have found that a longer lived postgres DB will get extreme
file fragmentation over time and sequential scans end up mostly random.  
On-line file defrag helps tremendously.

We just had a corrupt table caused by an XFS online defrag.  I'm scared 
to use this again while the db is live.  Has anyone else found this to 
be safe?  But, I can vouch for the fragmentation issue, it happens very 
quickly in our system.


-Dan

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread david

On Tue, 12 Oct 2010, Joe Uhl wrote:


The biggest single problem with select count(*) is that it is
seriously overused. People use that idiom to establish existence, which
usually leads to a performance disaster in the application using it,
unless the table has no more than a few hundred records. The SQL language, of
which PostgreSQL offers an excellent implementation, has offered a [NOT]
EXISTS clause since its inception in the Jurassic era. The problem is
with the sequential scan, not with counting. I'd even go as far as to
suggest that 99% of instances of the select count(*) idiom are probably
bad use of the SQL language.


I agree, I have seen many very bad examples of using count(*). I will go so
far as to question the use of count(*) in my examples here. Is there a better
way to come up with a page list than using count(*)? What is the best method
to make a page of results and a list of links to other pages of results? Am I
barking up the wrong tree here?
One way I have dealt with this on very large tables is to cache the count(*) 
at the application level (using memcached, terracotta, or something along 
those lines) and then increment that cache whenever you add a row to the 
relevant table.  On application restart that cache is re-initialized with a 
regular old count(*).  This approach works really well and all large systems 
in my experience need caching in front of the DB eventually.  If you have a 
simpler system with say a single application/web server you can simply store 
the value in a variable, the specifics would depend on the language and 
framework you are using.


this works if you know ahead of time what the criteria of the search is 
going to be.


so it will work for

select count(*) from table;

what this won't work for is cases where the criteria of the search is 
unpredictable, i.e.


ask the user for input

select count(*) from table where field=$input;

David Lang

Another more all-DB approach is to create a statistics tables into which you 
place aggregated statistics rows (num deleted, num inserted, totals, etc) at 
an appropriate time interval in your code.  So you have rows containing 
aggregated statistics information for the past and some tiny portion of the 
new data happening right now that hasn't yet been aggregated.  Queries then 
look like a summation of the aggregated values in the statistics table plus a 
count(*) over just the newest portion of the data table and are generally 
very fast.


Overall I have found that once things get big the layers of your app stack 
start to blend together and have to be combined in clever ways to keep speed 
up.  Postgres is a beast but when you run into things it can't do well just 
find a way to cache it or make it work together with some other persistence 
tech to handle those cases.






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread david

On Tue, 12 Oct 2010, Mladen Gogala wrote:


da...@lang.hm wrote:
from a PR point of view, speeding up the trivial count(*) case could be 
worth it, just to avoid people complaining about it not being fast.



Fixing PR stuff is not the approach that I would take. People are complaining 
about select count(*) because they're using it in all the wrong places.


that may be the case, but if it's possible to make it less painful it will 
mean more people use postgres, both because it works better for them when 
they are using the suboptimal programs, but also because when people do 
their trivial testing of databases to decide which one they will use, they 
won't rule out postgres because it's so slow


the fact of the matter is that people do use count(*), and even though 
there are usually ways to avoid doing so, having the programmer have to do 
something different for postgres than they do for other databases is 
raising a barrier against postgres utilization in anything.


David Lang

 My 
assessment that there is a problem with sequential scan was wrong. Now, let's 
again take Oracle as the measure.

Someone asked me about caching the data.  Here it is:

SQL> connect system/*
Connected.
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:12.68
SQL> connect adbase/*
Connected.
SQL> alter session set db_file_multiblock_read_Count=128;

Session altered.

Elapsed: 00:00:00.41
SQL> select count(*) from ni_occurrence;

  COUNT(*)
----------
 402062638

Elapsed: 00:02:37.77

SQL> select bytes/1048576 MB from user_segments
  2  where segment_name='NI_OCCURRENCE';

        MB
----------
     35329

Elapsed: 00:00:00.20
SQL>


So, the results weren't cached the first time around. The explanation is the 
fact that Oracle, as of the version 10.2.0, reads the table in the private 
process memory, not in the shared buffers.  This table alone is  35GB in 
size,  Oracle took 2 minutes 47 seconds to read it using the full table scan. 
If I do the same thing with PostgreSQL and a comparable table, Postgres is, 
in fact, faster:


psql (9.0.1)
Type "help" for help.

news=> \timing
Timing is on.
news=> select count(*) from moreover_documents_y2010m09;
  count
----------
 17242655
(1 row)

Time: 113135.114 ms
news=> select pg_size_pretty(pg_table_size('moreover_documents_y2010m09'));
 pg_size_pretty
----------------
 27 GB
(1 row)

Time: 100.849 ms
news=>

The number of rows is significantly smaller, but the table contains rather 
significant text field which consumes quite a bit of TOAST storage and the 
sizes are comparable. Postgres read through 27GB in 113 seconds, less than 2 
minutes and oracle took 2 minutes 37 seconds to read through 35GB.  I stand 
corrected: there is nothing wrong with the speed of the Postgres sequential 
scan.






--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread david

On Tue, 12 Oct 2010, Craig Ringer wrote:


On 10/12/2010 04:22 PM, da...@lang.hm wrote:


from a PR point of view, speeding up the trivial count(*) case could be
worth it, just to avoid people complaining about it not being fast.


At the cost of a fair bit more complexity, though, and slowing everything 
else down.


complexity, probably, although given how complex the planner already is, is 
this significant?


as far as slowing everything else down, why would it do that? (beyond the 
simple fact that any new thing the planner can do makes the planner take a 
little longer)


David Lang

The proper solution here remains, IMO, support for visibility information in 
indexes, whether by storing it once in the index and once in the heap 
(ouch!), storing it out-of-line, or using a covering index where one or more 
columns are stored wholly in the index not in the table heap at all.


Here are a few of the many past discussions about this that have already 
covered some of the same ground:


http://stackoverflow.com/questions/839015/postgres-could-an-index-organized-tables-paved-way-for-faster-select-count-fr

http://osdir.com/ml/db.postgresql.performance/2003-10/msg00075.html
(and the rest of the thread)

A decent look with Google will find many, many more.


in the case where you are doing a count(*) query with a WHERE clause, and the
WHERE is on an indexed column, could the search just look at the index plus the
visibility map rather than doing a sequential scan through the
table?


Nope, because the visibility map, which is IIRC only one bit per page, 
doesn't record how many tuples there are on the page, or enough information 
about them to determine how many of them are visible to the current 
transaction*.



as for your worries about the accuracy of a visibility based count in
the face of other transactions, wouldn't you run into the same issues if
you are doing a sequential scan with the same transactions in process?


No. Every tuple in a table heap in postgresql has hidden fields, some of 
which are used to determine whether the current transaction* can see the 
tuple - it may have been inserted after this transaction started, or deleted 
before this transaction started, so it's not visible to this transaction but 
may still be to others.


http://www.postgresql.org/docs/current/static/ddl-system-columns.html

This information isn't available in the visibility map, or in indexes. That's 
why PostgreSQL has to hit the heap to find it.


* current transaction should really be current snapshot. The snapshot is 
taken at the start of the whole transaction for SERIALIZABLE isolation, and 
at the start of each statement for READ COMMITTED isolation.


--
Craig Ringer



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey

On Oct 11, 2010, at 9:21 PM, Samuel Gendler wrote:



On Mon, Oct 11, 2010 at 9:06 PM, Scott Carey 
sc...@richrelevance.commailto:sc...@richrelevance.com wrote:
I can't speak to documentation, but it is something that helps as your I/O 
subsystem gets more powerful, and how much it helps depends more on your 
hardware, which may have adaptive read ahead on its own, and your file system 
which may be more or less efficient at sequential I/O.  For example ext3 out of 
the box gets a much bigger gain from tuning read-ahead than XFS on a DELL PERC6 
RAID card (but still ends up slower).


Geez.  I wish someone would have written something quite so bold as 'xfs is 
always faster than ext3' in the standard tuning docs.  I couldn't find anything 
that made a strong filesystem recommendation.  How does xfs compare to ext4?  I 
wound up on ext4 on a dell perc6 raid card when an unexpected hardware failure 
on a production system caused my test system to get thrown into production 
before I could do any serious testing of xfs.  If there is a strong consensus 
that xfs is simply better, I could afford the downtime to switch.

As it happens, this is a system where all of the heavy workload is in the form 
of sequential scan type load. The OLTP workload is very minimal (tens of 
queries per minute on a small number of small tables), but there are a lot of 
reporting queries that wind up doing sequential scans of large partitions 
(millions to tens of millions of rows).  We've sized the new hardware so that 
the most commonly used partitions fit into memory, but if we could speed the 
queries that touch less frequently used partitions, that would be good.  I'm 
the closest thing our team has to a DBA, which really only means that I'm the 
one person on the dev team or the ops team to have read all of the postgres 
docs and wiki and the mailing lists.  I claim no actual DBA experience or 
expertise and have limited cycles to devote to tuning and testing, so if there 
is an established wisdom for filesystem choice and read ahead tuning, I'd be 
very interested in hearing it.


ext4 is a very fast file system.  It's faster than ext2, but has many more 
features and has the all-important journaling.

However, for large reporting queries and sequential scans, XFS will win in the 
long run if you use the online defragmenter.  Otherwise, your sequential scans 
won't be all that sequential on any file system over time if your tables aren't 
written once, forever, serially.   Parallel restore will result in a system 
that is fragmented -- ext4 will do best at limiting this on the restore, but 
only xfs has online defragmentation.  We schedule ours daily and it noticeably 
improves sequential scan I/O.
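
For reference, a daily run can be as simple as a cron entry along these lines
(the mount point and time budget are examples, not our exact setup):

# /etc/cron.d/xfs-defrag: run the XFS online defragmenter for at most two
# hours each night against the filesystem holding the postgres data
30 3 * * *  root  /usr/sbin/xfs_fsr -t 7200 /pgdata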

Supposedly, an online defragmenter is in the works for ext4 but it may be years 
before it's available.


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Samuel Gendler
On Tue, Oct 12, 2010 at 9:02 AM, Scott Carey sc...@richrelevance.comwrote:


 However, for large reporting queries and sequential scans, XFS will win in
 the long run if you use the online defragmenter.  Otherwise, your sequential
 scans won't be all that sequential on any file system over time if your
 tables aren't written once, forever, serially.   Parallel restore will
 result in a system that is fragmented -- ext4 will do best at limiting this
 on the restore, but only xfs has online defragmentation.  We schedule ours
 daily and it noticeably improves sequential scan I/O.


Our reporting tables are written sequentially and left unmodified until
entire partitions are dropped.  However, equivalent partitions tend to get a
little bit larger over time, so newer partitions won't necessarily fit into
the gaps left by prior partition drops, so it is possible that partitions
will be split into two sections, but should still be very sequential, if not
perfectly so.  It would seem that we stumbled into an ideal architecture for
doing this kind of work - mostly by virtue of starting with 8.2.x and having
huge problems with autovacuum and vacuum taking forever and dragging the db
to halt, which caused us to move to an architecture which aggregates and
then drops older data in entire partitions instead of updating aggregates
individually and then deleting rows. Partitions are sized such that most
reporting queries run over entire partitions, too (which was completely
accidental since I had not yet delved into individual query optimization at
the time), so even though we are doing sequential scans, we at least run as
few of them as possible and are able to keep hot data in memory.

--sam


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Chris Browne
sgend...@ideasculptor.com (Samuel Gendler) writes:
 Geez.  I wish someone would have written something quite so bold as
 'xfs is always faster than ext3' in the standard tuning docs.  I
 couldn't find anything that made a strong filesystem
 recommendation.  How does xfs compare to ext4?  I wound up on ext4 on
 a dell perc6 raid card when an unexpected hardware failure on a
 production system caused my test system to get thrown into production
 before I could do any serious testing of xfs.  If there is a strong
 consensus that xfs is simply better, I could afford the downtime to
 switch.

It's news to me (in this thread!) that XFS is actually getting some
developer love, which is a pretty crucial factor to considering it
relevant.

XFS was an SGI creation, and, with:

 a) the not-scintillating performance of the company,

 b) the lack of a lot of visible work going into the filesystem,

 c) the paucity of support by Linux vendors (for a long time, if you 
told RHAT you were having problems, and were using XFS, the next
step would be to park the ticket awaiting your installing a
supported filesystem)

it didn't look like XFS was a terribly good bet.  Those issues were
certainly causing concern a couple of years ago.

Faster raw performance isn't much good if it comes with a risk of:
 - Losing data
 - Losing support from vendors

If XFS now *is* getting support from both the development and support
perspectives, then the above concerns may have been invalidated.  It
would be very encouraging, if so.
-- 
output = (cbbrowne @ gmail.com)
Rules of  the Evil Overlord  #228.  If the  hero claims he  wishes to
confess  in public  or to  me  personally, I  will remind  him that  a
notarized deposition will serve just as well.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey
 
 
 A count with any joins or filter criteria would still have to scan all 
 pages with visible tuples in them. So the visibility map helps speed up 
 scanning of bloated tables, but doesn't provide a magical fast count 
 except in the utterly trivial "select count(*) from tablename;" case, 
 and can probably only be used for accurate results when there are no 
 read/write transactions currently open.

select count(*) from tablename where [condition or filter that can use an 
index] [group by on columns in the index]

will also work, I think.

Additionally, I think it can work if other open transactions exist, provided 
they haven't written to the table being scanned.  If they have, then only those 
pages that have been altered and marked in the visibility map need to be 
cracked open the normal way.

 Even if you kept a count of 
 tuples in each page along with the mvcc transaction ID information 
 required to determine for which transactions that count is valid, it'd 
 only be useful if you didn't have to do any condition checks, and it'd 
 be yet another thing to update with every insert/delete/update.
 

Yes, lots of drawbacks and added complexity.

 Perhaps for some users that'd be worth having, but it seems to me like 
 it'd have pretty narrow utility. I'm not sure that's the answer.
 
 --
 Craig Ringer
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Jesper Krogh

On 2010-10-12 18:02, Scott Carey wrote:

 However, for large reporting queries and sequential scans, XFS will
 win in the long run if you use the online defragmenter.  Otherwise,
 your sequential scans won't be all that sequential on any file system
 over time if your tables aren't written once, forever, serially.
 Parallel restore will result in a system that is fragmented -- ext4
 will do best at limiting this on the restore, but only xfs has online
 defragmentation.  We schedule ours daily and it noticeably improves
 sequential scan I/O.

 Supposedly, an online defragmenter is in the works for ext4 but it
  may be years before it's available.


If some clever postgres hacker could teach postgres to allocate blocks
using posix_fallocate in quite large batches, say .. something like:
fallocate(min(current_relation_size * 0.1, 1073741824));

So if you have a relation already filling 10GB, then the next file for
the relation is just fully allocated up front by the filesystem. That
would ensure that a large table sits efficiently on the filesystem
level with a minimum of fragmentation on ext4 (and other FS's supporting
posix_fallocate),

and for small systems it would only use 10% more disk space...   ..

.. last night I spent an hour looking for where it's done but couldn't
find the source file where extension of an existing relation takes
place.. can someone give directions?

--
Jesper






Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey

On Oct 12, 2010, at 8:39 AM, Dan Harris wrote:

  On 10/11/10 8:02 PM, Scott Carey wrote:
 would give you a 1MB read-ahead.  Also, consider XFS and its built-in 
 defragmentation.  I have found that a longer lived postgres DB will get 
 extreme
 file fragmentation over time and sequential scans end up mostly random.  
 On-line file defrag helps tremendously.
 
 We just had a corrupt table caused by an XFS online defrag.  I'm scared 
 to use this again while the db is live.  Has anyone else found this to 
 be safe?  But, I can vouch for the fragmentation issue, it happens very 
 quickly in our system.
 

What version?  I'm using the latest CentOS extras build.

We've been doing online defrag for a while now on a very busy database with  
8TB of data.  Not that that means there are no bugs... 

It is a relatively simple thing in xfs -- it writes a new file to temp in a way 
that allocates contiguous space if available, then if the file has not been 
modified since it was re-written it is essentially moved on top of the other 
one.  This should be safe provided the journaling and storage is safe, etc.


 -Dan
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey

On Oct 12, 2010, at 8:54 AM, da...@lang.hm wrote:

 On Tue, 12 Oct 2010, Craig Ringer wrote:
 
 On 10/12/2010 04:22 PM, da...@lang.hm wrote:
 
  from a PR point of view, speeding up the trivial count(*) case could be
 worth it, just to avoid people complaining about it not being fast.
 
 At the cost of a fair bit more complexity, though, and slowing everything 
 else down.
 
 complexity probably, although given how complex the planner is already is 
 this significant?
 
 as far as slowing everything else down, why would it do that? (beyond the 
 simple fact that any new thing the planner can do makes the planner take a 
 little longer)
 
 David Lang
 
I wouldn't even expect the planner to do more work.  An Index Scan can simply 
avoid going to the tuples for visibility under some circumstances.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Scott Carey

On Oct 12, 2010, at 9:46 AM, Scott Carey wrote:

 
 On Oct 12, 2010, at 8:54 AM, da...@lang.hm wrote:
 
 On Tue, 12 Oct 2010, Craig Ringer wrote:
 
 On 10/12/2010 04:22 PM, da...@lang.hm wrote:
 
  from a PR point of view, speeding up the trivial count(*) case could be
 worth it, just to avoid people complaining about it not being fast.
 
 At the cost of a fair bit more complexity, though, and slowing everything 
 else down.
 
 complexity probably, although given how complex the planner is already is 
 this significant?
 
 as far as slowing everything else down, why would it do that? (beyond the 
 simple fact that any new thing the planner can do makes the planner take a 
 little longer)
 
 David Lang
 
 I wouldn't even expect the planner to do more work.  An Index Scan can simply 
 avoid going to the tuples for visibility under some circumstances.
 
 
Of course, the planner has to account for that.  Otherwise it won't choose the Index Scan 
over the sequential scan.  So the cost of index scans, when all the info other 
than visibility is in the index, would need to be lowered.


 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] large dataset with write vs read clients

2010-10-12 Thread Chris Browne
cr...@postnewspapers.com.au (Craig Ringer) writes:
 Hey, maybe I should try posting YouTube video answers to a few
 questions for kicks, see how people react ;-)

And make sure it uses the same voice as is used in the "MongoDB is web
scale" video, to ensure that people interpret it correctly :-).
-- 
output = (cbbrowne @ gmail.com)
http://linuxdatabases.info/info/nonrdbms.html
The *Worst* Things  to Say to a  Police Officer: Hey, is that  a 9 mm?
That's nothing compared to this .44 magnum.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] large dataset with write vs read clients

2010-10-12 Thread Chris Browne
mladen.gog...@vmsinfo.com (Mladen Gogala) writes:
 I have a logical problem with asynchronous commit. The commit
 command should instruct the database to make the outcome of the
 transaction permanent. The application should wait to see whether the
 commit was successful or not. Asynchronous behavior in the commit
 statement breaks the ACID rules and should not be used in a RDBMS
 system. If you don't need ACID, you may not need RDBMS at all. You may
 try with MongoDB. MongoDB is web scale:
 http://www.youtube.com/watch?v=b2F-DItXtZs

The client always has the option of connecting to a set of databases,
and stowing parts of the data hither and thither.  That often leads to
the relaxation called BASE.  (And IBM has been selling that relaxation
as MQ-Series since the early '90s!)

There often *ARE* cases where it is acceptable for some of the data to
not be as durable, because that data is readily reconstructed.  This is
particularly common for calculated/cached/aggregated data.

Many things can get relaxed for a data warehouse data store, where the
database is not authoritative, but rather aggregates data drawn from
other authoritative sources.  In such applications, neither the A, C, I,
nor the D are pointedly crucial, in the DW data store.

- We don't put the original foreign key constraints into the DW
  database; they don't need to be enforced a second time.  Ditto for
  constraints of all sorts.

- Batching of the loading of updates is likely to break several of the
  letters.  And I find it *quite* acceptable to lose D if the data may
  be safely reloaded into the DW database.

I don't think this is cavalier, nor that it points to "MongoDB is
web scale."
-- 
cbbrowne,@,gmail.com
Rules  of the  Evil Overlord  #181.  I  will decree  that all  hay be
shipped in tightly-packed bales. Any wagonload of loose hay attempting
to pass through a checkpoint will be set on fire.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Dan Harris

 On 10/12/10 10:44 AM, Scott Carey wrote:

On Oct 12, 2010, at 8:39 AM, Dan Harris wrote:


  On 10/11/10 8:02 PM, Scott Carey wrote:

would give you a 1MB read-ahead.  Also, consider XFS and its built-in 
defragmentation.  I have found that a longer lived postgres DB will get extreme
file fragmentation over time and sequential scans end up mostly random.  
On-line file defrag helps tremendously.


We just had a corrupt table caused by an XFS online defrag.  I'm scared
to use this again while the db is live.  Has anyone else found this to
be safe?  But, I can vouch for the fragmentation issue, it happens very
quickly in our system.


What version?  I'm using the latest CentOS extras build.

We've been doing online defrag for a while now on a very busy database with  
8TB of data.  Not that that means there are no bugs...

It is a relatively simple thing in xfs -- it writes a new file to temp in a way 
that allocates contiguous space if available, then if the file has not been 
modified since it was re-written it is essentially moved on top of the other 
one.  This should be safe provided the journaling and storage is safe, etc.

I'm not sure how to figure out what version of XFS we're on.. but it's 
Linux kernel 2.6.24-24 x86_64 on Ubuntu Server 8.04.3.  Postgres version 8.3


We're due for an upgrade on that server soon so we'll do some more 
testing once we upgrade.  Right now we are just living with the 
fragmentation.  I'm glad to hear the regular on-line defrag is working 
successfully, at least that gives me hope we can rely on it in the future.


-Dan

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Tom Lane
So I spent a bit of quality time with oprofile this morning, and found
once again that there's no substitute for having actual data before
theorizing.

Test case software: current Git HEAD (plus one code change explained
below), compiled with --enable-debug to support oprofile, cassert off;
no other special configure options.  Running on current Fedora 13 (gcc
4.4.4 in particular).  All postgresql.conf options are out-of-the-box.

Test case hardware: recently purchased mid-grade desktop, dual Xeon
E5503 processors (Nehalem cores, 2GHz), 4GB DDR3-800 RAM, no-name
SATA disk.

Test query: "select count(*) from t", where t has 4 nonnull integer
columns and 81920000 rows, occupying 3459MB.  I chose that size
specifically to fit into available RAM, so that on repeated executions
no physical I/O will occur.
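
For anyone who wants to poke at something similar, a table of roughly
that shape can be thrown together like this (names are arbitrary and
this is only an approximation of the setup, adjust to taste):

CREATE TABLE t (c1 int NOT NULL, c2 int NOT NULL,
                c3 int NOT NULL, c4 int NOT NULL);
INSERT INTO t
    SELECT g, g, g, g FROM generate_series(1, 81920000) g;
VACUUM ANALYZE t;          -- sets hint bits / visibility info before timing
SELECT count(*) FROM t;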

On this setup I find that select count(*) runs in about 7.5sec when
the data is fully cached in RAM, for a scanning speed of 460MB/sec.
This is well in excess of what the machine's disk hardware can do:
bonnie++ rates the machine's disk read speed at 152MB/sec.  So in theory
PG should be able to completely saturate the disk when processing a table
bigger than RAM.  In reality the test case run time if I've just flushed
cache is about 28sec, working out to a scan rate of 123MB/sec.  I expect
if I'd bothered to tune the kernel readahead parameters as outlined
earlier in this thread, I could get to 150MB/sec.

Now of course this disk setup is far from industrial strength, but the
processor isn't what you'd put in a serious database server either (in
particular, its available memory bandwidth is well behind the curve).
Also, the table is pretty narrow (only 16 payload bytes per row), and
any wider test table would show a pretty much linear scaling of achievable
scan rate versus table width.  So I don't see much support here at all
for the notion that we scan slower than available disk bandwidth.

Further details from poking at it with oprofile: in the fully-cached
case the CPU time is about 80% Postgres and 20% kernel.  That kernel
time is of course all to do with moving pages from kernel disk buffers
into Postgres shared memory.  Although I've not bothered to increase
shared_buffers from the default 32MB, it wouldn't matter on this benchmark
unless I were able to make shared_buffers hold the entire table ... and
even then I'd only save 20%.

oprofile further shows that (with stock Postgres sources) the userspace
runtime breaks down like this:

samples  %symbol name
141267   13.0810  heapgettup_pagemode
85947 7.9585  advance_aggregates
83031 7.6885  ExecProject
78975 7.3129  advance_transition_function
75060 6.9504  heapgetpage
73540 6.8096  ExecClearTuple
69355 6.4221  ExecProcNode
59288 5.4899  heap_getnext
57745 5.3470  ExecScan
55618 5.1501  HeapTupleSatisfiesMVCC
47057 4.3574  MemoryContextReset
41904 3.8802  ExecStoreTuple
37146 3.4396  SeqNext
32206 2.9822  ExecAgg
22135 2.0496  XidInMVCCSnapshot
21142 1.9577  int8inc
19280 1.7853  AllocSetReset
18211 1.6863  hash_search_with_hash_value
16285 1.5079  TransactionIdPrecedes

I also looked at the source-line-level breakdown, though that's too bulky
to post here.  The most interesting fact here is that tuple visibility
testing (MVCC) overhead is simply nonexistent: it'd be in heapgetpage()
if it were being done, which it isn't because all the pages of the table
have the PageIsAllVisible bit set.  In a previous run where those bits
weren't set but the per-tuple hint bits were, visibility testing still
only ate a percent or two of the runtime.  So the theory some people have
espoused in this thread that visibility testing is the bottleneck doesn't
hold water either.  If you go back and look at previous pgsql-hackers
discussions about that, what people have been worried about is not the CPU
cost of visibility testing but the need for indexscan queries to visit
the heap for no other purpose than to check the visibility flags.  In a
seqscan it's not going to matter.

I looked a bit more closely at the heapgettup_pagemode timing.  The
lines shown by opannotate as more than 0.1 percent of the runtime are

 22545  2.2074 :{ /* heapgettup_pagemode total: 153737 15.0528 */
  5685  0.5566 :bool        backward = ScanDirectionIsBackward(dir);
  5789  0.5668 :if (!scan->rs_inited)
  5693  0.5574 :lineindex = scan->rs_cindex + 1;
 11429  1.1190 :dp = (Page) BufferGetPage(scan->rs_cbuf);
  5693  0.5574 :linesleft = lines - lineindex;
  5766  0.5646 :while (linesleft > 0)
  5129  0.5022 :lineoff = scan->rs_vistuples[lineindex];
 44461  4.3533 :tuple->t_data = (HeapTupleHeader) 
PageGetItem((Page) dp, lpp);
 11135  1.0903 :tuple->t_len = ItemIdGetLength(lpp);
  5692  0.5573 :if (key != NULL)
  5773  0.5653 :

[PERFORM] read only transactions

2010-10-12 Thread Jon Nelson
Are there any performance implications (benefits) to executing queries
in a transaction where
SET TRANSACTION READ ONLY;
has been executed?


-- 
Jon

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Mladen Gogala

da...@lang.hm wrote:

On Tue, 12 Oct 2010, Mladen Gogala wrote:

  

da...@lang.hm wrote:

from a PR point of view, speeding up the trivial count(*) case could be 
worth it, just to avoid people complaining about it not being fast.



  
Fixing PR stuff is not the approach that I would take. People are complaining 
about select count(*) because they're using it in all the wrong places.



that may be the case, but if it's possible to make it less painful it will 
mean more people use postgres, both because it works better for them when 
they are using the suboptimal programs, but also because when people do 
their trivial testing of databases to decide which one they will use, they 
won't rule out postgres because it's so slow
  



There is no free lunch. If the count field is maintained somewhere, the 
concurrency will suffer. I find the idea of fixing the count delusion 
ridiculous, may Richard Dawkins forgive me for this pun. Saying that 
something is slow without testing and proper consideration is 
ridiculous. As a DBA, I usually get complaints like "the database is 
slow today" 3 times before lunch, every day.  The database is never 
slow; the database is a warehouse where you keep your data. What is 
slow is the access to the data, and that is done by, guess what, the 
application program. Almost always, it's the application that's slow, 
not the database.

As for the select count(*) idiom, what are you trying to do? Where are 
you using it? If you are using it for pagination, consider the 
possibility of not specifying the number of pages on the website, just 
the "next -->" and "<-- prev" links. Alternatively, you can fetch a 
small amount into the web page and direct the users who would like to 
see the complete information to a background reporting tool. Mixing 
batch reports and online reports is a very easy thing to do. If you are 
using it to establish existence, you're doing it wrong.

I've had a problem like that this morning. A developer came to me with 
the usual phrase that "the database is slow". It was a PHP form which 
should write an output file and let the user know where the file is. 
The function looks like this:


function put_xls($sth) {
    global $FNAME;
    $FNAME = $FNAME . ".xls";
    $lineno = 0;
    $ncols = $sth->FieldCount();
    for ($i = 0; $i <= $ncols; $i++) {
        $cols[$i] = $sth->FetchField($i);
        $colnames[$i] = $cols[$i]->name;
    }
    $workbook = new Spreadsheet_Excel_Writer("/software$FNAME");
    $format_bold = $workbook->addFormat();
    $format_bold->setBold();
    $format_bold->setAlign('left');
    $format_left = $workbook->addFormat();
    $format_left->setAlign('left');
    $worksheet = $workbook->addWorksheet('Moreover Search');
    $worksheet->writeRow($lineno++, 0, $colnames, $format_bold);
    while ($row = $sth->FetchRow()) {
        $worksheet->writeRow($lineno++, 0, $row, $format_left);
    }
    $workbook->close();
    $cnt = $sth->Recordcount();
    return($cnt);
}

The relevant includes are here:

require ('Date.php');
require ('adodb5/tohtml.inc.php');
require_once ('adodb5/adodb.inc.php');
require_once ('adodb5/adodb-exceptions.inc.php');
require_once 'Spreadsheet/Excel/Writer.php';
$ADODB_FETCH_MODE = ADODB_FETCH_NUM;

So, what is the problem here? Why was the database slow? As it turns 
out, the PEAR module for writing Excel spreadsheets, which is the tool 
used here, creates the entire spreadsheet in memory and writes it out 
on the close command. What was spinning was the httpd process; the 
database was completely and utterly idle, twiddling its thumbs and 
awaiting orders. Using fputcsv instead made the function fly. The only 
thing lost was the bold column titles. Changing little things 
can result in big performance gains. Making select count(*) 
unnaturally fast would encourage bad programming practices. I am 
not sure that this is a desirable development. You can't expect people 
to adjust the database software to your application. Applications are 
always database specific. Writing an application that will access a 
PostgreSQL database is not the same as writing an application that will 
access an Oracle database.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Jesper Krogh

On 2010-10-12 19:07, Tom Lane wrote:

Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.
I don't think any of the previous discussion in this thread is on-point
at all, except for the parts where people suggested avoiding it.
   


I would have to say that although it is nice to get count(*) faster, I
think your testing is way too simple.

It pretty much proves that in terms of the code involved in the
count(*) process there is not much to be achieved. But your table
has way too little payload. As PG currently is, it will start pushing
data off to TOAST when the tuple size reaches 1KB,
and the speed of count(*) is very much dominated by the amount
of dead weight it has to drag in, together with the heap access for the
row, when accessing the table. Creating a case where the table is this
slim is (in my view) very much at the extreme small end.

Just having 32 bytes of payload would more or less double
your time to count, if I read your test results correctly... and in the
situation where disk access would be needed, way more.

Dividing pg_relation_size by the number of tuples in our production
system, I end up with no average tuple size of less than 100 bytes.
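
(That figure comes from something along these lines; reltuples is only
the estimate from the last ANALYZE, so treat it as approximate:)

SELECT relname,
       pg_relation_size(c.oid) / reltuples::bigint AS approx_bytes_per_row
  FROM pg_class c
 WHERE relkind = 'r' AND reltuples >= 1
 ORDER BY approx_bytes_per_row DESC;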

.. without having complete insight: a visibility map that could be used in
conjunction with indices would solve that. What the cost of
maintaining it would be is also a factor.

Jesper

--
Jesper

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Tom Lane
Jesper Krogh jes...@krogh.cc writes:
 On 2010-10-12 19:07, Tom Lane wrote:
 Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.

 Just having 32 bytes of payload would more or less double
 your time to count, if I read your test results correctly... and in the
 situation where disk access would be needed, way more.

 Dividing pg_relation_size by the number of tuples in our production
 system, I end up with no average tuple size of less than 100 bytes.

Well, yeah.  I deliberately tested with a very narrow table so as to
stress the per-row CPU costs as much as possible.  With any wider table
you're just going to be I/O bound.

 .. without having complete insight: a visibility map that could be used in
 conjunction with indices would solve that. What the cost of
 maintaining it would be is also a factor.

I'm less than convinced that that approach will result in a significant
win.  It's certainly not going to do anything to convert COUNT(*) into
an O(1) operation, which frankly is what the complainants are expecting.
There's basically no hope of solving the PR problem without somehow
turning COUNT(*) into a materialized-view reference.  We've discussed
that in the past, and know how to do it in principle, but the complexity
and distributed overhead are daunting.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] read only transactions

2010-10-12 Thread Kevin Grittner
Jon Nelson jnelson+pg...@jamponi.net wrote:
 
 Are there any performance implications (benefits) to executing
 queries in a transaction where
 SET TRANSACTION READ ONLY;
 has been executed?
 
I don't think it allows much optimization in any current release.
 
It wouldn't be a bad idea to use it where appropriate, though, as
future releases might do something with it.  If you include this on
the BEGIN statement, that will save a round trip.
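 
For example (made-up query), these two are equivalent apart from the
extra round trip:
 
-- two statements / two round trips
BEGIN;
SET TRANSACTION READ ONLY;
SELECT count(*) FROM some_table;   -- hypothetical query
COMMIT;
 
-- same effect, one statement fewer (or: START TRANSACTION READ ONLY)
BEGIN READ ONLY;
SELECT count(*) FROM some_table;
COMMIT;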
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] read only transactions

2010-10-12 Thread Tom Lane
Jon Nelson jnelson+pg...@jamponi.net writes:
 Are there any performance implications (benefits) to executing queries
 in a transaction where
 SET TRANSACTION READ ONLY;
 has been executed?

No.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] read only transactions

2010-10-12 Thread Chris Browne
jnelson+pg...@jamponi.net (Jon Nelson) writes:
 Are there any performance implications (benefits) to executing queries
 in a transaction where
 SET TRANSACTION READ ONLY;
 has been executed?

Directly?  No.

Indirectly, well, a *leetle* bit...

Transactions done READ ONLY do not generate actual XIDs, which reduces
the amount of XID generation (pretty tautological!), which reduces the
need to do VACUUM to protect against XID wraparound.

  
http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html#VACUUM-BASICS

If you process 50 million transactions, that chews thru 50 million XIDs.

If 45 million of those were processed via READ ONLY transactions, then
the same processing only chews thru 5 million XIDs, meaning that the
XID-relevant vacuums can be done rather less frequently.

This only terribly much matters if:
  a) your database is so large that there are tables on which VACUUM
 would run for a very long time, and

  b) you are chewing through XIDs mighty quickly.

If either condition isn't true, then the indirect effect isn't important
either.  
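
If you want to see whether (b) applies to you, the age of the oldest
XID per database is easy to watch:

SELECT datname, age(datfrozenxid) AS xid_age
  FROM pg_database
 ORDER BY xid_age DESC;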
-- 
let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;;
I'm not switching from slrn.   I'm quite confident that anything that
*needs* to be posted in HTML is fatuous garbage not worth my time.
-- David M. Cook davec...@home.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] bulk load performance question

2010-10-12 Thread Samuel Gendler
I've got a few tables that periodically get entirely refreshed via a COPY. I
don't really have a good mechanism for detecting only rows which have
changed so even though the differences are small, a full copy is easiest.
 However, the data includes a primary key column, so I can't simply load
into the existing table and then drop older rows.  So we load into a table
with a different name and then, within a transaction, drop the old and
rename the new.  However, while the transaction will cause a query against
that table to block until the transaction commits, when the transaction
commits, the blocked query will fail with an error message like: ERROR:
 could not open relation with OID 17556

Is there some way to do the drop+rename in a manner which will preserve the
OID or otherwise allow blocked queries to execute correctly once they
unblock?

A secondary issue is that if permissions were granted to a role on the old
table, the new table does not acquire those permissions and they must be
granted again.

The biggest table that gets updated like this is a couple hundred thousand
rows, with maybe a few thousand rows actually changing or being added with
each load.  Suggestions for alternative mechanisms for doing the loading are
welcome.  I'd really rather avoid updating every row in a several hundred
thousand row table, especially without easy upsert functionality. The data
is small enough that selecting everything and then comparing in memory
before updating modified rows is doable, but sure seems like a lot of work
if it can be avoided.
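
One middle-ground option is to stage the fresh copy and let the server do the
comparison, so only rows that actually changed get written. A rough, untested
sketch, with made-up column names (id as the primary key, col1/col2 standing in
for the real payload):

BEGIN;
-- update only rows whose payload actually changed
UPDATE mytable m
   SET col1 = s.col1, col2 = s.col2
  FROM mytable_staging s
 WHERE m.id = s.id
   AND (m.col1 IS DISTINCT FROM s.col1 OR m.col2 IS DISTINCT FROM s.col2);

-- insert rows that are new in the feed
INSERT INTO mytable (id, col1, col2)
SELECT s.id, s.col1, s.col2
  FROM mytable_staging s
 WHERE NOT EXISTS (SELECT 1 FROM mytable m WHERE m.id = s.id);

-- and, if needed, drop rows that vanished from the feed
DELETE FROM mytable m
 WHERE NOT EXISTS (SELECT 1 FROM mytable_staging s WHERE s.id = m.id);
COMMIT;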

Writing this caused me to think of a possible solution, which appears to
work correctly, but I'd like to confirm it with folks in the know:

Instead of this:

CREATE TABLE mytable_temp...;
COPY INTO mytable_temp...;
BEGIN;
DROP TABLE mytable;
ALTER TABLE mytable_temp RENAME TO mytable;
COMMIT;

Which will cause any overlapping queries to pick up the wrong OID for
mytable and then fail when the transaction commits, I tested this:

COPY INTO mytable_temp;
BEGIN;
ALTER TABLE mytable RENAME TO mytable_old;
ALTER TABLE mytable_temp RENAME TO mytable;
COMMIT;
DROP TABLE mytable_old;

It would appear that any query that uses mytable which overlaps with the
transaction will pick up the OID of the original mytable and then block
until the transaction commits.  When the transaction commits, those queries
will successfully run against the original OID (no queries write to this
table except for the bulk load) and will complete, at which time, the table
drop will finally complete.  Meanwhile, any queries which don't overlap (or
perhaps any queries which start after the rename from mytable_temp to
mytable has occurred) will successfully complete against the new table.

The net result appears to be that I will no longer suffer the missing OID
error, which seemed to periodically completely hose a db connection,
requiring that the connection be closed since no subequent queries would
ever succeed, whether they touched the table in question or not. I've only
seen that erroneous behaviour on 8.3 (so far - we only recently upgraded to
8.4.4), but it was fairly mysterious because I've never been able to
replicate it in testing.  I could get a single missing OID error, but never
one that would break all subsequent queries.

Are my assumptions about this correct?


Re: [PERFORM] read only transactions

2010-10-12 Thread Tom Lane
Chris Browne cbbro...@acm.org writes:
 jnelson+pg...@jamponi.net (Jon Nelson) writes:
 Are there any performance implications (benefits) to executing queries
 in a transaction where
 SET TRANSACTION READ ONLY;
 has been executed?

 Directly?  No.

 Indirectly, well, a *leetle* bit...

 Transactions done READ ONLY do not generate actual XIDs, which reduces
 the amount of XID generation (pretty tautological!), which reduces the
 need to do VACUUM to protect against XID wraparound.

You're right that a read-only transaction doesn't generate an XID.
But that is not a function of whether you do SET TRANSACTION READ ONLY;
it's a function of refraining from attempting any database changes.
The SET might be useful for clarifying and enforcing your intent, but
it's not a performance boost to use it, versus just doing the read-only
transaction without it.

Also, I believe that SET TRANSACTION READ ONLY isn't a hard read only
restriction anyway --- it'll still allow writes to temp tables for
example, which will cause assignment of an XID.

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] bulk load performance question

2010-10-12 Thread Tom Lane
Samuel Gendler sgend...@ideasculptor.com writes:
 Is there some way to do the drop+rename in a manner which will preserve the
 OID or otherwise allow blocked queries to execute correctly once they
 unblock?

No, but you could consider 
begin;
truncate original_table;
insert into original_table select * from new_data;
commit;

 A secondary issue is that if permissions were granted to a role on the old
 table, the new table does not acquire those permissions and they must be
 granted again.

Not to mention foreign keys ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Pierre C



suggest that 99% instances of the select count(*) idiom are probably

bad use of the SQL language.


Well, suppose you paginate results. If the user sees that the search query  
returns 500 pages, there are two options:

- you're Google, and your sorting algorithms are so good that the answer  
the user wants is on the first page
- or the user will refine his search by entering more keywords to get a  
manageable result set

So, in both cases, the count(*) was useless anyway. And the slowest ones  
are the most useless, since the user will immediately discard the result  
and refine his query.
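
(For the next/prev links themselves no count is needed at all: fetch one row  
more than the page size, and the presence of that extra row tells you whether  
to show "next". Made-up table and filter below.)

-- 50 rows per page: fetch 51, display 50, and if the 51st came back, show "next"
SELECT id, title
  FROM items
 WHERE document_tsv @@ plainto_tsquery('search terms')
 ORDER BY id
 LIMIT 51 OFFSET 100;        -- third page of a 50-row pagination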


If your full text search is slow, try Xapian or Lucene.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Neil Whelchel
On Tuesday 12 October 2010 07:19:57 you wrote:
  The biggest single problem with select count(*) is that it is
  seriously overused. People use that idiom to establish existence, which
  usually leads to a performance disaster in the application using it,
  unless the table has no more than few hundred records. SQL language, of
  which PostgreSQL offers an excellent implementation,  offers [NOT]
  EXISTS clause since its inception in the Jurassic era. The problem is
  with the sequential scan, not with counting. I'd even go as far as to
  suggest that 99% instances of the select count(*) idiom are probably
  bad use of the SQL language.
  
  I agree, I have seen many very bad examples of using count(*). I will go
  so far as to question the use of count(*) in my examples here. It there
  a better way to come up with a page list than using count(*)? What is
  the best method to make a page of results and a list of links to other
  pages of results? Am I barking up the wrong tree here?
 
 One way I have dealt with this on very large tables is to cache the
 count(*) at the application level (using memcached, terracotta, or
 something along those lines) and then increment that cache whenever you
 add a row to the relevant table.  On application restart that cache is
 re-initialized with a regular old count(*).  This approach works really
 well and all large systems in my experience need caching in front of the
 DB eventually.  If you have a simpler system with say a single
 application/web server you can simply store the value in a variable, the
 specifics would depend on the language and framework you are using.

I use this method whenever possible. I talked about it in my first post.
I generally keep a table around that I call "counts". It has many rows that store 
count numbers from frequently used views.
The one that I can't do anything about is the case where you have no control 
over the WHERE clause (or where there may be simply too many options to count 
everything ahead of time without making things even slower). That is the point 
of this entire thread, or was... ;)
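
For the simple cases the counts table is kept current by triggers, roughly like 
this (just a sketch with made-up names; note that every insert and delete then 
fights over the same counts row, which is the concurrency cost of maintaining a 
count):

CREATE TABLE counts (
    view_name text PRIMARY KEY,
    n         bigint NOT NULL
);

CREATE OR REPLACE FUNCTION maintain_counts() RETURNS trigger AS $$
BEGIN
    -- assumes a row for this table already exists in counts
    IF TG_OP = 'INSERT' THEN
        UPDATE counts SET n = n + 1 WHERE view_name = TG_TABLE_NAME;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE counts SET n = n - 1 WHERE view_name = TG_TABLE_NAME;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER transactions_count
AFTER INSERT OR DELETE ON transactions
FOR EACH ROW EXECUTE PROCEDURE maintain_counts();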
-Neil-


 
 Another more all-DB approach is to create a statistics tables into which
 you place aggregated statistics rows (num deleted, num inserted, totals,
 etc) at an appropriate time interval in your code.  So you have rows
 containing aggregated statistics information for the past and some tiny
 portion of the new data happening right now that hasn't yet been
 aggregated.  Queries then look like a summation of the aggregated values
 in the statistics table plus a count(*) over just the newest portion of
 the data table and are generally very fast.
 
 Overall I have found that once things get big the layers of your app
 stack start to blend together and have to be combined in clever ways to
 keep speed up.  Postgres is a beast but when you run into things it
 can't do well just find a way to cache it or make it work together with
 some other persistence tech to handle those cases.
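
(A bare-bones version of that statistics-table idea, with made-up names: an 
already-aggregated rollup table plus a count over only the rows that have not 
been rolled up yet.)

SELECT point, sum(n) AS total
FROM (
    SELECT point, n
      FROM stats_hourly                      -- filled by a periodic batch job
     WHERE bucket >= timestamp '2010-10-12 08:00'
       AND bucket <  timestamp '2010-10-12 09:00'
    UNION ALL
    SELECT point, count(*)
      FROM events_recent                     -- only rows not yet rolled up
     WHERE event_time >= timestamp '2010-10-12 08:00'
       AND event_time <  timestamp '2010-10-12 09:00'
     GROUP BY point
) combined
GROUP BY point;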

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Mladen Gogala

Pierre C wrote:




Well, suppose you paginate results. If the user sees that the search query  
returns 500 pages, there are two options :
  
With Google, I usually lose patience on page 3. All that I, as an 
end user, need to know is whether there are more than 10 pages. The 
fact that there are 1776 pages in the result set is not particularly 
useful to me. I couldn't care less whether the number of returned pages 
is 1492, 1776 or 1861; I'm going to look at, at most, the first 5 of them.
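
That "more than 10 pages" question can be answered with a bounded scan instead 
of a full count; something like this (made-up table and filter) stops after at 
most 101 matching rows:

-- with 10 results per page, a result of 101 means "more than 10 pages"
SELECT count(*) AS bounded_count
  FROM (SELECT 1
          FROM search_result
         WHERE document_tsv @@ plainto_tsquery('keywords')
         LIMIT 101) capped;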


- you're google, and your sorting algorithms are so good that the answer  
the user wants is in the first page
- or the user will refine his search by entering more keywords tu get a  
manageable result set


So, in both cases, the count(*) was useless anyway. And the slowest ones  
are the most useless, since the user will immediatey discard the result  
and refine his query.


If your full text search is slow, try Xapian or Lucene.

  
May I also recommend Sphinx? It's a very nice text search engine, with 
the price equal to that of Lucene.



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions





--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Neil Whelchel
On Tuesday 12 October 2010 14:35:01 you wrote:
  suggest that 99% instances of the select count(*) idiom are probably
  
  bad use of the SQL language.
 
 Well, suppose you paginate results. If the user sees that the search query
 returns 500 pages, there are two options :
 
 - you're google, and your sorting algorithms are so good that the answer
 the user wants is in the first page
  - or the user will refine his search by entering more keywords to get a
 manageable result set
 
 So, in both cases, the count(*) was useless anyway. And the slowest ones
  are the most useless, since the user will immediately discard the result 
 and refine his query.
 
 If your full text search is slow, try Xapian or Lucene.

I guess I have to comment here again and point out that while I am having this 
issue with text searches, I avoid using count(*) in such cases; I just use 
next and previous links. The real problem (for me) is when someone 
searches a date or time range. My application keeps track of huge amounts of 
realtime transactional data. So an administrator might want a report as to 
what some data point did yesterday between 3 and 4 PM. Under normal conditions 
the range of records that match can be between 0 and over 5,000. This is 
really killing me, especially when the reporting people want a list of how many 
transactions each point in a given zipcode had this morning 
between 8 and 9 AM; it takes about 5 minutes to run on a server that has 
enough RAM to hold the entire table!

Pseudo query:
Show how many transactions per node in zipcode 92252 between 8:00 and 9:00 
today:

point_number | number_of_transactions
65889   |   31
34814   |   4865
28349   |   0
3358|   364
...

24 total rows,  5 minutes.
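
In real SQL that is roughly the following (table and column names here are made 
up, not my actual schema):

SELECT p.point_number, count(t.id) AS number_of_transactions
  FROM points p
  LEFT JOIN transactions t
         ON t.point_number = p.point_number
        AND t.txn_time >= timestamp '2010-10-13 08:00'
        AND t.txn_time <  timestamp '2010-10-13 09:00'
 WHERE p.zipcode = '92252'
 GROUP BY p.point_number;

Presumably a composite index on (point_number, txn_time) would keep this from 
walking the whole table, though with thousands of matching rows per node it 
still isn't free.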

Then they want every node to be a link to a list of actual data within the 
specified timeframe.
This is where I have to do the same query twice: first to find out how many rows 
there are for the page links, then again to get a page of results.
Sure, I could keep tables around that have numbers by the hour, minute, day or 
whatever to cache results and speed things up, but then the problem is that 
when the data is put into the server, there are so many statistics tables to 
update that the front end becomes a huge problem. Also, it makes for a huge mess 
of tables to think about when I need to make a report.
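
Now that we're on 8.4, one thing that at least avoids issuing the query twice 
is a window aggregate: it still visits every matching row, but it returns the 
total alongside the page of results (made-up names again):

SELECT id, txn_time,
       count(*) OVER () AS total_matches     -- counted before LIMIT is applied
  FROM transactions
 WHERE point_number = 34814
   AND txn_time >= timestamp '2010-10-13 08:00'
   AND txn_time <  timestamp '2010-10-13 09:00'
 ORDER BY txn_time
 LIMIT 50 OFFSET 0;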

-Neil-



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Dan Harris

 On 10/12/10 4:33 PM, Neil Whelchel wrote:

On Tuesday 12 October 2010 08:39:19 Dan Harris wrote:

   On 10/11/10 8:02 PM, Scott Carey wrote:

would give you a 1MB read-ahead.  Also, consider XFS and its built-in
defragmentation.  I have found that a longer lived postgres DB will get
extreme file fragmentation over time and sequential scans end up mostly
random.  On-line file defrag helps tremendously.

We just had a corrupt table caused by an XFS online defrag.  I'm scared
to use this again while the db is live.  Has anyone else found this to
be safe?  But, I can vouch for the fragmentation issue, it happens very
quickly in our system.

-Dan

I would like to know the details of what was going on that caused your
problem. I have been using XFS for over 9 years, and it has never caused any
trouble at all in a production environment. Sure, I had many problems with it
on the test bench, but in most cases the issues were very clear and easy to
avoid in production. There were some (older) XFS tools that caused some
problems, but that is in the past, and as time goes on, it seems to take less and
less planning to make it work properly.
-Neil-

There were roughly 50 transactions/sec going on at the time I ran it.  
xfs_db reported 99% fragmentation before it ran ( we haven't been 
running it via cron ).  The operation completed in about 15 minutes ( 
360GB of used data on the file system ) with no errors.  Everything 
seemed fine until the next morning, when a user went to query a table and we 
got a message about a missing file inside the pg cluster.  We were 
unable to query the table at all via psql.  It was a bit of a panic 
situation so we restored that table from backup immediately and the 
problem was solved without doing more research.


This database has been running for years with no problem ( and none 
since ), that was the first time I tried to do an on-line defrag and 
that was the only unusual variable introduced into the system at that 
time so it was a strong enough correlation for me to believe that caused 
it.  Hopefully this was just a corner case..


-Dan


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query much faster with enable_seqscan=0

2010-10-12 Thread Ogden

On Sep 21, 2010, at 6:30 PM, Tom Lane wrote:

 Ogden li...@darkstatic.com writes:
 SELECT tr.id, tr.sid
FROM
 test_registration tr
 INNER JOIN test_registration_result r on (tr.id = 
  r.test_registration_id)
 WHERE

 tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
GROUP BY tr.id, tr.sid
 
 Seeing that tr.id is a primary key, I think you might be a lot better
 off if you avoided the inner join and group by.  I think what you really
 want here is something like
 
 SELECT tr.id, tr.sid
FROM
test_registration tr
WHERE

 tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
AND EXISTS(SELECT 1 FROM test_registration_result r
   WHERE tr.id = r.test_registration_id)
 
   regards, tom lane
 

Thank you for this suggestion; however, what if I wanted some columns from 
test_registration_result? This wouldn't work, for example, if I wanted 
test_registration_result.answer to be fetched. Hence, I had to have a JOIN with 
test_registration_result and a GROUP BY. I still am not happy with my query - 
the EXISTS executes with great speed, however I cannot retrieve any of the columns 
from that table. 
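
Something like this might keep the join while avoiding the GROUP BY, if one 
result row per registration is enough (untested sketch only):

SELECT DISTINCT ON (tr.id)
       tr.id, tr.sid, r.answer
  FROM test_registration tr
  JOIN test_registration_result r ON r.test_registration_id = tr.id
 WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
 ORDER BY tr.id;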

Thank you

Ogden


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How does PG know if data is in memory?

2010-10-12 Thread Ron Mayer
Kevin Grittner wrote:
  
 ...Sybase named caches...segment off portions of the memory for
 specific caches... bind specific database
 objects (tables and indexes) to specific caches. ...
  
 When I posted to the list about it, the response was that LRU
 eviction was superior to any tuning any human would do.  I didn't
 and don't believe that
  
 FWIW, the four main reasons for using it were:
 (1) Heavily used data could be kept fully cached in RAM...

Lightly-used-but-important data seems like another use case.

LRU's probably far better than me at optimizing for the total
throughput and/or average response time.  But if there's a
requirement:
 Even though this query's very rare, it should respond
  ASAP, even at the expense of the throughput of the rest
  of the system.
it sounds like this kind of hand-tuning might be useful.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance