Re: [PERFORM] Using Between

2010-09-23 Thread mark
 The question is how we can make it faster.

If there's just one region ID for any given postal code, you might try
adding a region ID column to vehicleused and storing the region IDs there.
You could possibly populate that column using a trigger; presumably it
doesn't change unless the postal code changes.  Then you could index
that column and query against it directly, rather than joining to
PostalCodeRegionCountyCity.  Short of that, I don't see any obvious
way to avoid reading most of the vehicleused table.  There may or may
not be an index that can speed that up slightly and of course you can
always throw hardware at the problem, but fundamentally reading half a
million or more rows isn't going to be instantaneous.
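A minimal sketch of that approach (names like region_id and postalcode are
guesses from context, since the actual schema isn't shown here):

```sql
-- Hypothetical sketch: cache the region ID on vehicleused so the join
-- to PostalCodeRegionCountyCity can be skipped at query time.
alter table vehicleused add column region_id integer;

create or replace function vehicleused_set_region() returns trigger as $$
begin
    -- look up the (assumed unique) region for the row's postal code
    select p.region_id into new.region_id
    from PostalCodeRegionCountyCity p
    where lower(p.postalcode) = lower(new.postalcode);
    return new;
end;
$$ language plpgsql;

create trigger vehicleused_region
    before insert or update of postalcode on vehicleused
    for each row execute procedure vehicleused_set_region();

create index vehicleused_region_idx on vehicleused (region_id);
```

After a one-time backfill of the existing rows, the query can filter on
vehicleused.region_id directly instead of joining.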

Incidentally, it would probably simplify things to store postal codes
in the same case throughout the system.  If you can avoid the need to
write lower(x) = lower(y) and can just write x = y, you may get better
plans.  I'm not sure that's the case in this particular example but
it's something to think about.
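If normalizing the case everywhere isn't practical, an expression index is
a middle ground (again a sketch; the column name is an assumption):

```sql
-- An index on lower(postalcode) lets a lower(postalcode) = lower(?)
-- predicate use an index scan despite the case-insensitive comparison.
create index vehicleused_postalcode_lower_idx
    on vehicleused (lower(postalcode));
```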

Something else you might test is bumping the read-ahead value. Most Linux
installs have this at 256; you might try raising it to ~8 MB and tuning
from there.  This may help you slightly with seq scan performance. As always:
YMMV. It's not going to magically fix a low-performing I/O subsystem, and it
won't help many applications of PG, but there are a few outlying cases
where this change can help a little bit.
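For reference, the read-ahead setting lives on the block device, measured in
512-byte sectors, so ~8 MB is 16384 sectors (the device name here is an
assumption; substitute your own, and note --setra requires root):

```shell
# show the current read-ahead value (in 512-byte sectors)
blockdev --getra /dev/sda
# set it to ~8 MB: 16384 sectors * 512 bytes = 8388608 bytes
blockdev --setra 16384 /dev/sda
```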


I am sure someone will step in and tell you it is a bad idea - and they will
probably have perfectly valid reasons why, so you will need to consider the
ramifications.  If at all possible, test and tune to see.
..: Mark



-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] Useless sort by

2010-09-23 Thread gnuoytr
I can't tell if you meant for this to be insulting or my reading it that way is 
wrong, but it certainly wasn't put in a helpful tone.  Let me summarize for 
you.  You've been told that putting ORDER BY into a view is a generally poor 
idea anyway, and that it's better to find ways to avoid this class of concern 
altogether.  There are significant non-obvious technical challenges behind 
actually implementing the behavior you'd like to see; the concerns raised by 
Tom and Maciek make your idea impractical even if it were desired.  And for 
every person like yourself who'd see the benefit you're looking for, there are 
far more that would find a change in this area a major problem.  The concerns 
around breakage due to assumed but not required aspects of the relational model 
are the ones the users of the software will be confused by, not the developers 
of it.  You have the classification wrong; the feedback you've gotten here is 
from the developers being user oriented, not theory oriented or code
oriented.
-- 
Greg Smith, 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us
Author, PostgreSQL 9.0 High Performance    Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


Not insulting, just amused bemusement.  PG portrays itself as the best OS 
database, which it may well be.  But it does so by stressing the 
row-by-agonizing-row approach to data.  In other words, as just a record 
paradigm filestore for COBOL/java/C coders.  I was expecting more Relational 
oomph.  As Dr. Codd titled it: "A Relational Model of Data for Large Shared
Data Banks".  Less code, more data.

robert



[PERFORM] how to enforce index sub-select over filter+seqscan

2010-09-23 Thread Dmitry Teslenko
Hello!

I have this table:

create table test (
s1 varchar(255),
s2 varchar(255),
i1 integer,
i2 integer,

... over 100 other fields

);

the table contains over 8 million records

there are these indexes:

create index is1 on test (s1);
create index is2 on test (s2);
create index ii1 on test (i1);
create index ii2 on test (i2);
create index ii3 on test (i1, i2);

and then i run this query:

select
*
from (
select  *
from test
where
is1 = 'aa' or is2 = 'aa'
)
where
is1 = 1
or (is1  = 1
and is2 = 1)
or (is1  = 2
and is2 = 2)
or (is1  = 3
and is2 = 3)


The WHERE part of the outer query can have a varying number of
or (is1 = N
and is2 = M)
expressions; let's call this number X.

When X is low the planner chooses an index scan using is1 and is2,
then BitmapAnds that with an index scan using ii1, ii2 or ii3.

But when X is big enough (> 15) the planner chooses a seqscan with a
filter on i1, i2, s1, s2.
The seqscan is very slow and I want to avoid it.  The subquery is very
fast and i don't know why postgres chooses that plan.

I know I can set enable_seqscan = off.
Are there other ways to enforce index usage?

Postgres's pg_class has the right estimate of the rowcount.

-- 
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?



Re: [PERFORM] how to enforce index sub-select over filter+seqscan

2010-09-23 Thread Kevin Grittner
Dmitry Teslenko dtesle...@gmail.com wrote:
 
 Seqscan is very slow and I want to avoid it. Subquery is very fast
 and i don't know why postgres chooses that plan.
 
 I know I can set enable_seqscan = off.
 Are there other ways to enforce index usage?
 
If you come at it from that angle, you probably won't get the best
resolution.  PostgreSQL can see the alternative plans, and develops
estimated costs of running each.  It uses the one that it thinks
will be fastest.  If it's wrong, there's probably something wrong
with the statistics it uses for estimating, or with the costing
information.  (There are some cases where it's not able to
accurately estimate costs even if these are right, but let's check
the more common cases first.)
 
Please provide a little more information, like PostgreSQL version,
the postgresql.conf contents (excluding comments), OS, hardware, and
the EXPLAIN ANALYZE output of the query with and without
enable_seqscan = off.
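The usual way to capture that comparison in one session is a sketch like this
(with "..." standing in for the original query, which I won't repeat):

```sql
explain analyze select ... ;   -- the plan the planner prefers (the seqscan)
set enable_seqscan = off;
explain analyze select ... ;   -- the index plan, with its actual runtime
reset enable_seqscan;
```

Comparing the estimated costs against the actual times in the two outputs
usually shows which estimate is off.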
 
Other useful ideas here:
 
http://wiki.postgresql.org/wiki/SlowQueryQuestions
 
-Kevin



Re: [PERFORM] Useless sort by

2010-09-23 Thread Scott Marlowe
On Thu, Sep 23, 2010 at 7:51 AM,  gnuo...@rcn.com wrote:
 Not insulting, just amused bemusement.  PG portrays itself as the best OS 
 database, which it may well be.  But it does so by stressing the 
 row-by-agonizing-row approach to data.  In other words, as just a record 
 paradigm filestore for COBOL/java/C coders.  I was expecting more Relational 
 oomph.  As Dr. Codd says:  A Relational Model of Data for Large Shared Data 
 Banks.  Less code, more data.

So what, exactly, would give pgsql more relational oomph?

Your assertion feels pretty hand-wavy right now.

-- 
To understand recursion, one must first understand recursion.



Re: [PERFORM] how to enforce index sub-select over filter+seqscan

2010-09-23 Thread Merlin Moncure
On Thu, Sep 23, 2010 at 10:26 AM, Dmitry Teslenko dtesle...@gmail.com wrote:
 Hello!

 I have this table:

 create table test (
        s1 varchar(255),
        s2 varchar(255),
        i1 integer,
        i2 integer,

 ... over 100 other fields

 );

 table contains over 8 million records

 there's these indexes:

 create index is1 on test (s1);
 create index is2 on test (s2);
 create index ii1 on test (i1);
 create index ii2 on test (i2);
 create index ii3 on test (i1, i2);

 and then i run this query:

 select
 *
 from (
        select  *
        from test
        where
                is1 = 'aa' or is2 = 'aa'
        )
 where
        is1 = 1
        or (is1  = 1
                and is2 = 1)
        or (is1  = 2
                and is2 = 2)
        or (is1  = 3
                and is2 = 3)

Hm, I think you meant to say:
s1 = 'aa' or s2 = 'aa', i1 = 1 ... etc.  Details are important!

Consider taking the 'correct' (i1, i2) pairs, building a VALUES list from
them, and joining to that:

select  * from test
  join
  (
values (2,2),  (3,3), ...
  ) q(i1, i2) using(i1,i2)
  where  s1 = 'aa' or s2 = 'aa' or i1=1
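Spelled out a little more fully (same idea, with the pair list made
explicit; all names as in the sketch above):

```sql
select t.*
from test t
join (
    values (1, 1), (2, 2), (3, 3)   -- one row per (i1, i2) pair
) q(i1, i2) using (i1, i2)
where t.s1 = 'aa' or t.s2 = 'aa' or t.i1 = 1;
```

One caveat: since the join already restricts rows to the listed pairs, the
standalone i1 = 1 condition may need to move to a separate UNION ALL branch
if it is meant to match rows outside those pairs.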

merlin



Re: [PERFORM] locking issue on simple selects?

2010-09-23 Thread Tobias Brox
On 15 September 2010 12:05, Tobias Brox tobi...@gmail.com wrote:
 Recently we've frequently encountered issues where some simple selects
 (meaning, selects doing an index lookup and fetching one row) have
 become stuck for several minutes.  Apparently all requests on one
 exact table gets stuck, all requests not related to said table are
 going through without any problems.

Now I've set up all kinds of logging regarding locks, and it seems like
we're having issues that aren't lock-related.  I just did a bit of
research into one situation today.

All the while this problem was happening, there was one heavy query
running in parallel ... not sure if that's relevant.

Then comes one query that requires a seq scan on the problem table
(that won't happen again - I just added a new index).  Four seconds
later comes another query requiring a simple index lookup.  Still more
queries come in, most of them simple index lookups, but on different
indexes.  After one minute there are 25 queries in the
pg_stat_activity view against this table.  It's not a particularly
huge table.  Moments later all 25 queries have completed.



Re: [PERFORM] locking issue on simple selects?

2010-09-23 Thread Kevin Grittner
Tobias Brox tobi...@gmail.com wrote:
 
 All while having this problem, there was one heavy query running
 in parallel ... not sure if that's relevant.
 
Have you turned on checkpoint logging?  You might want to see if
these are happening at some particular point in the checkpoint
processing.  If so, look through the archives for posts from Greg
Smith on how to tune that -- he's worked out a nice methodology to
iteratively improve your configuration in this regard.
 
-Kevin



[PERFORM] Memory usage - indexes

2010-09-23 Thread Tobias Brox
We've come to a tipping point with one of our database servers; it's
generally quite loaded but up until recently it was handling the load
well - but now we're seeing that it struggles to process all the
selects fast enough.  Sometimes we're observing some weird lock-like
behaviour (see my other post on that), but most of the time the
database server is just not capable of handling the load fast enough
(causing the queries to pile up in the pg_stat_activity-view).

My main hypothesis is that all the important indexes used to fit snugly
into memory, and now they don't.  We'll eventually get the
server moved over to new and improved hardware, but while waiting for
that to happen we need to focus on reducing the memory footprint of
the database.  I have some general questions now ...

1) Are there any good ways to verify my hypothesis?  Some months ago I
thought of running some small memory-gobbling program on the database
server just to see how much memory I could remove before we would see
indications of the database being overloaded.  It seems a bit radical,
but I think the information learned from such an experiment would be
very useful ... and we never managed to set up any testing environment
that faithfully replicates production traffic.  Anyway, it's sort of
too late now that we're already observing performance problems even
without the memory gobbling script running.

2) I've seen it discussed earlier on this list ... shared_buffers vs
OS caches.  Some claim that adjusting the size of the shared buffers
has very little effect.  Anyway, isn't it a risk that memory is
wasted because important data is stored both in the OS cache and the
shared buffers?  What would happen if we used almost all the available
memory for shared buffers?  Or turned it down to a bare minimum and let
the OS do almost all the cache handling?

3) We're discussing dropping some overlapping indexes ... i.e. dropping
one of two indexes that look like this:

some_table(a)
some_table(a,b)

Would the query select * from some_table where a=? run slower if we
drop the first index?  Significantly?

(In our situation I found that the number of distinct b's for each a
is low and that the usage stats on the second index are quite low
compared with the first one, so I think we'll drop the second index.)

4) We're discussing dropping other indexes.  Does it make sense at all
as long as we're not experiencing problems with inserts/updates?  I
suppose that if an index isn't used it will remain on disk and won't
affect memory usage ... but what if the index is rarely used ...
wouldn't it be better to do a seqscan on a table that is frequently
accessed and mostly in memory than to consult an index that is stored
on the disk?

Sorry for all the stupid questions ;-)
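One low-risk way to start testing the fits-in-memory hypothesis is simply to
measure the indexes and compare against RAM (a sketch; pg_stat_user_indexes
also shows how often each index is actually scanned):

```sql
-- Largest user indexes first, with their scan counts.
select indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
       idx_scan
from pg_stat_user_indexes
order by pg_relation_size(indexrelid) desc
limit 20;
```

If the combined size of the heavily scanned indexes has grown past what the
OS cache can hold, that would support the hypothesis without any
memory-gobbling experiments.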



Re: [PERFORM] Memory usage - indexes

2010-09-23 Thread Mark Kirkwood

On 24/09/10 09:50, Tobias Brox wrote:

We've come to a tipping point with one of our database servers, it's
generally quite loaded but up until recently it was handling the load
well - but now we're seeing that it struggles to process all the
selects fast enough.  Sometimes we're observing some weird lock-like
behaviour (see my other post on that), but most of the time the
database server is just not capable of handling the load fast enough
(causing the queries to pile up in the pg_stat_activity-view).

My main hypothesis is that all the important indexes used to fit snugly
into memory, and now they don't.  We'll eventually get the
server moved over to new and improved hardware, but while waiting for
that to happen we need to focus on reducing the memory footprint of
the database.  I have some general questions now ...

1) Are there any good ways to verify my hypothesis?  Some months ago I
thought of running some small memory-gobbling program on the database
server just to see how much memory I could remove before we would see
indications of the database being overloaded.  It seems a bit radical,
but I think the information learned from such an experiment would be
very useful ... and we never managed to set up any testing environment
that faithfully replicates production traffic.  Anyway, it's sort of
too late now that we're already observing performance problems even
without the memory gobbling script running.

2) I've seen it discussed earlier on this list ... shared_buffers vs
OS caches.  Some claims that it has very little effect to adjust the
size of the shared buffers.  Anyway, isn't it a risk that memory is
wasted because important data is stored both in the OS cache and the
shared buffers?  What would happen if using almost all the available
memory for shared buffers?  Or turn it down to a bare minimum and let
the OS do almost all the cache handling?

3) We're discussing to drop some overlapping indexes ... i.e. to drop
one out of two indexes looking like this:

some_table(a)
some_table(a,b)

Would the query select * from some_table where a=? run slower if we
drop the first index?  Significantly?

(in our situation I found that the number of distinct b's for each a
is low and that the usage stats on the second index is quite low
compared with the first one, so I think we'll drop the second index).

4) We're discussing to drop other indexes.  Does it make sense at all
as long as we're not experiencing problems with inserts/updates?  I
suppose that if the index isn't used it will remain on disk and won't
affect the memory usage ... but what if the index is rarely used ...
wouldn't it be better to do a seqscan on a table that is frequently
accessed and mostly in memory than to consult an index that is stored
on the disk?

Sorry for all the stupid questions ;-)

   



All good questions! Before (or maybe as well as) looking at index sizes 
vs memory I'd check to see if any of your commonly run queries have 
suddenly started to use different plans due to data growth, e.g.:


- index scan to seq scan (perhaps because effective_cache_size is too 
small now)

- hash agg to sort (work_mem too small now)

We had a case of the first point happen here a while ago; the symptoms 
looked very much like what you are describing.


Re index size, you could try indexes like:

some_table(a)
some_table(b)

which may occupy less space, and the optimizer can BitmapAnd/BitmapOr 
them to work like the compound index some_table(a,b).
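Whether the planner actually combines them can be checked directly (a
sketch, using the placeholder names above):

```sql
explain analyze
select * from some_table where a = 1 and b = 2;
-- look for a BitmapAnd node sitting over two Bitmap Index Scans,
-- one on each single-column index
```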


regards

Mark



Re: [PERFORM] locking issue on simple selects?

2010-09-23 Thread Tobias Brox
On 23 September 2010 22:55, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Have you turned on checkpoint logging?

Yes ... it seems so:

13:19:13.840 - LOG: checkpoint complete: wrote 3849 buffers (0.2%); 0
transaction log file(s) added, 0 removed, 5 recycled; write=269.551 s,
sync=0.103 s, total=269.953 s
13:19:13.841 - LOG:  checkpoint starting: xlog
13:19:33 - the seq scan query towards the affected table started
13:20:31.454 - one of the index lookup queries towards the affected
table was finished
13:20:43.176 - LOG:  checkpoint complete: wrote 108199 buffers (6.9%);
0 transaction log file(s) added, 0 removed, 16 recycled; write=11.521
s, sync=77.533 s, total=89.335 s

 You might want to see if
 these are happening at some particular point in the checkpoint
 processing.  If so, look through the archives for posts from Greg
 Smith on how to tune that -- he's worked out a nice methodology to
 iteratively improve your configuration in this regard.

Thank you, I will ... hmm ... I found this blog post:

http://blog.2ndquadrant.com/en/2010/01/measuring-postgresql-checkpoin.html

Of course I'm doing it my own way:

select *,now() as snapshot into tmp_pg_stat_bgwriter from pg_stat_bgwriter ;

create view tmp_delta_pg_stat_bgwriter as
 select a.checkpoints_timed-b.checkpoints_timed as
checkpoints_timed,a.checkpoints_req-b.checkpoints_req as
checkpoints_req,a.buffers_checkpoint-b.buffers_checkpoint as
buffers_checkpoint,a.buffers_clean-b.buffers_clean as
buffers_clean,a.maxwritten_clean-b.maxwritten_clean as
maxwritten_clean,a.buffers_backend-b.buffers_backend as
buffers_backend,a.buffers_alloc-b.buffers_alloc as buffers_alloc,
now()-b.snapshot as interval
 from pg_stat_bgwriter a ,
 (select * from tmp_pg_stat_bgwriter order by snapshot desc limit 1) as b;

Checkpoint timeout is set to 5 minutes.  Right now we're having
relatively low activity.  I'm not sure how to read the stats below,
but they look OK to me:

select * from tmp_delta_pg_stat_bgwriter ;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc |    interval
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------+-----------------
                 3 |               0 |               8277 |            15 |                0 |             185 |         18691 | 00:12:02.988842
(1 row)



Re: [PERFORM] locking issue on simple selects?

2010-09-23 Thread Kevin Grittner
Tobias Brox tobi...@gmail.com wrote:
 
 13:19:13.840 - LOG: checkpoint complete
 
 13:19:13.841 - LOG:  checkpoint starting
 
 13:20:43.176 - LOG:  checkpoint complete
 
There wasn't a lot of time between the completion of one checkpoint
and the start of the next.  And the two checkpoints finished a
minute and a half apart.  Perhaps you need to boost your
checkpoint_segments setting?  What is it now?
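If that turns out to be the issue, the relevant postgresql.conf knobs look
something like this (the values are illustrative, not recommendations; tune
against your own logs):

```
checkpoint_segments = 16            # default 3; raise if checkpoints fire too often
checkpoint_completion_target = 0.9  # spread checkpoint writes over more of the interval
checkpoint_timeout = 5min
log_checkpoints = on
```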
 
-Kevin
