Re: [PERFORM] PostgreSQL-9.0 Monitoring System to improve performance

2011-09-30 Thread Gregg Jaskiewicz
Looks like this is generally an area that could be targeted by a
business, or by an open source enthusiast.
One centre that captures all the information and produces a report
based on it would be a great thing. Especially in cases like mine,
where I have tens of postgresql installations on different hardware
and with different use patterns (but the schemas and queries are the
same).



Re: [PERFORM] Query with order by and limit is very slow - wrong index used

2011-10-03 Thread Gregg Jaskiewicz
How many rows do you have in that table?

I think the planner reckons that the element you are looking for is
so common that it will be too expensive to use the index to fetch it.
Perhaps try increasing default_statistics_target, and re-running vacuum
analyze on the table.
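For example, at the session level (the 1000 below is just an
illustrative value; the default in 9.0 is 100):

SET default_statistics_target TO 1000;
VACUUM ANALYZE records;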

You could also try changing it just for the column:

ALTER TABLE records ALTER id SET source_id 1000; vacuum analyze verbose records;



Re: [PERFORM] Query with order by and limit is very slow - wrong index used

2011-10-03 Thread Gregg Jaskiewicz
2011/10/3 Nowak Michał :
>> How many rows do you have in that table?
>
> a9-dev=> select count(*) from records;
>   count
> ---------
>  3620311
> (1 row)


>
> a9-dev=> select source_id, count(*) from records where source_id =
> 'http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml' or source_id =
> 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml' group by source_id;
>                        source_id                        | count
> --------------------------------------------------------+--------
>  http://ebuw.uw.edu.pl/dlibra/oai-pmh-repository.xml    | 427254
>  http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml | 989184
> (2 rows)

So the second one is roughly 27% of the table.  I don't know the
exact threshold at which the planner switches over to a seqscan, but
that value seems common enough to trigger it.
The other things the planner will look at are the correlation, the
most common values and the most common frequencies.
In other words, if the value makes up 27% of the table but is evenly
spread across it, I think the planner will go for a seq scan regardless.

At the end of the day (afaik), an index scan really only picks the
pages for a narrowed-down seqscan. So imagine your index scan returned
all the pages: you would still have to do a seqscan over all of them.
The planner is trying to avoid that by weighing the costs of both
operations.
If the current queries are too slow, you could try normalizing the
table by splitting source_id out into a separate table and referencing
it by an id. Very often you'd find that doing so lowers the I/O
required, and hence saves a lot of time in queries. The downside is
that it makes adding/updating the rows a bit harder. But that's where
triggers and procedures come in handy.
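A rough sketch of what I mean below - the sources table, its columns
and the index name are all made up here, and you'd want to verify the
backfill before dropping the old column:

CREATE TABLE sources (
    id  serial PRIMARY KEY,
    url text NOT NULL UNIQUE
);

INSERT INTO sources (url) SELECT DISTINCT source_id FROM records;

ALTER TABLE records ADD COLUMN source_ref integer REFERENCES sources (id);
UPDATE records r SET source_ref = s.id FROM sources s
 WHERE s.url = r.source_id;

CREATE INDEX records_source_ref_idx ON records (source_ref);
-- and eventually: ALTER TABLE records DROP COLUMN source_id;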


>
>> ALTER TABLE records ALTER id SET source_id 1000; vacuum analyze verbose 
>> records;
> Did you mean ALTER TABLE records ALTER id SET STATISTICS 1000;?

Yup, that's what I meant. Sorry.


-- 
GJ



Re: [PERFORM] Query with order by and limit is very slow - wrong index used

2011-10-04 Thread Gregg Jaskiewicz
2011/10/4 Nowak Michał :
>
> a9-dev=> select attname, null_frac, avg_width, n_distinct, correlation from
> pg_stats where tablename = 'records';
>   attname  | null_frac | avg_width | n_distinct | correlation
> -----------+-----------+-----------+------------+-------------
>  source_id |         0 |        54 |         69 |    0.303059

http://www.postgresql.org/docs/9.0/interactive/view-pg-stats.html

"Statistical correlation between physical row ordering and logical
ordering of the column values. This ranges from -1 to +1. When the
value is near -1 or +1, an index scan on the column will be estimated
to be cheaper than when it is near zero, due to reduction of random
access to the disk. (This column is null if the column data type does
not have a < operator.)"

Kind of like Tom and I said, the 0.3 correlation there sounds like the
cause.
Seriously, do try normalisation as well before discarding the idea.
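If you want to see how the planner costs the index scan by comparison,
you can disable seqscans for your session - purely as a diagnostic,
don't leave it off (the order by and limit below are placeholders for
your real query):

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM records
 WHERE source_id = 'http://www.wbc.poznan.pl/dlibra/oai-pmh-repository.xml'
 ORDER BY id LIMIT 100;
SET enable_seqscan = on;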


-- 
GJ



Re: [PERFORM] Performance problem with a table with 38928077 record

2011-10-07 Thread Gregg Jaskiewicz
Do you need the left join?
Can you further normalize the tables (to lower the I/O)?
Can you upgrade to at least 8.3? It has huge performance
improvements over earlier releases.



Re: [PERFORM] Slow cursor

2011-10-26 Thread Gregg Jaskiewicz
Do you really need to query the catalogues? That on its own is not a
good idea if you want something to run fast and frequently.


-- 
GJ



Re: [PERFORM] PostgreSQL 9.0.4 blocking in lseek?

2011-10-27 Thread Gregg Jaskiewicz
What does 'select * from pg_stat_activity' say? More precisely, the
"waiting" column.

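A minimal example (column names as of 9.0):

SELECT procpid, waiting, current_query
  FROM pg_stat_activity
 WHERE waiting;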


Re: [PERFORM] Performance Problem with postgresql 9.03, 8GB RAM,Quadcore Processor Server--Need help!!!!!!!

2011-10-28 Thread Gregg Jaskiewicz
What sort of queries are you running against it? The "select * from .."
is not really (hopefully) a query you are running from your PHP app.



[PERFORM] heavily loaded system spec

2012-04-05 Thread Gregg Jaskiewicz
I know this is a very general question. But if you guys had to spec a
system (could be one server or a cluster) running postgresql with a
sustainable transaction rate of 1.5M tps, what configuration and
hardware would you be looking for?
The transaction distribution there is 90% writes/updates and 10% reads.
We're talking 64-bit Linux on Intel/IBM hardware.

I'm trying to see how that compares with an Oracle system.

Thanks.

-- 
GJ



Re: [PERFORM] MemSQL the "world's fastest database"?

2012-07-03 Thread Gregg Jaskiewicz
It sounds like a lot of marketing BS :)

But I like the fact that they use a modern language like C++. It is a
pain to try to do any development on postgresql. A transition to C++
would be nice (I know it's been debated on -hackers a lot).



Re: [PERFORM] Paged Query

2012-07-09 Thread Gregg Jaskiewicz
Use cursors.
They are by far the most flexible option; offset/limit have their
downsides.
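A minimal sketch of cursor-based paging (the table name and page size
are illustrative):

BEGIN;
DECLARE page_cur CURSOR FOR SELECT * FROM items ORDER BY id;
FETCH 50 FROM page_cur;  -- first page
FETCH 50 FROM page_cur;  -- next page picks up where the last one stopped
CLOSE page_cur;
COMMIT;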



Re: [PERFORM] New server setup

2013-03-09 Thread Gregg Jaskiewicz
In my recent experience PgPool2 performs pretty badly as a pooler. I'd
avoid it if possible, unless you depend on its other features.
It simply doesn't scale.
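If all you need is the pooling, something like pgbouncer in transaction
mode has scaled much better for me. A minimal pgbouncer.ini sketch
(assuming pgbouncer is the pooler in question; all values are
illustrative, not recommendations):

[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode = transaction
default_pool_size = 20
max_client_conn = 300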



On 5 March 2013 21:59, Jeff Janes  wrote:

> On Tue, Mar 5, 2013 at 10:27 AM, Niels Kristian Schjødt <
> nielskrist...@autouncle.com> wrote:
>
>> Okay, thanks - but hey - if I put it at session pooling, then it says in
>> the documentation: "default_pool_size: In session pooling it needs to be
>> the number of max clients you want to handle at any moment". So as I
>> understand it, is it true that I then have to set default_pool_size to 300
>> if I have up to 300 client connections?
>>
>
> If those 300 client connections are all long-lived, then yes you need that
> many in the pool.  If they are short-lived connections, then you can have a
> lot less as any ones over the default_pool_size will simply block until an
> existing connection is closed and can be re-assigned--which won't take long
> if they are short-lived connections.
>
>
> And then what would the pooler then help on my performance - would that
>> just be exactly like having the 300 clients connect directly to the
>> database???
>>
>
> It would probably be even worse than having 300 clients connected
> directly.  There would be no point in using a pooler under those conditions.
>
>
> Cheers,
>
> Jeff
>



-- 
GJ


Re: [PERFORM] New server setup

2013-03-12 Thread Gregg Jaskiewicz
On 10 March 2013 15:58, Greg Smith  wrote:

> On 3/1/13 6:43 AM, Niels Kristian Schjødt wrote:
>
>> Hi, I'm going to setup a new server for my postgresql database, and I am
>> considering one of these:
>> http://www.hetzner.de/hosting/produkte_rootserver/poweredge-r720
>> with four SAS drives in a RAID 10 array. Has any of you any particular
>> comments/pitfalls/etc. to mention on the setup? My application is very
>> write heavy.
>>
>
> The Dell PERC H710 (actually a LSI controller) works fine for write-heavy
> workloads on a RAID 10, as long as you order it with a battery backup unit
> module.  Someone must install the controller management utility and do
> three things however:
>
We're going to go with either HP or IBM (customer's preference, etc).



> 1) Make sure the battery-backup unit is working.
>
> 2) Configure the controller so that the *disk* write cache is off.
>
> 3) Set the controller cache to "write-back when battery is available".
> That will use the cache when it is safe to do so, and if not it will bypass
> it.  That will make the server slow down if the battery fails, but it won't
> ever become unsafe at writing.
>
> See http://wiki.postgresql.org/wiki/Reliable_Writes for more
> information about this topic.  If you'd like some consulting help
> with making sure the server is working safely and as fast as it should be,
> 2ndQuadrant does offer a hardware benchmarking service to do that sort of
> thing: http://www.2ndquadrant.com/en/hardware-benchmarking/
> I think we're even generating those reports in German now.



Thanks Greg. I will follow the advice there, and also the advice in your
book. I do always make sure they order battery-backed cache (or flash
based, which seems to be what people use these days).
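For the archives: on the LSI-based controllers Greg mentions, his three
steps map to something like this with MegaCli (flag spelling varies
between MegaCli versions, so treat this as a sketch; the HP and IBM
tools have their own equivalents):

MegaCli -AdpBbuCmd -GetBbuStatus -aAll        # 1) check the BBU is healthy
MegaCli -LDSetProp -DisDskCache -LAll -aAll   # 2) disk write cache off
MegaCli -LDSetProp WB -LAll -aAll             # 3) controller cache write-back
MegaCli -LDSetProp NoCachedBadBBU -LAll -aAll #    only while the battery is OK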

The subject of using external help with setting things up did come up,
but more around connection pooling than the hardware itself (in short,
pgpool2 is crap; we will go with a DNS-based solution and the apps
connecting directly to the nodes).
I will let my clients (I'm doing this on a contract) know that there's an
option to get you guys to help us. Mind you, this database is rather small
in the grand scheme of things (30-40GB). Just possibly a lot of occasional
writes.

We wouldn't need German. But proper English (i.e. British English) would
always be nice ;)


Whilst on the hardware subject: someone mentioned throwing SSDs into
the mix, i.e. combining spinning HDs with SSDs. Apparently some RAID
cards can use small-ish (80GB+) SSDs as external caches. Any experience
with that?


Thanks !




-- 
GJ