Re: [PERFORM] Duplicate deletion optimizations

2012-01-08 Thread Pierre C

That's almost identical to my tables.


You explained your problem very well ;)


I certainly will. Many thanks for those great lines of SQL!


You're welcome !
Strangely I didn't receive the mail I posted to the list (received yours  
though).




Re: [PERFORM] Duplicate deletion optimizations

2012-01-07 Thread Pierre C



It's a fairly tricky problem. I have a number of sensors producing
energy data about every 5 minutes, but at random times between 1 and
15 minutes. I can't change that as that's the way the hardware of the
sensors works. These feed into another unit, which accumulates them
and forwards them in batches over the Internet to my PostgreSQL
database server every few minutes (again at random times outside my
control and with random batch sizes). To make things worse, if the
Internet connection between the unit and the database server fails, it
will send the latest data first to provide a quick update to the
current values and then send the backlog of stored values. Thus, data
do not always arrive in correct time order.


I'm stuck home with flu, so I'm happy to help ;)

I'll build an example setup to make it clearer...

-- A list of all sensors
create table sensors( sensor_id integer primary key );
insert into sensors select generate_series(1,100);

-- A table to contain raw sensor data
create table log(
  sensor_id integer not null references sensors(sensor_id),
  time integer not null,
  value float not null
);

-- Fill it up with test data
insert into log
select sensor_id, time, time from (
  select distinct sensor_id,
    (n+random()*10)::INTEGER as time
  from generate_series(0,50000,5) n
  cross join sensors
) d;

-- index it
alter table log add primary key( time, sensor_id );
create index log_sensor_time on log( sensor_id, time );

select * from log where sensor_id=1 order by time;
 sensor_id | time  | value
-----------+-------+-------
         1 |    12 |    12
         1 |    14 |    14
         1 |    21 |    21
         1 |    29 |    29
         1 |    30 |    30
(...)
         1 | 49996 | 49996
         1 | 50001 | 50001

-- create a table which will contain the time ticks
-- which will be used as x-axis for interpolation
-- (in this example, one tick every 10 time units)

create table ticks( time integer primary key,
   check( time%10 = 0 ) );
insert into ticks select
  generate_series( 0, (select max(time) from log), 10 );

-- create interpolated values table
create table interp(
  sensor_id integer not null references sensors( sensor_id ),
  time  integer not null references ticks( time ),
  value float,
  distance  integer not null
);

-- fill interpolated values table
-- (pretty slow)

insert into interp
select
  sensor_id,
  t.time,
  start_value + (end_value-start_value)*(t.time-start_time)/(end_time-start_time),
  greatest( t.time - start_time, end_time - t.time )
from
  (select
     sensor_id,
     lag(time)  over (partition by sensor_id order by time) as start_time,
     time as end_time,
     lag(value) over (partition by sensor_id order by time) as start_value,
     value as end_value
   from log
  ) as l
  join ticks t on (t.time >= start_time and t.time < end_time);

-- alternate query if you don't like the ticks table (same result) :
insert into interp
select
  sensor_id,
  time,
  start_value + (end_value-start_value)*(time-start_time)/(end_time-start_time),
  greatest( time - start_time, end_time - time )
from
  (select
     *,
     generate_series( ((start_time+9)/10)*10, ((end_time-1)/10)*10, 10 ) as time
   from
     (select
        sensor_id,
        lag(time)  over (partition by sensor_id order by time) as start_time,
        time as end_time,
        lag(value) over (partition by sensor_id order by time) as start_value,
        value as end_value
      from log
     ) as l
  ) l;

alter table interp add primary key( time,sensor_id );
create index interp_sensor_time on interp( sensor_id, time );

For each interval in the log table that contains a time tick, this query  
generates the interpolated data at that tick.


Note that the distance field represents the distance (in time) between  
the interpolated value and the farthest real data point that was used to  
calculate it. Therefore, it can be used as a measure of the quality of the  
interpolated point ; if the distance is greater than some threshold, the  
value might not be that precise.


Now, suppose we receive a bunch of data. The data isn't ordered according  
to time.

There are two possibilities :

- the new data starts right where we left off (ie, just after the last  
time for each sensor in table log)
- the new data starts later in time, and we want to process the results  
right away, expecting to receive, at some later point, older data to fill  
the holes


The second one is hairier, let's do that.

Anyway, let's create a packet :

-- A table to contain raw sensor data
create temporary table packet(
  sensor_id integer not null,
  time integer not null,
  value float not null
);

-- Fill it up with test data
insert into packet
select sensor_id, time, time from (
  select distinct sensor_id,
(n+random()*10)::INTEGER as time
  from generate_series(50200,50400) n
   cross join sensors
) d;

Note that I deliberately inserted a hole : the log table [...]

Re: [PERFORM] unlogged tables

2011-12-04 Thread Pierre C
My table is a statistics counters table, so I can live with a partial data
loss, but not with a full data loss, because many counters are weekly and
monthly.

An unlogged table can increase speed (this table gets about 1.6 million
updates per hour), but an unlogged table, with the chance of losing all its
contents on a crash, is not a good idea for this.


You could use an unlogged table for hourly updates, and periodically,  
accumulate those counters to a (logged) daily/weekly table...


The hourly table could be rebuilt by examining only 1 hour's worth of
data, so it isn't too much of a problem if it's lost. The other tables
would get far fewer updates.
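A rough sketch of what I mean (all names here are invented, and the
roll-up would typically run from cron every hour) :

-- unlogged staging table, cheap to update 1.6M times/hour,
-- rebuildable from 1 hour of source data if it is lost on a crash
create unlogged table counters_hourly (
  counter_id integer primary key,
  hits bigint not null default 0
);

-- durable table, gets only one write per counter per roll-up
create table counters_weekly (
  counter_id integer not null,
  week date not null,
  hits bigint not null,
  primary key (counter_id, week)
);

-- hourly roll-up (no ON CONFLICT in this era, so update-then-insert)
begin;
update counters_weekly w
   set hits = w.hits + h.hits
  from counters_hourly h
 where w.counter_id = h.counter_id
   and w.week = date_trunc('week', now())::date;
insert into counters_weekly (counter_id, week, hits)
select h.counter_id, date_trunc('week', now())::date, h.hits
  from counters_hourly h
 where not exists (select 1 from counters_weekly w
                    where w.counter_id = h.counter_id
                      and w.week = date_trunc('week', now())::date);
truncate counters_hourly;
commit;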




Re: [PERFORM] Degrading PostgreSQL 8.4 write performance

2011-06-19 Thread Pierre C



Load testing of postgresql 8.4 for OLTP application
suitability showed that throughput of the
database significantly degraded over time from thousands of write
transactions per second to almost zero.


A typical postgres benchmarking gotcha is :

- you start with empty tables
- the benchmark fills them
- query plans which were prepared based on stats of empty (or very small)  
tables become totally obsolete when the table sizes grow

- therefore everything becomes very slow as the tables grow

So you should disconnect/reconnect or issue a DISCARD ALL periodically on  
each connection, and of course periodically do some VACUUM ANALYZE (or  
have autovacuum do that for you).




Re: [PERFORM] Delete performance

2011-05-31 Thread Pierre C
If i run 30,000 prepared DELETE FROM xxx WHERE ID = ? commands it  
takes close to 10 minutes.


Do you run those in a single transaction or do you use one transaction per  
DELETE ?


In the latter case, postgres will ensure each transaction is committed to
disk at each commit. Since this involves waiting for the physical I/O to
happen, it is slow. If you do it 30,000 times, it will be 30,000 times as
slow.


Note that you should really do :

DELETE FROM table WHERE id IN (huge list of ids);

or

DELETE FROM table USING (VALUES (...), (...)) AS v(id) WHERE table.id = v.id;

Also, check that foreign keys with cascading deletes have indexes on the
referencing tables. Without an index, finding the rows to cascade-delete
will be slow.
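A rough sketch of both points (table and column names are invented, 'xxx'
being the table from the quoted example) :

-- one transaction, one statement, instead of 30,000 separate DELETEs
BEGIN;
DELETE FROM xxx WHERE id IN (1, 2, 3 /* ... huge list of ids ... */);
COMMIT;

-- index the referencing column of an ON DELETE CASCADE foreign key,
-- otherwise every cascaded delete has to scan the child table
CREATE INDEX child_xxx_id_idx ON child_table (xxx_id);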




Re: FW: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)

2011-05-25 Thread Pierre C

You wrote

Try to create a btree index on (bench_hstore->'bench_id') WHERE
(bench_hstore->'bench_id') IS NOT NULL.


What  do you mean exactly?
= CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps) WHERE
??? IS NOT NULL;

My table's def is:

CREATE TABLE myhstore ( id bigint PRIMARY KEY, kvps hstore NOT NULL );

So I'm doing something like:
CREATE INDEX myhstore_kps_gin_idx ON myhstore USING gin(kvps);


Hello ;

I meant a plain old btree index like this :

CREATE INDEX foo ON myhstore((kvps->'yourkeyname')) WHERE
(kvps->'yourkeyname') IS NOT NULL;

The idea is that :

- The reason to use hstore is to have an arbitrary number of keys and use
the keys you want, not have a fixed set of columns like in a table
- Therefore, no hstore key is present in all rows (if it was, you'd make
it a table column, and maybe index it)
- You'll probably only want to index some of the keys/values (avoiding
indexing values that contain serialized data or other stuff that never
appears in a WHERE clause)

So, for each key that corresponds to a searchable attribute, I'd use a
conditional index on that key, which only indexes the relevant rows. For
keys that never appear in a WHERE, no index is needed.

gist is good if you want the intersection of an hstore with another one
(for instance), btree is good if you want simple search or range search.
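For example, with the kvps column from the table definition above
('status' is just an illustrative key name) :

CREATE INDEX myhstore_status_idx ON myhstore ((kvps->'status'))
  WHERE (kvps->'status') IS NOT NULL;

-- a simple search on that key can use the small partial index
SELECT id FROM myhstore WHERE kvps->'status' = 'active';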



Re: FW: [PERFORM] KVP table vs. hstore - hstore performance (Was: Postgres NoSQL emulation)

2011-05-23 Thread Pierre C



Hi Merlin

The analyze command gave the following result:

On the KVP table:
Index Scan using kvpidx on bench_kvp (cost=0.00..8.53 rows=1 width=180)  
(actual time=0.037..0.038 rows=1 loops=1)

Index Cond: (bench_id = '20_20'::text)
Total runtime: 0.057 ms

And on the Hstore table:
Bitmap Heap Scan on bench_hstore (cost=32.22..3507.54 rows=1000 width=265)
  (actual time=145.040..256.173 rows=1 loops=1)
  Recheck Cond: (bench_hstore @> 'bench_id=>20_20'::hstore)
  ->  Bitmap Index Scan on hidx (cost=0.00..31.97 rows=1000 width=0)
      (actual time=114.748..114.748 rows=30605 loops=1)
        Index Cond: (bench_hstore @> 'bench_id=>20_20'::hstore)
Total runtime: 256.211 ms

For Hstore I'm using a GIST index.



Try to create a btree index on (bench_hstore->'bench_id') WHERE
(bench_hstore->'bench_id') IS NOT NULL.





Re: [PERFORM] Postgres refusing to use 1 core

2011-05-11 Thread Pierre C



I suspect your app is doing lots of tiny single-row queries instead of
efficiently batching things. It'll be wasting huge amounts of time
waiting for results. Even if every query is individually incredibly
fast, with the number of them you seem to be doing you'll lose a LOT of
time if you loop over lots of little SELECTs.


Using unix sockets, you can expect about 10,000-20,000 queries/s per core
on small simple selects, which is quite a feat. TCP adds overhead, so it's
slower. Over a network, add ping time.


In plpgsql code, you avoid roundtrips, data serializing, and context
switches; it can be 2-4x faster.


But a big SQL query can process millions of rows/s, it is much more  
efficient.




Re: [PERFORM] Postgres NoSQL emulation

2011-05-11 Thread Pierre C



why even  have multiple rows? just jam it all it there! :-D


LOL

But seriously, when using an ORM to stuff an object hierarchy into a
database, you usually get problems with class inheritance, and all
solutions suck more or less (ie, you get a zillion tables with an assorted
pile of JOINs, or stinky key/attribute schemes where all attributes end
up as TEXT, or a table with 200 columns, most of them NULL for any
given row).


NoSQL guys say "hey, just use NoSQL!".

In a (common) case, the classes have some fields which are shared, often
searched, and which the DB needs to know about and access easily: those
become columns, with indexes. Then the other fields, which only occur in
some derived class and are not very interesting to the DB, get shoved into
an hstore. The big bonus is that you use only one table, and the extra
fields can still be accessed and indexed (but a little slower than a
normal column). However, I believe hstore can only store TEXT values...


Could be interesting. Or not.
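A rough sketch of that layout (all names invented; needs the hstore
contrib module) :

CREATE TABLE item (
    item_id   serial  PRIMARY KEY,
    item_type text    NOT NULL,   -- shared, often-searched field : real column
    price     numeric NOT NULL,   -- shared, often-searched field : real column
    extra     hstore              -- subclass-specific attributes (text only)
);

-- the extra fields can still be indexed, via expression indexes
CREATE INDEX item_extra_color_idx ON item ((extra->'color'))
    WHERE (extra->'color') IS NOT NULL;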



Re: [PERFORM] Postgres refusing to use 1 core

2011-05-11 Thread Pierre C


This is a problem I encounter constantly wherever I go. Programmer  
selects millions of rows from giant table. Programmer loops through  
results one by one doing some magic on them. Programmer submits queries  
back to the database. Even in batches, that's going to take ages.


Reminds me of a recent question on stackoverflow :

http://stackoverflow.com/questions/5952020/how-to-optimize-painfully-slow-mysql-query-that-finds-correlations

And the answer :

http://stackoverflow.com/questions/5952020/how-to-optimize-painfully-slow-mysql-query-that-finds-correlations/5954041#5954041

The OP was thinking row-based, with subqueries in the role of doing some
magic.
Using a set-based solution with cascading WITH CTEs (using the previous
CTE as a source in the next one for aggregation) => 100x speedup !
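The shape of such a query looks roughly like this (a sketch with invented
names, each CTE aggregating the previous one) :

WITH raw AS (
    SELECT user_id, item_id, amount FROM purchases
),
per_user AS (
    SELECT user_id, count(*) AS n_items, sum(amount) AS total
      FROM raw
     GROUP BY user_id
)
SELECT avg(n_items) AS avg_items, avg(total) AS avg_total
  FROM per_user;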




[PERFORM] Postgres NoSQL emulation

2011-05-10 Thread Pierre C


While reading about NoSQL,

MongoDB lets you store and search JSON objects. In that case, you don't
need to have the same columns in each row


The following ensued. Isn't it cute ?

CREATE TABLE mongo ( id SERIAL PRIMARY KEY, obj hstore NOT NULL );
INSERT INTO mongo (obj) SELECT ('a=>'||n||',key'||(n%10)||'=>'||n)::hstore
FROM generate_series(1,100000) n;


SELECT * FROM mongo LIMIT 10;
 id |           obj
----+--------------------------
  1 | "a"=>"1", "key1"=>"1"
  2 | "a"=>"2", "key2"=>"2"
  3 | "a"=>"3", "key3"=>"3"
  4 | "a"=>"4", "key4"=>"4"
  5 | "a"=>"5", "key5"=>"5"
  6 | "a"=>"6", "key6"=>"6"
  7 | "a"=>"7", "key7"=>"7"
  8 | "a"=>"8", "key8"=>"8"
  9 | "a"=>"9", "key9"=>"9"
 10 | "a"=>"10", "key0"=>"10"

CREATE INDEX mongo_a ON mongo((obj->'a')) WHERE (obj->'a') IS NOT NULL;
CREATE INDEX mongo_k1 ON mongo((obj->'key1')) WHERE (obj->'key1') IS NOT NULL;
CREATE INDEX mongo_k2 ON mongo((obj->'key2')) WHERE (obj->'key2') IS NOT NULL;

VACUUM ANALYZE mongo;

SELECT * FROM mongo WHERE (obj->'key1')='271';
 id  |            obj
-----+----------------------------
 271 | "a"=>"271", "key1"=>"271"
(1 row)

EXPLAIN ANALYZE SELECT * FROM mongo WHERE (obj->'key1')='271';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Index Scan using mongo_k1 on mongo  (cost=0.00..567.05 rows=513 width=36)
   (actual time=0.024..0.025 rows=1 loops=1)
   Index Cond: ((obj -> 'key1'::text) = '271'::text)
 Total runtime: 0.048 ms



Re: [PERFORM] Background fsck

2011-04-18 Thread Pierre C


What's more, this is already a new controller. It replaced the previous
one because of exactly the same persisting problem. I think tech support
people not knowing a solution just buy some time for themselves and say
"flash this beta firmware, maybe it helps" or "replace your hardware".


We had a problem like this on a server a few years ago on the job... The
machine randomly crashed once a month. XFS coped alright until, one day,
it threw in the towel, and the poor maintenance guys needed to run
xfs_repair. Needless to say, the machine crashed again while xfs_repair
was running concurrently on all filesystems. All filesystems were then
completely trashed... That convinced the boss maybe something was wrong,
and a new box was rushed in... Then a few tens of terabytes of backup
restoration... zzz.


It turned out it was a faulty SCSI cable.



Re: [PERFORM] Calculating 95th percentiles

2011-03-05 Thread Pierre C



Any time the volume tables are queried it is to calculate the deltas
between each in_octets and out_octets from the previous row (ordered
by timestamp). The delta is used because the external system, where
the data is retrieved from, will roll over the value sometimes.  I
have a function to do this calculation:


Would it be possible to do this when inserting and store the deltas  
directly ?
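A rough sketch of what that could look like (all names invented, counter
rollover handling left out) :

CREATE OR REPLACE FUNCTION volume_delta_trg() RETURNS trigger AS $$
DECLARE
  prev record;
BEGIN
  -- previous sample for the same interface
  SELECT in_octets, out_octets INTO prev
    FROM volume
   WHERE interface_id = NEW.interface_id
   ORDER BY ts DESC
   LIMIT 1;
  IF FOUND THEN
    NEW.delta_in  := NEW.in_octets  - prev.in_octets;
    NEW.delta_out := NEW.out_octets - prev.out_octets;
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER volume_delta
BEFORE INSERT ON volume
FOR EACH ROW EXECUTE PROCEDURE volume_delta_trg();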




Re: [PERFORM] performance issue in the fields.

2011-02-24 Thread Pierre C



I have created two tables. In the first table I am using many fields to
store 3 addresses, while in table b I am using an array data type to store
the 3 addresses. Is there any performance issue I would face, and which one
will cause a performance problem?


The array is interesting :
- if you put a gist index on it and do searches like "array contains
values X and Y and Z", the gist index has some special optimizations for
this (see the sketch below)
- if you store a variable number of integers, and for some reason you
don't want a normalized one-line-per-value approach
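A rough sketch of the first case (assumes the intarray contrib module for
the gist opclass) :

CREATE TABLE b ( id serial PRIMARY KEY, addresses integer[] NOT NULL );
CREATE INDEX b_addresses_idx ON b USING gist (addresses gist__int_ops);

-- "array contains values X and Y and Z"
SELECT * FROM b WHERE addresses @> ARRAY[10, 42, 97];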





Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem

2011-02-17 Thread Pierre C

Thomas Pöhler wrote:


I remember you said you were using nginx and php-fastcgi, how many web  
server boxes do you have, and what are the specs ?




Re: [PERFORM] Why we don't want hints Was: Slow count(*) again...

2011-02-11 Thread Pierre C



   select * from account_transaction where trans_type_id in ( ...
long, hard-coded list ...) and account_id=? order by created desc
limit 25;


You could use an index on (account_id, created, trans_type_id), as a
replacement for your index on (account_id, created). This will not prevent
the Index Scan Backwards, but at least index rows whose trans_type_id does
not match the WHERE clause will not generate any heap access...
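ie, something like this (column names taken from the query above) :

CREATE INDEX account_transaction_acct_created_type_idx
    ON account_transaction (account_id, created, trans_type_id);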




Re: [PERFORM] Really really slow select count(*)

2011-02-06 Thread Pierre C



I have clustered that table, its still unbelievably slow.


Did you actually delete the old entries before clustering it?  if it's
still got 4G of old sessions or whatever in it, clustering ain't gonna
help.


Also, IMHO it is a lot better to store sessions in something like  
memcached, rather than imposing this rather large load on the main  
database...


PS : if your site has been down for 6 hours, you can TRUNCATE your  
sessions table...




Re: [PERFORM] general hardware advice

2011-02-06 Thread Pierre C

On Sun, 06 Feb 2011 19:16:23 +0100, Linos i...@linos.es wrote:

I am searching what would be the best hardware combination to a new  
server i have to install, the new server will have a Postgresql 9.0 with  
a database of about 10gb, the database part it is not the problem for  
me, in this size almost, the part where i am a bit lost is that i have  
to share from the same server about 20TB of data with samba or ISCSI (i  
have to test both ways to share when i have the hardware) because this  
is to be the file-server of 8 Avid video workstations.


What is the expected load on the postgresql instance ?

Also, for multiple high throughput concurrent streams (as in AVID),  
filesystem choice is critical.




Re: [PERFORM] Really really slow select count(*)

2011-02-05 Thread Pierre C

On Fri, 04 Feb 2011 21:37:56 +0100, Shaun Thomas stho...@peak6.com wrote:


On 02/04/2011 02:14 PM, felix wrote:


oh and there in the footnotes to django they say dont' forget to run
the delete expired sessions management every once in a while.
thanks guys.


Oh Django... :)


it won't run now because its too big, I can delete them from psql though


You might be better off deleting the inverse. You know, start a  
transaction, select all the sessions that *aren't* expired, truncate the  
table, insert them back into the session table, and commit.


Note that for a session table that is updated very often, you can use
postgres' HOT feature, which will create a lot fewer dead rows. Look it up
in the docs.
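A rough sketch (table/index names invented, the 50% fillfactor is just an
example) :

-- leave free space in each heap page so updated row versions can stay on
-- the same page (HOT), and avoid updating indexed columns
ALTER TABLE sessions SET (fillfactor = 50);

-- rewrite the table so the new fillfactor applies to existing pages
CLUSTER sessions_pkey ON sessions;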




Re: [PERFORM] Wrong docs on wal_buffers?

2011-01-05 Thread Pierre C



And the risks are rather asymmetric.  I don't know of any problem from
too large a buffer until it starts crowding out shared_buffers, while
under-sizing leads to the rather drastic performance consequences of
AdvanceXLInsertBuffer having to wait on the WALWriteLock while holding
the WALInsertLock,


Suppose you have a large update which generates lots of WAL, some WAL  
segment switching will take place, and therefore some fsync()s. If  
wal_buffers is small enough that it fills up during the time it takes to  
fsync() the previous WAL segment, isn't there a risk that all WAL writes  
are stopped, waiting for the end of this fsync() ?




Re: [PERFORM] concurrent IO in postgres?

2010-12-24 Thread Pierre C


I wonder how the OP configured effective_io_concurrency ; even on a single  
drive with command queuing the fadvise() calls that result do make a  
difference...




Re: [PERFORM] MySQL HandlerSocket - Is this possible in PG?

2010-12-22 Thread Pierre C




Do you mean I should use PREPARE?

Currently I use PHP to access the DB, which uses libpq. Is that considered
a fast call API? If not, can you please refer me to the right info.

PHP pg_pconnect command open a persistent PostgreSQL connection. Is it
enough or I better use PgPool2 or something similar?

Considering the points above, will I be able to get such high QPS from
PostgreSQL ? If so, it will be my pleasure to dump Reddis and work solely
with PG :)


I suppose you already have a web server like lighttpd, zeus, or nginx,  
using php as fastcgi, or apache behind a proxy ? In that case, since the  
number of php processes is limited (usually to something like 2x your  
number of cores), the number of postgres connections a web server  
generates is limited, and you can do without pgpool and use pg_pconnect.  
Be wary of the pg_pconnect bugs though (like if you restart pg, you also  
have to restart php, I suppose you know that).


Here are some timings (Core 2 Q6600) for a simple SELECT on PK query :

using tcp (localhost)
   218 µs / query :  pg_query
   226 µs / query :  pg_query_params
   143 µs / query :  pg_execute

using unix sockets
   107 µs / query :  pg_query
   122 µs / query :  pg_query_params
63 µs / query :  pg_execute

query inside plpgsql function
17 µs / query

Don't use PDO, it is 2x-3x slower.

TCP overhead is quite large...

If you have a named prepared statement (created with PREPARE) use  
pg_execute(), which is much faster than pg_query (for very simple queries).


Of course you need to prepare the statements... you can do that with  
pg_pool which can execute a script upon connection initialization.
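ie, something like this (statement and table names are only examples) :

-- executed once per connection, e.g. from the pool's startup script
PREPARE get_user (integer) AS
    SELECT * FROM users WHERE user_id = $1;

-- each request then only sends the execute step
EXECUTE get_user(42);

On the PHP side, pg_execute() then refers to the prepared statement by its
name ('get_user' here).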




Re: [PERFORM] MySQL HandlerSocket - Is this possible in PG?

2010-12-22 Thread Pierre C

On Wed, 22 Dec 2010 14:17:21 +0100, Michael Ben-Nes mich...@epoch.co.il
wrote:


Thanks, it is most interesting

--
Michael Ben-Nes - Internet Consultant and Director.
http://www.epoch.co.il - weaving the Net.
Cellular: 054-4848113
--



In fact, it would be possible to implement something like MySQL
HandlerSocket, using the following Very Ugly Hack :

This would only work for ultra simple SELECT 1 row WHERE primary key =
constant queries.

- a pooler (separate process) waits for connections
- clients connect to the pooler and send queries
- pooler accumulates enough queries to justify the overhead of what's
going to come next
- pooler takes a bunch of queries and encodes them in some custom ad-hoc
format (not SQL)
- pooler says to postgres SELECT do_my_queries( serialized data )
- do_my_queries() is a user function (in C) which uses postgres access
methods directly (like index access method on primary key), processes
queries, and sends results back as binary data
- repeat for next batch

A Nested Loop Index Scan processes about 400,000 rows/s, which is 2.5
µs/query; maybe you could get into that ballpark (per core).

Of course it's a rather extremely ugly hack.

---

Note that you could very possibly have almost the same benefits with
almost none of the ugliness by doing the following :

same as above :
- a pooler (separate process) waits for connections
- clients connect to the pooler and send queries in the format query +
parameters (which libpq uses if you ask)
- pooler accumulates enough queries to justify the overhead of what's
going to come next

different :
- pooler looks at each query, and if it has not seen it yet on this
particular pg connection, issues a PREPARE on the query
- pooler sends, in one TCP block, a begin, then a bunch of execute named
prepared statement with parameters commands, then a rollback
- postgres executes all of those and returns all replies in one TCP block
(this would need a modification)
- pooler distributes results back to clients

This would need a very minor change to postgres (coalescing output
blocks). It would make the pooler pay TCP overhead instead of postgres,
and greatly improve cache locality in postgres.

Since error handling would be problematic (to say the least...) and
expensive it would only work on simple selects.



Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-18 Thread Pierre C



  The real performance problem with RAID 5 won't show up until a drive
  dies and it starts rebuilding

I don't  agree with that. RAID5 is  very slow for random  writes, since
it needs to :


The real problem is when RAID5 loses a drive and goes from "acceptable"
kind of slow to "someone's fired" kind of slow. Then of course, in the
middle of the rebuild, a bad sector is discovered in some place the
filesystem has never visited yet on one of the remaining drives, and all
hell breaks loose.


RAID6 is only one extra disk...



Re: [PERFORM] postgres performance tunning

2010-12-17 Thread Pierre C

Dear Friends,
I have a requirement for running more than 15,000 queries per second.
Can you please tell what all the postgres parameters are that need to be
changed

to achieve this.
Already I have 17GB RAM and dual core processor and this machine  
is dedicated for database operation.


That depends on your queries : for simple things like SELECT * FROM table  
WHERE primary_key = constant, no problem, a desktop dual core will do  
it...

So, please provide more details...



Re: [PERFORM] Auto-clustering?

2010-12-17 Thread Pierre C


fc=# explain analyse select collection, period, tariff, sum(bytesSent),
sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600
as startchunk from sample_20101001 where starttime between 1287493200
and 1290171599 and collection=128 and ip = '10.9.125.207' group by
startchunk, tariff, collection, period;


If CLUSTER locks bother you, and you don't do UPDATEs, you might consider  
doing something like this :


- accumulate the rows in a recent table
- every hour, INSERT INTO archive SELECT * FROM recent ORDER BY (your  
cluster fields)

- DELETE FROM recent the rows you just inserted
- VACUUM recent

The cluster in your archive table will not be perfect but at least all  
rows from 1 source in 1 hour will be stored close together. But clustering  
doesn't need to be perfect either, if you get 100x better locality, that's  
already good !


Now, if you have a huge amount of data but never query it with a precision  
exceeding 1 hour, you might consider creating an aggregate table where, at  
the end of every hour, you only store sum(), min(), max() of the data for  
the last hour's data using GROUP BY the fields you want. You could also  
use a trigger, but that would generate a huge amount of UPDATEs.


For the above query you'd do :

INSERT INTO stats_by_hour (columns...) SELECT
collection, ip, period, tariff, sum(bytesSent),
sum(bytesReceived), sum(packets), max(sample), (starttime / 3600) * 3600
as startchunk from sample_20101001 WHERE starttime > (some value)
GROUP BY collection, ip, period, tariff, startchunk

Then you can run aggregates against this much smaller table instead.



Re: [PERFORM] Help with bulk read performance

2010-12-16 Thread Pierre C


If the data are stored as a byte array but retrieved into a ResultSet,
the unpacking time goes up by an order of magnitude and the
observed total throughput is 25 MB/s.  If the data are stored in a  
Postgres float array and unpacked into a byte stream, the

observed throughput is 20 MB/s.



float -> text conversions are very slow; this is in fact due to the
mismatch between base-2 (IEEE 754) and base-10 (text) floating point
representations, which needs very complex calculations.




Re: [PERFORM] libpq vs ODBC

2010-12-09 Thread Pierre C

On Thu, 09 Dec 2010 06:51:26 +0100, Alex Goncharov
alex-goncha...@comcast.net wrote:


,--- You/Divakar (Wed, 8 Dec 2010 21:17:22 -0800 (PST)) *
| So it means there will be visible impact if the nature of DB  
interaction is DB

| insert/select. We do that mostly in my app.

You can't say a visible impact unless you can measure it in your
specific application.

Let's say ODBC takes 10 times of .001 sec for libpq.  Is this a
visible impact?


Well you have to consider server and client resources separately. If you  
waste a bit of CPU time on the client by using a suboptimal driver, that  
may be a problem, or not. If you waste server resources, that is much more
likely to be a problem, because it is multiplied by the number of clients.  
I don't know about the specifics of ODBC performance, but for instance  
php's PDO driver's handling of prepared statements with postgres comes up  
as an example of what not to do.




Re: [PERFORM] Compared MS SQL 2000 to Postgresql 9.0 on Windows

2010-12-08 Thread Pierre C

The hardware it
is running on is fairly good, dual Xeon CPUs, 4 GB of RAM, Raid 5.


For a database you'd want to consider replacing the RAID5 with a RAID1 (or
RAID10). RAID5 is slow for small random updates, which are common in
databases. Since you probably have enough harddisks anyway, this won't
cost you. Linux or FreeBSD would also be better choices for postgres
than Windows.


Also, as said, your issue looks very much like a problem in the way your  
application communicates with postgres : if it takes postgres 5 ms to  
process the query and your application gets the result 8 seconds later,  
there is a problem. Note that SQL Server probably takes just a few ms for  
such a simple query, too, so your not really benchmarking SQL server  
either.




Re: [PERFORM] BBU Cache vs. spindles

2010-12-01 Thread Pierre C



Is that true?  I have no idea.  I thought everything was done at the
512-byte block level.


Newer disks (2TB and up) can have 4k sectors, but this still means a page  
spans several sectors.




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Pierre C



Just once.


OK, another potential problem eliminated, it gets strange...


If I have 5000 lines in CSV file (that I load into 'temporary' table
using COPY) i can be sure that drone_id there is PK. That is because CSV
file contains measurements from all the drones, one measurement per
drone. I usualy have around 100 new drones, so I insert those to drones
and to drones_history. Then I first insert into drones_history and then
update those rows in drones. Should I try doing the other way around?


No, it doesn't really matter.


Although, I think I'm having some disk-related problems because when
inserting to the tables my IO troughput is pretty low. For instance,
when I drop constraints and then recreate them that takes around 15-30
seconds (on a 25M rows table) - disk io is steady, around 60 MB/s in
read and write.

It just could be that the ext3 partition is so fragmented. I'll try
later this week on a new set of disks and ext4 filesystem to see how it
goes.


If you CLUSTER a table, it is entirely rebuilt so if your disk free space  
isn't heavily fragmented, you can hope the table and indexes will get  
allocated in a nice contiguous segment.




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Pierre C


So, I did. I run the whole script in psql, and here is the result for  
the INSERT:


realm_51=# explain analyze INSERT INTO drones_history (2771, drone_id,  
drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM  
tmp_drones_history;
   QUERY PLAN  
--
  Seq Scan on tmp_drones_history  (cost=0.00..81.60 rows=4160 width=48)  
(actual time=0.008..5.296 rows=5150 loops=1)
  Trigger for constraint drones_history_fk__drones: time=92.948  
calls=5150

  Total runtime: 16779.644 ms
(3 rows)


Now, this is only 16 seconds. In this 'batch' I've inserted 5150 rows.
The batch before, I run that one 'the usual way', it inserted 9922 rows,  
and it took 1 minute and 16 seconds.


I did not, however, enclose the process into begin/end.

So, here are results when I, in psql, first issued BEGIN:

realm_51=# explain analyze INSERT INTO drones_history (2772, drone_id,  
drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM  
tmp_drones_history;
   QUERY PLAN  
--
  Seq Scan on tmp_drones_history  (cost=0.00..79.56 rows=4056 width=48)  
(actual time=0.008..6.490 rows=5059 loops=1)
  Trigger for constraint drones_history_fk__drones: time=120.224  
calls=5059

  Total runtime: 39658.250 ms
(3 rows)

Time: 39658.906 ms



Mario



Note that in both cases postgres reports that the FK checks take 92-120  
milliseconds... which is a normal time for about 4000 rows.
Inserting 4000 lines with just a few fields like you've got should take
much less than 1 s...


Where the rest of the time goes, I have no idea. Disk thrashing ? Locks ?  
Gremlins ?


- try it on a fresh copy of all your tables (CREATE TABLE, INSERT INTO  
SELECT)
- try to put the WAL on a separate physical disk (or do a check with  
fsync=off)

- try it on another computer
- try it on another harddisk
- run oprofile on a debug compile of postgres
- it could even be the process title updates (I don't think so but...)
- try a ramdisk tablespace



Re: [PERFORM] Simple database, multiple instances?

2010-11-30 Thread Pierre C


Having that many instances is not practical at all, so I'll have as many  
databases as I have 'realms'. I'll use pg_dump | nc  and nc | psql to  
move databases


Mario


Then you can use schemas, too, it'll be easier.



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-30 Thread Pierre C


Now I tried removing the constraints from the history table (including  
the PK) and the inserts were fast. After few 'rounds' of inserts I added  
constraints back, and several round after that were fast again. But then  
all the same. Insert of some 11k rows took 4 seconds (with all  
constraints) and now the last one of only 4k rows took one minute. I did  
vacuum after each insert.



Mario


Hm, so for each line of drones_history you insert, you also update the
corresponding drones table to reflect the latest data, right ?
How many times is the same row in drones updated ? ie, if you insert N
rows in drones_history, how many drone_ids do you have ?




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-29 Thread Pierre C

realm_51=# vacuum analyze verbose drones;
INFO:  vacuuming public.drones
INFO:  scanned index drones_pk to remove 242235 row versions
DETAIL:  CPU 0.02s/0.11u sec elapsed 0.28 sec.
INFO:  drones: removed 242235 row versions in 1952 pages
DETAIL:  CPU 0.01s/0.02u sec elapsed 0.03 sec.
INFO:  index drones_pk now contains 174068 row versions in 721 pages
DETAIL:  107716 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.


As you can see your index contains 174068 active rows and 242235 dead rows  
that probably should have been removed a long time ago by autovacuum, but  
you seem to have it turned off. It does not take a long time to vacuum  
this table (only 0.3 sec) so it is not a high cost, you should enable  
autovacuum and let it do the job (note that this doesn't stop you from  
manual vacuuming after big updates).



CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  drones: found 486 removable, 174068 nonremovable row versions  
in 1958 out of 1958 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 64 unused item pointers.
0 pages are entirely empty.
CPU 0.22s/0.90u sec elapsed 22.29 sec.


Here, the table itself seems quite normal... strange.


INFO:  vacuuming pg_toast.pg_toast_2695558
INFO:  index pg_toast_2695558_index now contains 0 row versions in 1  
pages

DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_toast_2695558: found 0 removable, 0 nonremovable row  
versions in 0 out of 0 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


Since you don't have large fields, the toast table is empty...


realm_51=# vacuum analyze verbose drones_history;
INFO:  vacuuming public.drones_history
INFO:  index drones_history_pk now contains 25440352 row versions in  
69268 pages

DETAIL:  0 index row versions were removed.


good


0 index pages have been deleted, 0 are currently reusable.
CPU 0.38s/0.12u sec elapsed 16.56 sec.
INFO:  drones_history: found 0 removable, 16903164 nonremovable row  
versions in 129866 out of 195180 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 2.00s/1.42u sec elapsed 49.24 sec.


good


INFO:  vacuuming pg_toast.pg_toast_2695510
INFO:  index pg_toast_2695510_index now contains 0 row versions in 1  
pages

DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_toast_2695510: found 0 removable, 0 nonremovable row  
versions in 0 out of 0 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


same as above, no toast



realm_51=# select version();
version  
-
  PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (Debian  
4.3.2-1.1) 4.3.2, 32-bit

(1 row)


Mario


ok

Try this :

CLUSTER drones_pkey ON drones;

Then check if your slow query gets a bit faster. If it does, try :

ALTER TABLE drones SET ( fillfactor = 50 );
ALTER INDEX drones_pkey SET ( fillfactor = 50 );
CLUSTER drones_pkey ON drones; (again)

This will make the updates on this table less problematic. VACUUM it after  
each mass update.




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-29 Thread Pierre C


Yes, since (sample_id, drone_id) is primary key, postgres created  
composite index on those columns. Are you suggesting I add two more  
indexes, one for drone_id and one for sample_id?


(sample_id, drone_id) covers sample_id, but if you make searches on drone_id
alone it is likely to be very slow since you have a large number of
sample_ids. Postgres can use any column of a multicolumn index but it is
only interesting performance-wise if the cardinality of the first  
(ignored) columns is low. If you often make searches on drone_id, create  
an index. But this isn't what is slowing your foreign key checks.
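ie, if you do need such searches, the extra index would simply be :

CREATE INDEX drones_history_drone_id_idx ON drones_history (drone_id);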



Also it would be worthwhile for you to post the output of:

EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;

to the list, so we can see what is taking the time.


Is there a way to do so inside plpgsql function?

I can recreate the whole process within psql and then post the explain  
analyze, it would just take me some time to do so. I'll post as soon as  
I'm done.


Yes, this would be interesting.



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Pierre C


When I remove foreign constraints (drones_history_fk__samples and  
drones_history_fk__drones) (I leave the primary key on drones_history)  
than that INSERT, even for 50k rows, takes no more than a second.


So, my question is - is there anything I can do to make INSERTS with PK  
faster? Or, since all the reference checking is done inside the  
procedure for loading data, shall I abandon those constraints entirely?


Mario


Maybe... or not. Can you post details about :

- the foreign keys
- the tables that are referred to (including indexes)


CREATE TABLE foo (x INTEGER PRIMARY KEY);
INSERT INTO foo SELECT * FROM generate_series( 1,100000 );
Time: 766.182 ms
test=> VACUUM ANALYZE foo;
Time: 71.938 ms
test=> CREATE TABLE bar ( x INTEGER REFERENCES foo(x) );
CREATE TABLE
test=> INSERT INTO bar SELECT * FROM generate_series( 1,100000 );
Time: 2834.430 ms

As you can see, 100,000 FK checks take less than 3 seconds in this very
simple example. There is probably something that needs fixing.




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Pierre C



I pasted DDL at the begining of my post.


Ah, sorry, didn't see it ;)

The only indexes tables have are the ones created because of PK  
constraints. Table drones has around 100k rows. Table drones_history has  
around 30M rows. I'm not sure what additional info you'd want but I'll  
be more than happy to provide more relevant information.


Can you post the following :

- pg version
- output of VACUUM ANALYZE VERBOSE for your 2 tables



Re: [PERFORM] Optimizing query

2010-11-26 Thread Pierre C



Note that your LEFT JOIN condition is probably quite slow...

Please post EXPLAIN ANALYZE for this simplified version :

SELECT
    R."Osoba weryfikująca",
    R.LP,
    A.NKA,
    A.NTA,
    Sum(Ile)
FROM    NumeryA A
LEFT JOIN   "Rejestr stacji do naprawy" R ON (
    A.NKA = R."Numer kierunkowy"
    and A.NTA like R."Numer stacji"
    and substr(A.NTA,1,5) = substr(R."Numer stacji",1,5)
)
WHERE
    A.DataPliku >= current_date-4*30
    and A.KodBłędu=74::text
    and R."Data weryfikacji" >= current_date-4*30
GROUP BY R."Osoba weryfikująca", R.LP, A.NKA, A.NTA
ORDER BY Sum(Ile) DESC
LIMIT 5000

And also post EXPLAIN ANALYZE for this :

SELECT
    A.NKA,
    A.NTA,
    Sum(Ile) AS ss -- if it's in this table
FROM    NumeryA A
WHERE
    A.DataPliku >= current_date-4*30
    and A.KodBłędu=74::text
GROUP BY A.NKA, A.NTA



Re: [PERFORM] Select * is very slow

2010-11-08 Thread Pierre C



The table has 200 records now.
Select * from employee takes 15 seconds to fetch the data!!!
Which seems to be very slow.
But when I say select id,name from employee it executes in 30 ms.


30 ms is also amazingly slow for so few records and so little data.

- please provide results of EXPLAIN ANALYZE SELECT id FROM table
- huge bloat (table never vacuumed ?) => VACUUM VERBOSE
- bad network cable, network interface reverting to 10 Mbps, badly  
configured network, etc ? (test it and test ping to server, throughput,  
etc)

- server overloaded (swapping, etc) ? (vmstat, iostat, top, etc)



Re: [PERFORM] postmaster consuming /lots/ of memory with hash aggregate. why?

2010-11-06 Thread Pierre C



2. Why do both HashAggregate and GroupAggregate say the cost estimate
is 40000 rows?


I've reproduced this :


CREATE TABLE popo AS SELECT (x%1000) AS a, (x%1001) AS b FROM
generate_series( 1,1000000 ) AS x;

VACUUM ANALYZE popo;
EXPLAIN ANALYZE SELECT a,b,count(*) FROM (SELECT * FROM popo UNION ALL  
SELECT * FROM popo) AS foo GROUP BY a,b;

                                  QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=43850.00..44350.00 rows=40000 width=8)
   (actual time=1893.441..2341.780 rows=1000000 loops=1)
   ->  Append  (cost=0.00..28850.00 rows=2000000 width=8)
       (actual time=0.025..520.581 rows=2000000 loops=1)
         ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8)
             (actual time=0.025..142.639 rows=1000000 loops=1)
         ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8)
             (actual time=0.003..114.257 rows=1000000 loops=1)
 Total runtime: 2438.741 ms
(5 rows)

Time: 2439.247 ms

I guess the row count depends on the correlation of a and b, which pg has  
no idea about. In the first example, there is no correlation, now with  
full correlation :



UPDATE popo SET a=b;
VACUUM FULL popo;
VACUUM FULL popo;
ANALYZE popo;
EXPLAIN ANALYZE SELECT a,b,count(*) FROM (SELECT * FROM popo UNION ALL  
SELECT * FROM popo) AS foo GROUP BY a,b;

                                  QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=43850.00..44350.00 rows=40000 width=8)
   (actual time=1226.201..1226.535 rows=1001 loops=1)
   ->  Append  (cost=0.00..28850.00 rows=2000000 width=8)
       (actual time=0.008..518.068 rows=2000000 loops=1)
         ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8)
             (actual time=0.007..128.609 rows=1000000 loops=1)
         ->  Seq Scan on popo  (cost=0.00..14425.00 rows=1000000 width=8)
             (actual time=0.005..128.502 rows=1000000 loops=1)
 Total runtime: 1226.797 ms



Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-05 Thread Pierre C
On Thu, 04 Nov 2010 15:42:08 +0100, Nick Matheson  
nick.d.mathe...@noaa.gov wrote:
I think your comments really get at what our working hypothesis was, but  
given that our experience is limited compared to you all here on the  
mailing lists we really wanted to make sure we weren't missing any  
alternatives. Also the writing of custom aggregators will likely  
leverage any improvements we make to our storage throughput.


Quick test : SELECT sum(x) FROM a table with 1 INT column, 3M rows, cached
=> 244 MB/s
=> 6.7 M rows/s

Same on MySQL :

          size    SELECT sum(x) (cached)
postgres  107 MB  0.44 s
myisam    20 MB   0.42 s
innodb    88 MB   1.98 s

As you can see, even though myisam is much smaller (no transaction data to  
store !) the aggregate performance isn't any better, and for innodb it is  
much worse.


Even though pg's per-row header is large, seq scan / aggregate performance  
is very good.


You can get performance in this ballpark by writing a custom aggregate in  
C ; it isn't very difficult, the pg source code is clean and full of  
insightful comments.


- take a look at how contrib/intagg works
- http://www.postgresql.org/files/documentation/books/aw_pgsql/node168.html
- and the pg manual of course



Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?

2010-11-05 Thread Pierre C



Fair enough; I'm so used to bumping wal_buffers up to 16MB nowadays that
I forget sometimes that people actually run with the default where this
becomes an important consideration.


Do you have any testing in favor of 16mb vs. lower/higher?


From some tests I had done some time ago, using separate spindles (RAID1)  
for xlog, no battery, on 8.4, with stuff that generates lots of xlog  
(INSERT INTO SELECT) :


When using a small wal_buffers, there was a problem when switching from  
one xlog file to the next. Basically an fsync was issued, but most of the  
previous log segment was still not written. So, postgres was waiting for  
the fsync to finish. Of course, the default 64 kB of wal_buffers is  
quickly filled up, and all writes wait for the end of this fsync. This  
caused hiccups in the xlog traffic, and xlog throughput wasn't nearly as  
high as the disks would allow. Sticking a stethoscope on the xlog  
harddrives revealed a lot more random access than I would have liked  
(this is a much simpler diagnostic than tracing the IOs, lol)


I set wal_writer_delay to a very low setting (I don't remember which,  
perhaps 1 ms) so the walwriter was in effect constantly flushing the wal  
buffers to disk. I also used fdatasync instead of fsync. Then I set  
wal_buffers to a rather high value, like 32-64 MB. Throughput and  
performance were a lot better, and the xlog drives made a much more  
linear-access noise.


What happened is that, since wal_buffers was larger than what the drives  
can write in 1-2 rotations, it could absorb wal traffic during the time  
postgres waits for fdatasync / wal segment change, so the inserts would  
not have to wait. And lowering the walwriter delay made it write something  
on each disk rotation, so that when a COMMIT or segment switch came, most  
of the time, the WAL was already synced and there was no wait.
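
In postgresql.conf terms, the setup I tested was roughly this (illustrative  
values from that particular box, not a general recommendation) :

-- in postgresql.conf :
--   wal_buffers      = 32MB       (absorbs xlog while the previous segment is fsynced)
--   wal_writer_delay = 1ms        (walwriter writes on nearly every rotation)
--   wal_sync_method  = fdatasync
-- verify what the server is actually running with :
SHOW wal_buffers;
SHOW wal_writer_delay;
SHOW wal_sync_method;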


Just my 2 c ;)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Simple (hopefully) throughput question?

2010-11-04 Thread Pierre C



Is there any way using stored procedures (maybe C code that calls
SPI directly) or some other approach to get close to the expected 35
MB/s doing these bulk reads?  Or is this the price we have to pay for
using SQL instead of some NoSQL solution.  (We actually tried Tokyo
Cabinet and found it to perform quite well. However it does not measure
up to Postgres in terms of replication, data interrogation, community
support, acceptance, etc).


Reading from the tables is very fast, what bites you is that postgres has  
to convert the data to wire format, send it to the client, and the client  
has to decode it and convert it to a format usable by your application.  
Writing a custom aggregate in C should be a lot faster since it has direct  
access to the data itself. The code path from actual table data to an  
aggregate is much shorter than from table data to the client...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] MVCC and Implications for (Near) Real-Time Application

2010-10-29 Thread Pierre C


My questions are: (1) Does the MVCC architecture introduce significant  
delays between insert by a thread and visibility by other threads


As said by others, once committed, it is immediately visible to all.


(2) Are there any available benchmarks that can measure this delay?


Since you will not be batching INSERTs, you will use 1 INSERT per  
transaction.

If you use Autocommit mode, that's it.
If you don't, you will get a few extra network roundtrips after the  
INSERT, to send the COMMIT.


One INSERT is usually extremely fast unless you're short on RAM and the  
indexes that need updating need some disk seeking.


Anyway, doing lots of INSERTs each in its own transaction is usually very  
low-throughput, because at each COMMIT, postgres must always be sure that  
all the data is actually written to the harddisks. So, depending on the  
speed of your harddisks, each COMMIT can take up to 10-20 milliseconds.


On a 7200rpm harddisk, it is absolutely impossible to do more than 7200  
commits/minute if you want to be sure each time that the data really is  
written on the harddisk, unless :


- you use several threads (one disk write can group several commits from  
different connections, see the config file docs)
- you turn off synchronous_commit ; in this case commit is instantaneous,  
but if your server loses power or crashes, the last few seconds of data  
may be lost (database integrity is still guaranteed though) -- see the  
sketch a bit further down
- you use a battery-backed cache on your RAID controller ; in this case  
"written to the harddisks" is replaced by "written to battery-backed RAM",  
which is a lot faster


If you dont use battery backed cache, place the xlog on a different RAID1  
array than the tables/indexes, this allows committing of xlog records  
(which is the time critical part) to proceed smoothly and not be disturbed  
by other IO on the indexes/tables. Also consider tuning your bgwriter and  
checkpoints, after experimentation under realistic load conditions.
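
As a quick illustration of the synchronous_commit option mentioned above  
(the table here is made up) :

-- only this session trades durability for speed, the rest keeps the default
SET synchronous_commit TO off;
BEGIN;
INSERT INTO measurements( sensor_id, value ) VALUES ( 42, 17.5 );
COMMIT;  -- returns without waiting for the WAL flush ; a crash may lose the
         -- last few transactions, but the database itself stays consistent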


So, when you benchmark your application, if you get disappointing results,  
think about this...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Select count(*), the sequel

2010-10-28 Thread Pierre C

Pierre C li...@peufeu.com wrote:


in-page compression

How would that be different from the in-page compression done by
TOAST now?  Or are you just talking about being able to make it
more aggressive?
-Kevin


Well, I suppose lzo-style compression would be better used on data that is  
written a few times maximum and then mostly read (like a forum, data  
warehouse, etc). Then, good candidate pages for compression also probably  
have all tuples visible to all transactions, therefore all row headers  
would be identical and would compress very well. Of course this introduces  
a small problem for deletes and updates...


Delta compression is : take all the values for a column inside a page,  
look at the values and their statistical distribution, notice for example  
that they're all INTs and the values on the page fit between X+n and X-n,  
store X and only encode n with as few bits as possible for each row. This  
is only an example, the idea is to exploit the fact that on the same page,  
all the values of one column often have lots in common. xid values in row  
headers are a good example of this.
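
A toy illustration of the idea in SQL (just to show the principle, not how  
it would actually be done in the storage layer) :

-- pretend these are the values of one int column on one page
WITH page(val) AS ( VALUES (100003), (100007), (100001), (100012) )
SELECT min(val) AS base_stored_once,
       array_agg( val - (SELECT min(val) FROM page) ) AS per_row_offsets
FROM page;
-- base_stored_once = 100001, per_row_offsets = {2,6,0,11} :
-- each offset fits in 4 bits instead of 32.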


TOAST compresses datums, so it performs well on large datums ; this is the  
opposite, the idea is to compress small tuples by using the reduncancies  
between tuples.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Select count(*), the sequel

2010-10-27 Thread Pierre C

Even if somebody had a
great idea that would make things smaller without any other penalty,
which I'm not sure I believe either.


I'd say that the only things likely to bring an improvement significant  
enough to warrant the (quite large) hassle of implementation would be :


- read-only / archive tables (get rid of row header overhead)
- in-page compression using per-column delta storage for instance (no  
random access penalty, but hard to implement, maybe easier for read-only  
tables)
- dumb LZO-style compression (license problems, needs parallel  
decompressor, random access penalty, hard to implement too)


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] oracle to psql migration - slow query in postgres

2010-10-14 Thread Pierre C



 emailok | numeric(2,0)|


Note that NUMERIC is meant for
- really large numbers with lots of digits
- or controlled precision and rounding (ie, so an order total doesn't come  
out as 99.999999999 $)


Accordingly, NUMERIC is a lot slower in all operations, and uses a lot  
more space, than all the other numeric types.


I see many columns in your table that are declared as NUMERIC but should  
be BOOLs, or SMALLINTs, or INTs, or BIGINTs.
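
The size difference is easy to see (the exact size of a numeric varies, but  
it is always bigger than the fixed-size types) :

SELECT pg_column_size( 1::numeric(2,0) ) AS numeric_bytes,
       pg_column_size( 1::smallint )     AS smallint_bytes,  -- 2
       pg_column_size( 1::integer )      AS int_bytes,       -- 4
       pg_column_size( true )            AS bool_bytes;      -- 1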


Perhaps Oracle handles these differently, I dunno.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-13 Thread Pierre C


I guess I have to comment here again and point out that while I am  
having this
issue with text searches, I avoid using count(*) in such cases, I just  
use

next and previous links.


Unfortunately sometimes you got to do an ORDER BY on search results, and  
then all the rows got to be read...



Where the real problem (for me) is that when someone
searches a date or time range. My application keeps track of huge


Have you tried CLUSTER ?
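
Something along these lines (names made up) physically orders the table on  
the date column, so a date-range scan touches far fewer pages :

CREATE INDEX events_date_idx ON events( event_date );
CLUSTER events USING events_date_idx;
ANALYZE events;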

Also, it is sad to say, but if you need an engine able to use index-only  
scans which would fit this type of query, replicate the table to MyISAM.  
Unfortunately, the MySQL optimizer is really not so smart about complex  
reporting queries (no hash joins, no hash aggregates) so if you don't have  
a multicolumn index covering that you can use for index-only scan in your  
query, you'll get either a really huge sort or a really nasty nested loop  
index scan...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-12 Thread Pierre C



suggest that 99% instances of the select count(*) idiom are probably

bad use of the SQL language.


Well, suppose you paginate results. If the user sees that the search query  
returns 500 pages, there are two options :


- you're google, and your sorting algorithms are so good that the answer  
the user wants is in the first page
- or the user will refine his search by entering more keywords tu get a  
manageable result set


So, in both cases, the count(*) was useless anyway. And the slowest ones  
are the most useless, since the user will immediatey discard the result  
and refine his query.


If your full text search is slow, try Xapian or Lucene.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow count(*) again...

2010-10-11 Thread Pierre C



I ran into a fine example of this when I was searching this mailing list,
Searching in 856,646 pages took 13.48202 seconds. Site search powered by
PostgreSQL 8.3. Obviously at some point count(*) came into play here


Well, tsearch full text search is excellent, but it has to work inside the  
limits of the postgres database itself, which means row visibility checks,  
and therefore, yes, extremely slow count(*) on large result sets when the  
tables are not cached in RAM.


Also, if you want to use custom sorting (like by date, thread, etc)  
possibly all the matching rows will have to be read and sorted.


Consider, for example, the Xapian full text search engine. It is not MVCC  
(it is single writer, multiple reader, so only one process can update the  
index at a time, but readers are not locked out during writes). Of course,  
you would never want something like that for your main database ! However,  
in its particular application, which is multi-criteria full text search  
(and flexible sorting of results), it just nukes tsearch2 on datasets not  
cached in RAM, simply because everything in it including disk layout etc,  
has been optimized for the application. Lucene is similar (but I have not  
benchmarked it versus tsearch2, so I can't tell).


So, if your full text search is a problem, just use Xapian. You can update  
the Xapian index from a postgres trigger (using an independent process, or  
simply, a plpython trigger using the python Xapian bindings). You can  
query it using an extra process acting as a server, or you can write a  
set-returning plpython function which performs Xapian searches, and you  
can join the results to your tables.


Pg will never have such a fast count() as MyISAM does or the same  
insanely fast read performance,


Benchmark it and you'll see : MyISAM is faster than postgres for small  
simple selects only if :
- pg doesn't use prepared queries (planning time takes longer than a  
really simple select)
- myisam can use index-only access
- no one is writing to the myisam table at the moment, obviously

On equal grounds (ie, SELECT * FROM table WHERE pk = value) there is no  
difference. The TCP/IP overhead is larger than the query anyway ; you have  
to use unix sockets on both to get valid timings. Since by default on  
localhost MySQL seems to use unix sockets and PG uses tcp/ip, PG seems 2x  
slower, which is in fact not true.
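
To illustrate the prepared-query point (table is made up) :

PREPARE get_user( integer ) AS SELECT * FROM users WHERE user_id = $1;
EXECUTE get_user( 42 );  -- planning is paid once at PREPARE time, not per call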


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How does PG know if data is in memory?

2010-10-01 Thread Pierre C
It sounds horrendously complicated to keep track of to me, and in the  
end it won't make query execution any faster, it'll just potentially  
help the planner pick a better plan. I wonder if that'd be worth the  
extra CPU time spent managing the cache and cache content stats, and  
using those cache stats when planning? It'd be an interesting  
experiment, but the outcome is hardly obvious.


Well, suppose you pick an index scan, the only way to know which index  
(and heap) pages you'll need is to actually do the index scan... which  
isn't really something you'd do when planning. So you scan,


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] turn off caching for performance test

2010-08-26 Thread Pierre C



The bitmap heap scan is 3% faster,


3% isn't really significant. Especially if the new setting makes one query  
100 times slower... Like a query which will, by bad luck, get turned into  
a nested loop index scan for a lot of rows, on a huge table which isn't in  
cache...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance on new linux box

2010-07-16 Thread Pierre C


Most (all?) hard drives have cache built into them. Many raid cards have  
cache built into them. When the power dies, all the data in any cache is  
lost, which is why it's dangerous to use it for write caching. For that  
reason, you can attach a BBU to a raid card which keeps the cache alive  
until the power is restored (hopefully). But no hard drive I am aware of  
lets you attach a battery, so using a hard drive's cache for write  
caching will always be dangerous.


That's why many raid cards will always disable write caching on the hard  
drives themselves, and only enable write caching using their own memory  
when a BBU is installed.


Does that make more sense?



Actually write cache is only dangerous if the OS and postgres think some  
stuff is written to the disk when in fact it is only in the cache and not  
written yet. When power is lost, cache contents are SUPPOSED to be lost.  
In a normal situation, postgres and the OS assume nothing is written to  
the disk (ie, it may be in cache not on disk) until a proper cache flush  
is issued and responded to by the hardware. That's what xlog and journals  
are for. If the hardware doesn't lie, and the kernel/FS doesn't have any  
bugs, no problem. You can't get decent write performance on rotating media  
without a write cache somewhere...



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Question of using COPY on a table with triggers

2010-07-15 Thread Pierre C

Essentially, we insert a set of columns into a table, and each row fires
a trigger function which calls a very large stored procedure



For inserting lots of rows, COPY is much faster than INSERT because it  
parses data (a lot) faster and is more data-stream-friendly. However the  
actual insertion into the table and the trigger calls have to be done for  
both.


If the trigger is a very large stored procedure, it is very likely that  
executing it will take a lot more time than parsing & executing the  
INSERT. So, using COPY instead of INSERT will not gain you anything.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Need help in performance tuning.

2010-07-11 Thread Pierre C


Two problems to recognize.  First is that building something in has the  
potential to significantly limit use and therefore advancement of work  
on external pools, because of the let's use the built in one instead of  
installing something extra mentality.  I'd rather have a great external  
project (which is what we have with pgBouncer) than a mediocre built-in  
one that becomes the preferred way just by nature of being in the core.


I would prefer having supplier A build a great product that seamlessly  
interfaces with supplier B's great product, rather than having supplier M$  
buy A, develop a half-working brain-dead version of B into A and market it  
as the new hot stuff, sinking B in the process. Anyway, orthogonal feature  
sets (like database and pooler) implemented in separate applications fit  
the open source development model quite well I think. Merge everything in,  
you get PHP.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance on new linux box

2010-07-08 Thread Pierre C


On the new system the bulk loads are extremely slower than on the  
previous

machine and so are the more complex queries.  The smaller transactional
queries seem comparable but i had expected an improvement.  Performing a  
db

import via psql -d databas -f dbfile illustrates this problem.


If you use psql (not pg_restore) and your file contains no BEGIN/COMMIT  
statements, you're probably doing 1 transaction per SQL command. As the  
others say, if the old box lied about fsync, and the new one doesn't,  
performance will suffer greatly. If this is the case, remember to do your  
imports the proper way : either use pg_restore, or group inserts in a  
transaction, and build indexes in parallel.
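
For instance (made-up table), this way there is a single fsync at COMMIT  
instead of one per statement :

CREATE TABLE import_target( id integer, payload text );
BEGIN;
INSERT INTO import_target VALUES ( 1, 'foo' );
INSERT INTO import_target VALUES ( 2, 'bar' );
-- ... thousands more INSERTs ...
COMMIT;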


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-18 Thread Pierre C


I'd like to point out the costs involved in having a whole separate  
version


It must be a setting, not a version.

For instance suppose you have a session table for your website and a users  
table.


- Having ACID on the users table is of course a must ;
- for the sessions table you can drop the D

Server crash would force all users to re-login on your website but if your  
server crashes enough that your users complain about that, you have  
another problem anyway. Having the sessions table not WAL-logged (ie  
faster) would not prevent you from having sessions.user_id REFERENCES  
users( user_id ) ... so mixing safe and unsafe tables would be much more  
powerful than just having unsafe tables.
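
(For the record, this is more or less what later shipped in 9.1 as UNLOGGED  
tables ; the sessions example would look like this :)

CREATE TABLE users(
    user_id serial PRIMARY KEY,
    login   text NOT NULL
);

CREATE UNLOGGED TABLE sessions(
    session_id text PRIMARY KEY,
    user_id    integer NOT NULL REFERENCES users( user_id ),
    last_seen  timestamptz NOT NULL DEFAULT now()
);
-- sessions is emptied on crash recovery, users keeps full durability.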


And I really like the idea of non-WAL-logged indexes, too, since they can  
be rebuilt as needed, the DBA could decide between faster index updates  
but rebuild on crash, or normal updates and fast recovery.


Also materialized views etc, you can rebuild them on crash and the added  
update speed would be good.


Moreover, we already have a mechanism for taking a table that has had  
non-logged changes, and turning it into a fully logged table - we do  
that to the above mentioned tables when the transaction commits. I would  
strongly recommend providing an option to ALTER TABLE MAKE SAFE, which  
may involve some more acrobatics if the table is currently in use by  
multiple transactions, but would be valuable.


I believe the old discussions called this ALTER TABLE SET PERSISTENCE.

This would allow users to create temporary tables that can be shared  
by several connections. It would also allow bulk loading in parallel of  
a single large table.


This would need to WAL-log the entire table to send it to the slaves if  
replication is enabled, but it's a lot faster than replicating each record.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Parallel queries for a web-application |performance testing

2010-06-17 Thread Pierre C


When you set up a server that has high throughput requirements, the last  
thing you want to do is use it in a manner that cripples its throughput.  
Don't try and have 1000 parallel Postgres backends - it will process  
those queries slower than the optimal setup. You should aim to have  
approximately ((2 * cpu core count) + effective spindle count) number of  
backends, as that is the point at which throughput is the greatest. You  
can use pgbouncer to achieve this.


The same is true of a web server : 1000 active php interpreters (each  
eating several megabytes or more) are not ideal for performance !


For php, I like lighttpd with php-fastcgi : the webserver proxies requests  
to a small pool of php processes, which are only busy while generating the  
page. Once the page is generated the webserver handles all (slow) IO to  
the client.


An interesting side effect is that the number of database connections is  
limited to the number of PHP processes in the pool, so you don't even need  
a postgres connection pooler (unless you have lots of php boxes)...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-17 Thread Pierre C


Especially as, in repeated tests, PostgreSQL with persistence turned off  
is just as fast as the fastest nondurable NoSQL database.  And it has a  
LOT more features.


An option to completely disable WAL for such use cases would make it a lot  
faster, especially in the case of heavy concurrent writes.


Now, while fsync=off and tmpfs for WAL more-or-less eliminate the IO for  
durability, they don't eliminate the CPU time.


Actually the WAL overhead is some CPU and lots of locking.

Which means that a caching version of PostgreSQL could be even faster.
To do that, we'd need to:


a) Eliminate WAL logging entirely
b) Eliminate checkpointing
c) Turn off the background writer
d) Have PostgreSQL refuse to restart after a crash and instead call an  
exteral script (for reprovisioning)


Of the three above, (a) is the most difficult codewise.


Actually, it's pretty easy, look in xlog.c


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-17 Thread Pierre C



Well I guess I'd prefer a per-transaction setting, allowing to bypass
WAL logging and checkpointing. Forcing the backend to care itself for
writing the data I'm not sure is a good thing, but if you say so.


Well if the transaction touches a system catalog it better be WAL-logged...

A per-table (or per-index) setting makes more sense IMHO. For instance "on  
recovery, truncate this table" (this was mentioned before).
Another option would be to make the table data safe, but "on recovery,  
destroy and rebuild this index" : because on a not-so-large, often-updated  
table with often-updated indexes, it may not take long to rebuild the  
indexes, but all those wal-logged index updates do add some overhead.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-16 Thread Pierre C



Have you tried connecting using a UNIX socket instead of a TCP socket on
localhost ? On such very short queries, the TCP overhead is significant.


Actually UNIX sockets are the default for psycopg2, had forgotten that.

I get 7400 using UNIX sockets and 3000 using TCP (host=localhost)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-16 Thread Pierre C


FYI I've tweaked this program a bit :

import psycopg2
from time import time

conn = psycopg2.connect(database='peufeu')
cursor = conn.cursor()
cursor.execute("CREATE TEMPORARY TABLE test (data int not null)")
conn.commit()
cursor.execute("PREPARE ins AS INSERT INTO test VALUES ($1)")
cursor.execute("PREPARE sel AS SELECT 1")
conn.commit()
start = time()
tx = 0
N = 100
d = 0
while d < 10:
    for n in xrange( N ):
        cursor.execute("EXECUTE ins(%s)", (tx,))
        #~ conn.commit()
        #~ cursor.execute("EXECUTE sel")
    conn.commit()
    d = time() - start
    tx += N
print "result : %d tps" % (tx / d)
cursor.execute("DROP TABLE test")
conn.commit()

Results (Core 2 quad, ubuntu 10.04 64 bits) :

SELECT 1 : 21000 queries/s (I'd say 50 us per query isn't bad !)
INSERT with commit every 100 inserts : 17800 inserts/s
INSERT with commit every INSERT : 7650 tps

fsync is on but not synchronous_commit.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL as a local in-memory cache

2010-06-16 Thread Pierre C



I'm not surprised that Python add is so slow, but I am surprised that
I didn't remember it was... ;-)


it's not the add(), it's the time.time()...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-09 Thread Pierre C
Within the data to import most rows have 20 till 50 duplicates.  
Sometime much more, sometimes less.
 In that case (source data has lots of redundancy), after importing the  
data chunks in parallel, you can run a first pass of de-duplication on  
the chunks, also in parallel, something like :

 CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;
 or you could compute some aggregates, counts, etc. Same as before, no  
WAL needed, and you can use all your cores in parallel.
  From what you say this should reduce the size of your imported data  
by a lot (and hence the time spent in the non-parallel operation).


Thank you very much for this advice. I've tried it in another project  
with similar import problems. This really sped the import up.


Glad it was useful ;)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Large (almost 50%!) performance drop after upgrading to 8.4.4?

2010-06-09 Thread Pierre C


Can you give the config params for those :

fsync =
synchronous_commit =
wal_sync_method =

Also, some vmstat 1 output during the runs would be interesting.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-07 Thread Pierre C


Within the data to import most rows have 20 till 50 duplicates. Sometime  
much more, sometimes less.


In that case (source data has lots of redundancy), after importing the  
data chunks in parallel, you can run a first pass of de-duplication on the  
chunks, also in parallel, something like :


CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;

or you could compute some aggregates, counts, etc. Same as before, no WAL  
needed, and you can use all your cores in parallel.


From what you say this should reduce the size of your imported data by a  
lot (and hence the time spent in the non-parallel operation).


With a different distribution, ie duplicates only between existing and  
imported data, and not within the imported data, this strategy would be  
useless.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to insert a bulk of data with unique-violations very fast

2010-06-06 Thread Pierre C

Since you have lots of data you can use parallel loading.

Split your data in several files and then do :

CREATE TEMPORARY TABLE loader1 ( ... )
COPY loader1 FROM ...

Use a TEMPORARY TABLE for this : you don't need crash-recovery since if  
something blows up, you can COPY it again... and it will be much faster  
because no WAL will be written.


If your disk is fast, COPY is cpu-bound, so if you can do 1 COPY process  
per core, and avoid writing WAL, it will scale.


This doesn't solve the other half of your problem (removing the  
duplicates) which isn't easy to parallelize, but it will make the COPY  
part a lot faster.
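
For the de-duplication half, once the temp tables are loaded, the merge  
could look something like this (table and key names made up, and it assumes  
duplicates are exact row duplicates as in your case) :

INSERT INTO target
SELECT DISTINCT l.*
FROM loader1 l
WHERE NOT EXISTS ( SELECT 1 FROM target t WHERE t.id = l.id );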


Note that you can have 1 core process the INSERT / removing duplicates  
while the others are handling COPY and filling temp tables, so if you  
pipeline it, you could save some time.


Does your data contain a lot of duplicates, or are they rare ? What  
percentage ?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance of temporary vs. regular tables

2010-06-02 Thread Pierre C


As promised, I did a tiny benchmark - basically, 8 empty tables are  
filled with 100k rows each within 8 transactions (somewhat typically for  
my application). The test machine has 4 cores, 64G RAM and RAID1 10k  
drives for data.


# INSERTs into a TEMPORARY table:
[joac...@testsrv scaling]$ time pb query -d scaling_qry_1.xml

real3m18.242s
user1m59.074s
sys 1m51.001s

# INSERTs into a standard table:
[joac...@testsrv scaling]$ time pb query -d scaling_qry_1.xml

real3m35.090s
user2m5.295s
sys 2m2.307s

Thus, there is a slight hit of about 10% (which may even be within  
measurement variation) - your mileage will vary.


Usually WAL causes a much larger performance hit than this.

Since the following command :

CREATE TABLE tmp AS SELECT n FROM generate_series(1,1000000) AS n

which inserts 1M rows, takes 1.6 seconds on my desktop, so your 800k rows  
INSERT taking more than 3 minutes is a bit suspicious unless :


- you got huge fields that need TOASTing ; in this case TOAST compression  
will eat a lot of CPU and you're benchmarking TOAST, not the rest of the  
system

- you got some non-indexed foreign key
- some other reason ?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Bulk Delete

2010-05-17 Thread Pierre C

DELETE FROM table1 WHERE table2_id = ?


For bulk deletes, try :

DELETE FROM table1 WHERE table2_id IN (list of a few thousands ids)

- or use a JOIN delete with a virtual VALUES table
- or fill a temp table with ids and use a JOIN DELETE

This will save client/server roundtrips.
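
The JOIN variants look like this (ids and names made up) :

-- join against a virtual VALUES table
DELETE FROM table1
USING ( VALUES (17), (42), (666) ) AS dead( table2_id )
WHERE table1.table2_id = dead.table2_id;

-- or fill a temp table (COPY works too) and join-delete against it
CREATE TEMPORARY TABLE dead_ids( table2_id integer );
INSERT INTO dead_ids VALUES (17), (42), (666);
DELETE FROM table1 USING dead_ids
WHERE table1.table2_id = dead_ids.table2_id;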

Now, something that can make a DELETE very slow is a non-indexed ON DELETE  
CASCADE foreign key : when you DELETE FROM table1 and it cascades to a  
DELETE on table2, and you forget the index on table2. Also check the time  
spent in triggers. Do you have a GIN index ?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Replacing Cursors with Temporary Tables

2010-04-24 Thread Pierre C


FYI, I had a query like this :

(complex search query ORDER BY foo LIMIT X)
LEFT JOIN objects_categories oc
LEFT JOIN categories c
GROUP BY ...
(more joins)
ORDER BY foo LIMIT X

Here, we do a search on objects (i'm not gonna give all the details,  
they're not interesting for the problem at hand).
Point is that these objects can belong to several categories, so I need to  
perform a GROUP BY with array_agg() somewhere unless I want the JOIN to  
return several rows per object, which is not what I want. This makes the  
query quite complicated...


I ended up rewriting it like this :

(complex search query ORDER BY foo LIMIT X)
LEFT JOIN
(SELECT .. FROM objects_categories oc
 LEFT JOIN categories c
 GROUP BY ...
) ON ...
(more joins)
ORDER BY foo LIMIT X

Basically moving the aggregates into a separate query. It is easier to  
handle.


I tried to process it like this, in a stored proc :

- do the (complex search query ORDER BY foo LIMIT X) alone and stuff it in  
a cursor

- extract the elements needed into arrays (mostly object_id)
- get the other information as separate queries like :

SELECT object_id, category_id, category_name
FROM objects_categories JOIN categories ON ...
WHERE object_id =ANY( my_array );

and return the results into cursors, too.

Or like this (using 2 cursors) :

SELECT object_id, array_agg(category_id) FROM objects_categories WHERE  
object_id =ANY( my_array ) GROUP BY object_id;


SELECT category_id, category_name, ...
FROM categories WHERE category_id IN (
 SELECT category_id FROM objects_categories WHERE object_id =ANY( my_array  
));


I found it to be quite faster, and it also simplifies my PHP code. From  
PHP's point of view, it is simpler to get a cursor that returns the  
objects, and separate cursors that can be used to build an in-memory PHP  
hashtable of only the categories we're going to display. Also, it avoids  
retrieving lots of data multiple times, since many objects will belong to  
the same categories. With the second example, I can use my ORM to  
instantiate only one copy of each.


It would be quite useful if we could SELECT from a cursor, or JOIN a  
cursor to an existing table...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] count is ten times faster

2010-04-13 Thread Pierre C


How do you explain the cost is about ten times lower in the 2nd query  
than the first ?


Function call cost ?

Can you EXPLAIN ANALYZE ?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] planer chooses very bad plan

2010-04-12 Thread Pierre C


explain analyze SELECT * FROM telegrams WHERE ((recipient_id=508933  
AND recipient_deleted=FALSE) OR (user_id=508933 AND user_deleted=FALSE))  
ORDER BY id DESC LIMIT 10 OFFSET 0


If you need very fast performance on this query, you need to be able to  
use the index for ordering.


Note that the following query will only optimize the first page of results  
in the case you want to display BOTH sent and received telegrams.



- Create an index on (recipient_id, id) WHERE NOT recipient_deleted
- Create an index on (user_id, id) WHERE NOT user_deleted
- Drop redundant indexes (recipient_id) and (user_id)
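
In SQL (index names are mine) :

CREATE INDEX telegrams_recipient_live_idx ON telegrams( recipient_id, id )
    WHERE NOT recipient_deleted;
CREATE INDEX telegrams_user_live_idx ON telegrams( user_id, id )
    WHERE NOT user_deleted;
-- then drop the old single-column indexes, whatever they are named.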

SELECT * FROM (
SELECT * FROM telegrams WHERE recipient_id=508933 AND  
recipient_deleted=FALSE ORDER BY id DESC LIMIT 10

UNION ALL
SELECT * FROM telegrams WHERE user_id=508933 AND user_deleted=FALSE  
ORDER BY id DESC LIMIT 10

) AS foo ORDER BY id DESC LIMIT 10;

These indexes will also optimize the queries where you only display the  
inbox and outbox, in which case it will be able to use the index for  
ordering on any page, because there will be no UNION.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: *** PROBABLY SPAM *** [PERFORM] Does the psql executable support a fetch many approach when dumping large queries to stdout?

2010-04-02 Thread Pierre C
Does the psql executable have any ability to do a fetch many, using a  
server-side named cursor, when returning results? It seems like it tries  
to retrieve the query entirely to local memory before printing to  
standard out.


I think it grabs the whole result set to calculate the display column  
widths. I think there is an option to tweak this but don't remember which,  
have a look at the psql commands (\?), formatting section.
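
For reference, I believe the variable in question is FETCH_COUNT ; when it  
is set, psql fetches the result through a cursor in groups of that many  
rows instead of buffering everything :

\set FETCH_COUNT 10000
SELECT * FROM some_big_table;   -- made-up table, rows now arrive in batches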


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to fast the REINDEX

2010-04-01 Thread Pierre C


So am I to understand I don't need to do daily reindexing as a  
maintenance measure with 8.3.7 on FreeBSD.


Sometimes it's better to have indexes with some space in them so every  
insert doesn't hit a full index page and triggers a page split to make  
some space.

Of course if the index is 90% holes you got a problem ;)



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Database size growing over time and leads to performance impact

2010-03-27 Thread Pierre C

1. VACUUM FULL ANALYZE once in a week during low-usage time and


VACUUM FULL compacts tables, but tends to bloat indexes. Running it weekly  
is NOT RECOMMENDED.


A correctly configured autovacuum (or manual vacuum in some circumstances)  
should maintain your DB healthy and you shouldn't need VACUUM FULL.


If you realize you got a bloat problem, for instance due to a  
misconfigured vacuum, use CLUSTER, which re-generates table AND index  
data, and besides, having your table clustered on an index of your choice  
can boost performance quite a lot in some circumstances.


8.2 is so old I don't remember if autovacuum is even included. Please try  
upgrading to the latest version...


Since your database probably fits in RAM, CLUSTER will be pretty fast.
You can schedule it weekly, if you need clustering. If you don't,  
autovacuum will suffice.
Hint : run a "SELECT count(*) FROM yourtable;" before "CLUSTER yourtable;"  
so that the table is pulled into the OS disk cache ; it'll make CLUSTER  
faster.



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why Wal_buffer is 64KB

2010-03-26 Thread Pierre C



After fsync/syncronous_commit off


Do not use fsync off, it is not safe. Who cares about the performance of  
fsync=off, when in practice you'd never use it with real data.

synchronnous_commit=off is fine for some applications, though.

More info is needed about your configuration (hardware, drives, memory,  
etc).


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why Wal_buffer is 64KB

2010-03-25 Thread Pierre C


If you do large transactions, which emits large quantities of xlog, be  
aware that while the previous xlog segment is being fsynced, no new writes  
happen to the next segment. If you use large wal_buffers (more than 16 MB)  
these buffers can absorb xlog data while the previous segment is being  
fsynced, which allows a higher throughput. However, large wal_buffers also  
mean the COMMIT of small transactions might find lots of data in the  
buffers that noone has written/synced yet, which isn't good. If you use  
dedicated spindle(s) for the xlog, you can set the walwriter to be  
extremely aggressive (write every 5 ms for instance) and use fdatasync.  
This way, at almost every rotation of the disk, xlog gets written. I've  
found this configuration gives increased throughput, while not  
compromising latency, but you need to test it for yourself, it depends on  
your whole system.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-25 Thread Pierre C

Hannu Krosing wrote:

Pulling the plug should not corrupt a postgreSQL database, unless it was
using disks which lie about write caching.

Didn't we recently put the old wives' 'the disks lied' tale to bed in  
favour of actually admitting that some well known filesystems and  
software raid systems have had trouble with their write barriers?


I put a cheap UPS on the home server (which uses Software RAID) precisely  
because I don't really trust that stuff, and there is also the RAID5 write  
hole... and maybe the RAID1 write hole too... and installing a UPS takes  
less time than actually figuring out whether the system is power-loss-safe.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-22 Thread Pierre C
On Mon, 22 Mar 2010 12:14:51 +0100, Merlin Moncure mmonc...@gmail.com  
wrote:



On Sun, Mar 21, 2010 at 9:14 PM, Dave Crooke dcro...@gmail.com wrote:
Note however that Oracle offeres full transactionality and does in  
place row

updates. There is more than one way to do it.


There's no free lunch.


MVCC : VACUUM
Oracle : Rollback Segments
MyISAM : no concurrency/transactions

It's all about which compromise suits you ;)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Block at a time ...

2010-03-22 Thread Pierre C


This is one of the more-or-less solved problems in Unix/Linux.  Ext*  
file systems have a reserve usually of 10% of the disk space that  
nobody except root can use.  It's not for root, it's because with 10%  
of the disk free, you can almost always do a decent job of allocating  
contiguous blocks and get good performance.  Unless Postgres has some  
weird problem that Linux has never seen before (and that wouldn't be  
unprecedented...), there's probably no need to fool with  
file-allocation strategies.


Craig


It's fairly easy to break.  Just do a parallel import with, say, 16  
concurrent tables being written to at once.  Result?  Fragmented tables.


Delayed allocation (ext4, XFS) helps a lot for concurrent writing at a  
medium-high rate (a few megabytes per second and up) when lots of data can  
sit in the cache and be flushed/allocated as big contiguous chunks. I'm  
pretty sure ext4/XFS would pass your parallel import test.


However if you have files like tables (and indexes) or logs that grow  
slowly over time (something like a few megabytes per hour or less), after  
a few days/weeks/months, horrible fragmentation is an almost guaranteed  
result on many filesystems (NTFS being perhaps the absolute worst).



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] mysql to postgresql, performance questions

2010-03-19 Thread Pierre C


I also wonder why the reported runtime of 5.847 ms is so much different  
to the runtime reported of my scripts (both php and ruby are almost the  
same). What's the best tool to time queries in postgresql? Can this be  
done from pgadmin?


I've seen differences like that. Benchmarking isn't easy. The client  
libraries, the particular language bindings you use, the connection... all  
that can add overhead that is actually much larger than what you're trying  
to measure.


- On localhost, some MySQL distros will default to a UNIX Socket, some  
Postgres distros will default to a TCP socket, or even SSL, and vice versa.


Needless to say, on a small query like SELECT * FROM users WHERE  
user_id=$1, this makes a lot of difference, since the query time (just a  
few tens of microseconds) is actually shorter than the TCP overhead.  
Depending on how you connect you can get a 2-3x variation in throughput  
with client and server on the same machine, just between TCP and UNIX  
socket.


On queries that retrieve lots of data, overheads are also quite different  
(especially with SSL...)


- I've seen postgres saturate a 1 Gbit/s ethernet link between server and  
client during a benchmark.


- Performance depends a LOT on your language bindings. For instance :

php : PDO is quite a lot slower than pg_query() especially if you use  
prepared statements which are used only once,
python : psycopg, pygresql, mysql-python behave quite differently (psycopg  
being by far the fastest of the bunch), especially when retrieving lots of  
results, and converting those results back to python types...


So, what are you benchmarking exactly ?...

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Block at a time ...

2010-03-17 Thread Pierre C

I was thinking in something like that, except that the factor I'd use
would be something like 50% or 100% of current size, capped at (say) 1  
GB.


Using fallocate() ?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Pierre C


-My warnings about downsides related to checkpoint issues with larger  
buffer pools isn't an opinion at all; that's a fact based on limitations  
in how Postgres does its checkpoints.  If we get something more like  
Oracle's incremental checkpoint logic, this particular concern might go  
away.


Does PG issue checkpoint writes in sorted order ?

I wonder about something, too : if your DB size is smaller than RAM, you  
could in theory set shared_buffers to a size larger than your DB provided  
you still have enough free RAM left for work_mem and OS writes management.  
How does this interact with the logic which prevents seq-scans hogging  
shared_buffers ?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] shared_buffers advice

2010-03-16 Thread Pierre C
I think the logic you are referring to is the clock sweep buffer  
accounting

scheme.  That just makes sure that the most popular pages stay in the
buffers.  If your entire db fits in the buffer pool then it'll all get in
there real fast.



Actually, I meant that in the case of a seq scan, PG will try to use just  
a few buffers (a ring) in shared_buffers instead of thrashing the whole  
buffers. But if there was actually a lot of free space in shared_buffers,  
do the pages stay, or do they not ?


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 10K vs 15k rpm for analytics

2010-03-09 Thread Pierre C
On Tue, 09 Mar 2010 08:00:50 +0100, Greg Smith g...@2ndquadrant.com  
wrote:



Scott Carey wrote:
For high sequential throughput, nothing is as optimized as XFS on Linux  
yet.  It has weaknesses elsewhere however.




When files are extended one page at a time (as postgres does)  
fragmentation can be pretty high on some filesystems (ext3, but NTFS is  
the absolute worst) if several files (indexes + table) grow  
simultaneously. XFS has delayed allocation which really helps.



I'm curious what you feel those weaknesses are.


Handling lots of small files, especially deleting them, is really slow on  
XFS.

Databases don't care about that.

There is also the dark side of delayed allocation : if your application is  
broken, it will manifest itself very painfully. Since XFS keeps a lot of  
unwritten stuff in the buffers, an app that doesn't fsync correctly can  
lose lots of data if you don't have a UPS.


Fortunately, postgres handles fsync as it should.

A word of advice though : a few years ago, we lost a few terabytes on XFS  
(after that, restoring from backup was quite slow !) because a faulty SCSI  
cable crashed the server, then crashed it again during xfsrepair. So if  
you do xfsrepair on a suspicious system, please image the disks first.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 10K vs 15k rpm for analytics

2010-03-03 Thread Pierre C


With 24 drives it'll probably be the controller that is the limiting  
factor of bandwidth. Our HP SAN controller with 28 15K drives delivers  
170MB/s at maximum with raid 0 and about 155MB/s with raid 1+0.


I get about 150-200 MB/s on  a linux software RAID of 3 cheap Samsung  
SATA 1TB drives (which is my home multimedia server)...
IOPS would be of course horrendous, that's RAID-5, but that's not the  
point here.


For raw sequential throughput, dumb drives with dumb software raid can be  
pretty fast, IF each drive has a dedicated channel (SATA ensures this) and  
the controller is on a fast PCIexpress (in my case, chipset SATA  
controller).


I don't suggest you use software RAID with cheap consumer drives, just  
that any expensive setup that doesn't deliver MUCH more performance that  
is useful to you (ie in your case sequential IO) maybe isn't worth the  
extra price... There are many bottlenecks...


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-03-02 Thread Pierre C



I always assumed SCSI disks had a write-through cache and therefore
didn't need a drive cache flush comment.


Maximum performance can only be reached with a writeback cache so the  
drive can reorder and cluster writes, according to the realtime position  
of the heads and platter rotation.


The problem is not the write cache itself, it is that, for your data to be  
safe, the flush cache or barrier command must get all the way through  
the application / filesystem to the hardware, going through a nondescript  
number of software/firmware/hardware layers, all of which may :


- not specify if they honor or ignore flush/barrier commands, and which  
ones

- not specify if they will reorder writes ignoring barriers/flushes or not
- have been written by people who are not aware of such issues
- have been written by companies who are perfectly aware of such issues  
but chose to ignore them to look good in benchmarks

- have some incompatibilities that result in broken behaviour
- have bugs

As far as I'm concerned, a configuration that doesn't properly respect the  
commands needed for data integrity is broken.


The sad truth is that given a software/hardware IO stack, there's no way  
to be sure, and testing isn't easy, if at all possible to do. Some cache  
flushes might be ignored under some circumstances.


For this to change, you don't need a hardware change, but a mentality  
change.


Flash filesystem developers use flash simulators which measure wear  
leveling, etc.


We'd need a virtual box with a simulated virtual harddrive which is able  
to check this.


What a mess.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Extracting superlatives - SQL design philosophy

2010-02-25 Thread Pierre C



-- More explicit
select aggregate_using(max(date), city, temp, date) from bar group by
city, temp order by city;


select city, max(ROW(temp, date)) from bar group by city;

Does not work (alas) for lack of a default comparison for record type.

Another solution, which works wonders if you've got the list of cities in  
a separate table, and an index on (city, temp) is this :


SELECT c.city, (SELECT ROW( t.date, t.temp ) FROM cities_temp t WHERE  
t.city=c.city ORDER BY temp DESC LIMIT 1) FROM cities c;


This will do a nested loop index scan and it is the fastest way, except if  
you have very few rows per city.
The syntax is ugly and you have to extract the stuff from the ROW()  
afterwards, though.


Unfortunately, this does not work :

SELECT c.city, (SELECT t.date, t.temp FROM cities_temp t WHERE  
t.city=c.city ORDER BY temp DESC LIMIT 1) AS m FROM cities c;


because the subselect isn't allowed to return more than 1 column.

Note that you can also get the usually annoying top-N by category to use  
the index by doing something like :


SELECT c.city, (SELECT array_agg(date) FROM (SELECT t.date FROM  
cities_temp t WHERE t.city=c.city ORDER BY temp DESC LIMIT 5) AS x) AS m  
FROM cities c;


The results aren't in a very usable form either, but :

CREATE INDEX ti ON annonces( type_id, price ) WHERE price IS NOT NULL;

EXPLAIN ANALYZE SELECT
t.id, (SELECT ROW(a.id, a.price, a.date_annonce)
FROM annonces a
WHERE a.type_id = t.id AND price IS NOT NULL
ORDER BY price DESC LIMIT 1)
FROM types_bien t;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on types_bien t  (cost=0.00..196.09 rows=57 width=4) (actual time=0.025..0.511 rows=57 loops=1)
   SubPlan 1
     ->  Limit  (cost=0.00..3.41 rows=1 width=16) (actual time=0.008..0.008 rows=1 loops=57)
           ->  Index Scan Backward using ti on annonces a  (cost=0.00..8845.65 rows=2592 width=16) (actual time=0.007..0.007 rows=1 loops=57)
                 Index Cond: (type_id = $0)
 Total runtime: 0.551 ms

explain analyze
select distinct type_id, first_value(price) over w as max_price
from annonces where price is not null
window w as (partition by type_id order by price desc);
                                  QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=30515.41..30626.87 rows=11146 width=10) (actual time=320.927..320.971 rows=46 loops=1)
   ->  WindowAgg  (cost=27729.14..29958.16 rows=111451 width=10) (actual time=195.289..282.150 rows=111289 loops=1)
         ->  Sort  (cost=27729.14..28007.76 rows=111451 width=10) (actual time=195.278..210.762 rows=111289 loops=1)
               Sort Key: type_id, price
               Sort Method:  quicksort  Memory: 8289kB
               ->  Seq Scan on annonces  (cost=0.00..18386.17 rows=111451 width=10) (actual time=0.009..72.589 rows=111289 loops=1)
                     Filter: (price IS NOT NULL)
 Total runtime: 322.382 ms

Here using the index is 600x faster... worth a bit of ugly SQL, you decide.

By disabling enable_seqscan and enable_bitmapscan, you can coerce this plan :

EXPLAIN ANALYZE SELECT DISTINCT ON (type_id) type_id, date_annonce, price  
FROM annonces WHERE price IS NOT NULL ORDER BY type_id, price LIMIT 40;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..78757.61 rows=33 width=14) (actual time=0.021..145.509 rows=40 loops=1)
   ->  Unique  (cost=0.00..78757.61 rows=33 width=14) (actual time=0.021..145.498 rows=40 loops=1)
         ->  Index Scan using ti on annonces  (cost=0.00..78478.99 rows=111451 width=14) (actual time=0.018..132.671 rows=110796 loops=1)
 Total runtime: 145.549 ms

This plan would be very bad (unless the whole table is in RAM) because I  
guess the index scan isn't aware of the DISTINCT ON, so it scans all rows  
in the index and in the table.









--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] SSD + RAID

2010-02-23 Thread Pierre C

Note that's power draw per bit.  dram is usually much more densely
packed (it can be with fewer transistors per cell) so the individual
chips for each may have similar power draws while the dram will be 10
times as densely packed as the sram.


Differences between SRAM and DRAM :

- price per byte (DRAM much cheaper)

- silicon area per byte (DRAM much smaller)

- random access latency
   SRAM = fast, uniform, and predictable, usually 0/1 cycles
   DRAM = a few up to a lot of cycles depending on chip type,
   which page/row/column you want to access, whether it's R or W,
   whether the page is already open, etc

In fact, DRAM is the new hard disk. SRAM is used mostly where low latency is
needed (caches, etc.).


- ease of use
   SRAM is very easy to use: address, data, read, write, clock.
   SDRAM needs a smart controller.
   SRAM is also easier to instantiate on a silicon chip.

- power draw
   When used at high speeds, SRAM isn't power-saving at all; it's used for
   speed. However, when idle, its power draw is really negligible.

While it is true that you can recover *some* data from an SRAM/DRAM chip
that hasn't been powered for a few seconds, you can't really trust that
data. It's only a forensics tool.


Most DRAM now (especially laptop DRAM) includes special power-saving modes
which keep only the data-retention logic (refresh, etc.) powered, not the
rest of the chip (internal caches, IO buffers, etc.). Laptops, PDAs, and the
like all use this feature in suspend-to-RAM mode. In this mode the power
draw is higher than SRAM's, but still pretty minimal, so a laptop can stay
in suspend-to-RAM for days.


Anyway, SRAM vs DRAM isn't really relevant to the debate about SSD data
integrity: you can back up either with a small battery or an ultracap.


What is also important is that the entire SSD chipset must have been
designed with this in mind: it must detect power loss and react to it
correctly, and especially must not reset itself or do funny things to the
memory when the power comes back. That means at least some parts of the
chipset must stay powered to keep their state.


Now I wonder about something. SSDs use wear leveling, which means the
mapping of which logical block was written to which physical location must
be kept somewhere, and therefore updated. I wonder how crash-safe and how
atomic those updates are in the face of a power loss. This is just like a
filesystem: you've been talking only about data, but the block-layout
information (metadata) is subject to the same concerns. If the drive says
it's written, not only must the data have been written, but also the
information needed to locate that data...


Therefore I think the yank-the-power-cord test should be done with random
writes happening on an aged and mostly-full SSD... and afterwards, I'd be
interested to know not only whether the last txn really committed, but also
whether random parts of other data were wear-leveled into oblivion at the
power loss...

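A minimal SQL-only sketch of such a test (the table and payload here are
made up for illustration): one client inserts sequentially numbered rows in
autocommit mode on the SSD under test and records each id that COMMIT
acknowledged, while other activity generates random writes; then the power
cord is pulled mid-run.

-- before the test
CREATE TABLE powerloss_test(
    id     bigserial PRIMARY KEY,  -- assigned at insert, increasing (gaps are fine)
    filler text NOT NULL           -- payload so each insert dirties real pages
);

-- run in a loop from a client in autocommit mode, logging every returned id
-- as "acknowledged":
INSERT INTO powerloss_test( filler ) VALUES ( repeat('x', 500) ) RETURNING id;

-- ...yank the power cord, power the machine back up, let recovery run...

-- every acknowledged id should still be present; a max(id) below the last
-- acknowledged one means a committed write was lost.
SELECT max(id) FROM powerloss_test;

Checking the "other stuff" would additionally mean checksumming older,
unrelated tables before and after the crash.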


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] plpgsql plan cache

2010-02-22 Thread Pierre C


I cannot understand why the index is not being used in the plpgsql
function. I even tried to make a test function containing nothing more than
the single query, and still the index is not being used. When running the
same query at the SQL prompt, the index is in use, though.


Please post the following :

- EXPLAIN ANALYZE your query directly in psql
- PREPARE testq AS your query
- EXPLAIN ANALYZE EXECUTE testq( your parameters )

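For example (table and parameters are hypothetical, just to show the pattern):

-- the query as you'd run it by hand
EXPLAIN ANALYZE
SELECT count(*) FROM logs WHERE ts >= '2010-02-01' AND ts < '2010-02-02';

-- the same query the way plpgsql sees it: a parameterized, cached plan
PREPARE testq(timestamptz, timestamptz) AS
SELECT count(*) FROM logs WHERE ts >= $1 AND ts < $2;

EXPLAIN ANALYZE EXECUTE testq('2010-02-01', '2010-02-02');
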
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] plpgsql plan cache

2010-02-22 Thread Pierre C


Actually, the planner was smart to use a bitmap index scan in the prepared
query. Suppose you later EXECUTE that canned plan with a date range that
covers, say, half of the table: a plain index scan would be a pretty bad
choice, since it would have to access half the rows of the table in index
order, which is potentially random disk IO. The bitmap index scan is a bit
slower for your narrow date range, but it can withstand much more abuse of
the parameters.


PG also supports the quite clever EXECUTE 'blah' USING params syntax in
plpgsql, so you don't even need to mess with quoting.

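A minimal sketch of how that looks inside a plpgsql function, with a made-up
measurements(ts, value) table; the dynamic statement is planned at each
execution with the actual parameter values:

CREATE OR REPLACE FUNCTION sum_range( d1 timestamptz, d2 timestamptz )
RETURNS double precision AS $$
DECLARE
    result double precision;
BEGIN
    -- the query string is planned each call with the actual values of d1/d2,
    -- and USING passes them safely, so no manual quoting is needed
    EXECUTE 'SELECT sum(value) FROM measurements WHERE ts BETWEEN $1 AND $2'
        INTO result
        USING d1, d2;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
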

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance