Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-12 Thread Robert Haas
On Thu, Dec 2, 2010 at 3:36 AM, Mario Splivalo
mario.spliv...@megafon.hr wrote:
 On 12/01/2010 09:43 AM, Pierre C wrote:

 Note that in both cases postgres reports that the FK checks take 92-120
 milliseconds... which is a normal time for about 4000 rows.
 Inserting 4000 lines with just a few fields like yours should take much
 less than 1 s...

 Where the rest of the time goes, I have no idea. Disk thrashing ? Locks
 ? Gremlins ?

 - try it on a fresh copy of all your tables (CREATE TABLE, INSERT INTO
 SELECT)
 - try to put the WAL on a separate physical disk (or do a check with
 fsync=off)
 - try it on another computer
 - try it on another harddisk
 - run oprofile on a debug compile of postgres
 - it could even be the process title updates (I don't think so but...)
 - try a ramdisk tablespace

 I'm already running it with fsync=off. The funny thing is, as I add a new
 realm it runs fine until the history table grows to around 5M rows. After that
 the slowdown is huge.

Perhaps that's the point at which the WAL volume becomes large
enough to force a checkpoint in the middle of the operation?  You
might try turning on log_checkpoints.
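
For reference, a minimal sketch of enabling that (log_checkpoints cannot be set per-session, so this assumes access to postgresql.conf and the ability to reload):

-- in postgresql.conf:
--   log_checkpoints = on
-- then apply it without a restart, from a superuser psql session:
SELECT pg_reload_conf();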

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-02 Thread Mario Splivalo

On 12/01/2010 10:43 PM, Pierre C wrote:

On Wed, 01 Dec 2010 18:24:35 +0100, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:


Mladen Gogala mladen.gog...@vmsinfo.com wrote:


 There is an operating system which comes with a very decent extent
based file system and a defragmentation tool, included in the OS.
The file system is called NTFS

Been there, done that. Not only was performance quite poor compared
to Linux, but reliability and staff time to manage things suffered
in comparison to Linux.


Please don't start with NTFS. It is the worst excuse for a filesystem
I've ever seen.


It is OT, but, could you please shed some light on that? Part of
my next project is to test the performance of pg9 on both Windows and Linux
systems, so I'd appreciate any data/info you both may have.


Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-02 Thread Kevin Grittner
Mario Splivalo mario.spliv...@megafon.hr wrote:
 
 It is OT, but, could you please shed some light on that?
 Part of my next project is to test the performance of pg9 on both
 Windows and Linux systems, so I'd appreciate any data/info you both
 may have.
 
I don't know how much of that was the filesystem, but with both tuned to the
best of our ability, Linux on xfs ran much faster than Windows on
NTFS.  The lack of atomic operations and a lockfile utility on
Windows/NTFS was something of a handicap.  I have found Linux to be
much more reliable and (once my bash scripting and knowledge of
common Linux utilities reached a certain level) much easier to
administer.  Getting my head around xargs was, I think, the tipping
point.  ;-)
 
-Kevin



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mario Splivalo
On 12/01/2010 01:51 AM, Pierre C wrote:
 
 Now I tried removing the constraints from the history table (including
 the PK) and the inserts were fast. After a few 'rounds' of inserts I
 added the constraints back, and several rounds after that were fast again.
 But then it was the same all over again. An insert of some 11k rows took 4
 seconds (with all constraints) and now the last one, of only 4k rows, took
 one minute. I did vacuum after each insert.


 Mario
 
 Hm, so for each line of drones_history you insert, you also update the
 corresponding drones table to reflect the latest data, right ?

Yes.

 How many times is the same row in drones updated ? ie, if you insert N
 rows in drones_history, how many drone_id's do you have ?

Just once.

If I have 5000 lines in the CSV file (which I load into the 'temporary' table
using COPY) I can be sure that drone_id there is unique, because the CSV
file contains measurements from all the drones, one measurement per
drone. I usually have around 100 new drones, so I insert those into drones
and into drones_history. Then I first insert into drones_history and then
update those rows in drones. Should I try doing it the other way around?

Although, I think I'm having some disk-related problems, because when
inserting into the tables my IO throughput is pretty low. For instance,
when I drop the constraints and then recreate them it takes around 15-30
seconds (on a 25M-row table) - disk IO is steady, around 60 MB/s in
read and write.

It could just be that the ext3 partition is badly fragmented. I'll try
later this week on a new set of disks and an ext4 filesystem to see how it
goes.

Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mario Splivalo
On 12/01/2010 02:47 AM, Joshua D. Drake wrote:
 On Sun, 2010-11-28 at 12:46 +0100, Mario Splivalo wrote:
 The database for monitoring certain drone statuses is quite simple:

 
 This is the slow part:
 INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, 
 drone_temperature, drone_pressure)
 SELECT * FROM tmpUpdate;

 For 100 rows this takes around 2 seconds. For 1000 rows this takes 
 around 40 seconds. For 5000 rows this takes around 5 minutes.
 For 50k rows this takes around 30 minutes! Now this is where I start to lag
 because I get a new CSV every 10 minutes or so.
 
 Have you considered making the foreign key check deferrable?
 

Yes, as Mladen Gogala had advised. No noticeable change in performance -
it's still slow :)

But, just for the sake of clarification - I thought that DEFERRABLE would
matter if I do a lot of INSERTs, inside a FOR loop or something like
that. Since I'm doing INSERT INTO ... SELECT, does it make any difference?

Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Pierre C



Just once.


OK, another potential problem eliminated, it gets strange...


If I have 5000 lines in the CSV file (which I load into the 'temporary' table
using COPY) I can be sure that drone_id there is unique, because the CSV
file contains measurements from all the drones, one measurement per
drone. I usually have around 100 new drones, so I insert those into drones
and into drones_history. Then I first insert into drones_history and then
update those rows in drones. Should I try doing it the other way around?


No, it doesn't really matter.


Although, I think I'm having some disk-related problems, because when
inserting into the tables my IO throughput is pretty low. For instance,
when I drop the constraints and then recreate them it takes around 15-30
seconds (on a 25M-row table) - disk IO is steady, around 60 MB/s in
read and write.

It could just be that the ext3 partition is badly fragmented. I'll try
later this week on a new set of disks and an ext4 filesystem to see how it
goes.


If you CLUSTER a table, it is entirely rebuilt so if your disk free space  
isn't heavily fragmented, you can hope the table and indexes will get  
allocated in a nice contiguous segment.
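
A sketch of what that would look like for the history table (the index name drones_history_pk comes from the posted DDL; CLUSTER takes an exclusive lock and rewrites the whole table, so it needs enough free disk space and a quiet moment):

CLUSTER drones_history_pk ON drones_history;
ANALYZE drones_history;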




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Pierre C


So I did. I ran the whole script in psql, and here is the result for
the INSERT:


realm_51=# explain analyze INSERT INTO drones_history (2771, drone_id,  
drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM  
tmp_drones_history;
   QUERY PLAN  
--
  Seq Scan on tmp_drones_history  (cost=0.00..81.60 rows=4160 width=48)  
(actual time=0.008..5.296 rows=5150 loops=1)
  Trigger for constraint drones_history_fk__drones: time=92.948  
calls=5150

  Total runtime: 16779.644 ms
(3 rows)


Now, this is only 16 seconds. In this 'batch' I've inserted 5150 rows.
The batch before, which I ran 'the usual way', inserted 9922 rows
and took 1 minute and 16 seconds.


I did not, however, enclose the process in begin/end.

So, here are results when I, in psql, first issued BEGIN:

realm_51=# explain analyze INSERT INTO drones_history (2772, drone_id,  
drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM  
tmp_drones_history;
   QUERY PLAN  
--
  Seq Scan on tmp_drones_history  (cost=0.00..79.56 rows=4056 width=48)  
(actual time=0.008..6.490 rows=5059 loops=1)
  Trigger for constraint drones_history_fk__drones: time=120.224  
calls=5059

  Total runtime: 39658.250 ms
(3 rows)

Time: 39658.906 ms



Mario



Note that in both cases postgres reports that the FK checks take 92-120  
milliseconds... which is a normal time for about 4000 rows.
Inserting 4000 lines with just a few fields like yours should take much
less than 1 s...


Where the rest of the time goes, I have no idea. Disk thrashing ? Locks ?  
Gremlins ?


- try it on a fresh copy of all your tables (CREATE TABLE, INSERT INTO  
SELECT)
- try to put the WAL on a separate physical disk (or do a check with  
fsync=off)

- try it on another computer
- try it on another harddisk
- run oprofile on a debug compile of postgres
- it could even be the process title updates (I don't think so but...)
- try a ramdisk tablespace
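
A minimal sketch of the ramdisk-tablespace test (the mount point, tablespace and table names are illustrative assumptions, and anything stored there is lost on reboot, so it is for benchmarking only):

-- after mounting a tmpfs owned by the postgres user, e.g. at /mnt/pgram:
CREATE TABLESPACE ramspace LOCATION '/mnt/pgram';
SET default_tablespace = ramspace;
CREATE TABLE drones_history_ram (LIKE drones_history INCLUDING INDEXES);
ALTER TABLE drones_history_ram
  ADD CONSTRAINT drones_history_ram_fk__drones
  FOREIGN KEY (drone_id) REFERENCES drones (drone_id);
-- then run the slow INSERT ... SELECT against drones_history_ram and compare timings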



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mladen Gogala




Mario Splivalo wrote:

  
Yes, as Mladen Gogala had advised. No noticeable change in performance -
it's still slow :)
  


Declaring constraints as deferrable doesn't do anything by itself; you
have to actually set the constraints deferred for it to have an effect,
and you have to do that within a transaction block. If done outside of a
transaction block, there is no effect:

This is what happens when "set constraints" is issued outside the
transaction block:

 constraint test1_pk primary key(col1)
deferrable);    
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"test1_pk" for table "test1"
CREATE TABLE
Time: 41.218 ms
scott=# set constraints all deferred;   
SET CONSTRAINTS
Time: 0.228 ms
scott=# begin;  
BEGIN
Time: 0.188 ms
scott=#  insert into test1 values(1);   
INSERT 0 1
Time: 0.929 ms
scott=#  insert into test1 values(1);   
ERROR:  duplicate key value violates unique constraint "test1_pk"
DETAIL:  Key (col1)=(1) already exists.
scott=# end;
ROLLBACK
Time: 0.267 ms
scott=# 
  

It works like a charm when issued within the transaction block:
scott=# begin;  
BEGIN
Time: 0.202 ms
scott=# set constraints all deferred;   
SET CONSTRAINTS
Time: 0.196 ms
scott=#  insert into test1 values(1);   
INSERT 0 1
Time: 0.334 ms
scott=#  insert into test1 values(1);   
INSERT 0 1
Time: 0.327 ms
scott=# end;
ERROR:  duplicate key value violates unique constraint "test1_pk"
DETAIL:  Key (col1)=(1) already exists.
scott=# 

I was able to insert the same value twice, it only failed at the end of
the transaction.

  
But, just for the sake of clarification - I thought that DEFERRABLE would
matter if I do a lot of INSERTs, inside a FOR loop or something like
that. Since I'm doing INSERT INTO ... SELECT, does it make any difference?
  

You cannot tell which part takes a long time, select or insert, without
profiling. I certainly cannot do it over the internet.

-- 
 
Mladen Gogala 
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions







Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mario Splivalo

On 12/01/2010 05:34 PM, Mladen Gogala wrote:

Mario Splivalo wrote:



Yes, as Mladen Gogala had advised. No noticeable change in performance -
it's still slow :)



Declaring constraints as deferrable doesn't do anything by itself; you
have to actually set the constraints deferred for it to have an effect,
and you have to do that within a transaction block. If done outside of a
transaction block, there is no effect:


I understand, I did as you suggested.

BEGIN; SET CONSTRAINTS ALL DEFERRED; SELECT my_insert_drones_function();
COMMIT;




I was able to insert the same value twice, it only failed at the end of
the transaction.

But, just for the sake of clarification - I thought that DEFERRABLE would
matter if I do a lot of INSERTs, inside a FOR loop or something like
that. Since I'm doing INSERT INTO ... SELECT, does it make any difference?


You cannot tell which part takes a long time, select or insert, without
profiling. I certainly cannot do it over the internet.


If I first select into a dummy temporary table, that SELECT is fast. It's
just the INSERT INTO ... SELECT that is slow.
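
(For completeness, a sketch of timing the two halves separately in psql; t_check is an illustrative name, not something from the thread:)

\timing on
CREATE TEMPORARY TABLE t_check AS SELECT * FROM tmpUpdate;  -- the SELECT half, reportedly fast
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice,
                            drone_temperature, drone_pressure)
SELECT * FROM t_check;                                      -- the INSERT + FK-check half, the slow part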


I'll try what Pierre suggested, on a whole new filesystem. This one did
get quite filled with thousands of files that I deleted while the
database was working.


Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mladen Gogala

Mario Splivalo wrote:
I'll try what Pierre suggested, on whole new filesystem. This one did 
get quite filled with thousands of files that I deleted while the 
database was working.


Mario
  


Yes, that is a good idea. That's the reason why we need a 
defragmentation tool on Linux. Unfortunately, the only file system that 
currently has a decent defragmentation tool is XFS and that is a paid 
option, at least with Red Hat. Greg Smith has recently posted a 
wonderful review of PostgreSQL on various file systems:


http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html

There is an operating system which comes with a very decent extent-based
file system and a defragmentation tool, included in the OS. The file
system is called NTFS and the company is in the land of Redmond, WA, where
the shadows lie. One OS to rule them all...


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions







Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Kenneth Marshall
On Wed, Dec 01, 2010 at 12:15:19PM -0500, Mladen Gogala wrote:
 Mario Splivalo wrote:
 I'll try what Pierre suggested, on whole new filesystem. This one did get 
 quite filled with thousands of files that I deleted while the database was 
 working.

  Mario
   

 Yes, that is a good idea. That's the reason why we need a defragmentation 
 tool on Linux. Unfortunately, the only file system that currently has a 
 decent defragmentation tool is XFS and that is a paid option, at least with 
 Red Hat. Greg Smith has recently posted a wonderful review of PostgreSQL on 
 various file systems:

 http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html

 There is an operating system which comes with a very decent extent-based
 file system and a defragmentation tool, included in the OS. The file system
 is called NTFS and the company is in the land of Redmond, WA, where the
 shadows lie. One OS to rule them all...

 -- 
 Mladen Gogala Sr. Oracle DBA
 1500 Broadway
 New York, NY 10036
 (212) 329-5251
 http://www.vmsinfo.com The Leader in Integrated Media Intelligence 
 Solutions


Red Hat 6 comes with ext4, which is an extent-based filesystem with
decent performance.

Ken



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Kevin Grittner
Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 
 There is an operating system which comes with a very decent extent
 based file system and a defragmentation tool, included in the OS.
 The file system is called NTFS
 
Been there, done that.  Not only was performance quite poor compared
to Linux, but reliability and staff time to manage things suffered
in comparison to Linux.
 
We had the luxury of identical hardware and the ability to load
balance a web site with millions of hits per day evenly between them
in both environments, as well as off-line saturation load testing. 
At least for running a PostgreSQL database, my experience suggests
that the only reasonable excuse for running a database on a Windows
server is that you're under a mandate from ill-informed managers to
do so.
 
-Kevin



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mladen Gogala

Kenneth Marshall wrote:

Red Hat 6 comes with ext4, which is an extent-based filesystem with
decent performance.

Ken
  
But e4defrag is still not available. And, of course, Red Hat 6 is still 
not available, either.  Maybe Red Hat 7 will do the trick? I assume it 
will work beautifully with PostgreSQL 15.0.


--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions







Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-12-01 Thread Mladen Gogala

Kevin Grittner wrote:

Mladen Gogala mladen.gog...@vmsinfo.com wrote:
 
  
 
Been there, done that.  Not only was performance quite poor compared

to Linux, but reliability and staff time to manage things suffered
in comparison to Linux.
 
  
I must say that I am quite impressed with Windows 7 servers, especially
the 64-bit version. Unfortunately, I don't have any PostgreSQL instances on
those, but Exchange works very, very well. Also, personal impressions
from clicking around and running office applications are quite good. Don't get
me wrong, I am an old Unix/Linux hack and I would like nothing better
than to see Linux succeed, but I don't like what I see.

--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions







Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-30 Thread Mladen Gogala
I'm just back from vacation, so I apologize in advance if I missed 
anything of importance. Here is something to consider:


Instead of using the statement you used to create the table, try the 
following:


CREATE TABLE drones_history (
drone_id integer not null,
sample_id integer not null,
drone_log_notice character varying,
drone_temperature float,
drone_pressure float,
constraint drones_history_pk primary key (drone_id, sample_id),
	constraint drones_history_fk__samples foreign key (sample_id) 
references samples(sample_id),

constraint drones_history_fk__drones foreign key (drone_id) references  
drones(drone_id) deferrable
);


At the beginning of the load, you should defer all of the deferrable
constraints, by setting constraints deferred and issuing the COPY
statement within a transaction block, like this:


   scott=# begin; 
   BEGIN

   Time: 0.203 ms
   scott=# set constraints all deferred;
   SET CONSTRAINTS
   Time: 0.201 ms
   scott=# copy test1 from '/tmp/test1.csv';
   COPY 100
   Time: 11.939 ms
   scott=# commit;
   ERROR:  insert or update on table test1 violates foreign key
   constraint fk_tst1_deptno
   DETAIL:  Key (col1)=(1) is not present in table dept.


Of course, that will require a complete rewrite of your load script,
because the errors will be checked at commit time and the transaction
can either fail as a whole or succeed as a whole. It's an all-or-nothing
situation. How frequently do you see records with an incorrect drone_id?
If that happens only once in a blue moon, you may need no stinkin'
foreign keys in the first place; you may be able
to have a batch job that flags all the records with an invalid
drone_id instead.
Furthermore, you can make sure that you have enough shared buffers to 
cache the entire drones table. Also, do strace on the postgres 
process handling your session and see whether the time is spent writing 
to WAL archives. If that is slowing you down, you should consider buying
an SSD or a high-end disk drive. I have never had such a problem, but you
should also check whether pg_loader can do anything for you.
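
Two quick checks along those lines (sketches only; the size functions are standard built-ins, and the query is the batch-job alternative to the FK):

-- is the referenced table small enough to stay cached in shared_buffers?
SELECT pg_size_pretty(pg_total_relation_size('drones'));
SHOW shared_buffers;

-- find history rows whose drone_id has no parent row in drones:
SELECT h.*
FROM drones_history h
LEFT JOIN drones d ON d.drone_id = h.drone_id
WHERE d.drone_id IS NULL;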


As far as speed is concerned, inserting with deferred foreign keys is 
almost as fast as inserting without foreign keys:



scott=# alter table test1 drop constraint fk_tst1_deptno;
ALTER TABLE
Time: 16.219 ms
scott=# copy test1 from '/tmp/test1.csv';
COPY 100
Time: 10.418 ms

If you take a look at the example above, you will see that inserting
with a deferred FK took 11.939 milliseconds while inserting into the
same table without the FK took 10.418 milliseconds, a difference of
1.5 milliseconds per 100 rows. The timing of 2 seconds per 100
rows looks suspiciously high. Methinks that your problem is not just
the foreign key; there must be something else devouring the time. You
should have a test instance, compiled with the -g option, and do profiling.


Mario Splivalo wrote:

The database for monitoring certain drone statuses is quite simple:

CREATE TABLE samples (
sample_id integer not null primary key,
sample_timestamp timestamp not null default now()
);

CREATE TABLE drones (
drone_id integer not null primary key,
drone_log_notice character varying,
crone_coordinates point not null,
drone_temperature float,
drone_pressure float
);

CREATE TABLE drones_history (
drone_id integer not null,
sample_id integer not null,
drone_log_notice character varying,
drone_temperature float,
drone_pressure float,
constraint drones_history_pk primary key (drone_id, sample_id),
	constraint drones_history_fk__samples foreign key (sample_id) 
references samples(sample_id),
	constraint drones_history_fk__drones foreign key (drone_id) references 
drones(drone_id)

);

Every ten to twenty minutes I receive a CSV file with most of the drones'
statuses. The CSV file includes data for new drones, if they're put into
use. When I receive new data I load the whole CSV file into the database, then
call a stored procedure that 'deals' with that data.


So far I have around 6000 samples, around 160k drones and drones_history 
is around 25M rows.


The CSV file contains around 15k-20k of 'rows', mostly data about old 
drones. Every now and then (on every 5th - 10th CSV-insert) there is 
data with around 1000-5000 new drones.


Here is what I do in the stored procedure, after I COPY the data from the
CSV into the temporary.drones table:


First, I create a temporary table, inside the procedure, that holds rows
for the new drones:


CREATE TEMPORARY TABLE tmpNew ON COMMIT DROP AS
SELECT drone_id, log_notice, coord_x, coord_y, temp, press
FROM temporary.drones WHERE NOT EXISTS (SELECT 1 FROM public.drones WHERE
public.drones.drone_id = temporary.drones.drone_id);


This is done in milliseconds, even if the count of new drones is
large (I've tested it with 10k new drones, although in real-life operation
I'd never get more than 5k new drones per CSV).

Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-30 Thread Mario Splivalo

On 11/29/2010 05:47 PM, Pierre C wrote:

realm_51=# vacuum analyze verbose drones;
INFO: vacuuming public.drones
INFO: scanned index drones_pk to remove 242235 row versions
DETAIL: CPU 0.02s/0.11u sec elapsed 0.28 sec.
INFO: drones: removed 242235 row versions in 1952 pages
DETAIL: CPU 0.01s/0.02u sec elapsed 0.03 sec.
INFO: index drones_pk now contains 174068 row versions in 721 pages
DETAIL: 107716 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.


As you can see your index contains 174068 active rows and 242235 dead
rows that probably should have been removed a long time ago by
autovacuum, but you seem to have it turned off. It does not take a long
time to vacuum this table (only 0.3 sec) so it is not a high cost, you
should enable autovacuum and let it do the job (note that this doesn't
stop you from manual vacuuming after big updates).


Yes, you're right. I was doing some testing and I neglected to enable
vacuuming after inserts. But what this shows is that the drones table
has dead rows, and that table does get updated a lot. However, I
don't have any performance problems here. The UPDATE takes no more than
10 seconds even if I update 50k (out of 150k) rows.


I disabled autovacuum because I got a lot of 'WARNING: pgstat wait
timeout' messages and I could see the autovacuum job (in pg_stat_activity)
running during the run of the plpgsql function that handles the inserts.


I left the autovacuum off but I do VACUUM after each CSV insert.


good


0 index pages have been deleted, 0 are currently reusable.
CPU 0.38s/0.12u sec elapsed 16.56 sec.
INFO: drones_history: found 0 removable, 16903164 nonremovable row
versions in 129866 out of 195180 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 2.00s/1.42u sec elapsed 49.24 sec.


good


INFO: vacuuming pg_toast.pg_toast_2695510
INFO: index pg_toast_2695510_index now contains 0 row versions in 1
pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: pg_toast_2695510: found 0 removable, 0 nonremovable row
versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


same as above, no toast


Yes. Just to make things clear, I never update/delete drones_history. I 
just INSERT, and every now and then I'll be doing SELECTs.






realm_51=# select version();
version
-

PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (Debian
4.3.2-1.1) 4.3.2, 32-bit
(1 row)


Mario


ok

Try this :

CLUSTER drones_pkey ON drones;

Then check if your slow query gets a bit faster. If it does, try :

ALTER TABLE drones SET ( fillfactor = 50 );
ALTER INDEX drones_pkey SET ( fillfactor = 50 );
CLUSTER drones_pkey ON drones; (again)

This will make the updates on this table less problematic. VACUUM it
after each mass update.


Is this going to make any difference for the slow insert on
drones_history? INSERTs/UPDATEs on the drones table are fast. The
only noticeable difference is that drones is 150k rows 'large' and
drones_history has around 25M rows:


realm_51=# select count(*) from drones_history ;
  count
--
 25550475
(1 row)

Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-30 Thread Mario Splivalo

On 11/29/2010 05:53 PM, Pierre C wrote:



Yes, since (sample_id, drone_id) is primary key, postgres created
composite index on those columns. Are you suggesting I add two more
indexes, one for drone_id and one for sample_id?


(sample_id,drone_id) covers sample_id but if you make searches on
drone_id alone it is likely to be very slow since you got a large number
of sample_ids. Postgres can use any column of a multicolumn index but it
is only interesting performance-wise if the cardinality of the first
(ignored) columns is low. If you often make searches on drone_id, create
an index. But this isn't what is slowing your foreign key checks.


Again, you have a point there. When I get to SELECTs on the history
table I'll be doing most of the filtering on drone_id (but also on
sample_id, because I'll seldom drill all the way back in time - I'll be
interested in just some periods), so I'll take this into consideration.


But, as you've said, that's not what's slowing my FK checks.




Also it would be worthwhile for you to post the output of:

EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;

to the list, so we can see what is taking the time.


Is there a way to do so inside plpgsql function?

I can recreate the whole process within psql and then post the explain
analyze, it would just take me some time to do so. I'll post as soon
as I'm done.


Yes, this would be interesting.


So I did. I ran the whole script in psql, and here is the result for
the INSERT:


realm_51=# explain analyze INSERT INTO drones_history (2771, drone_id, 
drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM 
tmp_drones_history;
  QUERY PLAN 


--
 Seq Scan on tmp_drones_history  (cost=0.00..81.60 rows=4160 width=48) 
(actual time=0.008..5.296 rows=5150 loops=1)

 Trigger for constraint drones_history_fk__drones: time=92.948 calls=5150
 Total runtime: 16779.644 ms
(3 rows)


Now, this is only 16 seconds. In this 'batch' I've inserted 5150 rows.
The batch before, which I ran 'the usual way', inserted 9922 rows
and took 1 minute and 16 seconds.


I did not, however, enclose the process in begin/end.

So, here are results when I, in psql, first issued BEGIN:

realm_51=# explain analyze INSERT INTO drones_history (2772, drone_id, 
drone_log_notice, drone_temperature, drone_pressure) SELECT * FROM 
tmp_drones_history;
  QUERY PLAN 


--
 Seq Scan on tmp_drones_history  (cost=0.00..79.56 rows=4056 width=48) 
(actual time=0.008..6.490 rows=5059 loops=1)

 Trigger for constraint drones_history_fk__drones: time=120.224 calls=5059
 Total runtime: 39658.250 ms
(3 rows)

Time: 39658.906 ms



Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-30 Thread Mark Kirkwood

On 30/11/10 05:53, Pierre C wrote:


Yes, since (sample_id, drone_id) is primary key, postgres created 
composite index on those columns. Are you suggesting I add two more 
indexes, one for drone_id and one for sample_id?


(sample_id,drone_id) covers sample_id but if you make searches on 
drone_id alone it is likely to be very slow since you got a large 
number of sample_ids. Postgres can use any column of a multicolumn 
index but it is only interesting performance-wise if the cardinality 
of the first (ignored) columns is low. If you often make searches on 
drone_id, create an index. But this isn't what is slowing your foreign 
key checks.


Exactly, sorry - I was having a brain fade moment about which way your 
foreign key checks were going when I suggested adding those indexes... :-(




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-30 Thread Mario Splivalo

On 11/30/2010 05:26 PM, Mladen Gogala wrote:

At the beginning of the load, you should defer all of the deferrable
constraints, setting constraints deferred and issuing the copy statement
within a transaction block, like this:

scott=# begin; BEGIN
Time: 0.203 ms
scott=# set constraints all deferred;
SET CONSTRAINTS
Time: 0.201 ms
scott=# copy test1 from '/tmp/test1.csv';
COPY 100
Time: 11.939 ms
scott=# commit;
ERROR: insert or update on table test1 violates foreign key
constraint fk_tst1_deptno
DETAIL: Key (col1)=(1) is not present in table dept.


Of course, that will require complete rewrite of your load script,
because the errors will be checked at the commit time and transaction
can either fail as a whole or succeed as a whole. It's all or nothing


Well, it is like that now. First I load the data from the CSV into the
temporary table (it's just named 'temporary'; it's a permanent table on the
server). That table is usually around 10k rows. Then I call the function
which does the job.



situation. How frequently do you see records with an incorrect drone_id?


Seldom.


If that happens only once in a blue moon, you may need no stinkin'
foreign keys in the first place, you may be able
to have a batch job that will flag all the records with an invalid
drone_id instead.


I did have that idea, yes, but still, I'd like to know what is slowing
postgres down. Because when I look at the disk I/O, it seems very random
- I get around 800k/sec of disk reads and occasionally 1500k/sec of writes
(during the insert into the history table).



Furthermore, you can make sure that you have enough shared buffers to
cache the entire drones table. Also, do strace on the postgres
process handling your session and see whether the time is spent writing
to WAL archives. If that is slowing you down, you should consider buying
a SSD or a high end disk drive. I have never had such problem, but you
should also check whether pg_loader can do anything for you.

As far as speed is concerned, inserting with deferred foreign keys is
almost as fast as inserting without foreign keys:

scott=# alter table test1 drop constraint fk_tst1_deptno;
ALTER TABLE
Time: 16.219 ms
scott=# copy test1 from '/tmp/test1.csv';
COPY 100
Time: 10.418 ms

If you take a look at the example above, you will see that inserting
with a deferred FK took 11.939 milliseconds while inserting into the
same table without the FK took 10.418 milliseconds, the difference of
1.5 milliseconds per 100 rows. The timing of 2 seconds per 100
rows looks suspiciously high. Me thinks that your problem is not just
the foreign key, there must be something else devouring the time. You
should have a test instance, compiled with -g option and do profiling.


I'll have to. So far I've been doing this only on that dedicated server. 
I'll try to download the database to my desktop and try the tests there.


Concerning the shared_buffers, it's 256M, and the drones table is just 15M.

I have tried your recommendation and it yielded no difference.

Now I tried removing the constraints from the history table (including
the PK) and the inserts were fast. After a few 'rounds' of inserts I added
the constraints back, and several rounds after that were fast again. But then
it was the same all over again. An insert of some 11k rows took 4 seconds (with all
constraints) and now the last one, of only 4k rows, took one minute. I did
vacuum after each insert.



Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-30 Thread Pierre C


Now I tried removing the constraints from the history table (including
the PK) and the inserts were fast. After a few 'rounds' of inserts I added
the constraints back, and several rounds after that were fast again. But then
it was the same all over again. An insert of some 11k rows took 4 seconds (with all
constraints) and now the last one, of only 4k rows, took one minute. I did
vacuum after each insert.



Mario


Hm, so for each line of drones_history you insert, you also update the
corresponding drones table to reflect the latest data, right ?
How many times is the same row in drones updated ? ie, if you insert N
rows in drones_history, how many drone_id's do you have ?




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-30 Thread Joshua D. Drake
On Sun, 2010-11-28 at 12:46 +0100, Mario Splivalo wrote:
 The database for monitoring certain drone statuses is quite simple:
 

 This is the slow part:
 INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, 
 drone_temperature, drone_pressure)
 SELECT * FROM tmpUpdate;
 
 For 100 rows this takes around 2 seconds. For 1000 rows this takes 
 around 40 seconds. For 5000 rows this takes around 5 minutes.
 For 50k rows this takes around 30 minutes! Now this is where I start to lag
 because I get a new CSV every 10 minutes or so.

Have you considered making the foreign key check deferrable?
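
(A sketch of that change; the constraint name comes from the posted DDL, and since 8.4 has no ALTER TABLE ... ALTER CONSTRAINT, the FK has to be dropped and re-added:)

ALTER TABLE drones_history DROP CONSTRAINT drones_history_fk__drones;
ALTER TABLE drones_history
  ADD CONSTRAINT drones_history_fk__drones
  FOREIGN KEY (drone_id) REFERENCES drones (drone_id)
  DEFERRABLE INITIALLY IMMEDIATE;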

JD

-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-29 Thread Mario Splivalo

On 11/28/2010 10:50 PM, Pierre C wrote:



I pasted the DDL at the beginning of my post.


Ah, sorry, didn't see it ;)


The only indexes tables have are the ones created because of PK
constraints. Table drones has around 100k rows. Table drones_history
has around 30M rows. I'm not sure what additional info you'd want but
I'll be more than happy to provide more relevant information.


Can you post the following :

- pg version
- output of VACUUM ANALYZE VERBOSE for your 2 tables


Here it is:

realm_51=# vacuum analyze verbose drones;
INFO:  vacuuming public.drones
INFO:  scanned index drones_pk to remove 242235 row versions
DETAIL:  CPU 0.02s/0.11u sec elapsed 0.28 sec.
INFO:  drones: removed 242235 row versions in 1952 pages
DETAIL:  CPU 0.01s/0.02u sec elapsed 0.03 sec.
INFO:  index drones_pk now contains 174068 row versions in 721 pages
DETAIL:  107716 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  drones: found 486 removable, 174068 nonremovable row versions 
in 1958 out of 1958 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 64 unused item pointers.
0 pages are entirely empty.
CPU 0.22s/0.90u sec elapsed 22.29 sec.
INFO:  vacuuming pg_toast.pg_toast_2695558
INFO:  index pg_toast_2695558_index now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_toast_2695558: found 0 removable, 0 nonremovable row 
versions in 0 out of 0 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing public.drones
INFO:  drones: scanned 1958 of 1958 pages, containing 174068 live rows 
and 0 dead rows; 174068 rows in sample, 174068 estimated total rows

VACUUM
realm_51=# vacuum analyze verbose drones_history;
INFO:  vacuuming public.drones_history
INFO:  index drones_history_pk now contains 25440352 row versions in 
69268 pages

DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.38s/0.12u sec elapsed 16.56 sec.
INFO:  drones_history: found 0 removable, 16903164 nonremovable row 
versions in 129866 out of 195180 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 2.00s/1.42u sec elapsed 49.24 sec.
INFO:  vacuuming pg_toast.pg_toast_2695510
INFO:  index pg_toast_2695510_index now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_toast_2695510: found 0 removable, 0 nonremovable row 
versions in 0 out of 0 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing public.drones_history
INFO:  drones_history: scanned 195180 of 195180 pages, containing 
25440352 live rows and 0 dead rows; 60 rows in sample, 25440352 
estimated total rows

VACUUM
realm_51=# select version();
   version 


-
 PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (Debian 
4.3.2-1.1) 4.3.2, 32-bit

(1 row)


Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-29 Thread Mario Splivalo

On 11/29/2010 08:11 AM, Mark Kirkwood wrote:

On 29/11/10 00:46, Mario Splivalo wrote:


This is the slow part:
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice,
drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;

For 100 rows this takes around 2 seconds. For 1000 rows this takes
around 40 seconds. For 5000 rows this takes around 5 minutes.
For 50k rows this takes around 30 minutes! Now this is where I start
to lag because I get a new CSV every 10 minutes or so.


Have you created indexes on drones_history(sample_id) and
drones_history(drone_id)? That would probably help speed up your INSERT
quite a bit if you have not done so.


Yes, since (sample_id, drone_id) is the primary key, postgres created a
composite index on those columns. Are you suggesting I add two more
indexes, one for drone_id and one for sample_id?



Also it would be worthwhile for you to post the output of:

EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;

to the list, so we can see what is taking the time.


Is there a way to do so inside a plpgsql function?

I can recreate the whole process within psql and then post the explain
analyze; it would just take me some time to do so. I'll post as soon as
I'm done.
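
(As an aside, a sketch of the contrib auto_explain module that ships with 8.4, which can log plans for statements run inside plpgsql functions; the settings shown are illustrative and the function name is the one mentioned elsewhere in the thread:)

LOAD 'auto_explain';                          -- superuser, affects this session only
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_nested_statements = on;  -- include statements inside functions
SET auto_explain.log_analyze = on;            -- real row counts and timings (adds overhead)
SELECT my_insert_drones_function();           -- plans show up in the server log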


Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-29 Thread Pierre C

realm_51=# vacuum analyze verbose drones;
INFO:  vacuuming public.drones
INFO:  scanned index drones_pk to remove 242235 row versions
DETAIL:  CPU 0.02s/0.11u sec elapsed 0.28 sec.
INFO:  drones: removed 242235 row versions in 1952 pages
DETAIL:  CPU 0.01s/0.02u sec elapsed 0.03 sec.
INFO:  index drones_pk now contains 174068 row versions in 721 pages
DETAIL:  107716 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.


As you can see, your index contains 174068 active rows and 242235 dead rows
that probably should have been removed a long time ago by autovacuum, but
you seem to have it turned off. It does not take a long time to vacuum
this table (only 0.3 sec) so it is not a high cost; you should enable
autovacuum and let it do the job (note that this doesn't stop you from
manual vacuuming after big updates).
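
(If autovacuum was disabled because of the "pgstat wait timeout" noise mentioned elsewhere in the thread, a gentler option is to keep it on globally and tune it per table; 8.4 accepts these as storage parameters, and the values here are only illustrative:)

ALTER TABLE drones SET (autovacuum_vacuum_cost_delay = 20);   -- throttle rather than disable
ALTER TABLE drones_history SET (autovacuum_enabled = false);  -- or opt a single table out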



CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  drones: found 486 removable, 174068 nonremovable row versions  
in 1958 out of 1958 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 64 unused item pointers.
0 pages are entirely empty.
CPU 0.22s/0.90u sec elapsed 22.29 sec.


Here, the table itself seems quite normal... strange.


INFO:  vacuuming pg_toast.pg_toast_2695558
INFO:  index pg_toast_2695558_index now contains 0 row versions in 1  
pages

DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_toast_2695558: found 0 removable, 0 nonremovable row  
versions in 0 out of 0 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


Since you don't have large fields, the toast table is empty...


realm_51=# vacuum analyze verbose drones_history;
INFO:  vacuuming public.drones_history
INFO:  index drones_history_pk now contains 25440352 row versions in  
69268 pages

DETAIL:  0 index row versions were removed.


good


0 index pages have been deleted, 0 are currently reusable.
CPU 0.38s/0.12u sec elapsed 16.56 sec.
INFO:  drones_history: found 0 removable, 16903164 nonremovable row  
versions in 129866 out of 195180 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 2.00s/1.42u sec elapsed 49.24 sec.


good


INFO:  vacuuming pg_toast.pg_toast_2695510
INFO:  index pg_toast_2695510_index now contains 0 row versions in 1  
pages

DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_toast_2695510: found 0 removable, 0 nonremovable row  
versions in 0 out of 0 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.


same as above, no toast



realm_51=# select version();
version  
-
  PostgreSQL 8.4.2 on i686-pc-linux-gnu, compiled by GCC gcc (Debian  
4.3.2-1.1) 4.3.2, 32-bit

(1 row)


Mario


ok

Try this :

CLUSTER drones_pkey ON drones;

Then check if your slow query gets a bit faster. If it does, try :

ALTER TABLE drones SET ( fillfactor = 50 );
ALTER INDEX drones_pkey SET ( fillfactor = 50 );
CLUSTER drones_pkey ON drones; (again)

This will make the updates on this table less problematic. VACUUM it after  
each mass update.




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-29 Thread Pierre C


Yes, since (sample_id, drone_id) is primary key, postgres created  
composite index on those columns. Are you suggesting I add two more  
indexes, one for drone_id and one for sample_id?


(sample_id,drone_id) covers sample_id but if you make searches on drone_id  
alone it is likely to be very slow since you got a large number of  
sample_ids. Postgres can use any column of a multicolumn index but it is  
only interesting performance-wise if the cardinality of the first  
(ignored) columns is low. If you often make searches on drone_id, create  
an index. But this isn't what is slowing your foreign key checks.



Also it would be worthwhile for you to post the output of:

EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;

to the list, so we can see what is taking the time.


Is there a way to do so inside plpgsql function?

I can recreate the whole process within psql and then post the explain  
analyze, it would just take me some time to do so. I'll post as soon as  
I'm done.


Yes, this would be interesting.



[PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Mario Splivalo

The database for monitoring certain drone statuses is quite simple:

CREATE TABLE samples (
sample_id integer not null primary key,
sample_timestamp timestamp not null default now()
);

CREATE TABLE drones (
drone_id integer not null primary key,
drone_log_notice character varying,
crone_coordinates point not null,
drone_temperature float,
drone_pressure float
);

CREATE TABLE drones_history (
drone_id integer not null,
sample_id integer not null,
drone_log_notice character varying,
drone_temperature float,
drone_pressure float,
constraint drones_history_pk primary key (drone_id, sample_id),
	constraint drones_history_fk__samples foreign key (sample_id) 
references samples(sample_id),
	constraint drones_history_fk__drones foreign key (drone_id) references 
drones(drone_id)

);

Every ten to twenty minutes I receive a CSV file with most of the drones'
statuses. The CSV file includes data for new drones, if they're put into
use. When I receive new data I load the whole CSV file into the database, then
call a stored procedure that 'deals' with that data.


So far I have around 6000 samples, around 160k drones and drones_history 
is around 25M rows.


The CSV file contains around 15k-20k of 'rows', mostly data about old 
drones. Every now and then (on every 5th - 10th CSV-insert) there is 
data with around 1000-5000 new drones.


Here is what I do in the stored procedure, after I COPY the data from the
CSV into the temporary.drones table:


First, I create a temporary table, inside the procedure, that holds rows
for the new drones:


CREATE TEMPORARY TABLE tmpNew ON COMMIT DROP AS
SELECT drone_id, log_notice, coord_x, coord_y, temp, press
FROM temporary.drones WHERE NOT EXISTS (SELECT 1 FROM public.drones WHERE
public.drones.drone_id = temporary.drones.drone_id);


This is done in milliseconds, even if the count of new drones is
large (I've tested it with 10k new drones, although in real-life operation
I'd never get more than 5k new drones per CSV).


INSERT INTO public.drones (drone_id, drone_log_notice, coordinates,
drone_temperature, drone_pressure)
SELECT drone_id, log_notice, point(coord_x, coord_y) as coordinates, 
temp, press FROM tmpNew;
INSERT INTO public.drones_history (sample_id, drone_id, 
drone_log_notice, drone_temperature, drone_pressure)

SELECT a_sample_id, drone_id, log_notice, temp, press FROM tmpNew;

This is also done in milliseconds.

Now, I 'update' the data for the existing drones, and fill in the history
table for those drones. First I create a temporary table with just the
changed rows:


CREATE TEMPORARY TABLE tmpUpdate ON COMMIT DROP AS
SELECT a_batch_id, t.drone_id, t.log_notice, t.temp, t.press
   FROM temporary.drones t
   JOIN public.drones p
   ON t.drone_id = p.drone_id
WHERE p.drone_log_notice != t.log_notice OR p.temp != t.temp OR p.press 
!= t.press;


Now, that part is also fast. I usually have around 100-1000 drones that
changed 'state', but sometimes even half of the drones change
states (around 50k), and creation of tmpUpdate takes no more than ten
to twenty milliseconds.


This is the slow part:
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, 
drone_temperature, drone_pressure)

SELECT * FROM tmpUpdate;

For 100 rows this takes around 2 seconds. For 1000 rows this takes 
around 40 seconds. For 5000 rows this takes around 5 minutes.
For 50k rows this takes around 30 minutes! Now this is where I start to lag,
because I get a new CSV every 10 minutes or so.


And the last part is to update the actual drones table:
UPDATE public.drones p
SET drone_log_notice = t.log_notice, drone_temperature = t.temp, 
drone_pressure = t.press

FROM temporary.drones t
WHERE t.drone_id = p.drone_id
AND (t.log_notice != p.drone_log_notice OR t.temp != p.drone_temperature
OR t.press != p.drone_pressure);


This is also very fast: even when almost half the table is updated, the
UPDATE takes around 10 seconds. Usually it's around 30-50 ms.


The machine I'm doing this on has 4 GB of RAM, dual Xeon something (3 GHz).
Two SAS drives in a mirror, capable of around 100 MB/s in sequential r/w
(I know it means nothing, but just to get an idea).


The database is around 2 GB in size (pg_database_size). When I dump/recreate
the database I can speed things up a bit, but after half a day of
operations the INSERTs are slow again.
When I do a dump/restore of the database I get around 40-50 MB/sec
reading/writing from the disk (COPYing data, PK/FK constraint creation),
but when that INSERT gets stuck io-wait goes through the roof - iostat shows
that Postgres is mainly reading from the disks, around 800k/sec - 1024k/sec.


I've set shared_buffers to 256M, work_mem to 96M, wal_buffers to 16M and
checkpoint_segments to 16. I've turned off autovacuum; I do
analyze/vacuum after each insert job is done, after TRUNCATEing the
temporary.drones table.


Out of despair I tried to set fsync=off, but that gave me just a small 

Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Pierre C


When I remove the foreign key constraints (drones_history_fk__samples and
drones_history_fk__drones) but leave the primary key on drones_history,
then that INSERT, even for 50k rows, takes no more than a second.


So, my question is - is there anything I can do to make INSERTS with PK  
faster? Or, since all the reference checking is done inside the  
procedure for loading data, shall I abandon those constraints entirely?


Mario


Maybe... or not. Can you post details about :

- the foreign keys
- the tables that are referred to (including indexes)


CREATE TABLE foo (x INTEGER PRIMARY KEY); INSERT INTO foo SELECT * FROM
generate_series( 1,100000 );

Temps : 766,182 ms
test= VACUUM ANALYZE foo;
Temps : 71,938 ms
test= CREATE TABLE bar ( x INTEGER REFERENCES foo(x) );
CREATE TABLE
test= INSERT INTO bar SELECT * FROM generate_series( 1,100000 );
Temps : 2834,430 ms

As you can see, 100.000 FK checks take less than 3 seconds on this very  
simple example. There is probably something that needs fixing.




Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Mario Splivalo

On 11/28/2010 07:56 PM, Pierre C wrote:



When I remove the foreign key constraints (drones_history_fk__samples and
drones_history_fk__drones) but leave the primary key on drones_history,
then that INSERT, even for 50k rows, takes no more than a second.

So, my question is - is there anything I can do to make INSERTS with
PK faster? Or, since all the reference checking is done inside the
procedure for loading data, shall I abandon those constraints entirely?

Mario


Maybe... or not. Can you post details about :

- the foreign keys
- the tables that are referred to (including indexes)


I pasted the DDL at the beginning of my post. The only indexes the tables have
are the ones created because of the PK constraints. Table drones has around
100k rows. Table drones_history has around 30M rows. I'm not sure what
additional info you'd want but I'll be more than happy to provide more
relevant information.




CREATE TABLE foo (x INTEGER PRIMARY KEY); INSERT INTO foo SELECT * FROM
generate_series( 1,100000 );
Temps : 766,182 ms
test= VACUUM ANALYZE foo;
Temps : 71,938 ms
test= CREATE TABLE bar ( x INTEGER REFERENCES foo(x) );
CREATE TABLE
test= INSERT INTO bar SELECT * FROM generate_series( 1,100000 );
Temps : 2834,430 ms

As you can see, 100.000 FK checks take less than 3 seconds on this very
simple example. There is probably something that needs fixing.



Yes, when the FKyed table is small enough inserts are quite fast. But 
when they grow larger the whole system slows down.


I just repeated your test and I'm getting similar results - on my
desktop. I'll try to assemble some code to recreate the workload and see if
I run into the same problems.


Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Pierre C



I pasted the DDL at the beginning of my post.


Ah, sorry, didn't see it ;)

The only indexes tables have are the ones created because of PK  
constraints. Table drones has around 100k rows. Table drones_history has  
around 30M rows. I'm not sure what additional info you'd want but I'll  
be more than happy to provide more relevant information.


Can you post the following :

- pg version
- output of VACUUM ANALYZE VERBOSE for your 2 tables



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Mark Kirkwood

On 29/11/10 00:46, Mario Splivalo wrote:


This is the slow part:
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, 
drone_temperature, drone_pressure)

SELECT * FROM tmpUpdate;

For 100 rows this takes around 2 seconds. For 1000 rows this takes 
around 40 seconds. For 5000 rows this takes around 5 minutes.
For 50k rows this takes around 30 minutes! Now this is where I start
to lag, because I get a new CSV every 10 minutes or so.


Have you created indexes on drones_history(sample_id) and  
drones_history(drone_id)? That would probably help speed up your INSERT 
quite a bit if you have not done so.
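
(For reference, a sketch of those two single-column indexes; the names are illustrative, and as noted elsewhere in the thread they mainly help SELECTs on the individual columns rather than the FK checks:)

CREATE INDEX drones_history_drone_id_idx ON drones_history (drone_id);
CREATE INDEX drones_history_sample_id_idx ON drones_history (sample_id);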


Also it would be worthwhile for you to post the output of:

EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id, 
drone_log_notice, drone_temperature, drone_pressure)

SELECT * FROM tmpUpdate;

to the list, so we can see what is taking the time.

Cheers

Mark
