Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Mark Kirkwood

On 29/11/10 00:46, Mario Splivalo wrote:


This is the slow part:
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, 
drone_temperature, drone_pressure)

SELECT * FROM tmpUpdate;

For 100 rows this takes around 2 seconds. For 1000 rows this takes 
around 40 seconds. For 5000 rows this takes around 5 minutes.
For 50k rows this takes around 30 minutes! Now this is where I start to 
lag behind, because I get a new CSV every 10 minutes or so.


Have you created indexes on drones_history(sample_id) and  
drones_history(drone_id)? That would probably help speed up your INSERT 
quite a bit if you have not done so.
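
If you want to try that, index creation is simply (the index names here 
are arbitrary):

CREATE INDEX drones_history_sample_id_idx ON drones_history (sample_id);
-- the second one is probably redundant, since the composite primary key
-- (drone_id, sample_id) already serves lookups on drone_id alone
CREATE INDEX drones_history_drone_id_idx ON drones_history (drone_id);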


Also it would be worthwhile for you to post the output of:

EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id, 
drone_log_notice, drone_temperature, drone_pressure)

SELECT * FROM tmpUpdate;

to the list, so we can see what is taking the time.

Cheers

Mark



[PERFORM] Full Text index is not using during OR operation

2010-11-28 Thread AI Rumman
explain
SELECT crmentity.crmid, crmentity.setype, crmentity.modifiedtime,
activity.subject,case when ( users.user_name not like '') then
users.user_name else groups.groupname end as user_name, activity.date_start
FROM crmentity INNER JOIN activity ON crmentity.crmid = activity.activityid
and crmentity.deleted = 0
LEFT JOIN activitygrouprelation ON activitygrouprelation.activityid =
crmentity.crmid
LEFT JOIN groups ON groups.groupname = activitygrouprelation.groupname
LEFT join users ON crmentity.smownerid= users.id
WHERE
to_tsvector(' en', for_fts( activity.subject)) @@ to_tsquery(' en',
replace(' Dhaka University of Bangladesh:*', ' ',':* & '))
or
to_tsvector(' en', for_fts( crmentity.description)) @@ to_tsquery(' en',
replace(' Dhaka University of Bangladesh:*', ' ',':* & '))
ORDER BY crmentity.modifiedtime DESC LIMIT 100

  QUERY PLAN

--
 Limit (cost=112724.54..112724.54 rows=1 width=99)
  -> Sort (cost=112724.54..112724.54 rows=1 width=99)
  Sort Key: crmentity.modifiedtime
  -> Nested Loop Left Join (cost=0.00..112724.53 rows=1 width=99)
  -> Nested Loop Left Join (cost=0.00..112724.24 rows=1 width=82)
  -> Nested Loop Left Join (cost=0.00..112723.96 rows=1 width=79)
  -> Nested Loop (cost=0.00..112723.68 rows=1 width=56)
  Join Filter: ((to_tsvector('en'::regconfig,
regexp_replace((activity.subject)::text,
'(&[^;]+;)|(<[^>]+>)|([\\s\\r\\n\\t]+)'::text, ''::text, 'gs'::text)) @@ '''
Dhaka'':* & ''univers'':* & ''bangladesh'':*'::tsquery) OR
(to_tsvector('en'::regconfig, regexp_replace(crmentity.description,
'(&[^;]+;)|(<[^>]+>)|([\\s\\r\\n\\t]+)'::text, ' '::text, 'gs'::text)) @@
''' Dhaka'':* & ''univers'':* & ''bangladesh'':*'::tsquery))
  -> Index Scan using activity_pkey on activity (cost=0.00..10223.89
rows=343070 width=36)
  -> Index Scan using crmentity_pkey on crmentity (cost=0.00..0.27 rows=1
width=151)
  Index Cond: (crmentity.crmid = activity.activityid)
  Filter: (crmentity.deleted = 0)
  -> Index Scan using activitygrouprelation_activityid_idx on
activitygrouprelation (cost=0.00..0.27 rows=1 width=27)
  Index Cond: (activitygrouprelation.activityid = crmentity.crmid)
  -> Index Scan using groups_groupname_idx on groups (cost=0.00..0.27 rows=1
width=26)
  Index Cond: ((groups.groupname)::text =
(activitygrouprelation.groupname)::text)
  -> Index Scan using users_pkey on users (cost=0.00..0.27 rows=1 width=25)
  Index Cond: (crmentity.smownerid = users.id)


The above query is not using the FTS indexes, and it even hangs the server.

But,


explain
SELECT crmentity.crmid, crmentity.setype, crmentity.modifiedtime,
activity.subject,case when ( users.user_name not like '') then
users.user_name else groups.groupname end as user_name, activity.date_start
FROM crmentity INNER JOIN activity ON crmentity.crmid = activity.activityid
and crmentity.deleted = 0
LEFT JOIN activitygrouprelation ON activitygrouprelation.activityid =
crmentity.crmid
LEFT JOIN groups ON groups.groupname = activitygrouprelation.groupname
LEFT join users ON crmentity.smownerid= users.id
WHERE
to_tsvector(' en', for_fts( activity.subject)) @@ to_tsquery(' en',
replace(' Dhaka University of Bangladesh:*', ' ',':* & '))
ORDER BY crmentity.modifiedtime DESC LIMIT 100

--

 Limit (cost=1.46..1.47 rows=1 width=99) (actual time=0.824..0.824 rows=0
loops=1)
  -> Sort (cost=1.46..1.47 rows=1 width=99) (actual time=0.819..0.819 rows=0
loops=1)
  Sort Key: crmentity.modifiedtime
  Sort Method: quicksort Memory: 17kB
  -> Nested Loop Left Join (cost=0.27..1.45 rows=1 width=99) (actual
time=0.752..0.752 rows=0 loops=1)
  -> Nested Loop Left Join (cost=0.27..1.17 rows=1 width=82) (actual
time=0.750..0.750 rows=0 loops=1)
  -> Nested Loop Left Join (cost=0.27..0.88 rows=1 width=79) (actual
time=0.748..0.748 rows=0 loops=1)
  -> Nested Loop (cost=0.27..0.60 rows=1 width=56) (actual time=0.746..0.746
rows=0 loops=1)
  -> Bitmap Heap Scan on activity (cost=0.27..0.30 rows=1 width=36) (actual
time=0.744..0.744 rows=0 loops=1)
  Recheck Cond: (to_tsvector('en'::regconfig,
regexp_replace((subject)::text,
'(&[^;]+;)|(<[^>]+>)|([\\s\\r\\n\\t]+)'::text, ' '::text,
'gs'::text)) @@ ''' Dhaka'':* & ''univers'':* & ''bangladesh'':*'::tsquery)
  -> Bitmap Index Scan on ftx_en_activity_subject (cost=0.00..0.27 rows=1
width=0) (actual time=0.740..0.740 rows=0 loops=1)
  Index Cond: (to_tsvector('en'::regconfig, regexp_replace((subject)::text,
'(&[^;]+;)|(<[^>]+>)|([\\s\\r\\n\\t]+)'::text, ' '::te
xt, 'gs'::text)) @@ ''' Dhaka'':* & ''univers'':* &
''bangladesh'':*'::tsquery)
  -> Index Scan using crmentity_pkey on crmentity (cost=0.00..0.29 rows=1
width=24) (never executed)
  Index Cond: (crmentity.crmid = activity.activityid)
  Filter: (crmentity.deleted = 0)
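
For reference, a common workaround when an OR combines predicates on two 
different tables is to split the query into a UNION, so that each branch 
can use its own FTS index. A rough, untested sketch that keeps the 
original select list and joins (note that UNION also removes duplicate 
rows, which the OR form would not):

SELECT crmid, setype, modifiedtime, subject, user_name, date_start
FROM (
    SELECT crmentity.crmid, crmentity.setype, crmentity.modifiedtime,
           activity.subject,
           case when (users.user_name not like '') then users.user_name
                else groups.groupname end as user_name,
           activity.date_start
    FROM crmentity
    INNER JOIN activity ON crmentity.crmid = activity.activityid
        AND crmentity.deleted = 0
    LEFT JOIN activitygrouprelation
        ON activitygrouprelation.activityid = crmentity.crmid
    LEFT JOIN groups ON groups.groupname = activitygrouprelation.groupname
    LEFT JOIN users ON crmentity.smownerid = users.id
    WHERE to_tsvector('en', for_fts(activity.subject)) @@
          to_tsquery('en', replace('Dhaka University of Bangladesh:*', ' ', ':* & '))
  UNION
    SELECT crmentity.crmid, crmentity.setype, crmentity.modifiedtime,
           activity.subject,
           case when (users.user_name not like '') then users.user_name
                else groups.groupname end as user_name,
           activity.date_start
    FROM crmentity
    INNER JOIN activity ON crmentity.crmid = activity.activityid
        AND crmentity.deleted = 0
    LEFT JOIN activitygrouprelation
        ON activitygrouprelation.activityid = crmentity.crmid
    LEFT JOIN groups ON groups.groupname = activitygrouprelation.groupname
    LEFT JOIN users ON crmentity.smownerid = users.id
    WHERE to_tsvector('en', for_fts(crmentity.description)) @@
          to_tsquery('en', replace('Dhaka University of Bangladesh:*', ' ', ':* & '))
) AS matches
ORDER BY modifiedtime DESC
LIMIT 100;
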

Re: [PERFORM] Hi- Sleeptime reduction

2010-11-28 Thread Kevin Grittner
aaliya zarrin  wrote:
 
> I am new to Postgres. I just wanted to know how to change the sleep
> time.
> 
> I want to reduce the sleep time, How much will it affect other
> performance issues if sleep time is reduced.
 
I don't know what sleep time you mean.  It would probably be best if
you started from a description of some specific problem rather than a
hypothetical solution.
 
For general advice on tuning, you could start here:
 
http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
 
For ideas on how to submit a request for help with a performance
problem, you might want to review this:
 
http://wiki.postgresql.org/wiki/SlowQueryQuestions
 
-Kevin



[PERFORM] Hi- Sleeptime reduction

2010-11-28 Thread aaliya zarrin
Hi all,

I am new to Postgres. I just wanted to know how to change the sleep time.

I want to reduce the sleep time. How much will it affect other aspects of
performance if the sleep time is reduced?

Please help. I apologize if I am sending this mail to the wrong contact.
Kindly suggest the correct contact details if you know them.


-- 
Thanks & Regards,

Aaliya Zarrin
(+91)-9160665888


Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Pierre C



I pasted the DDL at the beginning of my post.


Ah, sorry, didn't see it ;)

The only indexes the tables have are the ones created because of PK  
constraints. Table drones has around 100k rows. Table drones_history has  
around 30M rows. I'm not sure what additional info you'd want but I'll  
be more than happy to provide more relevant information.


Can you post the following :

- pg version
- output of VACUUM VERBOSE ANALYZE for your 2 tables
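
That is, roughly (table names taken from your original post):

SELECT version();
VACUUM VERBOSE ANALYZE drones;
VACUUM VERBOSE ANALYZE drones_history;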



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Mario Splivalo

On 11/28/2010 07:56 PM, Pierre C wrote:



When I remove foreign constraints (drones_history_fk__samples and
drones_history_fk__drones) (I leave the primary key on drones_history)
then that INSERT, even for 50k rows, takes no more than a second.

So, my question is - is there anything I can do to make INSERTS with
PK faster? Or, since all the reference checking is done inside the
procedure for loading data, shall I abandon those constraints entirely?

Mario


Maybe... or not. Can you post details about :

- the foreign keys
- the tables that are referred to (including indexes)


I pasted the DDL at the beginning of my post. The only indexes the tables 
have are the ones created because of PK constraints. Table drones has around 
100k rows. Table drones_history has around 30M rows. I'm not sure what 
additional info you'd want but I'll be more than happy to provide more 
relevant information.




CREATE TABLE foo (x INTEGER PRIMARY KEY); INSERT INTO foo SELECT * FROM
generate_series( 1,100000 );
Temps : 766,182 ms
test=> VACUUM ANALYZE foo;
Temps : 71,938 ms
test=> CREATE TABLE bar ( x INTEGER REFERENCES foo(x) );
CREATE TABLE
test=> INSERT INTO bar SELECT * FROM generate_series( 1,100000 );
Temps : 2834,430 ms

As you can see, 100.000 FK checks take less than 3 seconds on this very
simple example. There is probably something that needs fixing.



Yes, when the FKyed table is small enough the inserts are quite fast. But 
when it grows larger the whole system slows down.


I just repeated your test and I'm getting similar results - on my 
desktop. I'll try to assemble some code to recreate the workload and see 
if I run into the same problems; a rough sketch of what I have in mind is 
below.
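
Something along these lines (the DDL follows my earlier post, the row 
counts are just the ballpark figures from this thread, and the dummy 
values are arbitrary):

CREATE TABLE samples (
    sample_id integer not null primary key,
    sample_timestamp timestamp not null default now()
);

CREATE TABLE drones (
    drone_id integer not null primary key,
    drone_log_notice character varying,
    drone_coordinates point not null,
    drone_temperature float,
    drone_pressure float
);

CREATE TABLE drones_history (
    drone_id integer not null,
    sample_id integer not null,
    drone_log_notice character varying,
    drone_temperature float,
    drone_pressure float,
    constraint drones_history_pk primary key (drone_id, sample_id),
    constraint drones_history_fk__samples foreign key (sample_id)
        references samples(sample_id),
    constraint drones_history_fk__drones foreign key (drone_id)
        references drones(drone_id)
);

-- ~100k drones, 300 samples, then ~30M history rows (this load itself takes a while)
INSERT INTO drones (drone_id, drone_log_notice, drone_coordinates,
                    drone_temperature, drone_pressure)
SELECT i, 'log', point(0,0), 20.0, 1013.0 FROM generate_series(1, 100000) i;

INSERT INTO samples (sample_id) SELECT i FROM generate_series(1, 300) i;

INSERT INTO drones_history (drone_id, sample_id, drone_log_notice,
                            drone_temperature, drone_pressure)
SELECT d, s, 'log', 20.0, 1013.0
FROM generate_series(1, 100000) d, generate_series(1, 300) s;

VACUUM ANALYZE;

-- now time one more 50k-row batch against a fresh sample, like the slow INSERT
INSERT INTO samples (sample_id) VALUES (301);
\timing on
INSERT INTO drones_history (drone_id, sample_id, drone_log_notice,
                            drone_temperature, drone_pressure)
SELECT d, 301, 'log', 21.0, 1013.0 FROM generate_series(1, 50000) d;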


Mario



Re: [PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Pierre C


When I remove foreign constraints (drones_history_fk__samples and  
drones_history_fk__drones) (I leave the primary key on drones_history)  
then that INSERT, even for 50k rows, takes no more than a second.


So, my question is - is there anything I can do to make INSERTS with PK  
faster? Or, since all the reference checking is done inside the  
procedure for loading data, shall I abandon those constraints entirely?


Mario


Maybe... or not. Can you post details about :

- the foreign keys
- the tables that are referred to (including indexes)


CREATE TABLE foo (x INTEGER PRIMARY KEY); INSERT INTO foo SELECT * FROM  
generate_series( 1,100000 );

Temps : 766,182 ms
test=> VACUUM ANALYZE foo;
Temps : 71,938 ms
test=> CREATE TABLE bar ( x INTEGER REFERENCES foo(x) );
CREATE TABLE
test=> INSERT INTO bar SELECT * FROM generate_series( 1,100000 );
Temps : 2834,430 ms

As you can see, 100.000 FK checks take less than 3 seconds on this very  
simple example. There is probably something that needs fixing.




[PERFORM] SELECT INTO large FKyed table is slow

2010-11-28 Thread Mario Splivalo

The database for monitoring certain drone statuses is quite simple:

CREATE TABLE samples (
sample_id integer not null primary key,
sample_timestamp timestamp not null default now()
);

CREATE TABLE drones (
drone_id integer not null primary key,
drone_log_notice character varying,
drone_coordinates point not null,
drone_temperature float,
drone_pressure float
);

CREATE TABLE drones_history (
drone_id integer not null,
sample_id integer not null,
drone_log_notice character varying,
drone_temperature float,
drone_pressure float,
constraint drones_history_pk primary key (drone_id, sample_id),
	constraint drones_history_fk__samples foreign key (sample_id) 
references samples(sample_id),
	constraint drones_history_fk__drones foreign key (drone_id) references 
drones(drone_id)

);

Every ten to twenty minutes I receive a CSV file with most of the drones' 
statuses. The CSV file includes data for new drones, if they're put into 
use. When I receive new data I load the whole CSV file into the database, 
then call a stored procedure that 'deals' with that data.
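
The load itself is roughly a COPY like this (the file path here is just a 
placeholder, and the column list matches what the queries below expect):

COPY temporary.drones (drone_id, log_notice, coord_x, coord_y, temp, press)
FROM '/path/to/drones_update.csv' WITH CSV;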


So far I have around 6000 samples, around 160k drones and drones_history 
is around 25M rows.


The CSV file contains around 15k-20k 'rows', mostly data about old 
drones. Every now and then (on every 5th - 10th CSV-insert) there is 
data with around 1000-5000 new drones.


Here is what I do in the stored procedure, after I've COPYed the data from 
the CSV into the temporary.drones table:


First, I create a temporary table, inside the procedure, that holds rows 
for the new drones:


CREATE TEMPORARY TABLE tmpNew ON COMMIT DROP AS
SELECT drone_id, log_notice, coord_x, coord_y, temp, press
FROM temporary.drones WHERE NOT EXISTS (SELECT 1 FROM public.drones WHERE 
public.drones.drone_id = temporary.drones.drone_id);


This is done in milliseconds, even if the count of new drones is 
large (I've tested it with 10k new drones, although in real-life operation 
I'd never get more than 5k new drones per CSV).


INSERT INTO public.drones (drone_id, drone_log_notice, drone_coordinates, 
drone_temperature, drone_pressure)
SELECT drone_id, log_notice, point(coord_x, coord_y) as coordinates, 
temp, press FROM tmpNew;
INSERT INTO public.drones_history (sample_id, drone_id, 
drone_log_notice, drone_temperature, drone_pressure)

SELECT a_sample_id, drone_id, log_notice, temp, press FROM tmpNew;

This is also done in milliseconds.

Now, I 'update' the data for the existing drones, and fill in the history 
table for those drones. First I create a temporary table with just the 
changed rows:


CREATE TEMPORARY TABLE tmpUpdate ON COMMIT DROP AS
SELECT a_batch_id, t.drone_id, t.log_notice, t.temp, t.press
   FROM temporary.drones t
   JOIN public.drones p
   ON t.drone_id = p.drone_id
WHERE p.drone_log_notice != t.log_notice OR p.drone_temperature != t.temp 
OR p.drone_pressure != t.press;


Now, that part is also fast. I usually have around 100-1000 drones that 
changed 'state', but sometimes even half of the drones change state 
(around 50k), and creation of tmpUpdate still takes no more than ten to 
twenty milliseconds.


This is the slow part:
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice, 
drone_temperature, drone_pressure)

SELECT * FROM tmpUpdate;

For 100 rows this takes around 2 seconds. For 1000 rows this takes 
around 40 seconds. For 5000 rows this takes around 5 minutes.
For 50k rows this takes around 30 minutes! Now this is where I start to 
lag behind, because I get a new CSV every 10 minutes or so.
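
To see where the time goes I can run the INSERT under EXPLAIN ANALYZE (it 
actually executes the statement, hence the transaction wrapper, and it 
assumes a copy of tmpUpdate is available in the session; the "Trigger for 
constraint ..." lines at the end of the output show the time spent in the 
foreign key checks):

BEGIN;
EXPLAIN ANALYZE
INSERT INTO drones_history (sample_id, drone_id, drone_log_notice,
                            drone_temperature, drone_pressure)
SELECT * FROM tmpUpdate;
ROLLBACK;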


And the last part is to update the actual drones table:
UPDATE public.drones p
SET drone_log_notice = t.log_notice, drone_temperature = t.temp, 
drone_pressure = t.press

FROM temporary.drones t
WHERE t.drone_id = p.drone_id
AND (t.log_notice != p.drone_log_notice OR t.temp != p.drone_temperature 
OR t.press != p.drone_pressure);


This is also very fast; even when almost half the table is updated, the 
UPDATE takes around 10 seconds. Usually it's around 30-50 ms.


The machine I'm doing this on has 4 GB of RAM, dual-Xeon something (3 GHz). 
Two SAS drives in a mirror, capable of around 100 MB/s in sequential r/w 
(I know that means little on its own, but just to give an idea).


The database is around 2 GB in size (pg_database_size). When I dump/recreate 
the database I can speed things up a bit, but after half a day of 
operations the INSERTs are slow again.
When I do a dump/restore of the database I get around 40-50 MB/sec 
reading/writing from the disk (COPYing data, PK/FK constraint creation), 
but when that INSERT gets stuck, io-wait goes through the roof - iostat 
shows that Postgres is mainly reading from the disks, at around 800 kB/s - 1024 kB/s.


I've set shared_buffers to 256M, work_mem to 96M, wal_buffers to 16M and 
checkpoint_segments to 16. I've turned off autovacuum; I do an 
analyze/vacuum after each insert job is done, after TRUNCATEing the 
temporary.drones table.
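
In postgresql.conf terms that is roughly:

shared_buffers = 256MB
work_mem = 96MB
wal_buffers = 16MB
checkpoint_segments = 16
autovacuum = off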


Out of despair I tried to set fsync=off, but that gave me just a small 
perfo

[PERFORM] Simple database, multiple instances?

2010-11-28 Thread Mario Splivalo

I have a simple database schema, containing just three tables:

samples, drones, drones_history.

Now, those tables hold data for the drones for a simulation. Each 
simulation dataset will grow to around 10 GB in around 6 months.


Since the data is not related in any way, I was thinking of separating 
each simulation into its own database. That way it would be much easier 
for me to, at a later date, move some of the databases to other servers 
(when a dataset grows beyond the original server's storage capacity).
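
Something like this, I suppose (the template database and the naming 
scheme are just an illustration):

-- an empty template database holding the three tables
CREATE DATABASE sim_template;
-- (create samples, drones and drones_history inside sim_template)

-- then one database per simulation, cloned from the template
CREATE DATABASE sim_0001 TEMPLATE sim_template;
CREATE DATABASE sim_0002 TEMPLATE sim_template;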


But: at this time I have around 600 simulations, which would mean 
creating 600 databases, and in the future there could very well be around 
5000 simulations. Is Postgres going to have 'issues' with that large a 
number of databases?


Or should I model my system so that each database holds around 100 
simulations?


Mario
