Re: [PERFORM] serious problems with vacuuming databases

2006-04-10 Thread Ahmad Fajar
Hi Tomas,

Tomas wrote:
We've decided to remove unneeded 'old' data, which means removing about
99.999% of rows from tables A, C and D (about 2 GB of data). At the
beginning, the B table (containing data aggregated from A, C and D) was emptied
(dropped and created) and filled in with current data. Then, before the
deletion, the data from tables A, C and D were backed up into other tables
(say A_old, C_old, D_old). The whole procedure was:
1) drop, create and fill table B (aggregated data from A, C, D)
2) copy 'old' data from A, C and D to A_old, C_old a D_old
3) delete old data from A, C, D
4) dump data from A_old, C_old and D_old
5) truncate tables A, C, D
6) vacuum full analyze tables A, C, D, A_old, C_old and D_old


I think you are doing some difficult database maintenance. Why do that, if you
just want to keep only a small piece of the data in your tables? Why don't you
try something like:
1. create table A with no index (don't fill data into this table),
2. create table A_week_year that inherits table A, with the indexes you want, and some
condition for insertion (e.g. table A1 is used for one week of data of a given year,
and so on...)
3. do the same for tables B, C and D
4. if you have relations, define the relations on the inherited tables (optional).

I think you should read the postgresql help, for more information about
table inheritance.

The impact is that you may end up with many tables, but each table will hold only a
small piece of the data, for example just one week. And you don't have to do the
difficult database maintenance you have done: you just need to create the
tables for every week of data, do vacuum/analyze and regular backups.
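A rough, untested sketch of the idea (all names here are made up, adjust them
to your schema):

-- parent table, kept empty; queries go against it
CREATE TABLE a (
    id         bigint,
    entry_time timestamp,
    payload    text
);

-- one child table per week, with a CHECK constraint describing its slice
CREATE TABLE a_2006_w15 (
    CHECK (entry_time >= '2006-04-10' AND entry_time < '2006-04-17')
) INHERITS (a);

CREATE INDEX a_2006_w15_entry_time_idx ON a_2006_w15 (entry_time);

-- getting rid of an old week is then a cheap DROP TABLE of the child,
-- instead of a huge DELETE followed by VACUUM FULL, e.g.:
-- DROP TABLE a_2006_w13;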


Best regards,
ahmad fajar,



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Takes too long to fetch the data from database

2006-04-10 Thread soni de
Hello,

I have difficulty in fetching the records from the database.
The database table contains more than 1 GB of data.
Fetching the records takes more than 1 hour, and that's why it is slowing down the performance.
Please provide some help regarding improving the performance, and how I can run the query so that the records are fetched in less time.


Re: [PERFORM]

2006-04-10 Thread Doron Baranes
Hi,

I attached here a file with details about the tables, the queries and the
explain analyze plans.
Hope this can be helpful to analyze my problem.

10x
Doron

-Original Message-
From: Ragnar [mailto:[EMAIL PROTECTED] 
Sent: Sunday, April 09, 2006 2:37 PM
To: Doron Baranes
Subject: RE: [PERFORM]

On sun, 2006-04-09 at 14:11 +0200, Doron Baranes wrote:

Please reply to the list, not to me directly. This way
others can help you too.

 I did vacuum database analyze a few days ago.

Yes, I saw that in your original post. I mentioned
VACUUM FULL ANALYZE, not just VACUUM ANALYZE.

 I'll attached a few explain plans.

[explain plans deleted]

These are useless. You must show us the output of
EXPLAIN ANALYZE; these are the output of plain EXPLAIN.
A plan is not much use without seeing the query itself.

you still have not answered the question about
what indexes you have.

gnari


TABLES


Table log.msg_info
   Column   |Type |Modifiers
+-+--
 msgid  | bigint  | not null default 
nextval('log.msg_info_msgid_seq'::text)
 sender | character varying(255)  |
 subject| text|
 size   | bigint  |
 entry_time | timestamp without time zone | default now()
 source_ip  | cidr|
 origin | smallint|
Indexes:
msg_info_pkey primary key, btree (msgid)
ddindx btree (date(entry_time))
msg_info_entry_time btree (entry_time)
msg_info_sender_index btree (sender)
msg_info_size btree (size)
msg_info_subject btree (subject)

 Table log.msg_fate
   Column|  Type  |   Modifiers
-++
 msgid   | bigint | not null default 
nextval('log.msg_fate_msgid_seq'::text)
 grp_fate_id | bigint | not null default 
nextval('log.msg_fate_grp_fate_id_seq'::text)
 modid   | integer|
 description | character varying(255) |
 rule_origin | bigint |
 action  | smallint   |
 ruleid  | integer|
Indexes:
msg_fate_pkey primary key, btree (grp_fate_id)
msg_fate_action btree (action)
msg_fate_description btree (description)
msg_fate_modid btree (modid)
msg_fate_msgid btree (msgid)
Foreign-key constraints:
msgid_fkey FOREIGN KEY (msgid) REFERENCES log.msg_info(msgid) ON UPDATE 
CASCADE ON DELETE CASCADE

  Table log.msg_fate_recipients
   Column|Type |   Modifiers
-+-+---
 grp_fate_id | bigint  |
 recipient   | character varying(255)  |
 update_at   | timestamp without time zone | default now()
 last_action | integer |
 zone_id | integer |
 direction   | smallint| default 7
Indexes:
msg_fate_recipients_grp_fate_id btree (grp_fate_id)
msg_fate_recipients_last_action_idx btree (last_action)
msg_fate_recipients_recipient_idx btree (recipient)
msg_fate_recipients_update_at btree (update_at)
msg_fate_recipients_zone_id btree (zone_id)
Triggers:
stats_for_domain AFTER INSERT ON log.msg_fate_recipients FOR EACH ROW 
EXECUTE PROCEDURE log.collect_stats_for_domain()
stats_for_object AFTER INSERT ON log.msg_fate_recipients FOR EACH ROW 
EXECUTE PROCEDURE log.collect_stats_for_object()
update_timestamp_last_action BEFORE UPDATE ON log.msg_fate_recipients FOR 
EACH ROW EXECUTE PROCEDURE log.recipients_status_changed_update()

  Table pineapp.zones
  Column   |  Type  |  Modifiers
---++-
 zone_id   | integer| not null default 
nextval('pineapp.zones_zone_id_seq'::text)
 zone_name | character varying(20)  |
 zone_desc | character varying(255) |
 zone_type | smallint   |
Indexes:
zones_pkey primary key, btree (zone_id)
zones_zone_id btree (zone_id)

QUERIES
***


1)
explain analyze SELECT date_trunc('hour'::text, i.entry_time) AS datetime,
COUNT(fr.grp_fate_id) ,
SUM(i.size)
FROM log.msg_info as i,log.msg_fate as f, log.msg_fate_recipients as fr
WHERE i.origin = 1
AND i.msgid=f.msgid
AND i.entry_time > '2006-01-25'
AND f.grp_fate_id=fr.grp_fate_id
GROUP BY datetime
order by datetime;

                                   QUERY PLAN

[PERFORM] Restore performance?

2006-04-10 Thread Jesper Krogh
Hi 

I'm currently upgrading a Posgresql 7.3.2 database to a
8.1.something-good

I'd run pg_dump | gzip > sqldump.gz on the old system. That took about
30 hours and gave me a 90GB zipped file. Running
cat sqldump.gz | gunzip | psql
into the 8.1 database seems to take about the same time. Are there
any tricks I can use to speed this dump+restore process up?

The database contains quite a lot of BLOBs, thus the size.

Jesper
-- 
./Jesper Krogh, [EMAIL PROTECTED], Jabber ID: [EMAIL PROTECTED]



---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Dump restore performance 7.3 -> 8.1

2006-04-10 Thread Jesper Krogh

Hi

I'm currently upgrading a Posgresql 7.3.2 database to a
8.1.something-good

I'd run pg_dump | gzip > sqldump.gz on the old system. That took about
30 hours and gave me a 90GB zipped file. Running
cat sqldump.gz | gunzip | psql
into the 8.1 database seems to take about the same time. Are there
any tricks I can use to speed this dump+restore process up?

Neither disk-io (viewed using vmstat 1) nor cpu (viewed using top) seems to
be the bottleneck.

The database contains quite a lot of BLOBs, thus the size.

Jesper
-- 
Jesper Krogh


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Restore performance?

2006-04-10 Thread Andreas Pflug

Jesper Krogh wrote:
Hi 


I'm currently upgrading a Posgresql 7.3.2 database to a
8.1.something-good

I'd run pg_dump | gzip > sqldump.gz  on the old system. That took about
30 hours and gave me an 90GB zipped file. Running 
cat sqldump.gz | gunzip | psql 
into the 8.1 database seems to take about the same time. Are there 
any tricks I can use to speed this dump+restore process up? 


If you can have both database systems up at the same time, you could 
pg_dump | psql.


Regards,
Andreas

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Restore performance?

2006-04-10 Thread Jesper Krogh
 If they both took the same amount of time, then you are almost certainly
 bottlenecked on gzip.

 Try a faster CPU or use gzip --fast.

gzip does not seem to be the bottleneck; on restore, psql is the nr. 1
consumer of cpu-time.

Jesper
Sorry for the double post.
-- 
Jesper Krogh


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Restore performance?

2006-04-10 Thread Marcin Mańk
 I'd run pg_dump | gzip > sqldump.gz  on the old system. That took about
 30 hours and gave me an 90GB zipped file. Running 
 cat sqldump.gz | gunzip | psql 
 into the 8.1 database seems to take about the same time. Are there 
 any tricks I can use to speed this dump+restore process up? 
 
 The database contains quite alot of BLOB, thus the size. 

You could try slony - it can do almost-zero-downtime upgrades.

Greetings
Marcin Mank

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] OT: Data structure design question: How do they count

2006-04-10 Thread Richard Huxton

Brendan Duddridge wrote:


Now, initially I thought they would just pre-compute these counts, but 
the problem is, when you click on any of the above attribute values, 
they reduce the remaining possible set of matching products (and set of 
possible remaining attributes and attribute values) by the amount 
displayed next to the attribute value selected. You can click on any 
combination of attribute values to filter down the remaining set of 
matching products, so there's a large combination of paths you can take 
to arrive at a set of products you might be interested in.


Do you think they are pre-computed? Or do you think they might use a 
query similar to the following?:


Pre-computed almost certainly, but at what level of granularity? And 
with application-level caching?



select pav.attribute_value_id, count(p.product_id)
from product_attribute_value pav,
 attribute a,
 product p
where a.attribute_id in (some set of attribute ids) and
pav.product_id = p.product_id and
pav.attribute_id = a.attribute_id and p.product_id in
(select product_id
 from category_product
 where category_id = some category id) and
p.is_active = 'true'
group by pav.attribute_value_id;

It would seem to me that although the above query suggests a normalized 
database structure, that joining with 3 tables plus a 4th table in the 
sub-query with an IN qualifier and grouping to get the product counts 
would take a VERY long time, especially on a possible result set of 
1,260,658 products.


Hmm - I'm not sure I'd say this was necessarily normalised. In the 
example you gave there were three definite types of attribute:

 1. Price range (< 20, 20-50, ...)
 2. Product type (lighting, rugs, ...)
 3. Store (art.com, homeannex, ...)
Your example discards this type information.

I'm also not sure it lets store A sell widgets for 19.99 and B for 25.99

So - let's look at how we might break this down into simple relations:
 product_types (product_id, prod_type, prod_subtype)
 product_availability (product_id, store_id, price_range)
and so on for each set of parameters.

Then, if PG isn't calculating fast enough I'd be tempted to throw in a 
summary table:
 product_counts(store_id, price_range, prod_type, prod_subtype, ..., 
num_products)

Then total over this for the top-level queries.
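For instance, a top-level per-store count could then reduce to something like
this (a sketch, assuming the product_counts summary table above):

SELECT store_id, sum(num_products) AS num_products
  FROM product_counts
 GROUP BY store_id;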

I'd also cache common top-level queries at the application level anyway.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-10 Thread Simon Riggs
On Fri, 2006-04-07 at 19:05 -0400, Tom Lane wrote:

 It's plausible though that we are seeing contention across members of
 the LWLock array, with the semop storm just being a higher-level symptom
 of the real hardware-level problem.  You might try increasing
 LWLOCK_PADDED_SIZE to 64 or even 128, see
 src/backend/storage/lmgr/lwlock.c (this is something that does exist in
 8.1, so it'd be easy to try).

pSeries cache lines are 128 bytes wide, so I'd go straight to 128.

If you're renting all 8 CPUs, I'd drop to 4 and try that instead. With 8
CPUs the contention will vary according to what each CPU is doing at any
one time - when they all hit the contention spot, things will get worse.

The pSeries has good CPUs and great caching, so I'd expect contention to
be somewhat more apparent as a bottleneck.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-10 Thread Gavin Hamill

Simon Riggs wrote:


pSeries cache lines are 128 bytes wide, so I'd go straight to 128.


Hello :)

OK, that line of code is:

#define LWLOCK_PADDED_SIZE  (sizeof(LWLock) <= 16 ? 16 : 32)

What should I change this to? I don't understand the syntax of the <= 16
? : stuff...


would a simple  #define LWLOCK_PADDED_SIZE  128 be sufficient?


If you're renting all 8 CPUs, I'd drop to 4 and try that instead. With 8
CPUs the contention will vary according to what each CPU is doing at any
one time - when they all hit the contention spot, things will get worse.

 

We have a physical machine installed in our rack at the data centre, 
rather than renting a virtual partition of a real machine... I'm not 
sure how to enable/disable CPUs even with the help of 'smitty' :)



The pSeries has good CPUs and great caching, so I'd expect contention to
be somewhat more apparent as a bottleneck.

 


Yep, I expected 32MB of 'L3' cache would yield impressive results :)

Cheers,
Gavin.



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM]

2006-04-10 Thread Ragnar
On mán, 2006-04-10 at 10:30 +0200, Doron Baranes wrote:

 I Attached here a file with details about the tables, the queries and
 the 
 Explain analyze plans.
 Hope this can be helpful to analyze my problem

first query:

 explain analyze SELECT date_trunc('hour'::text, 
 i.entry_time) AS datetime,
 COUNT(fr.grp_fate_id) ,
 SUM(i.size)
 FROM log.msg_info as i,log.msg_fate as f, 
 log.msg_fate_recipients as fr
 WHERE i.origin = 1
 AND i.msgid=f.msgid
 AND i.entry_time > '2006-01-25'
 AND f.grp_fate_id=fr.grp_fate_id
 GROUP BY datetime
 order by datetime;

if i.origin has high selectivity (if very
few rows in msg_info have origin=1 in this
case), an index on msg_info(origin) can help.
unfortunately, as you are using 7.4 and this
is a smallint column, you would have to change
the query slightly to make use of that:
  WHERE i.origin = 1::smallint
if more than a few % of the rows have this value,
then this will not help.
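(for illustration, such an index could be created as, name being arbitrary:
   CREATE INDEX msg_info_origin_idx ON log.msg_info (origin);
 and the rewritten WHERE i.origin = 1::smallint condition should then be able
 to use it)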

the index on msg_info(entry_time) is unlikely
to be used, because a simple '>' comparison
has little selectivity. try to add an upper limit
to the query to make it easier for the planner
to see that few rows would be returned (if that is
the case)
for example:
  AND i.entry_time BETWEEN '2006-01-25'
                       AND '2006-05-01'
this might also improve the estimated number
of groups on datetime (notice: estimated rows=1485233,
real=623), although I am not sure if that will help you

I do not know how good the planner is at dealing
with the date_trunc('hour'::text, i.entry_time),
so possibly you could get some improvement with
an indexed entry_hour column populated with a trigger
or by your application, and change your query to:

explain analyze SELECT i.entry_hour,
COUNT(fr.grp_fate_id) ,
SUM(i.size)
FROM log.msg_info as i,log.msg_fate as f, log.msg_fate_recipients as fr
WHERE i.origin = 1
AND i.msgid=f.msgid
AND i.entry_hour BETWEEN '2006-01-25:00:00'
 AND '2006-05-01:00:00'
AND f.grp_fate_id=fr.grp_fate_id
GROUP BY entry_hour
order by entry_hour;

(adjust the upper limit to your reality)
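in case it helps, a rough and untested sketch of that entry_hour idea
(names are only examples, and it assumes plpgsql is already installed):

ALTER TABLE log.msg_info ADD COLUMN entry_hour timestamp without time zone;

CREATE OR REPLACE FUNCTION log.set_entry_hour() RETURNS trigger AS '
BEGIN
    NEW.entry_hour := date_trunc(''hour'', NEW.entry_time);
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER msg_info_set_entry_hour
    BEFORE INSERT OR UPDATE ON log.msg_info
    FOR EACH ROW EXECUTE PROCEDURE log.set_entry_hour();

-- backfill existing rows once, then index the new column
UPDATE log.msg_info SET entry_hour = date_trunc('hour', entry_time);
CREATE INDEX msg_info_entry_hour_idx ON log.msg_info (entry_hour);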

do these suggestions help at all?

gnari



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Restore performance?

2006-04-10 Thread Tom Lane
Jesper Krogh [EMAIL PROTECTED] writes:
 gzip does not seem to be the bottleneck, on restore is psql the nr. 1
 consumer on cpu-time.

Hm.  We've seen some situations where readline mistakenly decides that
the input is interactive and wastes lots of cycles doing useless
processing (like keeping history).  Try psql -n and see if that helps.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-10 Thread Tom Lane
Gavin Hamill [EMAIL PROTECTED] writes:
 would a simple  #define LWLOCK_PADDED_SIZE  128 be sufficient?

Yeah, that's fine.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Restore performance?

2006-04-10 Thread Rajesh Kumar Mallah
On 4/10/06, Jesper Krogh [EMAIL PROTECTED] wrote:
 Hi
 I'm currently upgrading a Posgresql 7.3.2 database to a
 8.1.something-good
 I'd run pg_dump | gzip > sqldump.gz on the old system. That took about
 30 hours and gave me a 90GB zipped file. Running
 cat sqldump.gz | gunzip | psql
 into the 8.1 database seems to take about the same time. Are there
 any tricks I can use to speed this dump+restore process up?

was the last restore successful?
if so, why do you want to repeat it?

some tips:
1. run the new version of postgres on a different port and pipe pg_dump to psql;
   this may save the CPU time of compression, there is no need for a temporary
   dump file.
   pg_dump | /path/to/psql813 -p 54XX newdb
2. use the new version of pg_dump to dump the old database, as the new version
   is supposed to be wiser.
3. make sure you are trapping the restore errors properly;
   psql newdb 2>&1 | cat | tee err works for me.

 The database contains quite a lot of BLOBs, thus the size.
 Jesper
 --
 ./Jesper Krogh, [EMAIL PROTECTED], Jabber ID: [EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Restore performance?

2006-04-10 Thread Rajesh Kumar Mallah
sorry for the post, I didn't see the other replies until after posting.

On 4/10/06, Rajesh Kumar Mallah [EMAIL PROTECTED] wrote:
 [same message as above, quoted in full]

---(end of broadcast)---
TIP 6: explain analyze is your friend




Re: [PERFORM] Takes too long to fetch the data from database

2006-04-10 Thread Rajesh Kumar Mallah
what is the query?
use LIMIT or a restricting where clause.

regds
mallah.

On 4/10/06, soni de [EMAIL PROTECTED] wrote:
 Hello,

I have difficulty in fetching the records from the database.
Database table contains more than 1 GB data.
For fetching the records it is taking more the 1 hour and that's why it is slowing down the performance.
please provide some help regarding improving the performance and how do I run query so that records will be fetched in a less time.




Re: [PERFORM] Restore performance?

2006-04-10 Thread Rajesh Kumar Mallah
4. fsync can also be turned off while loading a huge dataset, but seek others'
comments too (and study the docs) as I am not sure about the reliability.
I think it can make a lot of difference.

On 4/10/06, Jesper Krogh [EMAIL PROTECTED] wrote:
 Rajesh Kumar Mallah wrote:
   I'd run pg_dump | gzip > sqldump.gz on the old system. That took about
   30 hours and gave me a 90GB zipped file. Running
   cat sqldump.gz | gunzip | psql
   into the 8.1 database seems to take about the same time. Are there
   any tricks I can use to speed this dump+restore process up?
  was the last restore successful? if so why do you want to repeat?

 about the same time == Estimated guess from restoring a few tables.
 I was running a test run, without disabling updates to the production
 database; the real run is scheduled for Easter when there hopefully are
 no users on the system. So I need to repeat it, I'm just trying to get a
 feeling for how long I need to allocate for the operation.

  1. run new version of postgres in a different port and pipe pg_dump to psql
  this may save the CPU time of compression, there is no need for a temporary
  dump file. pg_dump | /path/to/psql813 -p 54XX newdb

 I'll do that. It is a completely different machine anyway.

  2. use new version of pg_dump to dump the old database as the new version
  is supposed to be wiser.

 Check.

  3. make sure you are trapping the restore errors properly
  psql newdb 2>&1 | cat | tee err works for me.

 That's noted.
 --
 Jesper Krogh, [EMAIL PROTECTED]


Re: [PERFORM] Restore performance?

2006-04-10 Thread Alvaro Herrera
Rajesh Kumar Mallah wrote:
 4. fsync can also be turned off while loading huge dataset ,
 but seek others comments  too (as study docs) as i am not sure about the
 reliability. i think it can make a lot of difference.

Also be sure to increase maintenance_work_mem so that index creation
goes faster.
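For example, just for the restore session (in 8.1 the value is in kilobytes;
512 MB here is only an illustration):

SET maintenance_work_mem = 524288;   -- roughly 512 MB, for this session only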

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Better index stategy for many fields with few values

2006-04-10 Thread Oscar Picasso
Hi,

I want to optimize something like this.

- My items table:
code int              -- can take one of 100 values
property varchar(250) -- can take one of 5000 values
param01 char(10)      -- can take one of 10 values
param02 char(10)      -- can take one of 10 values
...
[ 20 similar columns ]
...
param20 char(10)      -- can take one of 10 values

- The kind of query I want to optimize:
select * from items
where code between 5 and 22
and param01 = 'P'
and param02 = 'G'
...
[ all the 20 paramXX columns are used in the query ]
...
and param20 = 'C';

How can I optimize this kind of query? I was thinking about using a
multicolumn index, but I have read that we should limit multicolumn indexes
to at most 2 or 3 columns. If that's true then 22 columns for a multicolumn
index seems way too much. Or maybe it is workable as every column uses only a
very limited set of values?

I was also thinking about using a functional index.

What do you think would be the best solution in such a case?

Thanks.
Oscar

Re: [PERFORM] Better index stategy for many fields with few values

2006-04-10 Thread PFC



- My items table:
code int  -- can take one of 100 values
property varchar(250) -- can take one of 5000 values
param01 char(10)  -- can take one of 10 values
param02 char(10)  -- can take one of 10 values
...
[ 20 similar columns }
...
parama20 char(10) -- can take one of 10 values


	Instead of 20 columns, you could instead use a param field containing  
an array of 20 TEXT fields.
	Then create a simple index on (code, param) and SELECT WHERE code BETWEEN  
... AND param = '{P,G,,C}'
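	A rough sketch of that layout (names invented; on older releases, where
arrays have no default btree operator class, you may need the functional-index
variant mentioned below instead):

CREATE TABLE items (
    code     int,
    property varchar(250),
    params   text[]          -- params[1] .. params[20] instead of 20 columns
);

CREATE INDEX items_code_params_idx ON items (code, params);

-- all 20 parameter values spelled out in the array literal:
SELECT *
  FROM items
 WHERE code BETWEEN 5 AND 22
   AND params = '{P,G,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,Q,R,S}'::text[];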


	If you don't want to modify your structure, you can create a functional  
index on an array {param1...param20}, but your queries will be a bit  
uglier.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Takes too long to fetch the data from database

2006-04-10 Thread Joshua D. Drake

Rajesh Kumar Mallah wrote:


what is the query ?
use LIMIT or a restricting where clause.


You could also use a cursor.
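Something along these lines, for example (a sketch; the table, column and
cursor names are placeholders):

BEGIN;
DECLARE big_cur CURSOR FOR
    SELECT * FROM your_big_table ORDER BY some_column;
FETCH 1000 FROM big_cur;   -- process this batch in the application
FETCH 1000 FROM big_cur;   -- then the next batch, until no rows come back
CLOSE big_cur;
COMMIT;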

Joshua D. Drake



regds
mallah.

On 4/10/06, soni de [EMAIL PROTECTED] wrote:

Hello,
 
I have difficulty in fetching the records from the database.

Database table contains more than 1 GB data.
For fetching the records it is taking more the 1 hour and that's why
it is slowing down the performance.
please provide some help regarding improving the performance and how
do I run query so that records will be fetched in a less time.





--

=== The PostgreSQL Company: Command Prompt, Inc. ===
  Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
  Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Restore performance?

2006-04-10 Thread Vivek Khera


On Apr 10, 2006, at 3:55 AM, Jesper Krogh wrote:

I'd run pg_dump | gzip > sqldump.gz  on the old system. That took  
about

30 hours and gave me an 90GB zipped file. Running
cat sqldump.gz | gunzip | psql
into the 8.1 database seems to take about the same time. Are there
any tricks I can use to speed this dump+restore process up?

The database contains quite alot of BLOB, thus the size.


Well, your pg_dump command lost your BLOBs since the plain text  
format doesn't support them.


But once you use the -Fc format on your dump and enable blob backups,  
you can speed up reloads by increasing your checkpoint segments to a  
big number like 256 and the checkpoint timeout to something like 10  
minutes.  All other normal tuning parameters should be what you plan  
to use for your normal operations, too.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pg 8.1.3, AIX, huge box, painfully slow.

2006-04-10 Thread Brad Nicholson
Tom Lane wrote:

 This is unfortunately not going to help you as far as getting that
 machine into production now (unless you're brave enough to run CVS tip
 as production, which I certainly am not).  I'm afraid you're most likely
 going to have to ship that pSeries back at the end of the month, but
 while you've got it it'd be awfully nice if we could use it as a testbed

We have PSeries boxes here that won't be going away anytime soon.  If
there are any specific test cases that need to run, I should be able to
find the time to do it.

-- 
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] OT: Data structure design question: How do they count so fast?

2006-04-10 Thread Brendan Duddridge

Hi Richard (and anyone else who want's to comment!),

I'm not sure it will really work pre-computed. At least not in an  
obvious way (for me! :-)) It's fine to display a pre-computed list of  
product counts for the initial set of attribute and attribute values,  
but we need to be able to display the counts of products for any  
combination of selected attribute values.


Once an attribute value is picked that reduces the set of products to  
a small enough set, the queries are fast, so, perhaps we just need to  
pre-compute the counts for combinations of attribute values that lead  
to a high count of products. Hence, our thought on building a  
decision-tree type data structure. This would store the various  
combinations of attributes and attribute values, along with the  
product count, along with a list of attributes and attribute values  
that are applicable for the current set of selected attribute values.  
This sounds like it could get rather complicated, so we were hoping  
someone might have an idea on a much simpler solution.


Thanks,


Brendan Duddridge | CTO | 403-277-5591 x24 |  [EMAIL PROTECTED]

ClickSpace Interactive Inc.
Suite L100, 239 - 10th Ave. SE
Calgary, AB  T2G 0V9

http://www.clickspace.com

On Apr 10, 2006, at 3:23 AM, Richard Huxton wrote:


Brendan Duddridge wrote:
Now, initially I thought they would just pre-compute these counts,  
but the problem is, when you click on any of the above attribute  
values, they reduce the remaining possible set of matching  
products (and set of possible remaining attributes and attribute  
values) by the amount displayed next to the attribute value  
selected. You can click on any combination of attribute values to  
filter down the remaining set of matching products, so there's a  
large combination of paths you can take to arrive at a set of  
products you might be interested in.
Do you think they are pre-computed? Or do you think they might use  
a query similar to the following?:


Pre-computed almost certainly, but at what level of granularity?  
And with application-level caching?



select pav.attribute_value_id, count(p.product_id)
from product_attribute_value pav,
 attribute a,
 product p
where a.attribute_id in (some set of attribute ids) and
pav.product_id = p.product_id and
pav.attribute_id = a.attribute_id and p.product_id in
(select product_id
 from category_product
 where category_id = some category id) and
p.is_active = 'true'
group by pav.attribute_value_id;
It would seem to me that although the above query suggests a  
normalized database structure, that joining with 3 tables plus a  
4th table in the sub-query with an IN qualifier and grouping to  
get the product counts would take a VERY long time, especially on  
a possible result set of 1,260,658 products.


Hmm - I'm not sure I'd say this was necessarily normalised. In the  
example you gave there were three definite types of attribute:

 1. Price range (< 20, 20-50, ...)
 2. Product type (lighting, rugs, ...)
 3. Store (art.com, homeannex, ...)
Your example discards this type information.

I'm also not sure it lets store A sell widgets for 19.99 and B for  
25.99


So - let's look at how we might break this down into simple relations:
 product_types (product_id, prod_type, prod_subtype)
 product_availability (product_id, store_id, price_range)
and so on for each set of parameters.

Then, if PG isn't calculating fast enough I'd be tempted to throw  
in a summary table:
 product_counts(store_id, price_range, prod_type,  
prod_subtype, ..., num_products)

Then total over this for the top-level queries.

I'd also cache common top-level queries at the applicaton level  
anyway.


--
  Richard Huxton
  Archonet Ltd





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] bad performance on Solaris 10

2006-04-10 Thread Chris Mair

  Chris,
  Just to make sure the x4100 config is similar to your Linux system, can 
  you verify the default setting for disk write cache and make sure they 
  are both enabled or disabled. Here's how to check in Solaris.
  As root, run format -e -> pick a disk -> cache -> write_cache -> display
  
  Not sure how to do it on Linux though!
  
  Regards,
  -Robert
 
 I don't have access to the machine for the next few days due to eh...
 let's call it firewall accident ;), but it might very well be that it
 was off on the x4100 (I know it's on the smaller Linux box).
 
 That together with the bad default sync method can definitely explain
 the strangely slow out of box performance I got.
 
 So thanks again for explaining this to me :)
 
 Bye, Chris.

Just for completeness:
I checked now using the above commands and can confirm the write cache
was disabled on the x4100 and was on on Linux. 

Bye, Chris.






---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Encouraging multi-table join order

2006-04-10 Thread Tom Lane
Dan Harris [EMAIL PROTECTED] writes:
 I have a query that is intended to select from multiple small tables 
 to get a limited subset of incidentid and then join with a very 
 large table.  One of the operations will require a sequential scan, but 
 the planner is doing the scan on the very large table before joining the 
 small ones, resulting in a huge amount of disk I/O.  How would I make 
 this query join the large table only after narrowing down the possible 
 selections from the smaller tables?  This is running on version 8.0.3.

That's very strange --- the estimated cost of the seqscan is high enough
that the planner should have chosen a nestloop with inner indexscan on
the big table.  I'm not sure about the join-order point, but the hash
plan for the first join seems wrong in any case.

Um, you do have an index on eventactivity.incidentid, right?  What's the
datatype(s) of the incidentid columns?  What happens to the plan if you
turn off enable_hashjoin and enable_mergejoin?
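For the record, that test is just a pair of session-local settings, e.g.:

SET enable_hashjoin = off;
SET enable_mergejoin = off;
-- re-run the EXPLAIN ANALYZE of the query here, then put things back:
RESET enable_hashjoin;
RESET enable_mergejoin;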

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Encouraging multi-table join order

2006-04-10 Thread Dan Harris

Tom Lane wrote:

That's very strange --- the estimated cost of the seqscan is high enough
that the planner should have chosen a nestloop with inner indexscan on
the big table.  I'm not sure about the join-order point, but the hash
plan for the first join seems wrong in any case.

Um, you do have an index on eventactivity.incidentid, right?  What's the
datatype(s) of the incidentid columns?  What happens to the plan if you
turn off enable_hashjoin and enable_mergejoin?

regards, tom lane
  
Yes, eventactivity.incidentid is indexed.  The datatype is varchar(40).  
Although, by checking this, I noticed that k_h.incidentid was 
varchar(100).  Perhaps the difference in length between the keys caused 
the planner to not use the fastest method?  I have no defense as to why 
those aren't the same.. I will make them so and check.


Here's the EXPLAIN analyze with enable_hashjoin = off and 
enable_mergejoin = off :


 Limit  (cost=4226535.73..4226544.46 rows=698 width=82) (actual time=74339.016..74356.521 rows=888 loops=1)
   ->  Unique  (cost=4226535.73..4226544.46 rows=698 width=82) (actual time=74339.011..74354.073 rows=888 loops=1)
         ->  Sort  (cost=4226535.73..4226537.48 rows=698 width=82) (actual time=74339.003..74344.031 rows=3599 loops=1)
               Sort Key: eventmain.entrydate, eventmain.incidentid, eventgeo.eventlocation, eventactivity.recordtext
               ->  Nested Loop  (cost=0.00..4226502.76 rows=698 width=82) (actual time=921.325..74314.959 rows=3599 loops=1)
                     ->  Nested Loop  (cost=0.00..4935.61 rows=731 width=72) (actual time=166.354..14638.308 rows=1162 loops=1)
                           ->  Nested Loop  (cost=0.00..2482.47 rows=741 width=50) (actual time=150.396..7348.013 rows=1162 loops=1)
                                 ->  Index Scan using k_h_id_idx on k_h  (cost=0.00..217.55 rows=741 width=14) (actual time=129.540..1022.243 rows=1162 loops=1)
                                       Index Cond: (id = 33396)
                                       Filter: ((entrydate >= '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate < '2006-04-08 00:00:00'::timestamp without time zone))
                                 ->  Index Scan using eventgeo_incidentid_idx on eventgeo  (cost=0.00..3.04 rows=1 width=36) (actual time=5.260..5.429 rows=1 loops=1162)
                                       Index Cond: ((eventgeo.incidentid)::text = ("outer".incidentid)::text)
                           ->  Index Scan using eventmain_incidentid_idx on eventmain  (cost=0.00..3.30 rows=1 width=22) (actual time=5.976..6.259 rows=1 loops=1162)
                                 Index Cond: ((eventmain.incidentid)::text = ("outer".incidentid)::text)
                     ->  Index Scan using eventactivity1 on eventactivity  (cost=0.00..5774.81 rows=20 width=52) (actual time=29.768..51.334 rows=3 loops=1162)
                           Index Cond: (("outer".incidentid)::text = (eventactivity.incidentid)::text)
                           Filter: ((((' '::text || (recordtext)::text) || ' '::text) ~~ '%HAL%'::text) AND (entrydate >= '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate < '2006-04-08 00:00:00'::timestamp without time zone))




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Encouraging multi-table join order

2006-04-10 Thread Tom Lane
Dan Harris [EMAIL PROTECTED] writes:
 Yes, eventactivity.incidentid is indexed.  The datatype is varchar(40).  
 Although, by checking this, I noticed that k_h.incidentid was 
 varchar(100).  Perhaps the difference in length between the keys caused 
 the planner to not use the fastest method?

No, the planner wouldn't care about that.

 Here's the EXPLAIN analyze with enable_hashjoin = off and 
 enable_mergejoin = off :

OK, so it does consider the right plan, but it's estimating it'll take
longer than the other one.  One thing that's very strange is that the
estimated number of rows out has changed ... did you re-ANALYZE since
the previous message?

  ->  Index Scan using eventactivity1 on eventactivity
      (cost=0.00..5774.81 rows=20 width=52) (actual time=29.768..51.334 rows=3 loops=1162)
        Index Cond: (("outer".incidentid)::text = (eventactivity.incidentid)::text)
        Filter: ((((' '::text || (recordtext)::text) || ' '::text) ~~ '%HAL%'::text) AND (entrydate >= '2006-01-01 00:00:00'::timestamp without time zone) AND (entrydate < '2006-04-08 00:00:00'::timestamp without time zone))

So it's estimating 5775 cost units per probe into eventactivity, which
is pretty high --- it must think that a lot of rows will be retrieved by
the index (way more than the 20 or so it thinks will get past the filter
condition).  What does the pg_stats entry for eventactivity.incidentid
contain?  It might be worth increasing the statistics target for that
column to try to get a better estimate.
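For example (200 is just a starting guess for the target, and the pg_stats
query is only to eyeball the current estimates):

SELECT null_frac, n_distinct, correlation
  FROM pg_stats
 WHERE tablename = 'eventactivity' AND attname = 'incidentid';

ALTER TABLE eventactivity ALTER COLUMN incidentid SET STATISTICS 200;
ANALYZE eventactivity;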

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] pgmemcache

2006-04-10 Thread C Storm
I was wondering if anyone on the list has a successful installation of
pgmemcache running
that uses LISTEN/NOTIFY to signal a successfully completed transaction,
i.e., to get around the fact
that TRIGGERS are transaction unaware.  Or perhaps any other
information regarding a successful
deployment of pgmemcache.

The information available on the web/groups is pretty scant. Any
information would be greatly appreciated!


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] slow IN clause

2006-04-10 Thread FavoYang
I have a slow sql:
SELECT * FROM mytable WHERE id IN (1,3,5,7,3k here...);
mytable is about 10k rows.

If I don't use the IN clause, it will cost 0.11 second; otherwise it
will cost 2.x seconds.
I guess pg uses a linear search to deal with the IN clause. Is there any way
to let pg use another search method with the IN clause? (e.g. binary search or
hash search)


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] slow IN clause

2006-04-10 Thread Vinko Vrsalovic
On lun, 2006-04-10 at 12:44 +0800, Qingqing Zhou wrote: 
 [EMAIL PROTECTED] wrote
  I have a slow sql:
  SELECT * FROM mytable WHERE id IN (1,3,5,7,3k here...);
  mytable is about 10k rows.
 
  if don't use the IN clause, it will cost 0,11 second, otherwise it
  will cost 2.x second
  I guess pg use linear search to deal with IN clause, is there any way
  to let pg use other search method with IN clause? (ex.Binary Search or
  hash Search)
 
 
 If you can put (1, 3, .., 3k) in a table, PG may choose a hash join.

And maybe using

SELECT * FROM yourtable WHERE id < 6002 AND id % 2 = 1;

turns out to be faster, if we are allowed to extrapolate from the
example. 
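And a sketch of the put-the-ids-in-a-table variant suggested above
(illustrative names only):

CREATE TEMP TABLE wanted_ids (id int PRIMARY KEY);
INSERT INTO wanted_ids VALUES (1);
INSERT INTO wanted_ids VALUES (3);
INSERT INTO wanted_ids VALUES (5);
-- ... and so on for the rest of the ids, or COPY them in
ANALYZE wanted_ids;

SELECT m.*
  FROM mytable m
  JOIN wanted_ids w ON w.id = m.id;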

V.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org