Re: [PERFORM] database bloat, non removable rows, slow query etc...

2006-09-01 Thread Patrick Hatcher
Are there open transactions on the table in question?  We had the same
issue.  A 100K row table was so bloated that the system thought there were
1M rows.  We had many IDLE transactions that we noticed in top, but since
we could not track down which process or user was holding the table, we had
to restart Pg.  Once restarted we were able to do a VACUUM FULL and this
took care of the issue.
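
A minimal sketch, assuming an 8.1 server with stats_command_string enabled (an
editor's illustration, not a query posted in the thread), for spotting the
offending "idle in transaction" backends before resorting to a restart:

-- sessions stuck "<IDLE> in transaction" hold back VACUUM's dead-row cleanup;
-- current_query is only populated when stats_command_string = on
SELECT procpid, usename, datname, query_start
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;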
hth
Patrick Hatcher
Development Manager  Analytics/MIO
Macys.com



   
From: Matteo Sgalaberni [EMAIL PROTECTED]
Sent by: pgsql-performance [EMAIL PROTECTED]
To: pgsql-performance@postgresql.org
Date: 09/01/06 05:39 AM
Subject: [PERFORM] database bloat, non removable rows, slow query etc...




Hi, this is probably a very frequently asked question... I read the archives of
this list but didn't find a definitive solution for this issue. I'll
explain my situation.

PostgreSQL version 8.1.3
fsm and related settings left at their defaults
autovacuum and statistics collection enabled

22 daemons hold persistent connections to this database (all
connections are idle, with no transactions open).

this is the vacuum output for a table that is updated frequently:
database=# VACUUM ANALYZE verbose cliente;
INFO:  vacuuming public.cliente
INFO:  index cliente_pkey now contains 29931 row versions in 88 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index cliente_login_key now contains 29931 row versions in 165
pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.01s/0.00u sec elapsed 0.00 sec.
INFO:  cliente: found 0 removable, 29931 nonremovable row versions in 559
pages
DETAIL:  29398 dead row versions cannot be removed yet.
There were 9 unused item pointers.
0 pages are entirely empty.
CPU 0.01s/0.01u sec elapsed 0.01 sec.
INFO:  vacuuming pg_toast.pg_toast_370357
INFO:  index pg_toast_370357_index now contains 0 row versions in 1 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  pg_toast_370357: found 0 removable, 0 nonremovable row versions in
0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing public.cliente
INFO:  cliente: scanned 559 of 559 pages, containing 533 live rows and
29398 dead rows; 533 rows in sample, 533 estimated total rows
VACUUM

database=# SELECT * FROM pgstattuple('cliente');
 table_len | tuple_count | tuple_len | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space | free_percent
-----------+-------------+-----------+---------------+------------------+----------------+--------------------+------------+--------------
   4579328 |         533 |     84522 |          1.85 |            29398 |        4279592 |              93.45 |      41852 |         0.91
(1 row)

The performance of this table is now degraded, and autovacuum/VACUUM FULL
do not remove these dead tuples.  Only a CLUSTER of the table removes them.

The same problem occurs on other heavily trafficked tables.

I think the problem is probably one of these:
- the fsm and related settings in postgresql.conf need tuning, but I don't
understand how to tune them correctly.
- the persistent connections to this database conflict with
autovacuum, but I don't understand why: there are no open transactions,
only connections in the idle state.

Tell me what you think...

Regards,

Matteo





Re: [PERFORM] database model tshirt sizes

2006-03-19 Thread Patrick Hatcher
We have size and color in the product table itself.  It is really an
attribute of the product.  If you update the availability of the product
often, I would split out the quantity into a separate table so that you can
truncate and update as needed.
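
A minimal sketch of the split described above (table and column names are
illustrative assumptions, not taken from the thread):

-- descriptive attributes stay on product; stock lives in its own table so it
-- can be truncated and reloaded without touching the product rows
CREATE TABLE product (
    productid   int8 PRIMARY KEY,
    productname varchar(255)
);

CREATE TABLE product_stock (
    productid int8 REFERENCES product (productid),
    size      varchar(100) NOT NULL DEFAULT 'ONE SIZE',  -- single-size products get one row
    quantity  int4 NOT NULL,
    PRIMARY KEY (productid, size)
);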

Patrick Hatcher
Development Manager  Analytics/MIO
Macys.com


   
From: NbForYou [EMAIL PROTECTED]
Sent by: pgsql-performance [EMAIL PROTECTED]
To: pgsql-performance@postgresql.org
Date: 03/18/06 07:03 AM
Subject: [PERFORM] database model tshirt sizes




Hello,

Does anybody know how to build a database model to include sizes for rings,
tshirts, etc?


the current database is built like:

table product
=

productid int8 PK
productname varchar(255)
quantity int4


What I want now is to handle the case where a product has multiple sizes
available (not all products do). The sizes are stored in the database. I was
thinking of adding an extra table:

table sizes:

productid int8 FK
size varchar(100)


but then I have a quantity problem, because the per-size quantities cannot
be stored in this table: quantity already exists in my product table.

How do professionals do it? How do they build their model to include sizes
where they are available?




Re: [PERFORM] Slow query. Any way to speed up?

2006-01-06 Thread Patrick Hatcher
Duh sorry.  We will eventually move to 8.x, it's just a matter of finding
the time:

Explain analyze
Select gmmid, gmmname, divid, divname, feddept, fedvend,itemnumber as
mstyle,amc_week_id,
sum(tran_itm_total) as net_dollars

FROM
public.tbldetaillevel_report a2 join  cdm.cdm_ddw_tran_item a1  on
a1.item_upc = a2.upc
join public.date_dim a3 on a3.date_dim_id = a1.cal_date
where
a3.date_dim_id between '2005-10-30' and '2005-12-31'
and
a1.appl_id  in  ('MCOM','NET')
and
 a1.tran_typ_id in ('S','R')
group by 1,2,3,4,5,6,7,8
order by 1,2,3,4,5,6,7,8


GroupAggregate  (cost=1648783.47..1650793.74 rows=73101 width=65) (actual time=744556.289..753136.278 rows=168343 loops=1)
  ->  Sort  (cost=1648783.47..1648966.22 rows=73101 width=65) (actual time=744556.236..746634.566 rows=1185096 loops=1)
        Sort Key: a2.gmmid, a2.gmmname, a2.divid, a2.divname, a2.feddept, a2.fedvend, a2.itemnumber, a3.amc_week_id
        ->  Merge Join  (cost=1598067.59..1642877.78 rows=73101 width=65) (actual time=564862.772..636550.484 rows=1185096 loops=1)
              Merge Cond: ("outer".upc = "inner".item_upc)
              ->  Index Scan using report_upc_idx on tbldetaillevel_report a2  (cost=0.00..47642.36 rows=367309 width=58) (actual time=82.512..65458.137 rows=365989 loops=1)
              ->  Sort  (cost=1598067.59..1598250.34 rows=73100 width=23) (actual time=564764.506..566529.796 rows=1248862 loops=1)
                    Sort Key: a1.item_upc
                    ->  Hash Join  (cost=94.25..1592161.99 rows=73100 width=23) (actual time=493500.913..548924.039 rows=1248851 loops=1)
                          Hash Cond: ("outer".cal_date = "inner".date_dim_id)
                          ->  Seq Scan on cdm_ddw_tran_item a1  (cost=0.00..1547562.88 rows=8754773 width=23) (actual time=14.219..535704.691 rows=10838135 loops=1)
                                Filter: ((((appl_id)::text = 'MCOM'::text) OR ((appl_id)::text = 'NET'::text)) AND ((tran_typ_id = 'S'::bpchar) OR (tran_typ_id = 'R'::bpchar)))
                          ->  Hash  (cost=94.09..94.09 rows=64 width=8) (actual time=362.953..362.953 rows=0 loops=1)
                                ->  Index Scan using date_date_idx on date_dim a3  (cost=0.00..94.09 rows=64 width=8) (actual time=93.710..362.802 rows=63 loops=1)
                                      Index Cond: ((date_dim_id >= '2005-10-30'::date) AND (date_dim_id <= '2005-12-31'::date))
Total runtime: 753467.847 ms
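
An editor's hedged aside, not advice given in the thread: a large share of the
runtime above goes into sorting ~1.2M rows, so on 7.4 it can be worth checking
whether a bigger per-sort memory allowance helps; the value below is purely
illustrative.

-- raise per-sort memory for this session only, then re-run the EXPLAIN ANALYZE
-- above (7.4 calls this sort_mem; later releases renamed it work_mem)
SET sort_mem = 131072;   -- in KB, i.e. 128 MB per sort for this session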


Patrick Hatcher
Development Manager  Analytics/MIO
Macys.com
415-422-1610



   
From: Tom Lane [EMAIL PROTECTED]
To: Patrick Hatcher [EMAIL PROTECTED]
cc: pgsql-performance@postgresql.org
Date: 01/05/06 09:07 PM
Subject: Re: [PERFORM] Slow query. Any way to speed up?




Patrick Hatcher [EMAIL PROTECTED] writes:
 The following SQL takes 4+ mins to run.  I have indexes on all join
fields
 and I've tried rearranging the table orders but haven't had any luck.

Please show EXPLAIN ANALYZE output, not just EXPLAIN.  It's impossible
to tell whether the planner is making any wrong guesses when you can't
see the actual times/rowcounts ...

(BTW, 7.4 is looking pretty long in the tooth.)

 regards, tom lane





Re: [PERFORM] slow update

2005-10-13 Thread Patrick Hatcher
Thanks.  No foreign keys, and I've been bitten by mismatched datatypes before,
so I checked that before sending out the message :)

Patrick Hatcher
Development Manager  Analytics/MIO
Macys.com




   
From: Tom Lane [EMAIL PROTECTED]
To: Patrick Hatcher [EMAIL PROTECTED]
cc: postgres performance list pgsql-performance@postgresql.org
Date: 10/13/2005 11:34 AM
Subject: Re: [PERFORM] slow update




Patrick Hatcher [EMAIL PROTECTED] writes:
 Pg 7.4.5

 Trying to do a update of fields on 23M row database.
 Is it normal for this process to take 16hrs and still clocking?

Are there foreign keys pointing at the table being updated?  If so,
failure to index the referencing columns could create this sort of
performance problem.  Also, in 7.4 you'd better be sure the referencing
columns are the same datatype as the referenced column.

 regards, tom lane
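
For readers who do hit the case Tom describes, a minimal hedged illustration
(the table and column names are hypothetical; this particular thread turned out
to involve no foreign keys):

-- without an index on the referencing column, each UPDATE/DELETE on the parent
-- table forces a sequential scan of the child table for the FK check
CREATE INDEX child_parent_id_idx ON child_table (parent_id);
ANALYZE child_table;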





[PERFORM] Poor SQL performance

2005-09-02 Thread Patrick Hatcher




Hey there folks.  I'm at a loss as to how to increase the speed of this
query.  It's something I need to run each day, but can't at the rate this
runs.  Tables are updated once a day and are vacuum analyzed after each load.

select ddw_tran_key, r.price_type_id, t.price_type_id
from
cdm.cdm_ddw_tran_item_header h JOIN cdm.cdm_ddw_tran_item t
on t.appl_xref=h.appl_xref
JOIN
mdc_upc u ON
u.upc = t.item_upc
JOIN
mdc_price_history r
ON
r.upc_id = u.keyp_upc and date(r.site_timestamp) = h.first_order_date
where
cal_date = '2005-08-31'
and
h.appl_id= 'MCOM'
and tran_typ_id='S'
limit 1000


My explain is just horrendous:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=288251.71..342657.36 rows=258 width=14)
   ->  Merge Join  (cost=288251.71..342657.36 rows=258 width=14)
         Merge Cond: (("outer".appl_xref)::text = "inner"."?column6?")
         Join Filter: (date("inner".site_timestamp) = "outer".first_order_date)
         ->  Index Scan using cdm_ddw_tran_item_header_pkey on cdm_ddw_tran_item_header h  (cost=0.00..51188.91 rows=789900 width=21)
               Filter: ((appl_id)::text = 'MCOM'::text)
         ->  Sort  (cost=288251.71..288604.31 rows=141038 width=39)
               Sort Key: (t.appl_xref)::text
               ->  Hash Join  (cost=29708.54..276188.93 rows=141038 width=39)
                     Hash Cond: ("outer".upc_id = "inner".keyp_upc)
                     ->  Seq Scan on mdc_price_history r  (cost=0.00..189831.09 rows=11047709 width=16)
                     ->  Hash  (cost=29698.81..29698.81 rows=3892 width=31)
                           ->  Nested Loop  (cost=0.00..29698.81 rows=3892 width=31)
                                 ->  Index Scan using cdm_ddw_tran_item_cal_date on cdm_ddw_tran_item t  (cost=0.00..14046.49 rows=3891 width=35)
                                       Index Cond: (cal_date = '2005-08-31'::date)
                                       Filter: (tran_typ_id = 'S'::bpchar)
                                 ->  Index Scan using mdcupcidx on mdc_upc u  (cost=0.00..4.01 rows=1 width=12)
                                       Index Cond: (u.upc = "outer".item_upc)
(18 rows)



What I found is that if I change the line:
r.upc_id = u.keyp_upc and date(r.site_timestamp) = h.first_order_date

To
r.upc_id = u.keyp_upc

My query plan drops to:
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=33327.39..37227.51 rows=1000 width=14)
   ->  Hash Join  (cost=33327.39..279027.01 rows=62998 width=14)
         Hash Cond: ("outer".upc_id = "inner".keyp_upc)
         ->  Seq Scan on mdc_price_history r  (cost=0.00..189831.09 rows=11047709 width=8)
         ->  Hash  (cost=33323.05..33323.05 rows=1738 width=14)
               ->  Nested Loop  (cost=0.00..33323.05 rows=1738 width=14)
                     ->  Nested Loop  (cost=0.00..26335.62 rows=1737 width=18)
                           ->  Index Scan using cdm_ddw_tran_item_cal_date on cdm_ddw_tran_item t  (cost=0.00..14046.49 rows=3891 width=35)
                                 Index Cond: (cal_date = '2005-08-31'::date)
                                 Filter: (tran_typ_id = 'S'::bpchar)
                           ->  Index Scan using cdm_ddw_tran_item_header_pkey on cdm_ddw_tran_item_header h  (cost=0.00..3.15 rows=1 width=17)
                                 Index Cond: (("outer".appl_xref)::text = (h.appl_xref)::text)
                                 Filter: ((appl_id)::text = 'MCOM'::text)
                     ->  Index Scan using mdcupcidx on mdc_upc u  (cost=0.00..4.01 rows=1 width=12)
                           Index Cond: (u.upc = "outer".item_upc)
(15 rows)




Unfortunately, I need this criteria since it contains the first date of the
order and is used to pull the correct price.
Any suggestions?
TIA
Patrick
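
An editor's hedged suggestion, not something proposed in the thread: because the
costly part is the date(r.site_timestamp) = h.first_order_date join filter, an
expression index on mdc_price_history may let the planner use that condition
against an index.  This assumes site_timestamp is timestamp without time zone
(so date() is immutable); the index name is illustrative.

-- expression index covering both columns used to probe mdc_price_history
CREATE INDEX mdc_price_history_upc_date_idx
    ON mdc_price_history (upc_id, date(site_timestamp));
ANALYZE mdc_price_history;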







[PERFORM] Sluggish server performance

2005-03-28 Thread Patrick Hatcher




Pg:  7.4.5
RH 7.3
Raid 0+1  (200G 15k RPM)
Quad Xeon
8G ram

95% Read-only
5% - read-write

I'm experiencing extreme load issues on my machine anytime I have more than
40 users connected to the database.  The majority of the users appear to be
in an idle state according to top, but if 3 or more queries are run
the system slows to a crawl.  The queries don't appear to be the root cause,
because they run fine when the load drops.  I also do routine vacuuming
on the tables.

Is there some place I need to start looking for the issues bogging down the
server?


Here are some of my settings.  I can provide more as needed:


cat /proc/sys/kernel/shmmax
175013888

max_connections = 100

#---
# RESOURCE USAGE (except WAL)
#---

# - Memory -

shared_buffers = 2000   # min 16, at least max_connections*2, 8KB
each
sort_mem = 12288# min 64, size in KB
#vacuum_mem = 8192  # min 1024, size in KB

# - Free Space Map -

max_fsm_pages = 300 # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 500 # min 100, ~50 bytes each


#---
# WRITE AHEAD LOG
#---

# - Settings -

#fsync = true   # turns forced synchronization on or off
#wal_sync_method = fsync# the default varies across platforms:
# fsync, fdatasync, open_sync, or
open_datasync
wal_buffers = 32# min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 50# in logfile segments, min 1, 16MB each
checkpoint_timeout = 1800   # range 30-3600, in seconds


# - Planner Cost Constants -

effective_cache_size =  262144  # typically 8KB each
#effective_cache_size = 625000  # typically 8KB each
random_page_cost = 2# units are one sequential page fetch cost
#cpu_tuple_cost = 0.01  # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025 # (same)


Patrick Hatcher
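
An editor's hedged sketch, with illustrative values rather than a recommendation
from the thread: on an 8 GB machine, shared_buffers = 2000 is only about 16 MB,
and the existing shmmax of ~167 MB already leaves room to raise it, e.g.:

shared_buffers = 15000          # ~120 MB at 8 KB per buffer, still under shmmax
# note: sort_mem = 12288 is allocated per sort, so with 40+ connections the
# combined sort memory can also add up; worth watching alongside the above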





Re: [PERFORM] Improve BULK insertion

2004-12-04 Thread Patrick Hatcher
I do mass inserts daily into PG.  I drop all the indexes except my primary
key and then use the COPY FROM command.  This usually takes less than 30
seconds.  I spend more time waiting for indexes to recreate.

Patrick Hatcher
Macys.Com

[EMAIL PROTECTED] wrote: -
To: [EMAIL PROTECTED]
From: Christopher Browne [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 2004-12-04 06:48AM
Subject: Re: [PERFORM] Improve BULK insertion

In the last exciting episode, [EMAIL PROTECTED] (Grupos) wrote:
> Hi !
> I need to insert 500.000 records on a table frequently.  It's a bulk
> insertion from my application.  I am with a very poor performance.
> PostgreSQL inserts very fast until the tuple 200.000 and after it the
> insertion starts to be really slow.
> I am seeing on the log and there is a lot of transaction logs,
> something like:
> 2004-12-04 11:08:59 LOG: recycled transaction log file "00060012"
> 2004-12-04 11:08:59 LOG: recycled transaction log file "00060013"
> 2004-12-04 11:08:59 LOG: recycled transaction log file "00060011"
> 2004-12-04 11:14:04 LOG: recycled transaction log file "00060015"
> 2004-12-04 11:14:04 LOG: recycled transaction log file "00060014"
> 2004-12-04 11:19:08 LOG: recycled transaction log file "00060016"
> 2004-12-04 11:19:08 LOG: recycled transaction log file "00060017"
> 2004-12-04 11:24:10 LOG: recycled transaction log file "00060018"

It is entirely normal for there to be a lot of transaction log file
recycling when bulk inserts are taking place; that goes through a lot
of transaction logs.

> How can I configure PostgreSQL to have a better performance on this
> bulk insertions?  I already increased the memory values.

Memory is, as likely as not, NOT the issue.

Two questions:
 1. How are you doing the inserts?  Via INSERT statements?  Or via COPY
    statements?  What sort of transaction grouping is involved?
    COPY is way faster than INSERT, and grouping plenty of updates
    into a single transaction is generally a "win."
 2. What is the schema like?  Does the table have a foreign key
    constraint?  Does it have a bunch of indices?
    If there should eventually be lots of indices, it tends to be
    faster to create the table with none/minimal indices, and add
    indexes afterwards, as long as your "load" process can be trusted
    to not break "unique" constraints...
    If there is some secondary table with a foreign key constraint,
    and _that_ table is growing, it is possible that a sequential
    scan is being used to search the secondary table where, if you
    did an ANALYZE on that table, an index scan would be preferred
    once it grew to larger size...

There isn't a particular reason for PostgreSQL to "hit a wall" upon
seeing 200K records; I and coworkers routinely load database dumps
that have millions of (sometimes pretty fat) records, and they don't
"choke."  That's true whether talking about loading things onto my
(somewhat wimpy) desktop PC, or an SMP Xeon system with a small RAID
array, or higher end stuff involving high end SMP and EMC disk arrays.
The latter obviously being orders of magnitude faster than desktop
equipment :-).
--
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://www3.sympatico.ca/cbbrowne/unix.html
Rules of the Evil Overlord #207.  "Employees will have conjugal visit
trailers which they may use provided they call in a replacement and
sign out on the timesheet.  Given this, anyone caught making out in a
closet while leaving their station unmonitored will be shot."
http://www.eviloverlord.com/
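
A hedged sketch of the drop-indexes / COPY / rebuild pattern described above;
the table name, index name, and file path are illustrative assumptions.

-- bulk load with secondary indexes out of the way, then rebuild and re-analyze
BEGIN;
DROP INDEX sales_fact_date_idx;                  -- keep the primary key, drop the rest
COPY sales_fact FROM '/tmp/sales_fact.copy';     -- server-side file, tab-delimited by default
CREATE INDEX sales_fact_date_idx ON sales_fact (sale_date);
ANALYZE sales_fact;                              -- refresh statistics after the load
COMMIT;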

Re: [PERFORM] Slow update/insert process

2004-10-04 Thread Patrick Hatcher

Thanks for the help.
I found the culprit.  The user had created a function call within the
function (pm.pm_price_post_inc(prod.keyp_products)).
Once this was fixed, the time dropped dramatically.


Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM
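
A hedged sketch (an editor's illustration using the names from the message
quoted below) of how the per-row cost of such a function can be isolated before
digging into its body:

-- run the function over a small sample and compare against the same SELECT
-- without the function call; a large gap points at the function (or at
-- functions it calls) rather than at the joins
EXPLAIN ANALYZE
SELECT pm.pm_delta_function_amazon(upc.keyp_upc, 'amazon')
FROM mdc_upc upc
LIMIT 1000;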





Patrick Hatcher [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 10/01/04 11:14 AM
To: [EMAIL PROTECTED]
Subject: [PERFORM] Slow update/insert process









Pg: 7.4.5 
RH 7.3 
8g Ram 
200 g drive space 
RAID0+1 
Tables vacuum on a nightly basis 

The following process takes 8 hours to run on 90k records and I'm
not sure where to begin looking for the bottleneck.  This isn't the
only update on this database that seems to take a long time to complete.
Is there something I should be looking for in my conf settings?


TIA 
Patrick 


SQL: 
---Bring back only selected records to run through the update process.

--Without the function the SQL takes < 10 secs to return 90,000 records

SELECT count(pm.pm_delta_function_amazon(upc.keyp_upc,'amazon'))

FROM mdc_upc upc 
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products

JOIN public.mdc_price_post_inc price ON prod.keyp_products = price.keyf_product

JOIN public.mdc_attribute_product ap on ap.keyf_products = prod.keyp_products
and keyf_attribute=22 
WHERE 
upper(trim(ap.attributevalue)) NOT IN ('ESTEE LAUDER', 'CLINIQUE','ORGINS','PRESCRIPTIVES','LANCÔME','CHANEL','ARAMIS','M.A.C','TAG
HEUER')
AND keyf_producttype <> 222
AND prod.action_publish = 1; 


Function: 

CREATE OR REPLACE FUNCTION pm.pm_delta_function_amazon(int4, varchar)
 RETURNS bool AS
'DECLARE
  varkeyf_upc      ALIAS FOR $1;
  varPassword      ALIAS FOR $2;
  varRealMD5       varchar;
  varDeltaMD5      varchar;
  varLastTouchDate date;
  varQuery         text;
  varQuery1        text;
  varQueryMD5      text;
  varQueryRecord   record;
  varFuncStatus    boolean := false;
  
BEGIN

-- Check the password
 IF varPassword <> \'amazon\' THEN
  Return false;
 END IF;


-- Get the md5 hash for this product
 SELECT into varQueryRecord md5(upc.keyp_upc || prod.description || pm.pm_price_post_inc(prod.keyp_products))
AS md5
  FROM public.mdc_upc upc
  JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
  JOIN public.mdc_price_post_inc price ON price.keyf_product = prod.keyp_products
  WHERE upc.keyp_upc = varkeyf_upc LIMIT 1 ;
 

 IF NOT FOUND THEN
  RAISE EXCEPTION \'varRealMD5 is NULL. UPC ID is %\', varkeyf_upc;
 ELSE
  varRealMD5:=varQueryRecord.md5;
 END IF;

-- Check that the product is in the delta table and return its hash
for comparison 
 SELECT into varQueryRecord md5_hash,last_touch_date 
  FROM pm.pm_delta_master_amazon
  WHERE keyf_upc = varkeyf_upc LIMIT 1;

 IF NOT FOUND THEN
  -- ADD and exit
  INSERT INTO pm.pm_delta_master_amazon (keyf_upc,status,md5_hash,last_touch_date)
  values (varkeyf_upc,\'add\',varRealMD5,CURRENT_DATE);
  varFuncStatus:=true;
  RETURN varFuncStatus;
 ELSE
  --Update the record 
   --- If the hash matches then set the record to HOLD
  IF varRealMD5 = varQueryRecord.md5_hash THEN
UPDATE pm.pm_delta_master_amazon
 SET status= \'hold\',
 last_touch_date = CURRENT_DATE
WHERE keyf_upc = varkeyf_upc AND last_touch_date <> CURRENT_DATE;
varFuncStatus:=true;
  ELSE
-- ELSE mark the item as ADD
UPDATE pm.pm_delta_master_amazon
 SET status= \'add\',
 last_touch_date = CURRENT_DATE
WHERE keyf_upc = varkeyf_upc;
varFuncStatus:=true;
  END IF;
 END IF;

 RETURN varFuncStatus;
END;'
 LANGUAGE 'plpgsql' IMMUTABLE;



TableDef 
CREATE TABLE pm.pm_delta_master_amazon (
  keyf_upc        int4,
  status          varchar(6),
  md5_hash        varchar(40),
  last_touch_date date
)
GO 

CREATE INDEX status_idx 
  ON pm.pm_delta_master_amazon(status)

GO 




CONF

# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 32                # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 50        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600        # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000



Patrick Hatcher
Macys.Com


[PERFORM] Slow update/insert process

2004-10-01 Thread Patrick Hatcher

Pg: 7.4.5
RH 7.3
8g Ram
200 g drive space
RAID0+1
Tables vacuum on a nightly basis

The following process takes 8 hours to run on 90k records and I'm
not sure where to begin looking for the bottleneck.  This isn't the only
update on this database that seems to take a long time to complete.  Is
there something I should be looking for in my conf settings?

TIA
Patrick


SQL:
---Bring back only selected records
to run through the update process.
--Without the function the SQL takes < 10 secs to return 90,000 records
SELECT count(pm.pm_delta_function_amazon(upc.keyp_upc,'amazon'))
FROM mdc_upc upc
JOIN public.mdc_products prod ON upc.keyf_products
= prod.keyp_products
JOIN public.mdc_price_post_inc price
ON prod.keyp_products = price.keyf_product
JOIN public.mdc_attribute_product ap
on ap.keyf_products = prod.keyp_products and keyf_attribute=22
WHERE 
upper(trim(ap.attributevalue)) NOT IN
('ESTEE LAUDER', 'CLINIQUE','ORGINS','PRESCRIPTIVES','LANCÔME','CHANEL','ARAMIS','M.A.C','TAG
HEUER')
AND keyf_producttype <> 222
AND prod.action_publish = 1;


Function:

CREATE OR REPLACE FUNCTION pm.pm_delta_function_amazon(int4, varchar)
 RETURNS bool AS
'DECLARE
  varkeyf_upc      ALIAS FOR $1;
  varPassword      ALIAS FOR $2;
  varRealMD5       varchar;
  varDeltaMD5      varchar;
  varLastTouchDate date;
  varQuery         text;
  varQuery1        text;
  varQueryMD5      text;
  varQueryRecord   record;
  varFuncStatus    boolean := false;
  
BEGIN

-- Check the password
 IF varPassword <> \'amazon\' THEN
  Return false;
 END IF;


-- Get the md5 hash for this product
 SELECT into varQueryRecord md5(upc.keyp_upc || prod.description
|| pm.pm_price_post_inc(prod.keyp_products)) AS md5
  FROM public.mdc_upc upc
  JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
  JOIN public.mdc_price_post_inc price ON price.keyf_product
= prod.keyp_products
  WHERE upc.keyp_upc = varkeyf_upc LIMIT 1 ;
 
 
 IF NOT FOUND THEN
  RAISE EXCEPTION \'varRealMD5 is NULL. UPC ID is %\', varkeyf_upc;
 ELSE
  varRealMD5:=varQueryRecord.md5;
 END IF;

-- Check that the product is in the delta table and return its hash
for comparison 
 SELECT into varQueryRecord md5_hash,last_touch_date 
  FROM pm.pm_delta_master_amazon
  WHERE keyf_upc = varkeyf_upc LIMIT 1;

 IF NOT FOUND THEN
  -- ADD and exit
  INSERT INTO pm.pm_delta_master_amazon (keyf_upc,status,md5_hash,last_touch_date)
  values (varkeyf_upc,\'add\',varRealMD5,CURRENT_DATE);
  varFuncStatus:=true;
  RETURN varFuncStatus;
 ELSE
  --Update the record 
   --- If the hash matches then set the record to HOLD
  IF varRealMD5 = varQueryRecord.md5_hash THEN
UPDATE pm.pm_delta_master_amazon
 SET status= \'hold\',
 last_touch_date = CURRENT_DATE
WHERE keyf_upc = varkeyf_upc
AND last_touch_date <> CURRENT_DATE;
varFuncStatus:=true;
  ELSE
-- ELSE mark the item as ADD
UPDATE pm.pm_delta_master_amazon
 SET status= \'add\',
 last_touch_date = CURRENT_DATE
WHERE keyf_upc = varkeyf_upc;
varFuncStatus:=true;
  END IF;
  END IF;

 RETURN varFuncStatus;
END;'
 LANGUAGE 'plpgsql' IMMUTABLE;



TableDef
CREATE TABLE pm.pm_delta_master_amazon (
  keyf_upc        int4,
  status          varchar(6),
  md5_hash        varchar(40),
  last_touch_date date
)
GO

CREATE INDEX status_idx
  ON pm.pm_delta_master_amazon(status)
GO





CONF

# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 32                # min 4, 8KB each

# - Checkpoints -

checkpoint_segments = 50        # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600        # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000


Patrick Hatcher
Macys.Com


Re: [PERFORM] vacuum full max_fsm_pages question

2004-09-23 Thread Patrick Hatcher

I upgraded to 7.4.3 this morning and
did a vacuum full analyze on the problem table and now the indexes show
the correct number of records


Patrick Hatcher
Macys.Com






Josh Berkus [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 09/21/04 10:49 AM
To: Patrick Hatcher [EMAIL PROTECTED]
cc: Robert Treat [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [PERFORM] vacuum full max_fsm_pages question








Patrick,

 Sorry. I wrote PG 7.4.2 and then I erased it to write something
else and
 then forgot to add it back.

Odd. You shouldn't be having to re-vacuum on 7.4.

 And thanks for the Page info. I was getting frustrated and looked
in the
 wrong place.

 So it's probably best to drop and readd the indexes then?

Well, I have to wonder if you've not run afoul of the known 7.4.2 bug 
regarding indexes.  This system hasn't had an improper database shutdown
or 
power-out in the last few weeks, has it?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco




Re: [PERFORM] vacuum full max_fsm_pages question

2004-09-21 Thread Patrick Hatcher
Sorry.  I wrote PG 7.4.2 and then I erased it to write something else and
then forgot to add it back.

And thanks for the Page info.  I was getting frustrated and looked in the
wrong place.

So it's probably best to drop and readd the indexes then?


- Original Message - 
From: Robert Treat [EMAIL PROTECTED]
To: Patrick Hatcher [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, September 20, 2004 11:12 PM
Subject: Re: [PERFORM] vacuum full  max_fsm_pages question


 On Tuesday 21 September 2004 00:01, Patrick Hatcher wrote:
  Hello.
  Couple of questions:
  - Q1: Today I decided to do a vacuum full verbose analyze on a large
table
  that has been giving me slow performance.  And then I did it again.  I
  noticed that after each run the values in my indexes and estimate row
  version changed.  What really got me wondering is the fact my indexes
  report more rows than are in the table and then the estimated rows is
less
  than the actual amount.
 
  The table is a read-only table that is updated 1/wk.  After updating it
is
  vacuumed full.  I've also tried reindexing but the numbers still change.
Is
  this normal?  Below is a partial output for 4 consecutive vacuum full
  analyzes.  No data was added nor was there anyone in the table.
 

 This looks normal to me for a pre-7.4 database; if I am right you're running
 7.2?  Basically your indexes are overgrown, so each time you run vacuum you
 are shrinking the number of pages involved, which will change the row
counts,
 and correspondingly change the count on the table as the sampled pages
 change.


  - Q2: I have about a dozen 5M plus row tables.  I currently have my
  max_fsm_pages set to 300,000.  As you can see in vacuum full output I
  supplied, one table is already over this amount.  Is there a limit on
the
  size of max_fsm_pages?
 

 The limit is based on your memory... each page = 6 bytes.  But according to
 the output below you are not over 300,000 pages yet on that table (though
 you might be on some other tables.)

 
  CONF settings:
  # - Memory -
 
  shared_buffers = 2000   # min 16, at least max_connections*2, 8KB each
  sort_mem = 12288        # min 64, size in KB
  #vacuum_mem = 8192      # min 1024, size in KB
 
  # - Free Space Map -
 
  max_fsm_pages = 300000  # min max_fsm_relations*16, 6 bytes each
  max_fsm_relations = 500 # min 100, ~50 bytes each
 
 
  Vacuum full information
  #after second vacuum full
  INFO:  index emaildat_fkey now contains 8053743 row versions in 25764
  pages DETAIL:  1895 index row versions were removed.
  0 index pages have been deleted, 0 are currently reusable.
  CPU 2.38s/0.42u sec elapsed 11.11 sec.
  INFO:  analyzing cdm.cdm_email_data
  INFO:  cdm_email_data: 65882 pages, 3000 rows sampled, 392410
estimated
  total rows
 
 
  #after third vacuum full
  INFO:  index emaildat_fkey now contains 8052738 row versions in 25769
  pages DETAIL:  890 index row versions were removed.
  0 index pages have been deleted, 0 are currently reusable.
  CPU 2.08s/0.32u sec elapsed 4.36 sec.
  INFO:  analyzing cdm.cdm_email_data
  INFO:  cdm_email_data: 65874 pages, 3000 rows sampled, 392363
estimated
  total rows
 
 
  #after REINDEX and  vacuum full
  INFO:  index emaildat_fkey now contains 8052369 row versions in 25771
  pages DETAIL:  521 index row versions were removed.
  0 index pages have been deleted, 0 are currently reusable.
  CPU 1.37s/0.35u sec elapsed 4.79 sec.
  INFO:  analyzing cdm.cdm_email_data
  INFO:  cdm_email_data: 65869 pages, 3000 rows sampled, 392333
estimated
  total rows
 
  #After vacuum full(s)
  mdc_oz=# select count(*) from cdm.cdm_email_data;
    count
  ---------
   5433358
  (1 row)
 

 I do think the count(*) seems a bit off based on the vacuum output above.
 I'm guessing you either have blocking transactions in the way or you're not
 giving us a complete copy/paste of the session involved.

 -- 
 Robert Treat
 Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL




[PERFORM] vacuum full max_fsm_pages question

2004-09-20 Thread Patrick Hatcher




Hello.
Couple of questions:



- Q1: Today I decided to do a vacuum full verbose analyze on a large table
that has been giving me slow performance.  And then I did it again.  I
noticed that after each run the values in my indexes and estimated row
versions changed.  What really got me wondering is the fact my indexes
report more rows than are in the table and then the estimated rows is less
than the actual amount.

The table is a read-only table that is updated 1/wk.  After updating it is
vacuumed full.  I've also tried reindexing but the numbers still change.
Is this normal?  Below is a partial output for 4 consecutive vacuum full
analyzes.  No data was added nor was there anyone in the table.

- Q2: I have about a dozen 5M plus row tables.  I currently have my
max_fsm_pages set to 300,000.  As you can see in the vacuum full output I
supplied, one table is already over this amount.  Is there a limit on the
size of max_fsm_pages?


CONF settings:
# - Memory -

shared_buffers = 2000           # min 16, at least max_connections*2, 8KB each
sort_mem = 12288                # min 64, size in KB
#vacuum_mem = 8192              # min 1024, size in KB

# - Free Space Map -

max_fsm_pages = 300000          # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 500         # min 100, ~50 bytes each

Vacuum full information
#after second vacuum full
INFO:  index "emaildat_fkey" now contains 8053743 row versions in 25764 pages
DETAIL:  1895 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 2.38s/0.42u sec elapsed 11.11 sec.
INFO:  analyzing "cdm.cdm_email_data"
INFO:  "cdm_email_data": 65882 pages, 3000 rows sampled, 392410 estimated total rows

#after third vacuum full
INFO:  index "emaildat_fkey" now contains 8052738 row versions in 25769 pages
DETAIL:  890 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 2.08s/0.32u sec elapsed 4.36 sec.
INFO:  analyzing "cdm.cdm_email_data"
INFO:  "cdm_email_data": 65874 pages, 3000 rows sampled, 392363 estimated total rows

#after REINDEX and vacuum full
INFO:  index "emaildat_fkey" now contains 8052369 row versions in 25771 pages
DETAIL:  521 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 1.37s/0.35u sec elapsed 4.79 sec.
INFO:  analyzing "cdm.cdm_email_data"
INFO:  "cdm_email_data": 65869 pages, 3000 rows sampled, 392333 estimated total rows

#After vacuum full(s)
mdc_oz=# select count(*) from cdm.cdm_email_data;
  count
---------
 5433358
(1 row)

TIA
Patrick


[PERFORM] vacuum full 100 mins plus?

2004-07-14 Thread Patrick Hatcher




Should I be concerned that my vacuum process has taken upwards of 100+
minutes to complete?  I dropped all indexes before starting and also
increased the vacuum_mem before starting.
Looking at the output below, it appears that a vacuum full hasn't been done
on this table for quite some time.  Would I be better off exporting the data,
vacuuming the table, and reimporting the data?  I cannot drop the table due
to views attached to the table.


mdc_oz=# set vacuum_mem = 10240;
SET
mdc_oz=# vacuum full verbose cdm.cdm_ddw_Tran_item;
INFO:  vacuuming cdm.cdm_ddw_tran_item
INFO:  cdm_ddw_tran_item: found 15322404 removable, 10950460 nonremovable
row versions in 934724 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 233 to 308 bytes long.
There were 1081 unused item pointers.
Total free space (including removable row versions) is 4474020460 bytes.
544679 pages are or will become empty, including 0 at the end of the table.
692980 pages containing 4433398408 free bytes are potential move
destinations.
CPU 29.55s/4.13u sec elapsed 107.82 sec.


TIA
Patrick Hatcher




Re: [PERFORM] vacuum full 100 mins plus?

2004-07-14 Thread Patrick Hatcher






Answered my own question.  I gave up the vacuum full after 150 mins.  I was
able to export to a file, vacuum full the empty table, and reimport in less
than 10 mins.  I suspect the empty item pointers and the sheer number of
removable rows was causing an issue.
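
A hedged sketch of the export / reload route described above; the file path is
illustrative, TRUNCATE is assumed as the "empty the table" step, and nothing
else should be writing to the table while this runs.

COPY cdm.cdm_ddw_tran_item TO '/tmp/cdm_ddw_tran_item.copy';    -- server-side dump
TRUNCATE TABLE cdm.cdm_ddw_tran_item;                           -- views on the table survive a TRUNCATE
COPY cdm.cdm_ddw_tran_item FROM '/tmp/cdm_ddw_tran_item.copy';  -- reload
VACUUM ANALYZE cdm.cdm_ddw_tran_item;                           -- refresh stats on the rebuilt table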



   
Patrick Hatcher [EMAIL PROTECTED]
Sent by: pgsql-performance [EMAIL PROTECTED]
Date: 07/14/2004 02:29 PM
To: [EMAIL PROTECTED]
Subject: [PERFORM] vacuum full 100 mins plus?








Should I be concerned that my vacuum process has taken upwards of 100+
minutes to complete?  I dropped all indexes before starting and also
increased the vacuum_mem before starting.
Looking at the output below, it appears that a vacuum full hasn't been done
on this table for quite some time.  Would I be better off exporting the data,
vacuuming the table, and reimporting the data?  I cannot drop the table due
to views attached to the table.


mdc_oz=# set vacuum_mem = 10240;
SET
mdc_oz=# vacuum full verbose cdm.cdm_ddw_Tran_item;
INFO:  vacuuming cdm.cdm_ddw_tran_item
INFO:  cdm_ddw_tran_item: found 15322404 removable, 10950460 nonremovable
row versions in 934724 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 233 to 308 bytes long.
There were 1081 unused item pointers.
Total free space (including removable row versions) is 4474020460 bytes.
544679 pages are or will become empty, including 0 at the end of the table.
692980 pages containing 4433398408 free bytes are potential move
destinations.
CPU 29.55s/4.13u sec elapsed 107.82 sec.


TIA
Patrick Hatcher




[PERFORM] Slow vacuum performance

2004-06-17 Thread Patrick Hatcher




Pg:  7.4.2
RedHat 7.3
Ram: 8gig

I have a 6 million row table that I vacuum full analyze each night.  The time
seems to be stretching out further and further as I add more rows.  I read
the archives and Josh's annotated pg.conf guide, which suggests that setting the
FSM higher might help.  Currently, my memory settings are set as below.  Does
this seem low?

Last reading from vacuum verbose:
  INFO:  analyzing cdm.cdm_ddw_customer
INFO:  cdm_ddw_customer: 209106 pages, 3000 rows sampled, 6041742
estimated total rows
I think I should now set my max FSM to at least 210,000 but wanted to make
sure

shared_buffers = 2000   # min 16, at least max_connections*2, 8KB
each
sort_mem = 12288# min 64, size in KB

# - Free Space Map -

max_fsm_pages = 10  # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000   # min 100, ~50 bytes each
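
A hedged sketch of the change being weighed above; the exact value is an
editor's illustration sized from the 209,106 pages reported by VACUUM, not a
figure from the thread.

# - Free Space Map -
max_fsm_pages = 250000          # headroom over ~209k pages; at ~6 bytes per
                                # slot this is roughly 1.5 MB of shared memory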


TIA

Patrick Hatcher
Macys.Com




Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Patrick Hatcher

here's the URL:
http://techdocs.postgresql.org/techdocs/pgsqladventuresep2.php

Patrick Hatcher
Macys.Com
Legacy Integration Developer
415-422-1610 office
HatcherPT - AIM


   
Patrick Hatcher/MCOM/FDD
Date: 11/10/2003 12:31 PM
To: Marc G. Fournier [EMAIL PROTECTED]
cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: [PERFORM] *very* slow query to summarize data for a month ...
(Document link: Patrick Hatcher)



Do you have an index on ts.bytes?  Josh had suggested this and after I put
it on my summed fields, I saw a speed increase.  I can't remember which
article it was that Josh had written about index usage, but maybe he'll chime
in and supply the URL for his article.
hth

Patrick Hatcher



   
Marc G. Fournier [EMAIL PROTECTED]
Sent by: pgsql-performance-owner [EMAIL PROTECTED]
Date: 11/10/2003 12:18 PM
To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: [PERFORM] *very* slow query to summarize data for a month ...





Table structure is simple:

CREATE TABLE traffic_logs (
company_id bigint,
ip_id bigint,
port integer,
bytes bigint,
runtime timestamp without time zone
);

runtime is 'day of month' ...

I need to summarize the month, per company, with a query as:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS
total_traffic
FROM company c, traffic_logs ts
   WHERE c.company_id = ts.company_id
 AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name,ts.company_id;

and the explain looks like:
QUERY PLAN
--

 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00

Re: [PERFORM] *very* slow query to summarize data for a month ...

2003-11-10 Thread Patrick Hatcher

Do you have an index on ts.bytes?  Josh had suggested this and after I put
it on my summed fields, I saw a speed increase.  I can't remember which
article it was that Josh had written about index usage, but maybe he'll chime
in and supply the URL for his article.
hth

Patrick Hatcher



   
Marc G. Fournier [EMAIL PROTECTED]
Sent by: pgsql-performance-owner [EMAIL PROTECTED]
Date: 11/10/2003 12:18 PM
To: [EMAIL PROTECTED]
cc: [EMAIL PROTECTED]
Subject: [PERFORM] *very* slow query to summarize data for a month ...





Table structure is simple:

CREATE TABLE traffic_logs (
company_id bigint,
ip_id bigint,
port integer,
bytes bigint,
runtime timestamp without time zone
);

runtime is 'day of month' ...

I need to summarize the month, per company, with a query as:

explain analyze SELECT ts.company_id, company_name, SUM(ts.bytes) AS
total_traffic
FROM company c, traffic_logs ts
   WHERE c.company_id = ts.company_id
 AND month_trunc(ts.runtime) = '2003-10-01'
GROUP BY company_name,ts.company_id;

and the explain looks like:
QUERY PLAN
--

 Aggregate  (cost=32000.94..32083.07 rows=821 width=41) (actual time=32983.36..47586.17 rows=144 loops=1)
   ->  Group  (cost=32000.94..32062.54 rows=8213 width=41) (actual time=32957.40..42817.88 rows=462198 loops=1)
         ->  Sort  (cost=32000.94..32021.47 rows=8213 width=41) (actual time=32957.38..36261.31 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=31321.45..31466.92 rows=8213 width=41) (actual time=13983.07..22642.14 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=5.52..7.40 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=0.02..2.78 rows=352 loops=1)
                     ->  Sort  (cost=31297.04..31317.57 rows=8213 width=16) (actual time=13977.49..16794.41 rows=462198 loops=1)
                           Sort Key: ts.company_id
                           ->  Index Scan using tl_month on traffic_logs ts  (cost=0.00..30763.02 rows=8213 width=16) (actual time=0.29..5562.25 rows=462198 loops=1)
                                 Index Cond: (month_trunc(runtime) = '2003-10-01 00:00:00'::timestamp without time zone)
 Total runtime: 47587.82 msec
(14 rows)

the problem is that we're only taking a few months worth of data, so I
don't think there is much of a way of 'improve performance' on this, but
figured I'd ask quickly before I do something rash ...

Note that without the month_trunc() index, the Total runtime more than
doubles:


QUERY PLAN


 Aggregate  (cost=39578.63..39660.76 rows=821 width=41) (actual time=87805.47..101251.35 rows=144 loops=1)
   ->  Group  (cost=39578.63..39640.23 rows=8213 width=41) (actual time=87779.56..96824.56 rows=462198 loops=1)
         ->  Sort  (cost=39578.63..39599.17 rows=8213 width=41) (actual time=87779.52..90781.48 rows=462198 loops=1)
               Sort Key: c.company_name, ts.company_id
               ->  Merge Join  (cost=38899.14..39044.62 rows=8213 width=41) (actual time=64073.98..72783.68 rows=462198 loops=1)
                     Merge Cond: ("outer".company_id = "inner".company_id)
                     ->  Sort  (cost=24.41..25.29 rows=352 width=25) (actual time=64.66..66.55 rows=348 loops=1)
                           Sort Key: c.company_id
                           ->  Seq Scan on company c  (cost=0.00..9.52 rows=352 width=25) (actual time=1.76..61.70 rows=352 loops=1)
                     ->  Sort  (cost=38874.73..38895.27 rows=8213 width=16) (actual
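
For reference, a hedged reconstruction of the kind of setup the tl_month index
above relies on; the actual month_trunc() definition is not shown in the thread,
so this wrapper is an assumption (it only needs to be an IMMUTABLE function over
timestamp without time zone):

-- immutable wrapper so it can be used in an expression ("functional") index
CREATE OR REPLACE FUNCTION month_trunc(timestamp without time zone)
  RETURNS timestamp without time zone
  AS 'SELECT date_trunc(''month'', $1)'
  LANGUAGE sql IMMUTABLE;

-- the index the faster plan above is scanning
CREATE INDEX tl_month ON traffic_logs (month_trunc(runtime));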

Re: [PERFORM] SELECT's take a long time compared to other DBMS

2003-09-05 Thread Patrick Hatcher

Relaxin,
I can't remember during this thread if you said you were using ODBC or not.
If you are, then your problem is with the ODBC driver.  You will need to
check the Declare/Fetch box or you will definitely bring back the entire
recordset.  For a small recordset this is not a problem, but the
larger the recordset the slower the data is returned to the client.  I played
around with the cache size on the driver and found a value between 100 and
200 provided good results.

HTH
Patrick Hatcher
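
A hedged sketch of the same idea at the SQL level (the table name is a
placeholder for the real query): a cursor lets the client fetch the large result
incrementally instead of pulling all ~100K rows at once, which is roughly what
the ODBC Declare/Fetch option arranges behind the scenes.

BEGIN;
DECLARE big_cur CURSOR FOR
    SELECT * FROM some_large_table;   -- hypothetical table standing in for the real query
FETCH 200 FROM big_cur;               -- repeat FETCH as the application pages through
CLOSE big_cur;
COMMIT;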




   
  
Relaxin [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
Date: 09/04/2003 07:13 PM
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] SELECT's take a long time compared to other DBMS




Thank you Christopher.

 Change fsync to true (you want your data to survive, right?) and
 increase shared buffers to something that represents ~10% of your
 system memory, in blocks of 8K.

I turned it off just in the hope that things would run faster.

 None of this is likely to substantially change the result of that one
 query, however, and it seems quite likely that it is because
 PostgreSQL is honestly returning the whole result set of ~100K rows at
 once, whereas the other DBMSes are probably using cursors to return
 only the few rows of the result that you actually looked at.

Finally, someone who will actually assume/admit that it is returning the
entire result set to the client, whereas other DBMSes manage the records at
the server.

I hope PG could fix/enhance this issue.

There are several issues that are stopping our company from going with PG
(with paid support, if available), but this seems to be the one at the top
of the list.

The next one is the handling of BLOBS.  PG handles them like no other
system
I have ever come across.

After that is a native Windows port, but we would deal with cygwin (for a very
little while) if these other issues were handled.

Thanks





Christopher Browne [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
 A long time ago, in a galaxy far, far away, Relaxin [EMAIL PROTECTED]
wrote:
  Have you changed any of the settings yet in postgresql.conf,
  specifically the shared_buffers setting?
 
  fsync = false
  tcpip_socket = true
  shared_buffers = 128

 Change fsync to true (you want your data to survive, right?) and
 increase shared buffers to something that represents ~10% of your
 system memory, in blocks of 8K.

 So, if you have 512MB of RAM, then the total blocks is 65536, and it
 would likely be reasonable to increase shared_buffers to 1/10 of that,
 or about 6500.

 What is the value of effective_cache_size?  That should probably be
 increased a whole lot, too.  If you are mainly just running the
 database on your system, then it would be reasonable to set it to most
 of memory, or
   (* 1/2 (/ (* 512 1024 1024) 8192))
 32768.

 None of this is likely to substantially change the result of that one
 query, however, and it seems quite likely that it is because
 PostgreSQL is honestly returning the whole result set of ~100K rows at
 once, whereas the other DBMSes are probably using cursors to return
 only the few rows of the result that you actually looked at.
 --
 cbbrowne,@,cbbrowne.com
 http://www3.sympatico.ca/cbbrowne/linuxdistributions.html
 Rules of  the Evil Overlord #14. The  hero is not entitled  to a last
 kiss, a last cigarette, or any other form of last request.
 http://www.eviloverlord.com/






