Re: [PERFORM] PostgreSQL NetApp and NFS

2008-03-20 Thread Woody Woodring

  I need to know if anyone out there is running, or has run, PostgreSQL on
NetApp arrays via NFS.  My particular situation is RH Linux 4 servers running
PostgreSQL 8.1.  I need to provide our Operations manager with specific
reasons why we should not run PostgreSQL over NetApp NFS; otherwise, they
will go forward with this.

  If you have any real-life stories, good or bad, I'd love to hear them.
Given that NetApp arrays are supposedly very good NFS platforms, is this a
recommended way to run PostgreSQL overall, or is it best avoided?
 
We have been running Postgres over NFS to a NetApp since 7.1, and we have
nothing but good things to say.  We have 75 databases in 3 clusters, all
connected to one NetApp.  We don't store a huge amount of data, currently
~43 GB, but it is constantly updated.
 
We keep the pgsql/data directory on the NetApp.  If one of our db servers
ever has a problem, we can just swap out the box, mount the volume, and
restart Postgres.
 
We like the support we get from them; the only issue we ever have is the
occasional drive failure, and they get replacements to us promptly.  Our
NetApp currently has an uptime of over 2 years.
 
By the way, I thought NetApp boxes came with an iSCSI license.  NetApp
downplayed iSCSI with us because, they said, you cannot share drives between
servers with it, though for Postgres you don't want that anyway.  It could
also be that the NetApp is better tuned for NFS throughput and they want to
steer users toward that.
 
If you want more specifics, feel free to ask. 
 
Woody
iGLASS Networks 
 


Re: [PERFORM] High load and iowait but no disk access

2005-08-30 Thread Woody Woodring



Have you tried a different kernel?  We run with a NetApp over NFS without
any issues, but we have seen high I/O wait on other Dell boxes running
RHES 3 (both running and not running Postgres).  We replaced a Dell
PowerEdge 350 running RH 7.3 with a PE750 with more memory running RHES 3,
and the new box would get bogged down with I/O waits just from syslog
messages writing to disk, while the old, slower server handled the same
load fine.  I don't know if it is a Dell thing or a RH kernel thing, but we
try different kernels on our boxes to find one that works better.  We have
not found one that consistently stands out over another, but we have been
moving away from the Update 2 kernel (2.4.21-15.ELsmp) due to server lockup
issues.  Unfortunately, we get the best disk throughput on our few
remaining 7.3 boxes.

Woody

iGLASS Networks
www.iglass.net


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rémy Beaumont
Sent: Monday, August 29, 2005 9:43 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] High load and iowait but no disk access
We have been trying to pinpoint what originally seemed to be an I/O
bottleneck but which now seems to be an issue with either PostgreSQL or
RHES 3.

We have the following test environment on which we can reproduce the
problem:

1) Test System A
Dell 6650 Quad Xeon Pentium 4
8 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS270 connected using an FC card using 10 disks

2) Test System B
Dell Dual Xeon Pentium III
2 Gig of RAM
OS: RHES 3 update 2
Storage: NetApp FAS920 connected using an FC card using 28 disks

Our database size is around 30G.

The behavior we see is that when running queries that do random reads on
disk, iowait goes over 80% and actual disk I/O falls to a crawl at a
throughput below 3000 kB/s (we usually average 4 kB/s to 8 kB/s on
sequential read operations on the NetApps).

The stats of the NetApp do confirm that it is sitting idle.  Doing an
strace on the PostgreSQL process shows that it is doing seeks and reads.

So my question is: where is this iowait time spent?  Is there a way to
pinpoint the problem in more detail?

We are able to reproduce this behavior with PostgreSQL 7.4.8 and 8.0.3.

I have included the output of top, vmstat, strace and sysstat from the
NetApp, taken on System B while running a single query that generates this
behavior.

Rémy

top output:

 06:27:28  up 5 days, 16:59,  6 users,  load average: 1.04, 1.30, 1.01
72 processes: 71 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    2.7%    0.0%    1.0%   0.1%     0.2%   46.0%   49.5%
           cpu00    0.2%    0.0%    0.2%   0.0%     0.2%    2.2%   97.2%
           cpu01    5.3%    0.0%    1.9%   0.3%     0.3%   89.8%    1.9%
Mem:  2061696k av, 2043936k used,   17760k free,       0k shrd,    3916k buff
                   1566332k actv,  296648k in_d,   30504k in_c
Swap: 16771584k av,   21552k used, 16750032k free                 1933772k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
30960 postgres  15   0 13424  10M  9908 D    2.7  0.5   2:00   1 postmaster
30538 root      15   0  1080  764   524 S    0.7  0.0   0:43   0 sshd
    1 root      15   0   496  456   436 S    0.0  0.0   0:08   0 init
    2 root      RT   0     0    0     0 SW   0.0  0.0   0:00   0 migration/0
    3 root      RT   0     0    0     0 SW   0.0  0.0   0:00   1 migration/1
    4 root      15   0     0    0     0 SW   0.0  0.0   0:01   0 keventd
    5 root      34  19     0    0     0 SWN  0.0  0.0   0:00   0 ksoftirqd/0
    6 root      34  19     0    0     0 SWN  0.0  0.0   0:00   1 ksoftirqd/1
    9 root      15   0     0    0     0 SW   0.0  0.0   0:24   1 bdflush
    7 root      15   0     0    0     0 SW   0.0  0.0   6:53   1 kswapd
    8 root      15   0     0    0     0 SW   0.0  0.0   8:44   1 kscand
   10 root      15   0     0    0     0 SW   0.0  0.0   0:13   0 kupdated
   11 root      25   0     0    0     0 SW   0.0  0.0   0:00   0 mdrecoveryd
   17 root      15   0     0    0     0 SW   0.0  0.0   0:00   0 ahc_dv_0

vmstat output:

procs                      memory      swap          io     system         cpu
 r  b   swpd   free   buff   cache   si   so    bi    bo   in    cs us sy id wa
 0  1  21552  17796   4872 1931928    2    3     3     1   27     6  2  1  7  3
 0  1  21552  18044   4880 1931652    0    0  1652     0  397   512  1  2 50 47
 0  1  21552  17976   4896 1931664    0    0  2468     0  407   552  2  2 50 47
 1  0  21552  17984   4896 1931608    0    0  2124     0  418   538  3  3 48 46
 0  1  21552  18028   4900 1931536    0    0  1592     0  385   509  1  3 50 46
 0  1  21552  18040   4916 1931488    0    0  1620   820  419   581  2  2 50 46
 0  1  21552  17968   4916 1931536    0    4  1708     4  402   554  3  1 50 46
 1  1  21552  18052   4916 1931388    0    0  1772     0  409   531  3  1 49 47
 0  1  21552  17912   4924 1931492    0    0  1772     0  408   565  3  1 48 48
 0  1  21552  17932   4932 1931440    0    4  1356     4  391   545  5  0 49 46
 0  1  21552  18320   4944 1931016    0    4  1500   840  414   571  1  1 48 50
 0  1  21552  17872   4944 1931440    0    0  2116     0  392   496  1  5 46 48
 0  1  21552  18060   4944 1931232    0    0  2232     0  423   597  1  2 48 49
 1  1  21552  17684   4944 1931584    0    0  1752     0  395   537  1  1 50 48
 0  1  21552  18000   4944 1931240    0    0  1576     0  401   549  0  1 50 49

NetApp stats:

 CPU   NFS  CIFS  HTTP   Total    Net kB/s   Disk kB/s   Tape kB/s  Cache  Cache   CP  CP  Disk   DAFS   FCP  iSCSI   FCP kB/s
                                  in   out  read  write  read write   age    hit time  ty util                        in   out
  2%     0     0     0     139     0     0  2788      0     0     0     3    96%   0%   -  15%       0   139      0     3  2277
  2%     0     0     0     144     0     0  2504      0     0     0     3    96%   0%   -  18%       0   144      0     3  2150
  2%     0     0     0     130     0     0  2212      0     0     0     3    96%   0%   -  13%       0   130      0     3  1879
  3%     0     0     0     169     0     0  2937     80     0     0     3    96%   0%   -  13%       0   169      0     4  2718
  2%     0     0     0     139     0     0  2448      0     0     0     3    96%   0%   -  12%       0   139      0     3  2096
  2%     0     0     0     137     0     0  2116      0     0     0     3    96%   0%   -  10%       0
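
On the Postgres side, one cheap cross-check while the iowait is spiking is
to see what the backends are actually running.  A sketch, not from the
thread; on 7.4/8.0, current_query is only populated if stats_command_string
is enabled in postgresql.conf:

-- Show active backends and their queries during an iowait spike:
SELECT procpid, usename, current_query
FROM pg_stat_activity;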

Re: [PERFORM] Query Performance and IOWait

2004-11-18 Thread Woody Woodring
 
Andrew,

What version of Red Hat are you running?  We have found that running the
Enterprise Update 3 kernel kills our Dell boxes with iowait, on both NFS
and local disk traffic.  The Update 2 kernel does not seem to have the
issue, and we are in the process of trying the Update 4 beta to see if it
is better.

Woody

iGLASS Networks
www.iglass.net

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Andrew Janian
Sent: Thursday, November 18, 2004 9:02 AM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Query Performance and IOWait

Actually, unfortunately, that won't work.  The subquery gets a list of
message IDs and then the outer query gets the responses to those messages.

Also, I dumped this data, imported it all into MS SQL Server, and ran the
query there.  It ran in 2s.

Andrew
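
For reference, a self-join is one way to express the same request/response
pairing without the IN subquery.  This is only a sketch, assuming the
semantics Andrew describes ('D'/'G' requests, type-'8' responses); the
DISTINCT matters because the join, unlike IN, can return a response once
per matching request:

-- Hypothetical rewrite, not from the thread: self-join instead of IN.
SELECT DISTINCT resp.*
FROM mb_fix_message resp
JOIN mb_fix_message req
  ON req.msg_client_order_id = resp.msg_client_order_id
WHERE req.msg_log_time >= '2004-06-01'
  AND req.msg_log_time < '2004-06-01 13:30:00.000'
  AND req.msg_message_type IN ('D','G')
  AND req.mb_ord_type = '1'
  AND resp.msg_log_time > '2004-06-01'
  AND resp.msg_log_time < '2004-06-01 23:59:59.999'
  AND resp.msg_message_type = '8'
  AND (resp.mb_raw_text LIKE '%39=1%' OR resp.mb_raw_text LIKE '%39=2%');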

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Sent: Thursday, November 18, 2004 7:57 AM
To: Andrew Janian
Cc: [EMAIL PROTECTED];
[EMAIL PROTECTED]
Subject: Re: [PERFORM] Query Performance and IOWait


Andrew,

It seems that you could combine the subquery's WHERE clause with the main
query's to produce a simpler query, i.e. one without a subquery.

Rick




 

From: Andrew Janian [EMAIL PROTECTED]
Sent by: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Date: 11/18/2004 08:42 AM
Subject: [PERFORM] Query Performance and IOWait


Hello All,

I have a Dell PowerEdge 2650 running Red Hat and Postgres 7.4.5, with a
database of about 27GB.  The table in question has about 35 million rows.

I am running the following query:

SELECT *
FROM mb_fix_message
WHERE msg_client_order_id IN (
 SELECT msg_client_order_id
 FROM mb_fix_message
 WHERE msg_log_time >= '2004-06-01'
 AND msg_log_time < '2004-06-01 13:30:00.000'
 AND msg_message_type IN ('D','G')
 AND mb_ord_type = '1'
 )
 AND msg_log_time > '2004-06-01'
 AND msg_log_time < '2004-06-01 23:59:59.999'
 AND msg_message_type = '8'
 AND (mb_raw_text LIKE '%39=1%' OR mb_raw_text LIKE '%39=2%');

with the following plan:

QUERY PLAN
Nested Loop IN Join  (cost=0.00..34047.29 rows=1 width=526)
  ->  Index Scan using mfi_log_time on mb_fix_message
      (cost=0.00..22231.31 rows=2539 width=526)
        Index Cond: ((msg_log_time > '2004-06-01 00:00:00'::timestamp
        without time zone) AND (msg_log_time < '2004-06-01
        23:59:59.999'::timestamp without time zone))
        Filter: (((msg_message_type)::text = '8'::text) AND
        (((mb_raw_text)::text ~~ '%39=1%'::text) OR ((mb_raw_text)::text ~~
        '%39=2%'::text)))
  ->  Index Scan using mfi_client_ordid on mb_fix_message
      (cost=0.00..445.56 rows=1 width=18)
        Index Cond: (("outer".msg_client_order_id)::text =
        (mb_fix_message.msg_client_order_id)::text)
        Filter: ((msg_log_time >= '2004-06-01 00:00:00'::timestamp without
        time zone) AND (msg_log_time < '2004-06-01 13:30:00'::timestamp
        without time zone) AND (((msg_message_type)::text = 'D'::text) OR
        ((msg_message_type)::text = 'G'::text)) AND ((mb_ord_type)::text =
        '1'::text))

While running, this query produces 100% iowait usage on its processor and
takes an ungodly amount of time (about an hour).

The postgres settings are as follows:

shared_buffers = 32768          # min 16, at least max_connections*2, 8KB each
sort_mem = 262144               # min 64, size in KB

And the /etc/sysctl.conf has:
kernel.shmall = 274235392
kernel.shmmax = 274235392

The system has 4GB of RAM.

I am pretty sure of these settings, but only from my reading of the docs and
others' recommendations online.
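
For what it's worth, a back-of-envelope check on those settings (a sketch,
not part of the original post) points at sort_mem as the number to
scrutinize:

-- shared_buffers = 32768 buffers * 8 kB = 256 MB of shared cache
-- sort_mem = 262144 kB = 256 MB *per sort step*; a few concurrent sorts
-- can exceed the 4 GB of RAM and spill work to disk, which shows up as
-- exactly this kind of iowait.
SHOW shared_buffers;
SHOW sort_mem;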

Thanks,

Andrew Janian
OMS Development
Scottrade Financial Services
(314) 965-1555 x 1513
Cell: (314) 369-2083



Re: [PERFORM] Query Performance and IOWait

2004-11-18 Thread Woody Woodring
In our experience it is not just a Postgres issue; all I/O suffers with the
Update 3 kernel.

We have a box with Update 3 that queries a remote Postgres database
(running RH 7.3, RHES 3 Update 2) and writes to a file on an NFS server.
The Update 3 box does half the work at 2-3 times the load of our Update 1
and Update 2 boxes.  Looking at top, the box is always above 90% I/O wait
on the CPU.  When we downgrade the kernel to Update 2, the issue seems to
go away.

We have several Update 3 boxes that run Postgres locally, and they all
struggle compared to the Update 2 boxes.

We have tried Fedora Core 3 without much more success, and we are going to
try the Update 4 beta kernel next week to see if it is any better.

There are several threads on the Taroon mailing list discussing the issue.

Woody

-Original Message-
From: Josh Berkus [mailto:[EMAIL PROTECTED] 
Sent: Thursday, November 18, 2004 1:34 PM
To: [EMAIL PROTECTED]
Cc: Woody Woodring; 'Andrew Janian'
Subject: Re: [PERFORM] Query Performance and IOWait

Woody,

> What version of Red Hat are you running?  We have found that running the
> Enterprise Update 3 kernel kills our Dell boxes with iowait, on both NFS
> and local disk traffic.  The Update 2 kernel does not seem to have the
> issue, and we are in the process of trying the Update 4 beta to see if it
> is better.

This is interesting; do you have more to say about it?   I've been having
some 
mysterious issues with RHES that I've not been able to pin down.

--
Josh Berkus
Aglio Database Solutions
San Francisco





[PERFORM] Help with query plan inconsistencies

2004-03-23 Thread Woody Woodring
Hello,

I am using Postgres 7.4.2 as a backend for geocode data in a mapping
application.  My question is why I can't get consistent use of my indexes
during a query; I tend to get a lot of seq scan results.

I use a standard query:

SELECT lat, long, mac, status FROM (
   SELECT text(mac) as mac, lat, long,
          CASE status WHEN 0 THEN 0 WHEN 1 THEN 1 ELSE -1 END as status
   FROM cable_billing LEFT OUTER JOIN davic USING(mac)
   WHERE boxtype='d'
) AS foo WHERE (long>=X1) AND (long<=X2) AND (lat>=Y1) AND (lat<=Y2)

Where X1,X2,Y1,Y2 are the coordinates for the rectangle of the map viewing
area.

QUERY PLAN #1 and #2 are from when I get a view from 10 miles out;
sometimes it uses the index (#1) and most of the time not (#2).  I do run
into plans that seq scan both sides of the join.

QUERY PLAN #3 is from when I view from 5 miles out, where I have a much
greater chance of getting index scans (about 90% of the time).

I have listed information about the database below.

Cable_billing: ~500,000 rows, updated once per day
Davic: ~500,000 rows, about 100 rows updated per minute

Any info or suggestions would be appreciated.

Woody


twc-ral-overview=# \d cable_billing;
           Table "public.cable_billing"
     Column      |          Type          | Modifiers
-----------------+------------------------+-----------
 cable_billingid | integer| not null
 mac | macaddr| not null
 account | integer| 
 number  | character varying(10)  | 
 address | character varying(200) | 
 region  | character varying(30)  | 
 division| integer| 
 franchise   | integer| 
 node| character varying(10)  | 
 lat | numeric| 
 long| numeric| 
 trunk   | character varying(5)   | 
 ps  | character varying(5)   | 
 fd  | character varying(5)   | 
 le  | character varying(5)   | 
 update  | integer| 
 boxtype | character(1)   | 
Indexes: cable_billing_pkey primary key btree (mac),
 cable_billing_account_index btree (account),
 cable_billing_lat_long_idx btree (lat, long),
 cable_billing_node_index btree (node),
 cable_billing_region_index btree (region)

twc-ral-overview=# \d davic
       Table "public.davic"
 Column  |         Type          | Modifiers
---------+-----------------------+-----------
 davicid | integer   | not null
 mac | macaddr   | not null
 source  | character varying(20) | 
 status  | smallint  | 
 updtime | integer   | 
 type| character varying(10) | 
 avail1  | integer   | 
Indexes: davic_pkey primary key btree (mac)



twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM
(SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1
ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac)
WHERE boxtype='d') AS foo  WHERE (long>=-78.70723462816063) AND
(long<=-78.53096764204116) AND (lat>=35.5741118787) AND
(lat<=35.66366331376857);
QUERY PLAN #1
----------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..23433.18 rows=1871 width=34) (actual
time=0.555..5095.434 rows=3224 loops=1)
   ->  Index Scan using cable_billing_lat_long_idx on cable_billing
(cost=0.00..12145.85 rows=1871 width=32) (actual time=0.431..249.931
rows=3224 loops=1)
         Index Cond: ((lat >= 35.5741118787) AND (lat <=
35.66366331376857) AND (long >= -78.70723462816063) AND (long <=
-78.53096764204116))
         Filter: (boxtype = 'd'::bpchar)
   ->  Index Scan using davic_pkey on davic  (cost=0.00..6.01 rows=1
width=8) (actual time=1.476..1.480 rows=1 loops=3224)
         Index Cond: ("outer".mac = davic.mac)
 Total runtime: 5100.028 ms
(7 rows)



twc-ral-overview=# vacuum analyze;
VACUUM
twc-ral-overview=# explain analyze SELECT lat, long, mac, status FROM
(SELECT text(mac) as mac, lat, long, CASE status WHEN 0 THEN 0 WHEN 1 THEN 1
ELSE -1 END as status FROM cable_billing LEFT OUTER JOIN davic USING(mac)
WHERE boxtype='d') AS foo  WHERE (long>=-78.87878592206046) AND
(long<=-78.70220280717479) AND (lat>=35.71703190638861) AND
(lat<=35.80658335998006);
QUERY PLAN #2
----------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..76468.90 rows=9223 width=34) (actual
time=0.559..17387.427 rows=19997 loops=1)
   ->  Seq Scan on cable_billing  (cost=0.00..20837.76 rows=9223 width=32)
(actual time=0.290..7117.799 rows=19997 loops=1)
         Filter: ((boxtype = 'd'::bpchar) AND (long >= -78.87878592206046)
AND (long <=
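
The flip between plans #1 and #2 is typically the planner's row estimate
crossing its index-versus-seq-scan threshold (note the estimate of 9223
rows against the actual 19997 in plan #2).  Two things worth trying on 7.4,
sketched here with the names from this thread rather than offered as a
known fix:

-- Finer-grained statistics on the range columns tighten the row estimates
-- that drive the scan choice (the default statistics target is 10):
ALTER TABLE cable_billing ALTER COLUMN lat SET STATISTICS 100;
ALTER TABLE cable_billing ALTER COLUMN long SET STATISTICS 100;
ANALYZE cable_billing;

-- Diagnostic only: disable seq scans for one session and compare the
-- forced index plan against the seq scan plan the planner chose.
SET enable_seqscan = off;
-- ... re-run the EXPLAIN ANALYZE from above here ...
SET enable_seqscan = on;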