Dan Harris wrote:

On Aug 10, 2005, at 12:49 AM, Steve Poe wrote:

Dan,

Do you mean you did RAID 1 + 0 (RAID 10) or RAID 0 + 1? Just a
clarification, since RAID 0 is still a single-point of failure even if
RAID1 is on top of RAID0.


Well, you tell me if I stated incorrectly. There are two raid enclosures with 7 drives in each. Each is on its own bus on a dual- channel controller. Each box has a stripe across its drives and the enclosures are mirrors of each other. I understand the controller could be a single point of failure, but I'm not sure I understand your concern about the RAID structure itself.

In this configuration, if you have a drive fail on both controllers, the entire RAID dies. Lets label them A1-7, B1-7, because you stripe within a set, if a single one of A dies, and a single one of B dies, you have lost your entire mirror.

The correct way of doing it, is to have A1 be a mirror of B1, and then stripe above that. Since you are using 2 7-disk enclosures, I'm not sure how you can do it well, since it is not an even number of disks. Though if you are using software RAID, there should be no problem.

The difference is that in this scenario, *all* of the A drives can die, and you haven't lost any data. The only thing you can't lose is a matched pair (eg losing both A1 and B1 will cause complete data loss)

I believe the correct notation for this last form is RAID 1 + 0 (RAID10) since you have a set of RAID1 drives, with a RAID0 on-top of them.



How many users are connected when your update / delete queries are
hanging? Have you done an analyze verbose on those queries?


Most of the traffic is from programs we run to do analysis of the data and managing changes. At the time I noticed it this morning, there were 10 connections open to the database. That rarely goes above 20 concurrent. As I said in my other response, I believe that the log will only contain the query at the point the query finishes, so if it never finishes...


Have you made changes to the postgresql.conf? kernel.vm settings? IO
scheduler?


I set shmmax appropriately for my shared_buffers setting, but that's the only kernel tweak.


If you're not doing so already, you may consider running sar  (iostat) to
monitor when the hanging occurs if their is a memory / IO bottleneck
somewhere.


I will try that.  Thanks


When you discover that an update is hanging, can you get into the database, and see what locks currently exist? (SELECT * FROM pg_locks)

That might help you figure out what is being locked and possibly preventing your updates.

It is also possible that your UPDATE query is trying to do something funny (someone just recently was talking about an UPDATE that wanted to do a hash join against 12M rows). Which probably meant that it had to spill to disk, where a merge join would have worked better.

John
=:->

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to