[PERFORM] seq scan cache vs. index cache smackdown
Hi All,

I have boiled my situation down to the following simple case (postgres version 7.3):

* Query 1 does a sequential scan over a table (courtesy of field ILIKE 'foo%') and index joins to a few others
* Query 2 does a functional index scan over the same table (lower(field) LIKE 'foo%') and index joins to a few others
* neither query has an ORDER BY clause
* for the purpose of testing, both queries are designed to return the same result set

Obviously Q2 is faster than Q1, but if I ever run them both at the same time (let's say I run two of Q1 and one of Q2 simultaneously), Q2 consistently returns WORSE times than Q1 (EXPLAIN ANALYZE confirms that it is using the index).

My assumption is that the sequential scan is blowing the index out of any cache it might live in, and simultaneously stealing all the disk I/O that is needed to access the index on disk (the table has 200,000 rows). If I simplify the case to leave out the index joins (i.e. operate on the one table only), the situation is less dramatic, but similar.

My thoughts are:

1) kill the sequential scan - but unfortunately I don't have direct control over that code
2) change the way the server allocates/prioritizes different caches - I don't know enough about how postgres caches work to do this (if it's even possible)
3) try it on postgres 7.4 - possible, but migrating the system to 7.4 in production will be hard, because the above code that I am not responsible for relies on a lot of (slightly wacky) implicit date casts
4) ask the fine people on the mailing list for other suggestions!

--
Mark Aufflick
e  [EMAIL PROTECTED]
w  www.pumptheory.com (work)
w  mark.aufflick.com (personal)
p  +61 438 700 647
f  +61 2 9436 4737

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings
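For readers following along, the two query shapes being compared look roughly like this (the table and column names are invented for illustration; the original queries also join to other tables, which is omitted here). Note also that, if memory serves, on 7.3 a plain btree index can only service a LIKE 'foo%' prefix match when the database was initdb'd in the C locale:

```sql
-- Hypothetical table standing in for the one described above.
-- Query 1: ILIKE cannot use a plain btree index, so the planner is
-- forced into a sequential scan over all 200,000 rows.
SELECT * FROM items WHERE name ILIKE 'foo%';

-- Query 2: a functional index on lower(name) lets the planner satisfy
-- the same case-insensitive prefix match with an index scan.
CREATE INDEX items_name_lower_idx ON items (lower(name));

SELECT * FROM items WHERE lower(name) LIKE 'foo%';
```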
Re: [PERFORM] Performance over a LAN
I don't think that's the advice being looked for here - if this behaviour is repeatable, then there is something askew with the inet protocol.

What is the client application written in? Do you know what version of the postgres network protocol your driver code is using? Are the inserts inside a transaction?

I'm very interested in this issue, since the environment I now work in has a lot of network-connected databases, and the performance is much less than I am used to with local databases.

Mark.

--
Mark Aufflick
e  [EMAIL PROTECTED]
w  www.pumptheory.com (work)
w  mark.aufflick.com (personal)
p  +61 438 700 647

On 23/07/2004, at 4:02 PM, Christopher Kings-Lynne wrote:

But with the server running on one machine and the client running on another, the two machines being connected by a 100 Mb ethernet, with nothing else on the network, this test takes 17 minutes to run. I have tried changing the frequency of COMMIT operations, but with only a small effect.

Are you using separate INSERT statements? Try using COPY instead, it's much faster.

chris
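A sketch of the COPY suggestion from the quoted reply, with a made-up table and columns for illustration (rows in a COPY stream are tab-separated, terminated by a line containing only \.):

```sql
-- Row-at-a-time: one network round trip (and, outside a transaction,
-- one commit) per row.
INSERT INTO measurements (id, val) VALUES (1, 42);
INSERT INTO measurements (id, val) VALUES (2, 43);

-- COPY streams all rows through a single command instead.
COPY measurements (id, val) FROM STDIN;
1	42
2	43
\.
```

Even without COPY, wrapping a batch of plain INSERTs in a single BEGIN/COMMIT avoids a commit per row, which is often the dominant cost over a network link.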
Re: [PERFORM] hardware raid suggestions
Not sure what your hw platform is, but I always used to get fantastic performance from Compaq Smart Array battery-backed cards. Note that I haven't bought any recently, so HP may have hp invent-ed them... But whatever the brand - if you get a swag of battery-backed cache you won't know yourself. It's fun to install an OS on them as well - watch the drive format and verify take 10 seconds ;)

Another option to look at is outboard raid boxes that present a single drive interface to the server - I know people who swear by them.

--
Mark Aufflick
e  [EMAIL PROTECTED]
w  www.pumptheory.com (work)
w  mark.aufflick.com (personal)
p  +61 438 700 647

On 16/07/2004, at 4:07 AM, Brian Hirt wrote:

I've been using the adaptec ZCR raid cards in our servers for a while now, mostly small systems with 3 or 6 disks, and we've been very happy with them. However, we're building a new DB machine with 14 U320 15K SCA drives, and we've run into a performance bottleneck with the ZCR card where it just won't scale well.

Without going into too many details, we've tested RAID5, RAID10 and RAID50 on pretty much every array size from 4-14 disks (the raid 50 tests used more drives), using JFS, reiserfs and EXT3. With every configuration, performance didn't improve once the array size grew beyond 6 disks. We used various benchmarks, including pgbench with scale factors of 10, 100, 1000, 5000 and clients of 10, 15, 30 and 45. We've run many other tests and monitoring tools, and we've come to the conclusion that the ZCR is the problem.

We're looking into getting an Adaptec 2200S or the Megaraid 320 2x, which have better processors and hopefully better performance. We feel that the use of the AIC7930 as the CPU on the ZCR just doesn't cut it, and a faster raid controller would work better. Does anyone out there have any experience with these cards with postgresql and linux? If so, would you be willing to share your experiences and possibly give a recommendation?
--brian
[PERFORM] PREPAREing statements versus compiling PLPGSQL
Hi all,

I am optimizing some code that does a lot of iterative selects and inserts within loops. Because of the exception handling limitations in postgres, and with no ability to twiddle autocommit, just about every operation is standalone. Over 5000-odd lines this gets very slow (5-10 minutes including processing). In seeking to speed it up I am PREPARing the most common inserts and selects.

I have a few operations already inside plpgsql functions. EXECUTE means something different within a plpgsql function, so I am wondering if there is a way to execute a pre-prepared query inside a function. Or is this even necessary - are queries within plpgsql functions automatically prepared when the function is first compiled? On a similar note, is there any benefit in PREPAREing a select that is issued from within a plpgsql function?

Or does anyone have any smart ways to turn off autocommit? (I have already played with commit_delay and commit_siblings.) My empirical testing has proven inconclusive - other than turning off fsync, which makes a huge difference but is not possible on the live system, or using a fat Compaq raid card.

Thanks for any help,

Mark.

--
Mark Aufflick
e: [EMAIL PROTECTED]
w: www.pumptheory.com (business)
w: mark.aufflick.com (personal)
p: +61 438 700 647
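A sketch of the two mechanisms being compared, with invented table and function names (the quoted function body, rather than dollar quoting, is the 7.x syntax). As I understand it, a plain query embedded in plpgsql is planned the first time the function runs in a session and the plan is cached thereafter, so an explicit PREPARE of the same statement generally buys nothing inside a function; plpgsql's EXECUTE, by contrast, is dynamic SQL and is re-planned on every call:

```sql
-- SQL-level prepared statement (available since 7.3): planned once,
-- then executed by name with parameters for the rest of the session.
PREPARE get_item (integer) AS
    SELECT name FROM items WHERE id = $1;
EXECUTE get_item(42);

-- Inside plpgsql, an embedded query gets the same treatment
-- implicitly: its plan is prepared on first execution and reused.
CREATE FUNCTION get_name(p_id integer) RETURNS text AS '
DECLARE
    result text;
BEGIN
    SELECT INTO result name FROM items WHERE id = p_id;  -- plan cached
    RETURN result;
END;
' LANGUAGE plpgsql;
```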