This got long, but I structured it so you can stop reading whenever you get bored or mired in minutiae and still answer constructively based on however much you've read.
So feel free to stop reading after only a couple of paragraphs and fire off a response :-) "+++" marks good stopping points for this cumulative post/response deal. I'll take any other random suggestions for how to drastically improve full-text index/search performance, of course! I examined the PostgreSQL contrib/ directory solutions -- fulltextindex didn't help, and the other option required software/processes we're not able to maintain at this time. THANKS!!!

----------------------------------------

Under what real-world conditions does a read-only replica of a DB on a RAM disk make sense?

Any quick tips for what NOT to do if I just want to run a quick, crude benchmark trial using PostgreSQL and PHP? (The throwaway trial I have in mind is sketched below, just after the "Big Picture" paragraph.)

+++

A Google for "PostgreSQL RAMdisk" turned up only one brief article, and it seemed geared toward a very small dataset for a hobbyist-type site where performance was more about principle than real-world need. (Though it was a VERY nice article, with very concise directions and great warnings about the "what ifs.") Perhaps I should have Googled for "MySQL RAMdisk" and hoped the parallels were accurate? I'll give it a shot, but I'm leery of invalid information from that cross-database assumption. Any comment on the likelihood of non-PostgreSQL info being applicable? The mailing list archives turned up some interesting discussions, but they were generally not conclusive.

+++

Or, to make this easier to answer (if somewhat less generally useful in the archive), do you predict it would significantly boost performance (and why or why not) on this setup:

  Intel Pentium III, 1.4 GHz
  1 GB RAM
  PostgreSQL 7.3
    - currently *NEEDS* shared_buffers = 256M, and has 512M set as a safety margin
    - also set sort_mem = 16384, though I'm not sure whether it helped or hurt
    - also tried to turn on query/performance logging, but never did find the
      darn log files to view anything useful (help!)
  PHP 4.2.3

The database itself is sucking down 2.6 GB of hard drive space:

  du -hc .
  3.6M    ./base/1
  3.6M    ./base/16975
  4.0k    ./base/16976/pgsql_tmp
  2.3G    ./base/16976
  4.0k    ./base/3087869/pgsql_tmp
  120M    ./base/3087869
  2.5G    ./base
  140k    ./global
  113M    ./pg_xlog
  720k    ./pg_clog
  2.6G    .
  2.6G    total

+++

Obviously I cannot currently replicate that whole thing in RAM. However, a large chunk of it (1 GB-ish) is a 12-million-row table of (int4 + int4 + int4), i.e. 96-bit records, that I was hoping would allow a self-join on word position (a positional concordance) that would be faster than running a RegEx across 18,000 articles of 0 to 270 KB in length. Another large chunk (750 MB-ish) is 10 million (int4 + int4 + smallint), i.e. 80-bit, records which are currently being used (and probably must remain) to cross-index a dictionary with each article and each word's frequency in that article. This 750 MB table took queries ranging from two minutes to an unknown amount of time down to the 10-second magnitude. I doubt that a RAM disk is going to be enough of a boon to beat that, right? I believe that before I added these tables we were looking at ~200 MB of data, which makes a little more sense on a RAM disk.

+++

An answer that involves paying for more RAM and making it 40X as fast would be fine, of course. So if I trade the 1.8 GB of concordance tables for a RAM disk, will the RAM disk give a 4X-to-??X performance improvement to "make up" for the concordance gains I've already achieved?

+++

Big Picture: it's a custom search engine over the text of more than 50 years' worth of a magazine, using some specific meta-data to refine the searches. Full-text index searching has been the biggest challenge.
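For concreteness, here is the sort of throwaway trial I was picturing when I asked the "what NOT to do" question above. This is only a sketch of my guess at the procedure, not something I've tested: the mount point, the data directory path (/var/lib/pgsql/data), the tmpfs size, and the second port are all placeholders, the mount steps need root while the PostgreSQL steps need the postgres user, and the copy evaporates at unmount or reboot, so it's strictly for read-only benchmarking against a copy I can afford to lose:

  # (as root) carve out a RAM disk big enough to hold a copy of the data;
  # the size is a placeholder -- it has to fit in real RAM to mean anything
  mkdir -p /mnt/pgram
  mount -t tmpfs -o size=512m tmpfs /mnt/pgram
  chown postgres:postgres /mnt/pgram

  # (as postgres) stop the regular server, then copy the data directory
  # onto the RAM disk, preserving ownership and permissions
  pg_ctl -D /var/lib/pgsql/data stop
  cp -a /var/lib/pgsql/data /mnt/pgram/data

  # (as postgres) restart the normal server, and start a second postmaster
  # against the RAM-disk copy on another port for side-by-side timings
  pg_ctl -D /var/lib/pgsql/data start
  postmaster -D /mnt/pgram/data -p 5433 &

The idea would then be to point the same PHP test script first at the regular port and then at 5433 and compare timings on identical queries. Obviously the full 2.6 GB cluster doesn't fit in 1 GB of RAM, so this would have to be done against a trimmed-down copy of the database, or after a RAM upgrade. If that's a dumb way to go about it, that's exactly the kind of "what NOT to do" I'm hoping to hear about.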
No surprise there, I suppose, but I had hoped for more definitive solutions "out there" in algorithms and such-like. There were off-the-shelf projects (e.g. ht://Dig), but they didn't "fit in" with some of the meta-data and design constraints. I've managed to create a concordance of article/word/frequency that has sped up most queries from minutes to ~10 seconds. Alas, support for "a NEAR b" is at risk, because the RegEx across 18,000 rows of text from 0 to 270 KB each takes too long. (We can drop NEAR if we have to, maybe, but we'd rather not.)

Only now am I realizing I probably underestimated that 12 million figure -- it's probably closer to 20 million or even 40 million rows. It's hard to say for sure until the actual text is indexed, or at least until I run a random sampling. I don't even have real data in that table yet -- I just populated it with semi-random numbers for benchmarking, to see what sort of gains I might get. (Answer: 4X is great, but it's not enough.) The concordance with full positional information (e.g. word "foo" appears in article 57 at offset 42) requires an [under-]estimated 12-million-row self-join. A sample table of that magnitude clocks in at 30 seconds just for the two-word "a NEAR b" case, without even trying (yet) to correlate those hits with the actual articles via the dictionary/concordance lookup. And while 30 seconds sure beats two minutes, it's not good enough, particularly since I haven't even factored the actual article lookup into that crude benchmark. (The shape of the self-join I'm benchmarking is sketched in the P.S. below.)

I think I've just about exhausted the query-munging solutions and am now casting farther afield (e.g. this RAM disk idea) for answers. As a long-time Open Source advocate, I am hoping to publish some of my experience, pending (A) a successful resolution and (B) client->client approval. They've been pretty good on that sort of thing so far, but you never know. I'll post here if that works out.

+++

Excruciating hardware detail:

  cat /proc/cpuinfo
  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 6
  model           : 11
  model name      : Intel(R) Pentium(R) III CPU family 1400MHz
  stepping        : 1
  cpu MHz         : 1406.020
  cache size      : 512 KB
  fdiv_bug        : no
  hlt_bug         : no
  f00f_bug        : no
  coma_bug        : no
  fpu             : yes
  fpu_exception   : yes
  cpuid level     : 2
  wp              : yes
  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
  bogomips        : 2804.94

  cat /proc/meminfo
          total:     used:     free:   shared:  buffers:   cached:
  Mem:  921235456 916316160   4919296   4612096  87429120 706134016
  Swap: 2097143808    786432 2096357376
  MemTotal:       899644 kB
  MemFree:          4804 kB
  MemShared:        4504 kB
  Buffers:         85380 kB
  Cached:         688816 kB
  SwapCached:        768 kB
  Active:         319436 kB
  Inact_dirty:     33096 kB
  Inact_clean:    426936 kB
  Inact_target:   229376 kB
  HighTotal:           0 kB
  HighFree:            0 kB
  LowTotal:       899644 kB
  LowFree:          4804 kB
  SwapTotal:     2047992 kB
  SwapFree:      2047224 kB

  free -m
               total       used       free     shared    buffers     cached
  Mem:           878        872          5          4         83        670
  -/+ buffers/cache:        119        759
  Swap:         1999          0       1999

If you've read this far, you deserve a Cookie or something! :-)

THANK YOU!!!
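P.S. In case it helps anyone suggest a better angle, here is roughly the shape of the positional self-join behind the 30-second "a NEAR b" number above. The table and column names are made up for this post (the real schema differs a bit), the word IDs and the NEAR distance of 10 are placeholders, and the index is just the one I've been assuming I need:

  -- Hypothetical positional concordance: one row per word occurrence
  --   word_positions(article_id int4, word_id int4, pos int4)

  -- "word 42 NEAR word 99" (within 10 positions), as a self-join:
  SELECT a.article_id
    FROM word_positions a, word_positions b
   WHERE a.word_id = 42
     AND b.word_id = 99
     AND b.article_id = a.article_id
     AND b.pos BETWEEN a.pos - 10 AND a.pos + 10
   GROUP BY a.article_id;

  -- The composite index this leans on (word_id first, so each word's
  -- occurrences sit next to each other in the index):
  CREATE INDEX word_positions_word_art_pos
      ON word_positions (word_id, article_id, pos);

The matching articles would then be fetched by joining the resulting article_ids back through the dictionary/article tables, which is the part I haven't folded into the crude benchmark yet.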