> Well, what problem exactly are you trying to solve?
> Having large tables itself isn't a problem, but it often
> tends to imply other things that might be problematic:

I'm trying to troubleshoot a very low cache hit rate as returned by this query:

SELECT sum(heap_blks_read) as heap_read,
       sum(heap_blks_hit) as heap_hit,
       sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM pg_statio_user_tables;
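For what it's worth, here is a rough per-table variant of the same query (same
heap_blks_read / heap_blks_hit columns from pg_statio_user_tables, one row per
table) that should show which tables account for most of the misses; the
NULLIF just avoids division by zero on tables that have never been read:

-- rough sketch: per-table version of the ratio above
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       round(heap_blks_hit::numeric /
             NULLIF(heap_blks_hit + heap_blks_read, 0), 4) as ratio
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC;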
So I think that's your fourth option:

> - blowing out your buffer cache with useless dirty pages - this is where you
>   might want to look into separate servers.

So it sounds like I'm on the right track. But a separate cluster/server seems
like a drastic solution.

Thanks,
Paul


On Mon, Mar 11, 2013 at 12:17 PM, Ben Chobot <be...@silentmedia.com> wrote:
>
> On Mar 9, 2013, at 11:54 AM, Paul Jungwirth wrote:
>
> Hello,
>
> I'm running a specialized search engine that indexes a few tens of millions
> of web pages, keeping everything in Postgres, and one problem I'm starting
> to see is poor cache hit rates. My database has two or three tables just for
> the text of the scraped pages, with one row every time a page is scraped and
> a `text` column for the HTML. These tables are almost-but-not-quite
> write-only. They are only read by one job, which uses them to create other
> objects in the system. I'd like the rest of the database to be in-memory all
> the time, but I don't really care if I have to read these tables from disk.
> To keep my problem tables from dragging down performance on the rest of the
> system, I'm thinking of splitting them out. I suppose I could save the HTML
> outside of Postgres entirely, but I'm wondering if a nicer solution would be
> to keep these tables in a separate "cluster" (i.e.
> /var/lib/postgresql/9.1/something_else -- I hope this is the right
> terminology). Then I could tune that cluster differently from the main
> cluster, or even put it on a different machine. And I could still use dblink
> to query both clusters conveniently (I think; this isn't really that
> important). Does this seem like a worthwhile approach? Is there a better way
> to deal with a few out-sized tables?
>
>
> Well, what problem exactly are you trying to solve? Having large tables
> itself isn't a problem, but it often tends to imply other things that might
> be problematic:
>
> - large vacuum or analyze times. (Probably fixable in your case with
>   per-table autovacuum thresholds.)
> - large disk usage. (Probably fixable in your case with tablespaces.)
> - slow selects or updates. (Probably fixable in your case with partitioning.)
> - blowing out your buffer cache with useless dirty pages - this is where you
>   might want to look into separate servers.

--
_________________________________
Pulchritudo splendor veritatis.