> Well, what problem exactly are you trying to solve?
> Having large tables itself isn't a problem, but it often
> tends to imply other things that might be problematic:
I'm trying to troubleshoot a very low cache hit rate, as reported by this query:
SELECT sum(heap_blks_read) AS heap_read,
       sum(heap_blks_hit)  AS heap_hit,
       sum(heap_blks_hit) /
         (sum(heap_blks_hit) + sum(heap_blks_read)) AS ratio
FROM pg_statio_user_tables;
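(For what it's worth, here's a per-table variant I've been using to see
which relations drag the ratio down -- just a sketch built on the same
pg_statio_user_tables columns; the NULLIF guards against division by
zero for tables that have never been read:)

```sql
-- Cache hit ratio per table, worst first.
SELECT relname,
       heap_blks_read,
       heap_blks_hit,
       heap_blks_hit::numeric
         / NULLIF(heap_blks_hit + heap_blks_read, 0) AS ratio
FROM pg_statio_user_tables
ORDER BY ratio
LIMIT 20;
```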
So I think that's your fourth option:
> - blowing out your buffer cache with useless dirty pages - this is where you
> might want to look into separate servers.
So it sounds like I'm on the right track. But a separate
cluster/server seems like a drastic solution.
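Before going that far, I may first try the per-table knobs you listed.
Something like this is what I had in mind (a sketch only -- `scraped_pages`
and the tablespace path are made-up stand-ins for my real tables and disks):

```sql
-- Hypothetical table name: tighten autovacuum for the big,
-- write-mostly table instead of relying on the global defaults.
ALTER TABLE scraped_pages SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_analyze_scale_factor = 0.01
);

-- Or move it onto separate disks with a tablespace:
CREATE TABLESPACE scrape_space LOCATION '/mnt/slow_disks/pg_scrapes';
ALTER TABLE scraped_pages SET TABLESPACE scrape_space;
```

Neither of those would keep the scrape tables from competing for
shared_buffers, though, which is why the separate-cluster idea still tempts me.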
Thanks,
Paul
On Mon, Mar 11, 2013 at 12:17 PM, Ben Chobot <[email protected]> wrote:
>
> On Mar 9, 2013, at 11:54 AM, Paul Jungwirth wrote:
>
> Hello,
>
> I'm running a specialized search engine that indexes a few tens of millions
> of web pages, keeping everything in Postgres, and one problem I'm starting to
> see is poor cache hit rates. My database has two or three tables just for the
> text of the scraped pages, with one row every time a page is scraped and a
> `text` column for the HTML. These tables are almost-but-not-quite write-only.
> They are only read by one job, which uses them to create other objects in the
> system. I'd like the rest of the database to be in-memory all the time, but I
> don't really care if I have to read these tables from disk. To keep my
> problem tables from dragging down performance on the rest of the system, I'm
> thinking of splitting them out. I suppose I could save the HTML outside of
> Postgres entirely, but I'm wondering if a nicer solution would be to keep
> these tables in a separate "cluster" (i.e.
> /var/lib/postgresql/9.1/something_else -- I hope this is the right
> terminology). Then I could tune that cluster differently from the main
> cluster, or even put it on a different machine. And I could still use dblink
> to query both clusters conveniently (I think; this isn't really that
> important). Does this seem like a worthwhile approach? Is there a better way
> to deal with a few out-sized tables?
>
>
> Well, what problem exactly are you trying to solve? Having large tables
> itself isn't a problem, but it often tends to imply other things that might
> be problematic:
>
> - large vacuum or analyze times. (Probably fixable in your case with
> per-table autovacuum thresholds.)
> - large disk usage. (Probably fixable in your case with tablespaces.)
> - slow selects or updates. (Probably fixable in your case with partitioning.)
> - blowing out your buffer cache with useless dirty pages - this is where you
> might want to look into separate servers.
--
_________________________________
Pulchritudo splendor veritatis.
--
Sent via pgsql-general mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general