On Tue, Feb 26, 2013 at 4:33 PM, Carlo Stonebanks <stonec.regis...@sympatico.ca> wrote:
> << Is each of these write operations just covering a single row? Does
> this description apply to just one of the many (how many?) databases, so
> that there are really 14*N concurrent sessions? >>
>
> All writes are single row. All DB's have exactly the same structure, only
> the content is different. Currently the server is hosting five active
> DB's -- although there are 14 DB's actually on the host, the balance are
> backups and/or testing environments.

I had thought you were saying that any one ETL procedure into one database
used 14 concurrent threads. But really, each ETL procedure is
single-threaded, and there can be up to 5 (or theoretically up to 14) of
them running at a time into different databases?

> When a feed comes in, it can be anything from dozens to millions of rows,
> and may take minutes or days to run. I had asked that pgbouncer be
> installed in front of the host to act as a traffic cop. Try as I may to
> convince the engineering team that fewer sessions running faster is
> optimal, they say that the 14 concurrent sessions are based on real-world
> experience of what imports the fastest.

pgbouncer is more for making connections line up single-file when the line
is moving at a very fast clip, say 0.01 second per turn. If I were trying
to make tasks that can each last for hours or days line up and take turns,
I don't think pgbouncer would be the way to go. (A sketch of the kind of
config I mean is in the P.S. below.)

> << You really need to know whether those reads and writes are
> concentrated in a small region (relative to the amount of your RAM), or
> widely scattered. If you are reading and writing intensively (which you
> do seem to be doing) but only within a compact region, then it should not
> drive other data out of the cache. But, since you do seem to have IO
> problems from cache misses, and you do have a high level of activity, the
> easy conclusion is that you have too little RAM to hold the working size
> of your data. >>
>
> It won't be a problem of physical RAM; I believe there is at least 32GB
> of RAM. What constitutes "a compact region"?

If you have 14 sessions actively going simultaneously, I'd say a compact
region would then be about 512 MB: 32GB / 14 / a margin of safety of 4
comes to roughly 585 MB. Again, assuming that that is the problem.

> The ETL process takes the feed and distributes it to 85 core tables. I
> have been through many PG configuration cycles with the generous help of
> people in this forum. I think the big problem when getting help has been
> getting those offering assistance to understand that the whopping
> majority of the time, the system is performing single-row reads and
> writes. The assumption tends to be that the end point of an ETL should
> just be a series of COPY statements, and it should all happen very
> quickly in classic SQL bulk queries.

That is often a reasonable assumption, as ETL does end with L :)

Is the original query you posted part of the transform process, rather
than being the production query you run after the ETL is over? If so,
maybe you need an EL(S)TL process, where you first load the data to a
staging table in bulk, and then transform it in bulk rather than one row
at a time. (A sketch is in the P.P.S. below.)

Cheers,

Jeff
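P.S. To make the pgbouncer point concrete, here is a minimal sketch of a
pgbouncer.ini that makes connections line up single-file. The database
name, paths, and port are made up for illustration:

    ; [databases] maps the name clients connect to onto a real backend.
    [databases]
    etl_db = host=127.0.0.1 port=5432 dbname=etl_db

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; Session pooling with a pool of 1 serializes sessions per user/db
    ; pair: each client gets the one server connection only when the
    ; client ahead of it disconnects.
    pool_mode = session
    default_pool_size = 1
    max_client_conn = 100

That queueing is cheap when each turn lasts milliseconds; a session that
holds its turn for days would just starve everything queued behind it,
which is why I wouldn't use pgbouncer to serialize long-running ETL jobs.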
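P.P.S. A sketch of what I mean by EL(S)TL, with made-up table, column,
and file names (it assumes a destination table core_table already
exists): bulk-load the raw feed into a staging table, then transform it
with set-oriented SQL instead of one row at a time:

    -- Load: COPY the raw feed into an unindexed, all-text staging table.
    CREATE TABLE staging_feed (raw_id text, name text, amount text);
    COPY staging_feed FROM '/path/to/feed.csv' WITH (FORMAT csv);

    -- Transform + load: one set-oriented statement into the core table,
    -- casting and cleaning as it goes, rather than a round trip per row.
    INSERT INTO core_table (source_id, name, amount)
    SELECT raw_id::int, trim(name), amount::numeric
    FROM staging_feed
    WHERE raw_id ~ '^[0-9]+$';  -- skip rows whose id isn't numeric

The bulk INSERT ... SELECT lets the planner work on the whole set at
once, which is usually far faster than millions of single-row writes.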