On Thu, Feb 21, 2013 at 8:57 AM, Carlo Stonebanks <
stonec.regis...@sympatico.ca> wrote:

> (Sorry moderators for any double posts, I keep making subscription errors.
> Hopefully this one gets through)
>
> Hi speed freaks,
>
> Can anyone tell me why the bitmap heap scan takes so long to start for this
> query? (SQL and EXPLAIN ANALYZE follows).
>

It is probably reading data from disk.  explain (analyze, buffers) would be
more helpful, especially if track_io_timing were also turned on.  In the
absence of that, my thoughts below are just best-guesses.


>
> The big culprit in this appears to be:
> ->  Bitmap Index Scan on log_2013_01_session_idx  (cost=0.00..63186.52
> rows=2947664 width=0) (actual time=32611.918..32611.918 rows=2772042
> loops=1)"
> Index Cond: (session_id = 27)"
>
> I can't see anything that occurs between actual time 0.0..32611.918 that
> this could be waiting on. Is it building the bitmap?
>

Yes.  More importantly, it is reading the index data needed in order to
build the bitmap.


>
> Running the query a second time yields this:
>
> ->  Bitmap Index Scan on log_2013_01_session_idx  (cost=0.00..63186.52
> rows=2947664 width=0) (actual time=2896.601..2896.601 rows=2772042
> loops=1)"
> Index Cond: (session_id = 27)"
>
> Does the bitmap then get cached?


No, the bitmap itself doesn't get cached.  But the data needed to construct
the bitmap does get cached.  It gets cached by the generic caching methods
of PG and the OS, not through something specific to bitmaps.


> These queries are built dynamically and
> called rarely, so their first-time performance is important.


What is going on during the interregnum?  Whatever it is, it seems to be
driving the log_2013_01_session_idx index out of the cache, but not the
log_2013_01 table.  (Or perhaps the table visit is getting the benefit of
effective_io_concurrency?)

Rebuilding the index might help, as it would put all the leaf pages holding
values for session_id=27 adjacent to each other, so they would read from
disk faster.  But with a name like "session_id", I don't know how long such
clustering would last though.

If I'm right about the index disk-read time, then switching to a plain
index scan rather than a bitmap index scan would make no difference--either
way the data has to come off the disk.




> I'd prefer a
> strategy that allowed fast performance the first time, rather than slow the
> first time and extremely fast subsequently.
>


So would PG, but it can't find such a strategy.  PG optimizes all top-level
queries in isolation, it never assumes you will execute the same query
repeatedly and build that assumption into the costing process.  (This is
not true of subqueries, where it does account for repeated executions in
the cost)

Cheers,

Jeff

Reply via email to