Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2014-01-13 Thread Robert Haas
On Thu, Jan 9, 2014 at 10:57 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 I wrote:
 Robert Haas wrote:
  Hmm, fair point.  But I'm still not convinced that we really need to
  add extra accounting for this.  What's wrong with just reporting the
  number of exact and lossy pages?

 No.  I intended to show the desired memory space for a TIDBitmap rather
 than the peak memory usage for that TIDBitmap.  And I thought it'd be
 better
 for the latter to be displayed as additional information.  However, I've
 removed the functionality for showing the desired memory space due to
 technical problems.  Now I should probably remove the functionality for
 showing the peak memory usage too.

 Yes, as Andres mentioned, showing the peak memory usage is not a bad idea,
 I think.  But I start to think it's not necessarily worth complicating the
 code ...

 If there are no objections of others, I'll remove extra accounting for
 showing the peak memory usage.

 Done.  Please find attached a patch.

Looks good to me, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2014-01-09 Thread Etsuro Fujita
I wrote:
 Robert Haas wrote:
  Hmm, fair point.  But I'm still not convinced that we really need to
  add extra accounting for this.  What's wrong with just reporting the
  number of exact and lossy pages?

 No.  I intended to show the desired memory space for a TIDBitmap rather
 than the peak memory usage for that TIDBitmap.  And I thought it'd be
better
 for the latter to be displayed as additional information.  However, I've
 removed the functionality for showing the desired memory space due to
 technical problems.  Now I should probably remove the functionality for
 showing the peak memory usage too.

 Yes, as Andres mentioned, showing the peak memory usage is not a bad idea,
 I think.  But I start to think it's not necessarily worth complicating the
 code ...

 If there are no objections of others, I'll remove extra accounting for
 showing the peak memory usage.

Done.  Please find attached a patch.

Thanks,

Best regards,
Etsuro Fujita


explain-bitmapscan-20140110.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2014-01-07 Thread Robert Haas
On Mon, Jan 6, 2014 at 9:40 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 Robert Haas wrote:
 I spent some time looking at this tonight.  I don't think the value that
 is displayed for the bitmap memory tracking will be accurate in complex
 cases.  The bitmap heap scan may sit on top of one or more bitmap-and or
 bitmap-or nodes.  When a bitmap-and operation happens, one of the two
 bitmaps being combined will be thrown out and the number of entries in the
 other map will, perhaps, be decreased.  The peak memory usage for the
 surviving bitmap will be reflected in the number displayed for the bitmap
 heap scan, but the peak memory usage for the discarded bitmap will not.
 This is wholly arbitrary because both bitmaps existed at the same time,
 side by side, and which one we keep and which one we throw out is
 essentially
 random.

 Thank you for taking time to look at this patch.  The peak memory usage for
 the discarded bitmap *can* be reflected in the number displayed for the
 bitmap heap scan by the following code in tbm_union() or tbm_intersect():

   tbm_union(TIDBitmap *a, const TIDBitmap *b)
   {
 Assert(!a-iterating);
 +   if (a-nentriesPeak  b-nentriesPeak)
 +   a-nentriesPeak = b-nentriesPeak;
 /* Nothing to do if b is empty */
 if (b-nentries == 0)
 return;
 ***

   tbm_intersect(TIDBitmap *a, const TIDBitmap *b)
   {
 Assert(!a-iterating);
 +   if (a-nentriesPeak  b-nentriesPeak)
 +   a-nentriesPeak = b-nentriesPeak;
 /* Nothing to do if a is empty */
 if (a-nentries == 0)
 return;
 ***

 Sorry for the delay.

Hmm, fair point.  But I'm still not convinced that we really need to
add extra accounting for this.  What's wrong with just reporting the
number of exact and lossy pages?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2014-01-07 Thread Etsuro Fujita
Robert Haas wrote:
 On Mon, Jan 6, 2014 at 9:40 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp
 wrote:
  Thank you for taking time to look at this patch.  The peak memory
  usage for the discarded bitmap *can* be reflected in the number
  displayed for the bitmap heap scan by the following code in tbm_union()
 or tbm_intersect():

 Hmm, fair point.  But I'm still not convinced that we really need to add
 extra accounting for this.  What's wrong with just reporting the number
 of exact and lossy pages?

No.  I intended to show the desired memory space for a TIDBitmap rather than
the peak memory usage for that TIDBitmap.  And I thought it'd be better for
the latter to be displayed as additional information.  However, I've removed
the functionality for showing the desired memory space due to technical
problems.  Now I should probably remove the functionality for showing the
peak memory usage too.

Yes, as Andres mentioned, showing the peak memory usage is not a bad idea, I
think.  But I start to think it's not necessarily worth complicating the
code ...

If there are no objections of others, I'll remove extra accounting for
showing the peak memory usage.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2014-01-06 Thread Etsuro Fujita
Robert Haas wrote:
 I spent some time looking at this tonight.  I don't think the value that
 is displayed for the bitmap memory tracking will be accurate in complex
 cases.  The bitmap heap scan may sit on top of one or more bitmap-and or
 bitmap-or nodes.  When a bitmap-and operation happens, one of the two
 bitmaps being combined will be thrown out and the number of entries in the
 other map will, perhaps, be decreased.  The peak memory usage for the
 surviving bitmap will be reflected in the number displayed for the bitmap
 heap scan, but the peak memory usage for the discarded bitmap will not.
 This is wholly arbitrary because both bitmaps existed at the same time,
 side by side, and which one we keep and which one we throw out is
essentially
 random.

Thank you for taking time to look at this patch.  The peak memory usage for
the discarded bitmap *can* be reflected in the number displayed for the
bitmap heap scan by the following code in tbm_union() or tbm_intersect():

  tbm_union(TIDBitmap *a, const TIDBitmap *b)
  {
Assert(!a-iterating);
+   if (a-nentriesPeak  b-nentriesPeak)
+   a-nentriesPeak = b-nentriesPeak;
/* Nothing to do if b is empty */
if (b-nentries == 0)
return;
***

  tbm_intersect(TIDBitmap *a, const TIDBitmap *b)
  {
Assert(!a-iterating);
+   if (a-nentriesPeak  b-nentriesPeak)
+   a-nentriesPeak = b-nentriesPeak;
/* Nothing to do if a is empty */
if (a-nentries == 0)
return;
***

Sorry for the delay.

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2014-01-02 Thread Andres Freund
On 2014-01-01 21:15:46 -0500, Robert Haas wrote:
 [ sensible reasoning ] However, I'm not sure it's really worth it.
 I think what people really care about is knowing whether the bitmap
 lossified or not, and generally how much got lossified.  The counts of
 exact and lossy pages are sufficient for that, without anything
 additional

Showing the amount of memory currently required could tell you how soon
accurate bitmap scans will turn into lossy scans though. Which is not a
bad thing to know, some kinds of scans (e.g. tsearch over expression
indexes, postgis) can get ridiculously slow once lossy.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2014-01-02 Thread Robert Haas
On Thu, Jan 2, 2014 at 4:27 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-01-01 21:15:46 -0500, Robert Haas wrote:
 [ sensible reasoning ] However, I'm not sure it's really worth it.
 I think what people really care about is knowing whether the bitmap
 lossified or not, and generally how much got lossified.  The counts of
 exact and lossy pages are sufficient for that, without anything
 additional

 Showing the amount of memory currently required could tell you how soon
 accurate bitmap scans will turn into lossy scans though. Which is not a
 bad thing to know, some kinds of scans (e.g. tsearch over expression
 indexes, postgis) can get ridiculously slow once lossy.

Hmm, interesting.  I have not encountered that myself.  If we want
that, I'm tempted to think that we should display statistics for each
bitmap index scan - but I'd be somewhat inclined to see if we could
get by with the values that are already stored in a TIDBitmap rather
than adding new ones - e.g. show npages (the number of exact entries),
nchunks (the number of lossy entries), and maxentries.  From that, you
can work out the percentage of available entries that were actually
used.  The only thing that's a bit annoying about that is that we'd
probably have to copy those values out of the tid bitmap and into an
executor state node, because the tid bitmap will subsequently get
modified destructively.  But I think that's probably OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2014-01-01 Thread Robert Haas
On Fri, Dec 27, 2013 at 1:47 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 I wrote:
  Robert Haas wrote:
   I'd be wary of showing a desired value unless it's highly likely to
   be accurate.

  The desired value is accurately estimated based on (a) the total
  number of exact/lossy pages stored in the TIDBitmap and (b) the
  following equation in tbm_create(), except for the GIN case where
  lossy pages are added to the TIDBitmap by tbm_add_page().

 I've found there is another risk of overestimating the desired memory space
 for a BitmapAnded TIDBitmap.  I'm inclined to get rid of the estimation
 functionality from the patch completely, and leave it for future work.
 Attached is a new version of the patch, which shows only fetch block
 information and memory usage information.  I'll add this to the upcoming CF.

I spent some time looking at this tonight.  I don't think the value
that is displayed for the bitmap memory tracking will be accurate in
complex cases.  The bitmap heap scan may sit on top of one or more
bitmap-and or bitmap-or nodes.  When a bitmap-and operation happens,
one of the two bitmaps being combined will be thrown out and the
number of entries in the other map will, perhaps, be decreased.  The
peak memory usage for the surviving bitmap will be reflected in the
number displayed for the bitmap heap scan, but the peak memory usage
for the discarded bitmap will not.  This is wholly arbitrary because
both bitmaps existed at the same time, side by side, and which one we
keep and which one we throw out is essentially random.

I think we could report the results in a more principled way if we
reported the value for each bitmap *index* scan node rather than each
bitmap *heap* scan node.  However, I'm not sure it's really worth it.
I think what people really care about is knowing whether the bitmap
lossified or not, and generally how much got lossified.  The counts of
exact and lossy pages are sufficient for that, without anything
additional - so I'm inclined to think that the best course of action
might be to remove from the patch everything that's concerned with
trying to measure memory usage and just keep the exact/lossy page
counts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-12-26 Thread Etsuro Fujita
 I wrote:
  Robert Haas wrote:
   I'd be wary of showing a desired value unless it's highly likely to
   be accurate.

  The desired value is accurately estimated based on (a) the total
  number of exact/lossy pages stored in the TIDBitmap and (b) the
  following equation in tbm_create(), except for the GIN case where
  lossy pages are added to the TIDBitmap by tbm_add_page().

I've found there is another risk of overestimating the desired memory space
for a BitmapAnded TIDBitmap.  I'm inclined to get rid of the estimation
functionality from the patch completely, and leave it for future work.
Attached is a new version of the patch, which shows only fetch block
information and memory usage information.  I'll add this to the upcoming CF.

Thanks,

Best regards,
Etsuro Fujita


explain-bitmapscan-20131227.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-12-13 Thread Etsuro Fujita
I wrote:
 Robert Haas wrote:
  I'd be wary of showing a desired value unless it's highly likely to be
  accurate.

 The desired value is accurately estimated based on (a) the total number
 of exact/lossy pages stored in the TIDBitmap and (b) the following
equation
 in tbm_create(), except for the GIN case where lossy pages are added to
 the TIDBitmap by tbm_add_page().

 /*
  * Estimate number of hashtable entries we can have within maxbytes.
...
  */
 nbuckets = maxbytes /
 (MAXALIGN(sizeof(HASHELEMENT)) +
 MAXALIGN(sizeof(PagetableEntry))
  + sizeof(Pointer) + sizeof(Pointer));

 In the GIN case, however, the version under development has a risk of the
 overestimation.  (And in that case, in my understanding, we can't
guarantee
 non-lossy storage of the TIDBitmap any more.)  So, for that case, I think
 to change the message for the desired value a bit.  I'll submit the patch
 later.

On second thoughts, I've modified the patch so that the EXPLAIN ANALYZE
command shows not only the desired value but the total number of exact/lossy
heap blocks that have been fetched in query execution because ISTM the
latter is also useful for tuning work_mem, when an available memory capacity
is not so large as the desired value, or when non-lossy storage of the
TIDBitmap can't be guaranteed as mentioned above.  Here is an example.
Attached is an updated version of the patch, though a sufficient test hasn't
been performed.

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and
0.02 ;
QUERY PLAN

---
 Bitmap Heap Scan on demo  (cost=2072.10..100674.45 rows=97528 width=42)
(actual time=27.387..1677.511 rows=99833 loops=1)
   Recheck Cond: ((col2 = 0.01::double precision) AND (col2 = 0.02::double
precision))
   Rows Removed by Index Recheck: 5581690
   Heap Blocks: exact=8585 lossy=52980
   Bitmap Memory Usage: 1025kB (4810kB desired)
   -  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2047.71 rows=97528
width=0) (actual time=25.884..25.884 rows=99833 loops=1)
 Index Cond: ((col2 = 0.01::double precision) AND (col2 =
0.02::double precision))
 Total runtime: 1687.047 ms
(8 rows)

Thanks,

Best regards,
Etsuro Fujita


explain-bitmapscan-20131213.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-12-08 Thread Etsuro Fujita
Robert Haas wrote:
 On Fri, Dec 6, 2013 at 5:02 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp
 wrote:
  Though at first I agreed on this, while working on this I start to
  think information about (2) is enough for tuning work_mem.  Here are
  examples using a version under development, where Bitmap Memory
  Usage means (peak) memory space used by a TIDBitmap, and Desired
  means the memory required to guarantee non-lossy storage of a TID set,
  which is shown only when the TIDBitmap has been lossified.  (work_mem
  = 1MB.)

 I'd be wary of showing a desired value unless it's highly likely to be
 accurate.

Thank you for the comments!

The desired value is accurately estimated based on (a) the total number of
exact/lossy pages stored in the TIDBitmap and (b) the following equation in
tbm_create(), except for the GIN case where lossy pages are added to the
TIDBitmap by tbm_add_page().

/*
 * Estimate number of hashtable entries we can have within maxbytes. ...
 */
nbuckets = maxbytes /
(MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
 + sizeof(Pointer) + sizeof(Pointer));

In the GIN case, however, the version under development has a risk of the
overestimation.  (And in that case, in my understanding, we can't guarantee
non-lossy storage of the TIDBitmap any more.)  So, for that case, I think to
change the message for the desired value a bit.  I'll submit the patch
later.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-12-07 Thread Robert Haas
On Fri, Dec 6, 2013 at 5:02 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 Though at first I agreed on this, while working on this I start to think 
 information about (2) is enough for tuning work_mem.  Here are examples using 
 a version under development, where Bitmap Memory Usage means (peak) memory 
 space used by a TIDBitmap, and Desired means the memory required to 
 guarantee non-lossy storage of a TID set, which is shown only when the 
 TIDBitmap has been lossified.  (work_mem = 1MB.)

I'd be wary of showing a desired value unless it's highly likely to be accurate.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-12-06 Thread Etsuro Fujita
I wrote:
 Amit Khandekar wrote:
  Yes, I agree that rather than looking at the bitmap heap scan to track
  the number of pages, we should look somewhere in the underlying index
  scan. Yes, we should get a constant number of index pages regardless
  of the actual parent table rows.

 I agree with you.  I'll modify the patch to show 1) the number of the
 exact/lossy pages in a TIDBitmap by examining the underlying index scan,
 not the number of these pages that have been fetched in the bitmap heap
 scan, and 2) the memory requirement.

Though at first I agreed on this, while working on this I start to think 
information about (2) is enough for tuning work_mem.  Here are examples using a 
version under development, where Bitmap Memory Usage means (peak) memory 
space used by a TIDBitmap, and Desired means the memory required to guarantee 
non-lossy storage of a TID set, which is shown only when the TIDBitmap has been 
lossified.  (work_mem = 1MB.)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.0001 and 
0.0005 ;
 QUERY PLAN
-
 Bitmap Heap Scan on demo  (cost=77.14..12142.69 rows=3581 width=42) (actual 
time=1.748..53.203 rows=4112 loops=1)
   Recheck Cond: ((col2 = 0.0001::double precision) AND (col2 = 
0.0005::double precision))
   Bitmap Memory Usage: 315kB
   -  Bitmap Index Scan on demo_col2_idx  (cost=0.00..76.25 rows=3581 width=0) 
(actual time=1.113..1.113 rows=4112 loops=1)
 Index Cond: ((col2 = 0.0001::double precision) AND (col2 = 
0.0005::double precision))
 Total runtime: 53.804 ms
(6 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.05 ;
 QUERY PLAN
-
 Bitmap Heap Scan on demo  (cost=8307.41..107635.14 rows=391315 width=42) 
(actual time=84.818..2709.015 rows=400172 loops=1)
   Recheck Cond: ((col2 = 0.01::double precision) AND (col2 = 0.05::double 
precision))
   Rows Removed by Index Recheck: 8815752
   Bitmap Memory Usage: 1025kB (desired 20573kB)
   -  Bitmap Index Scan on demo_col2_idx  (cost=0.00..8209.58 rows=391315 
width=0) (actual time=83.664..83.664 rows=400172 loops=1)
 Index Cond: ((col2 = 0.01::double precision) AND (col2 = 
0.05::double precision))
 Total runtime: 2747.088 ms
(7 rows)

We should look at (1) as well?  (Honestly, I don't know what to show about (1) 
when using a bitmap scan on the inside of a nestloop join.  For memory usage 
and desired memory I think the maximum values would be fine.)  I re-wish to 
know your opinion.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-11-26 Thread Etsuro Fujita
Amit Khandekar wrote:
 On 25 November 2013 13:37, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:

 So, my question is, we should show the number of exact/lossy pages in a 
 TIDBitmap, not the number of these pages that has been fetched in the bitmap 
 heap scan?

 Yes, I agree that rather than looking at the bitmap heap scan to track the 
 number of pages, we should look somewhere in the underlying index scan. Yes, 
 we should get a constant number of index pages regardless of the actual 
 parent table rows. I can see that btgetbitmap() adds all the tuples into the 
 bitmap, so somewhere below under btgetbitmap() might be the right place to 
 track.  Somewhere in tbm_create_pagetable(), but not sure.

Thank you for the comment!

I agree with you.  I'll modify the patch to show 1) the number of the 
exact/lossy pages in a TIDBitmap by examining the underlying index scan, not 
the number of these pages that have been fetched in the bitmap heap scan, and 
2) the memory requirement.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-11-25 Thread Etsuro Fujita
From: Amit Khandekar [mailto:amit.khande...@enterprisedb.com]
 On 1 November 2013 16:32, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
 From: Fujii Masao [mailto:masao.fu...@gmail.com]

 I'm not sure if it's good idea to show the number of the fetches because it
 seems difficult to tune work_mem from that number. How can we calculate how
 much to increase work_mem to avoid lossy bitmap from the number of the 
 fetches
 in EXPLAIN output?

 We can calculate that from the following equation in tbm_create():

   nbuckets = maxbytes /
 (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
 + sizeof(Pointer) + sizeof(Pointer)),

 where maxbytes is the size of memory used for the hashtable in a TIDBitmap,
 designated by work_mem, and nbuckets is the estimated number of hashtable
 entries we can have within maxbytes.  From this, the size of work_mem within
 which we can have every hashtable entry as an exact bitmap is calculated as
 follows:

   work_mem = (the number of exact pages + the number of lossy pages) *
 (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
 + sizeof(Pointer) + sizeof(Pointer)) /
 (1024 * 1024).

 I am yet to give more thought on the above formula 
 (particularly exact_pages + lossy_pages), but  I was also wondering if the 
 user 
 would indeed be able to figure out the above way to estimate the memory, or 
 the 
 explain itself should show the estimated memory  required for the bitmap. For 
 hash joins we do show the memory taken by the hash table in show_hash_info(). 
 We 
 can show the memory requirement in addition to the number of exact/lossy 
 pages. 

Thank you for the review!

Reconsidering that, I wish to know your opinion.  The patch shows the number of 
exact/lossy pages that has been fetched in a bitmap heap scan.  But the number 
varies with the fraction of tuples to be retrieved like the following.

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
 QUERY PLAN

 Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919 width=42) 
(actual time=23.684..1302.382 rows=99803 loops=1)
   Recheck Cond: ((col2 = 0.01::double precision) AND (col2 = 0.02::double 
precision))
   Rows Removed by Index Recheck: 6279502
   Heap Blocks: exact=1990 lossy=59593
   -  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919 
width=0) (actual time=23.330..23.330 rows=99803 loops=1)
 Index Cond: ((col2 = 0.01::double precision) AND (col2 = 
0.02::double precision))
 Total runtime: 1311.949 ms
(7 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02 
LIMIT 5000;
QUERY PLAN
--
 Limit  (cost=2187.35..7008.26 rows=5000 width=42) (actual time=23.543..86.093 
rows=5000 loops=1)
   -  Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919 width=42) 
(actual time=23.542..85.196 rows=5000 loops=1)
 Recheck Cond: ((col2 = 0.01::double precision) AND (col2 = 
0.02::double precision))
 Rows Removed by Index Recheck: 312179
 Heap Blocks: exact=99 lossy=2963
 -  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 
rows=102919 width=0) (actual time=23.189..23.189 rows=99803 loops=1)
   Index Cond: ((col2 = 0.01::double precision) AND (col2 = 
0.02::double precision))
 Total runtime: 86.626 ms
(8 rows)

So, my question is, we should show the number of exact/lossy pages in a 
TIDBitmap, not the number of these pages that has been fetched in the bitmap 
heap scan?

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-11-25 Thread Amit Khandekar
On 25 November 2013 13:37, Etsuro Fujita fujita.ets...@lab.ntt.co.jpwrote:


 Reconsidering that, I wish to know your opinion.  The patch shows the
 number of exact/lossy pages that has been fetched in a bitmap heap scan.
  But the number varies with the fraction of tuples to be retrieved like the
 following.

 postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and
 0.02;
  QUERY PLAN

 
  Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919 width=42)
 (actual time=23.684..1302.382 rows=99803 loops=1)
Recheck Cond: ((col2 = 0.01::double precision) AND (col2 =
 0.02::double precision))
Rows Removed by Index Recheck: 6279502
Heap Blocks: exact=1990 lossy=59593
-  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62 rows=102919
 width=0) (actual time=23.330..23.330 rows=99803 loops=1)
  Index Cond: ((col2 = 0.01::double precision) AND (col2 =
 0.02::double precision))
  Total runtime: 1311.949 ms
 (7 rows)

 postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and
 0.02 LIMIT 5000;
 QUERY PLAN

 --
  Limit  (cost=2187.35..7008.26 rows=5000 width=42) (actual
 time=23.543..86.093 rows=5000 loops=1)
-  Bitmap Heap Scan on demo  (cost=2187.35..101419.96 rows=102919
 width=42) (actual time=23.542..85.196 rows=5000 loops=1)
  Recheck Cond: ((col2 = 0.01::double precision) AND (col2 =
 0.02::double precision))
  Rows Removed by Index Recheck: 312179
  Heap Blocks: exact=99 lossy=2963
  -  Bitmap Index Scan on demo_col2_idx  (cost=0.00..2161.62
 rows=102919 width=0) (actual time=23.189..23.189 rows=99803 loops=1)
Index Cond: ((col2 = 0.01::double precision) AND (col2 =
 0.02::double precision))
  Total runtime: 86.626 ms
 (8 rows)

 So, my question is, we should show the number of exact/lossy pages in a
 TIDBitmap, not the number of these pages that has been fetched in the
 bitmap heap scan?


Yes, I agree that rather than looking at the bitmap heap scan to track the
number of pages, we should look somewhere in the underlying index scan.
Yes, we should get a constant number of index pages regardless of the
actual parent table rows. I can see that btgetbitmap() adds all the tuples
into the bitmap, so somewhere below under btgetbitmap() might be the right
place to track.  Somewhere in tbm_create_pagetable(), but not sure.


 Thanks,

 Best regards,
 Etsuro Fujita




Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-11-24 Thread Amit Khandekar
On 1 November 2013 16:32, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:

  From: Fujii Masao [mailto:masao.fu...@gmail.com]

  This is what I'm looking for! This feature is really useful for tuning
 work_mem
  when using full text search with pg_trgm.
 
  I'm not sure if it's good idea to show the number of the fetches because
 it
  seems difficult to tune work_mem from that number. How can we calculate
 how
  much to increase work_mem to avoid lossy bitmap from the number of the
 fetches
  in EXPLAIN output?

 We can calculate that from the following equation in tbm_create():

   nbuckets = maxbytes /
 (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
 + sizeof(Pointer) + sizeof(Pointer)),

 where maxbytes is the size of memory used for the hashtable in a TIDBitmap,
 designated by work_mem, and nbuckets is the estimated number of hashtable
 entries we can have within maxbytes.  From this, the size of work_mem
 within
 which we can have every hashtable entry as an exact bitmap is calculated as
 follows:

   work_mem = (the number of exact pages + the number of lossy pages) *
 (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
 + sizeof(Pointer) + sizeof(Pointer)) /
 (1024 * 1024).


I am yet to give more thought on the above formula (particularly
exact_pages + lossy_pages), but  I was also wondering if the user would
indeed be able to figure out the above way to estimate the memory, or the
explain itself should show the estimated memory  required for the bitmap.
For hash joins we do show the memory taken by the hash table in
show_hash_info(). We can show the memory requirement in addition to the
number of exact/lossy pages.


 I'll show you an example.  The following is the result for work_mem = 1MB:

   postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01
 and
  0.02;
  QUERY PLAN
   --
   --
   
Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766
   width=34) (actual
   time=24.907..1119.961 rows=100047 loops=1)
  Recheck Cond: ((col2 = 0.01::double precision) AND (col2 =
   0.02::double
   precision))
  Rows Removed by Index Recheck: 5484114
  Heap Blocks: exact=11975 lossy=46388
  -  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766
   width=0) (actual time=22.821..22.821 rows=100047 loops=1)
Index Cond: ((col2 = 0.01::double precision) AND (col2 =
   0.02::double
   precision))
Total runtime: 1129.334 ms
   (7 rows)

 So, by setting work_mem to

   work_mem = (11975 + 46388) *
 (MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
 + sizeof(Pointer) + sizeof(Pointer)) /
 (1024 * 1024),

 which is about 5MB, we have the following (Note that no lossy heap pages!):

 postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and
 0.02;
QUERY PLAN


 
 
 
  Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766 width=34)
 (actual
 time=42.981..120.252 rows=1
 00047 loops=1)
Recheck Cond: ((col2 = 0.01::double precision) AND (col2 =
 0.02::double
 precision))
Heap Blocks: exact=58363
-  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766
 width=0)
 (actual time=26.023..26.023 r
 ows=100047 loops=1)
  Index Cond: ((col2 = 0.01::double precision) AND (col2 =
 0.02::double
 precision))
  Total runtime: 129.304 ms
 (6 rows)

 BTW, as the EXPLAIN ANALYZE output, the number of exact/lossy heap pages
 would
 be fine with me.

  Anyway, could you add the patch into next CF?

 Done.

 Thanks,

 Best regards,
 Etsuro Fujita



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-11-01 Thread Etsuro Fujita
 From: Fujii Masao [mailto:masao.fu...@gmail.com]

 This is what I'm looking for! This feature is really useful for tuning
work_mem
 when using full text search with pg_trgm.
 
 I'm not sure if it's good idea to show the number of the fetches because it
 seems difficult to tune work_mem from that number. How can we calculate how
 much to increase work_mem to avoid lossy bitmap from the number of the fetches
 in EXPLAIN output?

We can calculate that from the following equation in tbm_create():

  nbuckets = maxbytes /
(MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
+ sizeof(Pointer) + sizeof(Pointer)),

where maxbytes is the size of memory used for the hashtable in a TIDBitmap,
designated by work_mem, and nbuckets is the estimated number of hashtable
entries we can have within maxbytes.  From this, the size of work_mem within
which we can have every hashtable entry as an exact bitmap is calculated as
follows:

  work_mem = (the number of exact pages + the number of lossy pages) *
(MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
+ sizeof(Pointer) + sizeof(Pointer)) /
(1024 * 1024).

I'll show you an example.  The following is the result for work_mem = 1MB:

  postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and
 0.02;
 QUERY PLAN
  --
  --
  
   Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766
  width=34) (actual
  time=24.907..1119.961 rows=100047 loops=1)
 Recheck Cond: ((col2 = 0.01::double precision) AND (col2 =
  0.02::double
  precision))
 Rows Removed by Index Recheck: 5484114
 Heap Blocks: exact=11975 lossy=46388
 -  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766
  width=0) (actual time=22.821..22.821 rows=100047 loops=1)
   Index Cond: ((col2 = 0.01::double precision) AND (col2 =
  0.02::double
  precision))
   Total runtime: 1129.334 ms
  (7 rows)

So, by setting work_mem to

  work_mem = (11975 + 46388) *
(MAXALIGN(sizeof(HASHELEMENT)) + MAXALIGN(sizeof(PagetableEntry))
+ sizeof(Pointer) + sizeof(Pointer)) /
(1024 * 1024),

which is about 5MB, we have the following (Note that no lossy heap pages!):

postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 0.02;
   QUERY PLAN




 Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766 width=34) (actual
time=42.981..120.252 rows=1
00047 loops=1)
   Recheck Cond: ((col2 = 0.01::double precision) AND (col2 = 0.02::double
precision))
   Heap Blocks: exact=58363
   -  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766 width=0)
(actual time=26.023..26.023 r
ows=100047 loops=1)
 Index Cond: ((col2 = 0.01::double precision) AND (col2 = 0.02::double
precision))
 Total runtime: 129.304 ms
(6 rows)

BTW, as the EXPLAIN ANALYZE output, the number of exact/lossy heap pages would
be fine with me.

 Anyway, could you add the patch into next CF?

Done.

Thanks,

Best regards,
Etsuro Fujita



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Show lossy heap block info in EXPLAIN ANALYZE for bitmap heap scan

2013-10-31 Thread Fujii Masao
On Thu, Oct 31, 2013 at 7:54 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 Hi,

 I think that lossy-heap-block information for a bitmap heap scan, not just 
 Rows
 Removed by Index Recheck information, would also be a clue used to tune
 work_mem for better performance especially when the bitmap heap scan uses an
 index such as gin or gist, not btree.

 So here's a patch that adds the information to the EXPLAIN ANALYZE output.  
 The
 following shows an example.  The number of lossy-heap-block fetches (ie
 tbmres-ntuples = -1) as well as that of exact-heap-block fetches (ie
 tbmres-ntuples = 0) are shown in the Heap Blocks line.

 postgres=# EXPLAIN ANALYZE SELECT * FROM demo WHERE col2 between 0.01 and 
 0.02;
QUERY PLAN
 
 
  Bitmap Heap Scan on demo  (cost=2716.54..92075.46 rows=105766 width=34) 
 (actual
 time=24.907..1119.961 rows=100047 loops=1)
Recheck Cond: ((col2 = 0.01::double precision) AND (col2 = 0.02::double
 precision))
Rows Removed by Index Recheck: 5484114
Heap Blocks: exact=11975 lossy=46388
-  Bitmap Index Scan on demo_idx  (cost=0.00..2690.09 rows=105766 width=0)
 (actual time=22.821..22.821 rows=100047 loops=1)
  Index Cond: ((col2 = 0.01::double precision) AND (col2 = 
 0.02::double
 precision))
  Total runtime: 1129.334 ms
 (7 rows)

 Comments welcome.

This is what I'm looking for! This feature is really useful for tuning work_mem
when using full text search with pg_trgm.

I'm not sure if it's good idea to show the number of the fetches because
it seems difficult to tune work_mem from that number. How can we calculate
how much to increase work_mem to avoid lossy bitmap from the number of
the fetches in EXPLAIN output?

Anyway, could you add the patch into next CF?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers