Re: [HACKERS] pageinspect: Hash index support

Ashutosh Sharma Sat, 28 Jan 2017 18:10:01 -0800

> The heap tuple is on page 3407872 at line pointer offset 12584?
> That's an awfully but not implausibly big page number (about 26GB into
> the relation) and an awfully and implausibly large line pointer offset
> (how do we fit 12584 index tuples into an 8192-byte page?).  Also, the
> fact that this example has two index entries with the same hash code
> pointing at the same heap TID seems wrong; wouldn't that result in
> index scans returning duplicates?  I think what's actually happening
> here is that (3407872,12584) is actually the attempt to interpret some
> chunk of memory containing the text representation of a TID as an
> actual TID.  When I print those numbers as hex, I get 0x343128, and
> those are the digits "4" and "1" and an opening parenthesis ")" as
> ASCII, so that might fit this theory.


I too had a similar observations when debugging hash_page_items() and
I think we were trying to represent tid as a text rather than tid
itself which was not correct. Attched patch fixes this bug. Below is
what i observed and it is somewhat similar to your explanation in the
above comment:

(gdb) p PageGetItemId(uargs->page, uargs->offset)
$2 = (ItemIdData *) 0x1d84754

(gdb) p *(ItemIdData *) 0x1d84754
$3 = {lp_off = 1664, lp_flags = 1, lp_len = 16}

(gdb) p *(IndexTuple) (page + 1664)
$4 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 839}, ip_posid = 137},
t_info = 16}

(gdb) p BlockIdGetBlockNumber(&(itup->t_tid.ip_blkid))
$5 = 839

(gdb) p (itup->t_tid.ip_posid)
$6 = 137

(gdb) p CStringGetTextDatum(psprintf("(%u,%u)", 839, 137))
$7 = 30959896

(gdb) p DatumGetCString(30959896)
$8 = 0x1d86918 "4"

(gdb) p (char *)0x1d86918
$23 = 0x1d86918 "4"

>
> The code that tries to extract the hashcode from the item also looks
> suspicious.  I'm not 100% sure whether it's wrong or just
> strangely-written, but it doesn't make much sense to extract the item
> contents, then using sprintf() to turn that into a hex string of
> bytes, then parse the hex string using strtol() to get an integer
> back.  I think what it ought to be doing is getting a pointer to the
> index tuple and then calling _hash_get_indextuple_hashkey.
>

I think there is nothing wrong with the hashcode being shown but i do
agree that it is a bit lengthly method to find a hashcode considering
that we already have a extern function _hash_get_indextuple_hashkey()
that can be used to find the hashcode. I have corrected this in the
attached patch.

> Another minor complaint about hash_page_items is that it doesn't
> pfree(uargs->page).  I'm not sure it's necessary to pfree anything
> here, but if we're going to do it I think we might as well be
> consistent with the btree case.  Anyway it can hardly make sense to
> free the 8-byte structure and not the 8192-byte page to which it's
> pointing.
>

In case of btree, we are copying entire page into uargs->page.

<btreefuncs.c>
memcpy(uargs->page, BufferGetPage(buffer), BLCKSZ);
</btrrefuncs.c>

But in our case we have a raw_page and uargs->page contains pointer to
the page so no need to pfree uargs->page separately.

> In hash_bimap_info(), we go to the trouble of creating a raw page but
> never do anything with it.  I guess the idea here is just to error out
> if the supplied page number is not an overflow page, but there are no
> comments explaining that.  Anyway, I'm not sure it's a very good idea,
> because it means that if you try to write a query to interrogate the
> status of all the bitmap pages, it's going to read all of the overflow
> pages to which they point, which makes the operation a LOT more
> expensive.  I think it would be better to leave this part out; if the
> user wants to know which pages are actually overflow pages, they can
> use hash_page_type() to figure it out.  Let's make it the job of this
> function just to check the available/free status of the page in the
> bitmap, without reading the bitmap itself.

Point taken. I am now checking the status of an overflow page without
reading bitmap page.

>
> In doing that, I think it should either return (bitmapblkno,
> bitmapbit, bit) or just bit.  Returning bitmapblkno but not bitmapbit
> seems strange.  Also, I think it should return bit as a bool, not
> int4.
>

The new bitmap_info() now returns bitmapblkno, bitmapbit and bitstatus as bool.

> Another general note is that, in general, if you use the
> BlahGetDatum() function to construct a datum, the SQL type should be
> match the macro you picked - e.g. if the SQL type is int4, the macro
> used should be Int32GetDatum(), not UInt32GetDatum() or anything else.
> If the SQL type is bool, you should be using BoolGetDatum().

Sorry to mention but I didn't find any SQL datatype equivalent to
uint32 or uint16 in 'C'. So, currently i am using int4 for uint16 and
int8 for uint32.


> Apart from the above, I did a little work to improve the reporting
> when a page of the wrong type is verified.  Updated patch with those
> changes attached.

okay. Thanks. I have done changes on top of this patch.

--
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com

0001-Add-support-for-hash-index-in-pageinspect-contrib-mo_v9.patch
Description: invalid/octet-stream

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pageinspect: Hash index support

Reply via email to