Hi!

On Sun, Oct 29, 2017 at 12:07 PM, Connor Wolf <w...@imaginaryindustries.com>
wrote:

> I'm looking at implementing a custom indexing scheme, and I've been having
> trouble understanding the proper approach.
>
> Basically, I need a BK tree, which is a tree-structure useful for indexing
> arbitrary discrete metric-spaces (in my case, I'm interested in indexing
> across the hamming edit-distance of perceptual hashes, for fuzzy image
> searching). I'm *pretty* sure a SP-GiST index is the correct index type,
> as my tree is intrinsically unbalanced.
>

Yes, SP-GiST is appropriate index type for BK tree.  I'm pretty sure BK
tree could be implemented as SP-GiST opclass.
The only thing worrying me is selection pivot values for nodes.  SP-GiST
builds by insertion of index tuples on by one.  First pivot value for root
node in SP-GIST would be created once first leaf page overflows.  Thus, you
would have to select this pivot value basing on very small fraction in the
beginning of dataset.
As I know, BK tree is most efficient when root pivot value is selected
after looking in whole dataset and then hierarchically to subtrees.

BTW, did you try my extension for searching similar images.  It's quite
primitive, but works for some cases.
https://github.com/postgrespro/imgsmlr
https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdf

I have a functional stand-alone implementation of a BK-Tree, and it works
> very well, but the complexity of managing what is basically a external
> index for my database has reached the point where it's significantly
> problematic, and it seems to be it should be moved into the database.
>

Sure, moving this index to the database is right decision.

Anyways, looking at the contents of postgres/src/backend/access/spgist, it
> looks pretty straightforward in terms of the actual C implementation, but
> I'm stuck understanding how to "install" a custom SP-GiST implementation.
> There are several GiST indexing implementations in the contrib directory,
> but no examples for how I'd go about implementing a loadable SP-GiST index.
>
> Basically, my questions are:
>
>    - Is it possible to implement a SP-GiST indexing scheme as a loadable
>    module?
>
>  Yes, it's possible to define SP-GiST.

>
>    - If so, how?
>
> The pretty same way as GiST opclass extension.  You have to define
supporting functions and operators and then define operator class over them.

>
>    - And is there an example I can base my implementation off of?
>
>
>
> I'm relatively comfortable with C (much moreso with C++), but I haven't
> spent a lot of time looking at the postgresql codebase.  I don't think I
> could start from a empty folder and make a properly-implemented module in
> any reasonable period of time, so if I have a working example for some sort
> of index that uses the same interfaces that would really help a lot
>
I don't think there is an example in PostgreSQL source code tree or on
github.  But I've attached by early experiment with VP-tree (seems to be
pretty same as BK tree) using SP-GiST (see vptree.tar.gz).  Basing on this
experiment I realized that it's important to select root pivot value basing
on the whole dataset.  However, for your metric/dataset/queries it might
appear to be different.

It also would be nice to someday improve SP-GiST to support some global
strategies on index creation.  In particular, it would allow to resolve
selection of pivot values problem that I mention above.  Right now my
colleagues and me don't have time for that.  But I can assist you with
advises if you will decide to implement that.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment: vptree.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to