I was mostly unclear on how I'd go about attaching the extension functions
to the relevant indexing mechanism. From the looks of the vptree.tar.gz
file (which is really, *really* helpful, incidentally!), a it's done via a
custom operator class, which then gets passed to the actual index creation
mechanism when you're declaring the index.

I think I had looked at that at one point, but it's been a while. In my
case, I'm using discrete-cosine-transform based perceptual hashes for
searching. They are nice and compact (64-bits per hash), while still
producing good search results. I have a dataset of ~36 million images, and
it does searches in < 50 milliseconds with a hamming distance of 4, while
touching ~0.25% of the tree (And occupying ~18 GB of ram).

My BK tree is up on github here
if anyone needs something like that (BSD licensed, pretty well tested).
There's also a python wrapper for it.

I'll probably not have time to poke about until this weekend, but thanks!

On Mon, Oct 30, 2017 at 4:50 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi!
> On Sun, Oct 29, 2017 at 12:07 PM, Connor Wolf <
> w...@imaginaryindustries.com> wrote:
>> I'm looking at implementing a custom indexing scheme, and I've been
>> having trouble understanding the proper approach.
>> Basically, I need a BK tree, which is a tree-structure useful for
>> indexing arbitrary discrete metric-spaces (in my case, I'm interested in
>> indexing across the hamming edit-distance of perceptual hashes, for fuzzy
>> image searching). I'm *pretty* sure a SP-GiST index is the correct index
>> type, as my tree is intrinsically unbalanced.
> Yes, SP-GiST is appropriate index type for BK tree.  I'm pretty sure BK
> tree could be implemented as SP-GiST opclass.
> The only thing worrying me is selection pivot values for nodes.  SP-GiST
> builds by insertion of index tuples on by one.  First pivot value for root
> node in SP-GIST would be created once first leaf page overflows.  Thus, you
> would have to select this pivot value basing on very small fraction in the
> beginning of dataset.
> As I know, BK tree is most efficient when root pivot value is selected
> after looking in whole dataset and then hierarchically to subtrees.
> BTW, did you try my extension for searching similar images.  It's quite
> primitive, but works for some cases.
> https://github.com/postgrespro/imgsmlr
> https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdf
> I have a functional stand-alone implementation of a BK-Tree, and it works
>> very well, but the complexity of managing what is basically a external
>> index for my database has reached the point where it's significantly
>> problematic, and it seems to be it should be moved into the database.
> Sure, moving this index to the database is right decision.
> Anyways, looking at the contents of postgres/src/backend/access/spgist,
>> it looks pretty straightforward in terms of the actual C implementation,
>> but I'm stuck understanding how to "install" a custom SP-GiST
>> implementation. There are several GiST indexing implementations in the
>> contrib directory, but no examples for how I'd go about implementing a
>> loadable SP-GiST index.
>> Basically, my questions are:
>>    - Is it possible to implement a SP-GiST indexing scheme as a loadable
>>    module?
>>  Yes, it's possible to define SP-GiST.
>>    - If so, how?
>> The pretty same way as GiST opclass extension.  You have to define
> supporting functions and operators and then define operator class over them.
>>    - And is there an example I can base my implementation off of?
>> I'm relatively comfortable with C (much moreso with C++), but I haven't
>> spent a lot of time looking at the postgresql codebase.  I don't think I
>> could start from a empty folder and make a properly-implemented module in
>> any reasonable period of time, so if I have a working example for some sort
>> of index that uses the same interfaces that would really help a lot
> I don't think there is an example in PostgreSQL source code tree or on
> github.  But I've attached by early experiment with VP-tree (seems to be
> pretty same as BK tree) using SP-GiST (see vptree.tar.gz).  Basing on this
> experiment I realized that it's important to select root pivot value basing
> on the whole dataset.  However, for your metric/dataset/queries it might
> appear to be different.
> It also would be nice to someday improve SP-GiST to support some global
> strategies on index creation.  In particular, it would allow to resolve
> selection of pivot values problem that I mention above.  Right now my
> colleagues and me don't have time for that.  But I can assist you with
> advises if you will decide to implement that.
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to