Ok, I've managed to get my custom index working.

It's all on GitHub here: https://github.com/fake-name/pg-spgist_hamming, if
anyone else needs a fuzzy image search system that integrates with
PostgreSQL.

It should be a pretty good basis for anyone else to use if they want to
implement an SP-GiST index too.
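
A rough standalone sketch of the failure mode in the quoted messages below,
for anyone who hits the same crash. It is not PostgreSQL code: Datum,
TypeDesc, datum_source_addr and copy_datum are made-up stand-ins that only
mimic the by-value / by-reference branch a copy routine like memcpyDatum has
to take, and the 16-byte by-reference descriptor is an assumption chosen to
match the len=16 in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for PostgreSQL's Datum: a pointer-sized integer. */
typedef uintptr_t Datum;

/* Hypothetical, cut-down type descriptor: length plus by-value flag. */
typedef struct
{
    int  attlen;    /* fixed length in bytes */
    bool attbyval;  /* true: the Datum holds the value itself */
} TypeDesc;

/*
 * Address a copy would read from. By-value types read the Datum's own
 * bytes; by-reference types interpret the Datum as a pointer.
 */
static uintptr_t
datum_source_addr(const TypeDesc *att, const Datum *datum)
{
    if (att->attbyval)
        return (uintptr_t) datum;
    return (uintptr_t) *datum;
}

/*
 * Perform the copy. Only safe when the descriptor really matches the
 * Datum; with a mismatched by-reference descriptor this would read from
 * whatever address the integer value happens to spell out.
 */
static void
copy_datum(void *target, const TypeDesc *att, Datum datum)
{
    assert(target != NULL);
    if (att->attbyval)
        memcpy(target, &datum, sizeof(Datum));
    else
        memcpy(target, (const void *) datum, (size_t) att->attlen);
}
```

With a by-value descriptor such as {8, true}, the copy reads the Datum
itself. With a by-reference descriptor such as {16, false}, a prefix value of
1 becomes the source address 0x1, which is the pattern in the NOTICE output
quoted below.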

Thanks!

On Sun, Nov 5, 2017 at 8:10 PM, Connor Wolf <conn...@imaginaryindustries.com> wrote:

> Never mind, it turns out the issue boiled down to me declaring the
> wrong prefixType in my config function.
>
> TL;DR - PEBKAC
>
> On Sun, Nov 5, 2017 at 1:09 AM, Connor Wolf <connorw@imaginaryindustries.com> wrote:
>
>> Ok, I've got everything compiling and it installs properly, but I'm
>> running into problems that I think are either a side-effect of implementing
>> picksplit incorrectly (likely), or a bug in SP-GiST(?).
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
>> 159     ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
>> (gdb) bt
>> #0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:159
>> #1  0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>, __dest=0x13c9dd8) at /usr/include/x86_64-linux-gnu/bits/string3.h:53
>> #2  memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4, datum=datum@entry=18445692987396472528) at spgutils.c:587
>> #3  0x00000000004ee06b in spgFormInnerTuple (state=state@entry=0x7fff327325e0, hasPrefix=<optimized out>, prefix=18445692987396472528, nNodes=8,
>>     nodes=nodes@entry=0x13bd340) at spgutils.c:741
>> #4  0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98, state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020,
>>     parent=parent@entry=0x7fff32732040, newLeafTuple=newLeafTuple@entry=0x13b9f00, level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at spgdoinsert.c:913
>> #5  0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98, state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c, datum=12598555199787281,
>>     isnull=0 '\000') at spgdoinsert.c:2053
>> #6  0x00000000004ee5cc in spgistBuildCallback (index=index@entry=0x7f2cf9de7f98, htup=htup@entry=0x12e6728, values=values@entry=0x7fff327321e0,
>>     isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1 '\001', state=state@entry=0x7fff327325e0) at spginsert.c:56
>> #7  0x0000000000534e8d in IndexBuildHeapRangeScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelation=indexRelation@entry=0x7f2cf9de7f98,
>>     indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001', anyvisible=anyvisible@entry=0 '\000', start_blockno=start_blockno@entry=0,
>>     numblocks=4294967295, callback=0x4ee573 <spgistBuildCallback>, callback_state=0x7fff327325e0) at index.c:2609
>> #8  0x0000000000534f52 in IndexBuildHeapScan (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelation=indexRelation@entry=0x7f2cf9de7f98,
>>     indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 '\001', callback=callback@entry=0x4ee573 <spgistBuildCallback>,
>>     callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182
>> #9  0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8, index=0x7f2cf9de7f98, indexInfo=0x1390ad8) at spginsert.c:140
>> #10 0x0000000000535e55 in index_build (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelation=indexRelation@entry=0x7f2cf9de7f98,
>>     indexInfo=indexInfo@entry=0x1390ad8, isprimary=isprimary@entry=0 '\000', isreindex=isreindex@entry=0 '\000') at index.c:2043
>> #11 0x0000000000536ee8 in index_create (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2",
>>     indexRelationId=16416, indexRelationId@entry=0, relFileNode=0, indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@entry=0x1390f40,
>>     accessMethodObjectId=4000, tableSpaceId=0, collationObjectId=0x12e6b18, classObjectId=0x12e6b38, coloptions=0x12e6b58, reloptions=0, isprimary=0 '\000',
>>     isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000', allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
>>     is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116
>> #12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413, stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0,
>>     is_alter_table=is_alter_table@entry=0 '\000', check_rights=check_rights@entry=1 '\001', check_not_in_use=check_not_in_use@entry=1 '\001', skip_build=0 '\000',
>>     quiet=0 '\000') at indexcmds.c:667
>> #13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry=0x12dd450, pstmt=pstmt@entry=0x12db108,
>>     queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );", context=context@entry=PROCESS_UTILITY_TOPLEVEL,
>>     params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:1326
>> #14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108, queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );",
>>     context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928
>> #15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108, queryString=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL,
>>     params=<optimized out>, queryEnv=<optimized out>, dest=dest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:357
>> #16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80, pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001',
>>     setHoldSnapshot=setHoldSnapshot@entry=0 '\000', dest=dest@entry=0x12db200, completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178
>> #17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80, isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapshot@entry=0 '\000',
>>     dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200, completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324
>> #18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001',
>>     run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at pquery.c:799
>> #19 0x000000000077bc12 in exec_simple_query (query_string=query_string@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );")
>>     at postgres.c:1120
>> #20 0x000000000077d95c in PostgresMain (argc=<optimized out>, argv=argv@entry=0x12e9948, dbname=0x12bca10 "contrib_regression", username=<optimized out>)
>>     at postgres.c:4139
>> #21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at postmaster.c:4364
>> #22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at postmaster.c:4036
>> #23 0x0000000000701112 in ServerLoop () at postmaster.c:1755
>> #24 0x00000000007023af in PostmasterMain (argc=argc@entry=8, argv=argv@entry=0x12ba7c0) at postmaster.c:1363
>> #25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228
>>
>>
>>
>> It's segfaulting when trying to build the inner tuple after the picksplit
>> operation.
>>
>> Adding debugging output to the print function, I see:
>>
>> NOTICE:  Memcopying from 0000000000000000 to 00000000013d7938 with len 16
>>
>> The first item in my input data file is zero, and if I change it to 1:
>>
>> NOTICE:  Memcopying from 0000000000000001 to 0000000001b45938 with len 16
>>
>> So pretty clearly, the raw value of the datum is being used as the
>> address to copy from.
>> Following the data, this is the value I'm assigning to out->prefixDatum in
>> my picksplit call. I can confirm this by hard-coding out->prefixDatum in
>> my picksplit call to a known value: it then shows up as the source address
>> in the memcpy call.
>>
>> However, as far as I can tell, I'm assigning it correctly:
>>
>>     out->prefixDatum = Int64GetDatum(val);
>>
>> This is similar to how the other SP-GiST implementations work. For
>> example, spgkdtreeproc.c does:
>>
>>     out->prefixDatum = Float8GetDatum(coord);
>>
>> I think this is the SP-GiST core failing to handle certain types being
>> pass-by-value? I'm not totally certain.
>>
>> As I understand it, the "maybe-pass-by-reference" parameter is a global
>> flag (USE_FLOAT8_BYVAL), but I'd like to
>> keep that enabled. What's the proper approach for adding support for this
>> in the SP-GiST core?
>>
>> My (somewhat messy) extension module is here
>> <https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree>, if
>> it's relevant.
>>
>> Connor
>>
>>
>>
>>
>> On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:
>>
>>> On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <conn...@imaginaryindustries.com> wrote:
>>>
>>>> EDIT: That's actually exactly how the example I'm working off of works.
>>>> DERP. The SQL is:
>>>>
>>>> CREATE TYPE vptree_area AS
>>>> (
>>>>     center _int4,
>>>>     distance float8
>>>> );
>>>>
>>>> CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area)
>>>> RETURNS boolean AS
>>>> 'MODULE_PATHNAME','vptree_area_match'
>>>> LANGUAGE C IMMUTABLE STRICT;
>>>>
>>>> CREATE OPERATOR <@ (
>>>>     LEFTARG = _int4,
>>>>     RIGHTARG = vptree_area,
>>>>     PROCEDURE = vptree_area_match,
>>>>     RESTRICT = contsel,
>>>>     JOIN = contjoinsel
>>>> );
>>>>
>>>> so I just need to understand how to parse out the custom type in my
>>>> index operator.
>>>>
>>>
>>> You can see the implementation of the vptree_area_match function in
>>> vptree.c.  It just calls the GetAttributeByNum() function.
>>>
>>> There is also an alternative approach, implemented in the pg_trgm
>>> contrib module.  It has a "text % text" operator which checks whether two
>>> strings are similar enough.  The similarity threshold is defined by the
>>> pg_trgm.similarity_threshold GUC.  Thus, you can also define a GUC with a
>>> threshold distance value.  However, that would impose some limitations.
>>> For instance, you wouldn't be able to use different distance thresholds
>>> in the same query.
>>>
>>> ------
>>> Alexander Korotkov
>>> Postgres Professional: http://www.postgrespro.com
>>> The Russian Postgres Company
>>>
>>>
>>
>>
>
