Ok, I've managed to get my custom index working. It's all on github here: https://github.com/fake-name/pg-spgist_hamming, if anyone else needs a fuzzy-image searching system that can integrate into postgresql..
It should be a pretty good basis for anyone else to use if they want to implement a SP-GiST index too. Thanks! On Sun, Nov 5, 2017 at 8:10 PM, Connor Wolf <conn...@imaginaryindustries.com > wrote: > Never mind, it turns out the issue boiled down to me declaring the > wrong prefixType in my config function. > > TL;DR - PEBKAC > > On Sun, Nov 5, 2017 at 1:09 AM, Connor Wolf <connorw@imaginaryindustries. > com> wrote: > >> Ok, I've got everything compiling and it installs properly, but I'm >> running into problems that I think are either a side-effect of implementing >> picksplit incorrectly (likely), or a bug in SP-GiST(?). >> >> Program received signal SIGSEGV, Segmentation fault. >> __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/me >> mcpy-sse2-unaligned.S:159 >> 159 ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such >> file or directory. >> (gdb) bt >> #0 __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/me >> mcpy-sse2-unaligned.S:159 >> #1 0x00000000004ecd66 in memcpy (__len=16, __src=<optimized out>, >> __dest=0x13c9dd8) at /usr/include/x86_64-linux-gnu/bits/string3.h:53 >> #2 memcpyDatum (target=target@entry=0x13c9dd8, att=att@entry=0x7fff327325f4, >> datum=datum@entry=18445692987396472528) at spgutils.c:587 >> #3 0x00000000004ee06b in spgFormInnerTuple (state=state@entry >> =0x7fff327325e0, hasPrefix=<optimized out>, prefix=18445692987396472528, >> nNodes=8, >> nodes=nodes@entry=0x13bd340) at spgutils.c:741 >> #4 0x00000000004f508b in doPickSplit (index=index@entry=0x7f2cf9de7f98, >> state=state@entry=0x7fff327325e0, current=current@entry=0x7fff32732020, >> parent=parent@entry=0x7fff32732040, >> newLeafTuple=newLeafTuple@entry=0x13b9f00, >> level=level@entry=0, isNulls=0 '\000', isNew=0 '\000') at >> spgdoinsert.c:913 >> #5 0x00000000004f6976 in spgdoinsert (index=index@entry=0x7f2cf9de7f98, >> state=state@entry=0x7fff327325e0, heapPtr=heapPtr@entry=0x12e672c, >> datum=12598555199787281, >> isnull=0 '\000') at spgdoinsert.c:2053 >> #6 0x00000000004ee5cc in spgistBuildCallback (index=index@entry >> =0x7f2cf9de7f98, htup=htup@entry=0x12e6728, values=values@entry >> =0x7fff327321e0, >> isnull=isnull@entry=0x7fff32732530 "", tupleIsAlive=tupleIsAlive@entry=1 >> '\001', state=state@entry=0x7fff327325e0) at spginsert.c:56 >> #7 0x0000000000534e8d in IndexBuildHeapRangeScan >> (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, >> indexRelation=indexRelation@entry=0x7f2cf9de7f98, >> indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 >> '\001', anyvisible=anyvisible@entry=0 '\000', >> start_blockno=start_blockno@entry=0, >> numblocks=4294967295, callback=0x4ee573 <spgistBuildCallback>, >> callback_state=0x7fff327325e0) at index.c:2609 >> #8 0x0000000000534f52 in IndexBuildHeapScan >> (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, >> indexRelation=indexRelation@entry=0x7f2cf9de7f98, >> indexInfo=indexInfo@entry=0x1390ad8, allow_sync=allow_sync@entry=1 >> '\001', callback=callback@entry=0x4ee573 <spgistBuildCallback>, >> callback_state=callback_state@entry=0x7fff327325e0) at index.c:2182 >> #9 0x00000000004eeb74 in spgbuild (heap=0x7f2cf9ddc6c8, >> index=0x7f2cf9de7f98, indexInfo=0x1390ad8) at spginsert.c:140 >> #10 0x0000000000535e55 in index_build >> (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, >> indexRelation=indexRelation@entry=0x7f2cf9de7f98, >> indexInfo=indexInfo@entry=0x1390ad8, isprimary=isprimary@entry=0 >> '\000', isreindex=isreindex@entry=0 '\000') at index.c:2043 >> #11 0x0000000000536ee8 in index_create >> (heapRelation=heapRelation@entry=0x7f2cf9ddc6c8, >> indexRelationName=indexRelationName@entry=0x12dd600 "int8idx_2", >> indexRelationId=16416, indexRelationId@entry=0, relFileNode=0, >> indexInfo=indexInfo@entry=0x1390ad8, indexColNames=indexColNames@en >> try=0x1390f40, >> accessMethodObjectId=4000, tableSpaceId=0, >> collationObjectId=0x12e6b18, classObjectId=0x12e6b38, coloptions=0x12e6b58, >> reloptions=0, isprimary=0 '\000', >> isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000', >> allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000', >> is_internal=0 '\000', if_not_exists=0 '\000') at index.c:1116 >> #12 0x00000000005d8fe6 in DefineIndex (relationId=relationId@entry=16413, >> stmt=stmt@entry=0x12dd568, indexRelationId=indexRelationId@entry=0, >> is_alter_table=is_alter_table@entry=0 '\000', >> check_rights=check_rights@entry=1 '\001', check_not_in_use=check_not_in_ >> use@entry=1 '\001', skip_build=0 '\000', >> quiet=0 '\000') at indexcmds.c:667 >> #13 0x0000000000782057 in ProcessUtilitySlow (pstate=pstate@entry >> =0x12dd450, pstmt=pstmt@entry=0x12db108, >> queryString=queryString@entry=0x12da0a0 "CREATE INDEX int8idx_2 ON >> int8tmp_2 USING spgist ( a vptree_ops );", context=context@entry=PROCESS_ >> UTILITY_TOPLEVEL, >> params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, >> dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:1326 >> #14 0x00000000007815ef in standard_ProcessUtility (pstmt=0x12db108, >> queryString=0x12da0a0 "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a >> vptree_ops );", >> context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, >> dest=0x12db200, completionTag=0x7fff32732ed0 "") at utility.c:928 >> #15 0x00000000007816a7 in ProcessUtility (pstmt=pstmt@entry=0x12db108, >> queryString=<optimized out>, context=context@entry=PROCESS_ >> UTILITY_TOPLEVEL, >> params=<optimized out>, queryEnv=<optimized out>, >> dest=dest@entry=0x12db200, >> completionTag=0x7fff32732ed0 "") at utility.c:357 >> #16 0x000000000077de2e in PortalRunUtility (portal=portal@entry=0x1391a80, >> pstmt=pstmt@entry=0x12db108, isTopLevel=isTopLevel@entry=1 '\001', >> setHoldSnapshot=setHoldSnapshot@entry=0 '\000', >> dest=dest@entry=0x12db200, >> completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1178 >> #17 0x000000000077e98e in PortalRunMulti (portal=portal@entry=0x1391a80, >> isTopLevel=isTopLevel@entry=1 '\001', setHoldSnapshot=setHoldSnapsho >> t@entry=0 '\000', >> dest=dest@entry=0x12db200, altdest=altdest@entry=0x12db200, >> completionTag=completionTag@entry=0x7fff32732ed0 "") at pquery.c:1324 >> #18 0x000000000077f782 in PortalRun (portal=portal@entry=0x1391a80, >> count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 >> '\001', >> run_once=run_once@entry=1 '\001', dest=dest@entry=0x12db200, >> altdest=altdest@entry=0x12db200, completionTag=0x7fff32732ed0 "") at >> pquery.c:799 >> #19 0x000000000077bc12 in exec_simple_query >> (query_string=query_string@entry=0x12da0a0 >> "CREATE INDEX int8idx_2 ON int8tmp_2 USING spgist ( a vptree_ops );") >> at postgres.c:1120 >> #20 0x000000000077d95c in PostgresMain (argc=<optimized out>, >> argv=argv@entry=0x12e9948, dbname=0x12bca10 "contrib_regression", >> username=<optimized out>) >> at postgres.c:4139 >> #21 0x00000000006fecf4 in BackendRun (port=port@entry=0x12de030) at >> postmaster.c:4364 >> #22 0x0000000000700e32 in BackendStartup (port=port@entry=0x12de030) at >> postmaster.c:4036 >> #23 0x0000000000701112 in ServerLoop () at postmaster.c:1755 >> #24 0x00000000007023af in PostmasterMain (argc=argc@entry=8, >> argv=argv@entry=0x12ba7c0) at postmaster.c:1363 >> #25 0x00000000006726c1 in main (argc=8, argv=0x12ba7c0) at main.c:228 >> >> >> >> It's segfaulting when trying to build the inner tuple after the picksplit >> operation. >> >> Adding debugging output to the print function, I see: >> >> NOTICE: Memcopying from 0000000000000000 to 00000000013d7938 with len 16 >> >> The first item in my input data file is zero, and if I change it to 1: >> >> NOTICE: Memcopying from 0000000000000001 to 0000000001b45938 with len 16 >> >> So pretty clearly, I'm trying to copy from the literal data >> representation of the data as an address. >> Following the data, this is the value I'm assigning to out->prefixDatum in >> my picksplit call. I can confirm this by hard-coding the >> value of out->prefixDatum in my picksplit call to a known value, it >> shows up as the address in the memcopy call. >> >> However, as far as I can tell, I'm assigning it correctly: out->prefixDatum >> = Int64GetDatum(val); >> >> This is similar to how the other spgist implementations work. >> spgkdtreeproc.c does out->prefixDatum = Float8GetDatum(coord); >> for example. >> >> I think this is the SP-GiST core failing to handle certain types being >> pass-by-value? I'm not totally certain. >> >> As I understand it, the "maybe-pass-by-reference" parameter is a global >> flag (USE_FLOAT8_BYVAL), but I'd like to >> keep that enabled. What's the proper approach for adding support for this >> in the SP-GiST core? >> >> My (somewhat messy) extension module is here >> <https://github.com/fake-name/pg-spgist_hamming/tree/master/vptree>, if >> it's relevant. >> >> Connor >> >> >> >> >> On Fri, Nov 3, 2017 at 3:12 PM, Alexander Korotkov < >> a.korot...@postgrespro.ru> wrote: >> >>> On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf < >>> conn...@imaginaryindustries.com> wrote: >>> >>>> EDIT: That's actually exactly how the example I'm working off of works. >>>> DERP. The SQL is >>>> >>>> CREATE TYPE vptree_area AS >>>> ( >>>> center _int4, >>>> distance float8 >>>> ); >>>> >>>> CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) >>>> RETURNS boolean AS >>>> 'MODULE_PATHNAME','vptree_area_match' >>>> LANGUAGE C IMMUTABLE STRICT; >>>> >>>> CREATE OPERATOR <@ ( >>>> LEFTARG = _int4, >>>> RIGHTARG = vptree_area, >>>> PROCEDURE = vptree_area_match, >>>> RESTRICT = contsel, >>>> JOIN = contjoinsel); >>>> >>>> so I just need to understand how to parse out the custom type in my >>>> index operator. >>>> >>> >>> You can see the implementation of vptree_area_match function located in >>> vptree.c. It just calls GetAttributeByNum() function. >>> >>> There is also alternative approach for that implemented in pg_trgm >>> contrib module. It has "text % text" operator which checks if two strings >>> are similar enough. The similarity threshold is defined by >>> pg_trgm.similarity_threshold GUC. Thus, you can also define GUC with >>> threshold distance value. However, it would place some limitations. For >>> instance, you wouldn't be able to use different distance threshold in the >>> same query. >>> >>> ------ >>> Alexander Korotkov >>> Postgres Professional: http://www.postgrespro.com >>> The Russian Postgres Company >>> >>> >> >> >