Re: [HACKERS] semi-PoC: kNN-gist for cubes
On Mon, Feb 06, 2012 at 06:25:33PM -0500, Jay Levitt wrote: I have a rough proof-of-concept for getting nearest-neighbor searches working with cubes. Please attach such patches to the email when posting them :) Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] semi-PoC: kNN-gist for cubes
On Tue, Feb 07, 2012 at 05:56:52AM -0800, David Fetter wrote: On Mon, Feb 06, 2012 at 06:25:33PM -0500, Jay Levitt wrote: I have a rough proof-of-concept for getting nearest-neighbor searches working with cubes. Please attach such patches to the email when posting them :) And here's a cleaned-up version of the patch that at least passes make check. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate *** a/contrib/cube/cube.c --- b/contrib/cube/cube.c *** *** 73,78 PG_FUNCTION_INFO_V1(g_cube_penalty); --- 73,79 PG_FUNCTION_INFO_V1(g_cube_picksplit); PG_FUNCTION_INFO_V1(g_cube_union); PG_FUNCTION_INFO_V1(g_cube_same); + PG_FUNCTION_INFO_V1(g_cube_distance); Datum g_cube_consistent(PG_FUNCTION_ARGS); Datum g_cube_compress(PG_FUNCTION_ARGS); *** *** 81,86 Datumg_cube_penalty(PG_FUNCTION_ARGS); --- 82,88 Datum g_cube_picksplit(PG_FUNCTION_ARGS); Datum g_cube_union(PG_FUNCTION_ARGS); Datum g_cube_same(PG_FUNCTION_ARGS); + Datum g_cube_distance(PG_FUNCTION_ARGS); /* ** B-tree support functions *** *** 649,654 g_cube_same(PG_FUNCTION_ARGS) --- 651,671 } /* + ** Distance method + */ + Datum + g_cube_distance(PG_FUNCTION_ARGS) + { + GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0); + NDBOX *query = PG_GETARG_NDBOX(1); + double distance; + + distance = DatumGetFloat8(DirectFunctionCall2(cube_distance, entry-key, PointerGetDatum(query))); + + PG_RETURN_FLOAT8(distance); + } + + /* ** SUPPORT ROUTINES */ bool -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] semi-PoC: kNN-gist for cubes
I have a rough proof-of-concept for getting nearest-neighbor searches working with cubes. When I say rough, I mean I have no idea what I'm doing and I haven't written C for 15 years but I hear it got standardized please don't hurt me. It seems to be about 400x faster for a 3D cube with 1 million rows, more like 10-30x for a 6D cube with 10 million rows. The patch adds operator - (which is just the existing cube_distance function) and support function 8, distance (which is just g_cube_distance, a wrapper around cube_distance). The code is in no way production-quality; it is in fact right around look! it compiles!, complete with pasted-in, commented-out code from something I was mimicking. I thought I'd share at this early stage in the hopes I might get some pointers, such as: - What unintended consequences should I be looking for? - What benchmarks should I do? - What kind of edge cases might I consider? - I'm just wrapping cube_distance and calling it through DirectFunctionCall; it's probably more proper to extract out the real function and call it from both cube_distance and g_cube_distance. Right? - What else don't I know? (Besides C, funny man.) The patch, such as it is, is at: https://github.com/jaylevitt/postgres/commit/9cae4ea6bd4b2e582b95d7e1452de0a7aec12857 with an even-messier test at https://github.com/jaylevitt/postgres/commit/daa33e30acaa2c99fe554d88a99dd7d78ff6c784 I initially thought this patch made inserting and indexing slower, but then I realized the fast version was doing 1 million rows, and the slow one did 10 million rows. Which means: dinnertime. Jay Levitt -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers