Re: [HACKERS] semi-PoC: kNN-gist for cubes

2012-02-07 Thread David Fetter
On Mon, Feb 06, 2012 at 06:25:33PM -0500, Jay Levitt wrote:
 I have a rough proof-of-concept for getting nearest-neighbor
 searches working with cubes.

Please attach such patches to the email when posting them :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] semi-PoC: kNN-gist for cubes

2012-02-07 Thread David Fetter
On Tue, Feb 07, 2012 at 05:56:52AM -0800, David Fetter wrote:
 On Mon, Feb 06, 2012 at 06:25:33PM -0500, Jay Levitt wrote:
  I have a rough proof-of-concept for getting nearest-neighbor
  searches working with cubes.
 
 Please attach such patches to the email when posting them :)

And here's a cleaned-up version of the patch that at least passes
make check.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
*** a/contrib/cube/cube.c
--- b/contrib/cube/cube.c
***
*** 73,78  PG_FUNCTION_INFO_V1(g_cube_penalty);
--- 73,79 
  PG_FUNCTION_INFO_V1(g_cube_picksplit);
  PG_FUNCTION_INFO_V1(g_cube_union);
  PG_FUNCTION_INFO_V1(g_cube_same);
+ PG_FUNCTION_INFO_V1(g_cube_distance);
  
  Datum g_cube_consistent(PG_FUNCTION_ARGS);
  Datum g_cube_compress(PG_FUNCTION_ARGS);
***
*** 81,86  Datumg_cube_penalty(PG_FUNCTION_ARGS);
--- 82,88 
  Datum g_cube_picksplit(PG_FUNCTION_ARGS);
  Datum g_cube_union(PG_FUNCTION_ARGS);
  Datum g_cube_same(PG_FUNCTION_ARGS);
+ Datum g_cube_distance(PG_FUNCTION_ARGS);
  
  /*
  ** B-tree support functions
***
*** 649,654  g_cube_same(PG_FUNCTION_ARGS)
--- 651,671 
  }
  
  /*
+ ** Distance method
+ */
+ Datum
+ g_cube_distance(PG_FUNCTION_ARGS)
+ {
+   GISTENTRY   *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+   NDBOX   *query = PG_GETARG_NDBOX(1);
+   double  distance;
+ 
+   distance = DatumGetFloat8(DirectFunctionCall2(cube_distance, 
entry-key, PointerGetDatum(query)));
+ 
+   PG_RETURN_FLOAT8(distance);
+ }
+ 
+ /*
  ** SUPPORT ROUTINES
  */
  bool

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] semi-PoC: kNN-gist for cubes

2012-02-06 Thread Jay Levitt
I have a rough proof-of-concept for getting nearest-neighbor searches 
working with cubes.  When I say rough, I mean I have no idea what I'm 
doing and I haven't written C for 15 years but I hear it got standardized 
please don't hurt me.  It seems to be about 400x faster for a 3D cube with 
1 million rows, more like 10-30x for a 6D cube with 10 million rows.


The patch adds operator - (which is just the existing cube_distance 
function) and support function 8, distance (which is just g_cube_distance, a 
wrapper around cube_distance).


The code is in no way production-quality; it is in fact right around look! 
it compiles!, complete with pasted-in, commented-out code from something I 
was mimicking.  I thought I'd share at this early stage in the hopes I might 
get some pointers, such as:


- What unintended consequences should I be looking for?
- What benchmarks should I do?
- What kind of edge cases might I consider?
- I'm just wrapping cube_distance and calling it through DirectFunctionCall; 
it's probably more proper to extract out the real function and call it 
from both cube_distance and g_cube_distance. Right?

- What else don't I know?  (Besides C, funny man.)

The patch, such as it is, is at:

https://github.com/jaylevitt/postgres/commit/9cae4ea6bd4b2e582b95d7e1452de0a7aec12857

with an even-messier test at

https://github.com/jaylevitt/postgres/commit/daa33e30acaa2c99fe554d88a99dd7d78ff6c784

I initially thought this patch made inserting and indexing slower, but then 
I realized the fast version was doing 1 million rows, and the slow one did 
10 million rows.  Which means: dinnertime.


Jay Levitt

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers