On Apr23, 2013, at 23:25 , Alexander Korotkov <aekorot...@gmail.com> wrote:
> I've taken a brief look on the paper and implementation. As I can see 
> iDistance implements some global building strategy. I mean, for example, it 
> selects some point, calculates distances from selected point to all points in 
> dataset etc. So, it uses the whole dataset at the same time. 
> 
> However you can try to implement global index building in GiST or SP-GiST. In 
> this case I think you should carefully estimate your capabilities during 
> single GSoC project. You would need to extend GiST or SP-GiST interface and 
> write completely new implementation of tree building algorithm. Question of 
> how to exactly extend GiST or SP-GiST interface for this case could appear to 
> be very hard even theoretically.

+1. That seemed to be a major roadblock to me too when I read the paper.

You could work around that by making partition identification a separate step. 
You'd have a function

  idist_analyze(cfg name, table name, field name)

which'd identify suitable partitions for the data distribution in table.field 
and store them somewhere. Such a set of pre-identified partitions would be akin 
to a tsearch configuration, i.e. all other parts of the iDistance machinery 
would use it to map points to index keys and queries to ranges of those keys. 
You'll want to look at how tsearch handles that, and check if the method can 
indeed be applied to iDistance.

In a first cut, you'd probably only allow inserts into index which don't change 
the maximum distances from the partition centers that idist_analyze() found. 
With that restriction in place, you might get away without GiST or SP-GiSt, and 
simply use postgres' standard btree, if you find a way to map queries of the 
form "field <idistance-knn-operator> idistance_knn_function(point, distance)" 
to "where (idstance_keys(field) between P1_lowerbound AND P2_upperbound) OR 
(idistance_keys(field) between P2_lowerbuild AND P2_upperbound) …" or something 
equivalent. Or you could start with a GiST-based btree and extend it to support 
"distance" function. Looking at contrib/btree_gist and the built-in GiST 
operator class for point types should help.

best regards,
Florian Pflug



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to