On 21-4-2007 1:42 Mark Kirkwood wrote:
> I don't think that will work for the vector norm, i.e.:
>
> |x - y| = sqrt(sum over j ((x[j] - y[j])^2))

I don't know if this is useful here, but I was able to rewrite that algorithm for a set of very sparse vectors (i.e. vectors with very few overlapping non-zero entries) to something like:
|x - y|^2 = sum over j (x[j]^2) + sum over j (y[j]^2)
            + for each j where x[j] and y[j] are both non-zero: (x[j] - y[j])^2 - x[j]^2 - y[j]^2

(note this gives the squared distance; take the sqrt at the end if you need |x - y| itself)

The first two partial sums need to be calculated only once per vector. So if there is very little overlap, this is much more efficient (if there is no overlap at all, you simply end up with sum of x[j]^2 + sum of y[j]^2 anyway). Moreover, this rewrite lets you store the X and Y vectors in a trivial table layout vector(x, j, value) that contains only the non-zeros, and which you can trivially self-join to find the closest matches. With this rewrite you don't care about the j's where either x[j] or y[j] is zero.
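To make the idea concrete, here is a small sketch of that decomposition in Python, assuming the sparse vectors are stored as {index: value} dicts holding only the non-zero entries (the names and storage format are illustrative, not from the original post):

```python
import math

def norm_sq(v):
    """Sum of squares of a sparse vector; computed once per vector and reused."""
    return sum(val * val for val in v.values())

def sparse_distance(x, y, x_sq, y_sq):
    """Euclidean distance using the precomputed per-vector sums.

    Start from sum(x^2) + sum(y^2) and correct only the coordinates
    where both vectors are non-zero, as in the rewrite above.
    """
    d_sq = x_sq + y_sq
    for j in x.keys() & y.keys():  # only the overlapping indices
        d_sq += (x[j] - y[j]) ** 2 - x[j] ** 2 - y[j] ** 2
    return math.sqrt(d_sq)

# Hypothetical example vectors, non-zeros only.
x = {0: 3.0, 2: 1.0, 5: 4.0}
y = {2: 2.0, 7: 6.0}
x_sq, y_sq = norm_sq(x), norm_sq(y)

# Cross-check against the dense definition of the distance.
dense = math.sqrt(sum((x.get(j, 0.0) - y.get(j, 0.0)) ** 2 for j in range(8)))
assert abs(sparse_distance(x, y, x_sq, y_sq) - dense) < 1e-9
```

The work per comparison is proportional to the overlap, not the vector length, which is where the speedup for sparse data comes from.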

On just a single 1.8 GHz AMD processor I can compare over 1000 y's of on average 100 elements each against two x's of over 1000 elements. (I use it for a bi-kmeans algorithm, so there are only two buckets to compare against.)

So it might be possible to rewrite your algorithm to be less calculation-intensive. Obviously this isn't going to work with a dense matrix, but there may be other ways to skip parts of the calculation or to cache large parts of it. It might also help to extract only the six relevant columns into a separate temporary table, which will have much smaller records and can therefore fit more records per page.

Best regards,

Arjen
