Hi,
Sorry for being so abtuse; my 9 year old has been helping me with my
work today. :)
The hamming distance between two bit vectors can also be found as the
sum of distances between sub-vectors. Let's assume you're looking
for all vectors less than d bits away from a search vector, and we'll
d
Hi,
Try breaking the vector into 4 bigint columns and building a multi-column
index, with index columns going from the most evenly distributed to the
least. Depending on the distribution of your data, you may only need 2
or 3 columns in the index. If you can cluster the table in that order,
it
Hrm, I don't understand. Can you give me an example with some
reasonably sized vectors?
On Oct 2, 2005, at 10:59 AM, Todd A. Cook wrote:
Hi,
Try breaking the vector into 4 bigint columns and building a multi-
column
index, with index columns going from the most evenly distributed to
the
l
Hi,
It may be that I don't understand your problem. :)
Are you searching the table for the closest vector? If so, is
"closeness" defined only as the number of bits that are different?
Or, do you need to know which bits as well?
-- todd
Ben wrote:
Hrm, I don't understand. Can you give me an e
Just the number of bits, not which ones. Basically, the hamming
distance.
On Oct 2, 2005, at 11:44 AM, Todd A. Cook wrote:
Hi,
It may be that I don't understand your problem. :)
Are you searching the table for the closest vector? If so, is
"closeness" defined only as the number of bits tha
Ben wrote:
Just the number of bits, not which ones. Basically, the hamming distance.
I see. Could you pre-compute the bit counts for the vectors in the table?
You could count the bits in the search vector as Martijn suggested, and then
do a lookup based on the count.
-- todd
Yeah, I thought about this, but damn my 3-dimensional mind, I just
can't convince myself this will work for 256 dimensions. :) Or
rather, I can see how it could, *given* good placements of the grid
points to snap each vector to. Unfortunately, I don't know enough
theory to know what good pl
Sure, but unless I can figure out some way to choose a small number
of vectors, I'm left with computing the full N^2 set. Given that I'm
shooting for N to be 4 million or larger, that's a lot of data to
store.
On Oct 2, 2005, at 12:14 PM, Todd A. Cook wrote:
Ben wrote:
Just the numbe
On Sun, Oct 02, 2005 at 10:17:10AM -0700, Ben wrote:
> Yeah, I thought about this, but damn my 3-dimensional mind, I just
> can't convince myself this will work for 256 dimensions. :) Or
> rather, I can see how it could, *given* good placements of the grid
> points to snap each vector to. Unf
Hrm, I'll think about it. Thanks!
On Oct 2, 2005, at 12:50 PM, Martijn van Oosterhout wrote:
On Sun, Oct 02, 2005 at 10:17:10AM -0700, Ben wrote:
Yeah, I thought about this, but damn my 3-dimensional mind, I just
can't convince myself this will work for 256 dimensions. :) Or
rather, I can see
On Sun, Oct 02, 2005 at 08:28:53AM -0700, Ben wrote:
> Yes, that's the straightforward way to do it. But given that my
> vectors are 256 bits in length, and that I'm going to eventually have
> about 4 million of them to search through, I was hoping greater minds
> than mine had figured out ho
Yes, that's the straightforward way to do it. But given that my
vectors are 256 bits in length, and that I'm going to eventually have
about 4 million of them to search through, I was hoping greater minds
than mine had figured out how to do it faster, or how compute some
kind of indexing
On Sat, Oct 01, 2005 at 07:12:13PM -0700, Ben wrote:
> I'm looking for a quick way to find the number of bits that are
> different between two bitmasks of the same size. I'll be doing this a
> lot, so speed is desirable some kind of indexing would be even
> better. It seems like this prob
I'm looking for a quick way to find the number of bits that are
different between two bitmasks of the same size. I'll be doing this a
lot, so speed is desirable some kind of indexing would be even
better. It seems like this problem must have been solved before
--
have a table
title, description, keywords
which I am searching (from PHP) using a keyword
What I want to do is sort the results based on the number of hits nd scoring
based on where the hit is. For example, a hit in keywords is worth 5, title is
worth 3, description is worth 1.\
I curren
Any tips on how to efficiently score fields with altavista-like query
strings?
I currently use the following PL/Perl function, which unfortunatly is
rather slow, even though I have already simplified it quite a bit...
# Example: SELECT id, score(description, 'a?pha -"beta gamma"') FROM
table;
C
16 matches
Mail list logo