Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Todd A. Cook
Hi, Sorry for being so abtuse; my 9 year old has been helping me with my work today. :) The hamming distance between two bit vectors can also be found as the sum of distances between sub-vectors. Let's assume you're looking for all vectors less than d bits away from a search vector, and we'll d

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Todd A. Cook
Hi, Try breaking the vector into 4 bigint columns and building a multi-column index, with index columns going from the most evenly distributed to the least. Depending on the distribution of your data, you may only need 2 or 3 columns in the index. If you can cluster the table in that order, it

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Ben
Hrm, I don't understand. Can you give me an example with some reasonably sized vectors? On Oct 2, 2005, at 10:59 AM, Todd A. Cook wrote: Hi, Try breaking the vector into 4 bigint columns and building a multi- column index, with index columns going from the most evenly distributed to the l

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Todd A. Cook
Hi, It may be that I don't understand your problem. :) Are you searching the table for the closest vector? If so, is "closeness" defined only as the number of bits that are different? Or, do you need to know which bits as well? -- todd Ben wrote: Hrm, I don't understand. Can you give me an e

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Ben
Just the number of bits, not which ones. Basically, the hamming distance. On Oct 2, 2005, at 11:44 AM, Todd A. Cook wrote: Hi, It may be that I don't understand your problem. :) Are you searching the table for the closest vector? If so, is "closeness" defined only as the number of bits tha

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Todd A. Cook
Ben wrote: Just the number of bits, not which ones. Basically, the hamming distance. I see. Could you pre-compute the bit counts for the vectors in the table? You could count the bits in the search vector as Martijn suggested, and then do a lookup based on the count. -- todd

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Ben
Yeah, I thought about this, but damn my 3-dimensional mind, I just can't convince myself this will work for 256 dimensions. :) Or rather, I can see how it could, *given* good placements of the grid points to snap each vector to. Unfortunately, I don't know enough theory to know what good pl

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Ben
Sure, but unless I can figure out some way to choose a small number of vectors, I'm left with computing the full N^2 set. Given that I'm shooting for N to be 4 million or larger, that's a lot of data to store. On Oct 2, 2005, at 12:14 PM, Todd A. Cook wrote: Ben wrote: Just the numbe

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Martijn van Oosterhout
On Sun, Oct 02, 2005 at 10:17:10AM -0700, Ben wrote: > Yeah, I thought about this, but damn my 3-dimensional mind, I just > can't convince myself this will work for 256 dimensions. :) Or > rather, I can see how it could, *given* good placements of the grid > points to snap each vector to. Unf

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Ben
Hrm, I'll think about it. Thanks! On Oct 2, 2005, at 12:50 PM, Martijn van Oosterhout wrote: On Sun, Oct 02, 2005 at 10:17:10AM -0700, Ben wrote: Yeah, I thought about this, but damn my 3-dimensional mind, I just can't convince myself this will work for 256 dimensions. :) Or rather, I can see

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Martijn van Oosterhout
On Sun, Oct 02, 2005 at 08:28:53AM -0700, Ben wrote: > Yes, that's the straightforward way to do it. But given that my > vectors are 256 bits in length, and that I'm going to eventually have > about 4 million of them to search through, I was hoping greater minds > than mine had figured out ho

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Ben
Yes, that's the straightforward way to do it. But given that my vectors are 256 bits in length, and that I'm going to eventually have about 4 million of them to search through, I was hoping greater minds than mine had figured out how to do it faster, or how compute some kind of indexing

Re: [GENERAL] scoring differences between bitmasks

2005-10-02 Thread Martijn van Oosterhout
On Sat, Oct 01, 2005 at 07:12:13PM -0700, Ben wrote: > I'm looking for a quick way to find the number of bits that are > different between two bitmasks of the same size. I'll be doing this a > lot, so speed is desirable some kind of indexing would be even > better. It seems like this prob

[GENERAL] scoring differences between bitmasks

2005-10-01 Thread Ben
I'm looking for a quick way to find the number of bits that are different between two bitmasks of the same size. I'll be doing this a lot, so speed is desirable some kind of indexing would be even better. It seems like this problem must have been solved before --

[GENERAL] scoring select results

2003-07-28 Thread Dave [Hawk-Systems]
have a table title, description, keywords which I am searching (from PHP) using a keyword What I want to do is sort the results based on the number of hits nd scoring based on where the hit is. For example, a hit in keywords is worth 5, title is worth 3, description is worth 1.\ I curren

[GENERAL] Scoring

2000-07-05 Thread Eric Jain
Any tips on how to efficiently score fields with altavista-like query strings? I currently use the following PL/Perl function, which unfortunatly is rather slow, even though I have already simplified it quite a bit... # Example: SELECT id, score(description, 'a?pha -"beta gamma"') FROM table; C