custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
Hi all. I am trying to run some experiments with an algorithm that scores results by counting how many words of the submitted query appear in a document. For example, if I enter the query A B D A, the similarities I want to get for the documents are as follows: A A C F D (2-found A and D) A B D S S A (3 -
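
To make the desired scoring concrete, here is a minimal sketch in plain Java, independent of Lucene. It assumes whitespace-separated terms and counts each distinct query term at most once, which matches the two example documents above. The class and method names (TermOverlap, score) are made up for illustration and are not part of any Lucene API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not a Lucene class: scores a document by the number
// of distinct query terms that occur in it at least once.
class TermOverlap {

    static int score(String query, String document) {
        // Duplicate query terms (the second "A") collapse in the set.
        Set<String> queryTerms = new HashSet<>(Arrays.asList(query.trim().split("\\s+")));
        Set<String> docTerms = new HashSet<>(Arrays.asList(document.trim().split("\\s+")));
        // Keep only the query terms that are also present in the document.
        queryTerms.retainAll(docTerms);
        return queryTerms.size();
    }

    public static void main(String[] args) {
        System.out.println(score("A B D A", "A A C F D"));   // 2
        System.out.println(score("A B D A", "A B D S S A")); // 3
    }
}
```

Each term contributes 1 or 0, so the result is an integer that can exceed 1.0, unlike a normalized similarity.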

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
Before I asked this question I had been searching the list for over 2 hours and I didn't find anything that helped me understand how to do what I want. After you sent the message I made a quick pass through all your messages, but I didn't find anything. I also searched for FakeNormsIndexReader and s

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
I feel kind of stupid...I don't get what hossman says in his post. I got the thing about OMIT_NORMS and I tried to do it by calling Field.setOmitNorms(true); before adding a field to the index. After that I re-indexed my collection, but it still makes no difference. Tell me if I got it rig

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
But I don't want to get the frequency of each term in the doc. What I want is 1 if the term exists in the doc and 0 if it doesn't. After this, I want all these 1s and 0s to be summed and give me a number to use as a score. If I set the TF value as 1 or 0, as I described above, I get the right num

Re: custom similarity based on tf but greater than 1.0

2007-01-18 Thread Vagelis Kotsonis
It is 4 in the morning here in Greece, so I will try it tomorrow...I must sleep sometime! I will report back with the results tomorrow. Thanks! Vagelis markrmiller wrote: > > A...I brushed over your example too fast...looked like normal > counting to me...I see now what you mean. So OMIT_NORM

Re: custom similarity based on tf but greater than 1.0

2007-01-23 Thread Vagelis Kotsonis
eed. It really comes down to making a > FakeNormsIndexReader. The problem you are having is a result of the > field size normalization. > > - mark > > Vagelis Kotsonis wrote: >> Hi all. >> I am trying to make some experiments in an algorithm that scores results >>

Re: custom similarity based on tf but greater than 1.0

2007-01-23 Thread Vagelis Kotsonis
obably did > work. Are you getting the results through Hits? Hits will normalize. Use > TopDocs or a HitCollector. > > - Mark > > Vagelis Kotsonis wrote: >> But I don't want to get the frequency of each term in the doc. >> >> What I want is 1 if the term e
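
To illustrate why scores read through Hits never come out greater than 1.0, here is a small sketch in plain Java that imitates that kind of normalization. It assumes the rule is "if the top raw score is above 1.0, divide every score by it", which is my understanding of what Hits did in Lucene releases of that period. The class and method names are hypothetical, and this is not the Lucene source.

```java
// Hypothetical illustration, not Lucene code: rescales raw scores the way
// a Hits-style result list is assumed to, so the top score never exceeds 1.0.
class HitsStyleNormalization {

    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Only rescale when the best raw score is above 1.0.
        float scale = max > 1.0f ? 1.0f / max : 1.0f;
        float[] normalized = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            normalized[i] = rawScores[i] * scale;
        }
        return normalized;
    }

    public static void main(String[] args) {
        // Raw counts of 3 and 2 matched terms lose their absolute values.
        float[] out = normalize(new float[] {3f, 2f});
        System.out.println(out[0] + " " + out[1]);
    }
}
```

Under that assumption, raw counts such as 3 and 2 come back as roughly 1.0 and 0.67, which is why reading raw scores from TopDocs or a HitCollector is suggested instead.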