DisMaxQuery calculating too high sumOfSquaredWeights?

2010-11-26 Thread Jan Kurella
Hi there, I was composing a query on my own, the way the Solr.DisMaxQueryHandler would, because I needed a different tokenizing strategy for non-whitespace-separated languages and more. I took the concept from http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/ Assume now the following:

Re: DisMaxQuery calculating too high sumOfSquaredWeights?

2010-11-26 Thread Jan Kurella
On 26.11.2010 14:39, ext Jan Kurella wrote: Hi there, I was composing a query on my own, the way the Solr.DisMaxQueryHandler would, because I needed a different tokenizing strategy for non-whitespace-separated languages and more. I took the concept from http://www.lucidimagination.com/blog/2010/05/2

Re: DisMaxQuery calculating too high sumOfSquaredWeights?

2010-11-26 Thread Jan Kurella
On 26.11.2010 14:50, ext Jan Kurella wrote: On 26.11.2010 14:39, ext Jan Kurella wrote: Hi there, I was composing a query on my own, the way the Solr.DisMaxQueryHandler would, because I needed a different tokenizing strategy for non-whitespace-separated languages and more. I took the concept from h

Re: not indexing analyzed field

2010-11-26 Thread Erick Erickson
So, you're using Solr, right? And have a custom analyzer? If that's the case, Uwe pointed you in the right direction and I think everything may be working fine, or at least as I'd expect. Specifying stored="true" puts a verbatim, unanalyzed copy of the data in the index. When you display a field i

TermRangeQuery

2010-11-26 Thread Amin Mohammed-Coleman
Hi all, I was wondering whether I can use TermRangeQuery for my use case. I have a collection of ids (of the form XDF-123) and I would like to search for all of those ids (might be in the range of 1) and, for each matching id, get the corresponding data that is stored in the inde

Re: best practice: 1.4 billions documents

2010-11-26 Thread Yonik Seeley
On Mon, Nov 22, 2010 at 12:49 PM, Uwe Schindler wrote: > (Fuzzy scores on > MultiSearcher and Solr are totally wrong because each shard uses another > rewritten query). Hmmm, really? I thought that fuzzy scoring should just rely on edit distance? Oh wait, I think I see - it's because we can use

Re: TermRangeQuery

2010-11-26 Thread Ian Lea
Absolutely, as long as your ids will sort as you expect. I'm not clear what you mean by XDF-123 but if you've got AAA-123 AAA-124 ... ABC-123 ABC-234 etc. then you'll be fine. If they don't sort so neatly you can use the TermRangeQuery constructor that takes a Collator but note the performance
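A minimal sketch of the ordering point in the message above. It assumes plain Python string comparison as a stand-in for term order and is not the Lucene API; the range bounds and the `in_term_range` helper are made up for illustration.

```python
# Ids in the fixed-width style mentioned in the message.
ids = ["ABC-234", "AAA-124", "ABC-123", "AAA-123"]


def in_term_range(term, lower, upper):
    """Inclusive lexicographic range check (hypothetical helper)."""
    return lower <= term <= upper


# Fixed-width ids sort the way a reader expects.
sorted_ids = sorted(ids)

# Assumed bounds: everything from AAA-124 up to and including ABC-123.
matches = [t for t in sorted_ids if in_term_range(t, "AAA-124", "ABC-123")]

# Numbers of different lengths do not sort numerically as strings:
# "ID-1000" comes before "ID-90" because "1" < "9".
mixed = sorted(["ID-90", "ID-1000"])
```

When every id has the same shape, a string range selects the expected ids; once the numeric part varies in length, the string order and the numeric order diverge.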

Re: not indexing analyzed field

2010-11-26 Thread Bernd Fehling
Hi Erick, I see my problem; it was caused by a misunderstanding of how Lucene indexes. I guess it's due to the fact that FAST Data Search has real processing pipelines. You're right, I use Solr, but as a matter of fact, in this special case I really want to change the indexed _and_ stored data. For secu

Re: TermRangeQuery

2010-11-26 Thread Amin Mohammed-Coleman
Hi, basically my test ids look like: AAA-231 AAD-234 ADD-123. I didn't know about the Collator; I was going to do a custom sort based on the number part of the id. Thanks, Amin On 26 Nov 2010, at 14:39, Ian Lea wrote: > Absolutely, as long as your ids will sort as you expect. > > I'm not clear w

Re: not indexing analyzed field

2010-11-26 Thread Erick Erickson
Can you "define the problem away"? That is, why do you want to store it at all? If there's no value to the users in seeing the encoded value, just don't store it. You can still search on the encoded value in that case. Which is a way of saying that I don't know, off the top of my head, how y

Re: TermRangeQuery

2010-11-26 Thread Amin Mohammed-Coleman
Hi, unfortunately my range query approach did not work. It seems to be related to the ids themselves. The list has ids that look like this: ID-NYC-1234 ID-LND-1234 TX-NYC-1334 TX-NYC-BBC-123 The ids may range from 90 to 1000. Is there another approach I could take? I tried building a string wi

RE: best practice: 1.4 billions documents

2010-11-26 Thread Uwe Schindler
This is the problem for Fuzzy: each searcher expands the fuzzy query to a different Boolean Query and so the scores are not comparable - MultiSearcher (but not Solr) tries to combine the resulting rewritten queries into one query, so every searcher has the same query. And here starts the second bu

Re: best practice: 1.4 billions documents

2010-11-26 Thread Robert Muir
On Fri, Nov 26, 2010 at 12:49 PM, Uwe Schindler wrote: > This is the problem for Fuzzy: each searcher expands the fuzzy query to a > different Boolean Query and so the scores are not comparable - MultiSearcher > (but not Solr) tries to combine the resulting rewritten queries into one > query, so e

Re: TermRangeQuery

2010-11-26 Thread Ian Lea
What sort of ranges are you trying to use? Maybe you could store a separate field, just for these queries, with some normalized form of the ids, with all numbers padded out to the same length etc. -- Ian. On Fri, Nov 26, 2010 at 4:34 PM, Amin Mohammed-Coleman wrote: > Hi > > Unfortunately my ra
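One way to read the "normalized form" suggestion above is to zero-pad every run of digits to a fixed width before indexing the separate field. The sketch below assumes that reading; `normalize_id` and the width of 6 are hypothetical choices, not something the thread specifies.

```python
import re


def normalize_id(raw_id, width=6):
    """Zero-pad each run of digits to `width` characters (assumed width)."""
    return re.sub(r"\d+", lambda m: m.group(0).zfill(width), raw_id)


# Illustrative ids in the style quoted in the thread.
raw = ["ID-NYC-1000", "ID-NYC-90", "ID-NYC-234"]

# After padding, string order matches numeric order within a prefix.
normalized = sorted(normalize_id(r) for r in raw)
```

The padded values would go into the extra field used only for range queries, while the original id stays available for display.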

Re: TermRangeQuery

2010-11-26 Thread Amin Mohammed-Coleman
Essentially I'd like to construct a query which is almost like an SQL IN clause. The Lucene document contains the id and a string value. I'd like to get the string value based on the id key. The ids may range within 1000. Is this possible to do? Thanks Amin Sent from my iPhone On 26 Nov 2010,

Re: not indexing analyzed field

2010-11-26 Thread Bernd Fehling
Hi Erick, the "problem" can be described as follows: - we have a database for users - users can search and mark/store records for watching - the record marker is the unique path to the source and also the unique record id of the database - therefore we decided to sha256 the id as backreference fro
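A rough sketch of hashing a record id with SHA-256, as the message above describes. The `record_marker` name, the sample id, and the choice of UTF-8 encoding and hex output are assumptions; the message is cut off before it gives those details.

```python
import hashlib


def record_marker(record_id):
    """Return the SHA-256 hex digest of an id (hypothetical helper)."""
    return hashlib.sha256(record_id.encode("utf-8")).hexdigest()


# Hypothetical record id, used only for illustration.
marker = record_marker("/source/path/record-42")
```

The digest is deterministic, so the same id always yields the same 64-character marker, and the marker can be searched as an exact term without exposing the original path.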