Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'
We use a dismax handler with mm=1 in our Solr installation. I have a fieldType that creates shingles to handle space variations in both the indexed strings and user queries. This fieldType successfully handles the case where the query is 'thunderbolt' and the document contains the string 'thunder bolt' (shingling produces the token 'thunderbolt' during indexing).

However, because the Lucene query parser tokenizes on whitespace before analysis, the reverse case is not handled well: a document containing 'thunderbolt' is not matched by the query 'thunder bolt'. In our dismax handler, the shingle field records a match and scores on 'pf', but the document is not returned because none of the fields in 'qf' record a match (mm is 1).

I am looking for suggestions on how to handle this scenario. Using a synonym would obviously work, but it seems a rather hackish solution. Is there a more elegant way of achieving a similar effect? Alternatively, is there a way to get the 'mm' parameter to factor in matches on 'pf' as well?

Kindly help.

Regards,
Prasanna
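The asymmetry described above can be reproduced outside Solr with a small simulation. This is only a sketch: it assumes the shingle filter joins adjacent tokens with an empty separator and that mm=1 means a single matching term is enough. The function names (`shingles`, `index_terms`, `query_terms`, `matches`) are made up for illustration and are not Solr APIs.

```python
def shingles(tokens, max_size=2, sep=""):
    """Return the original tokens plus joined runs of adjacent tokens."""
    out = list(tokens)
    for size in range(2, max_size + 1):
        for i in range(len(tokens) - size + 1):
            out.append(sep.join(tokens[i:i + size]))
    return out


def index_terms(text):
    # Index time: the analyzer sees the whole field value,
    # so adjacent words can be joined into one shingle.
    return set(shingles(text.lower().split()))


def query_terms(query):
    # Query time: the parser splits on whitespace first, so the analyzer
    # only ever sees one word at a time and cannot join neighbours.
    terms = set()
    for chunk in query.lower().split():
        terms.update(shingles([chunk]))
    return terms


def matches(document, query):
    # With mm=1 a single shared term is enough for a hit.
    return bool(index_terms(document) & query_terms(query))


print(matches("thunder bolt", "thunderbolt"))   # document has the space
print(matches("thunderbolt", "thunder bolt"))   # query has the space
```

The first call finds a shared term because the index side produced the joined shingle; the second does not, because neither query chunk equals the single indexed token.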
Re: slow highlighting because of stemming
Hi,

Thanks for the answer! I have been logging the stemmer, and I can see that a lot of tokens are stemmed during highlighting. That is the strange part, since I don't understand why any highlighter needs to stem again. In any case, my documents are not really large, just a few kilobytes, but thanks for the suggestion. If you could tell me how to skip stemming during highlighting, that would be great!

Thanks,
Gyuri

2011/7/29 Mike Sokolov soko...@ifactory.com

> I'm not sure I would identify stemming as the culprit here. Do you have very large documents? If so, there is a patch for FVH committed to limit the number of phrases it looks at; see hl.phraseLimit, but this won't be available until 3.4 is released.
>
> You can also limit the amount of each document that is analyzed by the regular Highlighter using maxDocCharsToAnalyze (and maybe this applies to FVH? not sure).
>
> Using RegexFragmenter is also probably slower than something like SimpleFragmenter.
>
> There is work to implement faster highlighting for Solr/Lucene, but it depends on some basic changes to the search architecture, so it might be a while before that becomes available. See https://issues.apache.org/jira/browse/LUCENE-3318 if you're interested in following that development.
>
> -Mike
>
> On 07/29/2011 04:55 AM, Orosz György wrote:
>> Dear all,
>>
>> I am quite new to Solr and would like to ask for your help. I am developing an application which should be able to highlight the results of a query.
>> For this I am using the regex fragmenter:
>>
>> <highlighting>
>>   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
>>     <lst name="defaults">
>>       <int name="hl.fragsize">500</int>
>>       <float name="hl.regex.slop">0.5</float>
>>       <str name="hl.pre"><![CDATA[<b>]]></str>
>>       <str name="hl.post"><![CDATA[</b>]]></str>
>>       <str name="hl.useFastVectorHighlighter">true</str>
>>       <str name="hl.regex.pattern">[-\w ,/\n\']{20,300}[.?!]</str>
>>       <str name="hl.fl">dokumentum_syn_query</str>
>>     </lst>
>>   </fragmenter>
>> </highlighting>
>>
>> The field is indexed with term vectors and offsets:
>>
>> <field name="dokumentum_syn_query" type="huntext_syn" indexed="true" stored="true" multiValued="true" termVectors="on" termPositions="on" termOffsets="on"/>
>>
>> <fieldType name="huntext_syn" class="solr.TextField" stored="true" indexed="true" positionIncrementGap="100">
>>   <analyzer type="index">
>>     <tokenizer class="com.morphologic.solr.huntoken.HunTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_query.txt" enablePositionIncrements="true"/>
>>     <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory" lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex" cache="alma"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_query.txt" enablePositionIncrements="true"/>
>>     <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory" lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex" cache="alma"/>
>>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>> The highlighting works well, except that it's really slow. I realized that this is because the highlighter/fragmenter stems all the result documents again. Could you please explain why this happens and how I can avoid it? (I thought that using the FastVectorHighlighter would solve my problem, but it didn't.)
>>
>> Thanks in advance!
>> Gyuri Orosz
fragsize for highlighting
Hi,

I'm setting hl.fragsize = 10 in all my highlighting fragmenters, but I'm still getting snippets returned with more than 10 characters (I think I'm getting the full text back). I also tried specifying hl.fragsize in the querystring, but the same thing happens. Any idea why fragsize is not getting picked up?

Thanks!
Re: slow highlighting because of stemming
> I have been logging the stemmer, and I can see that a lot of tokens are stemmed during highlighting. That is the strange part, since I don't understand why any highlighter needs to stem again.

Highlighting does re-analyze the text being highlighted.

> In any case, my documents are not really large, just a few kilobytes, but thanks for the suggestion. If you could tell me how to skip stemming during highlighting, that would be great!

If you store term vectors, then this re-analysis is skipped. http://wiki.apache.org/solr/FieldOptionsByUseCase
Re: fragsize for highlighting
> Hi, I'm setting hl.fragsize = 10 in all my highlighting fragmenters, but I'm still getting snippets returned with more than 10 characters (I think I'm getting the full text back). I also tried specifying hl.fragsize in the querystring, but the same thing happens. Any idea why fragsize is not getting picked up?

Maybe you are setting it twice? What is the output of echoParams=all?
Re: Autocomplete with Solr 3.1
According to http://www.lucidimagination.com/blog/2011/04/08/solr-powered-isfdb-part-9/ it should be possible to set spellcheck.maxCollations to 5. This doesn't work for me in 4.0, nor does it work with the regular spellchecker, unless I set spellcheck.maxCollationTries to a value like 10. Then I get a list of collations. However adding these parameters to the suggester doesn't do anything. Is this common behavior? Or is my Solr borked?
Re: slow highlighting because of stemming
On 7/30/2011 3:46 AM, Orosz György wrote:

> Hi, Thanks for the answer! I have been logging the stemmer, and I can see that a lot of tokens are stemmed during highlighting. That is the strange part, since I don't understand why any highlighter needs to stem again.

Consider that the highlighter needs to match terms from the query with terms from the document, just like search. If the indexed document has been stemmed, then the query also needs to be stemmed, or you won't see matches.

-Mike
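The point about matching stemmed terms on both sides can be illustrated with a toy highlighter. This is a sketch under loud assumptions: `toy_stem` is a hypothetical suffix stripper for English, not the Hungarian stemmer used in this thread, and `highlight` is not how Solr's highlighters are implemented. It only shows that a surface form in the text is found when both the text and the query go through the same stemming step.

```python
def toy_stem(token):
    # Hypothetical stand-in for a real stemmer: strips a few English suffixes.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def highlight(text, query, pre="<b>", post="</b>"):
    # Stem the query terms once.
    query_stems = {toy_stem(t) for t in query.lower().split()}
    out = []
    for word in text.split():
        # Re-analyze each word of the stored text the same way as the query.
        stem = toy_stem(word.lower().strip(".,!?"))
        out.append(pre + word + post if stem in query_stems else word)
    return " ".join(out)


print(highlight("She was indexing documents.", "indexed document"))
```

Neither "indexing" nor "documents" appears literally in the query, yet both are marked, because the comparison happens on stems. Skipping the stemming step on the text side would leave nothing to highlight.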
Re: fragsize for highlighting
I'm a bit of a newbie: adding echoParams=all to my querystring isn't yielding additional info (does Solr 1.4 support it?). Here's a query (I also tried adding hl.fragsize=10):

http://localhost:8982/solr/select/?fl=*+score&start=0&q=gofish&qf=description_texts&hl.simple.pre=@@@hl@@@&hl.simple.post=@@@endhl@@@&fq=type:(Task)&hl=on&defType=dismax&rows=30&echoParams=all

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
    <lst name="params">
      <str name="hl.fragsize">10</str>
      <str name="fl">* score</str>
      <str name="start">0</str>
      <str name="q">immanu</str>
      <str name="qf">description_texts</str>
      <str name="hl.simple.pre">@@@hl@@@</str>
      <str name="hl.simple.post">@@@endhl@@@</str>
      <str name="fq">type:(Task)</str>
      <str name="hl">on</str>
      <str name="defType">dismax</str>
      <str name="rows">30</str>
    </lst>
  </lst>
  <lst name="highlighting">
    ...
    <str>@@@hl@@@some s@@@endhl@@@uper long piece of text. long interesting stuff and text gofish found</str>
    </arr>
    ...
</response>

On Sat, Jul 30, 2011 at 2:58 AM, Ahmet Arslan iori...@yahoo.com wrote:

> Maybe you are setting it twice? What is the output of echoParams=all?
Re: Solr Incremental Indexing
I always have a field in my databases called datelastmodified. Whenever I update a record, I set that field to getdate() (an MSSQL function), and then fetch all the latest records ordered by that field.

2011/7/29 Mohammed Lateef Hussain mohammedlateefh...@gmail.com

> Hi,
>
> I need some help with an approach to Solr incremental indexing. I have built my Solr index using the SolrJ API and now want to update the index whenever any changes have been made in the database. My requirement is not to use DB triggers to call any update events. I want to update my index on the fly whenever my application updates any record in the database.
>
> Note: My indexing logic to get the required data from the DB is somewhat complex and involves many tables.
>
> Please suggest how I can proceed here.
>
> Thanks,
> Lateef

--
Alexei Martchenko | CEO | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
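A minimal sketch of the datelastmodified approach follows. It uses an in-memory SQLite database purely so the example is runnable; the table name `items`, its columns other than `datelastmodified`, and the `fetch_changed` helper are assumptions for illustration, and the step that actually sends the changed rows to Solr is left out.

```python
import sqlite3


def fetch_changed(conn, since):
    """Return rows modified after `since`, oldest first."""
    cur = conn.execute(
        "SELECT id, title, datelastmodified FROM items "
        "WHERE datelastmodified > ? ORDER BY datelastmodified",
        (since,),
    )
    return cur.fetchall()


# Hypothetical table standing in for the application's database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT, datelastmodified TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "old record", "2011-07-28T10:00:00"),
        (2, "edited record", "2011-07-30T09:15:00"),
        (3, "new record", "2011-07-30T11:40:00"),
    ],
)

# Timestamp of the newest row that the previous indexing run handled.
last_indexed = "2011-07-29T00:00:00"

changed = fetch_changed(conn, last_indexed)
for row in changed:
    print("would reindex:", row)

# Advance the watermark so the next run only sees newer changes.
if changed:
    last_indexed = changed[-1][2]
```

ISO-8601 timestamps are stored as text here because they sort correctly as strings. A run that finds no changes leaves the watermark untouched.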
Re: fragsize for highlighting
I suspected that you were setting fragsize twice, e.g. f.description_texts.hl.fragsize=100&hl.fragsize=10, but from what you pasted that's not the case. However, the response you pasted did not come from that URL. It would be better to see a matching URL and response.

echoParams=all displays all parameters used: both the defaults defined in solrconfig.xml and the ones in the URL. http://wiki.apache.org/solr/CoreQueryParameters#echoParams

--- On Sat, 7/30/11, Frank Chiu frank.c...@gmail.com wrote:

From: Frank Chiu frank.c...@gmail.com
Subject: Re: fragsize for highlighting
To: Ahmet Arslan iori...@yahoo.com
Cc: solr-user@lucene.apache.org
Date: Saturday, July 30, 2011, 9:35 PM

> I'm a bit of a newbie: adding echoParams=all to my querystring isn't yielding additional info (does Solr 1.4 support it?).
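One way to avoid a mismatch between the URL and the parameters Solr actually receives is to build the query string programmatically rather than by hand. The sketch below uses Python's standard `urllib.parse`; the host, port, and parameter values are the ones quoted in this thread, while `build_select_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base, params):
    # urlencode escapes each value and joins the pairs with '&'.
    return base + "?" + urlencode(params)


params = [
    ("q", "gofish"),
    ("defType", "dismax"),
    ("qf", "description_texts"),
    ("fq", "type:(Task)"),
    ("fl", "* score"),
    ("start", "0"),
    ("rows", "30"),
    ("hl", "on"),
    ("hl.fragsize", "10"),
    ("hl.simple.pre", "@@@hl@@@"),
    ("hl.simple.post", "@@@endhl@@@"),
    # Ask Solr to echo back every parameter it used, defaults included.
    ("echoParams", "all"),
]

url = build_select_url("http://localhost:8982/solr/select/", params)
print(url)

# Round-trip check: decode the query string and compare with the input.
decoded = parse_qs(urlsplit(url).query)
print(decoded["hl.fragsize"], decoded["fq"])
```

A per-field override such as f.description_texts.hl.fragsize would be added to the same list as another pair; comparing the echoed params block in the response against this list shows quickly whether a default from solrconfig.xml is being applied on top.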
Re: fragsize for highlighting
I ended up removing the EdgeNGramFilterFactory, and the highlighting seems to work okay. Thanks for your help; echoParams is useful.

On Sat, Jul 30, 2011 at 2:07 PM, Ahmet Arslan iori...@yahoo.com wrote:

> echoParams=all displays all parameters used: both the defaults defined in solrconfig.xml and the ones in the URL. http://wiki.apache.org/solr/CoreQueryParameters#echoParams
Solr request filter and indexing process
Hello, dear friends,

I have run into a problem while developing with Solr. My application must send multiple queries to the Solr server after the page is loaded. I found that some requests return only statusCode:0 and QTime:0: Solr has accepted the request, but it does not return a result document. If I send the requests one by one manually, each returns its result. But if I send the requests in quick succession, some return nothing except statusCode:0 and QTime:0. I think this may be a deliberate strategy in Solr, but I can't find any documentation or discussion about it on the internet, so I hope you can help me.

Edited on 2011-07-28: I now have a new problem. I am developing in PHP, so I connect to Solr through SolrPhpClient (an open-source project on Google Code). I find that adding many documents is very slow: adding ten documents to a Solr index takes more than 5 minutes (because of the commit process). Can anybody help me?
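If the slowness really does come from the commit process, one common remedy is to send all the documents in a single add message and commit once at the end, rather than committing after every document. The sketch below only builds the XML update messages with Python's standard library; it does not talk to Solr or to SolrPhpClient, and the field names `id` and `title` as well as the helper `build_add_message` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build one Solr XML <add> message containing every document."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


# Ten hypothetical documents, matching the batch size mentioned above.
docs = [{"id": i, "title": f"document {i}"} for i in range(1, 11)]

add_message = build_add_message(docs)   # one request body for all ten documents
commit_message = "<commit/>"            # sent once, after the add

print(add_message[:80])
print(commit_message)
```

Both messages would be posted to the update handler in that order, so the expensive commit happens once per batch instead of once per document.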