Is it possible to apply index-time synonyms just for a section of the index
I've posted a few questions on synonyms before, finally understood how they work, and settled on index-time synonyms, which seem to work much better than query-time synonyms. But now, at work, there is a special request: certain synonyms should be applied only to certain sections of the index.

For example, our index contains legal FAQs, forms, etc., and it also contains attorneys. Take the following synonyms:

california,san diego
florida,miami

For a search such as 'real estate san diego', it makes sense to return all FAQs and forms for 'california' in the index, but it doesn't make sense to return a real estate attorney elsewhere in California (like Burbank); attorney results should be restricted to San Diego. To be clearer: I want to return all California FAQs and forms for 'real estate san diego', but not all California attorneys. That means I should index the FAQs and forms with the state = city mappings above, but not the attorneys.

I could index all the other resources (FAQs, forms) first with these synonyms, then remove the synonyms and index the attorneys. But that wouldn't work well in my case, because we have a scheduler that runs every night to index any new resources from our database.

Can someone suggest a good solution for this?

--
View this message in context: http://www.nabble.com/Is-it-possible-to-apply-index-time-synonyms-just-for-a-section-of-the-index-tp24209490p24209490.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is it possible to apply index-time synonyms just for a section of the index
That's right. Simple. I can very well do that. Why didn't I think of it? Thanks.

rswart wrote:

What is stopping you from defining different field types for faqs and attorneys? One with index-time synonyms and one without.
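rswart's suggestion of separate field types could look roughly like the schema.xml fragment below. This is only a sketch under assumptions: the type names (text_geo_syn, text_plain), the field names (resource_text, attorney_text) and the file name geo_synonyms.txt are hypothetical and do not come from the thread. The filter and tokenizer classes are the stock Solr ones. A real schema would also need its other fields and a uniqueKey.

```xml
<schema name="example" version="1.1">
  <types>
    <!-- For FAQs and forms: city/state synonyms are expanded at index time.
         geo_synonyms.txt (hypothetical name) would hold lines such as
           california,san diego
           florida,miami -->
    <fieldType name="text_geo_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="geo_synonyms.txt"
                ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- For attorneys: the same analysis chain, but with no synonym filter -->
    <fieldType name="text_plain" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>

  <fields>
    <!-- Hypothetical fields: the nightly indexer fills one or the other,
         depending on the resource type of the document -->
    <field name="resource_text" type="text_geo_syn" indexed="true" stored="true"/>
    <field name="attorney_text" type="text_plain" indexed="true" stored="true"/>
  </fields>
</schema>
```

With a layout like this, the nightly scheduler does not need to swap synonym files between runs. It only has to write the text of each document into the field that matches its resource type, and the search handler has to query both fields.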
Re: synonyms
I happened to revisit this post, which I started a long time back. I'm still using the same query-time synonyms. Now I want to map cities to states in the synonyms, and I continue to have this issue with multi-word synonyms. Could you please explain again what you've done to overcome it? I didn't quite understand what HIER_FAMILIY_01 and SYN_FAMILY_01 are. Thanks.

lorenzo zhak wrote:

Hi,

I had to deal with this kind of side effect regarding multi-word synonyms. We installed Solr on a project that uses synonyms extensively: a big list that could sometimes bring out wrong matches, such as the one noticed by Anuvenk. For instance, with

dui => drunk driving defense

or

dui,drunk driving defense,drunk driving law

a query for dui matches

dui => drunk driving defense

and

dui,drunk driving defense,drunk driving law

In order to prevent this kind of behavior, I gave every synonym family (that is, a single line in the file) a unique identifier, so the list looks like:

dui => HIER_FAMILIY_01
drunk driving defense => HIER_FAMILIY_01
SYN_FAMILY_01, dui,drunk driving defense,drunk driving law

I also set the synonyms filter at index time with expand=false, and at query time with expand=false. This way, the matched synonyms (multi-word or single-word) in documents are replaced with their family identifier, and not with all the possibilities. Indexing with expand=true would add words to documents that could be matched alone, ignoring the fact that they belong to a multi-word expression, and this could end up with a wrong match (a mix of synonyms) at query time.

So a query for dui will be changed by the synonym filter at query time to HIER_FAMILIY_01 or SYN_FAMILY_01, and documents that contain only single words like drunk, driving or law will not be matched, since only a document with the phrase drunk driving law would have been indexed with SYN_FAMILY_01.

The approach worked pretty well on our project and we did not notice any side effects on the searches; it only removes matched documents that were noise caused by the synonym mix issue.

I think it could be useful to add this kind of approach to the Solr synonyms filter section of the wiki.

Cheers
Laurent

On Dec 2, 2007 3:41 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Hi (changing to solr-user list)

Yes it is, especially if the terms left of => are multi-spaced. Check out the Wiki, one page there explains this nicely.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: anuvenk anuvenkat...@hotmail.com
To: solr-...@lucene.apache.org
Sent: Saturday, December 1, 2007 1:21:49 AM
Subject: Re: synonyms

Ideally, would it be a good idea to pass the index data through the synonyms filter while indexing? Also, say I have this mapping

dui => drunk driving defense

or

dui,drunk driving defense,drunk driving law

so will matches for dui also bring up matches for drunk driving law (the whole phrase), or does it also bring up all matches for 'drunk', 'driving', 'law'?

Yonik Seeley wrote:

On Nov 30, 2007 5:39 PM, anuvenk anuvenkat...@hotmail.com wrote:

Should data be re-indexed every time synonyms like word1,word2 or word1 => word2 are added to synonyms.txt?

Yes, if it changes the index (if it's used in the index analyzer as opposed to just the query analyzer).

-Yonik
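The family-identifier approach Laurent describes (the synonym filter with expand=false at both index and query time) could be configured along the following lines. This is a sketch, not his actual configuration: the type name text_family_syn is hypothetical, and the synonyms.txt line shown in the comment reuses the example from his message.

```xml
<schema name="example" version="1.1">
  <types>
    <!-- Hypothetical type name. synonyms.txt holds one "family" per line, e.g.
           SYN_FAMILY_01, dui, drunk driving defense, drunk driving law
         With expand="false", every entry on a line is rewritten to the FIRST
         entry, so the family identifier is placed first. -->
    <fieldType name="text_family_syn" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
```

Because the lowercase filter runs after the synonym filter in both analyzers, the identifier is lowercased the same way on both sides and still matches. One caveat to keep in mind: the query parser splits the query on whitespace before the analyzer sees it, so a multi-word entry may not be recognized at query time unless it arrives as a single quoted phrase. Any change to synonyms.txt still requires re-indexing, as Yonik notes.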
Re: Is there Downside to a huge synonyms file?
I tried adding some city-to-state mappings in the synonyms file. I'm using the dismax handler for phrase matching. As I add more and more city-to-state mappings, I end up with zero results for state-based searches. E.g.:

ca,california,los angeles
ca,california,san diego
ca,california,san francisco
ca,california,burbank

and so on. Now a city-based search returns a few other California results, but a state-based search like dui california returns zero results. I checked the parsedquery_toString and I see no 'OR', although the default operator is 'OR' in the schema. It looks like it's trying to find matches for all those cities, since they are mapped to 'california', and hence returns zero results. How can I force dismax to use 'OR' and not 'AND', even though the schema has 'OR'? Or is this how dismax works? Can someone explain how to overcome this problem? Here is my custom request handler that extends dismax:

<requestHandler name="qfacet" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">name^2.0 text^0.8</str>
    <!-- until 3 all should match; 4 - 3 shld match; 5 - 4 shld match; 6 - 5 shld match; above 6 - 90% match -->
    <str name="mm">3&lt;-1 4&lt;-1 5&lt;-1 6&lt;90%</str>
    <str name="pf">text^0.8 name^2.0</str>
    <int name="qs">4</int>
    <int name="ps">4</int>
    <str name="fl">*,score</str>
  </lst>
  <lst name="invariants">
    <!--
    <str name="facet.field">resourceType</str>
    <str name="facet.field">category</str>
    <str name="facet.field">stateName</str>
    -->
    <str name="facet.sort">false</str>
    <int name="facet.mincount">1</int>
  </lst>
</requestHandler>

Thanks.

Otis Gospodnetic wrote:

Hello,

300K is a pretty small index. I wouldn't worry about the number of synonyms unless you are turning a single term into dozens of ORed terms.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: anuvenk anuvenkat...@hotmail.com
To: solr-user@lucene.apache.org
Sent: Tuesday, June 2, 2009 11:28:43 PM
Subject: Re: Is there Downside to a huge synonyms file?

I'm using query-time synonyms. I have more fields in my index, though; this is just a sample of data from my index. Yes, we don't have millions of documents. It could be around 300,000 and might increase in future. The reason I'm using query-time synonyms is the nature of my data: I can't re-index the data every time I add or remove a synonym. But for this particular requirement, is it best to use index-time synonyms, given the multi-word nature of the synonyms? Again, if I add more cities to the synonym file, I can't be re-indexing all the data over and over again.
Re: Is there Downside to a huge synonyms file?
A small addition to my earlier post. I wonder if it's because of the 'mm' param, which requires that for search phrases of up to 3 words, all the words must match. If I alter this now, I'd get irrelevant results for a lot of popular 1-, 2-, and 3-word search terms. How can I solve this?
Re: Dismax handler phrase matching question
I have to search over multiple fields, so passing everything in the 'q' might not be neat. Can something be done with facet.query to accomplish this? I'm already using the facet parameters. I'm not familiar with Java, so I'm not sure if a function query could be used to accomplish this. Any other thoughts?

Shalin Shekhar Mangar wrote:

On Tue, Jun 2, 2009 at 12:53 AM, anuvenk anuvenkat...@hotmail.com wrote:

title      state
dui faq1   california
dui faq2   florida
dui faq3   federal

Now I want to be able to return federal results irrespective of the state. For example dui california should return all federal results for 'dui' also along with california results.

Perhaps you just need to create your query in such a way that both match?

q=title:(dui california) state:(dui california) state:federal

--
Regards,
Shalin Shekhar Mangar.
Is there Downside to a huge synonyms file?
In my index I have legal FAQs, forms, legal videos, etc., with a state field for each resource. Now if I search for real estate san diego, I want to be able to return other 'california' results, i.e. results from San Francisco. I have the following fields in the index:

title                             state       description...
real estate san diego example 1   california  some description
real estate carlsbad example 2    california  some desc

So when I search for real estate san francisco, since there is no match, I want to return the other real estate results in California instead of returning none, because sometimes users might be searching for a real estate form and the city probably doesn't matter.

I have two things in mind. One is adding a synonym mapping

san diego, california
carlsbad, california
san francisco, california

(which probably isn't the best way), hoping that a search for san francisco real estate would map san francisco to california and hence return the other two California results.

OR

adding the mapping of city to state in the index itself, like:

title                        state       city                                description...
real estate san diego eg 1   california  carlsbad, san francisco, san diego  some description
real estate carlsbad eg 2    california  carlsbad, san francisco, san diego  some description

Which of the above two is better? Does a huge synonym file affect performance? Or is there an even better way? I'm sure there is, but I can't put my finger on it yet. I'm not familiar with Java either.
Re: Is there Downside to a huge synonyms file?
I'm using query-time synonyms. I have more fields in my index, though; this is just a sample of data from my index. Yes, we don't have millions of documents. It could be around 300,000 and might increase in future. The reason I'm using query-time synonyms is the nature of my data: I can't re-index the data every time I add or remove a synonym. But for this particular requirement, is it best to use index-time synonyms, given the multi-word nature of the synonyms? Again, if I add more cities to the synonym file, I can't be re-indexing all the data over and over again.
Dismax handler phrase matching question
Hello,

I'm using the dismax handler for phrase matching. I have a few legal resources in my index in the following format, for example:

title      state
dui faq1   california
dui faq2   florida
dui faq3   federal

Now I want to be able to return federal results irrespective of the state. For example, dui california should return all federal results for 'dui' along with the California results. I was thinking of a synonym mapping for the states like 'state name' = 'federal', i.e.

california,federal
florida,federal
maine,federal

etc. Is there a better way, though?
Re: Question about Query Phrase Slop (qs) in dismax
Somebody please help clear this doubt. What more could I do with the dismax handler so that results in which 'word1', 'word2', 'word3', etc. from the search phrase are not within 5 words of one another do not come up?
Re: Please Help !! Question about Query Phrase Slop (qs) in dismax
Thanks for the response. My current ps setting works great for most search terms. But take this typical example: north dakota 1031 exchange lawyers. We don't have any relevant docs in the index, yet Solr returns an irrelevant doc just because it found 'lawyer', 'exchange', and 'north dakota' somewhere. I thought it would be great if there were a way to simply not return results whose terms are not in close proximity.

Yonik Seeley wrote:

On Sun, Nov 23, 2008 at 11:51 PM, anuvenk [EMAIL PROTECTED] wrote:

Please help someone... I've been waiting for an answer for the last couple of days and no one seems to be helping out here. I searched the wiki and this forum but couldn't find an answer. I know that if ps is set to 5, words within 5 words of one another receive a boost in score. But is there a way to not return results that have the search terms more than 5 words apart?

Not with dismax. I'm not sure why it's a problem, given that with enough boost you should be able to ensure that all of the results with a slop less than 5 appear before other results.

Anyway, if you want to restrict results to those with a slop of 5, use the standard query parser with an explicit sloppy phrase query:

"north dakota 1031 exchange lawyers"~5

-Yonik

Typical example: north dakota 1031 exchange lawyers

My first result is absolutely irrelevant. It returned a North Dakota doc, but one that had an occurrence of attorney somewhere and an occurrence of exchange (not related to 1031 exchange, though). They were not within 5 words of one another. My guys have been hammering me regarding this relevancy issue. Please help someone.
Question about Query Phrase Slop (qs) in dismax
From the Solr wiki, it sounded like if qs is set to 5, for example, and the search term is 'child custody', only docs with 'child' and 'custody' within 5 words of one another would be returned in the results. Is this correct? If so, it doesn't seem to be working for me. I see docs with 'child' and 'custody' more than 5 words apart (excluding stop words), which results in a bad user experience, as those docs are not so relevant. What more could I do to improve the quality of the results?
Re: Question about dismax 'mm' - give boost to searches by location
Since I didn't receive any response, I think my question wasn't very clear. If the phrase has 4 words ('last will and testament florida'; 'and' will be removed by the stop word filter), right now Solr matches docs with at least 3 out of those 4 words. So what's happening is that last will and testament results from all states are returned, although the user specifically asked for a Florida will. I don't want to alter the 'mm' either, because it's working fine for other searches. Just for the search terms with a 'location', I want to be able to match all words. Is there any easy way to do this? Someone please?
Question about dismax 'mm' - give boost to searches by location
I use the 'dismax handler' for my phrase matching, and I have the 'mm' set this way:

Up to 3 words, match all
up to 4, match 3
up to 5, match 4
and so on

It's been working fine, but for certain phrases like 'san diego drunk driving defense attorney', it brings up DUI attorneys for other cities first, because the way I've set the 'mm', it's trying to match docs with any 4 words from the search phrase. In such cases, how do I make Solr return the San Diego listing first? I don't want to make phrase matching stricter either (i.e. I don't want to change the current 'mm' configuration). Is there any way to solve this?
solr sorting question
Question about sorting with Solr. I want to group results in a certain sort order so I can split them and display them in tabs easily. I want to be able to have a custom sort order instead of

sort=cat asc, score desc

With the sort above, categories are grouped in ascending order. But I want certain categories to come up first in the sort order; I don't want them to be grouped in ascending order. Could anyone please shed some light on how to do it? Is it possible?
Re: spellcheckhandler
Thanks a lot for clearing my doubts. Would you know if the solr wiki is up to date with the documentation for the new features that are being added? I totally rely on the solr wiki documentation for my project. If you may, please send me the files you had mentioned and i'll be happy to test them. I appreciate your help !! scott.tabar wrote: Anuvenk, Sorry for this Third email, but I was reading your question below and I think it warrants yet another reply. Just some background from my focus and involvement, and hence the generation of the JavaDocs. I was primarily interested in having a Solr based spell checker that behaved more like a traditional spell checker. In my application, when I generated the input in to Solr for inclusion of the spell checker indexer, I was only interested in single words and not multi-word sets. My intentions was to send multiple words to the handler and have it return details on each word as it stands independently when the parameter multiWords was set, otherwise it was to use all input words as a single check against the handler. As such, in my original efforts, I had no multiple words in a single term, as you were asking below. That is not to say it is not possible, but I just wanted to let you know the original focus of my work. I did look a little closer at the JavaDocs and it looks like they have been updated from what I originally generated. So perhaps they may be up to date? One thing I would like to point out, is that I put some efforts in creating a test case for the SpellCheckerRequestHandler. If it still exists (I have not checked the head for a long time) then it would be a good starting point to do some simple testing with limited data sets of your own. Just make a copy of it, and then feed in multi-word terms and see how it responds do the different settings. 
This will also allow you to play around with the configuration settings in the schema and solrconfig files without impacting your actual Solr instance and the turn around time could be in the seconds and not minutes with each alteration of a new test. The locations in svn and file names of the unit tests that I created were: /test/test-files/solr/conf/schema-spellchecker.xml /test/test-files/solr/conf/solrconfig-spellchecker.xml /test/org/apache/solr/handler/SpellCheckerRequestHandlerTest.java If these do not existing in svn currently, let me know and I can pass along the contents and you can recreate them locally to test with. Best of luck, Scott Tabar anuvenk [EMAIL PROTECTED] wrote: Thanks. But i'm looking at this http://.../spellchecker?indent=ononlyMorePopular=trueaccuracy=.6suggestionCount=20q=facial+salophosphoprotein on http://lucene.apache.org/solr/api/org/apache/solr/handler/SpellCheckerRequestHandler.html It seems to return results (well in the example) with and without extendedResults=true does it mean that 'facial salophosphoprotein' was a single term in the index. hossman wrote: : : I did try with the latest nightly build and followed the steps outlined in : http://wiki.apache.org/solr/SpellCheckerRequestHandler : with regards to creating new catchall field 'spell' of type 'spell' and : copied my text fields to 'spell' at index time. : Still q=grapics returns 'graphics' : but q=grapics card returns nothing. : But the same queries return the correct spelling with string fieldtypes. : Any fix available? I don't think Otis was suggesting any specific fix was available in the nightly builds, i believe he was just addressing specificly that if there was a bug someone commited a fix for you didnt' need to wait for 1.3 -- you can test it now using the nightly builds. That said: I don't see any currently open or recent resolved bugs related to spellchecking and multiple words ... 
I believe (but I'm not 100% positive) that multi-word spell correction will work, as long as your dictionary contains those multiple words as individual terms. That is, if you want "graphics card" to be a suggestion for "grapics card", then you need to use a termSourceField in which "graphics card" is a single term (either because it is untokenized, or maybe because you use a word-based ngram tokenfilter, etc...).

Alternately, if you want to get "graphics asdfghjk" as a suggestion for "grapics asdfghjk" (even though "asdfghjk" isn't in your index at all), hitting the spell correction handler for each input word individually is probably your best bet.

: You don't need to wait for 1.3 to be released - you can simply use a
: recent nightly build.

-Hoss

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p15100704.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p15115105.html
Sent from the Solr - User mailing list archive
Re: spellcheckhandler
Thanks. But I'm looking at this example

  http://.../spellchecker?indent=on&onlyMorePopular=true&accuracy=.6&suggestionCount=20&q=facial+salophosphoprotein

on http://lucene.apache.org/solr/api/org/apache/solr/handler/SpellCheckerRequestHandler.html

It seems to return results (well, in the example) with and without extendedResults=true. Does that mean 'facial salophosphoprotein' was a single term in the index?

hossman wrote:

: I did try with the latest nightly build and followed the steps outlined in
: http://wiki.apache.org/solr/SpellCheckerRequestHandler
: with regards to creating new catchall field 'spell' of type 'spell' and
: copied my text fields to 'spell' at index time.
: Still q=grapics returns 'graphics'
: but q=grapics card returns nothing.
: But the same queries return the correct spelling with string fieldtypes.
: Any fix available?

I don't think Otis was suggesting any specific fix was available in the nightly builds; I believe he was just pointing out that if someone had committed a fix for a bug, you didn't need to wait for 1.3 -- you can test it now using the nightly builds.

That said: I don't see any currently open or recently resolved bugs related to spellchecking and multiple words ...

I believe (but I'm not 100% positive) that multi-word spell correction will work, as long as your dictionary contains those multiple words as individual terms. That is, if you want "graphics card" to be a suggestion for "grapics card", then you need to use a termSourceField in which "graphics card" is a single term (either because it is untokenized, or maybe because you use a word-based ngram tokenfilter, etc...).

Alternately, if you want to get "graphics asdfghjk" as a suggestion for "grapics asdfghjk" (even though "asdfghjk" isn't in your index at all), hitting the spell correction handler for each input word individually is probably your best bet.

: You don't need to wait for 1.3 to be released - you can simply use a
: recent nightly build.

-Hoss

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p15100704.html
Sent from the Solr - User mailing list archive at Nabble.com.
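Hoss's last suggestion, checking each input word on its own, might look roughly like the sketch below. It is only an illustration: `suggest_word` and `TOY_SUGGESTIONS` are made-up names standing in for a single-word request to the spell check handler, so that the example runs without a Solr instance.

```python
from typing import Callable, Optional

# Hypothetical stand-in for one request to the spell check handler with a
# single word in q. A real client would call Solr here; this toy lookup
# exists only so the sketch runs on its own.
TOY_SUGGESTIONS = {"grapics": "graphics"}


def suggest_word(word: str) -> Optional[str]:
    """Return a suggested spelling for one word, or None if there is none."""
    return TOY_SUGGESTIONS.get(word.lower())


def correct_query(
    query: str,
    suggest: Callable[[str], Optional[str]] = suggest_word,
) -> str:
    """Spell-check each whitespace-separated word on its own and rejoin.

    Words with no suggestion are kept as typed, so a word that is not in
    the index at all (such as 'asdfghjk') passes through unchanged.
    """
    corrected = []
    for word in query.split():
        suggestion = suggest(word)
        corrected.append(suggestion if suggestion else word)
    return " ".join(corrected)


print(correct_query("grapics card"))
print(correct_query("grapics asdfghjk"))
```

The trade-off is one handler request per word, and each word is corrected without looking at its neighbours.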
Re: Spell Check Handler
I followed your instructions exactly, but I still have trouble with multi-word queries. For example, q=grapics returns 'graphics' but q=grapics card returns nothing. I even tried the latest nightly build, but that didn't solve the problem. Is any solution available?

scott.tabar wrote:

Matthew,

Thanks for the question. The answer is that they come from your own indexes, so the dictionary is based upon the actual words that are already stored in Solr. This makes sense: if the spell checker suggests a word that is not in the Solr index, it will not help the user find what they are looking for. You can control which fields in Solr feed the spell checker. You can also have more than one spell checker, each focused on a specific subject.

The following example of a SpellCheckerRequestHandler is based upon the one I created for the test case. You need to add this to your solrconfig.xml file. You can view the whole thing within the Solr source code once it is committed into the main stream. The path is /src/test/test-files/solr/conf/solrconfig-spellchecker.xml, with schema-spellchecker.xml in the same directory.

  <!-- SpellCheckerRequestHandler takes in a word (or several words) as the
       value of the "q" parameter and returns a list of alternative spelling
       suggestions. If invoked with a ...cmd=rebuild, it will rebuild the
       spellchecker index. -->
  <requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler" startup="lazy">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <int name="suggestionCount">20</int>
      <float name="accuracy">0.60</float>
    </lst>

    <!-- Main init params for handler -->

    <!-- The directory where your SpellChecker Index should live. -->
    <!-- May be absolute, or relative to the Solr "dataDir" directory. -->
    <!-- If this option is not specified, a RAM directory will be used -->
    <str name="spellcheckerIndexDir">spell</str>

    <!-- the field in your schema that you want to be able to build -->
    <!-- your spell index on. This should be a field that uses a very -->
    <!-- simple FieldType without a lot of Analysis (ie: string) -->
    <str name="termSourceField">spell</str>
  </requestHandler>

Some comments:

- The termSourceField should be a field you have defined within your Solr schema file. See the notes below about the use of this field.
- The spellcheckerIndexDir is the name of the directory that contains the spellchecker indexes. In my example I used "spell", and it will be at the same level as data and conf. You can name it whatever you like.
- If you use the name "/spellchecker", the url will be more RESTful.
- If you need to have more than one spell checker in use at a time, then you will need to change the name, spellcheckerIndexDir, and termSourceField.
- If you have more than one spell checker hitting the same index directory, then when you rebuild the index through one of the handlers the other handlers will not know it has been reindexed. To resolve this issue, you may have to restart Solr.

The following components are from the schema-spellchecker.xml file:

  <fieldType name="spellText" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="spell" type="spellText" indexed="true" stored="true"/>

Some comments on the schema items above:

- The fieldType must be contained within the types section.
- The spellText fieldType can be named whatever you want.
- The spellText fieldType should not be too aggressive about stemming or modifying the contents of the field.
- You could use string instead of the defined fieldType of spellText, but it does not have to be that restrictive.
- The field spell needs to be within the fields group with your other defined fields.
- You can always use copyField to copy another field's content into your spell field:

  <copyField source="misc" dest="spell"/>

Some notes on the name of the handler:

- If you precede the name with "/" you can use the following url instead of the second one:
- using the name of /spellchecker
  http://yourSolrSite/solr/spellchecker?q=sialophosphoprotein
- using the name of
Re: Is it possible to add synonyms run time?
Here is what it means by injecting at query time. This is the text field definition I have in my schema:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <!--<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

and a catchall field of type 'text':

  <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

I use copyField to copy the important fields that need to be searched during phrase matching into the 'text' field at index time, and the default search field in my case is this catchall field:

  <defaultSearchField>text</defaultSearchField>

You can see that I've commented out the synonym filter at index time but included it at query time. Just add your synonyms to the synonyms.txt file and they'll be taken into account at query time.
You can check that in the parsedquery_toString using your admin tool (localhost:8983/solr/admin/form.jsp).

As has been discussed here, injecting at index time helps find more matches, except that every time you add synonyms you'll have to re-index your data. That wasn't ideal in my case because my index is huge, so I had to do it only at query time, but that sometimes poses problems. I have a couple of unanswered questions; if you know the answers, please help me.

http://www.nabble.com/Index-time-synonyms-td15073889.html
http://www.nabble.com/solr-synonyms-behaviour-td15051211.html

Ravish Bhagdev wrote:

I see, thanks a lot for this, it makes things clear now. Just to make sure I understand this bit: by injecting synonyms at query time, do you basically mean adding terms implicitly to the keywords behind the scenes before passing them to Solr? Or is there a more conventional method or interface that is being suggested?

Thanks for all the help!

Ravish

On Jan 25, 2008 3:59 PM, Erick Erickson [EMAIL PROTECTED] wrote:

To me, it's really a question of where the work should be done given your problem space. Injecting synonyms at index time allows the queries to be simpler/faster. Injecting the synonyms at query time gets complex but is more flexible.

As always, it's a time/space tradeoff. If you're willing to pay the space penalty for increased query speed, inject at index time. Otherwise you can inject at query time. And the query-time injection performance hit may not be trivial. Consider, for instance, span queries. Do you want to pay the price at query time for, say, a BooleanQuery that is composed of 5 SpanQueries where each term in each SpanQuery consists of several OR clauses because of synonym injection? Perhaps you do and perhaps you don't. It all depends upon what your data looks like and what your performance criteria are.

And you can do other tricks. Consider, rather than indexing all the terms, only indexing the canonical term.
That is, consider "hit" and the synonyms "strike", "popular", "punch". You could index "hit" for any of the 4 terms, then do the same substitution for your query. Which would make your index smaller *and* your queries faster.

But you're right. Injecting synonyms at index time really requires a fixed synonym list that doesn't vary by user. So if you want synonym lists on a per-user basis, you're probably going to have to inject synonyms at query time.

Best
Erick

On Jan 25, 2008 9:46 AM, Ravish Bhagdev [EMAIL PROTECTED] wrote:

Yes, I'm fairly new as well. So do you mean adding words to the query, effectively doing an OR between synonymous terms? That sounds like a simple way of doing it. If this works, what makes indexing with synonyms useful?

Ravish

On Jan 25, 2008 2:42 PM, Jon Lehto [EMAIL PROTECTED] wrote:

Hi Ravish, You may want to think about the synonym dictionary as being a tool on the
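The canonical-term trick Erick describes (index "hit" for any of the four terms and apply the same substitution to the query) can be illustrated with a toy inverted index. This is a minimal sketch in plain Python, not Solr code; the `CANONICAL` table, the helper functions and the sample documents are invented for the example.

```python
# Illustrative synonym group taken from the example in the message above:
# "hit" is the canonical term; the other three are mapped onto it.
CANONICAL = {
    "strike": "hit",
    "popular": "hit",
    "punch": "hit",
}


def canonicalize(tokens):
    """Lowercase each token and replace known synonyms with the canonical term."""
    return [CANONICAL.get(t.lower(), t.lower()) for t in tokens]


def build_index(docs):
    """Build a toy inverted index: canonical term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in canonicalize(text.split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every canonicalized query term."""
    terms = canonicalize(query.split())
    if not terms:
        return set()
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings)


docs = {1: "a popular song", 2: "punch the clock", 3: "a quiet evening"}
index = build_index(docs)
print(sorted(search(index, "hit")))
print(sorted(search(index, "strike")))
```

Because only the canonical term is stored, the index holds one posting list per synonym group, and the query needs no OR clauses. The cost is the one Erick mentions: the synonym list has to be fixed at index time.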
Index time synonyms
I have a hard time understanding the synonyms behaviour, especially because I don't have the synonym filter at index time.

If I have this synonym at index time

  Alternative Sentence,Probation before Judgement,Pretrial Diversion

do all occurrences of 'alternative sentence' also get indexed as 'probation judgement' and 'pretrial diversion'? Or does it do this weird grouping

  (alternative probation pretrial)(sentence diversion)judgement

so that all occurrences of 'alternative' will be indexed as 'sentence' and 'diversion'? Then what about the word 'judgement'?

Please, someone help me understand this. I have another question related to synonyms posted here: http://www.nabble.com/solr-synonyms-behaviour-td15051211.html. Please help with that too.

--
View this message in context: http://www.nabble.com/Index-time-synonyms-tp15073889p15073889.html
Sent from the Solr - User mailing list archive at Nabble.com.
solr synonyms behaviour
I need to understand this synonym behaviour. I have this synonym:

  divorce mediation,alternative dispute resolution

When I run a debug query, this is the parsedquery_toString I see:

  (((text:divorc^0.8 | name:divorc^2.0)~0.01 (text:mediat^0.8 | name:mediat^2.0)~0.01)~2) (text:(divorc altern) (disput mediat) resolut~5^0.8 | name:(divorc altern) (disput mediat) resolut~5^2.0)~0.01

I understand how it's grouping the synonyms like this:

  (divorc altern) (disput mediat) resolut

What I don't understand is how it does the matching. Does it mean it will find all matches with either of the words (divorc altern), either of the words (disput mediat), and/or resolut?

I have the synonym filter only at query time because I can't re-index the data (or a portion of it) every time I add a synonym, and for a couple of other reasons. Could someone please explain how the matching works in this case? Thanks.

--
View this message in context: http://www.nabble.com/solr-synonyms-behaviour-tp15051211p15051211.html
Sent from the Solr - User mailing list archive at Nabble.com.
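The grouping in the parsed query above is what you would get if the words of each synonym phrase were stacked by token position. The sketch below models only that reading. It ignores stemming, stop words and boosts, `overlay_positions` is a made-up helper, and none of it is Solr's actual synonym filter code.

```python
from itertools import zip_longest


def overlay_positions(phrases):
    """Stack the words of each synonym phrase by token position.

    The first words of all phrases share position 0, the second words share
    position 1, and so on. Shorter phrases contribute nothing to the later
    positions.
    """
    split = [p.split() for p in phrases]
    return [
        [w for w in column if w is not None]
        for column in zip_longest(*split)
    ]


positions = overlay_positions(
    ["divorce mediation", "alternative dispute resolution"]
)
for i, words in enumerate(positions):
    print(i, words)
```

Under this reading the three positions hold (divorce, alternative), (mediation, dispute) and (resolution), which lines up with "(divorc altern) (disput mediat) resolut" once the words are stemmed. If each position accepts any one of its words, a phrase such as "divorce dispute resolution" would also satisfy the phrase clause, which may explain some unexpected hits.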
Re: spellcheckhandler
I did try with the latest nightly build and followed the steps outlined in http://wiki.apache.org/solr/SpellCheckerRequestHandler with regards to creating a new catchall field 'spell' of type 'spell', and copied my text fields to 'spell' at index time. Still q=grapics returns 'graphics' but q=grapics card returns nothing. The same queries return the correct spelling with string fieldtypes. Any fix available?

Otis Gospodnetic wrote:

You don't need to wait for 1.3 to be released - you can simply use a recent nightly build.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: anuvenk [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, January 21, 2008 12:35:52 AM
Subject: Re: spellcheckhandler

I followed the steps outlined in http://wiki.apache.org/solr/SpellCheckerRequestHandler with regards to setting up the schema with a new field 'spell' and copying other fields to this 'spell' field at index time. It works fine with single-word queries but doesn't return anything for multi-word queries. I read previous posts where this has been discussed, and I read that some of the active members are in the process of releasing patches that fix this problem. I'm actually trying to implement this spell check in the production setup. Is it absolutely not possible to get spell check results back for multi-word queries? Should I wait for the 1.3 release? If there is any other option, please educate me. In case a patch was already released, how do I add it to the current 1.2 version that I'm using?

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p15051336.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: spellcheckhandler
I did try with the latest nightly build. The problem still exists. I tested with the example data that comes with the Solr package.

1) With termSourceField set to 'word', which is a string fieldtype, q=iped nano returns 'ipod nano', which is good.

2) With termSourceField set to 'spell' (the catchall field of 'spell' fieldtype, set up according to the tutorial at http://wiki.apache.org/solr/SpellCheckerRequestHandler, with my text fields copied into it at index time), q=grapics returns 'graphics' but q=grapics card returns nothing.

Not sure if I'm missing something. Please help!

Otis Gospodnetic wrote:

You don't need to wait for 1.3 to be released - you can simply use a recent nightly build.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: anuvenk [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, January 21, 2008 12:35:52 AM
Subject: Re: spellcheckhandler

I followed the steps outlined in http://wiki.apache.org/solr/SpellCheckerRequestHandler with regards to setting up the schema with a new field 'spell' and copying other fields to this 'spell' field at index time. It works fine with single-word queries but doesn't return anything for multi-word queries. I read previous posts where this has been discussed, and I read that some of the active members are in the process of releasing patches that fix this problem. I'm actually trying to implement this spell check in the production setup. Is it absolutely not possible to get spell check results back for multi-word queries? Should I wait for the 1.3 release? If there is any other option, please educate me. In case a patch was already released, how do I add it to the current 1.2 version that I'm using?

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p15025889.html
Sent from the Solr - User mailing list archive at Nabble.com.
solr 1.3
When will this be released? Where can I find the list of improvements/enhancements in 1.3, if it's been documented already?

--
View this message in context: http://www.nabble.com/solr-1.3-tp14989395p14989395.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 1.3
Thanks. Would this be the latest code from the trunk that you mentioned?
http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip

climbingrose wrote:

I don't think they (the Solr developers) have a time frame for the 1.3 release. However, I've been using the latest code from the trunk and I can tell you it's quite stable. The only problem is that the documentation sometimes doesn't cover the latest changes in the code. You'll probably have to dig into the code itself or post a question here, and many people will be happy to help you.

On Jan 21, 2008 12:07 PM, anuvenk [EMAIL PROTECTED] wrote:

when will this be released? where can i find the list of improvements/enhancements in 1.3 if its been documented already?

--
View this message in context: http://www.nabble.com/solr-1.3-tp14989395p14989395.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Cuong Hoang

--
View this message in context: http://www.nabble.com/solr-1.3-tp14989395p14989689.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr 1.3
Could you please let me know the location from where I can get it?

climbingrose wrote:

I'm using code pulled directly from Subversion.

On Jan 21, 2008 12:34 PM, anuvenk [EMAIL PROTECTED] wrote:

Thanks. Would this be the latest code from the trunk that you mentioned?
http://people.apache.org/builds/lucene/solr/nightly/solr-2008-01-19.zip

climbingrose wrote:

I don't think they (the Solr developers) have a time frame for the 1.3 release. However, I've been using the latest code from the trunk and I can tell you it's quite stable. The only problem is that the documentation sometimes doesn't cover the latest changes in the code. You'll probably have to dig into the code itself or post a question here, and many people will be happy to help you.

On Jan 21, 2008 12:07 PM, anuvenk [EMAIL PROTECTED] wrote:

when will this be released? where can i find the list of improvements/enhancements in 1.3 if its been documented already?

--
View this message in context: http://www.nabble.com/solr-1.3-tp14989395p14989395.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Cuong Hoang

--
View this message in context: http://www.nabble.com/solr-1.3-tp14989395p14989689.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Regards,
Cuong Hoang

--
View this message in context: http://www.nabble.com/solr-1.3-tp14989395p14989802.html
Sent from the Solr - User mailing list archive at Nabble.com.
Term vector
What are term vectors? How do they help with mlt?

--
View this message in context: http://www.nabble.com/Term-vector-tp14990408p14990408.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Update the index
http://wiki.apache.org/solr/UpdateXmlMessages

Is this what you are looking for? Index the document again and it should overwrite the older one with the same id.

Gavin-39 wrote:

Hi,

Can someone point me to a location that describes how to update an already indexed document? I was thinking there is an update tag explained somewhere, but I can't find it.

Thanks,
--
Gavin Selvaratnam, Project Leader
hSenid Mobile Solutions
Phone: +94-11-2446623/4
Fax: +94-11-2307579
Web: http://www.hSenidMobile.com
Make it happen

Disclaimer: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to which they are addressed. The content and opinions contained in this email are not necessarily those of hSenid Software International. If you have received this email in error please contact the sender.

--
View this message in context: http://www.nabble.com/Update-the-index-tp14991443p14991551.html
Sent from the Solr - User mailing list archive at Nabble.com.
spell check component
Is it possible to add a spell check component so I don't have to issue a separate request to Solr to do the spell checking? Sorry if this question is naive; I'm just learning to use Solr. Something like

  <searchComponent name="spellcheck" class="org.apache.solr.handler.component.spellcheckComponent"/>

and then add it to the search handler like this:

  <arr name="spellcheck-components">
    <str>spellcheck</str>
  </arr>

What would the name of the spell check component be?

--
View this message in context: http://www.nabble.com/spell-check-component-tp14973651p14973651.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: spellcheckhandler
I was going to do this: create a new field (the termSourceField) called 'spell'

  <field name="spell" type="spell" indexed="true" stored="false" multiValued="true"/>

of type 'spell'

  <fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

and copy my 'name' and 'body' fields to this 'spell' field at index time:

  <copyField source="name" dest="spell"/>
  <copyField source="body" dest="spell"/>

But like you mentioned, the tutorial says we have to use it on a field that's not tokenized. So how do I use my tokenized fields 'body' and 'name' to build my spell index? How do I use it effectively for spell checking on multi-word queries?

anuvenk wrote:

Is it possible to implement something like this with the spellcheckhandler? Like how Google does it: say I search for 'chater 13 bakrupcy', it should be able to display "did you search for 'chapter 13 bankruptcy'". Has someone been able to do this?

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p14977717.html
Sent from the Solr - User mailing list archive at Nabble.com.
phrase slop param in dismax handler
How does adding a phrase slop in the handler help? I tried ps=25 along with some pf values. I assumed it means the following: for a search term such as 'child custody battle', documents which have the words 'child', 'custody', 'battle' within 25 words of one another will rank high. Is that correct?

--
View this message in context: http://www.nabble.com/phrase-slop-param-in-dismax-handler-tp14631171p14631171.html
Sent from the Solr - User mailing list archive at Nabble.com.
what are tf,idf,fieldNorm,queryNorm.?
I understand tf means term frequency. For example, if the search term is 'chapter 7', does tf mean how frequently 'chapter 7' occurs in the docs? Does it take into account the total number of words in a doc to determine the frequency? Also, what are idf, fieldNorm and queryNorm? I'm trying to understand how Solr calculates the score.

--
View this message in context: http://www.nabble.com/what-are-tf%2Cidf%2CfieldNorm%2CqueryNorm.--tp14639048p14639048.html
Sent from the Solr - User mailing list archive at Nabble.com.
parsedquery_ToString
Is the parsedquery_toString the query that is passed to Solr after all the tokenizing and analyzing? For the search term 'chapter 7' I have this parsedquery_toString:

  <str name="parsedquery_toString">
    +(text:(bankruptci chap 7) (7 chapter chap) 7 bankruptci^0.8 | ((name:bankruptci name:chap)^2.0))~0.01 (text:(bankruptci chap 7) (7 chapter chap) 7 bankruptci~50^0.8 | ((name:bankruptci name:chap)^2.0))~0.01
  </str>

I have these synonyms:

  chap 7 => bankruptcy
  chapter => bankruptcy
  chap => chapter
  chapter 7 => bankruptcy
  bankrupcy => bankruptcy
  chap,7,chap7,chapter 7,chapter 7 bankruptcy,chap 7

But I have a little trouble understanding how it's building this parsedquery_toString. Can someone explain? If I can understand this, I'll be able to debug better, analyze why I don't get the expected results for some of the search terms, and work out what change I could make to the associated synonyms.

--
View this message in context: http://www.nabble.com/parsedquery_ToString-tp14627131p14627131.html
Sent from the Solr - User mailing list archive at Nabble.com.
spellcheckhandler
Is it possible to implement something like this with the spellcheckhandler? Like how Google does it: say I search for 'chater 13 bakrupcy', it should be able to display "did you search for 'chapter 13 bankruptcy'". Has someone been able to do this?

--
View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p14627712.html
Sent from the Solr - User mailing list archive at Nabble.com.
solr results debugging
I've been using the Solr admin form with debug=true to do some in-depth analysis on some results. Could someone explain how to make sense of this? This is the debugging info for the first result I got.

  10.201284 = (MATCH) sum of:
    6.2467875 = (MATCH) max plus 0.01 times others of:
      6.236769 = (MATCH) weight(text:(probat trust live inherit) testament^0.8 in 48784), product of:
        0.7070911 = queryWeight(text:(probat trust live inherit) testament^0.8), product of:
          0.8 = boost
          18.032305 = idf(text:(probat trust live inherit) testament^0.8)
          0.049015578 = queryNorm
        8.820319 = (MATCH) fieldWeight(text:(probat trust live inherit) testament^0.8 in 48784), product of:
          2.236068 = tf(phraseFreq=5.0)
          18.032305 = idf(text:(probat trust live inherit) .

and it continues some more..

search query: will

synonyms that I have:

  will, living will, last will and testament, living trust, inheritance,probate

here is my request handler (a portion of it):

  <str name="echoParams">explicit</str>
  <float name="tie">0.01</float>
  <str name="qf">text^0.8 name^2.0</str>
  <!-- until 3 all should match;4 - 3 shld match; 5 - 4 shld match; 6 - 5 shld match; above 6 - 90% match -->
  <str name="mm">3&lt;-1 4&lt;-1 5&lt;-1 6&lt;90%</str>
  <str name="pf"> text^0.8 name^2.0 </str>
  <int name="ps">50</int>

--
View this message in context: http://www.nabble.com/solr-results-debugging-tp14628463p14628463.html
Sent from the Solr - User mailing list archive at Nabble.com.
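One way to make sense of the debug output above is to redo its arithmetic by hand. The sketch below copies the numbers from the message and checks the relationships the output itself states ("product of", "max plus 0.01 times others"). The last step is an assumption: the fieldNorm line is cut off in the message, so the value is inferred from the usual Lucene explain layout, where fieldWeight = tf * idf * fieldNorm.

```python
import math

# Numbers copied from the debug output in the message above.
boost = 0.8
idf = 18.032305
query_norm = 0.049015578
field_weight = 8.820319
phrase_freq = 5.0
tie = 0.01
best_clause = 6.236769
dismax_total = 6.2467875

# queryWeight is shown as "product of" boost, idf and queryNorm.
query_weight = boost * idf * query_norm

# weight(...) is shown as "product of" queryWeight and fieldWeight.
clause_score = query_weight * field_weight

# tf(phraseFreq=5.0) matches the square root of the phrase frequency.
tf = math.sqrt(phrase_freq)

# "max plus 0.01 times others": total = max + tie * others, solved for others.
others = (dismax_total - best_clause) / tie

# Assumption: fieldWeight = tf * idf * fieldNorm (that line is truncated in
# the message), so the implied fieldNorm can be backed out.
implied_field_norm = field_weight / (tf * idf)

print("queryWeight       ", round(query_weight, 6))
print("clause score      ", round(clause_score, 5))
print("tf                ", round(tf, 6))
print("sum of others     ", round(others, 4))
print("implied fieldNorm ", round(implied_field_norm, 5))
```

Read this way, the 6.2467875 line is the best-scoring field for the query (the text field) plus 1% of whatever the other field contributed, which is the effect of tie=0.01 in the handler configuration.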
solr word delimiter
I have the word delimiter filter factory in the text field definition at both index and query time. But it has some negative effects on search terms like

  h1-b visa

It splits h1-b into three tokens: h, 1, b. If I understand right, Solr then looks for matches for 'h', '1' and 'b' separately because they are three different tokens. This gives some undesired results: docs that have 'h' somewhere, '1' somewhere and 'b' somewhere. How can I solve this problem? I tried adding a synonym like

  h1-b => h1b visa

It does filter out some results, but I'm trying to find a global solution rather than adding synonyms for all kinds of immigration forms like i-94, k-1 etc.

--
View this message in context: http://www.nabble.com/solr-word-delimiter-tp14630435p14630435.html
Sent from the Solr - User mailing list archive at Nabble.com.
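The effect described above can be reproduced with a toy model. The sketch below is a rough stand-in for the splitting (break on punctuation and on letter/digit boundaries), not the real WordDelimiterFilter logic. All function names and sample documents are invented for the illustration. It contrasts matching the parts anywhere in a document with requiring them to sit next to each other.

```python
import re


def split_like_word_delimiter(token):
    """Very rough stand-in for the splitting described in the message above:
    break on non-alphanumeric characters and on letter/digit transitions."""
    return re.findall(r"[a-z]+|[0-9]+", token.lower())


def tokenize(text):
    """Split on whitespace, then split each word into its parts."""
    tokens = []
    for word in text.split():
        tokens.extend(split_like_word_delimiter(word))
    return tokens


def matches_any_order(doc_tokens, parts):
    """True if every part occurs somewhere in the document (no adjacency)."""
    return all(p in doc_tokens for p in parts)


def matches_adjacent(doc_tokens, parts):
    """True if the parts occur next to each other, in order."""
    n = len(parts)
    return any(
        doc_tokens[i:i + n] == parts
        for i in range(len(doc_tokens) - n + 1)
    )


query_parts = split_like_word_delimiter("h1-b")
relevant = tokenize("apply for an h1-b visa")
unrelated = tokenize("section h lists 1 form and schedule b")

print(query_parts)
print(matches_any_order(relevant, query_parts), matches_adjacent(relevant, query_parts))
print(matches_any_order(unrelated, query_parts), matches_adjacent(unrelated, query_parts))
```

In this model the unrelated document only matches when the three parts are allowed to appear anywhere. So a useful thing to check in the debug output is whether the parts of h1-b end up as independent terms or as a phrase.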