Re: spellchecking multiple fields?
One way would be to create a copyField containing both the fields and use it as the dictionary's source. If you do want to keep separate dictionaries for both fields then I guess we can introduce per-dictionary overridable parameters, like the per-field overridden facet parameters. That would be cleaner than JSON params. What do you think?

On Wed, Jul 16, 2008 at 6:26 AM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
> I have a use case where I want to spellcheck the input query across
> multiple fields:
> Did you mean: location = washington
> vs
> Did you mean: person = washington
>
> The current parameter / response structure for the spellcheck component
> does not support this kind of thing. Any thoughts on how/if the component
> should handle this? Perhaps it could be in a requestHandler where the
> params are passed in as json?
>
> spelling={ dictionary="location", onlyMorePopular=true }&spelling={ dictionary="person", onlyMorePopular=false }
>
> Thoughts?
> ryan

--
Regards, Shalin Shekhar Mangar.
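The copyField approach Shalin suggests can be sketched in schema.xml roughly as follows. This is an illustrative sketch, not a config from the thread; the field name "spell" and type "textSpell" are hypothetical:

```xml
<!-- A catch-all field fed by both source fields. It only feeds the
     spellcheck dictionary, so it need not be stored. Names are hypothetical. -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>

<copyField source="location" dest="spell"/>
<copyField source="person" dest="spell"/>
```

The spellcheck component's dictionary would then be built from the combined `spell` field, at the cost of no longer knowing which source field a suggestion came from.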
spellchecking multiple fields?
I have a use case where I want to spellcheck the input query across multiple fields:

Did you mean: location = washington
vs
Did you mean: person = washington

The current parameter / response structure for the spellcheck component does not support this kind of thing. Any thoughts on how/if the component should handle this? Perhaps it could be in a requestHandler where the params are passed in as json?

spelling={ dictionary="location", onlyMorePopular=true }&spelling={ dictionary="person", onlyMorePopular=false }

Thoughts?
ryan
Re: Slow deleteById request
Hi,

I think the reason was indeed maxPendingDeletes, which was configured to 1000. After updating to a Solr nightly build with Lucene 2.4, the issue seems to have disappeared. Thanks for your advice.

--
Renaud Delbru

Mike Klaas wrote:

On 1-Jul-08, at 10:44 PM, Chris Hostetter wrote:
> : Yes, updating to a newer version of nightly Solr build could solve the
> : problem, but I am a little afraid to do it since solr-trunk has switched to
> : lucene 2.4-dev.
>
> but did you check whether or not you have maxPendingDeletes configured as
> yonik asked?
>
> That would explain exactly what you are seeing ... after a certain number
> of deletes have passed, the next one would automatically force a commit (and
> a newSearcher) and (I believe) subsequent deletes would block until the
> commit is done ... which sounds like exactly what you describe.

It shouldn't cause a commit, just a flushing of deletes. However, deletes count toward both maxDocs and maxTime for autocommit purposes, so that is the likely explanation.

-Mike
Re: solr synonyms behaviour
Yonik Seeley wrote:
> On Tue, Jul 15, 2008 at 2:27 PM, swarag <[EMAIL PROTECTED]> wrote:
>> To my understanding, this means I am using synonyms at index time and NOT
>> query time. And yet, I am still having these problems with synonyms.
>
> Can you give a specific example? Use debugQuery=true to see what the
> resulting query is. You can also use the admin analysis page to see what
> the output of the index and query analyzers is.
>
> -Yonik

So it sounds like using the '=>' operator for synonyms that may or may not contain multiple words causes problems. So I changed my synonyms.txt to the following:

club,bar,night cabaret

In schema.xml, I now have the following:

As you can see, 'night cabaret' is my only multi-word synonym term. Searches for 'bar' and 'club' now behave as expected. However, if I search for JUST 'night' or JUST 'cabaret', it looks like it is still using the synonyms 'bar' and 'club', which is not what is desired. I only want 'bar' and 'club' to be returned if a search for the complete 'night cabaret' is submitted. Since query-time synonyms are turned "off", the resulting parsedquery_toString is simply "name:night", "name:cabaret", etc...

Thanks!

--
View this message in context: http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18476205.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: 2 IDs in schema.xml
Multiple uniqueKeys are not supported. You must use only one field as the uniqueKey.

On Tue, Jul 15, 2008 at 11:52 PM, dudes dudes <[EMAIL PROTECTED]> wrote:
> Hi
>
> For some strange reason hotmail doesn't send any XML tags through. I have
> attached a file with all the necessary xml tags, thanks :)
>
> I have a rare situation and I'm not too sure how to resolve it.
> I have defined 2 fields: one is called userID and the other one is called
> companyID in the schema.xml file. Please see part 1 of the attached xml file.
>
> Then I have both of those fields specified as uniqueKeys. Please see part 2
> of the attached document.
>
> When I try to post a test6.xml (ie java -jar post.jar test6.xml) it gives
> me the following error:
>
> SimplePostTool:FATAL:Solr returned an error:
> Document_null_missing_required_field_userID
>
> However, if I replace companyID with userID in the test6.xml file, it
> commits without any problems.
>
> Any thoughts about this?
>
> Many thanks to all
> ak

--
Regards, Shalin Shekhar Mangar.
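Since only one <uniqueKey> can be declared, a common workaround (a sketch, not something from this thread) is to keep both ID fields but add a composite key field, which the client populates by concatenating the two IDs before posting:

```xml
<!-- Hypothetical sketch: a single composite uniqueKey instead of two.
     The client posts id = userID + "-" + companyID for every document. -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="userID" type="string" indexed="true" stored="true"/>
<field name="companyID" type="string" indexed="true" stored="true"/>

<uniqueKey>id</uniqueKey>
```

A document is then overwritten only when both userID and companyID match, which is usually the behavior people are after when they reach for two uniqueKeys.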
Re: FileBasedSpellChecker behavior?
Also see https://issues.apache.org/jira/browse/SOLR-622

On Wed, Jul 16, 2008 at 2:25 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On Tue, Jul 15, 2008 at 4:19 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> > agreed, but there is a problem in Solr, AIUI, with regards to when the
> > readers are available and when inform() gets called. The workaround is to
> > have a warming query, I believe.
>
> Right... see https://issues.apache.org/jira/browse/SOLR-593
>
> -Yonik

--
Regards, Shalin Shekhar Mangar.
Re: FileBasedSpellChecker behavior?
On Tue, Jul 15, 2008 at 4:19 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > agreed, but there is a problem in Solr, AIUI, with regards to when the > readers are available and when inform() gets called. The workaround is to > have a warming query, I believe. Right... see https://issues.apache.org/jira/browse/SOLR-593 -Yonik
Re: FileBasedSpellChecker behavior?
On Jul 15, 2008, at 3:49 PM, Ryan McKinley wrote:

Hi- I'm messing with spellchecking and running into behavior that seems peculiar. We have an index with many words including: "swim" and "slim". If I search for "slim", it returns "swim" as an option -- likewise, if I search for "swim" it returns "slim". Why does it check words that are in the dictionary? This does not seem to be the behavior for IndexBasedSpellChecker.

I think it can depend on your options, but there are reasons to check even if a word is in the dictionary (although w/ FileBased, it's not as obvious.) Namely, there can be "better" spellings available. The strange thing is, I believe, the Lucene spell checker should be handling this, but you're not the first to report the oddity.

- - - -

Perhaps the FileBasedSpellChecker should load the configs at startup. It is too strange to have to call load each time the index starts. It should just implement SolrCoreAware and then load the file at startup.

agreed, but there is a problem in Solr, AIUI, with regards to when the readers are available and when inform() gets called. The workaround is to have a warming query, I believe.

thanks
ryan
FileBasedSpellChecker behavior?
Hi-

I'm messing with spellchecking and running into behavior that seems peculiar.

We have an index with many words including: "swim" and "slim"

If I search for "slim", it returns "swim" as an option -- likewise, if I search for "swim" it returns "slim".

Why does it check words that are in the dictionary? This does not seem to be the behavior for IndexBasedSpellChecker.

- - - -

Perhaps the FileBasedSpellChecker should load the configs at startup. It is too strange to have to call load each time the index starts. It should just implement SolrCoreAware and then load the file at startup.

thanks
ryan
RE: Wiki for 1.3
THANKS!!!

> Date: Tue, 15 Jul 2008 11:38:06 -0700
> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: RE: Wiki for 1.3
>
> : Thanks. Do we expect the same some time soon. I agree that the user
> : community have shed light in with a lot of examples. Just wanna know if
> : there was more that could be done. I am looking at the java docs of the
> : same too and that helps to some extent. But have felt the wiki was very
> : very useful in the past for me.
>
> The wiki has never been (nor attempted to be) a comprehensive list of
> every "plugin" type available in Solr -- just a pointer to where that info
> can be found in the javadocs. The specific items listed on
> AnalyzersTokenizersTokenFilters are just the ones that are particularly
> common, or have subtleties about them that people wanted to make notes
> about.
>
> You can feel free to add any tips & tricks about any analysis plugin you
> want to that page.
>
> SOLR-555 is an attempt at generating more user-friendly docs about all
> out-of-the-box plugins. Once it's ready for prime time, we'll still need
> more class-level javadocs for the various plugins to really make it useful,
> so any patches along those lines will eventually help.
>
> -Hoss
Re: solr synonyms behaviour
On Tue, Jul 15, 2008 at 2:27 PM, swarag <[EMAIL PROTECTED]> wrote:
> To my understanding, this means I am using synonyms at index time and NOT
> query time. And yet, I am still having these problems with synonyms.

Can you give a specific example? Use debugQuery=true to see what the resulting query is. You can also use the admin analysis page to see what the output of the index and query analyzers is.

-Yonik
Re: Solr stops responding
Sorry for the bunch of short self-replies, just trying to analyse...

The CPU may get overloaded by a constantly running GC trying to defragment & optimize memory, in a loop (constant queue of requests); response time will be a few minutes (in the best cases) and contain 500... so that sometimes we can't see the OOM in the log files (overloaded CPU).

At least during troubleshooting we need to comment this block out in SolrServlet:

} catch (Throwable e) {
  SolrException.log(log, e);
  sendErr(500, SolrException.toStr(e), request, response);
}

I also can't understand why it happened several times yesterday with SUN Java 5 (AMD64), and does not happen yet with BEA JRockit. I had different problems with JRockit (HttpClient didn't work with it) so I avoided it till now...

==
http://www.linkedin.com/in/liferay

Quoting Fuad Efendi <[EMAIL PROTECTED]>:

Just as a sample, SolrCore contains blocks like

} catch (Throwable e) {
  SolrException.logOnce(log, null, e);
}

And SolrServlet:

} catch (Throwable e) {
  SolrException.log(log, e);
  sendErr(500, SolrException.toStr(e), request, response);
}

What will happen with OutOfMemoryError? If memory is not 'enough'-enough it won't even output to catalina.out, and JVM/SOLR will stop responding instead of an 'abnormal' exit...

Quoting Fuad Efendi <[EMAIL PROTECTED]>:

I suspect that SolrException is used to catch ALL exceptions in order to show "500 OutOfMemory" in HTML/XML/JSON etc., so that the JVM simply hangs... weird HTTP understanding...

Quoting Fuad Efendi <[EMAIL PROTECTED]>:

The following lines are strange; it looks like SOLR deals with OOM and rethrows its own exception (so that in some cases the JVM simply hangs instead of exiting):

Apr 4, 2008 1:20:53 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
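The failure mode being described — a catch (Throwable) that turns OutOfMemoryError into a "500" response instead of letting the JVM die — can be shown in isolation. This is a minimal sketch of the anti-pattern, not Solr's actual code (the class and method names are hypothetical):

```java
// Sketch: catching Throwable also swallows Errors such as OutOfMemoryError,
// so the server keeps limping along in a degraded state instead of exiting.
public class OomMasking {

    // Mimics a servlet-style handler that converts ANY Throwable into a
    // "500" response, including JVM Errors that are not safely recoverable.
    static String handle(Runnable request) {
        try {
            request.run();
            return "200 OK";
        } catch (Throwable t) { // catches Error as well as Exception
            return "500 " + t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // A request that "runs out of memory" (simulated, not a real OOM).
        String status = handle(() -> {
            throw new OutOfMemoryError("Java heap space");
        });
        System.out.println(status); // the Error never propagates to the JVM
    }
}
```

A mitigation of that era was to make the JVM itself act on OOM (for example with a HotSpot flag like -XX:OnOutOfMemoryError running a kill script) rather than relying on the servlet layer to do the right thing.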
RE: Wiki for 1.3
: Thanks. Do we expect the same some time soon. I agree that the user
: community have shed light in with a lot of examples. Just wanna know if
: there was more that could be done. I am looking at the java docs of the
: same too and that helps to some extent. But have felt the wiki was very
: very useful in the past for me.

The wiki has never been (nor attempted to be) a comprehensive list of every "plugin" type available in Solr -- just a pointer to where that info can be found in the javadocs. The specific items listed on AnalyzersTokenizersTokenFilters are just the ones that are particularly common, or have subtleties about them that people wanted to make notes about.

You can feel free to add any tips & tricks about any analysis plugin you want to that page.

SOLR-555 is an attempt at generating more user-friendly docs about all out-of-the-box plugins. Once it's ready for prime time, we'll still need more class-level javadocs for the various plugins to really make it useful, so any patches along those lines will eventually help.

-Hoss
Re: solr synonyms behaviour
matt connolly wrote:
> You won't have the multiple word problem if you use synonyms at index time
> instead of query time.
>
> swarag wrote:
>> Here is a basic example of some synonyms in my synonyms.txt:
>> club=>club,bar,night cabaret
>> bar=>bar,club
>>
>> As you can see, a search for 'bar' will return any documents with 'bar'
>> or 'club' in the name. This works fine. However, a search for 'club'
>> SHOULD return any documents with 'club', 'bar' or 'night cabaret' in the
>> name, but it does not. It only returns 'bar' and 'club'.
>>
>> Interestingly, a search for 'night cabaret' gives me all 'night
>> cabaret's, 'bar's and 'club's... which is quite unexpected since I'm using
>> a uni-directional synonym config (using the => symbol)
>>
>> Does your config give you my desired behavior?

Is there something I am missing here? This is an excerpt from my schema.xml:

To my understanding, this means I am using synonyms at index time and NOT query time. And yet, I am still having these problems with synonyms.
2 IDs in schema.xml
Hi

For some strange reason hotmail doesn't send any XML tags through. I have attached a file with all the necessary xml tags, thanks :)

I have a rare situation and I'm not too sure how to resolve it. I have defined 2 fields: one is called userID and the other one is called companyID in the schema.xml file. Please see part 1 of the attached xml file.

Then I have both of those fields specified as uniqueKeys. Please see part 2 of the attached document.

When I try to post a test6.xml (ie java -jar post.jar test6.xml) it gives me the following error:

SimplePostTool:FATAL:Solr returned an error: Document_null_missing_required_field_userID

However, if I replace companyID with userID in the test6.xml file, it commits without any problems.

Any thoughts about this?

Many thanks to all
ak

// I have defined 2 fields as shown below:
// UniqueKeys
userID
companyID
// copy field commands
Re: Solr stops responding
Just as a sample, SolrCore contains blocks like

} catch (Throwable e) {
  SolrException.logOnce(log, null, e);
}

And SolrServlet:

} catch (Throwable e) {
  SolrException.log(log, e);
  sendErr(500, SolrException.toStr(e), request, response);
}

What will happen with OutOfMemoryError? If memory is not 'enough'-enough it won't even output to catalina.out, and JVM/SOLR will stop responding instead of an 'abnormal' exit...

Quoting Fuad Efendi <[EMAIL PROTECTED]>:

I suspect that SolrException is used to catch ALL exceptions in order to show "500 OutOfMemory" in HTML/XML/JSON etc., so that the JVM simply hangs... weird HTTP understanding...

Quoting Fuad Efendi <[EMAIL PROTECTED]>:

The following lines are strange; it looks like SOLR deals with OOM and rethrows its own exception (so that in some cases the JVM simply hangs instead of exiting):

Apr 4, 2008 1:20:53 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Re: Duplicate content
On Jul 15, 2008, at 10:31 AM, Fuad Efendi wrote:

Thanks Ryan,

Is it really unique if we allow duplicates? I had a similar problem...

if you allowDups, then uniqueKey may not be unique... however, it is still used as the key for many items.

Quoting Ryan McKinley <[EMAIL PROTECTED]>:

On Jul 15, 2008, at 2:45 AM, Sunil wrote:

Hi All,

I want to change the duplicate content behavior in solr. What I want to do is:

1) I don't want duplicate content.
2) I don't want to overwrite old content with new one. Means, if I add duplicate content in solr and the content already exists, the old content should not be overwritten.

Can anyone suggest how to achieve it?

Check the "allowDups" options for
http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef

Thanks, Sunil
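For reference, allowDups is an attribute of the <add> update message itself in Solr 1.x. A minimal sketch (the document fields here are hypothetical):

```xml
<!-- allowDups="true" skips the uniqueKey-based overwrite, so repeated adds
     of the same id keep BOTH copies; the default (allowDups="false")
     replaces any existing document with the same uniqueKey. -->
<add allowDups="true">
  <doc>
    <field name="id">doc-1</field>
    <field name="name">example document</field>
  </doc>
</add>
```

Note that neither setting gives "keep the old copy and silently drop the new one", which is what Sunil asked for; that check has to happen client-side, e.g. by querying for the id before adding.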
Re: solr synonyms behaviour
You won't have the multiple word problem if you use synonyms at index time instead of query time.

swarag wrote:
> Here is a basic example of some synonyms in my synonyms.txt:
> club=>club,bar,night cabaret
> bar=>bar,club
>
> As you can see, a search for 'bar' will return any documents with 'bar' or
> 'club' in the name. This works fine. However, a search for 'club' SHOULD
> return any documents with 'club', 'bar' or 'night cabaret' in the name,
> but it does not. It only returns 'bar' and 'club'.
>
> Interestingly, a search for 'night cabaret' gives me all 'night cabaret's,
> 'bar's and 'club's... which is quite unexpected since I'm using a
> uni-directional synonym config (using the => symbol)
>
> Does your config give you my desired behavior?
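An index-time synonym setup of the kind suggested here typically looks like this in a Solr 1.x-era schema.xml (an illustrative sketch, not the poster's actual field type):

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <!-- Synonyms applied only at index time; the query analyzer deliberately
       has no SynonymFilterFactory. -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since expansion now happens at index time, any change to synonyms.txt requires reindexing before it takes effect.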
Re: Filter by Type increases search results.
On Tue, Jul 15, 2008 at 11:10 AM, Norberto Meijome <[EMAIL PROTECTED]> wrote:
> On Tue, 15 Jul 2008 18:07:43 +0530 "Preetam Rao" <[EMAIL PROTECTED]> wrote:
>> When I say filter, I meant q=fish&fq=type:idea
>
> btw, this *seems* to only work for me with the standard search handler. dismax
> and fq don't seem to get along nicely... but maybe it is just late and I'm
> not testing it properly..

It should work the same... the only thing dismax does differently now is change the type of the base query to "dismax".

-Yonik
Re: solr synonyms behaviour
matt connolly wrote:
> swarag wrote:
>> Knowing the Lucene struggles with multi-word query-time synonyms, my
>> question is, does this also affect index-time synonyms? What other
>> alternatives do we have if we require there to be multiple word synonyms?
>
> No, the multiple word problem doesn't happen with index synonyms, only
> query synonyms.
>
> See:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46
>
> I ended up using index time synonyms, but ideally, I'd like to see a
> filter factory that does something like the SynsExpand tool does (which
> was written for lucene, not solr).

I've tried this and it doesn't seem to work. Here are the basics of my config:

...

Synonyms for queryTime is off

Here is a basic example of some synonyms in my synonyms.txt:

club=>club,bar,night cabaret
bar=>bar,club

As you can see, a search for 'bar' will return any documents with 'bar' or 'club' in the name. This works fine. However, a search for 'club' SHOULD return any documents with 'club', 'bar' or 'night cabaret' in the name, but it does not. It only returns 'bar' and 'club'.

Interestingly, a search for 'night cabaret' gives me all 'night cabaret's, 'bar's and 'club's... which is quite unexpected since I'm using a uni-directional synonym config (using the => symbol)

Does your config give you my desired behavior?
Re: Solr stops responding
I suspect that SolrException is used to catch ALL exceptions in order to show "500 OutOfMemory" in HTML/XML/JSON etc., so that JVM simply hangs... weird HTTP understanding... Quoting Fuad Efendi <[EMAIL PROTECTED]>: Following lines are strange, looks like SOLR deals with OOM and rethrows own exception (so that in some cases JVM simply hangs instead of exit): Apr 4, 2008 1:20:53 PM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space
Re: WordDelimiterFilter splits at non-ASCII chars
On Tue, Jul 15, 2008 at 10:29 AM, Stefan Oestreicher <[EMAIL PROTECTED]> wrote:
> as I understand it, the WordDelimiterFilter should split on case changes, word
> delimiters and changes from character to digit, but it should not
> differentiate between ASCII and multibyte chars. It does however. The word
> "hälse" (german plural of "neck") gets split into "h", "ä" and "lse", which
> unfortunately renders this filter quite unusable for me. Am I missing
> something or is this a bug?
> I'm using solr 1.3 built from trunk.

Look for charset issues in communicating with Solr. I just tried this with the "text" field via Solr's analysis.jsp and it works fine.

-Yonik
Re: WordDelimiterFilter splits at non-ASCII chars
Hi Stefan,

I wrote a test case for the problem you described but it is working fine. I used the following definition:

What configuration are you using? If it is different, please share it so that I can test with it.

On Tue, Jul 15, 2008 at 7:59 PM, Stefan Oestreicher <[EMAIL PROTECTED]> wrote:
> Hi,
>
> as I understand it, the WordDelimiterFilter should split on case changes, word
> delimiters and changes from character to digit, but it should not
> differentiate between ASCII and multibyte chars. It does however. The word
> "hälse" (german plural of "neck") gets split into "h", "ä" and "lse", which
> unfortunately renders this filter quite unusable for me. Am I missing
> something or is this a bug?
> I'm using solr 1.3 built from trunk.
>
> TIA,
>
> Stefan Oestreicher

--
Regards, Shalin Shekhar Mangar.
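Shalin's field definition was stripped by the mail archive. A typical Solr 1.3-era text field using WordDelimiterFilterFactory (illustrative, not necessarily the one he tested with) looks like:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- splits on case changes and letter/digit transitions; it does not
         split inside a run of letters just because they are non-ASCII -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If "hälse" splits at the "ä", the bytes reaching Solr were likely not decoded as UTF-8 (matching Yonik's charset suspicion), since the filter treats any Unicode letter as part of a word.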
Re: Solr stops responding
The following lines are strange; it looks like SOLR deals with OOM and rethrows its own exception (so that in some cases the JVM simply hangs instead of exiting):

Apr 4, 2008 1:20:53 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space

This is the full log after OOM, made in April with Tomcat 6. Deadlock at Tomcat? Looks like some queries succeed, but I was forced to KILL -9.

=
SEVERE: Error allocating socket processor
java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 1:57:36 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Exception in thread "catalina-exec-4" java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 1:58:18 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 1:59:01 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 1:59:01 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 1:59:01 PM org.apache.tomcat.util.net.AprEndpoint$Acceptor run
SEVERE: Socket accept failed
java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 1:59:39 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 1:59:53 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=webcam&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.excaliberpc.com"&hl=true 0 18
Apr 4, 2008 2:00:51 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=pepe+jeans&qt=dismax&version=2.2&facet.field=country&facet.field=host&hl=true 0 38544
Apr 4, 2008 2:02:11 PM org.apache.solr.common.SolrException log
SEVERE:
java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 2:02:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 2:02:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 2:02:11 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=10&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=category:"core"&qt=standard&version=2.2&facet.field=country&facet.field=host&hl=true 0 79439
Apr 4, 2008 2:02:21 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=robot&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.clickonit.com"&hl=true 0 17
Apr 4, 2008 2:02:35 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=Cognac&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.designersimports.com"&hl=true 0 19
Apr 4, 2008 2:03:12 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=prada&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.theprincessescloset.com"&hl=true 0 1
Apr 4, 2008 2:04:55 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: Java heap space
Apr 4, 2008 2:04:55 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=100&start=0&sort=price+desc&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=velodyne+DD+15&qt=dismax&version=2.2&facet.field=country&facet.field=host&hl=true 0 53
Apr 4, 2008 2:05:21 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=100&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=velodyne+DD+15&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.hometheaterstore.com"&hl=true 0 3
Apr 4, 2008 2:06:06 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=100&start=0&sort=id+asc&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=sex&qt=dismax&version=2.2&facet.field=country&facet.field=host&fq=host:"www.moviesunlimited.com"&fq=category:"video"&hl=true 0 39
Apr 4, 2008 2:06:24 PM org.apache.solr.core.SolrCore execute
INFO: /select wt=xml&facet.limit=100&rows=10&start=0&facet=true&facet.mincount=1&fl=id,item_name,category,price,price_txt,url,host,country&q=id:[*+TO+*]&qt=standard&version=2.2&facet.field=country&facet.field=host&hl=true 0 859
Apr 4, 2008 2:07:03 PM org.apac
Best way to return ExternalFileField in the results
Hi all,

I've been trying to return a field of type ExternalFileField in the search result. Upon examining the XMLWriter class, it seems like Solr can't do this out of the box. Therefore, I've tried to hack Solr to enable this behaviour. The goal is to call ExternalFileField.getValueSource(SchemaField field, QParser parser) in the XMLWriter.writeDoc(String name, Document document, ...) method. There are two issues with doing this:

1) I need to create an instance of QParser in the writeDoc method. What is the best way to do this? What kind of overhead is there in creating a new QParser for every document returned?

2) I have to modify the writeDoc method to include the internal Lucene document id, because I need it to retrieve the ExternalFileField value:

fileField.getValueSource(schemaField, qparser).getValues(request.getSearcher().getIndexReader()).floatVal(docId)

The immediate effect is that it breaks the writeVal() method (because this method references writeDoc()).

Any comments? Thanks in advance.

--
Regards, Cuong Hoang
Re: Solr stops responding
Can we collect more information? It would be nice to know what the threads are doing when it hangs. If you are using *nix, issue kill -3 <pid>; it prints out the stack trace of all the threads in the VM. That may tell us the state of each thread, which could help us suggest something.

On Tue, Jul 15, 2008 at 8:59 PM, Fuad Efendi <[EMAIL PROTECTED]> wrote:
> I constantly have the same problem; sometimes I have OutOfMemoryError in
> logs, sometimes not. Not predictable. I minimized all caches, it still
> happens even with 8192M. CPU usage is 375%-400% (two dual-core Opterons),
> SUN Java 5. Moved to BEA JRockit 5 yesterday, looks 30 times faster (25%
> CPU load with 4096M RAM); no problem yet, let's see...
>
> Strange: Tomcat simply hangs instead of exit(...)
>
> There are some posts related to OutOfMemoryError in the solr-user list.
>
> ==
> http://www.linkedin.com/in/liferay
>
> Quoting Doug Steigerwald <[EMAIL PROTECTED]>:
>> Since we pushed Solr out to production a few weeks ago, we've seen a
>> few issues with Solr not responding to requests (searches or admin
>> pages). There doesn't seem to be any reason for it from what we can
>> tell. We haven't seen it in QA or development.
>>
>> We're running Solr with basically the example Solr setup with Jetty
>> (6.1.3). We package our Solr install by using 'ant example' and
>> replacing configs/etc. Whenever Solr stops responding, there are no
>> messages in the logs, nothing. Requests just time out.
>>
>> We have also only seen this on our slaves. The master doesn't seem to
>> be hitting this issue. All the boxes are the same, version of java is
>> the same, etc.
>>
>> We don't have a stack trace and no JMX set up. Once we see this issue,
>> our support folks just stop and start Solr on that machine.
>>
>> Has anyone else run into anything like this with Solr?
>>
>> Thanks.
>> Doug

--
--Noble Paul
Re: Solr stops responding
We haven't seen an OutOfMemoryError. The load on the server doesn't go up either (hovers around 1-2). We're on Java 1.6.0_03-b05. 4x3.8GHz Xeons, 8GB RAM.

Doug

On Jul 15, 2008, at 11:29 AM, Fuad Efendi wrote:

I constantly have the same problem; sometimes I have OutOfMemoryError in logs, sometimes not. Not-predictable. I minimized all caches, it still happens even with 8192M. CPU usage is 375%-400% (two double-core Opterons), SUN Java 5. Moved to BEA JRockit 5 yesterday, looks 30 times faster (25% CPU load with 4096M RAM); no any problem yet, let's see...

Strange: Tomcat simply hangs instead of exit(...)

There are some posts related to OutOfMemoryError in solr-user list.

==
http://www.linkedin.com/in/liferay

Quoting Doug Steigerwald <[EMAIL PROTECTED]>:

Since we pushed Solr out to production a few weeks ago, we've seen a few issues with Solr not responding to requests (searches or admin pages). There doesn't seem to be any reason for it from what we can tell. We haven't seen it in QA or development.

We're running Solr with basically the example Solr setup with Jetty (6.1.3). We package our Solr install by using 'ant example' and replacing configs/etc. Whenever Solr stops responding, there are no messages in the logs, nothing. Requests just time out.

We have also only seen this on our slaves. The master doesn't seem to be hitting this issue. All the boxes are the same, version of java is the same, etc.

We don't have a stack trace and no JMX set up. Once we see this issue, our support folks just stop and start Solr on that machine.

Has anyone else run into anything like this with Solr?

Thanks.
Doug
Re: Duplicate content
Thanks Ryan,

Is it really unique if we allow duplicates? I had a similar problem...

Quoting Ryan McKinley <[EMAIL PROTECTED]>:

On Jul 15, 2008, at 2:45 AM, Sunil wrote:

Hi All,

I want to change the duplicate content behavior in solr. What I want to do is:

1) I don't want duplicate content.
2) I don't want to overwrite old content with new one. Means, if I add duplicate content in solr and the content already exists, the old content should not be overwritten.

Can anyone suggest how to achieve it?

Check the "allowDups" options for
http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef

Thanks, Sunil
Re: Solr stops responding
I constantly have the same problem; sometimes I have an OutOfMemoryError in the logs, sometimes not. Not predictable. I minimized all caches, and it still happens even with 8192M. CPU usage is 375%-400% (two dual-core Opterons), Sun Java 5. I moved to BEA JRockit 5 yesterday and it looks 30 times faster (25% CPU load with 4096M RAM); no problems yet, let's see... Strange: Tomcat simply hangs instead of calling exit(...). There are some posts related to OutOfMemoryError on the solr-user list. == http://www.linkedin.com/in/liferay Quoting Doug Steigerwald <[EMAIL PROTECTED]>: Since we pushed Solr out to production a few weeks ago, we've seen a few issues with Solr not responding to requests (searches or admin pages). There doesn't seem to be any reason for it from what we can tell. We haven't seen it in QA or development. We're running Solr with basically the example Solr setup with Jetty (6.1.3). We package our Solr install by using 'ant example' and replacing configs/etc. Whenever Solr stops responding, there are no messages in the logs, nothing. Requests just time out. We have also only seen this on our slaves. The master doesn't seem to be hitting this issue. All the boxes are the same, version of java is the same, etc. We don't have a stack trace and no JMX set up. Once we see this issue, our support folks just stop and start Solr on that machine. Has anyone else run into anything like this with Solr? Thanks. Doug
RE: Wiki for 1.3
Thanks. Do we expect the same some time soon? I agree that the user community has shed light on this with a lot of examples. I just want to know if there was more that could be done. I am looking at the javadocs as well, and that helps to some extent, but I have found the wiki very useful in the past. > Date: Tue, 15 Jul 2008 11:26:16 +1000> From: [EMAIL PROTECTED]> To: > solr-user@lucene.apache.org> Subject: Re: Wiki for 1.3> > On Mon, 14 Jul 2008 > 23:25:25 +> sundar shankar <[EMAIL PROTECTED]> wrote:> > > Thanks for > your patient response. I dont wanna know the classes changed, but I wanna get > a hand on the wiki page for the same. I tried to search for these classes in > the solr wiki. I was getting a page does not exist. This is the result of the > search I did on solr wiki site.> > Hi Sundar,> indeed, some pages havent been > written yet. > > If you check the mail archives, there are a few exchanges > with working configurations on *NGram* .> > b> > _> > {Beto|Norberto|Numard} Meijome> > "At times, to be silent is to lie." > > Miguel de Unamuno> > I speak for myself, not my employer. Contents may be > hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them > is worse. You have been Warned.
Re: Solr stops responding
Doug Steigerwald writes: > We're running Solr with basically the example Solr setup with Jetty > (6.1.3). We package our Solr install by using 'ant example' and > replacing configs/etc. Whenever Solr stops responding, there are no > messages in the logs, nothing. Requests just time out. > > We have also only seen this on our slaves. The master doesn't seem to > be hitting this issue. All the boxes are the same, version of java is > the same, etc. > > We don't have a stack trace and no JMX set up. Once we see this issue, > our support folks just stop and start Solr on that machine. > > Has anyone else run into anything like this with Solr? Yes, I saw such behaviour on many Ubuntu 6.06 servers running in virtual environments (like VMWare). Either Jetty was unable to bind to the specified port (for an unknown reason) or the whole process was lost somewhere in space (killable only by kill -9, not responding to signals, etc.). Though, I can only confirm, no advice here, as this was a mystery to me too. -- We read Knuth so you don't have to. -- Tim Peters Jarek Zgoda re:define
Re: solr:sorting on what type is faster
If a sort is not specified then documents are returned in decreasing order of their score. You can get more details on the scoring at http://lucene.apache.org/java/docs/scoring.html On Tue, Jul 15, 2008 at 6:03 PM, sumantht <[EMAIL PROTECTED]> wrote: > > hi, > in databases, sorting based on text fields is faster and preferable, if i > am > not wrong. > similarly, which type of fields are to be chosen to sort in 'solr'? how the > ties are broken? > sorry for mistakes, if any .. > > thank you > -- > View this message in context: > http://www.nabble.com/solr%3Asorting-on-what-type-is-faster-tp18464118p18464118.html > Sent from the Solr - User mailing list archive at Nabble.com. > > -- Regards, Shalin Shekhar Mangar.
Solr stops responding
Since we pushed Solr out to production a few weeks ago, we've seen a few issues with Solr not responding to requests (searches or admin pages). There doesn't seem to be any reason for it from what we can tell. We haven't seen it in QA or development. We're running Solr with basically the example Solr setup with Jetty (6.1.3). We package our Solr install by using 'ant example' and replacing configs/etc. Whenever Solr stops responding, there are no messages in the logs, nothing. Requests just time out. We have also only seen this on our slaves. The master doesn't seem to be hitting this issue. All the boxes are the same, version of java is the same, etc. We don't have a stack trace and no JMX set up. Once we see this issue, our support folks just stop and start Solr on that machine. Has anyone else run into anything like this with Solr? Thanks. Doug
Re: Filter by Type increases search results.
On Tue, 15 Jul 2008 18:07:43 +0530 "Preetam Rao" <[EMAIL PROTECTED]> wrote: > When I say filter, I meant q=fish&fq=type:idea btw, this *seems* to only work for me with the standard search handler. dismax and fq don't seem to get along nicely... but maybe it is just late and I'm not testing it properly.. _ {Beto|Norberto|Numard} Meijome "Mix a little foolishness with your serious plans; it's lovely to be silly at the right moment." Horace I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Duplicate content
On Jul 15, 2008, at 2:45 AM, Sunil wrote: Hi All, I want to change the duplicate content behavior in solr. What I want to do is: 1) I don't want duplicate content. 2) I don't want to overwrite old content with new one. Means, if I add duplicate content in solr and the content already exists, the old content should not be overwritten. Can anyone suggest how to achieve it? Check the "allowDups" options for http://wiki.apache.org/solr/UpdateXmlMessages#head-3dfbf90fbc69f168ab6f3389daf68571ad614bef Thanks, Sunil
WordDelimiterFilter splits at non-ASCII chars
Hi, as I understand it, the WordDelimiterFilter should split on case changes, word delimiters and changes from character to digit, but it should not differentiate between ASCII and multibyte chars. It does, however. The word "hälse" (German plural of "neck") gets split into "h", "ä" and "lse", which unfortunately renders this filter quite unusable for me. Am I missing something, or is this a bug? I'm using Solr 1.3 built from trunk. TIA, Stefan Oestreicher
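The symptom Stefan describes is what you get when a tokenizer's notion of a "word character" is ASCII-only. The toy sketch below is not the actual WordDelimiterFilter logic — it just illustrates how an ASCII-only character class breaks "hälse" apart while a Unicode-aware one keeps it whole (the naive version here drops the "ä" entirely rather than emitting it as its own token, slightly different from the reported three-token split):

```python
import re

def split_ascii_letters_only(text):
    # Naive splitting that treats only ASCII letters/digits as word
    # characters -- a non-ASCII letter like "ä" becomes a split point.
    return [t for t in re.split(r"[^a-zA-Z0-9]+", text) if t]

def split_unicode_aware(text):
    # Unicode-aware splitting (\W is Unicode-aware in Python 3)
    # keeps accented letters inside the token.
    return [t for t in re.split(r"\W+", text) if t]

print(split_ascii_letters_only("hälse"))  # ['h', 'lse'] -- the word is broken
print(split_unicode_aware("hälse"))       # ['hälse']   -- the word survives
```

The fix in an analyzer is the analogous one: classify characters by Unicode letter/digit categories rather than by ASCII ranges.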
Re: which type of fields are to be compressed
Compression is only relevant for the original text, not the indexed part. So in terms of searching, it's irrelevant. Where it is relevant is when you *fetch* the document (e.g. doc = hits.doc(32)): that's when the de-compression work is done (for stored documents). Depending upon your app, this may or may not matter. Here's a writeup I did that will shed some light on this, even though it talks about FieldSelector (which, if you really need to compress data, you probably care about too). http://wiki.apache.org/lucene-java/FieldSelectorPerformance Best Erick On Tue, Jul 15, 2008 at 8:29 AM, sumantht <[EMAIL PROTECTED]> wrote: > > hi > is it preferable to compress each and every field, if not why.? > how exactly it helps? > -- > View this message in context: > http://www.nabble.com/which-type-of-fields-are-to-be-compressed-tp18464056p18464056.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
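The trade-off Erick describes — compression cost paid at fetch time, not at query time — can be sketched like this (a toy stand-in for a compressed stored field, not Lucene's actual implementation):

```python
import zlib

class StoredField:
    """Toy compressed stored field: the text is compressed once at index
    time and only decompressed when the document is actually fetched."""

    def __init__(self, text, compress=True):
        self.compressed = compress
        data = text.encode("utf-8")
        self.data = zlib.compress(data) if compress else data

    def fetch(self):
        # The decompression cost is paid here, at retrieval time --
        # searching never touches the stored (compressed) bytes.
        data = zlib.decompress(self.data) if self.compressed else self.data
        return data.decode("utf-8")

body = "some long stored document body " * 50
field = StoredField(body)
assert field.fetch() == body                       # round-trips exactly
assert len(field.data) < len(body.encode("utf-8")) # repetitive text shrinks
```

This is why compressing every field is not automatically a win: small or rarely-repetitive values may not shrink much, while every fetch of the document still pays the decompression cost.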
RE: Duplicate content
Thanks guys. -Original Message- From: Norberto Meijome [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 15, 2008 2:35 PM To: solr-user@lucene.apache.org Subject: Re: Duplicate content On Tue, 15 Jul 2008 10:48:14 +0200 Jarek Zgoda <[EMAIL PROTECTED]> wrote: > >> 2) I don't want to overwrite old content with new one. > >> > >> Means, if I add duplicate content in solr and the content already > >> exists, the old content should not be overwritten. > > > > before inserting a new document, query the index - if you get a result back, > > then don't insert. I don't know of any other way. > > This operation is not atomic, so you get a race condition here. Other > than that, it seems fine. ;) of course - but i am not sure you can control atomicity at the SOLR level (yet? ;) ) for /update handler - so it'd have to either be a custom handler, or your app being the only one accessing and controlling write access to it that way. It definitely gets more interesting if you start adding shards ;) _ {Beto|Norberto|Numard} Meijome "All parts should go together without forcing. You must remember that the parts you are reassembling were disassembled by you. Therefore, if you can't get them together again, there must be a reason. By all means, do not use hammer." IBM maintenance manual, 1975 I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Filter by Type increases search results.
Of course - it's so obvious now. Thanks! -- View this message in context: http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18464457.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filter by Type increases search results.
Hi Matt, When I say filter, I meant q=fish&fq=type:idea What you are trying is a boolean OR of defaultsearchfield.:fish OR type:idea. Its not a filter, its an OR. Obviously you will get a union of results... -- Preetam On Tue, Jul 15, 2008 at 5:37 PM, matt connolly <[EMAIL PROTECTED]> wrote: > > Yes, the same, except for the filter. > > For example: > > http://localhost:8983/solr/select?q=fish > returns: > etc (followed by 2 > docs) > > http://localhost:8983/solr/select?q=fish+type:idea > returns: > . (followed by 9 > docs) > > > -Matt > > > Preetam Rao wrote: > > > > Hi Matt, > > > > Other than applying one more fq, is everything else remains same between > > the > > two queries, like q and all other parameters ? > > > > My understanding is that, fq is an intersection on the set of results > > returned from q. So it should always be a subset of results returned from > > q. > > So if one uses just q, and other uses q and fq, for the same q, the > second > > will have equal or less number of documents. > > > > > > Preetam > > > > > > -- > View this message in context: > http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18463448.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
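Preetam's distinction — fq intersects, while putting the clause in q ORs it in — can be sketched with plain sets (a toy stand-in, not Solr's actual result handling; the doc ids and fields are made up to mirror Matt's 2-vs-9 observation):

```python
# Toy index: doc id -> fields
docs = {
    1: {"text": "fish tales", "type": "story"},
    2: {"text": "fish ideas", "type": "idea"},
    3: {"text": "gardening",  "type": "idea"},
}

def search(field, value):
    # Naive substring match standing in for a term query on one field.
    return {d for d, f in docs.items() if value in f[field]}

fish  = search("text", "fish")   # {1, 2}
ideas = search("type", "idea")   # {2, 3}

# q=fish&fq=type:idea -> a true filter: intersection, never grows
assert fish & ideas == {2}
# q=fish type:idea (default OR) -> union, which can *grow* the result set
assert fish | ideas == {1, 2, 3}
```

That union is exactly why adding `type:idea` to q returned extra documents that never mention "fish".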
solr:sorting on what type is faster
hi, in databases, sorting based on text fields is faster and preferable, if I am not wrong. similarly, which types of fields should be chosen for sorting in 'solr'? how are ties broken? sorry for mistakes, if any .. thank you -- View this message in context: http://www.nabble.com/solr%3Asorting-on-what-type-is-faster-tp18464118p18464118.html Sent from the Solr - User mailing list archive at Nabble.com.
which type of fields are to be compressed
hi, is it preferable to compress each and every field? if not, why? how exactly does it help? -- View this message in context: http://www.nabble.com/which-type-of-fields-are-to-be-compressed-tp18464056p18464056.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filter by Type increases search results.
Yes, the same, except for the filter. For example: http://localhost:8983/solr/select?q=fish returns: etc (followed by 2 docs) http://localhost:8983/solr/select?q=fish+type:idea returns: . (followed by 9 docs) -Matt Preetam Rao wrote: > > Hi Matt, > > Other than applying one more fq, is everything else remains same between > the > two queries, like q and all other parameters ? > > My understanding is that, fq is an intersection on the set of results > returned from q. So it should always be a subset of results returned from > q. > So if one uses just q, and other uses q and fq, for the same q, the second > will have equal or less number of documents. > > > Preetam > > -- View this message in context: http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18463448.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filter by Type increases search results.
Hi Matt, Other than applying one more fq, does everything else remain the same between the two queries, like q and all the other parameters? My understanding is that fq is an intersection on the set of results returned from q, so it should always be a subset of the results returned from q. So if one query uses just q, and the other uses q and fq, for the same q, the second will have an equal or smaller number of documents. Preetam On Tue, Jul 15, 2008 at 4:10 PM, matt connolly <[EMAIL PROTECTED]> wrote: > > I'm using Solr with a Drupal site, and one of the fields in the schema is > "type". > > In my example development site, searching for the word "fish" returns 2 > documents, one type='story', and the other type='idea'. > > If I filter by type:idea then I get 9 results, the correct first result, > followed by 8 results that are of type='idea' but do not use the word > "fish" > at all. I have completely disabled synonyms (and rebuilt indexes) and this > makes no difference. > > Any ideas why filtering the type results in more search documents matched? > -- > View this message in context: > http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18462188.html > Sent from the Solr - User mailing list archive at Nabble.com. > >
Re: Dismax request handler and sub phrase matches... suggestion for another handler..
I agree. If we do decide to implement another kind of request handler, it should be through the StandardRequestHandler's defType attribute, which selects the registered QParser that generates the appropriate queries for Lucene. Preetam On Tue, Jul 15, 2008 at 3:59 PM, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > On Jul 15, 2008, at 4:45 AM, Preetam Rao wrote: >> What are your thoughts on having one more request handler like dismax, but >> which uses a sub-phrase query instead of dismax query ? >> > > It'd be better to just implement a QParser(Plugin) such that the > StandardRequestHandler can use it (&defType=dismax, for example). > > No need to have additional actual request handlers just to swap out query > parsing logic anymore. > >Erik > >
Filter by Type increases search results.
I'm using Solr with a Drupal site, and one of the fields in the schema is "type". In my example development site, searching for the word "fish" returns 2 documents, one type='story', and the other type='idea'. If I filter by type:idea then I get 9 results, the correct first result, followed by 8 results that are of type='idea' but do not use the word "fish" at all. I have completely disabled synonyms (and rebuilt indexes) and this makes no difference. Any ideas why filtering the type results in more search documents matched? -- View this message in context: http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18462188.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dismax request handler and sub phrase matches... suggestion for another handler..
On Jul 15, 2008, at 4:45 AM, Preetam Rao wrote: What are your thoughts on having one more request handler like dismax, but which uses a sub-phrase query instead of dismax query ? It'd be better to just implement a QParser(Plugin) such that the StandardRequestHandler can use it (&defType=dismax, for example). No need to have additional actual request handlers just to swap out query parsing logic anymore. Erik
RE: Solr searching issue..
thanks ! I think I fixed the issue and it's doing good :) > From: [EMAIL PROTECTED] > To: solr-user@lucene.apache.org > Subject: RE: Solr searching issue.. > Date: Mon, 14 Jul 2008 20:12:00 + > > Copy field dest="text". I am not sure if u can copy into text or something > like that. We copy it into a field of type text or string etc.. Plus what is > ur query string. what gives u no results. How do u index it?? > need more clues to figure out answer dude :) > > > >> From: [EMAIL PROTECTED]> To: solr-user@lucene.apache.org> Subject: RE: Solr >> searching issue..> Date: Mon, 14 Jul 2008 09:34:47 +0100>>> again whatever I >> have pasted it didn't work ! .. I have attached the schema.xml file >> instead,,, sorry for spamming you all>> thanks> ak> >> >> From: [EMAIL PROTECTED]>> To: >> solr-user@lucene.apache.org>> Subject: RE: Solr searching issue..>> Date: >> Mon, 14 Jul 2008 09:28:16 +0100>> with some strange reason my copy and >> paste didn't work !!! sorry to terrible you all.. hope you can see them >> now..>> >>> From: [EMAIL >> PROTECTED]>>> To: solr-user@lucene.apache.org>>> Subject: RE: Solr searching >> issue..>>> Date: Mon, 14 Jul 2008 09:17:32 +0100> Hi again,>> I >> have done the followings, but I do get zero replies .. please let me know >> what I have done wrong... thanks>> version type: nightly build >> solr-2008-07-07>> // for >> n-gram>> So, if i search >> for john,,, john will be found with out any problems... if I search for >> "joh" I'm not getting any results back,,,>> thanks>>> ak> >> Date: Fri, 11 Jul 2008 20:14:11 >> +0530 From: [EMAIL PROTECTED] To: solr-user@lucene.apache.org >> Subject: Re: Solr searching issue.. You can use EdgeNGramTokenizer >> available with Solr 1.3 to achieve this. But I'd think again about >> introducing this kind of search as n-grams can bloat your index >> size. 
On Fri, Jul 11, 2008 at 3:58 PM, dudes dudes >> wrote:>> Hi solr-users,>> version type: nightly build >> solr-2008-07-07>> If I search for name John, it finds it with out >> any issues On the> other hand if I search for Joh* , it also finds >> all the possible matches.> However, if> I search for "Joh".. it >> doesn't find any possible match in other word,> it doesn't find name >> john if you don't specify the exact name..>> Does anybody know what >> I'm missing here?>> thanks> ak> >> _> The >> John Lewis Clearance - save up to 50% with FREE delivery> >> http://clk.atdmt.com/UKM/go/101719806/direct/01/ -- >> Regards, Shalin Shekhar Mangar.>> >> _>>> 100’s >> of Nikon cameras to be won with Live Search>>> >> http://clk.atdmt.com/UKM/go/101719808/direct/01/ >> _>> Play and >> win great prizes with Live Search and Kung Fu Panda>> >> http://clk.atdmt.com/UKM/go/101719966/direct/01/>> >> _> The John >> Lewis Clearance - save up to 50% with FREE delivery> >> http://clk.atdmt.com/UKM/go/101719806/direct/01/ > _ > Missed your favourite programme? Stop surfing TV channels and start planning > your weekend TV viewing with our comprehensive TV Listing > http://entertainment.in.msn.com/TV/TVListing.aspx _ Invite your Facebook friends to chat on Messenger http://clk.atdmt.com/UKM/go/101719649/direct/01/
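Shalin's EdgeNGramTokenizer suggestion in this thread works because an edge n-gram analyzer indexes every prefix of each term, so the literal query "joh" matches "john" without any wildcard. A rough Python illustration of the idea (not Solr's actual tokenizer, and min/max lengths are illustrative):

```python
def edge_ngrams(term, min_len=1, max_len=None):
    # Emit every prefix of the term between min_len and max_len characters,
    # roughly mimicking what an edge n-gram tokenizer would index.
    max_len = max_len or len(term)
    return [term[:i] for i in range(min_len, min(max_len, len(term)) + 1)]

index = set(edge_ngrams("john"))
assert edge_ngrams("john") == ["j", "jo", "joh", "john"]
assert "joh" in index   # prefix query now matches as a plain term query
```

The caveat Shalin raises is visible here too: one four-letter term becomes four index terms, which is the index bloat to weigh before enabling this on large fields.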
Re: solr synonyms behaviour
swarag wrote: > > Knowing the Lucene struggles with multi-word query-time synonyms, my > question is, does this also affect index-time synonyms? What other > alternatives do we have if we require there to be multiple word synonyms? > No, the multi-word problem doesn't happen with index-time synonyms, only query-time synonyms. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-2c461ac74b4ddd82e453dc68fcfc92da77358d46 I ended up using index-time synonyms, but ideally, I'd like to see a filter factory that does something like the SynsExpand tool does (which was written for Lucene, not Solr). -- View this message in context: http://www.nabble.com/solr-synonyms-behaviour-tp15051211p18461507.html Sent from the Solr - User mailing list archive at Nabble.com.
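To see why index-time expansion sidesteps the multi-word problem: at index time the filter sees the whole token stream and can stack every synonym variant at one position, whereas a query parser that has already split the query on whitespace can no longer recognize "new york" as one unit. A toy sketch of index-time expansion (the synonym table and the greedy two-token matcher are illustrative simplifications, not Solr's SynonymFilter algorithm):

```python
SYNONYMS = {
    "tv": ["tv", "television"],
    "new york": ["new york", "nyc"],
}

def expand_index_tokens(text):
    # Greedy left-to-right matching: try the two-word phrase first, then
    # the single token; index every variant at that position.
    tokens, out, i = text.lower().split(), [], 0
    while i < len(tokens):
        two = " ".join(tokens[i:i + 2])
        if two in SYNONYMS:
            out.append(SYNONYMS[two]); i += 2
        elif tokens[i] in SYNONYMS:
            out.append(SYNONYMS[tokens[i]]); i += 1
        else:
            out.append([tokens[i]]); i += 1
    return out  # list of alternatives per token position

print(expand_index_tokens("tv in new york"))
# [['tv', 'television'], ['in'], ['new york', 'nyc']]
```

The cost, as the thread notes, is that adding a synonym requires re-indexing, which is exactly the trade-off that pushes some users toward query-time synonyms despite the multi-word caveats.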
Re: solr synonyms behaviour
Chris, On Sat, Jan 26, 2008 at 2:30 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote: > : I have the synonym filter only at query time coz i can't re-index data (or > : portion of data) everytime i add a synonym and a couple of other reasons. > > Use cases like yours will *never* work as a query time synonym ... hence > all of the information about multi-word synonyms and the caveats about > using them in the wiki... > > http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#SynonymFilter Considering these problems, it might be better to move the SynonymFilter from type="query" to type="index" in the example file. This file is very often used as a reference. Or perhaps we should just mention potential problems and a link to the documentation in the existing comment: "in this example, we will only use synonyms at query time". Thoughts? -- Guillaume
Re: Duplicate content
On Tue, 15 Jul 2008 10:48:14 +0200 Jarek Zgoda <[EMAIL PROTECTED]> wrote: > >> 2) I don't want to overwrite old content with new one. > >> > >> Means, if I add duplicate content in solr and the content already > >> exists, the old content should not be overwritten. > > > > before inserting a new document, query the index - if you get a result back, > > then don't insert. I don't know of any other way. > > This operation is not atomic, so you get a race condition here. Other > than that, it seems fine. ;) of course - but i am not sure you can control atomicity at the SOLR level (yet? ;) ) for /update handler - so it'd have to either be a custom handler, or your app being the only one accessing and controlling write access to it that way. It definitely gets more interesting if you start adding shards ;) _ {Beto|Norberto|Numard} Meijome "All parts should go together without forcing. You must remember that the parts you are reassembling were disassembled by you. Therefore, if you can't get them together again, there must be a reason. By all means, do not use hammer." IBM maintenance manual, 1975 I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
Re: Duplicate content
Norberto Meijome writes: >> 2) I don't want to overwrite old content with new one. >> >> Means, if I add duplicate content in solr and the content already >> exists, the old content should not be overwritten. > > before inserting a new document, query the index - if you get a result back, > then don't insert. I don't know of any other way. This operation is not atomic, so you get a race condition here. Other than that, it seems fine. ;) -- We read Knuth so you don't have to. -- Tim Peters Jarek Zgoda re:define
Dismax request handler and sub phrase matches... suggestion for another handler..
Hi, Apologies if you are receiving this a second time... having a tough time with the mail server. I take a user-entered query as-is and run it with the dismax query handler. The document fields have been filled from structured data, where different fields hold different attributes like number of beds, number of baths, city name, etc. A sample user query would look like "3 bed homes in new york". I would like this to match against city:new york and beds:3 beds. When I use the dismax handler with boosts and the tie parameter, I do not always get the most relevant top 10 results, because there seem to be many factors in play, one of which is not being able to recognize the presence of sub-phrases, and another not being able to ignore unwanted matches in unwanted fields. What are your thoughts on having one more request handler like dismax, but which uses a sub-phrase query instead of a dismax query? It would also provide the parameters below, on a per-field basis, to help customize the behavior of the request handler and give more flexibility in different scenarios.

phraseBoost - how much better a 3-word sub-phrase match is than a 2-word sub-phrase match.

useOnlyMaxMatch - if many sub-phrases match in the field, only the best score is used.

ignoreDuplicates - if a field has duplicate matches, pick only one match for scoring.

matchOnlyOneField - if a match is found in the first field, remove the matched terms while querying the other fields. For example, for me a city match is more important than matches in other fields, so I do not want the "new" in "new york" to match all the other fields and skew the results, which is what I am seeing with dismax, irrespective of the high boosts.

ignoreSomeLuceneScoreFactors - ignore the Lucene tf, idf, query norm, or any such criteria not needed for this field, since if I want exact matches only, they are really not important. They also seem to play a big role in me not being able to get the most relevant top 10 results.
I see this handler might be useful in the below use cases: a) the data is mostly exact, in that I am not trying to search free text like mails, reviews, articles, web pages, etc.; b) numbers and their bindings are important; c) exact phrase or sub-phrase matches are more important than rankings derived from tf, idf, query norm, etc.; d) there is a need to make sure that in some cases some fields affect the scoring and in some they don't. I found this the most difficult task: separating the noise matches from the required ones for my use case. Your thoughts and suggestions on alternatives are welcome. I have also posted a question on sub-phrase matching on lucene-user, which is not related to having a Solr handler with additional features like sub-phrase matching for user-entered queries. Thanks Preetam
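The core of Preetam's sub-phrase proposal can be sketched by enumerating the contiguous sub-phrases of the user query, longest first, so that a scorer could weight longer matches more heavily (per the proposed phraseBoost). This is only a sketch of the enumeration step, not a query parser:

```python
def sub_phrases(query, min_words=2):
    # All contiguous word sequences of at least min_words words,
    # ordered longest-first so longer sub-phrase matches can be
    # boosted above shorter ones.
    words = query.split()
    return [" ".join(words[i:i + n])
            for n in range(len(words), min_words - 1, -1)
            for i in range(len(words) - n + 1)]

phrases = sub_phrases("3 bed homes in new york")
print(phrases[0])    # '3 bed homes in new york' (the full query, tried first)
print(phrases[-1])   # 'new york' (among the shortest candidates)
```

Each candidate phrase would then be run against the structured fields (city, beds, ...), which is where the per-field knobs like matchOnlyOneField would come in.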
Re: Duplicate content
On Tue, 15 Jul 2008 13:15:41 +0530 "Sunil" <[EMAIL PROTECTED]> wrote: > 1) I don't want duplicate content. SOLR uses the field you define as the unique field to determine whether a document should be replaced or added. The rest of the fields are in your hands. You could devise a setup whereby the document id is generated by hashing all the other fields in your schema, thereby ensuring that a unique document id means unique content (of course, for a meaning of 'uniqueness' that is "different bytes" ;) ) > 2) I don't want to overwrite old content with new one. > > Means, if I add duplicate content in solr and the content already > exists, the old content should not be overwritten. before inserting a new document, query the index - if you get a result back, then don't insert. I don't know of any other way. b _ {Beto|Norberto|Numard} Meijome "The real voyage of discovery consists not in seeking new landscapes, but in having new eyes." Marcel Proust I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
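Norberto's hashing idea could look roughly like this — derive the uniqueKey from the other field values, so identical content always maps to the same id and a re-add of the same bytes is a harmless overwrite-with-itself (content_id is a hypothetical helper, not a Solr API):

```python
import hashlib

def content_id(doc, fields):
    # Hash the named field values in a fixed order; the separators keep
    # ("ab", "c") from colliding with ("a", "bc").
    h = hashlib.sha1()
    for name in sorted(fields):
        h.update(name.encode("utf-8") + b"\x00")
        h.update(str(doc[name]).encode("utf-8") + b"\x00")
    return h.hexdigest()

a = {"title": "Hello", "body": "world"}
b = {"body": "world", "title": "Hello"}   # same content, different key order
assert content_id(a, ["title", "body"]) == content_id(b, ["title", "body"])
```

Note this only covers requirement 1 (no duplicates); requirement 2 (never overwrite) still needs the query-before-add check discussed below, with the race condition Jarek points out.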
Re: Duplicate content
You must do a check before adding documents On Tue, Jul 15, 2008 at 1:15 PM, Sunil <[EMAIL PROTECTED]> wrote: > Hi All, > > I want to change the duplicate content behavior in solr. What I want to > do is: > > 1) I don't want duplicate content. > 2) I don't want to overwrite old content with new one. > > Means, if I add duplicate content in solr and the content already > exists, the old content should not be overwritten. > > Can anyone suggest how to achieve it? > > > Thanks, > Sunil > > > -- --Noble Paul
Duplicate content
Hi All, I want to change the duplicate content behavior in solr. What I want to do is: 1) I don't want duplicate content. 2) I don't want to overwrite old content with new one. Means, if I add duplicate content in solr and the content already exists, the old content should not be overwritten. Can anyone suggest how to achieve it? Thanks, Sunil