Re: Problem with words that are almost similar
On 17 Dec 2009, at 13:48, Shalin Shekhar Mangar wrote:

2009/12/17 Steinar Asbjørnsen steinar...@gmail.com

On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote: For specific cases like this, you can add the word to a file and specify it in the schema, for example:

    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

Thanks, Shalin. This is my schema.xml field type:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

I added restaurant and restaurering to protwords.txt and restarted Tomcat, but no dice. Do I need to use the SnowballPorterFilterFactory? And do I need to reindex the documents?

Actually, EnglishPorterFilterFactory is the same as SnowballPorterFilterFactory with language="English". Both will work. You will need to re-index the documents.

What I've done so far is to add both restaurant and restaurering to protwords.txt. I've also re-fed a single document (with the keyword restaurering) to check that it no longer appears in a search result for restaurant. Do I have to re-feed every document in the index? Or restart, so that Solr re-reads the protwords.txt file (this is on a different installation (prod) than the one I restarted earlier (dev))?

Steinar
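The protected-words mechanism discussed above can be sketched in a few lines: tokens found in the protected set bypass the stemmer, while everything else is stemmed as usual. This is an illustrative sketch only, not Solr's actual filter implementation; the `naive_stem` stand-in and the function names are hypothetical.

```python
# Sketch of how a protected-words list interacts with a stemming filter:
# tokens found in the protected set are passed through unchanged.

def load_protected(lines):
    """Parse a protwords.txt-style list: one word per line, '#' for comments."""
    return {ln.strip().lower() for ln in lines
            if ln.strip() and not ln.strip().startswith("#")}

def stem_filter(tokens, protected, stem):
    """Apply `stem` to each token unless it is in the protected set."""
    return [tok if tok.lower() in protected else stem(tok) for tok in tokens]

protected = load_protected(["# protwords.txt", "restaurant", "restaurering"])
naive_stem = lambda w: w[:7]  # crude stand-in stemmer, for illustration only
print(stem_filter(["restaurant", "restoring"], protected, naive_stem))
# -> ['restaurant', 'restori']
```

Note that this only changes how new tokens are analyzed, which is consistent with the advice above: documents indexed before the protwords change still carry the old stems until they are re-indexed.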
Problem with words that are almost similar
Hi all.

I have a delicate problem with two words that are typed almost the same way but have completely different meanings. The actual words are restaurant (as in restaurant) and restaurering (as in restoration). Solr seems to think these words are similar enough to present hits on both of them in the same search result. Obviously this is not desirable.

Is there a way to take care of such specific cases without disabling Solr's stemming and/or plural handling? Or would I need to disable stemming to make this special case disappear?

I'm using the dismax query handler, and the field I'm querying against is of type text. Any help is appreciated :)

Regards,
Steinar
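The collision described above comes from stemming: the analyzer strips suffixes before matching, so two unrelated words can land on the same stem. The toy sketch below is not the real Porter/Snowball algorithm, and its suffix list is invented for illustration, but it shows how aggressive suffix stripping can collapse both words onto one stem.

```python
# Toy illustration of how suffix stripping can conflate unrelated words.
# NOT the real Porter/Snowball algorithm -- just a sketch of the failure mode.

SUFFIXES = ["ering", "ation", "ing", "ant", "s"]  # hypothetical suffix list

def toy_stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)]
    return word

print(toy_stem("restaurant"))    # -> "restaur"
print(toy_stem("restaurering"))  # -> "restaur"
```

Once both words share the stem, a query for one matches documents containing the other, which is exactly the behavior reported here.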
Re: Problem with words that are almost similar
On 17 Dec 2009, at 12:42, Shalin Shekhar Mangar wrote:

2009/12/17 Steinar Asbjørnsen steinar...@gmail.com

Hi all. I have a delicate problem with two words that are typed almost the same way but have completely different meanings. The actual words are restaurant (as in restaurant) and restaurering (as in restoration). Solr seems to think these words are similar enough to present hits on both of them in the same search result. Obviously this is not desirable. Is there a way to take care of such specific cases without disabling Solr's stemming and/or plural handling? Or would I need to disable stemming to make this special case disappear?

For specific cases like this, you can add the word to a file and specify it in the schema, for example:

    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

Thanks, Shalin. This is my schema.xml field type:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

I added restaurant and restaurering to protwords.txt and restarted Tomcat, but no dice. Do I need to use the SnowballPorterFilterFactory? And do I need to reindex the documents?

Steinar
New solr-driven site
Hi all.

Just wanted to inform you that a new Solr-driven website is up and running, and to say thanks to you guys for helping out. A little info: Bedriftsøket.no (http://www.bedriftsoket.no/) is a Norwegian company catalogue using Solr and SolrNet for all search/faceting functionality. The main index, of three in total, contains ~800k documents.

I just added the site to http://wiki.apache.org/solr/PublicServers. I did this project as a consultant, and I really hope I can introduce Solr to future customers as well.

Regards,
Steinar
copyField from multiple fields into one
Hi all.

I'm currently working on setting up spelling-suggestion functionality. What I'd like is to put the values of two fields (keyword and name) into the spell field. Something like this (from schema.xml):

    <field name="spell" type="textSpell" indexed="true" stored="true" multiValued="true"/>
    ...
    <copyField source="keyword" dest="spell"/>
    <copyField source="name" dest="spell"/>

As far as I can see, I only get suggestions from the keyword field, and not from the name field. So my question is: is it possible to copy both keyword and name into the spell field?

Thanks,
Steinar
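For reference, the intended semantics of copyField can be sketched as follows: at index time, each source value is appended to the destination, so a multiValued destination should end up holding values from both sources. This is an illustrative model only (the dict stands in for a document; `apply_copy_fields` is a hypothetical name), not Solr's implementation.

```python
# Sketch of copyField semantics: every source value is appended to the
# destination field, so a multiValued "spell" field collects both sources.
# Field names mirror the schema snippet above; the dict stands in for a doc.

COPY_RULES = [("keyword", "spell"), ("name", "spell")]

def apply_copy_fields(doc):
    out = dict(doc)
    for source, dest in COPY_RULES:
        if source in doc:
            values = doc[source] if isinstance(doc[source], list) else [doc[source]]
            out.setdefault(dest, []).extend(values)
    return out

doc = {"keyword": ["pizza"], "name": "Mario's Restaurant"}
print(apply_copy_fields(doc)["spell"])  # -> ['pizza', "Mario's Restaurant"]
```

If the destination really does receive both sources at index time, the remaining suspect would be on the suggestion side (e.g. whether the spellcheck index was rebuilt after the schema change).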
Re: Only one usage of each socket address error
Hi.

This situation is still bugging me. I thought I had it fixed yesterday, but no... It seems to affect both deleting and adding, but I'll explain the delete situation here: when I'm deleting documents (~5k) from an index, I get an error message saying "Only one usage of each socket address (protocol/network address/port) is normally permitted 127.0.0.1:8983". I've tried both delete-by-id and delete-by-query, and both give me the same error. The commands producing the error are solr.Delete(id) and solr.Delete(new SolrQuery("id:" + id)). The command is issued with SolrNet, and I'm not sure if this is SolrNet- or Solr-related. I cannot find anything that helps me out in the Catalina log. Are there any other logs I should check?

I'm grateful for any pointers :)

Thanks,
Steinar

On 29 Sep 2009, at 11:15, Steinar Asbjørnsen wrote:

It seems the post in the SolrNet group, http://groups.google.com/group/solrnet/browse_thread/thread/7e3034b626d3e82d?pli=1, helped me get through. Thank you solr-users for helping out too!

Steinar

Forwarded message:
From: Steinar Asbjørnsen steinar...@gmail.com
Date: 28 September 2009, 17:07:15 GMT+02:00
To: solr-user@lucene.apache.org
Subject: Re: Only one usage of each socket address error

I'm using the add(MyObject) command in a foreach loop to add my objects to the index. In the Catalina log I cannot see anything that helps me out. It stops at:

    28.sep.2009 08:58:40 org.apache.solr.update.processor.LogUpdateProcessor finish
    INFO: {add=[12345]} 0 187
    28.sep.2009 08:58:40 org.apache.solr.core.SolrCore execute
    INFO: [core2] webapp=/solr path=/update params={} status=0 QTime=187

which indicates nothing wrong. Are there any other logs I should check? What it seems like to me at the moment is that the foreach loop is passing objects (documents) to Solr faster than Solr can add them to the index, so I'm eventually running out of connections (to Solr?) or something.

I'm running another incremental update with other objects where the foreach isn't quite as fast. That job has added over 100k documents without failing and is still going, whereas the problematic job fails after ~3k. What I've learned through the day, though, is that the index where my feed is failing is actually redundant, i.e. I'm off the hook for now. Still, I'd like to figure out what's going wrong.

Steinar

There's nothing in that output that indicates something we can help with over in solr-user land. What is the call you're making to Solr? Did Solr log anything anomalous?

Erik

On Sep 28, 2009, at 4:41 AM, Steinar Asbjørnsen wrote:

I just posted to the SolrNet group since I have the exact same(?) problem. Hope I'm not being rude posting here as well (since the SolrNet group doesn't seem as active as this mailing list). The problem occurs when I'm running an incremental feed (self-made) of an index. My post:

[snip]
What's happening is that I get this error message (in VS):

    A first chance exception of type 'SolrNet.Exceptions.SolrConnectionException' occurred in SolrNet.DLL

And the web browser (which I use to start the feed) says:

    System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

At the time of writing my index contains 15k docs, and lacks ~700k docs that the incremental feed should take care of adding to the index. The error message appears after 3k docs are added, and before 4k docs are added. I'm committing every 1000 documents (i % 1000 == 0). In addition, autocommit is set to:

    <autoCommit>
      <maxDocs>1</maxDocs>
    </autoCommit>

More info, from schema.xml:

    <field name="id" type="text" indexed="true" stored="true" required="true" />
    <field name="name" type="string" indexed="true" stored="true" required="false" />

I'm fetching data from a (remote) SQL Server 2008 using sqljdbc4.jar, and Solr is running on a local Tomcat installation.

SolrNet version: 0.2.3.0
Solr Specification Version: 1.3.0.2009.08.29.08.05.39
[/snip]

Any suggestions on how to fix this would be much appreciated.

Regards,
Steinar
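The "only one usage of each socket address" error is typically what ephemeral-port exhaustion looks like on Windows: one HTTP connection per document leaves thousands of sockets in TIME_WAIT. One common mitigation, independent of the client library, is to send documents in batches rather than one request each. The sketch below illustrates the batching pattern only; `send_batch` and `commit` are hypothetical stand-ins for real client calls, not SolrNet's API.

```python
# Sketch: sending documents in batches instead of one request per document,
# which can exhaust local ephemeral ports. `send_batch` and `commit` are
# hypothetical stand-ins for actual Solr client calls (e.g. POSTs to /update).

def feed(docs, batch_size, send_batch, commit):
    """Accumulate docs and flush every `batch_size`, committing per flush."""
    batch, sent = [], 0
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size:
            send_batch(batch)   # one HTTP request for the whole batch
            commit()
            sent += len(batch)
            batch = []
    if batch:                   # flush the remainder
        send_batch(batch)
        commit()
        sent += len(batch)
    return sent

calls = []
n = feed(range(2500), 1000, lambda b: calls.append(len(b)), lambda: None)
print(n, calls)  # -> 2500 [1000, 1000, 500]
```

With batch_size=1000 this turns 2500 documents into three requests instead of 2500, which matches the intuition in the thread that the tight foreach loop was opening connections faster than they could be recycled.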
Fwd: Only one usage of each socket address error
It seems the post in the SolrNet group, http://groups.google.com/group/solrnet/browse_thread/thread/7e3034b626d3e82d?pli=1, helped me get through. Thank you solr-users for helping out too!

Steinar

Forwarded message:
From: Steinar Asbjørnsen steinar...@gmail.com
Date: 28 September 2009, 17:07:15 GMT+02:00
To: solr-user@lucene.apache.org
Subject: Re: Only one usage of each socket address error

I'm using the add(MyObject) command in a foreach loop to add my objects to the index. In the Catalina log I cannot see anything that helps me out. It stops at:

    28.sep.2009 08:58:40 org.apache.solr.update.processor.LogUpdateProcessor finish
    INFO: {add=[12345]} 0 187
    28.sep.2009 08:58:40 org.apache.solr.core.SolrCore execute
    INFO: [core2] webapp=/solr path=/update params={} status=0 QTime=187

which indicates nothing wrong. Are there any other logs I should check? What it seems like to me at the moment is that the foreach loop is passing objects (documents) to Solr faster than Solr can add them to the index, so I'm eventually running out of connections (to Solr?) or something.

I'm running another incremental update with other objects where the foreach isn't quite as fast. That job has added over 100k documents without failing and is still going, whereas the problematic job fails after ~3k. What I've learned through the day, though, is that the index where my feed is failing is actually redundant, i.e. I'm off the hook for now. Still, I'd like to figure out what's going wrong.

Steinar

There's nothing in that output that indicates something we can help with over in solr-user land. What is the call you're making to Solr? Did Solr log anything anomalous?

Erik

On Sep 28, 2009, at 4:41 AM, Steinar Asbjørnsen wrote:

I just posted to the SolrNet group since I have the exact same(?) problem. Hope I'm not being rude posting here as well (since the SolrNet group doesn't seem as active as this mailing list).

The problem occurs when I'm running an incremental feed (self-made) of an index. My post:

[snip]
What's happening is that I get this error message (in VS):

    A first chance exception of type 'SolrNet.Exceptions.SolrConnectionException' occurred in SolrNet.DLL

And the web browser (which I use to start the feed) says:

    System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

At the time of writing my index contains 15k docs, and lacks ~700k docs that the incremental feed should take care of adding to the index. The error message appears after 3k docs are added, and before 4k docs are added. I'm committing every 1000 documents (i % 1000 == 0). In addition, autocommit is set to:

    <autoCommit>
      <maxDocs>1</maxDocs>
    </autoCommit>

More info, from schema.xml:

    <field name="id" type="text" indexed="true" stored="true" required="true" />
    <field name="name" type="string" indexed="true" stored="true" required="false" />

I'm fetching data from a (remote) SQL Server 2008 using sqljdbc4.jar, and Solr is running on a local Tomcat installation.

SolrNet version: 0.2.3.0
Solr Specification Version: 1.3.0.2009.08.29.08.05.39
[/snip]

Any suggestions on how to fix this would be much appreciated.

Regards,
Steinar
Re: Only one usage of each socket address error
I just posted to the SolrNet group since I have the exact same(?) problem. Hope I'm not being rude posting here as well (since the SolrNet group doesn't seem as active as this mailing list).

The problem occurs when I'm running an incremental feed (self-made) of an index. My post:

[snip]
What's happening is that I get this error message (in VS):

    A first chance exception of type 'SolrNet.Exceptions.SolrConnectionException' occurred in SolrNet.DLL

And the web browser (which I use to start the feed) says:

    System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

At the time of writing my index contains 15k docs, and lacks ~700k docs that the incremental feed should take care of adding to the index. The error message appears after 3k docs are added, and before 4k docs are added. I'm committing every 1000 documents (i % 1000 == 0). In addition, autocommit is set to:

    <autoCommit>
      <maxDocs>1</maxDocs>
    </autoCommit>

More info, from schema.xml:

    <field name="id" type="text" indexed="true" stored="true" required="true" />
    <field name="name" type="string" indexed="true" stored="true" required="false" />

I'm fetching data from a (remote) SQL Server 2008 using sqljdbc4.jar, and Solr is running on a local Tomcat installation.

SolrNet version: 0.2.3.0
Solr Specification Version: 1.3.0.2009.08.29.08.05.39
[/snip]

Any suggestions on how to fix this would be much appreciated.

Regards,
Steinar