Spellchecker index rebuild error
Lately I've been having issues with the spellchecker failing to properly rebuild my spell index. I used to be able to delete the spell directory and reload the core and build the index fine if it ever crapped out, but now I can't even build it.

java.io.FileNotFoundException: /home/dsteiger/solr/data/spell/_8c.cfs (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
    at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
    ...

Here's the query:

    /solr/dsteiger/select/?q=test&qt=spellchecker&cmd=rebuild

Here's my config snippet:

    <requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler" startup="lazy">
      <lst name="defaults">
        <int name="suggestionCount">1</int>
        <float name="accuracy">0.5</float>
      </lst>
      <str name="spellcheckerIndexDir">spell</str>
      <str name="termSourceField">spell</str>
    </requestHandler>

Anyone have any ideas?

Doug
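The rebuild request takes three parameters (q, qt, cmd). As a minimal sketch, this is one way to assemble them into a single URL so that each parameter is clearly separated. The host and port are assumptions; only the path and the parameter values come from the message.

```python
from urllib.parse import urlencode

# Assumed host and port; only the path /solr/dsteiger/select/ appears in the message.
BASE = "http://localhost:8983/solr/dsteiger/select/"


def rebuild_url(base=BASE):
    """Return the spellchecker rebuild URL with '&' between the parameters."""
    params = [("q", "test"), ("qt", "spellchecker"), ("cmd", "rebuild")]
    return base + "?" + urlencode(params)


print(rebuild_url())
```

Building the query string from a list of pairs avoids hand-typed separators, which are easy to drop when a URL is pasted into a mail or a script.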
field:(-null) returns records where field was not specified
Hi all,

We are indexing different types of documents, some with certain fields set and some without, and some fields sometimes in both. If a particular field is missing in a newly added record, I would have expected the query: field_name:(-null) not to return this particular record in the response, i.e., I'm assuming the field is set to null. But the response we see includes empty docs:

.. .. <doc></doc> <doc></doc> <doc></doc> etc, etc ..

Can someone explain why field_name:(-null) returns the records where field_name is missing? We note that if we do the range operation we can get a response without the records with no field_name: field_name:[* TO *]

Many thanks
Karen
Re: field:(-null) returns records where field was not specified
Have you seen this page? http://lucene.apache.org/java/docs/queryparsersyntax.html

From that page: Note: The NOT operator cannot be used with just one term. For example, the following search will return no results: NOT "jakarta apache"

Erick

On Jan 14, 2008 9:30 AM, Karen Loughran [EMAIL PROTECTED] wrote:
> Hi all, We are indexing different types of documents, some with certain fields set and some without, some fields sometimes in both. If a particular field is missing in a newly added record, I would have expected the query: field_name:(-null) not to return this particular record in the response, ie, I'm assuming the field is set to null. But the response we see includes empty docs: .. .. doc /doc doc /doc doc /doc etc, etc .. Can someone explain why field_name:(-null) returns the records where field_name is missing ? We note that if we do the range operation we can get a response without the records with no field_name: field_name:[* TO *] Many thanks Karen
Re: LNS - or - now i know we've succeeded
Yes, they are reputable. They've been doing consulting with Verity, Ultraseek, and other platforms for many years. --wunder On 1/12/08 1:22 AM, Chris Hostetter [EMAIL PROTECTED] wrote: It is pretty cool to see a reputable Search company (is ideaeng.com a reputable search consulting company?
batch indexing takes more time than shown on SOLR output -- something to do with IO?
I have a batch program which inserts items in a solr/lucene index. All is going fine and I get update messages in the console like:

14-jan-2008 16:40:52 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[10485, 10488, 10489, 10490, 10491, 10495, 10497, 10498, ...(42 more) ]} 0 875

However, when timing this instruction on the client side (I use SolrJ -- req.process(server)) I get totally different numbers: in the beginning the client-side measured time is about 2 seconds on average, but after some time it goes up to about 30-40 seconds, although the Solr-outputted time stays between 0.8 and 1.3 seconds.

Does this have anything to do with costly IO activity that is not accounted for in the Solr output? If this is true, what tool do you recommend using to monitor IO activity?

Thanks,
Geert-Jan
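To pin down how much time is spent outside the figure Solr reports, one option is to wrap the client call in a wall-clock timer and subtract the server-side number. The sketch below is hedged: it assumes the trailing 875 in the log line is the server-side elapsed time in milliseconds, and fake_update is a hypothetical stand-in for the real update request.

```python
import time


def timed(call):
    """Run call() and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = call()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def overhead_ms(client_ms, server_ms):
    """Time not covered by the server-reported figure (network, queuing, client work)."""
    return client_ms - server_ms


def fake_update():
    # Hypothetical stand-in for the real update request.
    time.sleep(0.05)
    return "ok"


result, client_ms = timed(fake_update)
print(result, client_ms >= 50.0 or "timer resolution varies")
print(overhead_ms(2000, 875))
```

Logging this difference per batch would show whether the growth from about 2 seconds to 30-40 seconds happens entirely outside the time Solr accounts for.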
Re: field:(-null) returns records where field was not specified
Hi Erik, thanks for your reply, I had read this page. But I'm not using the NOT operator, I'm using the - operator. I'm assuming there is a subtle difference between them in that NOT qualifies something else, hence needs 2 terms. Isn't the - operator supposed to be a complement to the + operator, ie. excludes something rather than requiring it ? thanks Karen On Monday 14 January 2008 15:14:05 Erick Erickson wrote: Have you seen this page? http://lucene.apache.org/java/docs/queryparsersyntax.html From that page: Note: The NOT operator cannot be used with just one term. For example, the following search will return no results: NOT jakarta apache Erick On Jan 14, 2008 9:30 AM, Karen Loughran [EMAIL PROTECTED] wrote: Hi all, We are indexing different types of documents, some with certain fields set and some without, some fields sometimes in both. If a particular field is missing in a newly added record, I would have expected the query: field_name:(-null) not to return this particular record in the response, ie, I'm assuming the field is set to null. But the response we see includes empty docs: .. .. doc /doc doc /doc doc /doc etc, etc .. Can someone explain why field_name:(-null) returns the records where field_name is missing ? We note that if we do the range operation we can get a response without the records with no field_name: field_name:[* TO *] Many thanks Karen
new to solr
Hello, I am new to Solr. I followed the Solr online tutorial to get the example working. The search result is XML. I wonder if there is a way to show the result in a form. I saw there is an example.xsl in the conf/xslt directory, but I really don't know how to use it. Does anyone have some ideas for me? I really appreciate it! Thanks, Xiaohui
Re: new to solr
Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
> Hello, I am new to solr.

Welcome!

> I followed solr online tutorial to get the example work. The search result is xml. I wonder if there is a way to show result in a form. I saw there is example.xsl in conf/xslt directory. I really don't know how to do it. Anyone has some ideas for me. I really appreciate it!

Are you asking how to display results for people to see? A nicely formatted website? Solr (a database) does not aim to solve the display side... but there are lots of clients to help integrate with your website: php/java/.net/ruby/etc

ryan
RE: new to solr
Thanks so much for your reply! Please tell me what example.xsl is for in conf/xslt. Please let me know where the search result is located. I can use php or .net to display the result in web. Is it created on fly? Thanks, Xiaohui -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 11:37 AM To: solr-user@lucene.apache.org Subject: Re: new to solr Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: Hello, I am new to solr. Welcome! I followed solr online tutorial to get the example work. The search result is xml. I wonder if there is a way to show result in a form. I saw there is example.xsl in conf/xslt directory. I really don't know how to do it. Anyone has some ideas for me. I really appreciate it! Are you asking how to display results for people to see? A nicely formatted website? Solr (a database) does not aim to solve the display side... but there are lots of clients to help integrate with your website. php/java/.net/ruby/etc ryan
Re: new to solr
the example.xsl is an example using XSLT to format results. Check: http://wiki.apache.org/solr/XsltResponseWriter For php, check: http://wiki.apache.org/solr/SolPHP ryan Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: Thanks so much for your reply! Please tell me what example.xsl is for in conf/xslt. Please let me know where the search result is located. I can use php or .net to display the result in web. Is it created on fly? Thanks, Xiaohui -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 11:37 AM To: solr-user@lucene.apache.org Subject: Re: new to solr Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: Hello, I am new to solr. Welcome! I followed solr online tutorial to get the example work. The search result is xml. I wonder if there is a way to show result in a form. I saw there is example.xsl in conf/xslt directory. I really don't know how to do it. Anyone has some ideas for me. I really appreciate it! Are you asking how to display results for people to see? A nicely formatted website? Solr (a database) does not aim to solve the display side... but there are lots of clients to help integrate with your website. php/java/.net/ruby/etc ryan
RE: new to solr
Thanks very much, Ryan. I really appreciate it. I will take a look on both. Best regards, Xiaohui -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 11:56 AM To: solr-user@lucene.apache.org Subject: Re: new to solr the example.xsl is an example using XSLT to format results. Check: http://wiki.apache.org/solr/XsltResponseWriter For php, check: http://wiki.apache.org/solr/SolPHP ryan Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: Thanks so much for your reply! Please tell me what example.xsl is for in conf/xslt. Please let me know where the search result is located. I can use php or .net to display the result in web. Is it created on fly? Thanks, Xiaohui -Original Message- From: Ryan McKinley [mailto:[EMAIL PROTECTED] Sent: Monday, January 14, 2008 11:37 AM To: solr-user@lucene.apache.org Subject: Re: new to solr Ma, Xiaohui (NIH/NLM/LHC) [C] wrote: Hello, I am new to solr. Welcome! I followed solr online tutorial to get the example work. The search result is xml. I wonder if there is a way to show result in a form. I saw there is example.xsl in conf/xslt directory. I really don't know how to do it. Anyone has some ideas for me. I really appreciate it! Are you asking how to display results for people to see? A nicely formatted website? Solr (a database) does not aim to solve the display side... but there are lots of clients to help integrate with your website. php/java/.net/ruby/etc ryan
Re: new to solr
On Jan 14, 2008 11:55 AM, Ryan McKinley [EMAIL PROTECTED] wrote: the example.xsl is an example using XSLT to format results. Check: http://wiki.apache.org/solr/XsltResponseWriter To add to the above: I think the XsltResponseWriter is not intended for formatting results for display on your web site. Normally you would use your server-side language (PHP, Python, etc.) to query the Solr server, get the results, and format them for display. Solr doesn't provide the front-end search interface for your web site -- you have to create that yourself. -Stuart altlaw.org
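The "query Solr, get the results, format them for display" flow described here can be sketched in a few lines. This is only an illustration: the sample response and its field names (id, title) are assumptions, and a real front end would fetch the XML over HTTP instead of using a literal string.

```python
import html
import xml.etree.ElementTree as ET

# Assumed sample of a Solr XML response; the field names are hypothetical.
SAMPLE = """<response>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">1</str><str name="title">Solr &amp; Lucene</str></doc>
    <doc><str name="id">2</str><str name="title">Search basics</str></doc>
  </result>
</response>"""


def parse_docs(xml_text):
    """Return each <doc> element as a dict mapping field name to text."""
    root = ET.fromstring(xml_text)
    docs = []
    for doc in root.iter("doc"):
        docs.append({field.get("name"): (field.text or "") for field in doc})
    return docs


def render_html(docs):
    """Format the parsed docs as a simple HTML list, escaping field values."""
    items = ["<li>%s</li>" % html.escape(d.get("title", "")) for d in docs]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(render_html(parse_docs(SAMPLE)))
```

Keeping parsing and rendering separate means the display layer can change (templates, paging, styling) without touching the code that talks to Solr.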
Re: Documents with One-to-many
On Jan 11, 2008 10:44 AM, Evgeniy Strokin [EMAIL PROTECTED] wrote: Hello. If I need documents which has number of fields but also I have number of other documents which related to the first one one-to-many. For example a person, could have several addresses. I want to have all of them in search result if I look for people. Also I want to search people by address. How it could be done in Solr? It may be easier to perform this type of query in a relational database. With Solr, I think you would have to copy all of the many fields into a single field in your one document. So, a person document would have a single address field containing all the addresses for that person. -Stuart altlaw.org
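One way to read "a single address field containing all the addresses" is a multiValued field that repeats once per address. The sketch below builds a Solr XML add message along those lines; the field names (id, name, address) are hypothetical, and it assumes the schema declares address as multiValued.

```python
import xml.etree.ElementTree as ET


def person_to_add_xml(person_id, name, addresses):
    """Build an <add> message with one <field name="address"> per address."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for field_name, value in [("id", person_id), ("name", name)]:
        ET.SubElement(doc, "field", name=field_name).text = value
    # Flatten the one-to-many relation: repeat the same field for every address.
    for address in addresses:
        ET.SubElement(doc, "field", name="address").text = address
    return ET.tostring(add, encoding="unicode")


print(person_to_add_xml("p1", "Jane Doe",
                        ["1 Main St, Springfield", "9 Oak Ave, Shelbyville"]))
```

With this layout a search on the address field matches the person document if any of the addresses matches, and all addresses come back with the person when the field is stored.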
Re: Spellchecker index rebuild error
I haven't looked at the Spellchecker in a while, but it sounds like you are deleting the index files manually. Any reason for that? Shouldn't that rebuild command run smoothly even with a pre-existing index there (funny that I ask this, considering this was my doing). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Doug Steigerwald [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Monday, January 14, 2008 8:31:06 AM Subject: Spellchecker index rebuild error Lately I've been having issues with the spellchecker failing to properly rebuild my spell index. I used to be able to delete the spell directory and reload the core and build the index fine if it ever crapped out, but now I can't even build it. java.io.FileNotFoundException: /home/dsteiger/solr/data/spell/_8c.cfs (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.init(RandomAccessFile.java:212) at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.init(FSDirectory.java:506) at org.apache.lucene.store.FSDirectory$FSIndexInput.init(FSDirectory.java:536) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445) at org.apache.lucene.index.CompoundFileReader.init(CompoundFileReader.java:70) at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167) ... Here's the query: /solr/dsteiger/select/?q=testqt=spellcheckercmd=rebuild Here's my config snippet: requestHandler name=spellchecker class=solr.SpellCheckerRequestHandler startup=lazy lst name=defaults int name=suggestionCount1/int float name=accuracy0.5/float /lst str name=spellcheckerIndexDirspell/str str name=termSourceFieldspell/str /requestHandler Anyone have any ideas? Doug
Text Summarizer
Hi! I'm looking for a good text summarizer for my personal search engine based on Solr. Currently I'm using ots (Open Text Summarizer), but the result is far from perfect. Here's an example of usage:

$ elinks "http://lucene.apache.org/solr/" -force-html -no-numbering \
    -no-references 2>/dev/null | ots -r 40 | less -S

The result is OK for this site, but I would like to obtain something similar to a Google text snippet (a real excerpt). Advice is welcome.

N.B.: all the HTML pages I'm indexing are converted to text with elinks (the text browser) as in the previous example.

Thanks in advance.
cheers
Younès
MoreLikeThis similarity field boosting
Hello. I'm using Solr for searching our system, and MoreLikeThis for related-content searching. The URL currently used for the search looks like this:

http://localhost:8983/solr/mlt?q=nid:7280&mlt=true&mlt.fl=title,teaser,body&mlt.mindf=1&mlt.mintf=1&fl=nid,title,score

where nid is the uniqueKey and title, teaser, body are stored fields with multiValued set to true.

The question is: is it possible to boost terms for one or more similarity fields? For example, I'd like something like mlt.fl=title^3,teaser^10,body so that terms from teaser have the highest weight, then title terms, with the lowest weight for body terms.

Thanks.
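A hedged sketch of how such a request could be assembled. The base parameters are the ones given in the message. The optional mlt.qf parameter (per-field boosts in field^boost form) is documented for MoreLikeThis in later Solr releases; whether it is available in the version discussed here is an assumption the thread does not confirm.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/mlt"  # handler URL as given in the message


def mlt_url(nid, boosts=None):
    """Build a MoreLikeThis URL; boosts is an optional list of (field, boost)."""
    params = [
        ("q", "nid:%d" % nid),
        ("mlt", "true"),
        ("mlt.fl", "title,teaser,body"),
        ("mlt.mindf", "1"),
        ("mlt.mintf", "1"),
        ("fl", "nid,title,score"),
    ]
    if boosts:
        # Assumption: the Solr version in use understands mlt.qf.
        qf = " ".join("%s^%s" % (field, boost) for field, boost in boosts)
        params.append(("mlt.qf", qf))
    # Keep ':', ',' and '^' readable; spaces are encoded as '+'.
    return BASE + "?" + urlencode(params, safe=":,^")


print(mlt_url(7280))
print(mlt_url(7280, [("title", 3), ("teaser", 10), ("body", 1)]))
```

If the parameter is not supported by the running version, the first form still reproduces the original request unchanged.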
Re: Text Summarizer
Hi Otis, Don't know really what's the name for that. cheers Y. Otis Gospodnetic a écrit : Sounds like you are looking for a highlighter/KWIC, not a summarizer? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Ycrux [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Monday, January 14, 2008 2:45:09 PM Subject: Text Summarizer Hi! I'm looking for a good way to get a good text summarizer for my personal search engine based Solr. Actually, I'm using ots (Open Text Summurizer) but the result is far from perfection. Here's an example of usage: $ elinks http://lucene.apache.org/solr/; -force-html -no-numbering \ -no-references 2/dev/null | ots -r 40 | less -S The result is OK for this site, but I would like to obtain something similar to google text snippet (a real excerpt). Advices are welcome? N.B: all the HTML pages I'm indexing are converted to text with elinks (the text browser) like in the previous example. Thanks in adavance. cheers Younès
Re: Text Summarizer
Sounds like you are looking for a highlighter/KWIC, not a summarizer? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Ycrux [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Monday, January 14, 2008 2:45:09 PM Subject: Text Summarizer Hi! I'm looking for a good way to get a good text summarizer for my personal search engine based Solr. Actually, I'm using ots (Open Text Summurizer) but the result is far from perfection. Here's an example of usage: $ elinks http://lucene.apache.org/solr/; -force-html -no-numbering \ -no-references 2/dev/null | ots -r 40 | less -S The result is OK for this site, but I would like to obtain something similar to google text snippet (a real excerpt). Advices are welcome? N.B: all the HTML pages I'm indexing are converted to text with elinks (the text browser) like in the previous example. Thanks in adavance. cheers Younès
unique ID question
If I make one of my fields the unique ID, it doesn't increase or decrease the performance of searching by this field, right? For example, if I have two fields that I know for sure are both unique and both of the same type, and I make one of them the Solr unique ID, the general performance should be the same whether I retrieve a document by the first field or by the second. Am I correct? Any general ideas or comments on this topic would be helpful to better understand how the unique ID works. Thank you Gene
Re: unique ID question
Evgeniy Strokin wrote:
> If I make one of my field as a unique ID, id doesn't increase/decrease performance of searching by this field. Right? For example if I have two fields, I know for sure both of them are unique, both the same type, and make one of them as a Solr Unique ID. The general performance should be the same if I want to retrieve a document by first field or by the second. Am I correct? Any general ideas or comments on this topic would be helpful to better understand how unique ID works.

correct - search performance only depends on the lucene index characteristics. The field you declare as:

    <uniqueKey>id</uniqueKey>

is just a marker to solr to say what field it should use to check if the document overwrites another one. From the searching side, there is nothing special about the uniqueKey field; it is only for /update that it gets used.

ryan
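The distinction can be illustrated with a toy model. This is not Solr's implementation, only a sketch of the behaviour described: the unique key is consulted when a document is added (to overwrite an earlier one), while searching treats every field the same way. The field names are hypothetical.

```python
class ToyIndex:
    """Toy model of an index where the unique key matters only on add."""

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.docs = []

    def add(self, doc):
        # Overwrite: drop any earlier document with the same unique key value.
        key = doc[self.unique_key]
        self.docs = [d for d in self.docs if d[self.unique_key] != key]
        self.docs.append(doc)

    def search(self, field, value):
        # Searching does not care which field is the unique key.
        return [d for d in self.docs if d.get(field) == value]


index = ToyIndex(unique_key="id")
index.add({"id": "1", "sku": "A", "title": "first"})
index.add({"id": "1", "sku": "A", "title": "second"})  # replaces the first doc
index.add({"id": "2", "sku": "A", "title": "third"})   # same sku, still kept

print(len(index.docs))
print([d["title"] for d in index.search("sku", "A")])
```

Only the field named as the unique key triggers replacement; a second field that happens to be unique in the data gets no such enforcement.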
index out of disk space, CorruptIndexException
We had an index run out of disk space. Queries work fine but commits return a 500 error:

doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:191)

I've made room, restarted resin, and now solr won't start. No useful messages in the startup, just:

[21:01:49.105] Could not start SOLR. Check solr/home property
[21:01:49.105] java.lang.NullPointerException
[21:01:49.105]     at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:100)

What can I do from here?
Re: index out of disk space, CorruptIndexException
On Jan 14, 2008, at 4:08 PM, Ryan McKinley wrote:
> ug -- maybe someone else has better ideas, but you can try:
> http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/index/CheckIndex.java

Thanks for the tip. I did run that, but I stopped it 30 minutes in, as it was still on the first (out of 46) segment. The index is (was) 129GB. I just restored to an older index and made this ticket: https://issues.apache.org/jira/browse/SOLR-455
Re: Text Summarizer
See http://wiki.apache.org/solr/HighlightingParameters . The default behaviour will provide snippets like google does. Note that you need to store the text of fields you want to highlight for this to work. cheers, -Mike On 14-Jan-08, at 2:17 PM, Ycrux wrote: Maybe the right name is Snippet. Like Google snippets. cheers Y. Otis Gospodnetic a écrit : Sounds like you are looking for a highlighter/KWIC, not a summarizer? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Ycrux [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Monday, January 14, 2008 2:45:09 PM Subject: Text Summarizer Hi! I'm looking for a good way to get a good text summarizer for my personal search engine based Solr. Actually, I'm using ots (Open Text Summurizer) but the result is far from perfection. Here's an example of usage: $ elinks http://lucene.apache.org/solr/; -force-html -no-numbering \ -no-references 2/dev/null | ots -r 40 | less -S The result is OK for this site, but I would like to obtain something similar to google text snippet (a real excerpt). Advices are welcome? N.B: all the HTML pages I'm indexing are converted to text with elinks (the text browser) like in the previous example. Thanks in adavance. cheers Younès
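As a rough sketch of what a highlighting request could look like, the snippet below builds a query URL with the hl, hl.fl, hl.snippets and hl.fragsize parameters from the HighlightingParameters page. The host, handler path and the field name "content" are assumptions, and the field must be stored for snippets to come back.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select"  # assumed host and handler path


def highlight_url(query, field, snippets=1, fragsize=100):
    """Build a search URL that asks Solr for highlighted snippets of one field."""
    params = [
        ("q", query),
        ("hl", "true"),                 # turn highlighting on
        ("hl.fl", field),               # field to take snippets from (must be stored)
        ("hl.snippets", str(snippets)), # snippets per field
        ("hl.fragsize", str(fragsize)), # approximate snippet size in characters
    ]
    return BASE + "?" + urlencode(params)


print(highlight_url("solr", "content"))
```

The snippets are returned in a separate highlighting section of the response, keyed by document, so the front end still has to pair them with the matching results.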
RE: field:(-null) returns records where field was not specified
Several things in this thread should be clarified (note: order of quotations munged for clarity)...

: I had read this page. But I'm not using the NOT operator, I'm using the
: - operator. I'm assuming there is a subtle difference between them in
: that NOT qualifies something else, hence needs 2 terms. Isn't the -
: operator supposed to be a complement to the + operator, ie. excludes
: something rather than requiring it ?

The NOT operator and the - operator are in fact the same thing ... the duplicate syntax comes from Lucene trying to appease people who want boolean-style operator syntax (AND/OR/NOT) even though the query parser is not a boolean syntax.

: Have you seen this page?
: http://lucene.apache.org/java/docs/queryparsersyntax.html
:
: From that page:
: Note: The NOT operator cannot be used with just one term. For example,
: the following search will return no results:
: NOT "jakarta apache"

In Solr, the query parser can in fact support purely negative queries by internally transforming the query; this is noted on the Solr query syntax wiki...
http://wiki.apache.org/solr/SolrQuerySyntax

: field_name:(-null)

null is not a special keyword. If you look at the debugging output when doing that query you'll see that it is the same as: -field_name:null ... which is a search for all docs that do not contain the string null in the field field_name. Docs with no value for the field do not contain it, so they match.

: The *:* (star colon star) means all records. The trick is to use (*:* AND
: -field:[* TO *]). It's silly, but there it is.

As I mentioned, you can do purely negative queries now, so a simple search for -field_name:[* TO *] will find all docs that have no indexed values for that field at all.

: A performance note: we switched from empty fields to fields with a standard
: 'empty' value. This way we don't have to do a range check to find records
: with empty fields.

Your mileage may vary depending on how many docs you have with no value ... this also isn't practical when dealing with numeric, boolean, or date based fields. (And depending on how much churn there is in your index, the filterCache can probably make the difference negligible on average anyway.)

-Hoss
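The query forms discussed in this thread can be collected into a few helpers. This is a small sketch: the query strings themselves come from the messages above, while the helper names and the base URL are assumptions made for illustration.

```python
from urllib.parse import urlencode


def has_value(field):
    """Docs that have at least one indexed value for the field."""
    return "%s:[* TO *]" % field


def missing_value(field):
    """Purely negative form: docs with no indexed value for the field."""
    return "-%s:[* TO *]" % field


def missing_value_explicit(field):
    """Older workaround: start from all docs, then subtract those with a value."""
    return "*:* AND -%s:[* TO *]" % field


def select_url(query, base="http://localhost:8983/solr/select"):
    """URL-encode the query string; the base URL is an assumption."""
    return base + "?" + urlencode({"q": query})


for q in (has_value("field_name"),
          missing_value("field_name"),
          missing_value_explicit("field_name")):
    print(q, "->", select_url(q))
```

Note that none of these uses the word null: as explained above, null is matched as an ordinary term, so the range form is the one that distinguishes set from missing fields.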