Spellchecker index rebuild error

2008-01-14 Thread Doug Steigerwald
Lately I've been having issues with the spellchecker failing to properly rebuild my spell index.  I 
used to be able to delete the spell directory and reload the core and build the index fine if it 
ever crapped out, but now I can't even build it.


java.io.FileNotFoundException: /home/dsteiger/solr/data/spell/_8c.cfs (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSDirectory$FSIndexInput$Descriptor.<init>(FSDirectory.java:506)
    at org.apache.lucene.store.FSDirectory$FSIndexInput.<init>(FSDirectory.java:536)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:445)
    at org.apache.lucene.index.CompoundFileReader.<init>(CompoundFileReader.java:70)
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:181)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:167)
...

Here's the query: /solr/dsteiger/select/?q=test&qt=spellchecker&cmd=rebuild
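
For anyone reproducing this, the request can be assembled programmatically so the parameters are separated and escaped correctly. This is only a minimal sketch: the core path /solr/dsteiger and the three parameters come from the query above, while the host and port are placeholders that are not in the original message.

```python
from urllib.parse import urlencode


def spellcheck_rebuild_url(base="http://localhost:8983/solr/dsteiger"):
    """Build the spellchecker rebuild request URL.

    `base` is an assumed host and port plus the core path from the message.
    """
    params = [("q", "test"), ("qt", "spellchecker"), ("cmd", "rebuild")]
    return base + "/select/?" + urlencode(params)


print(spellcheck_rebuild_url())
```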

Here's my config snippet:

<requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler" startup="lazy">
  <lst name="defaults">
    <int name="suggestionCount">1</int>
    <float name="accuracy">0.5</float>
  </lst>
  <str name="spellcheckerIndexDir">spell</str>
  <str name="termSourceField">spell</str>
</requestHandler>

Anyone have any ideas?

Doug


field:(-null) returns records where field was not specified

2008-01-14 Thread Karen Loughran


Hi all,

We are indexing different types of documents, some with certain fields set and 
some without, some fields sometimes in both.

If a particular field is missing in a newly added record, I would have 
expected the query:

field_name:(-null)

not to return this particular record in the response, ie, I'm assuming the 
field is set to null.

But the response we see includes empty docs:

..

..
<doc>
</doc>
<doc>
</doc>
<doc>
</doc>
etc, etc
..


Can someone explain why field_name:(-null) returns the records where 
field_name is missing ?

We note that if we do the range operation we can get a response without the 
records with no field_name:

field_name:[* TO *]

Many thanks
Karen


Re: field:(-null) returns records where field was not specified

2008-01-14 Thread Erick Erickson
Have you seen this page?
http://lucene.apache.org/java/docs/queryparsersyntax.html

From that page:
Note: The NOT operator cannot be used with just one term. For example, the
following search will return no results:
NOT "jakarta apache"


Erick





Re: LNS - or - now i know we've succeeded

2008-01-14 Thread Walter Underwood
Yes, they are reputable. They've been doing consulting with Verity,
Ultraseek, and other platforms for many years.  --wunder

On 1/12/08 1:22 AM, Chris Hostetter [EMAIL PROTECTED] wrote:

> It is pretty cool to see a reputable
> Search company (is ideaeng.com a reputable search consulting company?



batch indexing takes more time than shown on SOLR output -- something to do with IO?

2008-01-14 Thread Britske

I have a batch program which inserts items in a solr/lucene index. 
all is going fine and I get update messages in the console like: 

14-jan-2008 16:40:52 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[10485, 10488, 10489, 10490, 10491, 10495, 10497, 10498, ...(42 more)]} 0 875

However, when I time this call on the client side (I use SolrJ --
req.process(server)), I get totally different numbers. In the beginning the
client-side time is about 2 seconds on average, but after a while it goes up
to about 30-40 seconds, although the time Solr reports stays between
0.8 and 1.3 seconds.

Does this have anything to do with costly IO activity that is not accounted
for in the Solr output? If so, what tool do you recommend for monitoring
IO activity?
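
One way to narrow this down is to log the client-side wall-clock time next to the server-reported time for each batch and watch how the gap grows. The sketch below assumes the final number in the log line (875) is the server-side elapsed time in milliseconds, which is how it is read above. `timed_call`, `unaccounted_ms` and `fake_batch` are hypothetical names; the stand-in request just sleeps instead of talking to Solr.

```python
import time


def timed_call(send_batch):
    """Run send_batch() and return (result, elapsed_ms) measured on the client."""
    start = time.perf_counter()
    result = send_batch()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms


def unaccounted_ms(client_ms, server_ms):
    """Time the server log does not cover: transfer, serialization, queuing, etc."""
    return max(0.0, client_ms - server_ms)


if __name__ == "__main__":
    # Stand-in for the real update request: takes ~50 ms, "server" reports 10 ms.
    def fake_batch():
        time.sleep(0.05)
        return {"server_ms": 10}

    result, client_ms = timed_call(fake_batch)
    gap = unaccounted_ms(client_ms, result["server_ms"])
    print(f"client={client_ms:.0f} ms server={result['server_ms']} ms gap={gap:.0f} ms")
```

If the gap grows while the server number stays flat, the extra time is being spent somewhere the update log does not measure.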

Thanks, 
Geert-Jan 



Re: field:(-null) returns records where field was not specified

2008-01-14 Thread Karen Loughran

Hi Erick, thanks for your reply,

I had read this page.  But I'm not using the NOT operator, I'm using
the - operator.  I'm assuming there is a subtle difference between them, in
that NOT qualifies something else and hence needs 2 terms.  Isn't the -
operator supposed to be the complement of the + operator, i.e. it excludes
something rather than requiring it?

thanks
Karen







new to solr

2008-01-14 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Hello,

I am new to Solr. I followed the online Solr tutorial and got the example
working. The search result is XML. I wonder if there is a way to show the
result in a form. I saw that there is an example.xsl in the conf/xslt
directory, but I really don't know how to use it. Does anyone have some
ideas for me? I would really appreciate it!

Thanks,
Xiaohui 


Re: new to solr

2008-01-14 Thread Ryan McKinley

Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:
> Hello,
>
> I am new to solr.

Welcome!

> I followed solr online tutorial to get the example
> work. The search result is xml. I wonder if there is a way to show
> result in a form. I saw there is example.xsl in conf/xslt directory. I
> really don't know how to do it. Anyone has some ideas for me. I really
> appreciate it!

Are you asking how to display results for people to see?  A nicely
formatted website?

Solr (a database) does not aim to solve the display side... but there
are lots of clients to help integrate with your website.
php/java/.net/ruby/etc

ryan





RE: new to solr

2008-01-14 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks so much for your reply! Please tell me what example.xsl in
conf/xslt is for.

Please let me know where the search result is located. I can use PHP or
.NET to display the result on the web. Is it created on the fly?

Thanks,
Xiaohui 






Re: new to solr

2008-01-14 Thread Ryan McKinley

the example.xsl is an example using XSLT to format results.  Check:
http://wiki.apache.org/solr/XsltResponseWriter

For php, check:
http://wiki.apache.org/solr/SolPHP

ryan











RE: new to solr

2008-01-14 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thanks very much, Ryan. I really appreciate it. I will take a look at
both.

Best regards,
Xiaohui 

 
 
 
 



Re: new to solr

2008-01-14 Thread Stuart Sierra
On Jan 14, 2008 11:55 AM, Ryan McKinley [EMAIL PROTECTED] wrote:
> the example.xsl is an example using XSLT to format results.  Check:
> http://wiki.apache.org/solr/XsltResponseWriter

To add to the above: I think the XsltResponseWriter is not intended
for formatting results for display on your web site.  Normally you
would use your server-side language (PHP, Python, etc.) to query the
Solr server, get the results, and format them for display.  Solr
doesn't provide the front-end search interface for your web site --
you have to create that yourself.
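
As a rough illustration of that flow, the sketch below parses a Solr XML response and turns it into an HTML list. It is not a recommended client. The sample response is hand-written in the standard XML response shape, and the field names `id` and `name` are assumptions rather than anything from this thread. In a real page the XML would come from an HTTP request to the Solr server.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hand-written sample in the shape of a Solr XML response (assumed fields).
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">1</str><str name="name">First &amp; foremost</str></doc>
    <doc><str name="id">2</str><str name="name">Second</str></doc>
  </result>
</response>"""


def docs_from_response(xml_text):
    """Return each <doc> as a dict mapping field name to its text."""
    root = ET.fromstring(xml_text)
    return [{f.get("name"): (f.text or "") for f in doc} for doc in root.iter("doc")]


def render_html(docs):
    """Format the documents as an HTML list, escaping field values."""
    items = "".join(
        f"<li>{escape(d.get('name', ''))} (id {escape(d.get('id', ''))})</li>"
        for d in docs
    )
    return f"<ul>{items}</ul>"


print(render_html(docs_from_response(SAMPLE_RESPONSE)))
```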

-Stuart
altlaw.org


Re: Documents with One-to-many

2008-01-14 Thread Stuart Sierra
On Jan 11, 2008 10:44 AM, Evgeniy Strokin [EMAIL PROTECTED] wrote:
> Hello. If I need documents which has number of fields but also I have number
> of other documents which related to the first one one-to-many. For example a
> person, could have several addresses. I want to have all of them in search
> result if I look for people. Also I want to search people by address.
> How it could be done in Solr?

It may be easier to perform this type of query in a relational
database.  With Solr, I think you would have to copy all of the many
fields into a single field in your one document.  So, a person
document would have a single address field containing all the
addresses for that person.
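
One possible reading of this suggestion is a person document that repeats the address field once per address. The sketch below builds such an update message in Solr's XML add format. It assumes the schema declares `address` as a multiValued field, and the field names `id`, `name` and `address` are hypothetical.

```python
import xml.etree.ElementTree as ET


def person_add_xml(person_id, name, addresses):
    """Build an <add> message with one <field name="address"> per address."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(field_name, value):
        el = ET.SubElement(doc, "field", name=field_name)
        el.text = value

    field("id", person_id)
    field("name", name)
    for addr in addresses:
        # Assumes `address` is declared multiValued in schema.xml.
        field("address", addr)
    return ET.tostring(add, encoding="unicode")


print(person_add_xml("p1", "Ann", ["1 Main St", "2 Oak Ave"]))
```

Searching on the address field then matches the person if any of the stored addresses match, and the stored values come back together with the person document.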

-Stuart
altlaw.org


Re: Spellchecker index rebuild error

2008-01-14 Thread Otis Gospodnetic
I haven't looked at the Spellchecker in a while, but it sounds like you are 
deleting the index files manually.  Any reason for that?  Shouldn't that 
rebuild command run smoothly even with a pre-existing index there (funny that I 
ask this, considering this was my doing).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Text Summarizer

2008-01-14 Thread Ycrux

Hi!

I'm looking for a good text summarizer
for my personal search engine based on Solr.

Currently I'm using ots (Open Text Summarizer), but the result
is far from perfect.

Here's an example of usage:
$ elinks "http://lucene.apache.org/solr/" -force-html -no-numbering \
-no-references 2>/dev/null | ots -r 40 | less -S

The result is OK for this site, but I would like to obtain something
similar to a Google text snippet (a real excerpt).

Any advice is welcome.

N.B.: all the HTML pages I'm indexing are converted to text with elinks
(the text browser), as in the previous example.

Thanks in advance.

cheers
Younès


MoreLikeThis similarity field boosting

2008-01-14 Thread Vladimir Garvardt

Hello.

I'm using Solr for search in our system, and MoreLikeThis to find
related content.
The URL currently used for the search looks like this:
http://localhost:8983/solr/mlt?q=nid:7280&mlt=true&mlt.fl=title,teaser,body&mlt.mindf=1&mlt.mintf=1&fl=nid,title,score
where nid is the uniqueKey and title, teaser, and body are stored fields
with multiValued set to true.


The question is:
Is it possible to boost terms for one or more similarity fields?
For example, I'd like something like mlt.fl=title^3,teaser^10,body, so
that terms from teaser have the highest weight, then terms from title,
and terms from body the lowest.


Thanks.


Re: Text Summarizer

2008-01-14 Thread Ycrux

Hi Otis,

I don't really know what it's called.

cheers
Y.

Otis Gospodnetic wrote:
> Sounds like you are looking for a highlighter/KWIC, not a summarizer?
>
> Otis





 

  




Re: Text Summarizer

2008-01-14 Thread Otis Gospodnetic
Sounds like you are looking for a highlighter/KWIC, not a summarizer?

Otis 

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






unique ID question

2008-01-14 Thread Evgeniy Strokin
If I make one of my fields the unique ID, it doesn't increase or decrease
the performance of searching by this field, right?
For example, if I have two fields of the same type, both of which I know for
sure are unique, and I make one of them the Solr unique ID, the general
performance should be the same whether I retrieve a document by the first
field or by the second.
Am I correct? Any general ideas or comments on this topic would be helpful
for better understanding how the unique ID works.
 
Thank you
Gene

Re: unique ID question

2008-01-14 Thread Ryan McKinley

 


Correct - search performance depends only on the Lucene index
characteristics.

The field you declare as <uniqueKey>id</uniqueKey> is just a marker that
tells Solr which field to use when checking whether a document overwrites
another one.

From the searching side, there is nothing special about the uniqueKey
field; it is only used for /update.
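
A toy model of that behaviour is sketched below. It is purely illustrative and does not reflect how Solr or Lucene are implemented, and `TinyIndex` is a hypothetical name. The unique key only matters in `add`, where it decides whether an earlier document is replaced. `search` treats every field the same way.

```python
class TinyIndex:
    """Toy index: the unique key is consulted only when adding documents."""

    def __init__(self, unique_key="id"):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # A document with an already-seen key overwrites the earlier one.
        self.docs[doc[self.unique_key]] = doc

    def search(self, field, value):
        # Same lookup path for every field; the unique key gets no shortcut.
        return [d for d in self.docs.values() if d.get(field) == value]


index = TinyIndex()
index.add({"id": "1", "sku": "A-100", "title": "old"})
index.add({"id": "1", "sku": "A-100", "title": "new"})  # overwrites the first
index.add({"id": "2", "sku": "B-200", "title": "other"})
print(len(index.docs), index.search("sku", "A-100"))
```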


ryan


index out of disk space, CorruptIndexException

2008-01-14 Thread Brian Whitman
We had an index run out of disk space. Queries work fine but commits
return

500 doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212

org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _18lu: fieldsReader shows 104 but segmentInfo shows 212
    at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:191)


I've made room, restarted resin, and now solr won't start. No useful
messages in the startup, just a

[21:01:49.105] Could not start SOLR. Check solr/home property
[21:01:49.105] java.lang.NullPointerException
[21:01:49.105]  at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:100)


What can I do from here?







Re: index out of disk space, CorruptIndexException

2008-01-14 Thread Brian Whitman


On Jan 14, 2008, at 4:08 PM, Ryan McKinley wrote:
> ug -- maybe someone else has better ideas, but you can try:
> http://svn.apache.org/repos/asf/lucene/java/trunk/src/java/org/apache/lucene/index/CheckIndex.java

Thanks for the tip. I did run that, but I stopped it 30 minutes in, as
it was still on the first (out of 46) segment. The index is (was)
129GB.

I just restored to an older index and made this ticket, 
https://issues.apache.org/jira/browse/SOLR-455





Re: Text Summarizer

2008-01-14 Thread Mike Klaas
See http://wiki.apache.org/solr/HighlightingParameters .  The default  
behaviour will provide snippets like google does.


Note that you need to store the text of fields you want to  
highlight for this to work.
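
A minimal sketch of a query that asks for snippets, using the `hl` and `hl.fl` parameters described on that wiki page. The host, port, and the field name `text` are placeholders, and the field must be stored, as noted above.

```python
from urllib.parse import urlencode


def highlight_query_url(query, field, base="http://localhost:8983/solr"):
    """Build a select URL that requests highlighted snippets for `field`.

    `base` and the field name are assumptions; hl and hl.fl are the
    highlighting parameters from the HighlightingParameters page.
    """
    params = [("q", query), ("hl", "true"), ("hl.fl", field)]
    return base + "/select?" + urlencode(params)


print(highlight_query_url("solr", "text"))
```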


cheers,
-Mike

On 14-Jan-08, at 2:17 PM, Ycrux wrote:
> Maybe the right name is Snippet. Like Google snippets.
>
> cheers
> Y.












RE: field:(-null) returns records where field was not specified

2008-01-14 Thread Chris Hostetter

Several things in this thread should be clarified (note: order of 
quotations munged for clarity)...

: I had read this page.  But I'm not using the NOT operator,  I'm using the
: - operator.  I'm assuming there is a subtle difference between them in
: that NOT qualifies something else, hence needs 2 terms.  Isn't the - 
: operator supposed to be a complement to the + operator, ie. excludes
: something rather than requiring it ?

The NOT operator and the - operator are in fact the same thing ... the 
duplicate syntax comes from Lucene trying to appease people who 
want boolean-style operator syntax (AND/OR/NOT) even though the query 
parser is not a boolean syntax.

:  Have you seen this page?
:  http://lucene.apache.org/java/docs/queryparsersyntax.html
: 
:  From that page:
:  Note: The NOT operator cannot be used with just one term. For example, 
:  the following search will return no results:
:  NOT "jakarta apache"

In Solr, the query parser can in fact support purely negative queries, by 
internally transforming the query, this is noted on the Solr query syntax 
wiki...

http://wiki.apache.org/solr/SolrQuerySyntax

:   field_name:(-null)

null is not a special keyword. If you look at the debugging output when 
doing that query, you'll see that it is the same as:   -field_name:null  
... which is a search for all docs that do not contain the string null in 
the field field_name.

: The *:* (star colon star) means all records. The trick is to use (*:* AND
: -field:[* TO *]). It's silly, but there it is.

As I mentioned, you can do purely negative queries now, so a simple search 
for -field_name:[* TO *] will find all docs that have no indexed values 
for that field at all.
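
The difference between the two queries can be seen in a small set-based model. It is a sketch of the semantics only and not of how Solr evaluates queries. A purely negative query is modelled as "all docs minus the matches", and the three sample documents are made up.

```python
# Made-up sample documents, keyed by doc id.
DOCS = {
    1: {"field_name": ["apple"]},
    2: {"field_name": ["null"]},  # literally contains the term "null"
    3: {},                        # field_name was never specified
}


def matching_term(field, term):
    """Doc ids whose `field` contains `term` (models field:term)."""
    return {i for i, d in DOCS.items() if term in d.get(field, [])}


def having_any_value(field):
    """Doc ids with at least one value in `field` (models field:[* TO *])."""
    return {i for i, d in DOCS.items() if d.get(field)}


ALL = set(DOCS)
not_null_term = ALL - matching_term("field_name", "null")  # -field_name:null
no_value = ALL - having_any_value("field_name")            # -field_name:[* TO *]

print(sorted(not_null_term))  # [1, 3]
print(sorted(no_value))       # [3]
```

Doc 3 shows up for -field_name:null because a document with no value cannot contain the term null, which is why the original query returned records where the field was missing.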

: A performance note: we switched from empty fields to fields with a standard
: 'empty' value. This way we don't have to do a range check to find records
: with empty fields.

Your mileage may vary depending on how many docs you have with no value 
... this also isn't practical when dealing with numeric, boolean, or date 
based fields.  (And depending on how much churn there is in your index, 
the filterCache can probably make the difference negligible on average 
anyway.)




-Hoss