Re: Highlighting and fields

2008-10-29 Thread Lars Kotthoff
 I'm doing the following query:
 q=text:abc AND type:typeA
 And I ask to return highlighting (query.setHighlight(true);). The search 
 term for field type (typeA) is also highlighted in the text field.
 Any way to avoid this?

Use setHighlightRequireFieldMatch(true) on the query object [1].
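For readers not using SolrJ, here is a minimal sketch of the equivalent raw request. It assumes that setHighlight(true) and setHighlightRequireFieldMatch(true) translate to the hl and hl.requireFieldMatch request parameters. The localhost URL and the choice of hl.fl=text are also assumptions. With hl.requireFieldMatch=true, a term is only highlighted in a field if the query matched in that particular field, so typeA should no longer be marked up inside text.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and core name to your installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def highlight_url(query, hl_fields):
    """Build a select URL that only highlights terms in the field they were queried on."""
    params = {
        "q": query,
        "hl": "true",                    # what query.setHighlight(true) sends
        "hl.fl": hl_fields,              # fields to build snippets for
        "hl.requireFieldMatch": "true",  # what setHighlightRequireFieldMatch(true) sends
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(highlight_url("text:abc AND type:typeA", "text"))
```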

Lars


[1] 
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrQuery.html#setHighlightRequireFieldMatch(boolean)


Re: Hardware config for SOLR

2008-09-20 Thread Lars Kotthoff
 I have not worked with SSDs, though I've read all the good information that's
 trickling to us from Denmark.  One thing that I've been wondering all along is
 - what about writes?  That is, what about writes wearing out the SSD?  How
 quickly does that happen and when it does happen, what are the symptoms?  For
 example, does it happen after N write operations?  Do writes start failing and
 one starts getting IOExceptions in case of Lucene and Solr?

With modern SSDs you get something in the region of 500,000 to 1,000,000 write
cycles per memory cell. Additionally, they all use wear leveling, i.e. writes
are spread over the whole disk, so a single file system block can be rewritten
many more times than the per-cell limit suggests. One of the manufacturers of
high-end SSDs [1] claims that at a sustained write rate of 50GB per day their
drives will last more than 140 years, i.e. it's much more likely that something
else will fail first ;)
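The effect of wear leveling can be put into rough numbers. Here is a back-of-the-envelope sketch. It assumes perfect wear leveling and ignores spare area, so the drive can absorb roughly capacity × write cycles of data before the cells wear out. The 32 GB capacity and the write_amplification parameter are illustrative assumptions, not figures from the manufacturer cited.

```python
def ideal_lifetime_years(capacity_gb, write_cycles, gb_written_per_day,
                         write_amplification=1.0):
    """Years until every cell has used up its write cycles, assuming perfect wear leveling.

    write_amplification > 1 models the drive writing more to flash than the host asked for.
    """
    total_host_writes_gb = capacity_gb * write_cycles / write_amplification
    days = total_host_writes_gb / gb_written_per_day
    return days / 365


# Hypothetical 32 GB drive at the low end of the quoted cycle range, 50 GB written per day.
print(round(ideal_lifetime_years(32, 500_000, 50)))  # 877
```

Even with a write amplification of 2, this idealized estimate stays well above 140 years. Real drives have additional overheads, so the model only shows the order of magnitude.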

When the write cycles are exhausted, much the same thing happens as with a
failing conventional disk: you'll see lots of write errors. If the wear leveling
is perfect (i.e. all memory locations have exactly the same number of writes),
it's even possible that the whole disk will fail at once.

Lars

[1] http://www.mtron.net


Re: Pains upgrading from 1.2 to 1.3, any help appreciated

2008-09-19 Thread Lars Kotthoff
I'll shamelessly take this opportunity to plug the long-neglected SOLR-657.

Lars


Re: Hardware config for SOLR

2008-09-19 Thread Lars Kotthoff
  As for HDDs, people have noted some nice speedups in Lucene using
  solid-state drives, if you can afford them.
 
 I've seen the average response time cut by a factor of 5-10 when switching
 to SSD. A 64GB SSD starts at EUR 200, so replacing the disk can be a lot
 cheaper than getting more servers, provided your index fits on one of
 those.

For some concrete numbers, see http://wiki.statsbiblioteket.dk/summa/Hardware

Lars


Re: Solr 1.3 and Lucene 2.4 dev

2008-09-16 Thread Lars Kotthoff
Some highlighting features, most notably maxAnalyzedChars=-1 (SOLR-610), require
Lucene 2.4 to work correctly.

Lars


Re: Search 'proxy' when using multiple 'shards'

2008-09-12 Thread Lars Kotthoff
 Does anything like this exist, or do I have to write it?

It doesn't come with Solr, but it should be quite easy to implement a proxy e.g.
with Apache httpd mod_rewrite [1].
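Here is a minimal sketch of the request rewriting such a proxy would do. It assumes the proxy's job is to add Solr's shards parameter, so that clients can query a single URL without knowing the shard layout. The host names are hypothetical. In httpd, a RewriteRule with the [P] flag (proxy, which needs mod_proxy) and the [QSA] flag (append the client's query string) would have a similar effect. The Python below only shows the URL transformation, not a running server.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical shard layout; replace with your own hosts and cores.
SHARDS = ["solr1:8983/solr", "solr2:8983/solr"]
# Node that receives the rewritten request and fans it out to the shards.
FRONT_END = "http://solr1:8983/solr/select"


def rewrite(query_string):
    """Turn an incoming query string into the distributed-search URL."""
    params = [(k, v) for k, v in parse_qsl(query_string, keep_blank_values=True)
              if k != "shards"]  # never let clients pick the shards themselves
    params.append(("shards", ",".join(SHARDS)))
    return FRONT_END + "?" + urlencode(params)


print(rewrite("q=solr&rows=10"))
```

Dropping a client-supplied shards parameter is a design choice of this sketch. A plain [QSA] rewrite would pass it through.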

Lars

[1] http://httpd.apache.org/docs/2.2/rewrite/


Re: still looking for multicore.xml?

2008-09-03 Thread Lars Kotthoff
 So it appears to be looking for multicore.xml, still.  If I put my old
 multicore.xml in the multicore directory, it runs fine.  solr.xml is
 ignored.  Do I have an odd configuration somewhere that might cause
 this?

Looking at the code in trunk, everything appears to be fine. Did you run "ant
example" before starting the server? If not, it's probably picking up some old
jars/class files.

Lars


Solr user interface

2008-07-10 Thread Lars Kotthoff
Hi all,

I've written a user interface for Solr (a Spring web application), which I'd be
willing to donate if people are interested.

You can see a demo at http://larsko.dyndns.org:8080/solr-ui/search.html; the SVN
repository is at http://larsko.dyndns.org/svn/solr-ui/. Note in particular
http://larsko.dyndns.org/svn/solr-ui/documentation/manual.pdf for a short
manual. Please be patient; the server this is running on doesn't have a lot of
processing power or upstream bandwidth ;)

The purpose of adding this user interface to Solr would be twofold: first, to
serve as a demonstration of Solr's capabilities (running on a server linked to
from the website, probably like the demo above), and second, to give people a
starting point/inspiration for implementing their own user interfaces.

The special feature is that it supports some form of hierarchical faceting
(explained in the manual). The data the demo searches comes from the Wikipedia
Selection for Schools. The subject index pages are used to build the hierarchy.
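The manual describes the UI's own approach. For comparison, here is a sketch of one common way to get hierarchical facets out of flat Solr facets, which is not necessarily what solr-ui does. Every level of a document's subject path is indexed as a token prefixed with its depth. Drilling down then uses facet.prefix. The field name and the subject names below are hypothetical.

```python
def path_tokens(path):
    """Depth-prefixed tokens for every level of a category path.

    ["Science", "Physics"] -> ["0/Science", "1/Science/Physics"]
    """
    return [f"{depth}/" + "/".join(path[:depth + 1]) for depth in range(len(path))]


def child_prefix(parent_path):
    """facet.prefix value that selects only the direct children of parent_path."""
    return f"{len(parent_path)}/" + "".join(part + "/" for part in parent_path)


# Tokens to put into a multi-valued string field (hypothetical name: subject_path).
tokens = path_tokens(["Science", "Physics", "Optics"])
print(tokens)  # ['0/Science', '1/Science/Physics', '2/Science/Physics/Optics']

# The values of this document that facet.prefix=1/Science/ would count.
print([t for t in tokens if t.startswith(child_prefix(["Science"]))])  # ['1/Science/Physics']
```

Because the depth is part of each token, a prefix matches only one level of the tree. Each facet request therefore returns just the children of the node being browsed.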

Let me know what you think.

Thanks,

Lars


Re: AND vs. OR query performance

2008-05-12 Thread Lars Kotthoff
Thanks for the clarification. The behaviour I'm seeing is that OR queries are
almost *twice* as fast as AND queries, so that's probably down to my specific
setup/data. I'll try to investigate further.

Lars

On Mon, 12 May 2008 19:35:00 -0400
Yonik Seeley [EMAIL PROTECTED] wrote:

 In general, AND will perform better than OR (because of skipping in
 the scorers).  But if the number of documents matching the AND is
 close to that matching the OR query, then skipping doesn't gain you
 much and probably has a little more overhead.
 
 -Yonik
 
 On Sun, May 11, 2008 at 4:04 AM, Lars Kotthoff [EMAIL PROTECTED] wrote:
  Dear list,
 
   during some performance experiments I have found that queries with ORed search
   terms are significantly faster than queries with ANDed search terms, everything
   else being equal.
 
   Does anybody know whether this is the generally expected behaviour?
 
   Thanks,
 
   Lars
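Yonik's point about skipping can be illustrated with a toy model. This is a simplified sketch in plain Python, not Lucene's actual scorer code. Posting lists are sorted lists of document ids. The AND is a leapfrog intersection in which bisect_left stands in for the scorer's skip/advance operation, and the OR is a plain merge. When one term is rare, the AND jumps over most postings of the common term. When both terms match nearly the same documents, there is nothing to skip and the AND gains nothing over the OR.

```python
from bisect import bisect_left
import heapq


def conjunction(a, b):
    """AND of two sorted posting lists using skipping (leapfrog intersection).

    Returns (matching doc ids, number of skip/advance calls made).
    """
    i = j = 0
    hits, skips = [], 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            hits.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Skip a forward to the first doc id >= b[j] instead of stepping one by one.
            i = bisect_left(a, b[j], i)
            skips += 1
        else:
            j = bisect_left(b, a[i], j)
            skips += 1
    return hits, skips


def disjunction(a, b):
    """OR of two sorted posting lists: every posting of both lists is visited."""
    hits = []
    for doc in heapq.merge(a, b):
        if not hits or hits[-1] != doc:  # drop the duplicate when both terms match
            hits.append(doc)
    return hits


# Sparse overlap: a few skips jump over long stretches of non-matching docs.
rare = [10, 5000, 9000]
common = list(range(0, 10000, 2))
print(conjunction(rare, common))  # ([10, 5000, 9000], 3)

# Near-total overlap: the AND matches as much as the OR, so no skipping happens.
same = list(range(1000))
print(conjunction(same, same)[1], len(disjunction(same, same)))  # 0 1000
```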