Re: Highlighting and fields
I'm doing the following query: q=text:abc AND type:typeA, and I ask to return highlighting (query.setHighlight(true);). The search term for field type (typeA) is also highlighted in the text field. Any way to avoid this?

Use setHighlightRequireFieldMatch(true) on the query object [1]. A field is then only highlighted if the query matched in that particular field.

Lars

[1] http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrQuery.html#setHighlightRequireFieldMatch(boolean)
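On the wire, SolrJ's setHighlight(true) and setHighlightRequireFieldMatch(true) turn into the request parameters hl=true and hl.requireFieldMatch=true. The minimal sketch below builds that query string with only the JDK, so you can check the parameters by hand or paste them into a browser. The class and method names are hypothetical, and hl.fl=text is an assumption: it limits snippets to the text field from the question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a Solr select query string with field-restricted highlighting.
public final class HighlightParams {
    static String build(String q, String highlightField) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", q);
        params.put("hl", "true");                   // equivalent of query.setHighlight(true)
        params.put("hl.fl", highlightField);        // assumed: only produce snippets for this field
        params.put("hl.requireFieldMatch", "true"); // equivalent of setHighlightRequireFieldMatch(true)
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(build("text:abc AND type:typeA", "text"));
    }
}
```

With hl.requireFieldMatch=true, the type:typeA clause can no longer cause typeA to be highlighted inside the text field.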
Re: Hardware config for SOLR
I have not worked with SSDs, though I've read all the good information that's trickling to us from Denmark. One thing that I've been wondering all along is: what about writes? That is, what about writes wearing out the SSD? How quickly does that happen, and when it does happen, what are the symptoms? For example, does it happen after N write operations? Do writes start failing, and does one start getting IOExceptions in the case of Lucene and Solr?

With modern SSDs you get something in the region of 500,000 to 1,000,000 write cycles per memory cell. Additionally, they all use wear leveling, i.e. writes are spread over the whole disk, so you can write to any single file system block many more times than that. One of the manufacturers of high-end SSDs [1] claims that at a sustained write rate of 50GB per day their drives will last more than 140 years, i.e. it's much more likely that something else will fail first ;)

When the write cycles are exhausted, much the same thing happens as with a bad conventional disk: you'll see lots of write errors. If the wear leveling is perfect (i.e. all memory locations have exactly the same number of writes), it's even possible that the whole disk will fail at once.

Lars

[1] http://www.mtron.net
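To see why wear leveling makes the per-cell limit matter so little, an idealised lifetime estimate is capacity times write cycles per cell, divided by the daily write volume. The sketch below assumes perfect wear leveling and ignores write amplification and spare area, so it is an upper bound rather than a prediction. The 64GB capacity and 500,000 cycles are example inputs taken from figures mentioned in these threads, not measurements of any particular drive.

```java
// Idealised SSD endurance estimate (upper bound, assumptions noted below).
public final class SsdEndurance {
    /**
     * Years until every cell has used up its write cycles, assuming perfect
     * wear leveling and no write amplification (every host write costs exactly
     * one cell write).
     */
    static double lifetimeYears(double capacityGb, double cyclesPerCell, double writesGbPerDay) {
        double totalWritableGb = capacityGb * cyclesPerCell; // total data the flash can absorb
        double days = totalWritableGb / writesGbPerDay;
        return days / 365.0;
    }

    public static void main(String[] args) {
        // Example inputs: 64GB drive, 500,000 cycles per cell, 50GB written per day.
        System.out.printf("%.0f years%n", lifetimeYears(64, 500_000, 50));
    }
}
```

The bound comes out well above the manufacturer's 140-year figure. A plausible explanation is that real drives lose much of that headroom to write amplification and controller overhead, and that vendors rate conservatively.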
Re: Pains upgrading from 1.2 to 1.3, any help appreciated
I'll shamelessly take this opportunity to plug the long-neglected SOLR-657. Lars
Re: Hardware config for SOLR
As for HDDs, people have noted some nice speedups in Lucene using solid-state drives, if you can afford them.

I've seen the average response time cut by a factor of 5 to 10 when switching to SSDs. 64GB SSDs start at EUR 200, so replacing the disk can be a lot cheaper than getting more servers, provided your index fits on one of those. For some concrete numbers, see http://wiki.statsbiblioteket.dk/summa/Hardware

Lars
Re: Solr 1.3 and Lucene 2.4 dev
Some highlighting features, most notably maxAnalyzedChars=-1 (SOLR-610), require Lucene 2.4 to work correctly. Lars
Re: Search 'proxy' when using multiple 'shards'
Does anything like this exist, or do I have to write it?

It doesn't come with Solr, but it should be quite easy to implement a proxy, e.g. with Apache httpd mod_rewrite [1].

Lars

[1] http://httpd.apache.org/docs/2.2/rewrite/
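The core of such a proxy is a URL rewrite: take the client's query string and forward it to one Solr instance with the shards parameter appended. That instance then runs the distributed search across all listed shards and merges the results. The Java sketch below shows only that rewriting step as a stand-alone function. It is a swapped-in illustration, not a mod_rewrite configuration. The host names, the /select path and the ShardProxy/rewrite names are assumptions. A real proxy would also have to send the request and relay the response.

```java
import java.util.List;

// Hypothetical rewriting step of a shard-aggregating search proxy.
public final class ShardProxy {
    /**
     * Appends a shards parameter to the incoming query string and points it at
     * one Solr instance, which fans the query out to every shard and merges results.
     */
    static String rewrite(String targetBase, String queryString, List<String> shards) {
        String shardParam = "shards=" + String.join(",", shards);
        String qs = (queryString == null || queryString.isEmpty())
                ? shardParam
                : queryString + "&" + shardParam;
        return targetBase + "/select?" + qs;
    }

    public static void main(String[] args) {
        // Assumed shard addresses; replace with your own host:port/path entries.
        List<String> shards = List.of("shard1:8983/solr", "shard2:8983/solr");
        System.out.println(rewrite("http://shard1:8983/solr", "q=solr&rows=10", shards));
    }
}
```

Clients keep sending plain queries to a single address, and the shard list lives in one place, which is the same benefit a mod_rewrite rule would give.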
Re: still looking for multicore.xml?
So it appears to be looking for multicore.xml, still. If I put my old multicore.xml in the multicore directory, it runs fine. solr.xml is ignored. Do I have an odd configuration somewhere that might cause this?

Looking at the code in trunk, everything appears to be fine. Did you run ant example before starting the server? Otherwise it's probably picking up some old jars/class files.

Lars
Solr user interface
Hi all,

I've written a user interface for Solr (a Spring web application) which I'd be willing to donate if people are interested. You can see a demo at http://larsko.dyndns.org:8080/solr-ui/search.html; the SVN repository is at http://larsko.dyndns.org/svn/solr-ui/. Note in particular http://larsko.dyndns.org/svn/solr-ui/documentation/manual.pdf for a short manual. Please be patient, the server this is running on doesn't have a lot of processing power or upstream bandwidth ;)

The purpose of adding this user interface to Solr would be twofold: first, to serve as a demonstration of Solr's capabilities (running on a server linked to from the website, probably like the demo above), and second, to give people a starting point or inspiration for implementing their own user interfaces.

Its special feature is support for a form of hierarchical faceting (explained in the manual). The data the demo searches comes from the Wikipedia Selection for Schools; the subject index pages are used to build the hierarchy.

Let me know what you think.

Thanks,
Lars
Re: AND vs. OR query performance
Thanks for the clarification. The behaviour I'm seeing is that OR queries are almost *twice* as fast as AND queries, so that's probably down to my specific setup/data. I'll try to investigate further.

Lars

On Mon, 12 May 2008 19:35:00 -0400 Yonik Seeley [EMAIL PROTECTED] wrote:

In general, AND will perform better than OR (because of skipping in the scorers). But if the number of documents matching the AND is close to that matching the OR query, then skipping doesn't gain you much and probably has a little more overhead.

-Yonik

On Sun, May 11, 2008 at 4:04 AM, Lars Kotthoff [EMAIL PROTECTED] wrote:

Dear list, during some performance experiments I have found that queries with ORed search terms are significantly faster than queries with ANDed search terms, everything else being equal. Does anybody know whether this is the generally expected behaviour?

Thanks,
Lars
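Yonik's point about skipping can be illustrated with a simplified model of two sorted posting lists of document IDs. This is not Lucene's scorer code. It uses in-memory arrays and a binary search as a stand-in for skipping (similar in spirit to skipTo() in Lucene 2.x), whereas Lucene reads skip data stored alongside its postings. For AND, the loop jumps straight to the next candidate document in the other list. For OR, every document in both lists has to be visited.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of conjunction (with skipping) vs disjunction over sorted doc-ID lists.
public final class PostingsDemo {
    /** Smallest index i >= from with docs[i] >= target, or docs.length if none. */
    static int skipTo(int[] docs, int from, int target) {
        int lo = from, hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /** AND: leapfrog over both lists, skipping past documents that cannot match. */
    static List<Integer> and(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) { out.add(a[i]); i++; j++; }
            else if (a[i] < b[j]) i = skipTo(a, i + 1, b[j]);
            else j = skipTo(b, j + 1, a[i]);
        }
        return out;
    }

    /** OR: merge both lists; every entry is visited, so nothing can be skipped. */
    static List<Integer> or(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            if (j >= b.length || (i < a.length && a[i] < b[j])) out.add(a[i++]);
            else if (i >= a.length || b[j] < a[i]) out.add(b[j++]);
            else { out.add(a[i]); i++; j++; }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] rare = {3, 500, 9000};
        int[] common = new int[10_000];
        for (int d = 0; d < common.length; d++) common[d] = d;
        System.out.println(and(rare, common));       // [3, 500, 9000]
        System.out.println(or(rare, common).size()); // 10000
    }
}
```

When one list is much sparser than the other, as in main, AND only touches a handful of entries. When both lists are dense and largely overlap, AND advances roughly one document per step anyway, so the extra skip logic is pure overhead. That would be consistent with the OR-faster behaviour reported above.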