Re: Strategy for handling large (and growing) index: horizontal partitioning?

2008-03-03 Thread Kevin Lewandowski
How many documents are in the index? If you haven't already done this I'd take a really close look at your schema and make sure you're only storing the things that should really be stored, same with the indexed fields. I drastically reduced my index size just by changing some indexed/stored

solr not finding all results

2007-10-12 Thread Kevin Lewandowski
I've found an odd situation where solr is not returning all of the documents that I think it should. A search for Geckoplp4-M returns 3 documents but I know that there are at least 100 documents with that string. Here is an example query for that phrase and the result set:

Re: solr not finding all results

2007-10-12 Thread Kevin Lewandowski
Sorry, I've figured out my own problem. There is a problem with the way I create the xml document for indexing that was causing some of the comments fields to not be listed correctly in the default search field, content. On 10/12/07, Kevin Lewandowski [EMAIL PROTECTED] wrote: I've found an odd

Re: index size

2007-10-11 Thread Kevin Lewandowski
small in comparison (about 27 mb approx) but it still returns snippets! Are you storing the complete html? If so I think you should strip out the html then index the document. On 10/9/07, Kevin Lewandowski [EMAIL PROTECTED] wrote: Late reply on this but I just wanted to say thanks

Re: index size

2007-10-09 Thread Kevin Lewandowski
On 8/20/07, Mike Klaas [EMAIL PROTECTED] wrote: On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote: Are there any tips on reducing the index size or what factors most impact index size? My index has 2.7 million documents and is 200 gigabytes and growing. Most documents are around 2-3kb

index size

2007-08-17 Thread Kevin Lewandowski
Are there any tips on reducing the index size or what factors most impact index size? My index has 2.7 million documents and is 200 gigabytes and growing. Most documents are around 2-3kb and there are about 30 indexed fields. thanks, Kevin
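
To put the figures in this message in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal gigabytes and takes 2.5 kb as the midpoint of the quoted 2-3 kb document size, neither of which the message states.

```python
# Figures quoted in the message above.
doc_count = 2_700_000
index_bytes = 200 * 10**9        # assumption: "200 gigabytes" means decimal GB
raw_doc_bytes = 2.5 * 1000       # assumption: midpoint of the quoted 2-3 kb range

# Average on-disk index footprint per document.
bytes_per_doc = index_bytes / doc_count

# How many times larger the index is than the raw document text.
expansion = bytes_per_doc / raw_doc_bytes

print(f"index bytes per document: {bytes_per_doc:,.0f}")
print(f"index size relative to raw text: {expansion:.1f}x")
```

An index that is tens of times larger than the source text is consistent with the advice elsewhere in these threads to review which fields are stored and which are indexed.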

Re: Snapshooting or replicating recently indexed data

2007-04-21 Thread Kevin Lewandowski
snapshooter does create incremental builds of the index. It doesn't appear so if you look at the contents because the existing files are hard links. But it is incremental. On 4/20/07, Doss [EMAIL PROTECTED] wrote: Hi Yonik, Thanks for your quick response, my question is this, can we take

Re: Facet Browsing

2007-04-19 Thread Kevin Lewandowski
I recommend you build your query with facet options in raw format and make sure you're getting back the data you want. Then build it into your app. On 4/18/07, Jennifer Seaman [EMAIL PROTECTED] wrote: Does anyone have any sample code (php, perl, etc) how to setup facet browsing with paging? I
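
One way to follow this advice is to assemble the raw request URL first and inspect the response before writing any application code. The sketch below only builds the URL. The host, port, path and the `genre` field are hypothetical placeholders, and it assumes the standard `facet`, `facet.field` and `facet.limit` request parameters plus `start`/`rows` for paging through results.

```python
from urllib.parse import urlencode

def facet_query_url(base_url, query, facet_field, page, page_size):
    """Build a raw select URL with facet counts and result paging."""
    params = [
        ("q", query),
        ("start", page * page_size),   # offset of the first result on this page
        ("rows", page_size),           # number of results per page
        ("facet", "true"),             # turn faceting on
        ("facet.field", facet_field),  # field to count values for
        ("facet.limit", 10),           # return at most 10 facet values
    ]
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint and field name, for illustration only.
url = facet_query_url(
    "http://localhost:8983/solr/select", "miles davis", "genre",
    page=2, page_size=20,
)
print(url)
```

Pasting the printed URL into a browser shows whether the facet counts are what you expect before the query is wired into PHP, Perl or anything else.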

Re: Incremental replication...

2007-02-14 Thread Kevin Lewandowski
snapshooter copies all files but most files in the snapshot directories are hard links pointing to segments in the main index directory. So only new segments end up getting copied. We've been running replication on discogs.com for several months and it works great. On 2/13/07, escher2k [EMAIL
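
The hard-link behaviour described here can be illustrated with a small stand-alone sketch. This is not the snapshooter script itself; the directory and file names are made up, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile

def snapshot_with_hard_links(index_dir, snapshot_dir):
    """Hard-link every file in index_dir into snapshot_dir; no data is copied."""
    os.makedirs(snapshot_dir)
    for name in os.listdir(index_dir):
        os.link(os.path.join(index_dir, name), os.path.join(snapshot_dir, name))

with tempfile.TemporaryDirectory() as root:
    # Hypothetical index directory containing one segment file.
    index_dir = os.path.join(root, "index")
    os.makedirs(index_dir)
    with open(os.path.join(index_dir, "segment_1"), "wb") as f:
        f.write(b"x" * 1024)

    snapshot_dir = os.path.join(root, "snapshot.1")
    snapshot_with_hard_links(index_dir, snapshot_dir)

    original = os.stat(os.path.join(index_dir, "segment_1"))
    linked = os.stat(os.path.join(snapshot_dir, "segment_1"))

    # Both names refer to the same inode, so the data exists on disk only once.
    same_inode = original.st_ino == linked.st_ino
    link_count = linked.st_nlink

print(same_inode, link_count)
```

Because the snapshot entry and the index entry share one inode, a directory listing makes the snapshot look like a full copy even though unchanged segments take no extra space.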

Re: replication

2007-01-23 Thread Kevin Lewandowski
This should explain most everything: http://wiki.apache.org/solr/CollectionDistribution I've been running solr replication on discogs.com for a few months and it works great! Kevin On 1/23/07, S Edirisinghe [EMAIL PROTECTED] wrote: Hi, I just started looking into solr. I like the features

Re: solr/tomcat stops responding

2006-12-03 Thread Kevin Lewandowski
Hmmm, on most Linux/UNIX systems, sending the QUIT signal does nothing else but generate a stack trace to the console or a log file. If you don't start tomcat by hand, the stack trace may go somewhere else I suppose. This would be useful to learn how to do on your particular system (and we

Re: solr/tomcat stops responding

2006-12-02 Thread Kevin Lewandowski
accept connections for 3 or 4 hours ... did you try taking some thread dumps like yonik suggested to see what all the threads were doing? A kill -3 will not kill the process. It does nothing and there's no thread dump on the console. kill -9 does kill it though. btw, this has been a bigger

Re: solr/tomcat stops responding

2006-12-01 Thread Kevin Lewandowski
My solr installation has been running fine for a few weeks but now after a server reboot it starts and runs for a few seconds, then stops responding. I don't see any errors in the logfiles, apart from snapinstaller not being able to issue a commit. Also, the process is using 100% cpu and

Re: Cache stats

2006-11-29 Thread Kevin Lewandowski
In the admin interface, if you click statistics, there's a cache section. On 11/29/06, Tom [EMAIL PROTECTED] wrote: Hi - I'm starting to try to tune my installation a bit, and I'm looking for cache statistics. Is there a way to peek into a running installation, and see what my cache stats are?

Minimum time between distributions

2006-11-21 Thread Kevin Lewandowski
On Discogs I'm running Solr with two slaves and one master, using the distribution scripts. The slaves pull and install a new snapshot every five minutes and this is working very well so far. Are there any risks with reducing this window to every one or two minutes? With large caches could the

Re: Spellchecker in Solr?

2006-10-30 Thread Kevin Lewandowski
I have not done one but have been planning to do it based on this article: http://today.java.net/pub/a/today/2005/08/09/didyoumean.html With Solr it would be much simpler than the java examples they give. On 10/30/06, Michael Imbeault [EMAIL PROTECTED] wrote: Hello everyone, Has anybody

Re: Spellchecker in Solr?

2006-10-30 Thread Kevin Lewandowski
I had the very same article in mind - how would it be simpler in Solr than in Lucene? A spellchecker is pretty much standard in every major I meant it would be a simpler implementation in Solr because you don't have to deal with java or any Lucene API's. You just create a document for each

Re: Solr use case

2006-10-11 Thread Kevin Lewandowski
No, after you add new documents you simply issue a <commit/> command and the new docs are searchable. On Discogs.com we have just over 1 million docs in the index and do about 20,000 updates per day. Every 15 minutes we read a queue and add new documents, then commit. And we optimize once per day.
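
A minimal sketch of the add-then-commit cycle described here, limited to building the XML update messages. The field names and values are hypothetical, and actually POSTing the messages to the update handler is left out. Escaping field values also guards against the kind of malformed indexing XML mentioned in the "solr not finding all results" thread.

```python
from xml.sax.saxutils import escape, quoteattr

def add_message(docs):
    """Build an <add> update message from a list of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # quoteattr adds the surrounding quotes; escape handles &, < and >.
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)

# Sent after the adds so the new documents become searchable.
COMMIT_MESSAGE = "<commit/>"

# Hypothetical document, for illustration only.
message = add_message([{"id": "r1", "title": "Kind of Blue & more"}])
print(message)
print(COMMIT_MESSAGE)
```

A queue-driven updater like the one described would send one add message per batch, followed by a single commit message.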

Re: Couple of problems

2006-10-11 Thread Kevin Lewandowski
I've had a problem similar to this and it was because of the schema.xml. It was valid XML but there were some incorrect field definitions and/or the default field listed was not a defined field. I'd suggest you start with the default schema and build on it piece by piece, each time testing for

Re: Can't get q.op working

2006-09-27 Thread Kevin Lewandowski
with the tutorial example data and ensure things work as I've stated here. Let us know more details if the problem persists. Erik On Sep 26, 2006, at 11:02 PM, Kevin Lewandowski wrote: I'm running the latest nightly build (2006-09-27) and cannot seem to get the q.op parameter working. I have

How much ram can Solr use?

2006-09-27 Thread Kevin Lewandowski
The performance wiki page mentions a test box with 16GB ram. Did anything special need to be done to use that much ram (with the OS or java)? Would Solr on a system with Linux x86_64 and Tomcat be able to use that much ram? (sorry, I don't know Java so I don't know if there are any

Solr now used on Discogs.com

2006-09-06 Thread Kevin Lewandowski
I just wanted to say thanks to the Solr developers. I'm now using Solr for the main search engine on Discogs.com. I've been through five revisions of the search engine and this was definitely the least painful. Solr gives me the power of Lucene without having to deal with the guts. It made for a

Re: acts_as_solr

2006-08-30 Thread Kevin Lewandowski
You might want to look at acts_as_searchable for Ruby: http://rubyforge.org/projects/ar-searchable That's a similar plugin for the Hyperestraier search engine using its REST interface. On 8/28/06, Erik Hatcher [EMAIL PROTECTED] wrote: I've spent a few hours tinkering with a Ruby ActiveRecord