How many documents are in the index?
If you haven't already done this, I'd take a really close look at your
schema and make sure you're only storing the things that really should
be stored, and the same with the indexed fields. I drastically reduced
my index size just by changing some indexed/stored flags.
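As an illustration (a sketch only; the field names are hypothetical, not anyone's actual schema), the relevant flags live on each <field> entry in schema.xml: indexed controls whether the field is searchable, stored controls whether its original value is kept for retrieval.

```xml
<!-- Hypothetical fields: store only what must come back in results -->
<field name="id"       type="string" indexed="true"  stored="true"/>
<field name="content"  type="text"   indexed="true"  stored="false"/> <!-- searchable, never returned -->
<field name="raw_body" type="string" indexed="false" stored="true"/>  <!-- returned, never searched -->
```

A stored="false" field still matches queries; it just can't be returned or highlighted.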
I've found an odd situation where Solr is not returning all of the
documents that I think it should. A search for Geckoplp4-M returns 3
documents, but I know that there are at least 100 documents with that
string.
Here is an example query for that phrase and the result set:
Sorry, I've figured out my own problem. There was a problem with the
way I create the XML document for indexing that was causing some of
the comments fields not to be listed correctly in the default search
field, content.
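For anyone hitting the same thing: the update handler expects a well-formed <add> document, and unescaped markup inside a field value is a classic way for text to silently drop out of a field. A minimal example (field names hypothetical):

```xml
<add>
  <doc>
    <field name="id">12345</field>
    <field name="comments">Markup in values must be escaped: &lt;b&gt;hi&lt;/b&gt;</field>
  </doc>
</add>
```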
On 10/12/07, Kevin Lewandowski [EMAIL PROTECTED] wrote:
I've found an odd
small in comparison (about 27 MB) but it still returns snippets!
Are you storing the complete HTML? If so, I think you should strip out
the HTML and then index the document.
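A minimal stripping pass (a sketch using only Python's standard library; real pages may also need entity decoding and script/style removal):

```python
# Extract visible text from HTML before sending it to Solr for indexing.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the character data between tags and ignores the tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_html(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    # Collapse runs of whitespace left behind by the removed markup.
    return " ".join("".join(parser.parts).split())

print(strip_html("<p>Hello <b>world</b></p>"))  # Hello world
```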
On 10/9/07, Kevin Lewandowski [EMAIL PROTECTED] wrote:
Late reply on this but I just wanted to say thanks
On 8/20/07, Mike Klaas [EMAIL PROTECTED] wrote:
On 17-Aug-07, at 2:03 PM, Kevin Lewandowski wrote:
Are there any tips on reducing the index size or what factors most
impact index size?
My index has 2.7 million documents and is 200 gigabytes and growing.
Most documents are around 2-3kb
Are there any tips on reducing the index size or what factors most
impact index size?
My index has 2.7 million documents and is 200 gigabytes and growing.
Most documents are around 2-3kb and there are about 30 indexed fields.
thanks,
Kevin
snapshooter does create incremental builds of the index. It may not
appear so when you look at the contents, because the existing files
are hard links, but it is incremental.
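The effect is easy to see with a couple of standard-library calls (a sketch; snapshooter itself does this with cp-style hard links rather than Python):

```python
# Two directory entries, one file on disk: this is why a snapshot of
# unchanged segments costs (almost) no extra space.
import os
import tempfile

d = tempfile.mkdtemp()
segment = os.path.join(d, "segment_1")
with open(segment, "w") as f:
    f.write("index data")

snapshot = os.path.join(d, "snapshot.segment_1")
os.link(segment, snapshot)          # hard link, not a copy

nlink = os.stat(segment).st_nlink   # link count for the inode
print(nlink)                        # 2
print(os.path.samefile(segment, snapshot))  # True
```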
On 4/20/07, Doss [EMAIL PROTECTED] wrote:
Hi Yonik,
Thanks for your quick response. My question is this: can we take
I recommend you build your query with facet options in raw format and
make sure you're getting back the data you want. Then build it into
your app.
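A sketch of that first step (hostname, core path, and field names are made up; the facet parameters themselves are standard Solr ones):

```python
# Build the raw facet query by hand so the response can be inspected
# in a browser before any app code depends on it.
from urllib.parse import urlencode

params = urlencode([
    ("q", "ipod"),
    ("rows", "0"),              # we only want facet counts, not docs
    ("facet", "true"),
    ("facet.field", "manu"),    # repeat facet.field once per field
    ("facet.field", "cat"),
])
url = "http://localhost:8983/solr/select?" + params
print(url)
```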
On 4/18/07, Jennifer Seaman [EMAIL PROTECTED] wrote:
Does anyone have any sample code (PHP, Perl, etc.) showing how to set
up facet browsing with paging? I
snapshooter copies all files but most files in the snapshot
directories are hard links pointing to segments in the main index
directory. So only new segments end up getting copied.
We've been running replication on discogs.com for several months and
it works great.
On 2/13/07, escher2k [EMAIL PROTECTED] wrote:
This should explain most everything:
http://wiki.apache.org/solr/CollectionDistribution
I've been running solr replication on discogs.com for a few months and
it works great!
Kevin
On 1/23/07, S Edirisinghe [EMAIL PROTECTED] wrote:
Hi,
I just started looking into solr. I like the features
Hmmm, on most Linux/UNIX systems, sending the QUIT signal does nothing
but generate a stack trace to the console or a log file. If you don't
start Tomcat by hand, the stack trace may go somewhere else, I
suppose. This would be useful to learn how to do on your particular
system (and we
accept connections for 3 or 4 hours ... did you try taking some thread
dumps like Yonik suggested to see what all the threads were doing?
A kill -3 will not kill the process. It does nothing and there's no
thread dump on the console. kill -9 does kill it though.
btw, this has been a bigger
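For anyone checking their own system: signal 3 is QUIT, and sending it to a JVM requests a thread dump without terminating the process; the dump goes to the JVM's stdout (e.g. catalina.out), not to the shell that sent the signal. A quick sanity check (the Tomcat PID below is a placeholder):

```shell
# Resolve signal number 3 to its name; on most systems this is QUIT.
sig=$(kill -l 3)
echo "$sig"
# kill -QUIT <tomcat-pid>   # request the dump; the process keeps running
```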
My solr installation has been running fine for a few weeks but now
after a server reboot it starts and runs for a few seconds, then stops
responding. I don't see any errors in the logfiles, apart from
snapinstaller not being able to issue a commit. Also, the process is
using 100% cpu and
In the admin interface, if you click statistics, there's a cache section.
On 11/29/06, Tom [EMAIL PROTECTED] wrote:
Hi -
I'm starting to try to tune my installation a bit, and I'm looking
for cache statistics. Is there a way to peek into a running
installation, and see what my cache stats are?
On Discogs I'm running Solr with two slaves and one master, using the
distribution scripts. The slaves pull and install a new snapshot every
five minutes and this is working very well so far.
Are there any risks with reducing this window to every one or two
minutes? With large caches could the
I have not done one but have been planning to do it based on this article:
http://today.java.net/pub/a/today/2005/08/09/didyoumean.html
With Solr it would be much simpler than the Java examples they give.
On 10/30/06, Michael Imbeault [EMAIL PROTECTED] wrote:
Hello everyone,
Has anybody
I had the very same article in mind - how would it be simpler in Solr
than in Lucene? A spellchecker is pretty much standard in every major
I meant it would be a simpler implementation in Solr because you don't
have to deal with Java or the Lucene APIs. You just create a document
for each
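A sketch of that per-word document, following the article's n-gram scheme (field names and gram sizes are illustrative, not taken from the article verbatim):

```python
# Each dictionary word becomes one Solr document whose fields hold its
# letter n-grams; a misspelling is corrected by searching those fields.
def grams(word: str, n: int) -> list[str]:
    """All contiguous substrings of length n, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def spell_doc(word: str) -> dict:
    return {
        "word": word,                # stored, returned as the suggestion
        "gram3": grams(word, 3),     # indexed for fuzzy matching
        "start3": word[:3],          # lets matches on the word start be boosted
        "end3": word[-3:],
    }

print(spell_doc("lucene"))
```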
No, after you add new documents you simply issue a <commit/> command
and the new docs are searchable.
On Discogs.com we have just over 1 million docs in the index and do
about 20,000 updates per day. Every 15 minutes we read a queue and add
new documents, then commit. And we optimize once per day.
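That cadence can be sketched as a small drain function (the queue and the send callable are placeholders; a real version would POST to Solr's update handler):

```python
# Drain pending adds, commit once per batch, optimize at most daily.
import time

OPTIMIZE_INTERVAL = 24 * 60 * 60   # seconds between optimizes

def drain(queue, send, last_optimize, now=None):
    """Send all queued docs, then commit; return the new optimize time."""
    now = time.time() if now is None else now
    for doc in list(queue):
        send(doc)                  # e.g. POST an <add> for this doc
    queue.clear()
    send("<commit/>")              # one commit per batch, not per doc
    if now - last_optimize >= OPTIMIZE_INTERVAL:
        send("<optimize/>")        # expensive; once a day is plenty
        last_optimize = now
    return last_optimize
```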
I've had a problem similar to this and it was because of the
schema.xml. It was valid XML but there were some incorrect field
definitions and/or the default field listed was not a defined field.
I'd suggest you start with the default schema and build on it piece by
piece, each time testing for
with the tutorial example
data and ensure things work as I've stated here. Let us know more
details if the problem persists.
Erik
On Sep 26, 2006, at 11:02 PM, Kevin Lewandowski wrote:
I'm running the latest nightly build (2006-09-27) and cannot seem to
get the q.op parameter working. I have
On the performance wiki page it mentions a test box with 16GB RAM. Did
anything special need to be done to use that much RAM (with the OS or
Java)? Would Solr on a system with Linux x86_64 and Tomcat be able to
use that much RAM? (sorry, I don't know Java so I don't know if there
are any
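For what it's worth, heap size on the JVM is set with -Xms/-Xmx, and Tomcat picks the flags up from JAVA_OPTS; the sizes below are hypothetical, and a 32-bit JVM can't go much past 2 GB regardless of installed RAM, so large heaps need 64-bit Linux and a 64-bit JVM.

```shell
# Hypothetical heap settings; requires 64-bit Linux and a 64-bit JVM.
JAVA_OPTS="-server -Xms4g -Xmx12g"
export JAVA_OPTS    # catalina.sh passes this to the JVM at startup
```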
I just wanted to say thanks to the Solr developers.
I'm now using Solr for the main search engine on Discogs.com. I've
been through five revisions of the search engine and this was
definitely the least painful. Solr gives me the power of Lucene
without having to deal with the guts. It made for a
You might want to look at acts_as_searchable for Ruby:
http://rubyforge.org/projects/ar-searchable
That's a similar plugin for the Hyperestraier search engine using its
REST interface.
On 8/28/06, Erik Hatcher [EMAIL PROTECTED] wrote:
I've spent a few hours tinkering with a Ruby ActiveRecord