I understand that, and that makes sense. But, coming back to the
original question:
When performing searches,
I need to be able to search against any combination of sites.
Does anybody have suggestions on what the best practice for a scenario
like that would be, considering both
Dietrich,
I don't think there are established practices in the open (yet). You could
design your application with a site(s)-shard mapping and then, knowing which
sites are involved in the query, search only the relevant shards. This will be
efficient, but it would require careful management
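
A minimal sketch of the site-to-shard mapping idea, assuming Solr's distributed search with its `shards` request parameter. The host names, site names, the `site` field, and the helper functions are all hypothetical and only illustrate the lookup step:

```python
from urllib.parse import urlencode

# Hypothetical mapping from site to the shard that holds its documents.
# Host names and site names are made up for illustration.
SITE_TO_SHARD = {
    "site-a.example": "shard1.example:8983/solr",
    "site-b.example": "shard1.example:8983/solr",
    "site-c.example": "shard2.example:8983/solr",
}


def shards_for_sites(sites):
    """Return the sorted, de-duplicated shards that hold the given sites."""
    return sorted({SITE_TO_SHARD[site] for site in sites})


def build_query_params(user_query, sites):
    """Build request parameters limited to the relevant shards and sites."""
    site_filter = " OR ".join('"%s"' % site for site in sites)
    return {
        "q": user_query,
        # Assumes each document carries a field named `site`.
        "fq": "site:(%s)" % site_filter,
        # Only the shards that actually hold the requested sites are queried.
        "shards": ",".join(shards_for_sites(sites)),
    }


def build_query_string(user_query, sites):
    """URL-encode the parameters for use in a select request."""
    return urlencode(build_query_params(user_query, sites))
```

The mapping itself is the part that needs managing: whenever a site moves to another shard, the table has to be updated along with the index.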
Makes sense, but probably overkill for my requirements. I wasn't
really talking 275*20, more likely the total would be something
like four million documents. I was under the assumption that a single
machine, or a simple distributed index, should be able to handle that,
is that wrong?
-ds
On
Ah, that's a very different number. Yes, assuming your docs are web pages, a
single reasonably equipped machine should be able to handle that and a few
dozen QPS.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Dietrich [EMAIL PROTECTED]
To:
In fact, 55m records work fine in Solr, assuming they are small records.
The problem is that the index files wind up in the tens of gigabytes. The
logistics of doing backups, snapping to query servers, etc. are what make
this index unwieldy, and why multiple shards are useful.
Lance
Sounds like SOLR-303 is a must for you. Have you looked at Nutch?
Otis
- Original Message
From: Dietrich [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, March 25, 2008 4:15:23 PM
Subject: How to index
On Tue, Mar 25, 2008 at 6:12 PM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
Sounds like SOLR-303 is a must for you.
Why? I see the benefits of using a distributed architecture in
general, but why do you recommend it specifically for this scenario?
Have you looked at Nutch?
I don't want to (or
Dietrich,
I pointed to SOLR-303 because 275 * 200,000 looks like too big a number
for a single machine to handle.
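
The arithmetic behind that estimate, as a quick sketch: 275 sites at 200,000 documents each comes to 55 million documents. The shard counts used below are hypothetical and only show how the per-shard load shrinks as shards are added:

```python
SITES = 275
DOCS_PER_SITE = 200_000

# Total documents across all sites: 275 * 200,000 = 55,000,000.
TOTAL_DOCS = SITES * DOCS_PER_SITE


def docs_per_shard(total_docs, shard_count):
    """Documents each shard holds if the total is spread evenly (rounded up)."""
    return -(-total_docs // shard_count)  # ceiling division


if __name__ == "__main__":
    for shard_count in (1, 2, 4, 8):  # hypothetical shard counts
        print(shard_count, docs_per_shard(TOTAL_DOCS, shard_count))
```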
Otis
- Original Message
From: Dietrich [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday,