On Tue, 9 Oct 2007 10:12:51 -0400
"David Whalen" <[EMAIL PROTECTED]> wrote:

> So, how would you build it if you could?  Here are the specs:
> 
> a) the index needs to hold at least 25 million articles
> b) the index is constantly updated at a rate of 10,000 articles
> per minute
> c) we need to have faceted queries

Hi David,
Others with more experience than me have already given you good answers, so I
won't go over that again.

One thing to consider when you have lots of ongoing updates is how quickly you
need your latest changes to show up in your search results.

Yes, everyone wants the latest document to be live the second it hits the
index, but balancing that against keeping search responsive within budget (and
perhaps architectural) constraints isn't always easy.

In all seriousness, not everyone is in a situation where all of their users
really need (or benefit hugely from) having each of the roughly 170 docs
posted in the last second (10,000 per minute) show up the millisecond they hit
"Search". Would they even notice whether results lag 3, 5 or 10 minutes behind?
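To put numbers on that question, here is a rough back-of-the-envelope sketch.
It assumes a steady 10,000 articles per minute (your figure) and that new
articles only become searchable when a new searcher is opened after a commit
every N minutes; the class and method names are just made up for illustration.

```java
// Back-of-the-envelope: how many new articles are not yet searchable if
// changes only become visible after a commit every N minutes.
// Assumes a steady arrival rate of 10,000 articles/minute.
public class VisibilityLag {
    static final long ARTICLES_PER_MINUTE = 10_000;

    // Worst case: an article arriving just after a commit waits a full interval.
    static long maxUnseenArticles(long commitIntervalMinutes) {
        return ARTICLES_PER_MINUTE * commitIntervalMinutes;
    }

    public static void main(String[] args) {
        System.out.printf("Arrival rate: about %.0f articles/second%n",
                ARTICLES_PER_MINUTE / 60.0);
        for (long minutes : new long[] {3, 5, 10}) {
            System.out.printf("commit every %2d min -> up to %,d articles not yet visible%n",
                    minutes, maxUnseenArticles(minutes));
        }
    }
}
```

So a 5 minute commit interval means up to 50,000 articles lagging behind at
worst. Whether that matters is a product question, not a technical one.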

I think tuning the cache-warming settings should pay off. You probably don't
want all your searches held until your caches are fully warmed after each
commit, nor to have to re-warm too often.
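As I understand it, the autowarmCount setting on Solr's caches controls how
many of the most recently used entries of the old cache get recomputed against
the new searcher before it takes traffic. Below is a simplified Java model of
that idea (not Solr's actual code; WarmableLruCache, get and warm are
hypothetical names) just to show the tradeoff: a bigger autowarmCount gives a
warmer cache but a longer wait before each commit becomes visible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified LRU query cache with autowarming (not Solr's API).
class WarmableLruCache<K, V> {
    private final LinkedHashMap<K, V> map;

    WarmableLruCache(final int maxSize) {
        // accessOrder=true: iteration runs from least to most recently used
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize; // evict the least recently used entry
            }
        };
    }

    // Look up a result, computing it (i.e. running the query) on a miss.
    V get(K key, Function<K, V> compute) {
        V v = map.get(key); // get() also marks the key as most recently used
        if (v == null) {
            v = compute.apply(key);
            map.put(key, v);
        }
        return v;
    }

    // Build the cache for a new searcher: recompute only the autowarmCount
    // most recently used keys, using `compute` against the new index.
    WarmableLruCache<K, V> warm(int maxSize, int autowarmCount, Function<K, V> compute) {
        WarmableLruCache<K, V> fresh = new WarmableLruCache<>(maxSize);
        List<K> keys = new ArrayList<>(map.keySet()); // LRU -> MRU order
        int from = Math.max(0, keys.size() - autowarmCount);
        for (K key : keys.subList(from, keys.size())) {
            fresh.map.put(key, compute.apply(key)); // the costly part of warming
        }
        return fresh;
    }

    int size() { return map.size(); }

    boolean contains(K key) { return map.containsKey(key); }
}

public class WarmingDemo {
    public static void main(String[] args) {
        WarmableLruCache<String, Integer> cache = new WarmableLruCache<>(100);
        for (String q : new String[] {"solr", "lucene", "facets", "cats"}) {
            cache.get(q, String::length); // stand-in for actually running the query
        }
        // Warm only the 2 most recently used queries into the new searcher's cache
        WarmableLruCache<String, Integer> next = cache.warm(100, 2, String::length);
        System.out.println(next.size() + " " + next.contains("cats") + " " + next.contains("solr"));
    }
}
```

Every warmed entry costs one query execution per commit, so with frequent
commits you may prefer a smaller autowarmCount over a fully warm cache.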

I was thinking that you could even split your index: keep the latest entries
in a smaller, faster index, and the rest of your 25M articles in another index
that gets updated, say, hourly. But if your 10K updates per minute are changes
to existing docs rather than new ones, then splitting the index may not be that
useful, since the stale copies in the big index still have to be dealt with.
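Here's a toy Java model of that split, just to show where edits to existing
docs hurt. Documents are plain id-to-text maps and SplitIndex, upsert, search
and mergeRecentIntoMain are all hypothetical names, not anything from Solr or
Lucene. All writes go to the small "recent" index, a periodic job folds it into
"main", and every search has to let a recent copy shadow the stale one in main.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model of a hot/cold split index (hypothetical names, in-memory only).
class SplitIndex {
    private final Map<String, String> main = new HashMap<>();   // big, rebuilt rarely
    private final Map<String, String> recent = new HashMap<>(); // small, gets every write

    void upsert(String id, String text) {
        recent.put(id, text);
    }

    // Search both; a doc present in `recent` hides its stale copy in `main`.
    List<String> search(Predicate<String> matches) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : recent.entrySet()) {
            if (matches.test(e.getValue())) hits.add(e.getKey());
        }
        for (Map.Entry<String, String> e : main.entrySet()) {
            boolean shadowed = recent.containsKey(e.getKey());
            if (!shadowed && matches.test(e.getValue())) hits.add(e.getKey());
        }
        return hits;
    }

    // The periodic (e.g. hourly) job: fold recent changes into main.
    void mergeRecentIntoMain() {
        main.putAll(recent);
        recent.clear();
    }

    int recentSize() { return recent.size(); }
}

public class SplitIndexDemo {
    public static void main(String[] args) {
        SplitIndex idx = new SplitIndex();
        idx.upsert("a1", "solr faceting");
        idx.mergeRecentIntoMain();
        idx.upsert("a1", "lucene only"); // an edit to an existing article
        System.out.println(idx.search(t -> t.contains("solr")));   // prints []
        System.out.println(idx.search(t -> t.contains("lucene"))); // prints [a1]
    }
}
```

With mostly new docs, the shadowing check rarely fires and the small index
stays cheap. With mostly edits, nearly every hit in main needs that check, and
since you need faceted queries, facet counts from main would also have to
exclude the stale copies, which eats into whatever the split was buying you.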

Anyway, there are many ways to skin a cat :)

Good luck,
B
_________________________
{Beto|Norberto|Numard} Meijome

"Everything is interesting if you go into it deeply enough"
  Richard Feynman

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.
