There were two main reasons we went with a multi-core solution:

1) We found that indexing speed starts dipping once the index grows to a
certain size - in our case around 50G. We don't optimize, but we do have
to maintain a consistent indexing speed. The only way we could do that
was to keep creating new cores (on the same box, though we also use
multiple boxes to scale horizontally) whenever the active core reaches
its capacity; there's a rough sketch of this after point 2. The old core
is never written to again after that.

2) Being able to drop a whole core for pruning purposes. We didn't want
to delete individual records from the index, so the best solution was to
simply delete the complete core directory (we do track the time period
covered by each core), which is much faster and easier to maintain.
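
To make the two points above concrete, here is roughly what they look
like against Solr's stock CoreAdmin handler. The host, core names and
directory layout are made up (we actually drive this through our own
Create web service), so treat it as a sketch rather than our exact setup:

#!/bin/sh
SOLR=http://localhost:8983/solr

# 1) Roll over: once the active core hits capacity, create a fresh core on
#    the same box and point the indexer at it instead.  The instanceDir is
#    assumed to already contain a conf/ directory with solrconfig.xml and
#    schema.xml.
NEW_CORE=logs-2009-09
curl "$SOLR/admin/cores?action=CREATE&name=$NEW_CORE&instanceDir=/data/solr/$NEW_CORE"

# 2) Prune: when a core's time period has expired, unload it and remove its
#    directory on disk - much faster than deleting documents one by one.
OLD_CORE=logs-2009-03
curl "$SOLR/admin/cores?action=UNLOAD&core=$OLD_CORE"
rm -rf /data/solr/$OLD_CORE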

So far things have been working fine. I'm not sure if there is any
inherent problem with this architecture given the above limitations
and requirements.
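
To Hoss's point below about multi-shard queries: when a search does need
to span more than one time period, it can be fanned out across the cores
with Solr's distributed-search "shards" parameter. Again just a sketch,
with made-up host and core names:

curl "http://localhost:8983/solr/logs-2009-08/select?q=error&shards=localhost:8983/solr/logs-2009-07,localhost:8983/solr/logs-2009-08"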

-vivek

On Tue, Aug 25, 2009 at 10:57 AM, Lance Norskog <goks...@gmail.com> wrote:
> One problem is the IT logistics of handling the file set. At 200 million
> records you have at least 20G of data in one Lucene index. It takes hours to
> optimize this, and 10s of minutes to copy the optimized index around to
> query servers.
> Another problem is that indexing speed drops off after the index reaches a
> certain size. When building multiple indexes, you want to stop indexing each
> one before it reaches that size.
> Lance
>
> On Tue, Aug 25, 2009 at 10:44 AM, Chris Hostetter
> <hossman_luc...@fucit.org> wrote:
>
>>
>> :   We're doing a similar thing with multi-core - when a core reaches
>> : capacity (in our case 200 million records) we start a new core. We are
>> : doing this via a web service call (Create web service),
>>
>> this whole thread perplexes me ... while i can understand not wanting to
>> let an index grow without bound because of hardware limitations, i don't
>> understand what value you are gaining by creating a new core on the same
>> box -- you're using the same physical resources to search the same number
>> of documents, making multiple cores for this actually seems like it would
>> take up *more* resources to search the same amount of content, because the
>> individual cores will be isolated and the term dictionaries can't be
>> shared (not to mention you have to do a multi-shard query to get results
>> from all the cores)
>>
>> are you doing something special with the old cores vs the new ones? (ie:
>> create the new cores on new machines, shut down cores after a certain
>> amount of time has expired, etc...)
>>
>>
>> : > Hi there,
>> : >
>> : > currently we want to add cores dynamically when the active one reaches
>> : > some capacity,
>> : > can anyone give me some hints on how to achieve this functionality? (Just
>> : > wondering if you have used shell-scripting or have coded some 100%
>> : > Java-based solution)
>> : >
>> : > Thx
>> : >
>> : >
>> : > --
>> : > Lici
>> : >
>> :
>>
>>
>>
>> -Hoss
>>
>>
>
>
> --
> Lance Norskog
> goks...@gmail.com
>
