Michael Rogers wrote:
> On 28/07/10 18:29, Matthew Toseland wrote:
>>> Maybe we could do something even simpler than that. Each indexer
>>> publishes her index in the form USK@blah,blah/index/123/keyword.
>>> Retrieve only the keywords you're interested in, from only the indexers
>>> you trust.
>>
>> That's what we do now. It doesn't scale.
>
> I thought the current scheme used one file per indexer, containing all
> the indexer's words, whereas I'm suggesting one file per word per
> indexer. Or does that come to the same thing due to the use of containers?
>
> Cheers,
> Michael

The site could be constructed to insert the large keyword indexes separately, using redirects, and to consolidate the small indexes (for uncommon keywords) inside a common container. Whether this would be more effective or efficient obviously depends on how large your indexes are and how much data could be shared between the indexes for different keywords.

I can imagine a search for 'techno music' being relatively fast, as it would only have to load the /techno and /music indexes (relatively small files, for relatively common search terms, as search terms go) from each index site, rather than the entire index. Of course this only makes sense if you're using a small number of index sites, if the bottleneck when running searches is the time it takes to download large index files, and if the time it takes to download the index for one word is a very small fraction of the time it takes to download the entire index. If the indexes are small, you'd probably lose more performance this way than you gain (relative to the current scheme), because the indexes for uncommonly searched words would not be dispersed throughout the network.
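To make the redirect-vs-container split concrete, here's a rough sketch of the layout decision in Python. It's purely illustrative: the 32KB threshold and all of the names are my own assumptions, it doesn't touch Freenet/FCP at all, and it isn't based on any existing index-site code; it just shows where the size cut-off would sit when building the site.

    # Sketch of the keyword-index layout decision described above.
    # Purely illustrative: no real Freenet/FCP calls, just the packing
    # logic; the threshold and names are assumptions, not anything the
    # current index code actually does.

    REDIRECT_THRESHOLD = 32 * 1024  # assumed cut-off, roughly one block's worth


    def plan_index_layout(keyword_indexes, threshold=REDIRECT_THRESHOLD):
        """Split per-keyword indexes into 'large' ones (inserted separately
        and referenced from the site manifest via redirects) and 'small'
        ones (bundled together into one shared container).

        keyword_indexes: dict mapping keyword -> serialized index bytes.
        Returns (redirects, container), both dicts of keyword -> bytes.
        """
        redirects = {}
        container = {}
        for keyword, data in keyword_indexes.items():
            if len(data) > threshold:
                # Common keyword, large index: worth its own insert, so a
                # search for this word fetches only this file.
                redirects[keyword] = data
            else:
                # Uncommon keyword, tiny index: bundle it, so we don't pay
                # a full fetch round-trip for a few hundred bytes.
                container[keyword] = data
        return redirects, container


    if __name__ == "__main__":
        fake_indexes = {
            "techno": b"x" * 120_000,   # popular term, big posting list
            "music": b"x" * 80_000,
            "zeugma": b"x" * 300,       # rare term, tiny posting list
        }
        redirects, container = plan_index_layout(fake_indexes)
        print("separate inserts:", sorted(redirects))
        print("shared container:", sorted(container),
              "(total %d bytes)" % sum(map(len, container.values())))

A search for 'techno music' would then fetch just the /techno and /music files (plus, at worst, the shared container) from each index site instead of the whole index; the actual inserts and manifest redirects would still go through whatever site-insertion machinery the indexer already uses.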
Disclaimer: I haven't been paying much attention to the Freenet search effort, so I may be totally off about what it is in the current scheme that doesn't scale well. :)