Hello, I'd like to respond to a few points about your searching scheme.
I don't mean to criticize you, but I would like to point out that
searching freenet is non-trivial, and AFAIK nobody has yet come up with
a system that doesn't have rather large drawbacks.

> I have come up with a scheme that would allow
> searching of the freenet network and, as an added
> benefit, works entirely within the network. No
> external daemons or schemes are required to support
> this search method. It does have the drawback of
> falling victim to DSB and garbage collection just like
> all data on freenet. I believe the benefits will
> outweigh the drawbacks, however.

Well, don't worry about the DSB - that's not a drawback of any system
you propose; it's a drawback of the current freenet datastore, and will
hopefully be fixed soon by a new datastore.

<snip>

> The first step is to rip apart each page into distinct
> words. We are going to use each word as a freenet key
> and the data associated to each key is the list of
> locations that contained that word. In our simple
> example we would have the following key/value pairs:
> 
> hello: pageA pageB
> this: pageA pageB
> is: pageA pageB
> not: pageB
> a: pageA pageB
> test: pageA pageB
> 
> Of course, in real-world examples the data would be
> complete freenet URIs. Once our lists are complete we
> compress the text files and insert them under an SSK to
> avoid tampering. A DBR could also be used to allow us
> to update the indexes on a set interval - probably on
> the order of a week or so to avoid killing the network
> with traffic.
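
Just to make sure I understand the mechanics: what you describe is
basically an inverted index.  Here's a rough sketch in Python - the
page names and text are placeholders, and a real spider would of
course work over fetched freenet documents rather than strings:

    # Rough sketch of the proposed index: word -> set of locations.
    # Page names stand in for full freenet URIs.
    from collections import defaultdict

    pages = {
        "pageA": "hello this is a test",
        "pageB": "hello this is not a test",
    }

    index = defaultdict(set)
    for location, text in pages.items():
        for word in text.split():
            index[word].add(location)

    # Each entry would then be inserted under SSK@abc/index/<word>.
    for word, locations in sorted(index.items()):
        print("%s: %s" % (word, " ".join(sorted(locations))))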

Who controls the spider program?  Who are they, and why should
we trust them?  Freenet's stability and power lie in its decentralized
and anonymous nature.  This sounds like it would centralize the search
process, which would create a bottleneck, a single point of attack,
or both.

> When a client wants to perform a search they simply
> request freenet keys. Let's say Frank the freenet user
> wants to search for "not test". He is going to request
> the following keys:
> 
> SSK@abc/index/not and then SSK@abc/index/test - all of
> the URIs that are common between them are the results
> of his search. This offloads all boolean operations
> and complex search operations onto the client and uses
> freenet only as a storage medium for the indexes - very
> favorable.
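
The client-side half is indeed trivial.  A sketch, where fetch_index()
is a made-up stand-in for requesting and parsing one
SSK@abc/index/<word> key from the node:

    # Sketch of the client side: fetch one index key per search term
    # and intersect the URI lists.  fetch_index() is hypothetical; a
    # real client would request SSK@abc/index/<word> via its node.
    def fetch_index(word):
        fake_store = {  # placeholder for fetched key contents
            "not": {"pageB"},
            "test": {"pageA", "pageB"},
        }
        return fake_store.get(word, set())

    def search(terms):
        sets = [fetch_index(t) for t in terms]
        return sorted(set.intersection(*sets)) if sets else []

    print(search(["not", "test"]))  # -> ['pageB']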

Well, I assume that the spider would end up filtering common
words out of pages; otherwise you'd end up inserting more than
200,000 keys into this 'index' subspace, since there would be
many, many, many unique words.
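
By 'filtering' I mean something like this - the stop list here is an
invented toy example, and a real one would be far longer:

    # Sketch of stop-word filtering before indexing.  STOP_WORDS is
    # a made-up list, not anything from a real spider.
    STOP_WORDS = {"a", "an", "and", "is", "of", "the", "this"}

    def index_terms(text):
        return [w for w in text.split() if w not in STOP_WORDS]

    print(index_terms("hello this is not a test"))
    # -> ['hello', 'not', 'test']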

But what if I want to search for 'TEST'?  Or 'tESt'?  What if
what I'm specifically looking for is '  test'?  (With two spaces.)
If you upload the indexes case-insensitive, then that kinda screws
the people who need case sensitivity.  If you upload the indexes
case-sensitive, that screws people who need case insensitivity.  If
you upload both, well, then it's good that you're only proposing
indexing once every week, because it's going to take that bloody long
just to insert all of the keys.
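
To put a number on 'upload both': every word then costs two inserts
instead of one.  A sketch - the index-exact/index-folded subspace
names are invented, and I'm assuming simple lower-casing as the
normalization:

    # Sketch of the "upload both" option: one key for the exact form,
    # one for a lower-cased form, roughly doubling the insert count.
    # The subspace names below are hypothetical.
    def key_names(word):
        return [
            "SSK@abc/index-exact/" + word,
            "SSK@abc/index-folded/" + word.lower(),
        ]

    print(key_names("TEST"))
    # -> ['SSK@abc/index-exact/TEST', 'SSK@abc/index-folded/test']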

> This search scheme could be implemented easily as
> external applications that request information from
> the node, or could even be integrated into fproxy. It
> requires no modification to the node or existing
> infrastructure and is fairly simple to implement. Best
> of all, it fills a need that has existed on freenet
> since its creation. I hope there is sufficient
> interest to pull this project off.

I think it's a good idea if you can find somebody that everybody can
trust, which you can't.  People don't even trust the authors of the
software.  If the index were compromised, the new operator could
put whatever he wanted in there, which could be very bad.

For a localized freenet network where having one central authority is
acceptable, I think this strategy would work fine.  But for the wider
public freenet where no one trusts anyone else, I think you'd have a
hard time selling this.  Again, the entire point of freenet is to put
the control of the network in the hands of... nobody.  Somebody has to
have that public/private keypair for the index SSK.  Somebody has to
run the spider and upload the requisite keys for searching.  Building
trust in that somebody is pretty much impossible, since even if it
were Ian Clarke, he would be vulnerable to takeover by forces who don't
want freenet used, or he could simply be thrown in jail.

Any search idea that will be acceptable to the public anonymous network
is probably going to have to be controlled by nobody, like the rest of
the network.  And that's really hard - which is probably why we don't
already have a search mechanism.  :)

-- David

