Hello, I'd like to respond to a few points about your searching scheme. I don't mean to criticize you, but I would like to point out that searching freenet is a non-trivial thing, and AFAIK nobody has come up with a system yet that doesn't have rather large drawbacks.
> I have come up with a scheme that would allow searching of the
> freenet network and, as an added benefit, works entirely within
> the network. No external daemons or schemes are required to
> support this search method. It does have the drawback of falling
> victim to DSB and garbage collection just like all data on
> freenet. I believe the benefits will outweigh the drawbacks,
> however.

Well, don't worry about the DSB - that's not a drawback of any system
you propose, that's a drawback of the current freenet datastore, and
it will hopefully be fixed soon by a new datastore.

<snip>

> The first step is to rip apart each page into distinct words. We
> are going to use each word as a freenet key, and the data
> associated with each key is the list of locations that contained
> that word. In our simple example we would have the following
> key/value pairs:
>
> hello: pageA pageB
> this:  pageA pageB
> is:    pageA pageB
> not:   pageB
> a:     pageA pageB
> test:  pageA pageB
>
> Of course in real-world examples the data would be complete
> freenet URIs. Once our lists are complete we compress the text
> files and insert them under an SSK to avoid tampering. A DBR
> could also be used to allow us to update the indexes on a set
> interval - probably on the order of a week or so to avoid killing
> the network with traffic.

Who controls the spider program? Who are they, and why should we
trust them? Freenet's stability and power is in its decentralized and
anonymous nature. This sounds like it would centralize the search
process, which would create a bottleneck, a single point of attack,
or both.

> When a client wants to perform a search they simply request
> freenet keys. Let's say Frank the freenet user wants to search
> for "not test". He is going to request the following keys:
>
> SSK@abc/index/not and then SSK@abc/index/test - all of the URIs
> that are common between them are the results of his search. This
> offloads all boolean operations and complex search operations
> onto the client and uses freenet only as a storage medium for the
> indexes - very favorable.

Well, I assume that the spider would end up filtering common words
out of pages (see the sketches below); otherwise you'd end up
inserting more than 200,000 keys into this 'index' subspace, since
there would be many, many, many unique words. But what if I want to
search for 'TEST'? Or 'tESt'? What if what I'm specifically looking
for is '  test' (with two spaces)? If you upload the indexes
case-insensitive, then that kinda screws the people who need case
sensitivity. If you upload the indexes case-sensitive, that screws
the people who need case insensitivity. If you upload both, well,
then it's good that you're only proposing indexing once every week,
because it's going to take that bloody long just to insert all of
the keys.

> This search scheme could be implemented easily as external
> applications that request information from the node, or could
> even be integrated into fproxy. It requires no modification to
> the node or existing infrastructure and is fairly simple to
> implement. Best of all, it fills a need that has existed on
> freenet since its creation. I hope there is sufficient interest
> to pull this project off.

I think it's a good idea if you can find somebody that everybody can
trust, which you can't. People don't even trust the authors of the
software. If the index were compromised, the new operator could put
whatever he wanted in there, which could be potentially very bad.
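Just so we're talking about the same thing, here is a rough sketch in
Python of what I understand the spider side to be doing. Everything
in it is illustrative: the page data is canned, and the lowercasing
and stop-word list are my additions (the proposal doesn't specify
either), but you'd need something like them to keep the key count
sane.

# Sketch of the proposed spider/indexer. A real spider would fetch
# pages from freenet and insert each posting list (compressed) under
# SSK@abc/index/<word>; here we just build the index and print the
# keys that would be inserted.

import re

# My addition, not part of the proposal: drop very common words so
# the number of inserted keys stays manageable.
STOP_WORDS = {"a", "an", "the", "is", "this"}

def tokenize(text):
    # Lowercasing is one way to answer the case question raised
    # above - it buys case-insensitive search at the cost of losing
    # case-sensitive search entirely.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """pages: dict mapping a freenet URI to its extracted text.
    Returns a dict mapping each word to the set of URIs containing it."""
    index = {}
    for uri, text in pages.items():
        for word in tokenize(text):
            if word in STOP_WORDS:
                continue
            index.setdefault(word, set()).add(uri)
    return index

pages = {
    "freenet:pageA": "hello this is a test",
    "freenet:pageB": "hello this is not a test",
}

for word, uris in sorted(build_index(pages).items()):
    # In the real scheme this line would be an insert into freenet.
    print("SSK@abc/index/%s -> %s" % (word, " ".join(sorted(uris))))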
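And the client half, which is the genuinely nice part of the scheme -
all the boolean work happens locally. Again only a sketch:
fetch_posting_list() is a hypothetical stand-in for whatever request
actually retrieves and decompresses the key from a node, and the
canned data mirrors the example above.

# Sketch of the client side: fetch one posting list per search term
# and intersect them locally. fetch_posting_list() fakes the network
# round trip with canned data.

def fetch_posting_list(word, index_ssk="SSK@abc/index"):
    # Hypothetical: a real client would request
    # "%s/%s" % (index_ssk, word) from its node. Here we fake it.
    fake_store = {
        "hello": {"freenet:pageA", "freenet:pageB"},
        "not": {"freenet:pageB"},
        "test": {"freenet:pageA", "freenet:pageB"},
    }
    return fake_store.get(word, set())

def search(query):
    """AND-search: intersect the posting lists of all query terms."""
    terms = query.lower().split()
    if not terms:
        return set()
    results = fetch_posting_list(terms[0])
    for term in terms[1:]:
        results &= fetch_posting_list(term)  # boolean AND, client-side
    return results

print(search("not test"))   # -> {'freenet:pageB'}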
For a localized freenet network where having one central authority is
acceptable, I think this strategy would work fine. But for the wider
public freenet, where no one trusts anyone else, I think you'd have a
hard time selling it. Again, the entire point of freenet is to put
control of the network in the hands of... nobody. Somebody has to
hold the public/private keypair for the index SSK. Somebody has to
run the spider and upload the requisite keys for searching. Building
trust in that somebody is pretty much impossible, since even if it
were Ian Clarke, he would be vulnerable to takeover by forces who
don't want freenet used, or he could simply be thrown in jail. Any
search idea that will be acceptable to the public anonymous network
is probably going to have to be controlled by nobody, like the rest
of the network. And that's really hard - which is probably why we
don't already have a search mechanism. :)

-- 
David
