Hi,

I worked on and developed a distributed search engine for Freenet called FASD (Fast anonymous scalable distributed search). It automatically builds and distributes metadata throughout a Freenet network. FASD adapts over time so that queries are routed to machines that store metadata about the relevant documents. I will post the paper and all the relevant documentation tonight so that you can take a look. The abstract is as follows:
This paper introduces FASD, a fault-tolerant, adaptive, scalable, and distributed search layer designed to augment existing peer-to-peer applications. The FASD layer operates as a network of identical nodes that collectively pool their storage space to cache "metadata keys" and cooperatively route queries to the nodes most likely to satisfy them. A "metadata key" is a list of weighted terms that describe the information content of a document in the underlying network. Although completely decentralized, FASD's approach is able to efficiently match the recall and precision of a centralized search engine. Simulation results indicate that latency and bandwidth consumption scale logarithmically with the size of a FASD network.

Regards,
Amr

-----Original Message-----
From: Jeremy Smith [mailto:[EMAIL PROTECTED]]
Sent: Sunday, September 08, 2002 5:36 PM
To: [EMAIL PROTECTED]
Subject: [Tech] Keyword searching on Freenet (reprise?)

Hi!

Does the Freenet architecture allow for keyword searches of the text bodies of the documents stored on it? I think this would really popularise anonymous publishing networks by allowing people to search by something other than URLs.

I am working on a system which will allow for total anonymity and keyword searching. However, before I start writing this system, I would like to know whether the alternatives could feasibly offer this.

A quick background on keyword searching: rather than loading up all the files and searching through them one at a time, which is processor intensive, you first make an index of all those files. The index merely records that the word "rhubarb" is in file 10 ("rhubarb.txt"), so it is small and quick to access.

Okay, so you publish the index on Freenet. Then what? The only way to use it is to get it and look at it, and it must point to documents on Freenet or on a particular server (usually it is tied up with a document).
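To make the index idea concrete, here is a minimal sketch of such a keyword index (an inverted index) in Python. The sample files and the `tokenize`/`build_index`/`search` helpers are hypothetical, chosen only for illustration, and tokenisation is deliberately naive (lower-cased alphanumeric words only).

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lower-cased alphanumeric words (deliberately naive)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(files: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the set of file IDs whose text contains it."""
    index: dict[str, set[int]] = defaultdict(set)
    for file_id, text in files.items():
        for word in tokenize(text):
            index[word].add(file_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the IDs of files containing every word in the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so intersecting does not mutate the index.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical corpus: file 10 is "rhubarb.txt", as in the example above.
    names = {10: "rhubarb.txt", 11: "custard.txt"}
    files = {
        10: "Rhubarb crumble needs fresh rhubarb.",
        11: "Custard goes well with crumble.",
    }
    index = build_index(files)
    print(sorted(index["rhubarb"]))                               # [10]
    print(sorted(search(index, "crumble")))                       # [10, 11]
    print([names[i] for i in search(index, "rhubarb crumble")])   # ['rhubarb.txt']
```

The index holds only words and file IDs, which is why it is small; it is also exactly what lets anyone holding it find every document that contains a given word.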
Then anyone can censor those documents, because the index tells them what's in them; they only need to go by keyword. I cannot see a way round this. My system stores unencrypted documents, and censorship is left to whoever stores the information. However, if a document is sufficiently censored, you can publish it from your own machine, anonymously, but that's not so hot if the bad guys knock on the door (although how the bad guys would *know* to knock on your door, I don't know).

I won't give too much away at this stage because it's not all worked out yet, but my basic anonymising idea is to randomise the initial number of hops (a random number of 2-3 hops would make the chance that the previous host is the originator about 25%, which would make it impractical to check whether a certain host is a publisher). Another is to keep passing along packets that have already been read, to give any omnipotent network overseer the impression that the packet is not for you; it eventually times out in the network when the hop count reaches a certain number.

In a nutshell, it's like Gnutella with encryption, where the document is passed along like search results, except with random initial hops.

Again, Freenet has its purpose, as does Publius, but mine in theory covers other goals and aims. It shouldn't be seen as a replacement, but it is (will be!) a simple protocol. No politics here: Freenet is great, but what I want out of such a system is keyword searching of documents.

Anyway, I'm just posting here to ask about keyword searching on Freenet. Maybe there's some ultra-clever way of doing it without giving away the contents of the documents, but I can't see it myself. Search engines like AltaVista contributed to making the web a huge hit; searching is an important aspect of a publishing system.

Jeremy.

PS. I have checked the archives and haven't found anything on keyword searches. I wish those archives were searchable!

PPS.
I will of course be writing this program myself, although maybe with some help on the underlying crypto to prevent a rubbish system.

_______________________________________________
Tech mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org/cgi-bin/mailman/listinfo/tech
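The randomised initial hop count described in Jeremy's message, together with forwarding packets even after they have been read until a hop limit is reached, can be sketched as a small Python simulation. The message does not specify the hop limit, the encryption, or how peers are chosen, so `MAX_HOPS = 10`, the `Packet` type and the `originate`/`relay`/`route` helpers are hypothetical placeholders; this illustrates only the counter logic, not his actual protocol.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical parameters: the message suggests a random start of 2-3 hops;
# the timeout value is unspecified, so MAX_HOPS = 10 is a placeholder.
INITIAL_HOPS_RANGE = (2, 3)
MAX_HOPS = 10


@dataclass(frozen=True)
class Packet:
    payload: bytes  # assumed to be encrypted already; crypto is out of scope here
    hops: int       # hop counter carried with the packet


def originate(payload: bytes, rng: random.Random) -> Packet:
    """Create a packet whose hop counter starts at a random value in
    INITIAL_HOPS_RANGE (inclusive) instead of a fixed, recognisable 0."""
    low, high = INITIAL_HOPS_RANGE
    return Packet(payload, rng.randint(low, high))


def relay(packet: Packet) -> Optional[Packet]:
    """Forward a packet one hop, whether or not this node was able to read it.

    Returns None when the incremented counter reaches MAX_HOPS, i.e. the
    packet times out instead of being forwarded.
    """
    next_hops = packet.hops + 1
    if next_hops >= MAX_HOPS:
        return None
    return replace(packet, hops=next_hops)


def route(payload: bytes, rng: random.Random) -> list[int]:
    """Return the hop counter carried on each link until the packet times out."""
    counts: list[int] = []
    packet: Optional[Packet] = originate(payload, rng)
    while packet is not None:
        counts.append(packet.hops)
        packet = relay(packet)
    return counts


if __name__ == "__main__":
    rng = random.Random(2002)
    # Counter seen on each link: starts at 2 or 3, ends at MAX_HOPS - 1.
    print(route(b"encrypted document", rng))
```

A neighbour that receives a packet with a counter of 3 cannot tell whether it was just created with a start of 3 or has already been relayed once from a start of 2; that ambiguity is the point of the random start. In this naive version a counter of 2 still gives the originator away, since no relayed packet can carry it.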
