Hi,

I developed a distributed search engine for Freenet called
FASD (Fast anonymous scalable distributed search).  It automatically builds
and distributes metadata throughout a Freenet network.  FASD
adapts over time so that queries are routed to machines that store metadata
about the relevant documents.  I will post the paper and all the relevant
documentation tonight so that you can take a look.  The abstract is as
follows:

This paper introduces FASD, a fault-tolerant, adaptive, scalable, and
distributed search layer designed to augment existing peer-to-peer
applications.  The FASD layer operates as a network of identical nodes that
collectively pool their storage space to cache "metadata keys" and
cooperatively route queries to the nodes most likely to satisfy them. A
"metadata key" is a list of weighted terms that describe the information
content of a document in the underlying network. Although completely
decentralized, FASD's approach is able to efficiently match the recall and
precision of a centralized search engine.  Simulation results indicate that
latency and bandwidth consumption scale logarithmically with the size of a
FASD network.
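
As a rough illustration of the "metadata key" idea, here is a minimal
sketch that treats a key as a mapping from terms to weights and ranks
stored keys against a query. The use of cosine similarity, the function
names, and the sample data are assumptions made for this example; the
abstract does not say which similarity measure FASD uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two term->weight mappings (assumed measure)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(query, stored_keys):
    """Return the name of the stored metadata key closest to the query."""
    return max(stored_keys,
               key=lambda name: cosine_similarity(query, stored_keys[name]))

# Hypothetical metadata keys: weighted terms describing two documents.
stored_keys = {
    "doc-gardening": {"rhubarb": 0.9, "soil": 0.4},
    "doc-networks": {"routing": 0.8, "latency": 0.6},
}
query = {"rhubarb": 1.0}
closest = best_match(query, stored_keys)
```

A node holding these keys could answer the query itself or pass it to
whichever neighbour has historically returned the closest keys; that
routing step is not shown here.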


Regards,

Amr

-----Original Message-----
From: Jeremy Smith [mailto:[EMAIL PROTECTED]]
Sent: Sunday, September 08, 2002 5:36 PM
To: [EMAIL PROTECTED]
Subject: [Tech] Keyword searching on Freenet (reprise?)


Hi!

Does the Freenet architecture allow for keyword searches of the text bodies
of the documents on there? I think this would really popularise anonymous
publishing networks by allowing people to search by something other than
URLs.

I am working on a system which will allow for total anonymity and keyword
searching. However, before I start work on writing this system, I would
like to know if the alternatives could feasibly offer this.

Quick background on searching by keyword: to make it less processor
intensive than loading up all the files and searching through them one at a
time, you first build an index file of all those files, which merely
records that the word "rhubarb" is in file 10 ("rhubarb.txt"). The index is
small and quick to access.
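
To make the index idea concrete, here is a minimal sketch of an inverted
index in Python. The file numbers, file contents, and function names are
made up for illustration; a real index would also need proper tokenising
and some on-disk format.

```python
from collections import defaultdict

def build_index(files):
    """Map each lowercased word to the set of file numbers containing it."""
    index = defaultdict(set)
    for file_id, text in files.items():
        for word in text.lower().split():
            index[word].add(file_id)
    return index

def search(index, word):
    """Return the sorted file numbers whose text contains the word."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical corpus: file number -> file contents.
files = {
    10: "Rhubarb crumble recipe",
    11: "apple pie recipe",
}
index = build_index(files)
hits = search(index, "rhubarb")
```

A lookup touches only the index entry for the word, not the files
themselves, which is why it is cheaper than scanning every document.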

Okay, so you publish the index on Freenet. Then what? The only way to use
it is to fetch it and look at it, and it must point to documents on Freenet
or on a particular server (usually it is tied up with a document). Then
anyone can censor those documents purely by keyword, as the index tells
them what's in the documents. I cannot see a way round this, and my system
stores unencrypted documents and is self-censoring by whoever stores the
information.

However, if a document is sufficiently censored, you can publish off your
own machine, anonymously, but really that's not so hot if the bad guys
knock on the door (although how the bad guys would *know* to knock on your
door, I don't know). I'm not giving too much away at this stage because
it's not all worked out yet, but my basic anonymising idea has two parts.
The first is randomising the initial number of hops (a random number of 2-3
hops would make the chance that the host before is the originator about
25%, which would make it impractical to check if a certain host is a
publisher). The second is passing along packets that have been read, to
give the impression to any omnipotent network overseer that the packet is
not for you; it eventually times out in the network when the number of hops
hits a certain number.
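
A minimal sketch of the hop-counter part of this idea, under stated
assumptions: a new packet starts with a hop count of 2 or 3 rather than 0,
every host that passes it on increments the count, and the packet is
dropped once the count reaches a limit. The limit of 10, the dictionary
layout, and the function names are hypothetical; the protocol itself is
not worked out yet.

```python
import random

MAX_HOPS = 10  # hypothetical limit at which a packet times out

def new_packet(payload, rng=random):
    """Create a packet whose hop count starts at 2 or 3 instead of 0."""
    return {"payload": payload, "hops": rng.randint(2, 3)}

def forward(packet):
    """Pass a packet on, or return None once it has timed out.

    A host calls this even for packets it has already read, so an
    observer cannot tell from forwarding alone who the recipient was.
    """
    hops = packet["hops"] + 1
    if hops >= MAX_HOPS:
        return None
    return {"payload": packet["payload"], "hops": hops}

packet = new_packet("hello")
initial_hops = packet["hops"]
forwards = 0
while packet is not None:
    packet = forward(packet)
    forwards += 1
```

Because a fresh packet carries the same hop count as one that has already
been relayed a couple of times, the receiving host cannot read the
originator off the counter alone.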

In a nutshell, it's like Gnutella with encryption and where the document is
passed along like search results, except with random initial hops.

Again, Freenet has its purpose, as does Publius, but mine in theory covers
other goals and aims. It shouldn't be seen as a replacement, but it is
(will be!) a simple protocol. No politics here; Freenet is great, but what
I want out of such a system is keyword searching of documents.

Anyway, I'm just posting here to ask about keyword searching on Freenet.
Maybe there's some ultra-clever way of doing it without giving away the
contents of the documents, but I can't see it myself. Search engines like
Altavista contributed to making the web a huge hit; searching is an
important aspect of a publishing system.

Jeremy.

PS. I have checked the archives and haven't found anything on keyword
searches. I wish those archives were searchable!

PPS. I will of course be writing this program myself, although maybe with
some help on the underlying crypto to prevent a rubbish system.

_______________________________________________
Tech mailing list
[EMAIL PROTECTED]
http://hawk.freenetproject.org/cgi-bin/mailman/listinfo/tech
