Re: [Tech] Keyword searching on Freenet (reprise?)

Jeremy Smith Mon, 09 Sep 2002 14:19:38 -0700

> I'm kinda confused as to what you're asking.

I'll try and explain a bit below.


> > Does the Freenet architecture allow for keyword searches of the text
> > bodies of the documents on there? I think this would really popularise
> > anonymous publishing networks by allowing people to search other than
> > URLs.
> 
> I agree.  It would.  I don't think there's currently a way to search
> Freenet at all.  Currently, if you want to know what's out there, you have
> to have somebody else tell you, either by posting to Frost ("my freesite is
> at such-and-such") or by submitting the freesite to The Freedom Engine
> freesite.  There's a file search option on Frost.  I'm not sure how it
> works, but it only allows for searching by file name.  One could submit an
> index file along with whatever document being inserted.  People actually do
> this, with images at least.

This sounds like an interesting idea, the index file going with the
document. But if it's encrypted, then it's going to be hard to search
using it.

> > Quick background on searching by keyword is, to make it less processor
> > intensive than loading up all the files and searching through them one
> > at a time is, you make an index file first of all those files, which
> > merely indicates that the word "rhubarb" is in file 10 ("rhubarb.txt"),
> > the index is small and quick to access.
> 
> Perhaps I'm missing something.  It seems to me that if you wanted to index
> the words that appear in a given text document, you'd have something only
> marginally smaller than the original document, unless you come up with a
> way to index only important words, or words that represent the unique value
> of the document.

Next paragraph - technical warning ;-)

The simplest storage system is to have 1 bit per file, per word. So 8
files, with a total of 10 distinct words between them (say about 6 letters
each), need 8 files / 8 bits per byte = 1 byte of flags per word, or 10
bytes of flags in all. Storing the words themselves takes about
6 letters * 10 words = 60 bytes, so the whole index comes to roughly 70
bytes.
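
To make the 1-bit-per-file, per-word layout concrete, here is a minimal
Python sketch. It is an illustration only, not the indexer mentioned in
this thread; the function names build_bitmap_index and search and the
sample documents are made up for the example. Each word maps to an integer
whose bit i is set when file i contains that word.

```python
import re
from collections import defaultdict


def build_bitmap_index(docs):
    """Map each word to an integer; bit i is set if file i contains the word."""
    index = defaultdict(int)
    for file_no, text in enumerate(docs):
        # set() so each word is flagged once per file, however often it occurs
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[word] |= 1 << file_no
    return dict(index)


def search(index, word, n_files):
    """Return the numbers of the files whose flag bit is set for `word`."""
    bits = index.get(word.lower(), 0)
    return [i for i in range(n_files) if (bits >> i) & 1]


# Hypothetical sample documents, numbered 0, 1, 2
docs = ["rhubarb crumble recipe", "custard recipe", "rhubarb and custard"]
index = build_bitmap_index(docs)
print(search(index, "rhubarb", len(docs)))  # [0, 2]
```

A lookup only touches the flags for one word, so nothing has to be loaded
or scanned from the documents themselves.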

However, if you compress this index file with Gzip, it compresses to about
1/10th of its size, which is pretty good. You can get about 38 MB of text
files indexed into an index of 1.2 MB (I have one right here ;-)
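
As a rough illustration of why such an index compresses well, the sketch
below gzips a mostly-zero block of flag bytes using Python's standard gzip
module. The data is synthetic and hypothetical (10,000 words, 8 files, very
few bits set), so the ratio it prints says nothing about the 1/10th figure
above, which will vary with the real documents.

```python
import gzip

# Hypothetical index: one flag byte per word (8 files), 10,000 words.
raw = bytearray(10_000)
for i in range(0, len(raw), 50):
    # Assumption for the example: only every 50th word appears in any file
    raw[i] = 1 << (i % 8)
raw = bytes(raw)

packed = gzip.compress(raw)

# Compression is lossless, so the index can be restored exactly
assert gzip.decompress(packed) == raw

print(len(raw), "bytes raw,", len(packed), "bytes gzipped")
```

Sparse flag data like this is mostly runs of zero bytes, which is the kind
of input gzip handles best.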

> > Okay, so you publish the index on Freenet. Then what? The only way to use
> > it is to get it and look at it, and it must point to documents on Freenet
> > or on a particular server (usually it is tied up with a document).
> 
> It wouldn't point to any particular server.  The content of Freenet is
> location inspecific.  It floats in the ether.  In Freenet, a document is
> never tied up with a specific server.
> 
> > Then you can censor those documents as the index will tell you what's in
> > them, merely by going on keyword. I cannot see a way round this, and my
> > system stores unencrypted documents and is self-censoring by whoever
> > stores the information.
> 
> This is where you lost me.  What do you mean "you can censor those
> documents"?  Your system stores unencrypted documents?  Is it supposed
> to?  What do you mean by "self-censoring"?
> 
> You couldn't censor a document on Freenet.  The best you can do is just not
> request that document, yourself.  You couldn't, say, write code that would
> block that document from passing through your Freenet node, because there's
> no way to tell what document any given pile of bits is.

Well, I was just saying that the only kind of system I can envisage with
word/search indexes would have to make the documents plaintext. This would
make it possible for people to simply not host them.

However, there is probably a way to make the indexes encrypted too on
Freenet, but in such a way that they are still searchable.

> > No politics here, Freenet is great, but what I want out of such a system
> > is keyword searching of documents.
> 
> Freenet *is* great!
> 
> I think keyword searching could be implemented.  Not at the Freenet code
> level, though.  It'd be nice if the programs used to insert stuff into
> Freenet would (at the inserter's option) index inserted documents, and put
> that index into Freenet as well.

Well, I can make a document indexer available for use (I'm going to rewrite
it in C++, I think). It's very fast even for a lot of documents, and is only
a few KB in size.

> Out of curiosity, what is this project you're working on?  Will it be free
> software?

Yes, it will. I don't really see the need to commercialise it. And the source, well, a 
lot of that I wrote a while ago.

Thanks for your comments,

Jeremy.


