Freenet 0.7.5 build 1248 is out, *AND* so are a new version of Library and a
new Spider plugin, which together enable use of the new index format that
infinity0 developed last summer. XMLSpider is also updated: XMLSpider only
supports making old format indexes, and Spider only supports making new format
indexes.

1248 major changes:
- HTML filter now avoids rearranging attributes if it can.
- Make announcement happen a little earlier.
- Run web interface threads at higher priority when possible.
- Re-add the 3 actively maintained developer flogs.
- Russian translation update.
- Lots of internal changes for plugins.
- Logger changes: we now buffer for a configurable period, defaulting to 60
seconds.
- Code cleanups.
- Require the new Library plugin, upgrade XMLSpider if loaded, and offer Spider.
- More datastore stats.

Thanks to:
toad
hungrid
p01air3
xor
sajack
zidel
nikotyan

The new version of Library and new Spider plugin together allow creating new 
format indexes by spidering Freenet, in much the same way as XMLSpider allows 
creating old format indexes by spidering Freenet. The new index format is far 
more scalable than the old format, and should improve the performance of both
the spider and the client (Library) doing the search. It has other benefits,
which I have discussed elsewhere: for example, it is possible to fork a new
format index, merging your data into somebody else's index to create a whole
new index without having to reinsert everything.

New format indexes are automatically inserted by Library on behalf of Spider. 
Data from Spider is first written to library.index.data.<number>, then merged 
into an on-disk index in library-temp-index-<number>, then merged from disk to 
Freenet. Only the data that has changed is uploaded, and once the index has
been updated, a USK is inserted and logged in wrapper.log (you may have to
grep for it, as the log is rather chatty).
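
If you would rather not grep by hand, something like the following Python
sketch can pull candidate keys out of the log. It assumes the inserted key
appears on the log line as text starting with "USK@"; the exact message that
Library writes around it is not specified here, and the function names are
made up for illustration.

```python
import re

# Assumption: an inserted key shows up in the log as a token beginning "USK@".
# The surrounding log message format is unknown, so we only match the token.
USK_PATTERN = re.compile(r"USK@\S+")


def find_usks(lines):
    """Return every USK-looking token found in an iterable of log lines, in order."""
    found = []
    for line in lines:
        found.extend(USK_PATTERN.findall(line))
    return found


def find_usks_in_file(path="wrapper.log"):
    """Scan a log file (hypothetical default path) and return the tokens found."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return find_usks(log)
```

Other log lines may mention USKs too (the spider fetches plenty of them), so
treat the result as a list of candidates rather than a list of your inserts.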

New format indexes include ranking data, and are functional now in much the 
same way as old format indexes are, although they should be faster and the user 
interface during loading isn't quite the same.

Both new and old format indexes now index numbers of 3 characters or more.
Also, as of the work in the previous build, we have reasonable Chinese support.

To use the new Spider, you need to load the Spider plugin (not XMLSpider); 
Library will already have been upgraded by installing 1248. Then configure it.
The only essential settings are the two options right at the top, which set
the number of requests to run. You can also configure keyword blacklists if
you want; these are useful both for shameless censorship (or being able to
sleep at night, depending on your point of view) and for excluding known
spider traps and spam sites. The code will do pretty much all the rest of the
work, as
you can see by watching wrapper.log. It will spider Freenet, write the results 
to disk, then merge the big on-disk trees to Freenet, then upload a USK for 
each one. All of this happens in parallel. Your first USK should be uploaded 
within a day or two. Expect the process to slow down significantly over time as 
the index gets bigger (currently my test indexes are taking 7 hours to finish 
uploading but I expect it will be several days when it gets a bit bigger) - but 
then the spider itself slows down significantly over time as it runs out of 
easily findable stuff. I have been testing with max memory set to 1024 (1GB)
in wrapper.conf, so that should be enough. Note also that we cache everything
we upload in library-spider-pushed-data-cache/, which is currently not garbage
collected; you can remove it, but then you are relying on the data being
retrievable when we need to update it.
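
The memory setting and the cache directory mentioned above are both easy to
check from a script. The Python sketch below assumes that the memory limit is
set through the Java Service Wrapper property wrapper.java.maxmemory (value in
MB) and that the script is run from the node directory; the function names are
hypothetical.

```python
import os


def read_max_memory_mb(conf_text):
    """Return the first uncommented wrapper.java.maxmemory value as an int, or None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "wrapper.java.maxmemory":
            return int(value.strip())
    return None


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (0 if root does not exist)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


if __name__ == "__main__":
    # Assumption: run from the node directory, where wrapper.conf lives.
    if os.path.exists("wrapper.conf"):
        with open("wrapper.conf", encoding="utf-8", errors="replace") as conf:
            print("max memory (MB):", read_max_memory_mb(conf.read()))
    cache = "library-spider-pushed-data-cache"
    print("pushed-data cache (bytes):", directory_size_bytes(cache))
```

Watching the cache size over a few days gives a rough idea of how much disk
the spider will need before you decide whether to prune it.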

A note to would-be standard index authors:

YOU MUST BE ANONYMOUS. I have been running the spider for testing purposes but 
I will NOT publish any produced indexes. You should publish anonymously, and 
take your anonymity very seriously, as you are a clear single point of 
attack/vulnerability; if you are taken down, this affects the experience of
Freenet for everyone, but especially newbies. Create an identity (in Frost, FMS
or Freetalk) solely for announcing your index and post the USK there. When an 
index is popular enough and complete and consistent enough, we will add it to 
the default indexes list.
